Alex Lai and Nols Ebersohn | 27 January 2026
One of the great strengths of Data Vault is that it gives us a complete, auditable, and historised representation of enterprise data. By separating business keys (Hubs), relationships (Links), and descriptive context (Satellites), we gain flexibility, resilience to change, and strong lineage by design.
But there is a common, very practical challenge:
New analysts, data scientists, testers, and even experienced engineers often struggle to “see” the data in a familiar shape. Information is distributed across hubs, links, and satellites, and even simple questions can require multiple joins and careful time logic.
At the same time, there is a deeper question we should be asking:
If our Data Vault truly represents the source data, shouldn’t we be able to reconstruct the source from it?
This is where Source-on-Raw Views (SoRV) come in.
A Source-on-Raw View (SoRV) is a view that:
Reconstructs a source dataset in its original, source-like shape
Is built entirely from Raw Data Vault tables (Hubs, Links, Satellites)
Represents the current state of the source data
In simple terms:
A Source-on-Raw View reverses the Data Vault back into the shape of the source.
SoRV does not replace your information marts, Business Vault, or semantic layers. Instead, it serves three critical purposes:
It proves your Data Vault model is complete and correct
It dramatically simplifies testing and reconciliation
It makes the Raw Vault far more approachable for humans familiar with source systems
We often describe Data Vault as a system of record. But what does that really mean?
If your Raw Vault is truly the system of record, then it must:
Contain all business keys
Contain all relationships
Contain all descriptive context
Contain enough information to recreate what the source looked like
If you cannot reconstruct the source from the vault, then one of the following is true:
A relationship is missing or incorrectly modelled
A business key is missing or incorrectly identified
A satellite is attached at the wrong grain
Important attributes were not captured
Building Source-on-Raw Views turns reversibility into a structural validation technique.
If you can reconstruct the source, your model is structurally sound.
If you can’t, something fundamental is wrong.
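One of the failure modes above can be made concrete with a small check. The sketch below is illustrative only: it uses Python's built-in sqlite3 module and a hypothetical two-table schema (h_customer, l_customer_service) that is not taken from any real model. It looks for Link rows whose Hub parent is missing; such rows silently disappear from an inner-join reconstruction, which is how a missing business key typically shows up.

```python
import sqlite3


def find_orphan_link_rows(conn):
    """Return link hash keys that reference a customer hub row that does not exist."""
    rows = conn.execute(
        """
        SELECT l.hk_l_customer_service
        FROM l_customer_service AS l
        LEFT JOIN h_customer AS h
          ON h.hk_h_customer = l.hk_h_customer
        WHERE h.hk_h_customer IS NULL
        ORDER BY l.hk_l_customer_service
        """
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical toy data: link row 'l2' points at hub key 'c2', which was never loaded.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE h_customer (hk_h_customer TEXT PRIMARY KEY, bk_customer TEXT);
    CREATE TABLE l_customer_service (
        hk_l_customer_service TEXT PRIMARY KEY,
        hk_h_customer TEXT
    );
    INSERT INTO h_customer VALUES ('c1', 'CUST-001');
    INSERT INTO l_customer_service VALUES ('l1', 'c1'), ('l2', 'c2');
    """
)

orphans = find_orphan_link_rows(conn)
print(orphans)  # ['l2']
```

An empty result does not prove the model is correct on its own, but a non-empty one points directly at rows the reconstruction would lose.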
Once you have a Source-on-Raw View, most testing reduces to comparing two datasets.
You can directly compare:
The original source extract
Versus the output of the SoRV
Row counts, keys, attribute values, duplicates, missing records — all of this can be validated automatically.
Instead of testing dozens of hubs, links, and satellites independently, you test one simple question: does the SoRV output match the original source extract?
This approach is especially powerful for:
Regression testing
Reprocessing scenarios
Source system changes
Platform or tooling migrations
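The comparison described in this section can be sketched in a few lines. This is a minimal illustration, not a prescribed test harness: the function name reconcile, the report keys, and the sample rows are all hypothetical, and both datasets are assumed to have already been fetched as lists of tuples with the same column order.

```python
from collections import Counter


def reconcile(source_rows, sorv_rows):
    """Compare a source extract with SoRV output, treating each as a multiset of rows."""
    src, out = Counter(source_rows), Counter(sorv_rows)
    return {
        "source_count": sum(src.values()),
        "sorv_count": sum(out.values()),
        # Rows present in the source but not reproduced by the view.
        "missing_in_sorv": sorted((src - out).elements()),
        # Rows the view produced that the source does not contain.
        "unexpected_in_sorv": sorted((out - src).elements()),
        # Rows the view produced more than once.
        "duplicates_in_sorv": sorted(r for r, n in out.items() if n > 1),
    }


# Hypothetical sample data: (customerid, registration, serviceid, serviceprice)
source_extract = [
    ("CUST-001", "REG-A", "SVC-1", 120.0),
    ("CUST-002", "REG-B", "SVC-2", 80.0),
]
sorv_output = [
    ("CUST-001", "REG-A", "SVC-1", 120.0),
    ("CUST-002", "REG-B", "SVC-2", 85.0),  # attribute value differs from the source
]

report = reconcile(source_extract, sorv_output)
is_match = not report["missing_in_sorv"] and not report["unexpected_in_sorv"]
print(report)
print("match:", is_match)
```

A changed attribute value appears as one missing row and one unexpected row, so matching row counts alone are not sufficient. The sketch assumes every column value is hashable and sortable; rows containing NULLs would need an explicit sort key.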
Most people do not naturally think in hubs, links, and satellites — especially those new to Data Vault.
A Source-on-Raw View provides:
Analysts with a familiar structure to explore
Data scientists with access to “raw but usable” data
Engineers with a clean surface for debugging and inspection
All of this happens without bypassing governance or encouraging direct access to unmanaged source extracts.
A Source-on-Raw View follows a simple and repeatable pattern:
Start from the Link that defines the grain (or from the Hub, when the source has no Link)
Join Hubs to retrieve business keys
Join Satellites to retrieve descriptive attributes
Apply “latest” logic to produce a current-state view
Note: This assumes strong metadata management and consistent modelling practices.
Below is a real Source-on-Raw View that reconstructs a “service usage” style dataset from the Raw Vault:
-- Source-on-Raw View query: current-state "service usage" rows rebuilt from the Raw Vault
SELECT
    h_service.bkcc                                    AS bkcc_service,
    h_service.bk_service                              AS serviceid,
    h_customer.bkcc                                   AS bkcc_customer,
    h_customer.bk_customer                            AS customerid,
    h_registration.bkcc                               AS bkcc_registration,
    h_registration.bk_registration                    AS registration,
    s_service_cost_service_servicessystem.serviceprice AS serviceprice,
    s_service_cost_service_servicessystem.servicedate  AS servicedate
-- The Link defines the grain of the result
FROM vault.l_customer_registration_service l_customer_registration_service
-- Satellite: descriptive attributes, restricted to the latest extract per link key
JOIN vault.s_service_cost_service_servicessystem s_service_cost_service_servicessystem
    ON l_customer_registration_service.hk_l_customer_registration_service =
       COALESCE(s_service_cost_service_servicessystem.hk_l_customer_registration_service, '0xFF')
    AND s_service_cost_service_servicessystem.iss_extract_date = (
        SELECT MAX(s.iss_extract_date)
        FROM vault.s_service_cost_service_servicessystem s
        WHERE s.hk_l_customer_registration_service =
              l_customer_registration_service.hk_l_customer_registration_service
    )
-- Hubs: business keys for customer, registration, and service
JOIN vault.h_customer h_customer
    ON l_customer_registration_service.hk_h_customer =
       COALESCE(h_customer.hk_h_customer, '0xFF')
JOIN vault.h_registration h_registration
    ON l_customer_registration_service.hk_h_registration =
       COALESCE(h_registration.hk_h_registration, '0xFF')
JOIN vault.h_service h_service
    ON l_customer_registration_service.hk_h_service =
       COALESCE(h_service.hk_h_service, '0xFF')
-- Skip the link row keyed '0xFF', the same placeholder value used in the COALESCE defaults above
WHERE l_customer_registration_service.hk_l_customer_registration_service != '0xFF';
The Link defines the grain: one row per combination of customer, registration, and service
The Hubs provide business keys
The Satellite provides descriptive attributes
Latest-record logic converts historised data into a current snapshot
The result is a source-shaped dataset, reconstructed entirely from the Raw Vault.
This is not a mart.
This is not a business transformation.
This is the source, rebuilt from the vault.
If the reconstruction is wrong, the model probably is too.
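To see the latest-record logic in isolation, here is a cut-down, runnable sketch using Python's built-in sqlite3 module. The tables l_demo and s_demo and their contents are hypothetical stand-ins for a Link and its Satellite; only the iss_extract_date column name and the correlated MAX subquery mirror the view above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE l_demo (hk_l_demo TEXT PRIMARY KEY);
    CREATE TABLE s_demo (
        hk_l_demo        TEXT,
        iss_extract_date TEXT,  -- ISO-8601 dates compare correctly as text
        serviceprice     REAL
    );
    INSERT INTO l_demo VALUES ('k1'), ('k2');
    INSERT INTO s_demo VALUES
        ('k1', '2026-01-01', 100.0),
        ('k1', '2026-01-15', 110.0),
        ('k2', '2026-01-10', 50.0);
    """
)

# For each link key, keep only the satellite row with the latest extract date.
current_rows = conn.execute(
    """
    SELECT l.hk_l_demo, s.iss_extract_date, s.serviceprice
    FROM l_demo AS l
    JOIN s_demo AS s
      ON s.hk_l_demo = l.hk_l_demo
     AND s.iss_extract_date = (
         SELECT MAX(s2.iss_extract_date)
         FROM s_demo AS s2
         WHERE s2.hk_l_demo = l.hk_l_demo
     )
    ORDER BY l.hk_l_demo
    """
).fetchall()

for row in current_rows:
    print(row)
```

Key k1 has two historised rows, but only the 2026-01-15 version survives. Note that if two satellite rows for the same key share the same maximum extract date, this pattern returns both, so a tie-breaking column would be needed in that case.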
With IRiS, Source-on-Raw Views are not an afterthought or a manual exercise.
They are generated automatically from the same metadata that defines the Data Vault itself.
As part of its standard code output, IRiS generates:
Source-on-Raw Views for every modelled source
Table definitions and load procedures
All required Data Vault metadata, including:
Business key hashing
Hash differences
Load date and record source handling
Support for multiple satellite patterns (CDC, multi-active, dependent child, and more)
This means:
Reversibility is enforced by design
SoRV generation is standardised and repeatable
Every IRiS-generated Data Vault model is provably capable of reconstructing its sources
SoRV moves from a “nice idea” to a core platform capability.
Source-on-Raw Views significantly enhance the value and usability of a Data Vault:
They turn the Raw Vault into a truly reversible system of record
They provide structural proof of model correctness
They simplify testing, reconciliation, and onboarding
They make the Vault usable without compromising its principles
"If you can’t reconstruct the source, you don’t fully control it. If you can — you truly own your data."
If you’re exploring Data Vault automation, reconciliation strategies, or want to understand how IRiS enforces reversibility by design, get in touch with the IRiS team or explore our resources to see how Source-on-Raw Views are generated automatically as part of the platform.