Proving Your Data Vault Works: The Power of Source-on-Raw Views (SoRV)

One of the great strengths of Data Vault is that it gives us a complete, auditable, and historised representation of enterprise data. By separating business keys (Hubs), relationships (Links), and descriptive context (Satellites), we gain flexibility, resilience to change, and strong lineage by design.

But there is a common, very practical challenge:

A Data Vault model can be correct and still be hard for humans to consume.

New analysts, data scientists, testers, and even experienced engineers often struggle to “see” the data in a familiar shape. Information is distributed across hubs, links, and satellites, and even simple questions can require multiple joins and careful time logic.

At the same time, there is a deeper question we should be asking:

If our Data Vault truly represents the source data, shouldn’t we be able to reconstruct the source from it?

This is where Source-on-Raw Views (SoRV) come in.

What Is a Source-on-Raw View?

A Source-on-Raw View (SoRV) is a view that:

- Reconstructs a source dataset in its original, source-like shape
- Is built entirely from Raw Data Vault tables (Hubs, Links, Satellites)
- Represents the current state of the source data

In simple terms:

A Source-on-Raw View reverses the Data Vault back into the shape of the source.

SoRV does not replace your information marts, Business Vault, or semantic layers. Instead, it serves three critical purposes:

- It proves your Data Vault model is complete and correct
- It dramatically simplifies testing and reconciliation
- It makes the Raw Vault far more approachable for humans familiar with source systems

Why Reversibility Matters

We often describe Data Vault as a system of record. But what does that really mean?

If your Raw Vault is truly the system of record, then it must:

- Contain all business keys
- Contain all relationships
- Contain all descriptive context
- Contain enough information to recreate what the source looked like

If you cannot reconstruct the source from the vault, then at least one of the following is true:

- A relationship is missing or incorrectly modelled
- A business key is missing or incorrectly identified
- A satellite is attached at the wrong grain
- Important attributes were not captured

Building Source-on-Raw Views turns reversibility into a structural validation technique.

If you can reconstruct the source, your model is structurally sound.
If you can’t, something fundamental is wrong.
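When a reconstruction comes up short, an anti-join over the Link is one way to narrow down which of these causes applies. The sketch below borrows the table and column names from the example view later in this article; the diagnostic itself is an illustrative assumption, not something the article's tooling is known to generate.

```sql
-- Link rows whose hub or satellite references do not resolve.
-- Names come from the example view in this article; the check is a sketch.
SELECT
    l.hk_l_customer_registration_service,
    CASE WHEN hc.hk_h_customer IS NULL THEN 1 ELSE 0 END AS missing_customer_hub,
    CASE WHEN hr.hk_h_registration IS NULL THEN 1 ELSE 0 END AS missing_registration_hub,
    CASE WHEN hs.hk_h_service IS NULL THEN 1 ELSE 0 END AS missing_service_hub,
    CASE WHEN sat.hk_l_customer_registration_service IS NULL THEN 1 ELSE 0 END AS missing_satellite
FROM vault.l_customer_registration_service l
LEFT JOIN vault.h_customer hc
  ON hc.hk_h_customer = l.hk_h_customer
LEFT JOIN vault.h_registration hr
  ON hr.hk_h_registration = l.hk_h_registration
LEFT JOIN vault.h_service hs
  ON hs.hk_h_service = l.hk_h_service
-- DISTINCT keeps satellite history from multiplying the link rows
LEFT JOIN (
    SELECT DISTINCT hk_l_customer_registration_service
    FROM vault.s_service_cost_service_servicessystem
) sat
  ON sat.hk_l_customer_registration_service = l.hk_l_customer_registration_service
WHERE l.hk_l_customer_registration_service <> '0xFF'
  AND (hc.hk_h_customer IS NULL
    OR hr.hk_h_registration IS NULL
    OR hs.hk_h_service IS NULL
    OR sat.hk_l_customer_registration_service IS NULL);
```

An empty result means every Link row resolves to its Hubs and has at least one Satellite record. Any returned row shows which side of the model is missing data.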

A Huge Secondary Benefit: Testing

Once you have a Source-on-Raw View, most testing reduces to a single comparison.

You can directly compare:

- The original source extract
- The output of the SoRV

Row counts, keys, attribute values, duplicates, and missing records can all be validated automatically.

Instead of testing dozens of hubs, links, and satellites independently, you test one simple question:

Does the vault reproduce the source?
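A minimal sketch of that comparison, assuming the latest source extract is staged in a hypothetical table staging.service_servicessystem with the same column names as the example view later in this article. The staging table and its columns are assumptions for illustration.

```sql
-- Assumption: staging.service_servicessystem holds the latest source extract.
-- EXCEPT is spelled MINUS on Oracle.

-- 1. Rows in the source extract that the SoRV does not reproduce
SELECT serviceid, customerid, registration, serviceprice, servicedate
FROM staging.service_servicessystem
EXCEPT
SELECT serviceid, customerid, registration, serviceprice, servicedate
FROM im.v_service_servicessystem_current;

-- 2. Rows the SoRV produces that are not in the source extract
SELECT serviceid, customerid, registration, serviceprice, servicedate
FROM im.v_service_servicessystem_current
EXCEPT
SELECT serviceid, customerid, registration, serviceprice, servicedate
FROM staging.service_servicessystem;

-- 3. Row counts, because EXCEPT compares distinct rows and hides duplicates
SELECT 'source' AS side, COUNT(*) AS row_count
FROM staging.service_servicessystem
UNION ALL
SELECT 'sorv' AS side, COUNT(*) AS row_count
FROM im.v_service_servicessystem_current;
```

If both EXCEPT queries return no rows and the two counts match, the vault reproduces the extract for the compared columns.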

This approach is especially powerful for:

- Regression testing
- Reprocessing scenarios
- Source system changes
- Platform or tooling migrations

Human-Friendly Access to the Raw Vault

Most people do not naturally think in hubs, links, and satellites, especially those new to Data Vault.

A Source-on-Raw View provides:

- Analysts with a familiar structure to explore
- Data scientists with access to “raw but usable” data
- Engineers with a clean surface for debugging and inspection

All of this happens without bypassing governance or encouraging direct access to unmanaged source extracts.
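As an illustration, an analyst could answer a simple question with one query against the example view defined later in this article, with no hub, link, or satellite joins. The sketch assumes serviceprice is numeric.

```sql
-- Service count and total price per customer, read from the source-shaped view.
-- Assumes serviceprice is a numeric column.
SELECT
    customerid,
    COUNT(*) AS services,
    SUM(serviceprice) AS total_serviceprice,
    MAX(servicedate) AS last_servicedate
FROM im.v_service_servicessystem_current
GROUP BY customerid
ORDER BY total_serviceprice DESC;
```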

The Basic Pattern

A Source-on-Raw View follows a simple and repeatable pattern:

- Start from the Link that defines the grain (or from the Hub, if the source maps to a single business key and no Link is present)
- Join Hubs to retrieve business keys
- Join Satellites to retrieve descriptive attributes
- Apply “latest” logic to produce a current-state view

Note: This assumes strong metadata management and consistent modelling practices.
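For a source that maps to a single Hub and Satellite, the same pattern applies with the Hub as the starting point. The following is a minimal sketch under stated assumptions: the satellite vault.s_customer_details_crm, its columns customername and customeremail, and the view name are hypothetical, while vault.h_customer and iss_extract_date follow the naming used in this article's example. It uses ROW_NUMBER() for the “latest” step, as one alternative to a MAX subquery.

```sql
-- Hypothetical names: vault.s_customer_details_crm, customername,
-- customeremail, and im.v_customer_crm_current are illustrative only.
CREATE VIEW im.v_customer_crm_current AS
WITH latest_sat AS (
    SELECT
        s.hk_h_customer,
        s.customername,
        s.customeremail,
        -- rn = 1 marks the most recent satellite row per hub key
        ROW_NUMBER() OVER (
            PARTITION BY s.hk_h_customer
            ORDER BY s.iss_extract_date DESC
        ) AS rn
    FROM vault.s_customer_details_crm s
)
SELECT
    h.bkcc AS bkcc_customer,
    h.bk_customer AS customerid,
    ls.customername,
    ls.customeremail
FROM vault.h_customer h
JOIN latest_sat ls
  ON ls.hk_h_customer = h.hk_h_customer
 AND ls.rn = 1;
```

ROW_NUMBER() always keeps exactly one row per key, even when two satellite rows share the same extract date, whereas a MAX subquery returns every row tied on the latest date. Which behaviour is preferable depends on whether such ties should be hidden or surfaced.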

A Real Example

Below is a real Source-on-Raw View that reconstructs a “service usage” style dataset from the Raw Vault:

-- Source-on-Raw View: current-state reconstruction of the service system source
CREATE VIEW im.v_service_servicessystem_current AS
SELECT
    -- Business keys (with their bkcc values) from the three hubs
    h_service.bkcc AS bkcc_service,
    h_service.bk_service AS serviceid,
    h_customer.bkcc AS bkcc_customer,
    h_customer.bk_customer AS customerid,
    h_registration.bkcc AS bkcc_registration,
    h_registration.bk_registration AS registration,
    -- Descriptive attributes from the satellite
    s_service_cost_service_servicessystem.serviceprice AS serviceprice,
    s_service_cost_service_servicessystem.servicedate AS servicedate
-- The link defines the grain of the view
FROM vault.l_customer_registration_service l_customer_registration_service
-- Satellite on the link, restricted to the latest extract per link key.
-- COALESCE maps a NULL hash key to '0xFF', which appears to be the placeholder (ghost record) key.
JOIN vault.s_service_cost_service_servicessystem s_service_cost_service_servicessystem
  ON l_customer_registration_service.hk_l_customer_registration_service =
     COALESCE(s_service_cost_service_servicessystem.hk_l_customer_registration_service, '0xFF')
 AND s_service_cost_service_servicessystem.iss_extract_date = (
     SELECT MAX(s.iss_extract_date)
     FROM vault.s_service_cost_service_servicessystem s
     WHERE s.hk_l_customer_registration_service =
           l_customer_registration_service.hk_l_customer_registration_service
 )
-- Hubs resolve each link hash key back to its business key
JOIN vault.h_customer h_customer
  ON l_customer_registration_service.hk_h_customer =
     COALESCE(h_customer.hk_h_customer, '0xFF')
JOIN vault.h_registration h_registration
  ON l_customer_registration_service.hk_h_registration =
     COALESCE(h_registration.hk_h_registration, '0xFF')
JOIN vault.h_service h_service
  ON l_customer_registration_service.hk_h_service =
     COALESCE(h_service.hk_h_service, '0xFF')
-- Exclude the '0xFF' placeholder link row from the output
WHERE l_customer_registration_service.hk_l_customer_registration_service != '0xFF';

What This View Does

- The Link defines the grain: one row per combination of customer, registration, and service
- The Hubs provide business keys
- The Satellite provides descriptive attributes
- Latest-record logic (the MAX(iss_extract_date) subquery) converts historised data into a current snapshot

The result is a source-shaped dataset, reconstructed entirely from the Raw Vault.

This is not a mart.
This is not a business transformation.

This is the source, rebuilt from the vault.

If the reconstruction is wrong, the model probably is too.
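One quick check on the reconstruction is to confirm that the view really has the grain the Link implies. The query below is a sketch of such a check against the view above. A duplicate can appear, for example, when two satellite rows share the latest iss_extract_date for the same link key, because the MAX subquery keeps every row tied on that date.

```sql
-- Expect zero rows: each customer / registration / service combination
-- should appear exactly once in the current-state view.
SELECT
    bkcc_customer, customerid,
    bkcc_registration, registration,
    bkcc_service, serviceid,
    COUNT(*) AS row_count
FROM im.v_service_servicessystem_current
GROUP BY
    bkcc_customer, customerid,
    bkcc_registration, registration,
    bkcc_service, serviceid
HAVING COUNT(*) > 1;
```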

Where IRiS Fits In

With IRiS, Source-on-Raw Views are not an afterthought or a manual exercise.

They are generated automatically from the same metadata that defines the Data Vault itself.

As part of its standard code output, IRiS generates:

- Source-on-Raw Views for every modelled source
- Table definitions and load procedures
- All required Data Vault metadata, including:
  - Business key hashing
  - Hash differences
  - Load date and record source handling
  - Support for multiple satellite patterns (CDC, multi-active, dependent child, and more)

This means:

- Reversibility is enforced by design
- SoRV generation is standardised and repeatable
- Every IRiS-generated Data Vault model is provably capable of reconstructing its sources

SoRV moves from a “nice idea” to a core platform capability.

Conclusion

Source-on-Raw Views significantly enhance the value and usability of a Data Vault:

- They turn the Raw Vault into a truly reversible system of record
- They provide structural proof of model correctness
- They simplify testing, reconciliation, and onboarding
- They make the Vault usable without compromising its principles

"If you can’t reconstruct the source, you don’t fully control it. If you can — you truly own your data."

Want to see this in action?

If you’re exploring Data Vault automation or reconciliation strategies, or want to understand how IRiS enforces reversibility by design, get in touch with the IRiS team or explore our resources to see how Source-on-Raw Views are generated automatically as part of the platform.