IRiS is developed by Ignition.

Proving Your Data Vault Works: The Power of Source‑on‑Raw Views (SoRV)

27 January 2026


One of the great strengths of Data Vault is that it gives us a complete, auditable, and historised representation of enterprise data. By separating business keys (Hubs), relationships (Links), and descriptive context (Satellites), we gain flexibility, resilience to change, and strong lineage by design.

But there is a common, very practical challenge:

Data Vault models are correct, but they are not always easy for humans to consume.

New analysts, data scientists, testers, and even experienced engineers often struggle to “see” the data in a familiar shape. Information is distributed across hubs, links, and satellites, and even simple questions can require multiple joins and careful time logic.

At the same time, there is a deeper question we should be asking:

If our Data Vault truly represents the source data, shouldn’t we be able to reconstruct the source from it?

This is where Source-on-Raw Views (SoRV) come in.

What Is a Source-on-Raw View?

A Source-on-Raw View (SoRV) is a view that:

  • Reconstructs a source dataset in its original, source-like shape

  • Is built entirely from Raw Data Vault tables (Hubs, Links, Satellites)

  • Represents the current state of the source data


In simple terms:

A Source-on-Raw View reverses the Data Vault back into the shape of the source.

SoRV does not replace your information marts, Business Vault, or semantic layers. Instead, it serves three critical purposes:

  • It proves your Data Vault model is complete and correct

  • It dramatically simplifies testing and reconciliation

  • It makes the Raw Vault far more approachable for humans familiar with source systems

 

Why Reversibility Matters

We often describe Data Vault as a system of record. But what does that really mean?

If your Raw Vault is truly the system of record, then it must:

  • Contain all business keys

  • Contain all relationships

  • Contain all descriptive context

  • Contain enough information to recreate what the source looked like


If you cannot reconstruct the source from the vault, then one of the following is true:

  • A relationship is missing or incorrectly modelled

  • A business key is missing or incorrectly identified

  • A satellite is attached at the wrong grain

  • Important attributes were not captured


Building Source-on-Raw Views turns reversibility into a structural validation technique.

If you can reconstruct the source, your model is structurally sound.
If you can’t, something fundamental is wrong.

A Huge Secondary Benefit: Testing

Once you have a Source-on-Raw View, testing becomes almost trivial.

You can directly compare:

  • The original source extract

  • The output of the SoRV

Row counts, keys, attribute values, duplicates, missing records — all of this can be validated automatically.

Instead of testing dozens of hubs, links, and satellites independently, you test one simple question:

Does the vault reproduce the source?

This approach is especially powerful for:

  • Regression testing

  • Reprocessing scenarios

  • Source system changes

  • Platform or tooling migrations
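
As a minimal sketch of that comparison, the Python snippet below uses the standard-library sqlite3 module to diff a source extract against a SoRV in both directions with EXCEPT, and to compare row counts (EXCEPT works on distinct rows, so duplicates only show up in the counts). The table names source_extract and sorv_output, the reconcile helper, and the sample rows are all assumptions made for illustration; in practice sorv_output would be the generated view.

```python
import sqlite3


def reconcile(conn, source_table, sorv_view):
    """Compare a source extract with a SoRV; both must share column order.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    def count(name):
        return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

    def diff(left, right):
        # Rows present in `left` but absent from `right` (set semantics).
        sql = f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
        return conn.execute(sql).fetchall()

    return {
        "source_rows": count(source_table),
        "sorv_rows": count(sorv_view),
        "missing_from_sorv": diff(source_table, sorv_view),
        "unexpected_in_sorv": diff(sorv_view, source_table),
    }


# Hypothetical sample data: the second row disagrees on price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_extract (customerid TEXT, serviceid TEXT, serviceprice REAL);
CREATE TABLE sorv_output    (customerid TEXT, serviceid TEXT, serviceprice REAL);
INSERT INTO source_extract VALUES ('CUST-001', 'SVC-010', 120.0);
INSERT INTO source_extract VALUES ('CUST-002', 'SVC-020', 75.0);
INSERT INTO sorv_output    VALUES ('CUST-001', 'SVC-010', 120.0);
INSERT INTO sorv_output    VALUES ('CUST-002', 'SVC-020', 70.0);
""")

report = reconcile(conn, "source_extract", "sorv_output")
print(report)
```

Equal row counts plus an empty diff in both directions is the pass condition; anything else points at a specific row to investigate.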



Human-Friendly Access to the Raw Vault

Most people do not naturally think in hubs, links, and satellites — especially those new to Data Vault.

A Source-on-Raw View provides:

  • Analysts with a familiar structure to explore

  • Data scientists with access to “raw but usable” data

  • Engineers with a clean surface for debugging and inspection


All of this happens without bypassing governance or encouraging direct access to unmanaged source extracts.

The Basic Pattern

A Source-on-Raw View follows a simple and repeatable pattern:

  1. Start from the Link that defines the grain (or from the Hub, if the source has no Link)

  2. Join Hubs to retrieve business keys

  3. Join Satellites to retrieve descriptive attributes

  4. Apply “latest” logic to produce a current-state view

Note: This assumes strong metadata management and consistent modelling practices.
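
To make the four steps concrete, here is a small sketch that builds a toy vault in an in-memory SQLite database and reconstructs a current-state view from it. Everything in it is an assumption for illustration: the table and column names (h_customer, l_customer_service, s_service_cost, extract_date), the plain-text "hash keys", and the sample rows. It is a simplified stand-in, not generated IRiS output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy Raw Vault: two hubs, one link, one satellite on the link.
CREATE TABLE h_customer (hk_h_customer TEXT PRIMARY KEY, bk_customer TEXT);
CREATE TABLE h_service  (hk_h_service  TEXT PRIMARY KEY, bk_service  TEXT);
CREATE TABLE l_customer_service (
    hk_l_customer_service TEXT PRIMARY KEY,
    hk_h_customer TEXT,
    hk_h_service  TEXT
);
CREATE TABLE s_service_cost (
    hk_l_customer_service TEXT,
    extract_date TEXT,
    serviceprice REAL
);

INSERT INTO h_customer VALUES ('c1', 'CUST-001');
INSERT INTO h_service  VALUES ('s1', 'SVC-010');
INSERT INTO l_customer_service VALUES ('l1', 'c1', 's1');
-- Two satellite versions: the price changed between extracts.
INSERT INTO s_service_cost VALUES ('l1', '2026-01-01', 100.0);
INSERT INTO s_service_cost VALUES ('l1', '2026-01-15', 120.0);

CREATE VIEW v_service_current AS
SELECT hc.bk_customer   AS customerid,
       hs.bk_service    AS serviceid,
       sat.serviceprice AS serviceprice
FROM l_customer_service l                                   -- 1. link sets the grain
JOIN h_customer hc ON hc.hk_h_customer = l.hk_h_customer    -- 2. hubs give business keys
JOIN h_service  hs ON hs.hk_h_service  = l.hk_h_service
JOIN s_service_cost sat                                     -- 3. satellite gives attributes
  ON sat.hk_l_customer_service = l.hk_l_customer_service
 AND sat.extract_date = (                                   -- 4. latest row only
       SELECT MAX(s2.extract_date)
       FROM s_service_cost s2
       WHERE s2.hk_l_customer_service = l.hk_l_customer_service
     );
""")

rows = conn.execute(
    "SELECT customerid, serviceid, serviceprice FROM v_service_current"
).fetchall()
print(rows)  # [('CUST-001', 'SVC-010', 120.0)]
```

Only the later of the two satellite versions survives, which is what turns historised data into a current-state row.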

A Real Example

Below is a real Source-on-Raw View that reconstructs a “service usage” style dataset from the Raw Vault:

CREATE VIEW im.v_service_servicessystem_current AS
SELECT
    h_service.bkcc AS bkcc_service,
    h_service.bk_service AS serviceid,
    h_customer.bkcc AS bkcc_customer,
    h_customer.bk_customer AS customerid,
    h_registration.bkcc AS bkcc_registration,
    h_registration.bk_registration AS registration,
    s_service_cost_service_servicessystem.serviceprice AS serviceprice,
    s_service_cost_service_servicessystem.servicedate AS servicedate
-- Link: defines the grain of the reconstructed dataset
FROM vault.l_customer_registration_service l_customer_registration_service
-- Satellite: keep only the latest extract per link key (current state)
JOIN vault.s_service_cost_service_servicessystem s_service_cost_service_servicessystem
    ON l_customer_registration_service.hk_l_customer_registration_service =
       COALESCE(s_service_cost_service_servicessystem.hk_l_customer_registration_service, '0xFF')
    AND s_service_cost_service_servicessystem.iss_extract_date = (
        SELECT MAX(s.iss_extract_date)
        FROM vault.s_service_cost_service_servicessystem s
        WHERE s.hk_l_customer_registration_service =
              l_customer_registration_service.hk_l_customer_registration_service
    )
-- Hubs: resolve each hash key back to its business key
JOIN vault.h_customer h_customer
    ON l_customer_registration_service.hk_h_customer =
       COALESCE(h_customer.hk_h_customer, '0xFF')
JOIN vault.h_registration h_registration
    ON l_customer_registration_service.hk_h_registration =
       COALESCE(h_registration.hk_h_registration, '0xFF')
JOIN vault.h_service h_service
    ON l_customer_registration_service.hk_h_service =
       COALESCE(h_service.hk_h_service, '0xFF')
-- Exclude the '0xFF' key, which appears to serve as a placeholder (ghost) record
WHERE l_customer_registration_service.hk_l_customer_registration_service != '0xFF';

 

What This View Does

  • The Link defines the grain: one customer, one registration, one service

  • The Hubs provide business keys

  • The Satellite provides descriptive attributes

  • Latest-record logic converts historised data into a current snapshot

The result is a source-shaped dataset, reconstructed entirely from the Raw Vault.

This is not a mart.
This is not a business transformation.

This is the source, rebuilt from the vault.

If the reconstruction is wrong, the model probably is too.
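
The latest-record step can be written in more than one way. The view above uses a correlated MAX subquery; the sketch below shows that form next to a ROW_NUMBER window-function form and checks that they agree on a toy satellite. This is an alternative formulation for comparison, not what the view itself does. The table s_service_cost, its columns, and the rows are hypothetical, and the window-function query assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical satellite: key 'l1' has two versions, key 'l2' has one.
CREATE TABLE s_service_cost (hk TEXT, extract_date TEXT, serviceprice REAL);
INSERT INTO s_service_cost VALUES ('l1', '2026-01-01', 100.0);
INSERT INTO s_service_cost VALUES ('l1', '2026-01-15', 120.0);
INSERT INTO s_service_cost VALUES ('l2', '2026-01-10', 75.0);
""")

# Form 1: correlated subquery picking the maximum extract date per key.
latest_by_max = conn.execute("""
SELECT s.hk, s.extract_date, s.serviceprice
FROM s_service_cost s
WHERE s.extract_date = (
    SELECT MAX(s2.extract_date)
    FROM s_service_cost s2
    WHERE s2.hk = s.hk
)
ORDER BY s.hk
""").fetchall()

# Form 2: number the rows per key, newest first, and keep row 1.
latest_by_rownum = conn.execute("""
SELECT hk, extract_date, serviceprice
FROM (
    SELECT hk, extract_date, serviceprice,
           ROW_NUMBER() OVER (
               PARTITION BY hk ORDER BY extract_date DESC
           ) AS rn
    FROM s_service_cost
)
WHERE rn = 1
ORDER BY hk
""").fetchall()

print(latest_by_max)
print(latest_by_rownum)
```

The two forms only differ when a key has two rows with the same extract date: the MAX form returns both, while ROW_NUMBER returns one, chosen arbitrarily unless the ordering includes a tie-breaker.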

Where IRiS Fits In

With IRiS, Source-on-Raw Views are not an afterthought or a manual exercise.

They are generated automatically from the same metadata that defines the Data Vault itself.

As part of its standard code output, IRiS generates:

  • Source-on-Raw Views for every modelled source

  • Table definitions and load procedures

  • All required Data Vault metadata, including:

    • Business key hashing

    • Hash differences

    • Load date and record source handling

    • Support for multiple satellite patterns (CDC, multi-active, dependent child, and more)


This means:

  • Reversibility is enforced by design

  • SoRV generation is standardised and repeatable

  • Every IRiS-generated Data Vault model is provably capable of reconstructing its sources

SoRV moves from a “nice idea” to a core platform capability.
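
To give a feel for what "generated from metadata" can mean, here is a toy sketch that turns a small metadata dictionary into a SoRV definition and runs it against an in-memory SQLite vault. The metadata shape, the build_sorv_sql function, and all table and column names are invented for this illustration. It is not the IRiS metadata format or generator, only the general idea in miniature.

```python
import sqlite3

# Hypothetical metadata shape invented for this sketch; NOT the IRiS format.
META = {
    "view": "v_service_current",
    "link": {"table": "l_customer_service", "hk": "hk_l_customer_service"},
    "hubs": [
        {"table": "h_customer", "hk": "hk_h_customer",
         "bk": "bk_customer", "alias": "customerid"},
        {"table": "h_service", "hk": "hk_h_service",
         "bk": "bk_service", "alias": "serviceid"},
    ],
    "satellite": {"table": "s_service_cost", "date": "extract_date",
                  "attrs": ["serviceprice"]},
}


def build_sorv_sql(meta):
    """Assemble a current-state SoRV from link, hub and satellite metadata."""
    link, sat = meta["link"], meta["satellite"]
    lt, st = link["table"], sat["table"]
    cols = [f'{h["table"]}.{h["bk"]} AS {h["alias"]}' for h in meta["hubs"]]
    cols += [f"{st}.{a} AS {a}" for a in sat["attrs"]]
    joins = [
        f'JOIN {h["table"]} ON {h["table"]}.{h["hk"]} = {lt}.{h["hk"]}'
        for h in meta["hubs"]
    ]
    # Satellite join with latest-record logic (correlated MAX subquery).
    joins.append(
        f'JOIN {st} ON {st}.{link["hk"]} = {lt}.{link["hk"]} '
        f'AND {st}.{sat["date"]} = ('
        f'SELECT MAX(s.{sat["date"]}) FROM {st} s '
        f'WHERE s.{link["hk"]} = {lt}.{link["hk"]})'
    )
    return (
        f'CREATE VIEW {meta["view"]} AS\nSELECT\n  '
        + ",\n  ".join(cols)
        + f"\nFROM {lt}\n"
        + "\n".join(joins)
    )


sql = build_sorv_sql(META)
print(sql)

# Run the generated view against a toy vault to check that it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE h_customer (hk_h_customer TEXT, bk_customer TEXT);
CREATE TABLE h_service  (hk_h_service TEXT, bk_service TEXT);
CREATE TABLE l_customer_service (
    hk_l_customer_service TEXT, hk_h_customer TEXT, hk_h_service TEXT);
CREATE TABLE s_service_cost (
    hk_l_customer_service TEXT, extract_date TEXT, serviceprice REAL);
INSERT INTO h_customer VALUES ('c1', 'CUST-001');
INSERT INTO h_service  VALUES ('s1', 'SVC-010');
INSERT INTO l_customer_service VALUES ('l1', 'c1', 's1');
INSERT INTO s_service_cost VALUES ('l1', '2026-01-01', 100.0);
INSERT INTO s_service_cost VALUES ('l1', '2026-01-15', 120.0);
""")
conn.execute(sql)
rows = conn.execute("SELECT * FROM v_service_current").fetchall()
print(rows)  # [('CUST-001', 'SVC-010', 120.0)]
```

Because the view is derived from the same description as the tables, a change to the model changes the view with it, which is the property the section above relies on.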

Conclusion

Source-on-Raw Views significantly enhance the value and usability of a Data Vault:

  • They turn the Raw Vault into a truly reversible system of record

  • They provide structural proof of model correctness

  • They simplify testing, reconciliation, and onboarding

  • They make the Vault usable without compromising its principles

"If you can’t reconstruct the source, you don’t fully control it. If you can — you truly own your data."

Want to see this in action?

If you’re exploring Data Vault automation, reconciliation strategies, or want to understand how IRiS enforces reversibility by design, get in touch with the IRiS team or explore our resources to see how Source-on-Raw Views are generated automatically as part of the platform.
