IRiS is developed by Ignition.

From Semantic Foundation to Working AI

4 May 2026


Ontologies, Knowledge Graphs, Context Engineering, and Where Data Vault Fits.

The first article in this series explained why semantic foundations matter for enterprise AI, and how Data Vault 2.0 encodes those foundations structurally. It was written for a broad audience: data leaders, program sponsors, and architects evaluating whether to invest in a data integration foundation before scaling their AI stack.

This article goes deeper. It is written for the people who have to build the thing: data engineers, solution architects, and technical leads who need to understand not just that Data Vault aligns with ontological thinking, but precisely how and where, and where the alignment breaks down. It also covers two concepts that the previous article deliberately simplified: knowledge graphs and context engineering. Understanding both is essential for anyone building a complete enterprise AI stack on a Data Vault foundation.

We will also engage directly with Milan Mosny’s recent article “Ontology, Taxonomy, Data Model, Context Graph and Friends” (Response42, February 2026), which is an excellent practitioner-level treatment of these concepts (and worth reading alongside this one). Where our perspectives converge, we will say so. Where they diverge or where Data Vault adds a dimension Mosny’s framing does not cover, we will be explicit about that too. 

WHO THIS IS FOR: Data engineers, solution architects, and technical leads building Data Vault foundations for AI and analytical consumption. Assumes familiarity with Data Vault 2.0 structures and a working understanding of LLMs and RAG.


1. The Precise Relationship Between Data Vault and an Ontology

The previous article described Data Vault as encoding an “implicit ontology”, a framing that is valid and useful for a general audience. For an implementer, it needs unpacking. Data Vault and a formal ontology are not the same thing, but they are structurally congruent in specific and mappable ways. Understanding exactly where they align and where they diverge determines how you should design the connection between them.

Mosny’s article defines an ontology precisely: concepts, relationships, constraints, and rules of existence, typically expressed in RDF/OWL for machine reasoning. He uses the same Customer-Order-Product universe throughout, which makes our comparison straightforward. The table below maps each ontological concept to its Data Vault equivalent:

| Ontology concept | DV 2.0 structure | Alignment | What's different |
|---|---|---|---|
| Class / entity (e.g. Customer) | Hub | Strong: both represent a named, distinct business concept | Ontology classes carry axioms and constraints; Hubs carry a Business Key and load metadata only. No attributes on the Hub itself. |
| Object property / relationship (e.g. places, contains) | Link | Strong: both model relationships as first-class citizens | OWL properties have cardinality constraints and domain/range restrictions. Links record the fact of the relationship with temporal context; they do not enforce cardinality at the structure level. |
| Data property / attribute (e.g. name, status, date) | Satellite | Strong: both attach descriptive properties to entities or relationships | Satellites are temporally versioned (load date, record source). Ontology attributes are typically static definitions without built-in history. This is where DV adds significant value over the 'current view' ontology. |
| Derived class / rule (e.g. ActiveCustomer ≡ Customer ∩ hasRecentOrder) | Business Vault rule | Aligned: both define computed or interpreted concepts from base facts | OWL equivalentClass uses formal logic (DL); Business Vault rules are SQL-based transforms. Same intent, different formalism. DV rules are versioned at the record level with effective dates; updating an ontology rule requires releasing a new version of the model. OWL rules enable automated inference. |
| Individual / instance (e.g. Customer C42) | Hub record + Satellite snapshot | Aligned: both represent a specific real-world instance of a concept. The Hub record provides the identity; Satellites capture its state at any point in time. | Ontology individuals can participate in reasoning; DV records are queryable data rows. A knowledge graph materialises the ontology's individuals; DV stores them in relational form. |
| Annotation / metadata (rdfs:label, rdfs:comment) | Business Glossary / Satellite descriptor | Partial: intent is the same; formalism differs | Ontology annotations are machine-readable and participate in reasoning. DV metadata is typically documentation or operational metadata captured in Satellites. IRiS Assistant bridges this gap by capturing definitions during design. |


The key practical takeaway

A well-designed Data Vault is structurally congruent with a business ontology at the level of entities, relationships, and attributes. If you have an ontology (or a conceptual model that functions as one), your Hub design should correspond directly to its classes, your Link design to its object properties, and your Satellite design to its data properties.
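That correspondence can be made concrete. The sketch below expresses Mosny's Customer places Order fragment as Hub, Link, and Satellite structures, using SQLite for portability; all table and column names are illustrative, not a prescribed standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Ontology class 'Customer' -> Hub: Business Key + load metadata only
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash key
    customer_bk   TEXT NOT NULL,      -- Business Key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Ontology class 'Order' -> Hub
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_bk      TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Object property 'places' -> Link: the relationship as a first-class row
CREATE TABLE lnk_customer_order (
    link_hk       TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    order_hk      TEXT NOT NULL REFERENCES hub_order(order_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Data properties (name, status) -> Satellite: versioned attributes
CREATE TABLE sat_customer (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    status        TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)
);
""")
```

Note that the Hub carries nothing beyond identity and load metadata; every descriptive property, and every version of it, lives in the Satellite.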

Where the ontology has formal axioms and inference rules that the Data Vault does not encode (cardinality constraints, domain and range restrictions, equivalence rules), those need to be implemented elsewhere: in the Business Vault as explicit rules; as reference tables, dimensions, or information mart structures; or (for entity equivalence across source systems) as same-as Links. They do not disappear; they just live in a different layer.

The other critical difference is temporality. A standard OWL ontology has no built-in concept of historical state. It defines what things are, not what they were at a previous point in time. Satellites (in Raw Vault or Business Vault) provide exactly this, which means Data Vault adds material value over a pure ontological representation for any AI use case that requires temporal reasoning. 
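A minimal sketch of what that temporal capability looks like in practice, assuming an illustrative Satellite layout: the query selects whichever attribute version was current at the date in question.

```python
import sqlite3

# Illustrative Satellite with two versions of one customer's status.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sat_customer (
    customer_hk TEXT, load_date TEXT, status TEXT,
    PRIMARY KEY (customer_hk, load_date))""")
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?)",
    [("C42", "2024-01-01", "active"),
     ("C42", "2025-06-01", "lapsed")],
)

def status_as_of(conn, customer_hk, as_of_date):
    """Return the Satellite version that was current at as_of_date."""
    row = conn.execute("""
        SELECT status FROM sat_customer
        WHERE customer_hk = ? AND load_date <= ?
        ORDER BY load_date DESC LIMIT 1""",
        (customer_hk, as_of_date)).fetchone()
    return row[0] if row else None

print(status_as_of(conn, "C42", "2024-12-31"))  # -> active
print(status_as_of(conn, "C42", "2025-12-31"))  # -> lapsed
```

A current-state ontology can only answer the second question; the versioned Satellite answers both.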

The ontology tells you what things are and how they relate. The Data Vault records every version of those facts across time. For AI that needs to reason historically (trends, before/after comparisons, point-in-time reporting), both layers are required.

DESIGN NOTE: If your project begins with a conceptual or logical model rather than a formal OWL ontology, the mapping still holds. Mosny’s conceptual model (Customer → places → Order → contains → Product) maps directly to Hubs and Links. The formal ontology is the rigorous version of the same structure; the conceptual model is the practical starting point. IRiS Assistant supports this by walking you through entity and relationship definition in plain language and generating the DV design from it. 


2. Knowledge Graphs: The Missing Piece in Our Earlier Framing

The previous article and the solution brief did not discuss knowledge graphs. This was a deliberate simplification for a general audience, but it leaves a gap that needs addressing for a technical reader, because knowledge graphs represent the most direct alternative and complement to a Data Vault-based approach, and your technical stakeholders will likely ask about them.

What a knowledge graph is precisely:

Mosny’s definition is clean and accurate: a knowledge graph is the instantiated data connected according to the ontology. Where the ontology defines the structure (Customer is a class; places is an object property linking Customer to Order), the knowledge graph contains the actual instances (Customer C42 placed Order O9001). It is the “living facts” layer.

In practice, knowledge graphs are typically stored in graph databases (Neo4j, Amazon Neptune, Azure Cosmos DB for Apache Gremlin) and queried with SPARQL, Cypher, or Gremlin. They represent data as nodes (entities) and edges (relationships) rather than rows and columns, which makes relationship traversal fast and natural, and makes the graph intrinsically queryable by structure rather than requiring explicit joins.

For AI applications, knowledge graphs are particularly valuable as the grounding layer for retrieval. Because relationships are first-class citizens in the graph structure, an AI agent can traverse from Customer to Order to Product to Supplier in a single graph query, assembling context that would require multiple joins in a relational system.
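A toy illustration of that traversal, with an edge map standing in for Link structures; every identifier here is hypothetical.

```python
# Edges mirror Links: customer -places-> order -contains-> product
# -supplied_by-> supplier. Identifiers are illustrative.
edges = {
    ("customer", "C42"): [("order", "O9001")],
    ("order", "O9001"): [("product", "P991")],
    ("product", "P991"): [("supplier", "S7")],
}

def traverse(start, depth):
    """Collect every node reachable from start within `depth` hops."""
    frontier, seen = [start], {start}
    for _ in range(depth):
        frontier = [n for node in frontier for n in edges.get(node, [])
                    if n not in seen]
        seen.update(frontier)
    return seen

# One traversal call reaches the supplier from the customer in three
# hops; the relational equivalent would need three explicit joins.
print(traverse(("customer", "C42"), 3))
```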

How Data Vault relates to a knowledge graph

Data Vault and a knowledge graph are not competing approaches. They operate at different levels of the stack and serve different purposes:

  • Data Vault is the integration and historical record layer. It stores every fact, from every source system, with full temporal context. It is optimised for loading, historisation, and auditability. It is the system of record.

  • A knowledge graph is typically a serving layer, a purpose-built representation of the current (or a specific point-in-time) state of business facts, optimised for relationship traversal and AI context retrieval. It is derived from the system of record. 

 

In a mature AI stack, the knowledge graph is often built on top of the Data Vault: the Raw Vault provides the clean, reconciled, source-separated facts; the Business Vault provides the interpreted definitions; and a graph projection of those facts is maintained as the retrieval layer for AI workloads.

This is a more sophisticated architecture than a simple RAG pipeline, and it is the architecture that scales to complex, multi-entity AI reasoning tasks. The churn prediction example from the solution brief illustrates this: traversing from a customer to their product holdings, their complaint history, and their campaign responses is a natural graph traversal. Doing that with SQL joins across a Raw Vault is possible; doing it via a graph index is faster, and more importantly, it makes the structure of the reasoning explicit and auditable. 


Data Vault is where facts are stored and governed. A knowledge graph is where those facts are made traversable for AI. The two are complementary layers of the same stack, not alternatives. As with dimensional models, the ability to build and execute accurate natural language queries with an LLM is possible but works best on top of a well-described, consistently structured model, which is exactly what the Data Vault foundation provides to the graph layer above it.

Practical implications for implementation

If your AI use cases involve simple, single-entity questions (what is the active customer count?), a RAG pipeline drawing directly from Business Vault views is sufficient. If your use cases involve multi-hop reasoning (which customers hold product X, have an open complaint, and have not responded to the last two campaigns?), you are describing a graph traversal, and a knowledge graph serving layer will significantly simplify your retrieval pipeline.

The good news is that the Data Vault foundation makes a knowledge graph easier to build correctly: because Hub Business Keys provide stable, source-agnostic entity identifiers, graph nodes can be keyed reliably. Because Links record relationships with temporal context, graph edges can be projected with correct effective dates. The investment in the Data Vault foundation is not wasted when you add a knowledge graph layer: it is what makes the knowledge graph trustworthy. 

TECHNOLOGY NOTE: Platforms like Microsoft Fabric, Databricks, and Snowflake all support approaches to graph-like querying, either natively (Fabric’s GraphQL layer, Databricks GraphFrames) or via integration patterns. A full graph database is not always required; for many enterprise use cases, a well-structured set of Business Vault views with consistent Business Keys provides sufficient graph-like traversability for RAG pipelines.

TEXT-TO-SQL NOTE: For text-to-SQL use cases, the serving layer design is as important as the retrieval design. MotherDuck’s analysis of the BIRD Benchmark (February 2026) shows that join depth is the primary structural variable affecting LLM SQL generation accuracy: two to three joins achieve significantly better results than five or more. The Raw Vault’s normalised structure will produce deep join chains for even simple business questions. The correct pattern is pre-joined Gold layer views built from the Business Vault, one view per subject area, descriptively named columns, maximum two to three joins. These views give the LLM a clean, accurately queryable schema while the Raw Vault handles all the complexity of integration, historisation, and source separation underneath. 
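The Gold layer pattern that note describes can be sketched as follows; table, view, and column names are illustrative, using SQLite for portability.

```python
import sqlite3

# A pre-joined, descriptively named, current-state view per subject
# area, so an LLM-generated query never touches the deep join chains
# of the Raw Vault. All names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_date TEXT,
                           customer_name TEXT, customer_status TEXT);
INSERT INTO hub_customer VALUES ('hk42', 'C42');
INSERT INTO sat_customer VALUES ('hk42', '2026-01-01', 'Acme Ltd', 'active');

-- One subject-area view, current state only, a single join:
CREATE VIEW gold_customer AS
SELECT h.customer_bk     AS customer_id,
       s.customer_name   AS customer_name,
       s.customer_status AS customer_status
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
WHERE s.load_date = (SELECT MAX(load_date) FROM sat_customer
                     WHERE customer_hk = h.customer_hk);
""")
# The LLM only ever sees this flat, descriptively named schema:
print(conn.execute("SELECT * FROM gold_customer").fetchall())
```

The view hides hash keys, load dates, and versioning entirely; the Raw Vault keeps all of that underneath.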


3. Context Engineering: The Pipeline That Makes AI Reasoning Reliable

Context engineering is the concept that our earlier articles touched on but did not name explicitly. Mosny’s treatment of it is the most practically useful part of his article, and it is worth engaging with directly.

Mosny defines context engineering as designing the pipeline that assembles the right slice of reality for an LLM to reason over, drawing from ontology, taxonomy, knowledge graph, semantic layer, and policies simultaneously, packaging just enough for the model to think without hallucinating. His context graph is the output of that pipeline: the purpose-built, decision-specific information package the agent needs.

This is more precise and more powerful than the RAG framing we used in the earlier articles. RAG (Retrieval Augmented Generation) is a specific mechanism within context engineering - the retrieval step. Context engineering is the broader discipline of designing which information gets retrieved, from where, in what form, and how it is packaged for the model. Getting this right is the difference between an AI that gives reliable, auditable answers and one that gives confident but untrustworthy ones.

The six-step context engineering pipeline, and where Data Vault contributes

Using Mosny’s Customer-Order-Product universe extended to the business question “Should we offer a discount to this customer on this product?”, the pipeline operates as follows:

| CE step | What happens | Where DV contributes | Gap if DV is absent |
|---|---|---|---|
| 1. Identify concepts | Parse the question to extract the business entities and relationships being asked about (Customer, Order, ActiveCustomer) | Hub names and Business Vault definitions provide the canonical vocabulary the concept identifier maps against | Concept resolution maps against inconsistent field names across source systems. “Customer” may resolve to three different tables. |
| 2. Resolve definitions | Apply the ontology / business glossary to determine what each concept means precisely, including which version of a definition applies for the time period in scope | Business Vault versioned rules provide the definition. Satellite effective dates determine which version applies for the query period. | Definitions are in a glossary document, not in the data. The pipeline reads from the database; the definitions live elsewhere. Misalignment is silent. |
| 3. Retrieve instances | Fetch the actual data records relevant to the question: the specific customers, orders, and products that match the resolved concepts and the query scope | Hub Business Keys provide stable entity identifiers. Links give the traversal path. Satellites provide the attribute values at the correct point in time. | Instance retrieval pulls from whichever source system is most accessible. The same entity appears multiple times with different IDs. Deduplication is manual or absent. |
| 4. Apply semantic metrics | Compute derived measures (LTV, churn risk score, active status) using governed, consistently defined calculations | Business Vault rules provide versioned metric definitions. The same rule applied in the semantic layer and in the DV ensures the AI and the BI layer agree. | Metrics are defined in the BI tool or semantic layer, not in the data. Recalculating them for AI context requires either duplicating logic or accepting inconsistency. |
| 5. Consult policies | Apply business rules, constraints, and eligibility conditions relevant to the question: discount policies, regulatory restrictions, approval thresholds | Business Vault can encode business rules as versioned, auditable artefacts. The same rules that govern data processing govern AI context assembly. | Policies live in documents, code comments, or the heads of domain experts. The AI cannot access them unless they are explicitly injected into the prompt each time. |
| 6. Package context | Assemble the minimal coherent context package the LLM needs to reason correctly: the context graph for this specific decision | Because all upstream steps resolved correctly, the context package is semantically coherent: no conflicting definitions, no ambiguous identities, no missing history. | Context is assembled from best-effort retrieval. The LLM receives inconsistent signals, resolves them probabilistically, and may produce confident but incorrect output. |

Why this matters for Data Vault implementers

The table above makes the dependency explicit: context engineering works reliably only when steps 1 through 5 resolve correctly. Each of those steps has a Data Vault analogue that provides the structural guarantee the pipeline needs.

Without the Data Vault foundation, context engineering pipelines degrade to best-effort retrieval: the system finds something that looks like the right answer, packages it, and hands it to the LLM. The LLM produces a fluent, confident response. The response may be wrong, and there is no reliable way to know.

With the Data Vault foundation, the context engineering pipeline can be deterministic at the data layer: concept resolution maps to canonical Hub definitions, temporal scoping maps to Satellite effective dates, entity resolution maps to Business Keys, metric computation maps to Business Vault rules. The LLM’s job becomes reasoning over a coherent, trustworthy context package rather than compensating for an ambiguous one.
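A deliberately simplified sketch of the six steps as a deterministic pipeline, with in-memory dictionaries standing in for the glossary, Hub, Satellite, and policy stores; every name, rule, and value here is hypothetical.

```python
# Stand-ins for Data Vault-backed stores (all contents illustrative).
GLOSSARY = {"active customer": {"rule": "order in last 90 days",
                                "effective_from": "2025-01-01"}}
HUB_KEYS = {"C42": "hk42"}                         # Business Key -> hash key
SATELLITE = {"hk42": {"status": "active", "ltv": 1800}}
POLICIES = {"POL-10": "max discount 10% for active customers"}

def build_context(question, customer_bk):
    concept = "active customer"                    # 1. identify concepts
    definition = GLOSSARY[concept]                 # 2. resolve definitions
    hk = HUB_KEYS[customer_bk]                     # 3. retrieve instances
    attrs = SATELLITE[hk]
    metrics = {"ltv": attrs["ltv"]}                # 4. apply semantic metrics
    policy = POLICIES["POL-10"]                    # 5. consult policies
    return {                                       # 6. package context
        "question": question, "definition": definition,
        "customer": attrs, "metrics": metrics, "policy": policy,
    }

ctx = build_context("Should we offer a discount to C42?", "C42")
print(sorted(ctx))
```

Every step is a lookup against a governed structure, not a probabilistic search; the LLM receives the resulting package, not the raw data.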


Context engineering is not a replacement for a semantic data foundation; it is what sits on top of one. The better the foundation, the more reliably context engineering works. The worse the foundation, the more the LLM is being asked to compensate for data problems that should have been solved upstream.

The context graph in practice

Mosny’s context graph example is worth examining closely. For the question “Should we offer a discount to Customer C42 on Product P991?”, the context package includes:

  • Customer attributes (status, LTV, region): from Satellite data; segment (typically a Business Vault-derived classification): from Business Vault

  • Product attributes (category, margin, inventory, promo flag): from Satellite data

  • Recent behavioural events (page views, cart abandonment, support cases): from Satellite history

  • Applicable policies (discount eligibility rules, inventory constraints): from Business Vault rules

  • Comparable examples (similar customers, previous decisions): from Raw Vault history

  • Derived facts (eligibility = true, allowed discount = 10%): computed from Business Vault rules


Every item in that list has a direct Data Vault source. The Satellites provide the attribute values and behavioural history. The Business Vault provides the policy rules and the derived eligibility facts. The Hub provides the stable customer and product identifiers that allow all these records to be joined reliably. The Link between Customer and Product provides the relationship context that determines which policies apply.

This is not a coincidence of design. It is the reason that building the Data Vault foundation before the context engineering pipeline, rather than after, is the correct sequencing. Retrofitting reliable context assembly onto a poorly structured data foundation is the hardest and most expensive version of this problem.

4. What This Means for IRiS Implementation

The three sections above establish the theoretical picture. This section translates it into practical implications for teams implementing Data Vault with IRiS, covering the extensible data model decisions that affect AI quality, entity and object definition capture, and the sequencing choices that compound or undermine the investment in the foundation.

Designing with the full stack in mind

Hub and Link design decisions made during the IRiS implementation phase have direct consequences for context engineering quality later. Specifically: 

  • Business Key selection for Hubs determines the stability of entity identifiers across the full stack. A Business Key that is source-system-specific (e.g. CRM customer ID) will break context assembly when the same entity appears in a different system with a different ID. Business Keys should be chosen to survive source system changes: natural keys, composite keys, or assigned enterprise identifiers.

  • Link granularity determines the relationship traversal paths available to the context engineering pipeline. A Link that records only the existence of a relationship loses the relationship attributes that context engineering needs. Where relationships have meaningful properties (e.g. the date a customer first acquired a product, the channel through which an order was placed), those belong in a Link Satellite, not just the Link itself.

  • Satellite source system separation matters for context assembly. If attribute values from different source systems are merged into a single Satellite, the context pipeline cannot distinguish which system’s version of the truth to use. Keeping source systems separate in Satellites preserves the optionality to apply source-specific resolution rules later.

  • Business Vault definition versioning is the foundation of temporal context resolution. Business rules in the Business Vault should carry an effective date and an expiry date. The context engineering pipeline uses these to select the correct definition version for any query period, which means a question about customer churn behaviour 18 months ago can be answered using the definition of “churned” that was in effect at that time, not today’s definition.
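The last point can be sketched as a small version-resolution function; the rule text and dates below are illustrative.

```python
from datetime import date

# Versioned Business Vault rule: each version carries an effective window.
CHURN_RULES = [
    {"definition": "no order in 6 months",
     "effective_from": date(2023, 1, 1), "effective_to": date(2025, 3, 31)},
    {"definition": "no order in 3 months",
     "effective_from": date(2025, 4, 1), "effective_to": None},
]

def rule_as_of(rules, as_of):
    """Return the rule definition whose effective window contains as_of."""
    for r in rules:
        if r["effective_from"] <= as_of and (
                r["effective_to"] is None or as_of <= r["effective_to"]):
            return r["definition"]
    return None

# A question about churn 18 months ago resolves to the old definition:
print(rule_as_of(CHURN_RULES, date(2024, 9, 1)))   # -> no order in 6 months
print(rule_as_of(CHURN_RULES, date(2026, 2, 1)))   # -> no order in 3 months
```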


A sneak preview of what's coming: Using the IRiS Assistant for definition capture

The IRiS Assistant's definition capture function is directly relevant to context engineering quality. The definitions captured during the design phase (what “customer” means, what “active” means, how “lapsed” differs from “churned”) become the business glossary that underpins step 2 of the context engineering pipeline (definition resolution).

Teams that capture definitions carefully during IRiS implementation are building the context engineering dictionary at the same time as they build the data structures. Teams that skip this step create a gap between the data and its meaning that will need to be filled manually, expensively, and after the fact, when the context engineering pipeline is built.

IMPLEMENTATION RECOMMENDATION: Treat IRiS Assistant definition capture as a first-class deliverable, not a documentation afterthought. The definitions captured here become the business glossary that context engineering pipelines resolve against. Incomplete definitions at this stage translate directly to ambiguous context assembly later.  


Sequencing the full stack build

The phased roadmap in the solution brief recommended building the Data Vault foundation before the retrieval and LLM layers. This section adds the reasoning from a context engineering perspective:

  • Phase 1 (Raw Vault + Business Vault): establishes the entity resolution, temporal history, and definition versioning that all downstream context engineering steps depend on. Without this, context assembly is best effort.

  • Phase 2 (additional subject areas): extends the traversal graph. Each new Hub and Link adds a node and edge to the logical knowledge graph that context engineering can traverse. AI reasoning capabilities expand with every increment.

  • Phase 3 (RAG / context engineering pipeline): this is where the investment in Phases 1 and 2 compounds. If Hubs have stable Business Keys, Business Vault rules are versioned, and Satellites are source-separated, the context engineering pipeline can be built reliably and quickly. If those foundations are absent, Phase 3 becomes a data remediation project disguised as an AI project. 

  • Optional: graph projection layer: for use cases requiring multi-hop relationship traversal, a graph projection of Business Vault facts, maintained as a serving layer in the same platform (Fabric, Snowflake, Databricks), can significantly improve retrieval performance and simplify context assembly logic.

  • Gold layer views for AI query performance: for text-to-SQL and RAG use cases, build pre-joined Gold layer views from the Business Vault: one view per subject area, descriptive column names, maximum two to three joins. Do not point the LLM or RAG engine at the Raw Vault directly. The Raw Vault’s normalised structure is optimised for loading and historisation, not for AI query performance. Research on LLM SQL generation (MotherDuck BIRD Benchmark analysis, 2026) confirms that join depth is the primary structural variable affecting accuracy. The Gold layer is where that complexity is resolved so the AI does not have to navigate it. 


5. Where We Agree with Mosny, and Where Data Vault Adds a Dimension

Milan Mosny's article covers this ground well, including the use of knowledge graphs; it is accurate, well-structured, and worth reading in full. Rather than summarise it here, we want to be explicit about where our perspectives converge and where Data Vault adds something his framing does not cover.

Full agreement

  • The taxonomy–ontology–semantic model layering is exactly right. These are not competing alternatives; they are complementary layers. Our Article 1 uses the same framing for the same reason.

  • The knowledge graph is the “living facts” layer: instantiated data connected according to the ontology. This is precise, and our treatment in Section 2 above is consistent with it.

  • Context engineering is more precise and more useful than “RAG” as a framing for what AI retrieval pipelines actually do. The six-step pipeline is a valuable structure for any team building an enterprise AI stack.

  • The observation that information in these layers “overlaps heavily” and can be partially derived from each other is important. A good conceptual model gets you close to an ontology; a business glossary gets you close to both. The goal is a short path through the stack, not full formal implementation of every layer.

Where Data Vault adds a dimension Mosny’s framing does not cover

Temporality. This is the most significant addition. Mosny’s ontology, knowledge graph, and context graph are largely presented as representing current state. Data Vault’s Satellite structure provides complete historical state: every attribute change, timestamped, for every entity and relationship. For AI use cases that require temporal reasoning (trend analysis, before/after comparisons, point-in-time reporting, audit trails), this is not an optional addition to the ontological structure. It is what makes the difference between an AI that can answer “what was the customer’s status 18 months ago?” and one that cannot.

Multi-source integration. Mosny’s article works with a single, coherent universe of data. In enterprise reality, Customer exists in five systems with five different IDs and five subtly different definitions. Data Vault’s Hub structure, specifically the Business Key and the record source, provides the integration layer that reconciles those representations into a single, consistent entity. This is a prerequisite for reliable knowledge graph construction, and it is not addressed by ontological modelling alone. Note that Hub Business Keys alone are not sufficient for a trustworthy knowledge graph projection: Satellite values from different source systems also need to be cross-mapped and aligned through Business Vault rules before being exposed. Projecting all source Satellites directly under a Hub without that reconciliation step would surface the same multi-source conflicts at the graph layer that the Raw Vault was designed to preserve separately.
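A toy sketch of that reconciliation, assuming a tax identifier serves as the enterprise-stable Business Key; every value here is hypothetical.

```python
# Two source systems know the same customer under different local IDs.
source_records = [
    {"system": "CRM",     "local_id": "crm-7781", "tax_id": "TX-42",
     "email": "c42@example.com"},
    {"system": "BILLING", "local_id": "B-00912",  "tax_id": "TX-42",
     "email": "accounts@example.com"},
]

hub, satellites = {}, []
for rec in source_records:
    bk = rec["tax_id"]                 # enterprise-stable Business Key
    hub.setdefault(bk, f"hk-{bk}")     # one Hub row per real-world entity
    satellites.append({                # one Satellite row per source version
        "hub_key": hub[bk], "record_source": rec["system"],
        "email": rec["email"],
    })

# Two source records, one Hub entity, two source-separated Satellite rows:
print(len(hub), len(satellites))   # -> 1 2
```

Both source versions survive, keyed to the same entity; resolving which email wins is a Business Vault rule, not a load-time overwrite.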

Auditability as a structural property. Mosny’s context graph example includes the reasoning path (eligibility = true, derived from POL-10), but the derivation is assembled at query time. For most enterprise reporting and AI use cases this is entirely adequate, and for real-time decision contexts, where the latest state must be reflected at the moment of the query, it can be the preferred approach. In a Data Vault, the derivation is recorded at load time: every Business Vault record carries its source, the rule applied, and the timestamp. This means auditability is not a feature you add to the AI stack; it is a property of the data foundation. In regulated industries, this distinction matters enormously.


Mosny’s framework is an excellent map of the semantic landscape. Data Vault adds the temporal dimension, the multi-source integration layer, and the structural auditability that enterprise AI in regulated contexts requires.

Where to Go from Here

The theoretical connection between Data Vault, ontologies, knowledge graphs, and context engineering is not just intellectually interesting; it has direct practical implications for how you design your implementation, how you capture definitions, how you sequence the build, and how you architect the serving layer for AI retrieval.

The short version for implementers:

  • Design Hubs and Links against your conceptual or ontological model. If the model says Customer places Order, your Link should reflect that relationship explicitly. Do not infer it from foreign keys.

  • Capture definitions during IRiS Assistant sessions as a first-class deliverable. These become your context engineering dictionary.

  • Version Business Vault rules with effective dates. Temporal definition resolution is the difference between point-in-time-accurate AI and AI that silently applies today’s rules to historical data.

  • Keep Satellites source separated. Merged Satellites lose the source provenance that context assembly needs.

  • Consider a graph projection layer for multi-hop retrieval. A knowledge graph serving layer built on top of the Business Vault gives context engineering pipelines a traversal-optimised view of the same data.


The Data Vault foundation does not make context engineering easy. Nothing does. But it makes it tractable, by ensuring that when the pipeline asks, “what does this entity mean, what is its history, how does it relate to everything else, and which rule applies?”, there are clean, consistent, auditable answers available.

Further reading

Milan Mosny, “Ontology, Taxonomy, Data Model, Context Graph and Friends” — Response42, February 2026. An excellent practitioner-level treatment of the semantic landscape from a knowledge engineering perspective. Recommended reading alongside this article: medium.com/response42/ontology-taxonomy-data-model-context-graph-friends-56a605e14355

Previous articles in this series: “From Taxonomy to Trusted AI: Understanding the Semantic Foundations That Make AI Work” and the IRiS Solution Brief “Building the AI-Ready Foundation.” Both available at ignition-data.com/data-intelligence-series

Questions or implementation challenges? Talk to an IRiS partner.

This article is part of the Data Intelligence Series.
