You have the AI models. You have the budget. You have leadership buy-in.
And yet the AI is not delivering what the business expected. The answers are inconsistent. The outputs cannot be explained. Different teams are getting different results from what should be the same data. Confidence in the initiative is quietly eroding.
This is not an unusual story. It plays out across industries and organisation sizes, and the diagnosis is almost always the same: the problem is not the AI. The models are capable. The issue lies in the data they are reasoning over — data that lacks a shared, consistent, machine-readable understanding.
The pressure to adopt AI is real and accelerating. Boards are mandating it. Competitors are investing in it. Customers expect intelligent, personalised experiences that only AI can deliver at scale.
This article explains the three foundational concepts that sit at the heart of trustworthy AI: taxonomies, ontologies, and semantic models. It then shows how Data Vault 2 encodes those concepts into your data architecture, and why that matters for every AI initiative your organisation is building.
The question for most enterprises is no longer whether to pursue AI-driven outcomes. It is how to do so in a way that is reliable, auditable, and trusted by the business.
That question has a technical answer, and it starts with the meaning embedded in your data. It starts with semantics.
When an AI initiative underperforms, the instinct is to look at the model.
Perhaps it needs more training data. Perhaps a different model would perform better. Perhaps the prompt engineering needs refinement.
These are rarely the root cause.
The more common failure is structural, and it sits in the data layer.
Consider a straightforward business question:
How many active customers do we have, and what is their average product holding?
An AI assistant asked this question will return an answer. The problem is which answer and whether it can be trusted.
In a typical enterprise, “customer” exists in the CRM, the billing system, the data warehouse, and several operational applications.
Each system has its own definition of “active.”
The CRM counts anyone who logged in within 90 days.
The billing system counts any account with a non-zero balance.
The warehouse uses a status flag that has not been updated since a product rationalisation two years ago changed what “active” meant to the business.
The AI draws on all three sources. Because nothing in the data tells it which definition to use, or that the definitions conflict, it returns a confident, precise number.
That number is wrong.
Nobody knows it is wrong, because the AI flagged no uncertainty. Decisions about retention campaigns, revenue forecasts, and resource allocation are made on a figure that blends three incompatible definitions.
Example: The same question asked of three different data sources returns three different customer counts — 42,000, 38,500, and 51,200. The AI averages across them silently.
The finance team is working from the 38,500 figure.
The marketing team is working from the 51,200.
Neither knows the other is using a different number.
The AI's answer sits somewhere in between and matches neither.
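The divergence is easy to reproduce. The following is a minimal Python sketch using illustrative records and the three conflicting definitions described above; all values are hypothetical:

```python
from datetime import date, timedelta

# Hypothetical customer records as seen by three systems (illustrative only).
customers = [
    {"id": 1, "last_login": date.today() - timedelta(days=30),  "balance": 0.0,  "status_flag": "ACTIVE"},
    {"id": 2, "last_login": date.today() - timedelta(days=120), "balance": 50.0, "status_flag": "ACTIVE"},
    {"id": 3, "last_login": date.today() - timedelta(days=10),  "balance": 0.0,  "status_flag": "INACTIVE"},
]

# Each system applies its own, mutually incompatible notion of "active".
crm_active     = sum(1 for c in customers if (date.today() - c["last_login"]).days <= 90)
billing_active = sum(1 for c in customers if c["balance"] != 0)
dw_active      = sum(1 for c in customers if c["status_flag"] == "ACTIVE")

print(crm_active, billing_active, dw_active)  # prints: 2 1 2
```

One question, three definitions, three answers, and nothing in the data itself signals the conflict to the AI.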
This is the semantic problem.
And it is not solved by a better model, more compute, or better prompt engineering. It is solved by ensuring your data has a consistent, explicitly defined, machine-readable semantic foundation before you point AI at it.
The concepts that underpin a proper semantic foundation are not new. They have been developed over decades in knowledge engineering and data management.
Three layers typically work together: taxonomies, ontologies, and semantic models.
Together, they provide the structure that allows machines to interpret business concepts consistently.
A taxonomy is the simplest layer. It organises concepts into a hierarchy — a structured tree of parent-child relationships that answers the question: what category does this belong to?
Most people encounter taxonomies without realising it.
The Linnaean system of biological classification is a taxonomy. Your product catalogue with its Category → Sub-category → SKU structure is a taxonomy. Your organisational chart is also a taxonomy.
In data management, taxonomies classify data domains, organise metadata, structure subject areas, and define business vocabulary.
They create a shared language. Everyone agrees that a Labrador is a Dog, which is a Mammal, which is an Animal.
What a taxonomy cannot do is express how things relate to each other beyond the parent-child hierarchy.
It cannot say that a Customer places an Order, that a Product belongs to multiple categories, or that two concepts from different systems represent the same thing.
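In code, a taxonomy reduces to a parent-child lookup. A minimal sketch, using the concept names from the example above:

```python
# A taxonomy is pure parent-child classification (concept names illustrative).
parent = {
    "Labrador": "Dog",
    "Dog": "Mammal",
    "Mammal": "Animal",
}

def ancestors(concept: str) -> list[str]:
    """Walk up the hierarchy: the only question a taxonomy can answer."""
    chain = []
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain

print(ancestors("Labrador"))  # ['Dog', 'Mammal', 'Animal']
# Note what is missing: no way to say "a Customer places an Order".
```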
An ontology goes further. It formally defines the concepts in a domain, the relationships between them, and the rules that govern those relationships.
Where a taxonomy is a tree, an ontology is a network, capturing not just what things are, but how they connect.
A simple business ontology might define:
A Customer is a type of Party
A Customer places one or more Orders
An Order contains one or more Order Lines
Each Order Line references a Product
A Product is supplied by a Supplier
This network of concepts and relationships allows machines and AI systems to reason about how data entities relate, not just how they are classified.
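Those five statements can be written directly as subject-predicate-object triples, the form most ontology languages ultimately reduce to. A minimal Python sketch:

```python
# The simple business ontology above, expressed as triples.
ontology = [
    ("Customer", "is_a", "Party"),
    ("Customer", "places", "Order"),
    ("Order", "contains", "OrderLine"),
    ("OrderLine", "references", "Product"),
    ("Product", "supplied_by", "Supplier"),
]

def related(concept: str) -> list[tuple[str, str]]:
    """Everything a concept connects to, and how: a question a taxonomy cannot answer."""
    return [(pred, obj) for subj, pred, obj in ontology if subj == concept]

print(related("Customer"))  # [('is_a', 'Party'), ('places', 'Order')]
```

Unlike the taxonomy's single parent-child edge, the predicate carries the meaning of each relationship, which is what lets a machine traverse from Customer to Supplier through Orders and Products.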
Ontologies are the backbone of knowledge graphs, enterprise data catalogues, and the grounding layers that make large language models reliable in business contexts.
In healthcare, SNOMED CT and LOINC are ontologies that allow different systems to agree on what a clinical observation means across hospitals, countries, and software platforms.
In financial services, ontologies define the relationships between legal entities, accounts, transactions, and risk exposures, enabling regulatory reporting that is consistent and auditable.
A semantic model is the practical application of taxonomy and ontology thinking to your actual data assets.
It defines what your data means consistently, explicitly, and in a way that can be shared across systems, teams, and use cases.
Where an ontology defines concepts in the abstract, a semantic model connects those concepts to real data: tables, columns, fields, and values.
It answers not just “what is a Customer?” but also:
Where does Customer data live?
What does it mean in each context?
How do we resolve conflicts between definitions?
How does it connect to everything else?
Semantic models underpin data catalogues, Master Data Management, data governance frameworks, and AI grounding layers.
A mature semantic model means that when an AI system asks “who is our most valuable customer?” it is working from a shared, business-aligned understanding of what “customer” and “value” mean, rather than inferring those meanings from field names in a database.
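A minimal sketch of that idea in Python; every system, table, and column name here is hypothetical, chosen only to show the shape of the mapping:

```python
# A semantic model binds an abstract concept to physical data, with one
# governed definition per concept (all names below are hypothetical).
semantic_model = {
    "Customer": {
        "definition": "A party with at least one open account",
        "locations": {
            "crm": ("contacts", "contact_id"),
            "billing": ("accounts", "account_holder_id"),
            "warehouse": ("dim_customer", "customer_key"),
        },
    },
}

def where_is(concept: str) -> dict[str, tuple[str, str]]:
    """Answer 'where does this concept live?' across systems."""
    return semantic_model[concept]["locations"]

print(where_is("Customer")["billing"])  # ('accounts', 'account_holder_id')
```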
These three concepts are not competing alternatives. They are complementary layers that together build a complete semantic foundation.
A taxonomy gives you a shared vocabulary — a structured way to classify and categorise concepts
An ontology gives you a knowledge structure — a formal definition of how those concepts relate and what rules govern them
A semantic model gives you contextualised meaning — connecting the knowledge structure to your actual data, across systems, at enterprise scale
Every taxonomy can be expressed as a simple ontology.
Every ontology can be incorporated into a semantic model.
The layers build upward in richness and power, and each layer makes AI reasoning more reliable than the one below it.
A taxonomy tells you what things are called.
An ontology tells you how they relate.
A semantic model tells you what they mean in the context of your data.
AI needs all three to reason correctly.
Data Vault 2 is, at its core, a methodology for building semantic foundations into your data architecture from the ground up. This is not a retrospective claim — it is the reason the methodology was designed the way it was.
The three structural components of a Data Vault map directly to ontological concepts:
Hubs represent the core business concepts — Customer, Product, Account, Employee. They are the nouns of your business ontology. A Hub provides a single, stable, system-agnostic identifier for each entity, regardless of how many source systems refer to it differently.
Links represent the relationships between those concepts — Customer places Order, Product supplied by Supplier. They are the verbs of your business ontology. Relationships are first-class structural elements, not implicit foreign keys.
Satellites capture the attributes and context of both entities and relationships — every change, with a timestamp, preserved indefinitely. The Satellite is your temporal history layer: the complete record of what every entity looked like at every point in time.
A well-designed Data Vault is not just a storage mechanism. It is a formal, structured representation of your business knowledge, an ontology made operational in your data platform.
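The mapping can be sketched as plain record types. The field names below follow common Data Vault conventions (hash key, business key, load timestamp, record source) but are illustrative, not drawn from any specific implementation:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HubCustomer:            # the noun: one stable identity per business entity
    customer_hk: str          # hash key derived from the business key
    customer_bk: str          # the business key itself, e.g. a customer number
    load_dts: datetime
    record_source: str

@dataclass(frozen=True)
class LinkCustomerOrder:      # the verb: a relationship as a first-class record
    link_hk: str
    customer_hk: str
    order_hk: str
    load_dts: datetime
    record_source: str

@dataclass(frozen=True)
class SatCustomerStatus:      # the context: every change, timestamped, kept forever
    customer_hk: str
    load_dts: datetime
    status: str
    record_source: str
```

Hub rows are never updated, Link rows make every relationship queryable in its own right, and Satellite rows accumulate rather than overwrite — which is exactly the structure an ontology needs to become operational.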
When your data architecture reflects the ontology of your business, data becomes self-describing.
Lineage is traceable. Changes are auditable. New sources can be integrated without breaking existing structures.
AI systems can reason over the data with confidence because the relationships and semantics are structural properties, not inferences, assumptions, or conventions that live only in someone’s head.
Return to the “active customer” example from earlier. The same question, asked of an AI grounded in a properly structured Data Vault, produces a fundamentally different result.
“Active customer” is defined once, formally, in the Business Vault — the result of a deliberate business decision, captured as a versioned rule and applied consistently across every source system.
The Hub provides a single, stable identifier for each customer, regardless of which system they appear in. The Satellite records every status change with a timestamp, so the AI applies the correct definition for any time period being queried.
The answer is derived from one reconciled, semantically coherent source of truth.
The AI can also explain exactly how it arrived at that answer, which definition it applied, and when that definition came into effect.
The question was identical. The AI model was the same.
The difference, and the only difference, was the semantic foundation underneath it.
This plays out across every AI use case:
Full history preserved. Satellites give AI complete temporal context. It can reason across time without data loss — comparing this quarter to three years ago with the same confidence.
Source system separation. Conflicting data from different systems is traceable, not silently merged. The AI can surface the conflict or apply a defined resolution rule — it is never working from an invisible blend.
Consistent entity resolution. “Customer” means the same thing everywhere because the Hub resolves all source system representations into a single business identity.
Auditability. When AI gives a wrong answer, you can trace exactly why — which data was retrieved, which definition was applied, and which source it came from. In regulated industries, this is not optional.
Data Vault does not just store your data. It encodes the ontology of your business — giving AI the stable identity, relationship structure, temporal history, and traceable definitions it needs to reason correctly.
If semantic foundations are this important, why do so many enterprise data platforms still lack them?
The honest answer is that building a proper semantic model — capturing business definitions, translating them into a formal ontology, and implementing that ontology consistently across a multi-source data platform — has historically been slow, expensive, and dependent on specialist expertise that is scarce.
The typical result is a platform where the semantic intent exists in documents, whiteboards, point solutions, and the heads of the data team, but is never fully encoded into the architecture itself.
Definitions drift. Naming conventions diverge. New sources get integrated using whatever conventions the engineer at the time thought best.
Over time, the gap between what the platform was supposed to mean and what it actually encodes grows silently and continuously.
When AI is pointed at this platform, it inherits every inconsistency, every implicit assumption, and every undocumented convention.
The outputs reflect the state of the data, not the intent of the business.
The organisations succeeding with enterprise AI are not the ones with the most sophisticated models. They are the ones that invested in the semantic foundation first and are now reaping the benefits of AI that reasons from data it can actually understand.
Taxonomies, ontologies, and semantic models are not abstract theoretical concepts. They are the practical foundation of every trustworthy AI system. Without them, AI reasons over data it does not truly understand — producing outputs that are plausible but unreliable.
Data Vault 2 provides those foundations structurally, encoding the ontology of your business into the architecture of your data platform, with the auditability and temporal completeness that AI reasoning requires.
Summary:
A semantic foundation is the prerequisite for delivering trusted AI solutions. Data Vault provides stable entity identity, full temporal history, explicit relationships, and versioned business definitions.
This creates the foundation for retrieval pipelines that connect data to an AI model, an LLM or Copilot interface that interprets questions and generates responses, and the consumption layer that surfaces results to business users.
What changes when the foundation is sound is that these systems actually work consistently, traceably, and in a way the business can trust.
The next article in this series addresses a question that naturally follows: what does implementing a semantic foundation on Data Vault actually look like in practice, and how can you do it without waiting two years for a complete platform before any AI initiative can proceed?
This article is part of the Data Vault Intelligence Series.