IRiS is developed by Ignition.

Your Data Architecture Is Holding Your AI Back

7 April 2026


Why Star Schemas, Raw Medallion, Semantic Layers, and Ontology Platforms Are Not Enough and What to Do About It.

If you are investing in AI, someone in your organisation has almost certainly said one of the following:


"Our data warehouse is fine, we just need to connect the AI to it." 

"We already have a Medallion Architecture — Bronze, Silver, Gold — we’re good." 

"We use dbt and a semantic layer. That’s our semantic foundation sorted." 

"We’ve been running star schemas for fifteen years. They work. Why change?" 

"We’re on Palantir Foundry. We have an ontology layer. AI is handled."


These statements are not made in bad faith. The people making them have built real systems that deliver real value. Star schemas power dashboards that run businesses. Medallion architectures process enormous volumes of data. dbt has genuinely transformed how data teams work. Palantir Foundry’s ontology layer is a serious, sophisticated piece of engineering. These approaches are not worthless.

But none of them, individually or combined, are sufficient foundations for trusted, production-grade AI-enabled queries. And the gap between “sufficient for reporting” and “sufficient for AI” is wider than most organisations realise until they have already committed significant budget and found out the hard way.

This article names the problem directly, for each approach, without softening the edges. If that makes for uncomfortable reading, good. Better to be uncomfortable now than to be explaining to your board in twelve months why your AI initiative is producing unreliable outputs.

The question is not whether your current architecture works. The question is whether it gives AI what AI needs to reason correctly, with proper boundaries in place. For most architectures, the honest answer is no.

What AI Actually Needs From Your Data

Before examining each architecture in turn, it is worth being precise about what AI systems require from the data they reason over. This is not a vague or theoretical requirement — it is specific, testable, and largely unmet by conventional data architectures.

For an AI system to produce outputs that can be trusted, the underlying data must provide:

  • Stable entity identity. AI needs to know, unambiguously, what a “Customer” is — the same customer, consistently identified, across every system and every time period. Inconsistent or system-specific identifiers break entity reasoning entirely.

  • Formal relationship structure. AI needs to understand how entities relate to each other — not inferred from column names or foreign keys, but explicitly encoded in the data model. A Customer places Orders. An Order contains Products. These connections need to be first-class citizens in the architecture, not implicit assumptions.

  • Complete historical context. AI reasoning about trends, patterns, and changes requires full temporal history. Not just current state. Not just monthly snapshots. Every change, timestamped, traceable, and preserved.

  • Separation of raw and interpreted data. AI needs to be able to distinguish between what the source system said and what the business has interpreted it to mean. Conflating the two produces models that cannot be audited or corrected when interpretations change.

  • Consistent semantics across sources. When the same concept exists in multiple source systems — and it always does — AI needs a single, reconciled representation. Not three different versions of “Customer” from three different systems with three different definitions. If multiple versions exist, the relationships between those versions must be known and honoured.


Without these properties, your AI will be guessing: it will almost certainly generate incorrect answers to questions, and it may take a different reasoning path each time it is asked. With that framework established, let us look at each common architecture honestly.

The Star Schema: Great for Reports, Dangerous for AI

The star schema — fact tables surrounded by dimension tables — is one of the most successful patterns in the history of data analytics. Ralph Kimball’s methodology built careers, businesses, and data platforms that have served organisations reliably for decades. This is not in dispute.

What is also not in dispute, for anyone willing to look at it clearly, is that the star schema was designed to answer predefined business questions quickly, not to provide a flexible, complete, semantically rich foundation for AI reasoning.

The history problem

Star schemas are optimised for current state. Slowly Changing Dimensions (SCDs) exist to manage history, but in practice most organisations implement SCD Type 1, which overwrites history, because Type 2 is operationally complex. The result is a data model where AI cannot ask “what did this customer look like six months ago?” — because that information has been overwritten — or “how has this relationship evolved over time?” — because no relational context exists beyond what is hinted at within the fact table itself. For trend analysis, anomaly detection, and any AI use case that requires temporal reasoning, especially reasoning about relationships, this is a fundamental structural failure.
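To make the contrast concrete, here is a minimal, illustrative Python sketch of what Type 1 destroys and Type 2 preserves. The field names and dates are invented for the example; this is not any particular warehouse's implementation.

```python
from datetime import date

# SCD Type 1: overwrite in place -- the old value is simply gone.
customer_t1 = {"customer_id": 42, "segment": "SMB"}
customer_t1["segment"] = "Enterprise"  # history destroyed at this moment

# SCD Type 2: close the old row, insert a new versioned row.
customer_t2 = [
    {"customer_id": 42, "segment": "SMB",
     "valid_from": date(2025, 1, 1), "valid_to": date(2025, 9, 30)},
    {"customer_id": 42, "segment": "Enterprise",
     "valid_from": date(2025, 10, 1), "valid_to": None},  # current row
]

def segment_as_of(rows, as_of):
    """Answer 'what did this customer look like on date X?'
    This question is only answerable against Type 2 history."""
    for row in rows:
        if row["valid_from"] <= as_of and (row["valid_to"] is None or as_of <= row["valid_to"]):
            return row["segment"]
    return None

print(segment_as_of(customer_t2, date(2025, 6, 1)))   # SMB
print(segment_as_of(customer_t2, date(2026, 1, 1)))   # Enterprise
```

Against the Type 1 record, the first question has no answer at all: the "SMB" value no longer exists anywhere in the model.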

The pre-aggregation problem

Fact tables contain pre-aggregated, pre-modelled data shaped around specific reporting needs. The grain of the fact table is a design decision made at build time, based on the questions the business was asking when the warehouse was built. AI does not ask predefined questions. It explores. Pre-aggregated data destroys the low-level detail that AI needs to find unexpected patterns. You cannot reconstruct the original data from a summarised fact table.
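The irreversibility is worth seeing directly. In this illustrative sketch (the grain and figures are invented), order-line detail is rolled up to a product-level fact, and the detail cannot be recovered from the result:

```python
from collections import defaultdict

# Order-line grain: the low-level detail AI exploration needs.
order_lines = [
    {"order_id": 1, "product": "A", "qty": 2, "amount": 20.0},
    {"order_id": 1, "product": "B", "qty": 1, "amount": 35.0},
    {"order_id": 2, "product": "A", "qty": 5, "amount": 50.0},
]

# A fact at product grain: a design decision made at build time.
product_fact = defaultdict(float)
for line in order_lines:
    product_fact[line["product"]] += line["amount"]

print(dict(product_fact))  # {'A': 70.0, 'B': 35.0}
# From {'A': 70.0} alone there is no way to recover that product A
# appeared on two different orders with quantities 2 and 5.
# Aggregation is a one-way function over the detail.
```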

The semantic rigidity problem

A star schema embeds business interpretations directly into the data model. The definitions of “sale”, “customer”, “revenue” are baked into the fact table design. When those definitions change, and in every business they do, the model has to be rebuilt. AI systems grounded on a star schema inherit whatever definitional choices were made when that schema was designed. When the business changes its definition of a metric, the AI’s understanding does not automatically update.

THE AI RISK: Without vigilant maintenance, an AI reasoning over a star schema is reasoning over a model of your business as it was understood when the schema was designed — not as your business may actually operate today. It is also unable to navigate even the most fundamental business process changes over time, because those changes are expressed in the relationships between stable entities, which the schema does not preserve.


The Normalised Relational Model: Semantically Rich, Temporally Blind

Normalised relational models — third normal form (3NF) database designs — do encode relationships more faithfully than star schemas. Foreign keys represent real relationships. Entities are separated. The structure can be closer to an ontology than a dimensional model. So why is it insufficient for AI?

The history problem — again, but worse

Operational relational databases are designed to reflect current state. When a customer changes their address, the old address is overwritten. When an account is closed, the record is updated or deleted. This is exactly what you want from an operational system. It is the opposite of what you want from an AI foundation. AI needs to reason about what was true, when, and how things changed. A normalised relational model gives AI a snapshot of current reality with no temporal depth. Every historical signal — the patterns of change that are often the most valuable inputs to AI models — is gone.

The scale and performance problem

Normalised models are fundamentally at odds with the query patterns that AI workloads prefer. AI exploration requires joining many tables across large datasets, exactly the kind of query that normalised relational designs handle poorly at scale. The model that works well for transactional processing becomes a bottleneck for the analytical and AI workloads that need to traverse it broadly and repeatedly.

The integration problem

Normalised relational models reflect the semantics of a single system — the application they were designed to serve. Integrating multiple source systems into a single normalised model requires resolving conflicts between different systems’ definitions, identifiers, and structures. This integration logic typically lives in ETL code, undocumented and unversioned, rather than in the data model itself. AI has no access to that logic and no way to reason about the integration decisions that were made.

THE AI RISK: A normalised relational model tells AI what your data looks like right now in one system. It cannot tell AI what happened, how things changed, or how concepts relate across the rest of your business — and, like dimensional models, it largely lacks temporal detail about relationships.


Raw Medallion Architecture: Necessary Infrastructure, Insufficient Semantics

Medallion Architecture — Bronze (raw ingestion), Silver (cleansed and conformed), Gold (business-ready aggregations) — has become the default pattern for Lakehouse platforms built on Databricks, Snowflake, Microsoft Fabric, and similar technologies. It is a genuinely useful high-level categorisation framework for organising data processing pipelines, and it is not going away.

The problem is not Medallion Architecture itself. The problem is the widespread assumption that a Medallion Architecture is, on its own, a semantic foundation for AI. It is not. It is a processing framework. Semantics are not a property of the framework — they are a property of what you put beside it. 

The Silver layer problem

The Silver layer is where Medallion Architecture most commonly fails AI. “Cleansed and conformed” sounds reassuring, but in practice Silver layers are often implemented as lightly transformed copies of source data — column renaming, data type casting, basic deduplication. The semantic work of defining what entities are, how they relate, and how they should be reconciled across sources is deferred, or never done at all. An AI system reasoning over a Silver layer that is essentially relabelled Bronze data is not reasoning over a semantic model. It is reasoning over raw operational data with better formatting.

The Gold layer problem

Gold layers have the same pre-aggregation problems as star schemas — the data has been shaped for specific known use cases, destroying the detail and flexibility that AI exploration requires. An AI is not going to get what it needs from a Gold layer designed to feed a Power BI dashboard. The Gold layer also loses track of the origins and versioning of derived data over time, because it reuses the design elements, and inherits the challenges, already discussed above.

The undifferentiated blob problem

Without a formal structural methodology in the Silver layer, Medallion Architectures tend to accumulate a growing collection of tables with no consistent naming conventions, no formal entity model, and no documented relationships. Teams know what each table does because they built it. New team members, auditors, and AI systems have no such context. As the platform grows, it becomes progressively harder to understand what data means without asking the person who built it, or how it changed over time. This leads to the AI picking what seems closest and presenting it as fact.

THE AI RISK: A Medallion Architecture without a formal semantic layer in Silver is a well-organised data lake — not an AI-ready data platform. Bronze and Gold do not compensate for a Silver layer with no semantic rigour.


dbt and Semantic Layer Tools: The Right Problem, the Wrong Depth

Of the approaches discussed in this article, the dbt-plus-semantic-layer combination deserves the most nuanced treatment — because it is addressing the right problem. Tools like Cube, MetricFlow, and dbt’s native semantic layer exist precisely because data teams recognised that their analytical models lacked consistent business definitions and shareable metric logic. That diagnosis is correct. The treatment, however, does not go deep enough.

The definition-without-structure problem

Semantic layer tools allow you to define metrics and dimensions in a centralised, reusable way. “Revenue” means the same thing in every downstream tool because it is defined once. This is genuinely valuable for reporting consistency. But a semantic layer built on top of a poorly structured underlying model is a veneer, not a foundation. It defines what your metrics mean. It does not define what your entities are, how they relate to each other, or how those relationships have evolved over time. An AI system that needs to reason about Customer-Order-Product relationships cannot get that from a metric definition file.
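An illustrative sketch makes the gap visible. The two structures below are hypothetical shapes invented for this example, not any tool's actual file format; the point is that they are different artefacts, and semantic layer tools only give you the first:

```python
# A metric definition centralises *what a number means*...
revenue_metric = {
    "name": "revenue",
    "expression": "SUM(amount)",
    "filters": ["status = 'complete'"],
}

# ...but it carries no entity model. Entities and relationships
# are a different artefact entirely (hypothetical shape):
entity_model = {
    "entities": ["Customer", "Order", "Product"],
    "relationships": [
        ("Customer", "places", "Order"),
        ("Order", "contains", "Product"),
    ],
}

# Nothing in revenue_metric tells an AI that Customers place Orders.
print("Customer" in str(revenue_metric))  # False
```

A metric layer alone cannot answer relationship questions, because the relationships were never written down in a form it holds.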

The history problem — a recurring theme

dbt transforms are typically run on current-state data. The models dbt produces reflect the current shape of your data, with whatever history the underlying tables happen to contain. dbt has no native concept of full temporal tracking — of preserving every state change for every entity with a reliable timestamp. This is not a criticism of dbt; it was not designed for that purpose. But it means that an AI grounded on dbt models inherits whatever historical limitations exist in the underlying tables.

The source dependency problem

dbt models are defined on top of source data. The semantic layer sits above the dbt models. This creates a chain of dependencies running all the way down to whatever the source systems happened to provide. When source systems change their schemas, data types, or identifiers, every dbt model and every semantic layer definition built on top of them is at risk. The semantic layer is only as stable as its foundations — and if those foundations are direct source system connections, they are not very stable at all.

THE AI RISK: A semantic layer is better than no semantic layer. But “we have dbt and a semantic layer” is not equivalent to “we have a semantic foundation for AI.” It is a useful reporting abstraction built on top of an architecture that may still lack the entity stability, relationship structure, and temporal completeness that AI requires.


Palantir Foundry Ontology: Semantically Powerful, Structurally Constrained

Palantir Foundry’s ontology layer deserves its own treatment, because it is genuinely the most sophisticated approach on this list when it comes to semantic modelling. Unlike the other architectures discussed here, Palantir was explicitly designed around an ontological foundation: defining objects (entities), properties, and links (relationships) as first-class citizens of the platform. Every application, every AI query, every workflow in Foundry interacts with the world through the ontology layer. That is a meaningful architectural commitment that most platforms do not make.

The result, in environments where it is well implemented, is an AI that reasons over business concepts rather than database schemas — a Customer is a Customer, an Order is an Order, and the relationship between them is formally defined and queryable. Palantir AIP (Artificial Intelligence Platform) is built directly on this ontology, giving AI agents a structured, governed view of the business.

So why is it on this list? Because the strengths of the ontology layer come with a set of structural constraints that create real problems in enterprise data environments — particularly the pipeline complexity that emerges when trying to map multi-source, multi-system enterprise data into the Foundry object model.

The pipeline complexity problem

The Palantir ontology works by mapping source data into object types through transformation pipelines. Each source system — the CRM, the ERP, the billing platform, the service desk — requires dedicated pipeline work to shape its data into the ontology’s object and link definitions. In a single-source or tightly scoped environment, this is manageable. In a typical enterprise with dozens of source systems, evolving schemas, and conflicting definitions of shared concepts, it becomes what practitioners describe as data pipeline spaghetti.

The problem is architectural: the Foundry ontology is the target model, and every source system must be transformed to meet it. When source systems disagree about what a “Customer” is — which they always do — the resolution logic has to be built into the pipeline, maintained in the pipeline, and re-engineered every time a source system changes. There is no structural home for integration logic analogous to Data Vault’s separation of raw and business-interpreted layers. The pipeline carries the integration burden that the architecture does not distribute.

The temporal history problem

Palantir’s ontology is fundamentally a current-state model. Object types represent what entities are now — their current properties, their current relationships, their current status. The platform supports some historical tracking through dataset versioning and point-in-time queries at the dataset level, but this is not the same as a structural, per-entity temporal history of the kind that Satellite tables provide in Data Vault. When an AI in Foundry asks about the history of a customer relationship, it is querying underlying datasets rather than a formal temporal record. The answer depends on what those datasets happen to have preserved — which varies by source system and pipeline design.

The proprietary lock-in problem

The Foundry ontology is a proprietary construct — it exists within Palantir’s platform and nowhere else. The semantic model your organisation builds is not portable. It cannot be exported to a standard ontology format, queried independently of the Foundry platform, or integrated with other semantic tooling. Organisations that invest deeply in a Foundry ontology are, by design, invested in Palantir’s ecosystem for the long term. This is not necessarily wrong — Palantir is explicit about this model — but it is a strategic dependency that data leaders should enter with clear eyes rather than discover after the ontology has become the central nervous system of the organisation.

The cost and accessibility problem

Palantir Foundry is a significant platform investment — licensing costs typically run into millions annually for enterprise deployments, and implementation requires specialist Foundry engineering expertise that is scarce and expensive. For organisations considering whether to build their semantic foundation on Foundry or on open, platform-native Data Vault methodology, the total cost of ownership comparison is significant. A Data Vault implementation with IRiS runs on the platforms the organisation has already invested in — Microsoft Fabric, Snowflake, Databricks — and generates standard SQL artefacts that are readable, portable, and independent of any single vendor.

Palantir’s ontology gets the semantic argument right — AI should reason over business concepts, not raw tables. The problem is that mapping enterprise data into that ontology requires a pipeline complexity that the architecture does not structurally solve. Data Vault solves the integration problem first, and the semantic layer emerges from the structure.

THE AI RISK: A Foundry ontology built on top of poorly integrated, inconsistently mapped source pipelines inherits those inconsistencies in its object model. The ontology is only as trustworthy as the pipelines feeding it — and in complex multi-source environments, those pipelines are where the hard integration problems accumulate, invisible to the AI reasoning over the objects above.

THE HONEST COMPARISON: If your organisation is already invested in Palantir Foundry and building out the ontology carefully, Data Vault methodology can strengthen the pipeline layer that feeds the ontology — providing the source separation, entity resolution, and temporal history that Foundry’s pipeline layer typically handles inconsistently. The two are not mutually exclusive. But for organisations choosing their foundation, an open Data Vault approach on existing platform infrastructure delivers the semantic properties AI needs without the platform dependency or the pipeline integration burden that the Foundry model concentrates at the source mapping layer.

 

The Verdict at a Glance

Measured against what AI actually needs — stable entity identity, formal relationship structure, complete historical context, and consistent cross-source semantics — here is how the common approaches compare: 

Architecture                   Tracks history?   Encodes relationships?   AI-ready semantics?   AI-ready views?

Star Schema / Dimensional      No                No                       No                    No

Normalised Relational          No                Yes                      Partially             No

Raw Medallion (no DV)          Partially         No                       No                    No

dbt + Semantic Layer           No                Partially                Partially             Partially

Palantir Foundry Ontology      No                Yes                      Partially             Partially

Data Vault 2.0                 Yes               Yes                      Yes                   Yes

Data Vault 2.0 was explicitly designed to provide all of these properties simultaneously, and it can do so at enterprise scale, across multiple source systems, with full auditability, including for relationships and their temporal history. In essence, it provides the much-needed guardrails for AI to operate within, in addition to the semantic enablement.

The Honest Acknowledgement

None of this means you need to throw away your existing technical architecture tomorrow. Star schemas will continue to serve reporting workloads well. dbt will remain a valuable transformation tool. Medallion Architecture is a useful reference framework. Normalised relational models are the right choice for operational systems. Palantir Foundry’s ontology layer delivers real value for organisations that have made that platform investment.

The point is not that these approaches are failures. The point is that they do not provide sufficient foundations for the AI ambitions that most organisations now have — and pretending otherwise is expensive.

The most common and costly pattern we see is this: an organisation invests in a modern Lakehouse, deploys a Medallion Architecture, adds a semantic layer or ontology tool, connects an AI or LLM, and then discovers, anything up to twelve months later, that the AI outputs cannot be trusted because the underlying data lacks the structural properties — the guardrails — that reliable AI reasoning requires. The rework at that point — re-engineering the Silver layer, implementing proper entity models, recovering historical data across not only core entities but also relationships — costs far more than doing it right the first time.

The cost of not having a semantic context or guardrails for AI is not paid when you build the platform. It is paid when the AI goes wrong — in front of the business, in front of the board, or in front of a regulator.

What Good Looks Like

Addressing these limitations does not require starting from scratch. It requires introducing Data Vault methodology into the Silver layer of your Lakehouse — the integration layer where entity resolution, relationship encoding, and temporal history need to live, for both raw and derived data.

A well-implemented Data Vault Silver layer:

  • Defines your core business entities as Hubs with stable, system-agnostic Business Keys — resolving the entity identity problem once and for all

  • Encodes relationships between entities as Links — making the ontology of your business a first-class structural property of the data platform

  • Preserves complete history as Satellites — every change, every source, every timestamp, indefinitely, including for relationships

  • Separates raw source data from business-interpreted data — giving AI both what the source said and what the business means

  • Integrates cleanly with Medallion Architecture — sitting in Silver, feeding Gold, without requiring you to abandon the framework you have already built, and without requiring Gold to be the proxy for derived or relationship data
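The structural pattern in the bullets above can be sketched in a few lines of illustrative Python. The column names follow common Data Vault conventions (hash keys, business keys, load timestamps), but this is a teaching simplification, not the code IRiS generates:

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys):
    """Common Data Vault practice: a deterministic hash of the
    normalised business key(s), stable across all source systems."""
    joined = "||".join(str(k).upper().strip() for k in business_keys)
    return hashlib.md5(joined.encode()).hexdigest()

now = datetime(2026, 4, 7, tzinfo=timezone.utc)

# Hub: one row per business entity, keyed by a system-agnostic business key.
hub_customer = {"customer_hk": hash_key("CUST-001"), "customer_bk": "CUST-001",
                "load_ts": now, "record_source": "crm"}
hub_order = {"order_hk": hash_key("ORD-900"), "order_bk": "ORD-900",
             "load_ts": now, "record_source": "erp"}

# Link: the relationship 'Customer places Order' as a first-class row.
link_customer_order = {"customer_order_hk": hash_key("CUST-001", "ORD-900"),
                       "customer_hk": hub_customer["customer_hk"],
                       "order_hk": hub_order["order_hk"],
                       "load_ts": now, "record_source": "erp"}

# Satellite: descriptive attributes over time -- insert-only, never overwritten.
sat_customer = [
    {"customer_hk": hub_customer["customer_hk"], "segment": "SMB",
     "load_ts": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"customer_hk": hub_customer["customer_hk"], "segment": "Enterprise",
     "load_ts": now},
]

# The same business key yields the same hub key from any source,
# even with differing case or whitespace.
assert hash_key("cust-001 ") == hub_customer["customer_hk"]
```

The relationship and its history are rows in their own right, which is exactly what gives AI the entity identity, relationship structure, and temporal depth described earlier.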


This is precisely where IRiS, Ignition’s Data Vault automation platform, delivers its value. The structural patterns described above — Hubs, Links, Satellites, for raw and derived data — are what IRiS generates, consistently, at scale, and incrementally: one source table at a time, delivering AI-ready data within your Silver layer with each sprint rather than at the end of a multi-year programme.

Your star schemas and reporting layers do not need to change. Your dbt transformations remain useful. Your Medallion Architecture stays in place. What changes is the Silver layer underneath — from a collection of lightly transformed source copies to a formal, semantically coherent, AI-ready integration model.

That is not a rip-and-replace. It is an upgrade. And it is one that pays dividends not just for AI, but for every data consumer downstream.


You do not need to choose between the architecture you have and AI-readiness. You need to add the semantic foundation that your current architecture is missing.

This article is part of the Data Vault Intelligence Series.


Start your IRiS journey

Experience the smarter, faster way to automate your Data Vault.
