MCP-Connected Data Catalog: How Governance Decides AI Accuracy

Emily Winks
Data Governance Expert
Updated: 06/04/2026 | Published: 04/16/2026
16 min read

Key takeaways

  • MCP standardizes how AI agents request context from data catalogs; it does not govern the quality of what they receive.
  • An ungoverned catalog connected via MCP delivers ungoverned context at AI speed: fast, formatted, and wrong.
  • Governed context requires certified assets, business glossary terms, complete lineage, and access controls.

Why do you need an MCP-connected data catalog in 2026?

An MCP-connected data catalog exposes governed metadata, including certification, glossary terms, lineage, and access control, to AI agents over the Model Context Protocol. A 2026 study of 7,973 live MCP servers found 40.55% expose their tools without authentication. Even an authenticated agent inherits whatever the catalog stores, so governance, not the protocol, decides whether the answer is right.

What defines a governed MCP-connected catalog:

  • Asset certification — agents know which assets are production-grade vs. draft or deprecated
  • Business glossary — agents retrieve organization-specific term definitions alongside schema
  • Complete lineage — agents trace data origin, transformation, and downstream impact
  • Access controls — RBAC and masking policies propagate through the MCP server to every agent query

An “MCP catalog” can mean two things. The first is a registry of MCP servers. The second is a metadata catalog: a governed inventory of tables, columns, glossary terms, lineage, and quality signals that is exposed to AI agents over MCP. It is the data catalog AI agents read from.

This article is about the second meaning.



What is an MCP-connected data catalog?

An MCP-connected data catalog is a metadata catalog that serves its assets to AI agents through the Model Context Protocol. The catalog holds the metadata. MCP standardizes how an agent asks for it and how the answer comes back.

The Model Context Protocol is an open standard that Anthropic introduced in November 2024. It gives an agent one consistent interface to call tools and pull context, so teams stop hand-building a separate integration for every model and every source. A data catalog, by contrast, is the governed record of what data exists, what it means, where it came from, and who can use it.

“Connected” simply means an MCP server sits in front of that catalog and translates each agent request into a catalog query. However, what the agent receives depends entirely on the substrate underneath. This is why a data catalog built for AI agents is a different thing from a catalog that merely answers MCP calls.

What MCP actually does, and what it leaves to the catalog

MCP is a delivery protocol. It moves structured context between an agent and a source. It does not judge whether that context is correct, current, or safe to act on.

David Soria Parra, the MCP co-creator at Anthropic who now leads the protocol’s roadmap, put the boundary plainly: The protocol puts information across the wire, and the client is responsible for dealing with it. That single sentence is the whole architecture. MCP defines the pipe, not the water quality.

How the connection works

The protocol uses a standard request-response cycle built on JSON-RPC 2.0. An MCP server exposes three kinds of capability: tools the agent can call, resources it can read, and prompt templates it can reuse.

Because the interface is standard, the agent needs no catalog-specific code. It works with Claude, Cursor, and other compatible clients out of the box.
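
To make the request-response cycle concrete, here is a minimal sketch of one tool call as JSON-RPC 2.0 messages. The `tools/call` method and the `content` list in the result follow the MCP message shape; the tool name `search_assets`, its arguments, and the returned table are hypothetical and would differ for any real catalog server.

```python
import json

# Request an agent's MCP client would send. Tool name and arguments are assumptions.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_assets",  # hypothetical tool exposed by a catalog server
        "arguments": {"query": "customers", "limit": 5},
    },
}

# Shape of a successful tool result: a list of content items plus an error flag.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {
                "type": "text",
                "text": json.dumps({"table": "customers", "columns": ["id", "email"]}),
            }
        ],
        "isError": False,
    },
}


def tool_text(resp: dict) -> str:
    """Return the first text item of a tool result, or raise on a JSON-RPC error."""
    if "error" in resp:
        raise RuntimeError(resp["error"].get("message", "unknown error"))
    for item in resp["result"]["content"]:
        if item.get("type") == "text":
            return item["text"]
    raise ValueError("no text content in tool result")


wire = json.dumps(request)                  # what travels to the server
payload = json.loads(tool_text(response))   # what the agent reads back
print(payload["table"])
```

Nothing in either message is specific to one catalog vendor, which is why the same client code can talk to any compliant server.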

MCP defines two transports, and the difference sets the security boundary.

In the stdio transport, the server runs as a local subprocess inside the same trust boundary as the host. In the Streamable HTTP transport, the server runs as a network service and inherits every authentication and access-control requirement of a normal web application. The 2026 MCP roadmap names transport scalability as its first priority, evolving Streamable HTTP so servers can scale horizontally and advertise their capabilities through a .well-known metadata file.

Where the protocol stops

Here is what MCP cannot do, and the list is the reason this whole article exists.

  • It cannot verify that a table’s metadata is accurate or current.
  • It cannot tell an agent whether an asset is production-grade or a sandbox experiment.
  • It cannot detect lineage gaps from pipelines nobody documented.
  • It cannot flag context as uncertain when the underlying definition is contested.
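
The limits listed above can be illustrated with a small sketch. The check below validates only the JSON-RPC 2.0 envelope, which is roughly what a protocol layer can verify; the two payloads and their field names are hypothetical. Both pass, even though one describes a deprecated table with an unknown status.

```python
def is_valid_result(resp: dict, request_id: int) -> bool:
    """Check only the envelope: version, matching id, and exactly one of result/error."""
    return (
        resp.get("jsonrpc") == "2.0"
        and resp.get("id") == request_id
        and (("result" in resp) != ("error" in resp))
    )


# Hypothetical payloads: one current, one left behind by an outdated catalog sync.
current = {"jsonrpc": "2.0", "id": 7, "result": {"table": "orders_v2", "deprecated": False}}
stale = {"jsonrpc": "2.0", "id": 7, "result": {"table": "orders_old", "deprecated": None}}

# Both responses are well-formed, so the envelope check cannot tell them apart.
print(is_valid_result(current, 7), is_valid_result(stale, 7))
```

Whether the content is accurate has to be decided by the catalog that produced it or by the client that consumes it.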

Why connectivity without governance fails

An MCP connection amplifies the catalog beneath it. Point it at stale, uncertified, and undefined metadata, and the agent does not visibly stumble. It answers quickly, formats the answer cleanly, and confidently gets it wrong.

┌─────────────────────────────────────────────────────────┐
│           AI Agent (Claude / Cursor / custom)           │
│                    ↓ MCP tool call                      │
│                   MCP Server                            │
│                 ↓ catalog API query                     │
│                  Data Catalog                           │
│           ↓ governed vs. ungoverned substrate           │
│  ┌─────────────────┐   ┌──────────────────────────────┐ │
│  │    GOVERNED      │   │        UNGOVERNED            │ │
│  │  Certified       │   │  Raw schema                  │ │
│  │  assets          │   │  Stale metadata              │ │
│  │  Complete        │   │  Incomplete lineage          │ │
│  │  lineage         │   │  Undefined terms             │ │
│  │  Business        │   │  No quality signals          │ │
│  │  glossary        │   │                              │ │
│  │  Quality scores  │   │                              │ │
│  └─────────────────┘   └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

This is the failure mode. Most MCP discourse is about transport security: who can reach the server. The quieter and more common failure is content trust: the agent reaches a perfectly authenticated server and still receives incorrect metadata.

A concrete example shows how silent the failure is. Atlan’s research team took a Formula One dataset and asked an AI which drivers were eliminated in the first round. Seeing no column named “eliminated,” the model guessed that elimination meant missing data and wrote WHERE position IS NULL. The query ran. The answer was wrong.

Given one piece of governed context (that “eliminated” means the slowest qualifying times), the model wrote the correct query instead.
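
The difference between the two queries can be reproduced on a toy table. This is a sketch under stated assumptions, not the dataset or queries from the research: the `qualifying` table, its columns, the five drivers, and the cutoff of two eliminated drivers are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real dataset's layout is not given in the article.
conn.execute("CREATE TABLE qualifying (driver TEXT, position INTEGER, q1_seconds REAL)")
conn.executemany(
    "INSERT INTO qualifying VALUES (?, ?, ?)",
    [("A", 1, 88.1), ("B", 2, 88.4), ("C", 3, 88.9), ("D", 4, 89.6), ("E", 5, 90.2)],
)

ELIMINATED_COUNT = 2  # assumed cutoff for this toy grid

# The model's guess: "eliminated" means missing data.
guessed = conn.execute(
    "SELECT driver FROM qualifying WHERE position IS NULL"
).fetchall()

# With the glossary definition: eliminated means the slowest first-round times.
governed = conn.execute(
    "SELECT driver FROM qualifying ORDER BY q1_seconds DESC LIMIT ?",
    (ELIMINATED_COUNT,),
).fetchall()

print(guessed)   # runs without error but matches no drivers
print(governed)  # the two slowest drivers
```

Both statements are valid SQL, so nothing at the database or protocol level signals which one reflects the business meaning.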

The accuracy cost is measurable. dbt Labs reran its semantic-layer benchmark in 2026 against the ACME Insurance dataset and found that querying through a modeled semantic layer hit 98.2% to 100% accuracy, while text-to-SQL on the same modeled data reached 84.1% to 90%. dbt’s own conclusion is the sharpest line in the study: text-to-SQL failure looks like a plausible but incorrect answer, while semantic-layer failure looks like an error message. The semantic layer tells you it cannot answer. Text-to-SQL cheerfully gives you a wrong number.

A human analyst catches these inconsistencies through experience. An agent querying at machine speed propagates them into every downstream decision that trusted its output. This is why “connected” and “governed” are not the same claim, and why context infrastructure for AI agents has become the real question behind the MCP buzz.



The two layers of MCP catalog risk: transport and content

Connectivity without governance fails in two distinct ways, and a serious evaluation has to separate them. Call them transport trust and content trust. Transport trust asks who can reach the server. The content question is harder: is what comes back actually true? Solving one does nothing for the other.

The transport risk: the door often has no lock

The security data is blunt. Researchers at Fudan University and Central South University ran the first large measurement study of authentication in remote MCP servers in May 2026. Across 7,973 live servers, 40.55% exposed their tools without any authentication, meaning any client could invoke them without credentials. Another 30.45% used OAuth, and the remaining 29% relied on static tokens or API keys.

The OAuth deployments were not safe either. Among 119 fully testable OAuth servers, every single one had at least one authentication flaw, for a total of 325 flaws. One unauthenticated server alone exposed over 5,000 internal customer records.

Practitioners were saying the same thing out loud before the academic numbers landed. Eric Johnson, a principal developer advocate at AWS, summarized the posture of roughly 200,000 STDIO servers as “execute first, validate never.”

The content risk: the room is full of unverified furniture

Securing the door does nothing about the furniture inside the room. An agent can authenticate perfectly and still inherit:

  • Stale schema, from tables renamed or deprecated since the last catalog update.
  • Missing ownership, so the agent attributes data to the wrong team or no team.
  • Uncertified assets, so a sandbox table gets treated as production truth.
  • Undefined terms, so “revenue” resolves to three different definitions depending on which schema the agent happened to query.

dbt’s Stephen Rob described the result precisely: Without a structured, governed context, the model generates SQL that runs but returns the completely wrong answer with complete confidence. The transport risk is loud and the content risk is silent. An MCP catalog that closes only the first is still dangerous.
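
The four content risks listed above can be expressed as a simple check on an asset record. This is an illustrative sketch: the `Asset` fields, the 30-day staleness threshold, and the example assets are assumptions, not the data model of any particular catalog.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Asset:
    """Hypothetical catalog record; field names are illustrative."""
    name: str
    owner: Optional[str] = None
    certified: bool = False
    last_synced: Optional[date] = None
    glossary_terms: dict = field(default_factory=dict)


def content_risks(asset: Asset, today: date, max_age_days: int = 30) -> list:
    """Return the content-trust risks an agent would inherit from this asset."""
    risks = []
    if asset.last_synced is None or today - asset.last_synced > timedelta(days=max_age_days):
        risks.append("stale schema")
    if not asset.owner:
        risks.append("missing ownership")
    if not asset.certified:
        risks.append("uncertified asset")
    if not asset.glossary_terms:
        risks.append("undefined terms")
    return risks


today = date(2026, 6, 1)
sandbox = Asset(name="tmp_revenue_test", last_synced=date(2026, 1, 15))
gold = Asset(
    name="fct_revenue",
    owner="finance-data",
    certified=True,
    last_synced=date(2026, 5, 28),
    glossary_terms={"revenue": "Recognized revenue net of refunds"},
)

print(content_risks(sandbox, today))
print(content_risks(gold, today))
```

An authenticated agent would receive both assets equally easily; only a check like this, run on the catalog side, separates them.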

What does “governed” actually mean?

Governance is not a slogan you attach to a catalog. For MCP, it resolves into four concrete signals, which we will call the four governed signals. Each one maps to a field the agent receives in its response.

| Signal | What it tells the agent | What the agent receives over MCP |
| --- | --- | --- |
| Asset certification | Which assets are production-grade versus draft or deprecated | A trust status attached to each asset |
| Business glossary | What a term like “revenue” means in this specific organization | The definition alongside the schema, not just the column name |
| Data lineage | Where a number came from and who depends on it downstream | The upstream sources, transformations, and downstream consumers |
| Access control | What this agent’s identity is permitted to see | Only the assets its role allows, with masking applied |

What metadata does an agent receive via MCP?

The four signals are abstract until you watch one query. Picture an agent asking for the customers table through two different catalogs.

From an ungoverned catalog, the MCP server returns a schema dump: table name, column names, and types. The agent has no idea which columns are trustworthy, what the business terms mean, or whether the data is up to date. It fills the gaps with plausible assumptions, and “plausible” is not the same as “correct”.

From a governed catalog, the same request returns far more. The agent gets the certification status, the glossary definitions for ambiguous fields, the lineage edges showing where the data originated, the freshness and completeness scores, and the owning steward.
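
A side-by-side sketch of the two responses makes the gap visible. The payload keys, values, and the `render_context` helper are hypothetical; real servers choose their own field names. The helper shows how the same request turns into different context for the model depending on what the catalog returned.

```python
# Ungoverned response: a schema dump and nothing else.
ungoverned = {
    "table": "customers",
    "columns": [{"name": "id", "type": "INTEGER"}, {"name": "status", "type": "TEXT"}],
}

# Governed response: same schema plus trust and meaning signals (illustrative values).
governed = {
    **ungoverned,
    "certification": "VERIFIED",
    "glossary": {"status": "Lifecycle stage: lead, active, or churned"},
    "lineage": {"upstream": ["crm.raw_contacts"], "downstream": ["bi.churn_dashboard"]},
    "quality": {"freshness_hours": 6, "completeness": 0.98},
    "owner": "customer-data-stewards",
}

SIGNALS = ("certification", "glossary", "lineage", "quality", "owner")


def missing_signals(payload: dict) -> list:
    """List the governed signals that are absent from a response."""
    return [s for s in SIGNALS if s not in payload]


def render_context(payload: dict) -> str:
    """Build the text an agent would place in its prompt for this table."""
    lines = [f"Table {payload['table']}"]
    for col in payload["columns"]:
        meaning = payload.get("glossary", {}).get(col["name"], "meaning unknown")
        lines.append(f"- {col['name']} ({col['type']}): {meaning}")
    gaps = missing_signals(payload)
    if gaps:
        lines.append("Missing: " + ", ".join(gaps))
    return "\n".join(lines)


print(render_context(ungoverned))
print(render_context(governed))
```

In the ungoverned case every column renders as "meaning unknown", which is exactly where the model starts filling gaps with assumptions.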

The accuracy payoff is documented. Atlan’s enhanced-metadata research tested 522 query evaluations and found a 38% relative improvement in AI SQL accuracy from richer metadata, statistically significant at p < 0.0001. The biggest gains landed on medium-complexity queries, the everyday workhorse questions, which improved by 2.15x. The lesson is not “more metadata.” It is the right metadata, structured for a machine to use.



Major MCP catalog comparison: what each one exposes

When you evaluate a vendor’s “MCP-enabled” claim, ask what the server exposes, not whether it speaks the protocol. Protocol compliance is table stakes. Governed exposure is the differentiator. The table below describes what each option surfaces, without endorsement.

| Platform | What it exposes over MCP | Scope boundary |
| --- | --- | --- |
| Google Knowledge Catalog (Dataplex) | Catalog metadata via the MCP Toolbox | GCP-centric |
| Databricks Unity Catalog | Catalog objects and an MCP Catalog that registers and governs MCP servers, with a centralized audit table | Databricks lakehouse |
| CData | MCP servers that wrap data-source connectors | Query passthrough, limited native metadata governance |
| Atlan | Certified assets, glossary terms, lineage, quality scores, and ownership, governed across platforms | Cross-platform |

How to build a governed MCP-connected catalog

Building a governed MCP connection is a two-phase job: govern the catalog first, then expose it. Most teams reverse the order. They wire the protocol in days, the governance backlog takes months, and in the gap the agents get fast access to bad context.

The right sequence puts a governance gate at every step.

  1. Audit what your catalog exposes today. Compute the ratio of certified to uncertified assets. Find stale and deprecated tables still sitting in the catalog. Document what an agent would receive right now if MCP went live. That is your baseline gap.
  2. Set governance policies for AI-consumable assets. Define what “certified” means and who has the authority to certify. Assign stewards to high-priority domains. Set freshness rules and link glossary terms to physical assets, so an agent retrieves meaning alongside the schema.
  3. Stand up the MCP server on the governed layer. Configure it to expose only certified or steward-reviewed assets by default. Decide which object types are tool-callable, and test against a constrained query set before widening access.
  4. Scope what each agent can reach. Apply least privilege. A BI agent, a compliance agent, and a developer assistant should each see a different slice. Row-level and column-level security set at the catalog level propagate through the server.
  5. Monitor consumption and flag drift. Log every query and the asset it touched. Alert when agents repeatedly hit uncertified or stale assets, and watch quality scores so you catch degradation before it reaches a decision.
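
Three of the steps above (the audit in step 1, the certified-only default in step 3, and drift monitoring in step 5) can be sketched in a few functions. The asset records, status values, log format, and alert threshold are assumptions chosen for illustration; a real catalog would supply these through its own API.

```python
from collections import Counter

# Hypothetical catalog export: one record per asset with a governance status.
catalog = [
    {"name": "fct_orders", "status": "certified"},
    {"name": "dim_customers", "status": "certified"},
    {"name": "tmp_orders_backfill", "status": "draft"},
    {"name": "orders_2019", "status": "deprecated"},
]


def certified_ratio(assets):
    """Step 1: share of assets that are certified today."""
    if not assets:
        return 0.0
    return sum(a["status"] == "certified" for a in assets) / len(assets)


def exposable(assets, allowed=("certified",)):
    """Step 3: expose only certified assets unless a wider set is chosen explicitly."""
    return [a for a in assets if a["status"] in allowed]


def drift_alerts(query_log, assets, threshold=3):
    """Step 5: non-certified (or unknown) assets that agents hit at least `threshold` times."""
    status = {a["name"]: a["status"] for a in assets}
    hits = Counter(
        q["asset"] for q in query_log if status.get(q["asset"]) != "certified"
    )
    return sorted(name for name, count in hits.items() if count >= threshold)


# Hypothetical agent query log: three hits on a deprecated table, one on a certified one.
log = [{"agent": "bi", "asset": "orders_2019"}] * 3 + [{"agent": "bi", "asset": "fct_orders"}]

print(certified_ratio(catalog))
print([a["name"] for a in exposable(catalog)])
print(drift_alerts(log, catalog))
```

Keeping the default allow-list narrow means widening access is a deliberate decision rather than something that happens because nobody filtered.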

For the protocol-side mechanics, a dedicated MCP server implementation guide is the better reference.

How Atlan delivers governed context

Atlan approaches MCP from the governed side. Its MCP server does not just expose metadata; it exposes governed metadata, so certification, glossary terms, lineage, quality scores, and ownership travel with every agent query.

Atlan’s enhanced-metadata research measured a 38% relative lift in AI SQL accuracy from richer metadata across 522 evaluations, with a 2.15x gain on medium-complexity queries, at p < 0.0001. Separately, customers such as Workday saw a 5x improvement in query accuracy in Atlan AI Labs workshops just by adding metadata.

The Atlan MCP server exposes a specific, governed set of capabilities to agents in Claude, Cursor, or a local environment:

  • Search and discovery across certified assets, so agents prioritize what has been reviewed.
  • Lineage traversal, so agents trace origin, transformation, and downstream impact.
  • Glossary retrieval, so a query for “revenue” returns the organization’s definition next to the table.
  • Metadata updates, including descriptions and certification status, so workflows are not read-only.

Atlan also governs metadata across Snowflake, Databricks, BigQuery, and dbt rather than within a single warehouse, and it manages AI and data governance for the MCP server out of the box. The same governed graph that powers human discovery and context engineering is the graph that the agents read, which removes the duplication of rebuilding context for every copilot.

"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server... as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."

Joe DosSantos, VP of Enterprise Data & Analytics, Workday

"Atlan is much more than a catalog of catalogs. It's more of a context operating system... Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."

Sridher Arumugham, Chief Data & Analytics Officer, DigiKey

See what the governed MCP context looks like. Explore the Atlan MCP server and how it exposes certified, lineage-aware metadata to any compatible AI agent.

FAQs about MCP-connected data catalogs

How is an MCP-connected catalog different from a standard database connection?

A standard database connection gives an agent raw query access to tables and columns, nothing more. An MCP-connected catalog wraps that access in a governed layer through one standard interface, so the agent also receives certification status, glossary definitions, lineage, and access rules alongside the schema. The agent gets meaning and trust signals, not just data.

What is the difference between an MCP catalog and an MCP server registry?

An MCP server registry is a directory of available MCP servers, such as the Docker MCP Catalog or Microsoft’s public repository. An MCP-connected data catalog is a system of record for data-asset metadata that agents query over MCP. One lists servers. The other governs the metadata that an agent reasons with.

What is the most common MCP catalog failure mode in production?

Silent content failure. The agent reaches a fully authenticated server, receives stale or undefined metadata, and returns a fast, well-formatted answer that happens to be wrong. Nothing crashes, so the mistake flows into downstream decisions before anyone catches it.

Is MCP secure for connecting catalogs to agents?

MCP supports authentication at the protocol layer, but security depends on the deployment and the catalog. A 2026 measurement study found 40.55% of live MCP servers ran with no authentication, and every tested OAuth server carried at least one flaw. A well-configured server on a governed catalog can meet enterprise requirements; an ungoverned connection cannot.

Which data catalogs support MCP?

Google Knowledge Catalog, Databricks Unity Catalog, dbt, CData, and Atlan all offer MCP access, and Microsoft and Docker publish registries of MCP servers. The meaningful question is what each exposes. Some surface governed metadata with certification and lineage; others pass queries through with little governance.

How do I make my catalog MCP-enabled correctly?

Govern the catalog first, then expose it. Certify assets, assign stewards, define glossary terms, and apply access controls at the catalog level so they propagate through the MCP server. Configure the server to expose only certified assets by default, then monitor agent queries and flag quality degradation over time.

What metadata can an agent get from an MCP catalog?

Through an MCP server, an agent can access table schemas, column descriptions, business term definitions, lineage graphs, ownership records, certification status, and data quality scores. The exact set depends on what the catalog stores, what the server is configured to expose, and the permissions applied to the agent’s identity.
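
One way to read that last sentence is as three filters applied in sequence, with masking on top. The sketch below assumes a flat metadata record and simple sets of field names; the field names, the agent role, and the `***` mask are all hypothetical.

```python
from typing import AbstractSet


def visible_metadata(
    stored: dict,
    exposed: AbstractSet[str],
    permitted: AbstractSet[str],
    masked: AbstractSet[str] = frozenset(),
) -> dict:
    """Return fields that the catalog stores, the server exposes, and the agent may see."""
    out = {}
    for key, value in stored.items():
        if key in exposed and key in permitted:
            out[key] = "***" if key in masked else value
    return out


# What the catalog stores for one asset (illustrative).
stored = {
    "schema": ["id", "email", "plan"],
    "certification": "certified",
    "owner": "growth-data",
    "sample_values": {"email": "a@example.com"},
}
server_exposes = {"schema", "certification", "owner", "sample_values"}
bi_agent_may_see = {"schema", "certification", "sample_values"}

result = visible_metadata(
    stored, server_exposes, bi_agent_may_see, masked={"sample_values"}
)
print(result)
```

The agent's view can only shrink at each stage, so a field missing from the catalog cannot be recovered by server configuration or permissions.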

What is the difference between an MCP-connected catalog and a semantic layer?

A semantic layer encodes business metrics and definitions so queries return consistent, governed calculations. An MCP-connected catalog exposes governed metadata, including glossary, lineage, and certification, to agents over the protocol. The two are complementary, and the distinction is covered in detail in the comparison of a context layer versus a semantic layer.

What happens if an agent reads ungoverned data through MCP?

The agent returns fast, well-formatted, confidently wrong answers. Without certification, it cannot tell trusted assets from untrusted ones. Without business context, terms are ambiguous. Without lineage, it cannot trace where a number came from. The failure is not a crash; it is systematically incorrect output delivered at AI speed.


When is an MCP catalog actually production-ready?

MCP makes wiring an agent to a catalog trivial. It standardizes access, not trust. The right question is not “are we MCP-enabled?” It is “is what we expose certified, current, lineage-backed, and access-controlled?”

The fix is not a better protocol. MCP is fine, and it does not need replacing. The fix is a governed catalog underneath it. That is the standard worth building toward.

