DataHub MCP Server

The DataHub MCP Server implements the Model Context Protocol (MCP), giving AI agents direct access to your DataHub metadata. Search for data assets, traverse lineage, inspect schemas, and generate SQL — all through natural language in tools like Cursor, Windsurf, Claude Desktop, and OpenAI.

Want to learn more about the motivation, architecture, and advanced use cases? Check out our deep dive blog post.

Deployment Options

Managed MCP Server - Available on DataHub Cloud v0.3.12+
Self-Hosted MCP Server - Available for DataHub Core

Capabilities

Search for Data
Find the right data by asking questions in plain English. Supports wildcard matching (revenue_*), field searches (tag:PII), and boolean logic ((sales OR revenue) AND quarterly).
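The three query forms above can be composed programmatically before being handed to an agent or to the search tool. The following is a minimal sketch; the helper names `any_of`, `all_of`, and `field` are hypothetical and not part of DataHub, and it only reproduces the example strings shown above.

```python
def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


def field(name: str, value: str) -> str:
    """Build a field search such as tag:PII."""
    return f"{name}:{value}"


query = all_of(any_of("sales", "revenue"), "quarterly")
print(query)                # (sales OR revenue) AND quarterly
print(field("tag", "PII"))  # tag:PII
```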

Dive Deeper
Get usage stats, ownership, documentation, tags, glossary terms, and quality signals for any table, column, dashboard, and more, so agents can separate signal from noise.

Lineage & Impact Analysis
Trace data flow at table and column level, upstream or downstream, across multiple hops. Understand the origins of your data, and plan for upcoming changes.

Query Analysis & Authoring
Surface real SQL queries that reference a dataset — see join patterns, common filters, and aggregation behavior — then generate new queries grounded in actual usage.

Works Where You Work
Seamlessly integrates with Cursor, Windsurf, Claude Desktop, OpenAI, and any other MCP-compatible client.

Tools

The DataHub MCP Server provides the following tools, grouped by whether they read from or write to DataHub. All tools are annotated with MCP-standard hints (readOnlyHint, destructiveHint, idempotentHint) so compatible clients (e.g. Claude) can surface which tools modify catalog state and prompt for confirmation accordingly.
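As an illustration of how a client might act on these hints, here is a minimal sketch that splits a tool listing into read-only and mutating tools. The sample listing is hypothetical (only the tool names `search` and `add_tags` come from this page), and the sketch assumes the MCP convention that a missing readOnlyHint is treated as "not read-only".

```python
from typing import Any


def is_read_only(tool: dict[str, Any]) -> bool:
    """True only when the tool explicitly sets readOnlyHint to true."""
    annotations = tool.get("annotations") or {}
    return annotations.get("readOnlyHint") is True


def partition_tools(tools: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (read_only_names, mutating_names) for a tool listing."""
    read_only = [t["name"] for t in tools if is_read_only(t)]
    mutating = [t["name"] for t in tools if not is_read_only(t)]
    return read_only, mutating


# Hypothetical excerpt of a tool listing, for illustration only.
tools = [
    {"name": "search", "annotations": {"readOnlyHint": True}},
    {"name": "add_tags", "annotations": {"readOnlyHint": False}},
    {"name": "unannotated_example"},  # made-up name, shows the missing-hint case
]

read_only, mutating = partition_tools(tools)
print(read_only)  # ['search']
print(mutating)   # ['add_tags', 'unannotated_example']
```

A client built this way would prompt for confirmation before calling anything in the second list.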

Read-Only Tools

These tools only query DataHub and never modify catalog state.

Discovery & Inspection — Find entities (datasets, dashboards, users, etc.), pull their full metadata, drill into schemas, trace lineage upstream or downstream at the table and column level, and search across saved documents and glossary term history.

info

Document tools (search_documents, grep_documents) are automatically hidden if no documents exist in the catalog.

Tools

search — Search DataHub using structured keyword search (/q syntax) with boolean logic, filters, pagination, and optional sorting by usage metrics.

get_entities — Fetch detailed metadata for one or more entities by URN; supports batch retrieval for efficient inspection of search results.

list_schema_fields — List schema fields for a dataset with keyword filtering and pagination, useful when search results truncate fields or when exploring large schemas.

get_me — Retrieve information about the currently authenticated user, including profile details and group memberships.

get_lineage — Retrieve upstream or downstream lineage for any entity (datasets, columns, dashboards, etc.) with filtering, query-within-lineage, pagination, and hop control.

get_lineage_paths_between — Retrieve the exact lineage paths between two assets or columns, including intermediate transformations and SQL query information.

search_documents — Search for documents using keyword search with filters for platforms, domains, tags, glossary terms, and owners.

grep_documents — Search within document content using regex patterns. Useful for finding specific information across multiple documents.

list_lifecycle_stages — List the lifecycle stages (e.g. proposed, approved, deprecated) configured for glossary terms.

get_glossary_term_versions / compare_glossary_term_versions — Inspect version history for a glossary term, or diff two versions to see what changed.
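MCP clients invoke any of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for get_me. It assumes get_me accepts an empty argument object; the authoritative argument schema for each tool is whatever the server reports in its tool listing. The helper name `tools_call` is hypothetical.

```python
import json
from itertools import count
from typing import Any, Optional

_request_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tools_call(name: str, arguments: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build an MCP tools/call request body for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


request = tools_call("get_me")
print(json.dumps(request, indent=2))
```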

SQL & Queries — Surface real queries that hit a dataset and use that context to draft new SQL grounded in actual usage, joins, and filters.

Tools

get_dataset_queries — Fetch real SQL queries referencing a dataset or column—manual or system-generated—to understand usage patterns, joins, filters, and aggregation behavior.

find_sql_context — Locate relevant tables, columns, and example queries for drafting SQL grounded in real catalog usage.

draft_sql_for_tables — Draft a SQL query against a specified set of tables using context retrieved from DataHub (schemas, sample queries, lineage).

Governance — Review pending metadata change proposals awaiting approval.

Tools

list_pending_proposals — List metadata change proposals that are pending review.

Mutation Tools

info

Mutation tools are available in mcp-server-datahub v0.5.0+ and on DataHub Cloud v0.3.17+. They are enabled via the TOOLS_IS_MUTATION_ENABLED=true environment variable. Each tool is annotated with readOnlyHint: false so MCP clients can require confirmation before invoking them.

Metadata Editing — Apply tags, glossary terms, ownership, domains, descriptions, and structured properties to entities or individual columns. Also handles entity/term lifecycle stage transitions and document authoring.

Tools

add_tags / remove_tags — Add or remove tags from entities or schema fields (columns). Supports bulk operations on multiple entities.

add_terms / remove_terms — Add or remove glossary terms from entities or schema fields. Useful for applying business definitions and data classification.

add_owners / remove_owners — Add or remove ownership assignments from entities. Supports different ownership types (technical owner, data owner, etc.).

set_domains / remove_domains — Assign or remove domain membership for entities. Each entity can belong to one domain.

update_description — Update, append to, or remove descriptions for entities or schema fields. Supports markdown formatting.

add_structured_properties / remove_structured_properties — Manage structured properties (typed metadata fields) on entities. Supports string, number, URN, date, and rich text value types.

set_lifecycle_stage — Set the lifecycle stage (e.g. proposed, approved, deprecated) of an entity or glossary term directly.

save_document — Save standalone documents (insights, decisions, FAQs, notes) to DataHub's knowledge base. Documents are organized under a configurable parent folder.

Glossary Authoring — Create new glossary terms, version them over time, and link related terms together.

Tools

create_glossary_term — Create a new glossary term directly.

create_glossary_term_version — Create a new version of an existing glossary term to capture changes over time.

add_related_terms — Link related glossary terms (e.g. synonyms, contains, inherits-from) to express relationships in the business vocabulary.

Proposals (governed workflows) — Submit changes for review rather than applying them directly, and accept or reject pending proposals. Useful when an agent should suggest, not commit, metadata changes.

Tools

propose_create_glossary_term — Submit a proposal to create a new glossary term, pending approval.

propose_lifecycle_stage — Submit a proposal to change an entity's lifecycle stage, pending approval.

accept_or_reject_proposals — Accept or reject pending metadata change proposals.

Connecting to Managed MCP Server with OAuth - Recommended

Available in DataHub Cloud v1.0.2+

DataHub Cloud supports OAuth2 with Dynamic Client Registration (DCR) for MCP, so each DataHub user can connect with their own personal login — including SSO through providers like Okta, Azure AD, and others configured for your tenant. Compatible MCP clients (Claude, Claude Code, Cursor, ChatGPT, Snowflake, Databricks, etc.) discover the auth server, register themselves, and walk you through a browser-based login. Tokens are scoped to the signed-in user and refresh automatically.

Single Entry Point

Point your MCP client at the universal endpoint:

https://mcp.datahub.com/mcp

On first connection, mcp.datahub.com shows a page that prompts you for your DataHub domain (e.g. <tenant> for https://<tenant>.acryl.io). Enter it once and the OAuth flow redirects you to your tenant's login, then back to the client, fully authenticated with no token copy-paste required.

[Figure: the mcp.datahub.com tenant picker prompting for your DataHub domain]

Use your direct tenant URL instead

If you'd rather skip the domain prompt at mcp.datahub.com, you can point clients directly at your tenant:

https://<tenant>.acryl.io/integrations/ai/mcp

This endpoint also supports OAuth2 + DCR. The only difference is that mcp.datahub.com/mcp is a single shared URL you can hand out without knowing the tenant ahead of time — handy for marketplace listings or shared docs.

For on-premises DataHub Cloud, use your DataHub FQDN, e.g. https://datahub.example.com/integrations/ai/mcp.
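The three endpoint forms described above follow a simple pattern. As a minimal sketch, the hypothetical helper below picks the right URL; the tenant name `acme` is a placeholder used only for illustration.

```python
from typing import Optional

UNIVERSAL_ENDPOINT = "https://mcp.datahub.com/mcp"


def managed_mcp_url(tenant: Optional[str] = None, fqdn: Optional[str] = None) -> str:
    """Return the OAuth-capable MCP endpoint for a tenant, an on-premises FQDN, or neither."""
    if tenant and fqdn:
        raise ValueError("pass either tenant or fqdn, not both")
    if fqdn:
        # On-premises DataHub Cloud: use the DataHub FQDN directly.
        return f"https://{fqdn}/integrations/ai/mcp"
    if tenant:
        # Hosted DataHub Cloud tenant.
        return f"https://{tenant}.acryl.io/integrations/ai/mcp"
    # No tenant known ahead of time: fall back to the shared entry point.
    return UNIVERSAL_ENDPOINT


print(managed_mcp_url())                            # https://mcp.datahub.com/mcp
print(managed_mcp_url(tenant="acme"))               # https://acme.acryl.io/integrations/ai/mcp
print(managed_mcp_url(fqdn="datahub.example.com"))  # https://datahub.example.com/integrations/ai/mcp
```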

Configure Your Client

Claude (web, desktop, mobile)

In claude.ai or Claude Desktop, open Settings → Connectors (Team/Enterprise: Organization settings → Connectors).
Click Add custom connector.
Name: DataHub. Remote MCP server URL: https://mcp.datahub.com/mcp. Leave the Advanced settings (OAuth Client ID / Secret) empty — DataHub registers the client automatically via DCR.
Click Add, then Connect. Claude opens a browser window for the DataHub OAuth flow.
Enter your DataHub domain when prompted (e.g. <tenant>), sign in, and approve the connection.

note

Remote MCP connectors are configured via the Claude UI, not claude_desktop_config.json — that file is reserved for local stdio servers. For older Claude Desktop versions without remote MCP support, fall back to the mcp-remote bridge with a PAT.

Claude Code

Claude Code supports OAuth-based remote MCP servers natively, including Dynamic Client Registration:

claude mcp add --transport http datahub https://mcp.datahub.com/mcp

The first time you invoke a DataHub tool, the server responds with 401 Unauthorized and Claude Code flags the server as needing authentication. Run /mcp inside Claude Code and select Authenticate to complete the browser-based OAuth flow. Enter your DataHub domain when prompted; tokens are stored securely and refreshed automatically.

To use your tenant URL directly:

claude mcp add --transport http datahub https://<tenant>.acryl.io/integrations/ai/mcp

Cursor

Cursor supports remote MCP servers via the url field and handles OAuth flows automatically.

Open Cursor → Settings → Cursor Settings → Tools & MCP → New MCP Server (or edit ~/.cursor/mcp.json for global config / .cursor/mcp.json for project-scoped config).

Paste:

{
  "mcpServers": {
    "datahub": {
      "url": "https://mcp.datahub.com/mcp"
    }
  }
}

Save. Cursor triggers the OAuth flow in your browser using the cursor://anysphere.cursor-mcp/oauth/callback callback — enter your DataHub domain and sign in.
The MCP settings page should show a green dot and list the DataHub tools.
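If you edit mcp.json by script rather than through the settings UI, it helps to merge the DataHub entry into the existing file instead of overwriting other servers. The following is a minimal sketch under that assumption; the helper `add_mcp_server` is hypothetical, and the demo writes to a throwaway directory rather than a real ~/.cursor/mcp.json. It does not handle an existing file that is empty or not valid JSON.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one server entry, keeping any other configured servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a temporary directory that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".cursor" / "mcp.json"
    path.parent.mkdir(parents=True)
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "example"}}}))
    merged = add_mcp_server(path, "datahub", {"url": "https://mcp.datahub.com/mcp"})
    on_disk = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))  # ['datahub', 'other']
```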

ChatGPT

Custom MCP connectors require Developer Mode and are available on Plus, Pro, Team, Enterprise, and Edu plans (not Free).

Enable Developer Mode: profile picture → Settings → Connectors → Advanced, then toggle Developer mode on. (On Team/Enterprise, an admin must first allow it under Workspace Settings → Permissions & Roles → Connected Data → Developer mode / Create custom MCP connectors.)
Still under Settings → Connectors, click Add custom connector (or Create).
Name: DataHub. MCP server URL: https://mcp.datahub.com/mcp. Authentication: OAuth.
Save. ChatGPT walks you through the OAuth flow — enter your DataHub domain and sign in. Pick which DataHub tools to enable for the connector.

Snowflake Cortex Agents / Snowflake Intelligence

Snowflake exposes external MCP servers to Cortex Agents through an API Integration + External MCP Server object pair. Both are created via SQL by an ACCOUNTADMIN, then the resulting connector is added to an agent in Snowsight.

Create an API integration using DCR (run as ACCOUNTADMIN):

CREATE API INTEGRATION datahub_mcp_api_integration
  API_PROVIDER = external_mcp
  API_ALLOWED_PREFIXES = ('https://mcp.datahub.com')
  API_USER_AUTHENTICATION = (
    TYPE = OAUTH_DYNAMIC_CLIENT,
    OAUTH_RESOURCE_URL = 'https://mcp.datahub.com/mcp'
  )
  ENABLED = TRUE;

Create the MCP server object:

CREATE EXTERNAL MCP SERVER datahub_mcp_server
  WITH DISPLAY_NAME = 'DataHub'
  URL = 'https://mcp.datahub.com/mcp'
  API_INTEGRATION = datahub_mcp_api_integration;

In Snowsight, navigate to AI & ML → Agents, open your agent, choose MCP Connectors, and add the DataHub connector.
In Snowflake Intelligence, click Connect next to the DataHub connector — Snowflake walks each user through the DataHub OAuth flow and reuses the credential on subsequent calls.

See the Snowflake agent context guide for end-to-end setup, or use your tenant URL (https://<tenant>.acryl.io/integrations/ai/mcp) in place of mcp.datahub.com if you prefer.

Databricks (Agent Bricks / Genie / AI Playground)

Databricks registers external MCP servers as Unity Catalog HTTP connections behind a managed proxy. The connection then becomes available to Agent Bricks, Genie Code, and AI Playground at https://<workspace>/api/2.0/mcp/external/<connection_name>.

In your workspace, open Catalog → External Data → Connections → Create connection (requires CREATE CONNECTION on the metastore).
Connection type: HTTP. Name: datahub. URL: https://<tenant>.acryl.io/integrations/ai/mcp (your tenant URL is required here — the global https://mcp.datahub.com/mcp endpoint is not yet supported by Databricks).
Check the Is MCP connection box.
For Auth type, select Dynamic Client Registration (DCR per RFC 7591) — Databricks registers a client with DataHub automatically and stores refresh tokens.
Save. The first time the agent uses a DataHub tool, each operator authorizes DataHub via the workspace's managed OAuth flow.

note

DCR requires the workspace's Managed MCP Servers preview to be enabled and the workspace to be in a Model Serving–supported region. The HTTP connection must use streamable HTTP transport (DataHub's /mcp endpoint does).

See the Databricks Agent Bricks and Databricks Genie guides for end-to-end agent setup.

When to Use PAT Auth Instead

OAuth + DCR is the recommended path for interactive clients where a human signs in. Stick with personal access tokens (described below) for:

Service accounts and unattended agentic workflows (CI/CD, scheduled jobs)
DataHub Cloud < v1.0.2 or self-hosted DataHub Core
MCP clients that don't yet implement OAuth-based remote MCP
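The DataHub Cloud version thresholds quoted on this page can be collected into a small lookup to see which connection options apply to a given tenant. This is a sketch only: the feature keys are hypothetical labels, and it assumes versions are dotted integers that compare numerically (so v1.0.x sorts after v0.3.x).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v0.3.12' into (0, 3, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Minimum DataHub Cloud versions as stated on this page.
CLOUD_MINIMUMS = {
    "managed_mcp_server": "v0.3.12",
    "mutation_tools": "v0.3.17",
    "service_account_default_view": "v1.0.0",
    "oauth_dcr": "v1.0.2",
}


def available_features(cloud_version: str) -> list[str]:
    """List the features whose minimum version is met by cloud_version."""
    current = parse_version(cloud_version)
    return [
        name
        for name, minimum in CLOUD_MINIMUMS.items()
        if current >= parse_version(minimum)
    ]


print(available_features("v0.3.15"))  # ['managed_mcp_server']
print(available_features("v1.0.2"))
```

Mutation tools additionally require TOOLS_IS_MUTATION_ENABLED=true, which a version check alone does not capture.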

Connecting to Managed MCP Server with Access Tokens

With a personal access token, you can connect directly to the hosted MCP server endpoint; no local installation is required.

info

The managed MCP server endpoint is only available with DataHub Cloud v0.3.12+. For DataHub Core and older versions of DataHub Cloud, self-host the MCP server instead.

Streamable HTTP Only

DataHub's managed MCP server uses the streamable HTTP transport. Some older MCP clients (e.g. chatgpt.com) may only support the deprecated SSE transport — for those, use mcp-remote to bridge the gap.

Prerequisites

The URL of your DataHub Cloud instance, e.g. https://<tenant>.acryl.io
A personal access token

Connecting & Authenticating

Your managed MCP server URL is:

https://<tenant>.acryl.io/integrations/ai/mcp/

There are two ways to authenticate:

Authorization header — pass your token as a Bearer token in the Authorization header:
```
Authorization: Bearer <token>
```
Token in URL — append your token as a query parameter:
```
https://<tenant>.acryl.io/integrations/ai/mcp/?token=<token>
```
This is a convenient alternative when your MCP client doesn't support custom headers.
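The two options can be sketched with the Python standard library. The example below only constructs the request and the URL; it does not send anything. The tenant `acme`, the token value, and the client name are placeholders, and the request body is an MCP initialize message included as an assumption about what a client would send first.

```python
import json
import urllib.request
from urllib.parse import urlencode


def mcp_url(tenant: str) -> str:
    return f"https://{tenant}.acryl.io/integrations/ai/mcp/"


def request_with_header(tenant: str, token: str, payload: dict) -> urllib.request.Request:
    """Option 1: send the token as a Bearer token in the Authorization header."""
    return urllib.request.Request(
        mcp_url(tenant),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def url_with_token(tenant: str, token: str) -> str:
    """Option 2: append the token as a URL-encoded query parameter."""
    return mcp_url(tenant) + "?" + urlencode({"token": token})


initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},
    },
}

req = request_with_header("acme", "example-token", initialize)
print(req.get_header("Authorization"))          # Bearer example-token
print(url_with_token("acme", "example-token"))  # https://acme.acryl.io/integrations/ai/mcp/?token=example-token
```

Because URLs tend to end up in logs and shell history, the header form is usually the safer choice when the client supports it.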

On-Premises DataHub Cloud

For on-premises DataHub Cloud, replace <tenant>.acryl.io with your DataHub FQDN, e.g. https://datahub.example.com/integrations/ai/mcp/?token=<token>.

Configure

Claude Desktop

Open your claude_desktop_config.json file. You can find it by navigating to Claude Desktop -> Settings -> Developer -> Edit Config.
Update the file to include the following content. Be sure to replace <tenant> and <token> with your own values.

{
  "mcpServers": {
    "datahub-cloud": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://<tenant>.acryl.io/integrations/ai/mcp/?token=<token>"
      ]
    }
  }
}

Claude Code

Claude Code natively supports streamable HTTP, so no proxy or additional dependencies are needed.

Run the following command, replacing <tenant> and <token> with your own values:

claude mcp add --transport http datahub-cloud \
  "https://<tenant>.acryl.io/integrations/ai/mcp/" \
  --header "Authorization: Bearer <token>"

For a detailed walkthrough, see the Claude integration guide.

Cursor

Make sure you're using Cursor v1.1 or newer.
Navigate to Cursor -> Settings -> Cursor Settings -> MCP -> add a new MCP server.
Enter the following into your mcp.json file, replacing <tenant> and <token> with your own values:

{
  "mcpServers": {
    "datahub-cloud": {
      "url": "https://<tenant>.acryl.io/integrations/ai/mcp/",
      "headers": {
        "Authorization": "Bearer <token>"
      }
    }
  }
}

Once you've saved the file, confirm that the MCP settings page shows a green dot and the DataHub tools listed.

For a detailed walkthrough, see the Cursor integration guide.

Gemini CLI

gemini mcp add --transport http \
  --header "Authorization: Bearer <token>" \
  datahub-cloud \
  "https://<tenant>.acryl.io/integrations/ai/mcp/"

For a detailed walkthrough, see the Gemini CLI integration guide.

Other

Most AI tools support remote MCP servers. Provide the hosted MCP server URL:

https://<tenant>.acryl.io/integrations/ai/mcp/?token=<token>

If your client has an authentication mode setting, make sure it is not set to "OAuth", since the token in the URL already authenticates the request.

For clients that don't yet support remote MCP servers, use mcp-remote:

Command: npx
Args: -y mcp-remote https://<tenant>.acryl.io/integrations/ai/mcp/?token=<token>

Service Accounts for Agentic Workflows

For autonomous or agentic workflows — such as CI/CD pipelines, scheduled scripts, or AI agents that run without human intervention — we recommend using a Service Account rather than a personal access token.

Setup:

Create a service account in Settings > Users & Groups > Service Accounts
Generate an access token for the service account
Use that token when configuring the MCP server connection

Scoping search with a Default View (DataHub Cloud v1.0.0+ / DataHub Core v1.6.0+):

Service accounts support a Default View that restricts which data assets the MCP server searches across. This is configured directly from the Service Accounts management screen (the "Default View" column). When set, all searches performed by the MCP server using that service account's token will be scoped to the selected view — useful for limiting an agent's visibility to a specific domain, platform, or team's assets.

tip

Combine a service account with a default view to create a tightly-scoped MCP connection — for example, a "Snowflake Production" view for an agent that only needs access to production Snowflake datasets.

Self-Hosted MCP Server Usage

Run the open-source MCP server locally. This works with any DataHub instance — both DataHub Core and DataHub Cloud.

Prerequisites

Install uv:

# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

The URL of your DataHub instance's GMS endpoint, e.g. http://localhost:8080 or https://<tenant>.acryl.io
A personal access token

Connecting & Authenticating

The self-hosted server authenticates via environment variables:

DATAHUB_GMS_URL — your DataHub GMS endpoint
DATAHUB_GMS_TOKEN — your personal access token

These are passed to the mcp-server-datahub process at startup (see configuration examples below).
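For a script that launches the server itself, the command and environment might be assembled as in the sketch below. The helper `build_launch` is hypothetical. It also shows the optional TOOLS_IS_MUTATION_ENABLED flag described under Mutation Tools. Nothing is launched here; an MCP client would normally start the process and talk to it over stdio.

```python
import os
from typing import Mapping, Optional


def build_launch(
    gms_url: str,
    gms_token: str,
    enable_mutations: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> tuple[list[str], dict[str, str]]:
    """Return the (command, env) pair for starting mcp-server-datahub via uvx."""
    if not gms_url or not gms_token:
        raise ValueError("DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN are both required")
    env = dict(os.environ if base_env is None else base_env)
    env["DATAHUB_GMS_URL"] = gms_url
    env["DATAHUB_GMS_TOKEN"] = gms_token
    if enable_mutations:
        # Opt in to the mutation tools; they stay disabled when this is unset.
        env["TOOLS_IS_MUTATION_ENABLED"] = "true"
    return ["uvx", "mcp-server-datahub@latest"], env


command, env = build_launch(
    "http://localhost:8080", "example-token", enable_mutations=True, base_env={}
)
print(command)  # ['uvx', 'mcp-server-datahub@latest']
print(env["TOOLS_IS_MUTATION_ENABLED"])  # true
```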

Configure

Claude Desktop

Run which uvx to find the full path to the uvx command.
Open your claude_desktop_config.json file. You can find it by navigating to Claude Desktop -> Settings -> Developer -> Edit Config.
Update the file to include the following content. Replace <full-path-to-uvx> with the path from the previous step (e.g. /Users/hsheth/.local/bin/uvx) and fill in the other placeholder values. JSON does not allow comments, so do not add any to this file.

{
  "mcpServers": {
    "datahub": {
      "command": "<full-path-to-uvx>",
      "args": ["mcp-server-datahub@latest"],
      "env": {
        "DATAHUB_GMS_URL": "<your-datahub-url>",
        "DATAHUB_GMS_TOKEN": "<your-datahub-token>"
      }
    }
  }
}

Claude Code

Run the following command, replacing the placeholder values:

claude mcp add datahub \
  -e DATAHUB_GMS_URL="<your-datahub-url>" \
  -e DATAHUB_GMS_TOKEN="<your-datahub-token>" \
  -- uvx mcp-server-datahub@latest

For a detailed walkthrough, see the Claude integration guide.

Cursor

Navigate to Cursor -> Settings -> Cursor Settings -> MCP -> add a new MCP server.
Enter the following into your mcp.json file, replacing the placeholder values:

{
  "mcpServers": {
    "datahub": {
      "command": "uvx",
      "args": ["mcp-server-datahub@latest"],
      "env": {
        "DATAHUB_GMS_URL": "<your-datahub-url>",
        "DATAHUB_GMS_TOKEN": "<your-datahub-token>"
      }
    }
  }
}

Once you've saved the file, confirm that the MCP settings page shows a green dot and the DataHub tools listed.

For a detailed walkthrough, see the Cursor integration guide.

Gemini CLI

gemini mcp add \
  -e DATAHUB_GMS_URL="<your-datahub-url>" \
  -e DATAHUB_GMS_TOKEN="<your-datahub-token>" \
  datahub \
  uvx mcp-server-datahub@latest

For a detailed walkthrough, see the Gemini CLI integration guide.

Other

For other AI tools, provide the following configuration:

Command: uvx
Args: mcp-server-datahub@latest
Env:
- DATAHUB_GMS_URL: <your-datahub-url>
- DATAHUB_GMS_TOKEN: <your-datahub-token>

Troubleshooting

`spawn uvx ENOENT`

The full stack trace might look like this:

2025-04-08T19:58:16.593Z [datahub] [error] spawn uvx ENOENT {"stack":"Error: spawn uvx ENOENT\n    at ChildProcess._handle.onexit (node:internal/child_process:285:19)\n    at onErrorNT (node:internal/child_process:483:16)\n    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"}

Solution: The client could not find uvx on its PATH. In the command field of your configuration, replace uvx with the full path printed by which uvx.
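The same fix can be scripted. The sketch below resolves a bare command name to its full path with `shutil.which` and rewrites a server entry accordingly; the helper names are hypothetical, and the demo uses a stand-in finder so it runs even where uvx is not installed.

```python
import shutil
from typing import Callable, Optional

Finder = Callable[[str], Optional[str]]


def resolve_command(name: str, finder: Finder = shutil.which) -> str:
    """Return the absolute path for a command, mirroring what `which` prints."""
    path = finder(name)
    if path is None:
        raise FileNotFoundError(f"{name} was not found on PATH; install uv first")
    return path


def pin_command(server_entry: dict, finder: Finder = shutil.which) -> dict:
    """Return a copy of a server entry whose command is an absolute path."""
    pinned = dict(server_entry)
    pinned["command"] = resolve_command(server_entry["command"], finder)
    return pinned


# Stand-in finder for the demo; drop the second argument to use the real PATH.
def fake_which(name: str) -> Optional[str]:
    return f"/home/example/.local/bin/{name}"


entry = {"command": "uvx", "args": ["mcp-server-datahub@latest"]}
pinned = pin_command(entry, fake_which)
print(pinned["command"])  # /home/example/.local/bin/uvx
```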
