Data Contracts & SLAs

Difficulty: Hard · 25 min read

Overview

Why This Matters

Data contracts formalize expectations between producers and consumers: required schema (columns, types), freshness SLA (updated by 8am), and quality assertions (no nulls in keys, values in range). DataHub tracks contract compliance.
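The three expectation types can be modeled as a simple spec object. The sketch below is purely illustrative: the `DataContract` dataclass and its field names are hypothetical, not DataHub's actual contract model.

```python
from dataclasses import dataclass, field

# Hypothetical contract spec for illustration -- not DataHub's real model.
@dataclass
class DataContract:
    dataset_urn: str
    required_schema: dict              # column name -> expected type
    freshness_deadline_utc: str        # daily deadline, e.g. "08:00"
    quality_assertions: list = field(default_factory=list)

contract = DataContract(
    dataset_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.revenue,PROD)",
    required_schema={"order_id": "STRING", "revenue": "DOUBLE"},
    freshness_deadline_utc="08:00",
    quality_assertions=["order_id IS NOT NULL", "revenue > 0"],
)
print(contract.required_schema["revenue"])  # DOUBLE
```

Keeping the three pillars in one object makes it easy to render the whole agreement on a dataset page or diff it against a proposed schema change.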

Core Concepts

In DataHub, a data contract is attached to a dataset and bundles three kinds of expectations: a schema contract, a freshness contract, and a quality contract. Each is backed by assertions that DataHub evaluates, so the contract's status can be reported alongside the dataset itself.

Configuration

DataHub provides both UI-based and API-based configuration for data contracts & SLAs. Most settings can be managed through the UI or programmatically via the GraphQL API.

Integration

Data contracts integrate with DataHub's ingestion framework, search index, and event system: contract and assertion changes are propagated across the platform automatically.

Automation

Leverage DataHub Actions to automate data contract & SLA workflows. Trigger actions on metadata changes, schedule periodic checks, and integrate with external systems.
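A contract-violation notifier following this pattern might look like the sketch below. The event dictionary shape and the `notify` callback are hypothetical stand-ins, not the DataHub Actions framework's real classes; they only illustrate the trigger-and-alert flow.

```python
# Illustrative event handler; the event dict shape and notify() hook are
# hypothetical stand-ins for the DataHub Actions framework's real API.
def handle_metadata_change(event: dict, notify) -> bool:
    """Alert owners when a contract assertion flips to FAILING."""
    if event.get("aspectName") != "assertionRunEvent":
        return False                      # not a contract-related change
    if event.get("status") != "FAILING":
        return False                      # passing runs need no alert
    notify(f"Contract violated on {event['entityUrn']}: {event.get('assertion')}")
    return True

alerts = []
handle_metadata_change(
    {"aspectName": "assertionRunEvent", "status": "FAILING",
     "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.revenue,PROD)",
     "assertion": "revenue > 0"},
    alerts.append,
)
print(alerts[0])
```

In a real deployment, `notify` would post to Slack or a pager; the filtering logic (only contract aspects, only FAILING runs) is what keeps the alert channel high-signal.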

Monitoring

Track usage and effectiveness through DataHub's analytics. Monitor adoption metrics, coverage, and compliance with organizational standards.

How It Works

Configuration
# Configure data contracts & SLAs via the DataHub CLI
datahub put --urn "urn:li:dataset:(...)" \
  --aspect "datasetProperties" \
  -d '{"description": "Configured via CLI"}'

# Or via the Python SDK
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter("http://localhost:8080")

# Emit a metadata change proposal; the aspect name is inferred from the class
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityUrn="urn:li:dataset:(...)",
        aspect=DatasetPropertiesClass(description="Updated via SDK"),
    )
)
Output
Successfully emitted metadata change proposal
  Entity: urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.revenue,PROD)
  Aspect: datasetProperties
  Status: 200 OK
Key Takeaway: Data contracts formalize the agreement between data producers and consumers: what schema is guaranteed, when data will be fresh, and what quality assertions are enforced. They shift data issues from "who broke it?" to "was the contract honored?"

Architecture Integration

When data contract or SLA metadata is updated, DataHub emits a Metadata Change Event (MCE) to Kafka. Downstream consumers update the search index (Elasticsearch) and the graph index, keeping all views consistent in near real time.
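This fan-out can be sketched as a dispatcher that hands one change event to every downstream consumer. The stub functions below stand in for the real Elasticsearch and graph-index updaters; only the propagation pattern is faithful to the description above.

```python
# Stub consumers standing in for the real Elasticsearch and graph-index updaters.
search_index, graph_index = {}, {}

def update_search(event):
    search_index[event["entityUrn"]] = event["aspect"]

def update_graph(event):
    graph_index[event["entityUrn"]] = event["aspect"]

CONSUMERS = [update_search, update_graph]

def emit_mce(event):
    """Fan a Metadata Change Event out to every registered consumer."""
    for consumer in CONSUMERS:
        consumer(event)

emit_mce({"entityUrn": "urn:li:dataset:(...)", "aspect": {"description": "v2"}})
print(search_index == graph_index)  # True: both views stay consistent
```

Because every consumer sees the same event stream, the search and graph views converge without the producer knowing who is listening.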

Hands-On Tutorial

Step-by-Step Setup
# Step 1: Verify DataHub is running
curl -s http://localhost:8080/config | python3 -m json.tool

# Step 2: Configure data contracts & SLAs via GraphQL
curl -X POST http://localhost:8080/api/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { updateDataset(urn: \"urn:li:dataset:(...)\", input: {}) { urn } }"}'

# Step 3: Verify in the UI
# Navigate to http://localhost:9002 and check the entity page

Common Mistake

Wrong: Creating contracts for every dataset from day one

Why it fails: Contract overhead becomes unmanageable. Teams resist adoption because maintaining hundreds of contracts is impractical.

Instead: Start with contracts only for Tier-1 datasets (revenue, customer, product) that feed critical dashboards. Expand to Tier-2 as the process matures.
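The tiered rollout can be expressed as a simple filter over a dataset inventory. The inventory tuples below are hypothetical examples; the point is that contract scope is an explicit, adjustable threshold rather than "everything at once".

```python
# Hypothetical inventory: (dataset name, tier, feeds a critical dashboard?)
datasets = [
    ("analytics.revenue", 1, True),
    ("analytics.customer", 1, True),
    ("staging.clickstream_raw", 3, False),
    ("marts.product_usage", 2, True),
]

def contract_candidates(inventory, max_tier=1):
    """Start contracts only where the blast radius justifies the overhead."""
    return [name for name, tier, critical in inventory
            if tier <= max_tier and critical]

print(contract_candidates(datasets))               # Tier-1 rollout first
print(contract_candidates(datasets, max_tier=2))   # expand as the process matures
```

Raising `max_tier` from 1 to 2 is the "expand to Tier-2" step: the same selection logic, a wider net.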

Deep Dive: Contract Components

A DataHub data contract has three pillars: Schema contract (required columns and types -- breaking changes violate the contract), Freshness contract (SLA like "updated by 8am UTC daily"), and Quality contract (assertions like "no null primary keys, revenue > 0"). DataHub evaluates contracts continuously and marks them as PASSING, FAILING, or UNKNOWN. Contract status is visible on dataset pages and can trigger Actions for alerting.
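The three-pillar evaluation can be sketched as a small status function. The statuses mirror DataHub's PASSING / FAILING / UNKNOWN, but the boolean inputs are a simplification of how the real assertion runs feed in.

```python
# Simplified contract evaluator over the three pillars; statuses mirror
# DataHub's PASSING / FAILING / UNKNOWN, inputs are illustrative booleans.
def evaluate_contract(schema_ok, fresh_ok, quality_results):
    # UNKNOWN when any pillar was never evaluated
    if schema_ok is None or fresh_ok is None or not quality_results:
        return "UNKNOWN"
    # PASSING only when every pillar holds
    if schema_ok and fresh_ok and all(quality_results):
        return "PASSING"
    return "FAILING"

print(evaluate_contract(True, True, [True, True]))   # PASSING
print(evaluate_contract(True, False, [True]))        # FAILING: freshness SLA missed
print(evaluate_contract(None, True, [True]))         # UNKNOWN: schema never checked
```

Treating "never evaluated" as UNKNOWN rather than PASSING matters: a contract that silently stops running should look different from one that is genuinely healthy.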

Key Takeaway: Pair data contracts with DataHub Actions to automatically notify owners via Slack when a contract is violated. This creates accountability and fast resolution loops.

Practice Problems

Practice 1

Design a data contracts & SLAs strategy for a data team with 500 datasets across 8 databases. What do you prioritize? How do you measure success?

Practice 2

A new data engineer joins your team and needs to understand data contracts & SLAs in DataHub. Create a 30-minute onboarding guide covering the essentials.

Practice 3

Your organization's data contracts & SLAs adoption is at 30% after 3 months. Identify potential blockers and design an adoption acceleration plan.

Quick Reference

Feature          | Access                            | Notes
UI Configuration | Settings → Data Contracts & SLAs  | Point-and-click setup
GraphQL API      | POST /api/graphql                 | Programmatic access
Python SDK       | pip install acryl-datahub         | High-level client
CLI              | datahub put / datahub get         | Command-line operations
Actions          | Event-driven triggers             | Automation framework