Overview
Data Contracts empower teams to define and enforce data quality expectations directly within their source code. A data contract is a YAML file (.fd.yml or .fd.yaml) that lives alongside your project’s code and declaratively specifies the rules, ownership, and quality standards for your data assets.
Distributed Ownership and Scalability
By enabling every team to define contracts within their own repository, Foundational eliminates the need for a single, central vendor-specific system for data governance. This distributed approach offers several key advantages:
Proximity to Code: Defining contracts near the relevant code makes them easier to maintain and ensures they evolve alongside your data products.
Scalable Quality: Organizations can scale data quality across many repositories and teams without the bottleneck of a centralized platform.
Open Format & No Vendor Lock-in: Foundational uses an open YAML format, keeping your data definitions portable and avoiding reliance on proprietary vendor systems.
Automated Monitoring and Alerts
When Foundational scans repositories configured with data contract files, it discovers and validates these contracts to enforce your defined monitors and policies. This ensures that any changes impacting your data stack—whether in data engineering or production code—do not compromise data quality.
If a contract rule is violated, the configured contacts are immediately notified through email, Slack, or Microsoft Teams.
Capabilities
| Capability | Description |
| --- | --- |
| Ownership & accountability | Define a data owner, steward, support contact, and downstream consumers for every dataset. |
| Freshness / timeliness monitoring | Set the maximum acceptable latency for a table and get alerted when data goes stale, tracked by write events, row-count changes, or both. |
| Schema-level field definitions | Document every column's name, type, nullability, description, sensitivity flag, and example values. |
| Field constraints | Enforce regex patterns, allowed-value lists, and min/max value ranges on individual columns. |
| Quality thresholds | Set target completeness, uniqueness, and validity percentages per column. |
| Advanced rate metrics | Monitor zero rate, true rate, null rate, and distinct-values rate with configurable min/max bounds (0–100%). |
| Advanced value metrics | Set bounds on min, max, count, sum, average, and row count for any table. |
| Custom SQL monitors | Run arbitrary SQL queries on a schedule and monitor the result columns for violations. |
| Incremental monitoring | Optionally scan only new or changed rows based on an incrementing column (timestamp, date, or integer). |
| Flexible scheduling | Choose an interval (minimum 1 hour) or a cron expression for monitoring frequency. |
| Automated alerting | Contacts marked with receive_alerts: true are automatically notified via email, Slack, or Microsoft Teams when violations occur. |
| Version control | Contracts are versioned and stored in git, giving you a full audit trail of every change. |
Example use-cases
Guarantee SLA freshness for a critical table. Define a timeliness rule that alerts the data engineering team if the orders table hasn't been updated within 1 hour, but only when downstream consumers are actively reading the data (only_if_read: true).
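A minimal sketch of such a rule, using the timeliness syntax documented in the YAML format reference below (the table and connection names are illustrative):

```yaml
tables:
  - description: "Orders table with a 1-hour freshness SLA."
    data_source:
      type: "Snowflake"
      database: "prod_db"
      schema: "public"
      table_name: "orders"
    timeliness:
      frequency: "hourly"
      max_latency: "1 hour"
      only_if_read: true        # Alert only when stale data is actually read
      timeliness_alert:
        description: "Orders table missed its freshness SLA"
        severity: "HIGH"
```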
Prevent bad data from reaching production. If a corrupted ETL process causes an unexpected spike in NULL values for a critical field like customer_id, Foundational flags the issue immediately, allowing you to fix the data integrity problem at the source.
Monitor referential integrity across tables. Write a custom SQL monitor that checks for orphaned child records (e.g., order_line_items referencing an order_id that no longer exists in orders) and get alerted when the count rises above zero.
Track data completeness over time. Set a completeness threshold of 98% on a customer_email column and monitor the null rate to catch ETL regressions before they impact downstream reports.
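The completeness scenario above can be sketched as a single field definition (the column name is illustrative; per the note in the format reference, use either the simple completeness threshold or the null_rate metric, not both):

```yaml
fields:
  - name: "customer_email"
    type: "string"
    nullable: true
    quality_thresholds:
      completeness: 98.0   # At least 98% of values must be non-null
```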
Establish data ownership. Assign a data owner, data steward, and support contact to every dataset so that when something breaks, the right person is notified on the right channel.
Monitor high-value numeric columns. Use value metrics to set bounds on a total_amount column's sum, average, min, and max so that anomalous financial data is surfaced early.
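Under the value_metrics syntax documented below, the bounds on total_amount might look like this (the numeric limits are illustrative assumptions):

```yaml
fields:
  - name: "total_amount"
    type: "decimal(10,2)"
    quality_thresholds:
      value_metrics:
        min:
          min: 0.0            # Column minimum must be >= 0
        max:
          max: 100000.0       # Column maximum must be <= 100000
        avg:
          min: 10.0           # Illustrative bounds on the column average
          max: 5000.0
        sum:
          min: 0.0
```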
Data Contracts vs. Traditional Observability
While traditional data observability tools focus on monitoring data at rest via a UI, Foundational’s Data Contracts shift this logic into your development workflow.
| Feature | Traditional Observability Tools | Foundational Data Contracts |
| --- | --- | --- |
| Rule Definition | Defined in a non-scalable, vendor-specific UI. | Defined in code (YAML) within the relevant repository. |
| Vendor Lock-in | High; rules are trapped in a proprietary platform. | None; open-format YAML belongs to your codebase. |
| Management | Centralized; every team must use one platform. | Decentralized; teams manage rules in their own Git repos. |
| Transparency | Changes often lack clear audit trails. | Full transparency; every change is tracked and reviewed in Git. |
Getting Started with Data Contracts
To begin enforcing data standards in Foundational, follow these steps to move from definition to automated enforcement:
Step 1: Define your contract
Outline your data validations, invariant rules, and governance policies in a YAML file.
Extension: Use the .fd.yml or .fd.yaml extension.
Location: Place the file anywhere in a repository Foundational scans for contracts, ideally near the code it describes.
Pro Tip: You can leverage AI tools to generate these YAML definitions at scale based on your existing table schemas.
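A minimal contract skeleton might look like the following (all names are placeholders; the complete field reference appears in the YAML format section below):

```yaml
# my_dataset.fd.yml — minimal illustrative skeleton
contract_id: "my_dataset_v1"
version: "1.0.0"
status: "Draft"              # Switch to Active to begin enforcement
description: "Contract for my_dataset."
data_owner:
  name: "Your Name"
  email: "you@example.com"
  receive_alerts: true
tables:
  - description: "My dataset."
    data_source:
      type: "Snowflake"
      database: "my_db"
      schema: "public"
      table_name: "my_dataset"
```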
Step 2: Create and Merge
Create a Pull Request in your Git repository containing the new contract file.
Discovery: Foundational automatically scans your repositories and recursively discovers these files.
Transparency: Because contracts are code, every change is tracked, reviewed, and versioned in Git, providing a full audit trail.
Step 3: Connect Notifications
To ensure the right people are alerted when a contract is violated, finalize your communication settings.
Alert Channels: Connect Slack or Microsoft Teams in your Foundational settings.
Receiver Logic: Ensure the data_owner or relevant contacts have receive_alerts: true set in the YAML file.
Data Contract YAML Format
Contract metadata (required)
# Unique ID across all contract files in the repo
contract_id: "order_events_v1"
# Version string (for your own tracking)
version: "1.0.0"
# Active | Inactive | Draft | Deprecated | Retired
status: "Active"
description: "Contract for the order events dataset."
Note: Only contracts with status Active are enforced. Use Draft while iterating, Deprecated or Retired to phase out old contracts, and Inactive to temporarily disable enforcement.
Ownership & contacts (data_owner required)
data_owner: # Required
name: "Jon Smith"
email: "jon@example.com"
team: "Data Engineering"
receive_alerts: true # Will be notified on violations
data_steward: # Optional
name: "Jane Doe"
email: "jane@example.com"
team: "Data Governance"
receive_alerts: true
support_contact: # Optional
team: "BI Support"
email: "bi-support@example.com"
slack_channel: "#data-support" # Slack channel for alerts
teams_channel: "Data Alerts" # Microsoft Teams channel for alerts
receive_alerts: true
consumers: # Optional — downstream teams
- team: "Sales Analytics"
email: "sales@example.com"
- team: "Operations"
email: "ops@example.com"
receive_alerts: true
Any contact with receive_alerts: true will be notified through the configured channels (email, Slack, and/or Teams) when a violation occurs.
Business context (optional)
business_domain: "Order Management"
business_description: |
This dataset contains one record per order event.
It is used by downstream analytics for dashboards and billing.
tags:
- "order"
- "high_priority"
- "PII"
Table definitions (required)
Each contract must define at least one table:
tables:
- description: "Master list of orders."
data_source:
# Snowflake | BigQuery | Postgres | MySQL | Oracle | MongoDB
# | GlueCatalog | S3
type: "Snowflake"
database: "prod_db"
schema: "public"
table_name: "orders"
Timeliness (optional)
Monitor data freshness and get alerted when data is stale:
timeliness:
frequency: "hourly" # hourly | daily | weekly | monthly
max_latency: "1 hour" # Supported: "X hour(s)" or "X day(s)"
only_if_read: true # Only alert if stale data was actually read
timeliness_by_writes: true # Track freshness by write events
timeliness_by_row_count: false # Track freshness by row count changes
timeliness_alert:
description: "Orders table is stale"
severity: "HIGH" # LOW | MEDIUM | HIGH | CRITICAL
Partitioning (optional)
Document the table's partitioning strategy:
partitioning:
strategy: "ByDate" # ByDate | ByHash
columns:
- name: "order_created_at"
type: "date"
description: "Partition on order date"
frequency: "daily" # hourly | daily | weekly | monthly
format: "yyyy-MM-dd"
Scheduling (optional)
Control how often monitors run. Defaults to every 1 hour if omitted.
# Interval-based (minimum 1 hour):
schedule:
schedule_type: "interval"
frequency_hours: 1
# Or cron-based (must fire on rounded hours, i.e., minute = 0):
schedule:
schedule_type: "cron"
cron_expression: "0 */6 * * *" # Every 6 hours
Incremental monitoring (optional)
Monitor only new/changed data for efficiency on large tables:
incremental_monitoring:
increment_by_column_name: "updated_at"
column_type: "timestamp" # timestamp | date | int | bigint
column_format: "%Y-%m-%d %H:%M:%S" # Optional format string
Field definitions (optional)
Define column-level schema, constraints, and quality thresholds:
fields:
- name: "order_id"
type: "string"
nullable: false
description: "Globally unique order identifier."
example: "ORD-20250602-12345"
sensitive: false # Mark PII/sensitive columns as true
constraints:
pattern: "^ORD-[0-9]{8}-[0-9]+$" # Regex pattern
quality_thresholds:
completeness: 100.0 # % of non-null values (0–100)
uniqueness: 100.0 # % of distinct values (0–100)
- name: "order_status"
type: "string"
nullable: false
constraints:
allowed_values: # Enumerated allowed values
- "PENDING"
- "CONFIRMED"
- "SHIPPED"
- "DELIVERED"
- "CANCELLED"
quality_thresholds:
validity: 100.0 # % of valid values (0–100)
- name: "total_amount"
type: "decimal(10,2)"
nullable: false
constraints:
value_range:
min: 0.0
max: 100000.0
Advanced quality metrics (optional)
For finer-grained control, use rate_metrics and value_metrics inside quality_thresholds:
quality_thresholds:
rate_metrics: # All rate values are percentages
zero_rate:
max: 5.0 # At most 5% zeros
null_rate:
max: 2.0 # At most 2% nulls
true_rate:
min: 0.0
max: 100.0
distinct_values_rate:
min: 95.0 # At least 95% distinct values
value_metrics: # Bounds on aggregate statistics
min:
min: 0.0 # Column minimum must be >= 0
max: 100.0 # Column minimum must be <= 100
max:
min: 0.0
max: 10000.0
count:
min: 1.0
sum:
min: 0.0
avg:
min: 0.0
max: 500.0
Note: For each field, use either the simple thresholds (completeness, uniqueness, value_range) or the advanced metrics (rate_metrics, value_metrics), not both.
Custom SQL monitors (optional)
Run arbitrary SQL queries and monitor the output columns:
custom_monitors:
- name: "referential_integrity_check"
description: "Verify every line item references a valid order."
data_source:
type: "Snowflake"
database: "prod_db"
sql_statement: >
SELECT count(*) AS bad_references
FROM order_line_items
LEFT JOIN orders ON order_line_items.order_id = orders.id
WHERE orders.id IS NULL;
fields:
- name: "bad_references"
type: "integer"
description: "Count of orphaned line items."
constraints:
value_range:
max: 0.0 # Should always be zero
Custom monitors also support schedule and incremental_monitoring with the same syntax as table definitions.
Full example
Below is a complete .fd.yml file demonstrating the major features:
contract_id: "order_events_v1.0"
version: "1.0.0"
status: "Active"
description: >
Data contract for the Order Events topic. Contains real-time events
emitted whenever an order is created, updated, or cancelled.
data_owner:
name: "Jon Smith"
email: "jon@example.com"
team: "Data Engineering"
receive_alerts: true
data_steward:
name: "Jane Doe"
email: "jane@example.com"
team: "Data Governance"
receive_alerts: true
support_contact:
team: "BI Support"
email: "bi-support@example.com"
slack_channel: "#order-events-support"
receive_alerts: true
consumers:
- team: "Sales Analytics"
email: "sales-analytics@example.com"
- team: "Operations Dashboard"
email: "ops-dashboard@example.com"
receive_alerts: true
business_domain: "Order Management"
business_description: |
This dataset contains one record per order event:
- Creation
- Status updates (e.g., PENDING → SHIPPED → DELIVERED)
- Cancellation
tags:
- "order"
- "event_stream"
- "high_priority"
tables:
- description: "Master list of orders and their current status."
data_source:
type: "Snowflake"
database: "prod_analytics_db"
schema: "public"
table_name: "orders"
timeliness:
frequency: "hourly"
max_latency: "1 hour"
only_if_read: true
timeliness_by_writes: true
timeliness_by_row_count: false
timeliness_alert:
description: "Alert if orders table is stale"
severity: "HIGH"
fields:
- name: "order_id"
type: "string"
nullable: false
description: "Globally unique identifier for the order."
example: "ORD-20250602-12345"
constraints:
pattern: "^ORD-[0-9]{8}-[0-9]+$"
quality_thresholds:
completeness: 100.0
uniqueness: 100.0
- name: "customer_id"
type: "string"
nullable: false
description: "Unique identifier for the customer."
sensitive: true
constraints:
pattern: "^CUST-[0-9]+$"
quality_thresholds:
completeness: 100.0
- name: "order_status"
type: "string"
nullable: false
description: "Current status of the order."
constraints:
allowed_values:
- "PENDING"
- "CONFIRMED"
- "SHIPPED"
- "DELIVERED"
- "CANCELLED"
quality_thresholds:
validity: 100.0
- name: "total_amount"
type: "decimal(10,2)"
nullable: false
description: "Total monetary amount for the order (USD)."
constraints:
value_range:
min: 0.0
max: 100000.0
custom_monitors:
- name: "referential_integrity_check"
description: >
Verify that every line item references a valid order.
data_source:
type: "Snowflake"
database: "prod_analytics_db"
sql_statement: >
SELECT count(*) AS bad_references
FROM order_line_items
LEFT JOIN orders ON order_line_items.order_id = orders.id
WHERE orders.id IS NULL;
fields:
- name: "bad_references"
type: "integer"
description: "Count of orphaned line items — should always be zero."
constraints:
value_range:
max: 0.0
Need help?
For any questions, feedback, or issues with data contracts, reach out to us at support@foundational.io.
