Introduction

Foundational is a unified data management platform that shifts the focus from reactive troubleshooting to proactive incident prevention.

By leveraging proprietary deterministic source code analysis and end-to-end lineage, Foundational is engineered to identify and prevent data quality, governance, and cost issues before problematic code is ever deployed into production.

High-level flow

Foundational establishes continuous governance by detecting changes across your data ecosystem, covering both code and non-code assets.

The core flow is:

Change detection: Foundational is triggered by code changes (like a pending Pull Request in GitHub/GitLab) or by scheduled scans of non-code systems (like a BI tool or data warehouse).
Lineage extraction: Using deterministic source code analysis and connectors, Foundational extracts and updates the real-time, end-to-end lineage graph across your entire data stack, from upstream operational databases to downstream dashboards.
Proactive validation (Foundational CI): Foundational performs an impact analysis on the proposed change. It checks for potential issues, including breaking schema changes, semantic bugs, and violations of data quality or contracts.
Enforcement and notification: Foundational acts as an intelligent gatekeeper by surfacing the check results natively within customer workflows. This is typically done by showing the “Foundational Data Issue Analysis” check directly in the Pull Request interface, which can be configured to block merges of faulty code.
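
The validation step in the flow above can be pictured as a traversal of the lineage graph. The following is a minimal sketch under stated assumptions, not Foundational's implementation: the column names, the `downstream` mapping, and the helpers `removed_columns`, `impacted_assets`, and `check_change` are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its value list.
downstream = {
    "orders.amount": ["stg_orders.amount"],
    "stg_orders.amount": ["fct_revenue.total", "dash_sales.revenue_tile"],
    "orders.status": ["stg_orders.status"],
}


def removed_columns(old_schema, new_schema):
    """Return columns present before the change but missing after it."""
    return sorted(set(old_schema) - set(new_schema))


def impacted_assets(start_columns, edges):
    """Breadth-first walk of the lineage edges, collecting everything downstream."""
    seen = set()
    queue = deque(start_columns)
    while queue:
        column = queue.popleft()
        for child in edges.get(column, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def check_change(old_schema, new_schema, edges):
    """Fail the check when a removed column still has downstream consumers."""
    removed = removed_columns(old_schema, new_schema)
    impacted = impacted_assets(removed, edges)
    return {"passed": not impacted, "removed": removed, "impacted": impacted}


# Example: a proposed change drops orders.amount, which dashboards still depend on.
result = check_change(
    old_schema=["orders.amount", "orders.status"],
    new_schema=["orders.status"],
    edges=downstream,
)
```

In this sketch, a failing `result` is the kind of outcome that could be surfaced as a check on the Pull Request. A real check would also need to cover renames, type changes, and semantic changes, which this example does not attempt.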

Architecture components

Foundational’s architecture is composed of interconnected systems that work together to keep the lineage graph current and to run governance checks continuously.

Triggers and events

The Foundational workflow is initiated by changes across the data ecosystem.

Code Triggers: Code changes, typically a pending Pull Request/Merge Request in a source control system (GitHub, GitLab, etc.), trigger scans via webhooks that notify Foundational of the change.
System Scans: For non-code systems like BI tools and data warehouses, Foundational uses time-scheduled scans to gather metadata, query logs, and schema information, ensuring the lineage graph is always up-to-date.
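
To make the code-trigger path concrete, the sketch below shows how a receiver could validate and interpret a GitHub `pull_request` webhook. It relies on GitHub's documented `X-Hub-Signature-256` header (an HMAC-SHA256 of the request body) and on standard payload fields. The function names and the shape of the returned scan request are hypothetical, and nothing here describes how Foundational itself processes webhooks.

```python
import hashlib
import hmac
import json

# Pull request actions that would plausibly warrant a new scan (an assumption).
SCAN_ACTIONS = {"opened", "synchronize", "reopened"}


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header value against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected value.
    return hmac.compare_digest(expected, signature_header)


def scan_request_from_event(body: bytes):
    """Turn a pull_request event payload into a hypothetical scan request, or None."""
    event = json.loads(body)
    if event.get("action") not in SCAN_ACTIONS:
        return None
    pull_request = event["pull_request"]
    return {
        "repo": event["repository"]["full_name"],
        "pr_number": pull_request["number"],
        "head_sha": pull_request["head"]["sha"],
    }
```

A caller would reject the request when `signature_is_valid` returns `False` and otherwise queue the value returned by `scan_request_from_event`. Scheduled scans of non-code systems need no such handler, since they run on a timer rather than in response to an event.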

Lineage graph and merger (end-to-end lineage)

The Lineage Graph is the single source of truth for all data relationships and dependencies, spanning code repositories (dbt, Spark, ORMs), warehouses/databases (Snowflake, BigQuery), and BI/consumption platforms (Tableau, Looker).

Code Analysis and Extraction: Framework-specific Lineage Engines process data-related code. For code written in dynamic languages, such as PySpark jobs in Python, the Foundational Code Engine uses dynamic analysis to simulate execution in a sandbox and construct precise column-level lineage.
System Integration and Ingestion: Foundational connects to every system in the customer’s data stack to ingest comprehensive information, including metadata, query logs, usage, cost, and ownership details.
Graph Builder: The Lineage Graph Builder merges relations extracted from code with metadata and operational details ingested from non-code systems to construct the Final Lineage Graph.
Time Series: The lineage is maintained as a versioned lineage graph, operating as a time series.
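
The merge and time-series ideas above can be illustrated with a small sketch. It assumes a deliberately simple model in which edges are `(source, target)` pairs, metadata is a per-asset dictionary, and snapshots are keyed by ISO date strings. The names `LineageGraph`, `build_graph`, and `VersionedLineage` are hypothetical and do not reflect Foundational's internal data structures.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    edges: set = field(default_factory=set)       # (source, target) pairs
    metadata: dict = field(default_factory=dict)  # asset name -> attributes


def build_graph(code_edges, system_edges, system_metadata):
    """Merge edges extracted from code with edges and metadata from scanned systems."""
    graph = LineageGraph()
    graph.edges.update(code_edges)
    graph.edges.update(system_edges)
    for asset, attributes in system_metadata.items():
        graph.metadata[asset] = dict(attributes)
    return graph


class VersionedLineage:
    """Keeps lineage snapshots in time order so past states can be queried."""

    def __init__(self):
        self._timestamps = []
        self._graphs = []

    def record(self, timestamp, graph):
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("snapshots must be recorded in increasing time order")
        self._timestamps.append(timestamp)
        self._graphs.append(graph)

    def at(self, timestamp):
        """Return the latest snapshot taken at or before the given timestamp."""
        index = bisect.bisect_right(self._timestamps, timestamp) - 1
        if index < 0:
            raise LookupError(f"no snapshot at or before {timestamp}")
        return self._graphs[index]

    def added_edges(self, older, newer):
        """Edges present in the newer snapshot but absent from the older one."""
        return self.at(newer).edges - self.at(older).edges


# Example: a dashboard dependency appears between two snapshots.
history = VersionedLineage()
history.record("2024-01-01", build_graph({("orders", "stg_orders")}, set(), {}))
history.record(
    "2024-02-01",
    build_graph(
        {("orders", "stg_orders")},
        {("stg_orders", "sales_dashboard")},
        {"sales_dashboard": {"owner": "analytics"}},
    ),
)
new_edges = history.added_edges("2024-01-01", "2024-02-01")
```

Storing whole snapshots keeps the example short. A system at scale would more likely store incremental changes between versions, which is a design choice this sketch leaves open.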

Security and data management

Foundational operates on a security-first principle, focusing on metadata integrity and strict access controls.

Metadata focus: Foundational is primarily a metadata platform. By default, it does not require access to customer data and is therefore not considered a data processor. The platform accesses and stores only metadata about the data stack, such as schemas, table names, descriptions, and dashboard names.
Conditional data access: Foundational accesses customer data only if the customer chooses to activate data observability checks that require validating the underlying data (e.g., null counts or distinct values). Foundational remains fully functional for governance, lineage, and CI checks without any data access.
Compliance and authentication: Foundational is SOC 2 Type II compliant. It supports authentication best practices, including SSO (Single Sign-On) and SAML configuration.

Deployment model

Customers interact with Foundational through multiple secure channels:

Foundational UI: The primary web SaaS interface.
Public API: Provides programmatic access to accurate, real-time data lineage for advanced workflows.
MCP (Model Context Protocol): Foundational offers an MCP Remote Server that provides AI tools with rich context (lineage, usage, ownership) about the data stack.

Scalability

Foundational scales effectively in large enterprise environments through a modular, service-oriented architecture designed for flexibility and resilience. Its microservices-based structure allows independent scaling of components to handle variable workloads efficiently.

The platform supports multi-region and multi-cloud deployments, ensuring high availability, low latency, and robust disaster recovery. Load balancing and auto-scaling dynamically adjust resources to maintain consistent performance during peak operations.
