# Introduction

Foundational is a **unified data management platform** that shifts the focus from reactive troubleshooting to **proactive incident prevention**.

By leveraging proprietary deterministic source code analysis and end-to-end lineage, Foundational is engineered to identify and prevent data quality, governance, and cost issues **before** problematic code is ever deployed into production.

# High-level flow

Foundational establishes continuous governance by scanning for changes across your data ecosystem, covering both code and non-code assets.

The core flow is:

1. **Change detection:** Foundational is triggered by **code changes** (such as a pending Pull Request in GitHub/GitLab) or by scheduled scans of non-code systems (such as a BI tool or data warehouse).

2. **Lineage extraction:** Using **deterministic source code analysis** and connectors, Foundational extracts and updates the **real-time, end-to-end lineage graph** across your entire data stack, from upstream operational databases to downstream dashboards.

3. **Proactive validation (Foundational CI):** Foundational performs an **impact analysis** on the proposed change. It checks for potential issues, including breaking schema changes, semantic bugs, and violations of data quality rules or data contracts.

4. **Enforcement and notification:** Foundational acts as an **intelligent gatekeeper** by surfacing the check results **natively within customer workflows**. This is typically done by showing the **“Foundational Data Issue Analysis”** check directly in the Pull Request interface, which can be configured to **block merges** of faulty code.
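The impact-analysis step can be pictured as a downstream traversal of the lineage graph. The sketch below is a minimal illustration under stated assumptions, not Foundational's implementation: the `LINEAGE` edge map, the asset names, and the `downstream_impact` and `check_change` helpers are all hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "orders_db.orders.amount": ["warehouse.stg_orders.amount"],
    "warehouse.stg_orders.amount": ["warehouse.fct_revenue.total"],
    "warehouse.fct_revenue.total": ["bi.revenue_dashboard"],
    "orders_db.orders.status": ["warehouse.stg_orders.status"],
}


def downstream_impact(changed, lineage):
    """Return every asset reachable downstream of `changed`, breadth-first."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def check_change(removed_column, lineage):
    """Fail the check if removing a column would affect any downstream asset."""
    impacted = downstream_impact(removed_column, lineage)
    return {"passed": not impacted, "impacted": impacted}
```

In this sketch, a check on a Pull Request that removes `orders_db.orders.amount` would fail and list the staging column, the revenue fact column, and the dashboard as impacted. That list is the kind of result a merge-blocking check could surface.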

# Architecture components

Foundational’s architecture is composed of interconnected systems that work together to maintain real-time continuous governance and output.

![](https://downloads.intercomcdn.com/i/o/pbbyfcys/1822726969/0ef1fb216041b86930819c565ff7/solution+architecture.png?expires=1781782200&amp;signature=a5d24cb006ec028252c77279dc5d82490fa3c43e13f3b13502e887396fae5764&amp;req=dSglFM58m4hZUPMW1HO4zV1piaJI58ywAThsQu%2FTJIDZ1FAyTj3m%2BxlgNL%2B9%0ADVbJALr018RKgrHzb2k%3D%0A)

## Triggers and events

The Foundational workflow is initiated by changes across the data ecosystem.

- **Code Triggers:** Code changes, typically a pending Pull Request/Merge Request in a source control system (GitHub, GitLab, etc.), trigger scans via **webhooks** that notify Foundational of the change.

- **System Scans:** For non-code systems like BI tools and data warehouses, Foundational uses **time-scheduled scans** to gather metadata, query logs, and schema information, keeping the lineage graph up to date.
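As an illustration of the code-trigger path, the sketch below shows how a receiver might authenticate a GitHub webhook delivery and turn a pull request event into a scan request. The source does not describe how Foundational handles webhooks internally, so this is an assumption-laden example: it uses GitHub's documented `X-Hub-Signature-256` scheme and `pull_request` payload fields, while `scan_request_from_webhook`, `SCAN_ACTIONS`, and the shape of the returned dict are hypothetical.

```python
import hashlib
import hmac
import json

# Pull request actions assumed to warrant a new scan (an illustrative choice).
SCAN_ACTIONS = {"opened", "synchronize", "reopened"}


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a GitHub-style `X-Hub-Signature-256` header against the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_header)


def scan_request_from_webhook(secret, body, signature_header, event_type):
    """Return a scan request for relevant pull request events, else None."""
    if not verify_signature(secret, body, signature_header):
        raise PermissionError("webhook signature mismatch")
    if event_type != "pull_request":  # value of the X-GitHub-Event header
        return None
    payload = json.loads(body)
    if payload.get("action") not in SCAN_ACTIONS:
        return None
    return {
        "repo": payload["repository"]["full_name"],
        "pr_number": payload["number"],
        "commit": payload["pull_request"]["head"]["sha"],
    }
```

Verifying the signature over the raw request body, before parsing any JSON, ensures that only deliveries signed with the shared secret can trigger a scan.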

## Lineage graph and merger (end-to-end lineage)

The Lineage Graph is the single source of truth for all data relationships and dependencies, spanning **code repositories** (dbt, Spark, ORMs), **warehouses/databases** (Snowflake, BigQuery), and **BI/consumption platforms** (Tableau, Looker).

- **Code Analysis and Extraction:** Framework-specific **Lineage Engines** process data-related code. For code written in dynamic languages, such as PySpark jobs in Python, the **Foundational Code Engine** uses **dynamic analysis** to simulate execution in a sandbox and construct precise column-level lineage.

- **System Integration and Ingestion:** Foundational connects to every system in the customer’s data stack to ingest comprehensive information, including **metadata**, **query logs**, **usage**, **cost**, and **ownership** details.

- **Graph Builder:** The **Lineage Graph Builder** merges relations extracted from code with **metadata** and operational details ingested from non-code systems to construct the **Final Lineage Graph**.

- **Time Series:** The lineage is maintained as a **versioned lineage graph**, operating as a **time series**.

![](https://downloads.intercomcdn.com/i/o/pbbyfcys/1822729172/3f84d0efff4599ae5df39e0448b5/lineage+graph+manager.png?expires=1781782200&amp;signature=96fd1e0291f339609186d237defdc9fabf9130da77e97080adab7f44954992be&amp;req=dSglFM58lIBYW%2FMW1HO4zWaJAyDF0rYsiyHzGWsjd%2F35xOsUW1GC1hyjcR20%0AfKPwvPcSBc%2Bki7SeAjk%3D%0A)
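One way to picture the graph builder and the versioned, time-series lineage described above is a list of immutable snapshots, each merging code-derived edges with edges and metadata ingested from other systems. This is a minimal sketch under assumptions, not Foundational's data model: `Snapshot`, `VersionedLineageGraph`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    version: int
    timestamp: str      # ISO 8601 UTC string, so string order matches time order
    edges: frozenset    # (upstream, downstream) pairs
    metadata: dict      # asset name -> ingested details (owner, cost, ...)


@dataclass
class VersionedLineageGraph:
    # Snapshots are assumed to be appended in chronological order.
    snapshots: list = field(default_factory=list)

    def build(self, timestamp, code_edges, system_edges, metadata):
        """Merge code-derived and system-derived edges into a new snapshot."""
        edges = frozenset(code_edges) | frozenset(system_edges)
        snap = Snapshot(len(self.snapshots) + 1, timestamp, edges, dict(metadata))
        self.snapshots.append(snap)
        return snap

    def as_of(self, timestamp):
        """Return the latest snapshot taken at or before `timestamp`, if any."""
        candidates = [s for s in self.snapshots if s.timestamp <= timestamp]
        return candidates[-1] if candidates else None

    def diff(self, old_version, new_version):
        """Edges added and removed between two versions."""
        old = self.snapshots[old_version - 1].edges
        new = self.snapshots[new_version - 1].edges
        return {"added": new - old, "removed": old - new}
```

Keeping each version immutable makes it possible to ask what the lineage looked like at a given time (`as_of`) and what changed between two scans (`diff`).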

## Security and data management

Foundational operates on a **security-first** principle, focusing on metadata integrity and strict access controls.

- **Metadata focus:** Foundational is primarily a metadata platform. By **default**, it **does not require access to customer data** and is therefore not considered a data processor. The platform accesses and stores only metadata about the data stack, such as schemas, table names, descriptions, and dashboard names.

- **Conditional data access:** Foundational accesses customer data only if the customer chooses to activate **data observability checks** that require validating the underlying data (e.g., null counts or distinct values). Foundational remains fully functional for governance, lineage, and CI checks without any data access.

- **Compliance and authentication:** Foundational is **SOC 2 Type II compliant**. It supports authentication best practices, including **SSO (Single Sign-On)** and **SAML** configuration.
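To make the distinction between metadata and data access concrete: a check such as a null count or distinct-value count has to read actual rows, whereas lineage and schema checks do not. The following is a hypothetical sketch of such a check over in-memory rows. `column_profile` is an illustrative name, and a real check would more likely run as a query inside the warehouse.

```python
def column_profile(rows, column):
    """Count rows, nulls, and distinct non-null values for one column.

    `rows` is a list of dicts; a missing key is treated the same as None.
    """
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
```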

# Deployment model

Customers interact with Foundational through multiple secure channels:

- **Foundational UI:** The primary web SaaS interface.

- **Public API:** Provides programmatic access to accurate, real-time data lineage for advanced workflows.

- **MCP (Model Context Protocol):** Foundational offers an MCP Remote Server that provides AI tools with rich context (lineage, usage, ownership) about the data stack.

# Scalability

Foundational scales effectively in large enterprise environments through a modular, service-oriented architecture designed for flexibility and resilience. Its microservices-based structure allows independent scaling of components to handle variable workloads efficiently.

The platform supports multi-region and multi-cloud deployments, ensuring high availability, low latency, and robust disaster recovery. Load balancing and auto-scaling dynamically adjust resources to maintain consistent performance during peak operations.