Before accessing the Lineage API, ensure you have an API token. If you haven’t already created one, follow the steps in 'Creating API token' to generate your token.
Overview
The Foundational Lineage API provides programmatic access to accurate, real-time data lineage information. This allows you to easily trace data flow, locate specific entities, and understand upstream and downstream dependencies.
Example Use Cases
Integrate with CI/CD
Trigger targeted tests automatically when a Pull Request modifies relevant dashboards, ensuring data integrity with every code change.
PII Labeling Automation
Streamline compliance by automatically propagating PII labels to all downstream entities derived from a PII column.
Automated Access Control
Implement dynamic access control by automatically applying permissions to tables based on the access rules of their upstream data sources.
Risk Prioritization by Downstream Impact
Proactively manage risk by automatically generating scores for each table, considering the impact on downstream entities like tables and dashboards.
Foundational's Policies page further enhances automation, enabling you to define rules such as automatically assigning reviewers for table modifications or sending Slack notifications when specific downstream dashboards are affected.
Concepts
Entity
An entity represents a data object within your data stack. This can be a data-containing element (e.g., column, table, S3 file, GCS file) or a data-processing element (e.g., Tableau custom query, PowerBI dashboard, job, executed query). Entities are hierarchical, reflecting the structure of your data ecosystem. For example, a column belongs to a table, which resides within a schema and database. This hierarchy facilitates intuitive entity location and referencing (e.g., "find all tables under the prod database").
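As a rough illustration of how a hierarchy supports lookups such as "find all tables under the prod database", the sketch below models each entity as a kind plus a path of ancestors. The Entity class, the find_under helper, and the sample names are hypothetical and are not part of the Lineage API; they only show the idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    # Hypothetical model: `path` lists the names from the top of the
    # hierarchy down to the entity itself.
    kind: str                # e.g. "database", "schema", "table", "column"
    path: Tuple[str, ...]    # e.g. ("prod", "sales", "orders")


def find_under(entities: List[Entity], ancestor: Tuple[str, ...], kind: str) -> List[Entity]:
    """Return entities of the given kind that sit strictly below `ancestor`."""
    depth = len(ancestor)
    return [
        e for e in entities
        if e.kind == kind and len(e.path) > depth and e.path[:depth] == ancestor
    ]


# Made-up sample data for illustration only.
ENTITIES = [
    Entity("database", ("prod",)),
    Entity("table", ("prod", "sales", "orders")),
    Entity("column", ("prod", "sales", "orders", "amount")),
    Entity("table", ("dev", "sales", "orders")),
]

prod_tables = find_under(ENTITIES, ("prod",), "table")
```

Here prod_tables contains only the orders table whose path starts with prod; the table under dev is excluded because its path has a different root.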
Lineage Graph
The lineage graph is a directed graph where nodes represent entities and edges represent relationships between them. Edges have a direction (e.g., src -> dest indicates data flow from src to dest).
Graph Edges
Two primary edge types exist:
Lineage edges
Represent the flow or copying of data between entities (e.g., src -> dest signifies that data flows from src to dest).
Usage edges
Represent a relationship where one entity uses data from another without direct data copying (e.g., an ORDER BY clause).
For instance, consider the following SQL query:
CREATE TABLE dest_table AS (
  SELECT src_col AS dest_col
  FROM src_table
  ORDER BY order_col)
This query creates the following edges:
Lineage edge: src_col -> dest_col
Usage edge: order_col -> dest_table
Usage edges capture relationships where data is used but not copied. In our example, order_col influences the creation of dest_table but its data isn't directly transferred. Notably, edges in the lineage graph can connect entities of different types, such as a column (order_col) and a table (dest_table).
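The two edges from the example query can be written down as typed records. This is a minimal sketch with hypothetical names (EdgeType, Edge, edges_of_type); it does not reflect the response format of the Lineage API, which is documented on the OpenAPI page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EdgeType(Enum):
    LINEAGE = "lineage"   # data flows or is copied from src to dest
    USAGE = "usage"       # dest depends on src, but src's data is not copied


@dataclass(frozen=True)
class Edge:
    # Hypothetical record: entity names are plain strings here.
    src: str
    dest: str
    type: EdgeType


# The edges produced by the example query above.
EDGES = [
    Edge("src_col", "dest_col", EdgeType.LINEAGE),
    Edge("order_col", "dest_table", EdgeType.USAGE),
]


def edges_of_type(edges: List[Edge], edge_type: EdgeType) -> List[Edge]:
    """Keep only edges of one type, e.g. to follow copied data only."""
    return [e for e in edges if e.type is edge_type]


usage_edges = edges_of_type(EDGES, EdgeType.USAGE)
```

Filtering by type is useful when the two kinds of dependency should be treated differently. For example, propagating PII labels probably needs lineage edges only, while impact analysis may want both.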
Upstream/Downstream
These terms describe data dependencies:
Upstream: The upstream of an entity includes all entities with a path to that entity (i.e., its data sources).
Downstream: The downstream of an entity encompasses all entities with a path from that entity (i.e., its data consumers).
Upstream and downstream paths can be direct or indirect, spanning multiple intermediate entities. For example, in the lineage col1 -> col2 -> col3 -> col4, the upstream of col4 includes col1, col2, and col3.
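The definitions above amount to reachability in a directed graph. The sketch below computes both sets with a breadth-first traversal over an in-memory edge list; the function names are hypothetical, and in practice the edges would come from the Lineage API rather than a hard-coded list.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def reachable(edges: Iterable[Tuple[str, str]], start: str, reverse: bool = False) -> Set[str]:
    """Return every entity reachable from `start`, excluding `start` itself.

    With reverse=True the edges are followed backwards, towards sources.
    """
    adjacency = defaultdict(set)
    for src, dest in edges:
        if reverse:
            adjacency[dest].add(src)
        else:
            adjacency[src].add(dest)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen and neighbour != start:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def upstream(edges, entity):
    """All entities with a path to `entity` (its data sources)."""
    return reachable(edges, entity, reverse=True)


def downstream(edges, entity):
    """All entities with a path from `entity` (its data consumers)."""
    return reachable(edges, entity)


# The chain from the text: col1 -> col2 -> col3 -> col4
EDGES = [("col1", "col2"), ("col2", "col3"), ("col3", "col4")]

col4_upstream = upstream(EDGES, "col4")
col2_downstream = downstream(EDGES, "col2")
```

For the chain in the text, col4_upstream is the set of col1, col2, and col3, and col2_downstream is the set of col3 and col4. The seen set also keeps the traversal finite if the graph contains a cycle.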
Graph Versions
Foundational maintains a versioned lineage graph, creating a time series of graph states. A new graph version is generated whenever code changes are made to data pipelines or when a system scan (e.g., Snowflake, Postgres, Presto) is performed. This ensures that each code version or timestamp has a corresponding lineage graph version.
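One way to picture a time series of graph states is as a list of versions sorted by creation time, where the version in effect at a given moment is the latest one created at or before it. The sketch below assumes that model; the version identifiers, timestamps, and version_at helper are made up for illustration, and how Foundational actually stores or exposes versions is not described here.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical version history: (created_at, version_id), sorted by time.
VERSIONS: List[Tuple[datetime, str]] = [
    (datetime(2024, 1, 1), "v1"),
    (datetime(2024, 1, 15), "v2"),
    (datetime(2024, 2, 1), "v3"),
]


def version_at(versions: List[Tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Return the id of the latest version created at or before `when`.

    Returns None if `when` is earlier than the first version.
    """
    timestamps = [created_at for created_at, _ in versions]
    index = bisect_right(timestamps, when)
    return versions[index - 1][1] if index else None


current = version_at(VERSIONS, datetime(2024, 1, 20))
```

With this sample history, a lookup for 20 January 2024 falls between the second and third versions and therefore resolves to v2.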
Tutorial
To help you get started, see our sample Python script for querying our API here: https://github.com/foundational-io/lineage-api-examples
API Documentation
The Foundational API is available and documented using OpenAPI, allowing you to explore and test it directly in your browser.
To explore the complete documentation and live-test each API endpoint, please visit the OpenAPI page for the Lineage API.