Getting Started with the Lineage API

Before you access the Lineage API, make sure you have an API token. If you haven't created one yet, follow the steps in 'Creating API token'.

Overview

The Foundational Lineage API provides programmatic access to accurate, real-time data lineage information. With it, you can trace data flow, locate specific entities, and inspect the upstream and downstream dependencies of any entity.

Example Use Cases

  • Integrate with CI/CD
    Trigger targeted tests automatically when a Pull Request modifies relevant dashboards, ensuring data integrity with every code change.

  • PII Labeling Automation
    Streamline compliance by automatically propagating PII labels to all downstream entities derived from a PII column.

  • Automated Access Control
    Implement dynamic access control by automatically applying permissions to tables based on the access rules of their upstream data sources.

  • Risk Prioritization by Downstream Impact
    Proactively manage risk by automatically generating scores for each table, considering the impact on downstream entities like tables and dashboards.
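As a rough illustration of the risk prioritization use case, the sketch below scores tables by the types of their downstream entities. The weights, table names, and input dictionary are all hypothetical; in practice the downstream entities would be fetched through the Lineage API, whose calls are not shown here.

```python
# Hypothetical weights: an affected dashboard is treated as costlier than an affected table.
WEIGHTS = {"table": 1, "dashboard": 3}

# Hypothetical input: the types of each table's downstream entities,
# as they might be derived from lineage data.
downstream_types = {
    "prod.sales.orders": ["table", "table", "dashboard"],
    "prod.sales.refunds": ["table"],
    "prod.tmp.scratch": [],
}

def risk_score(types, weights=WEIGHTS, default=1):
    """Sum the weight of every downstream entity; unknown types get the default weight."""
    return sum(weights.get(t, default) for t in types)

scores = {table: risk_score(types) for table, types in downstream_types.items()}

# Tables ordered from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
```

A weighted sum is only one possible scoring rule; the weights would normally reflect how costly a broken dashboard or table is in your organization.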

Foundational's Policies page further enhances automation, enabling you to define rules such as automatically assigning reviewers for table modifications or sending Slack notifications when specific downstream dashboards are affected.

Concepts

Entity

An entity represents a data object within your data stack. This can be a data-containing element (e.g., column, table, S3 file, GCS file) or a data-processing element (e.g., Tableau custom query, PowerBI dashboard, job, executed query). Entities are hierarchical, reflecting the structure of your data ecosystem. For example, a column belongs to a table, which resides within a schema and database. This hierarchy facilitates intuitive entity location and referencing (e.g., "find all tables under the prod database").
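One way to picture the hierarchy is to give each entity a path from the root down. The sketch below is an assumption for illustration only: the Entity class, its fields, and the sample paths are hypothetical and do not reflect the API's actual response format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    # Hypothetical shape: an entity type plus its hierarchy path from the root down.
    entity_type: str   # e.g. "table" or "column"
    path: tuple        # e.g. ("prod", "sales", "orders")

    def is_under(self, prefix):
        """Return True if this entity sits strictly below the given hierarchy prefix."""
        prefix = tuple(prefix)
        return len(self.path) > len(prefix) and self.path[:len(prefix)] == prefix

entities = [
    Entity("table", ("prod", "sales", "orders")),
    Entity("column", ("prod", "sales", "orders", "order_id")),
    Entity("table", ("dev", "sales", "orders")),
]

# "Find all tables under the prod database"
prod_tables = [
    e for e in entities
    if e.entity_type == "table" and e.is_under(("prod",))
]
```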

Lineage Graph

The lineage graph is a directed graph where nodes represent entities and edges represent relationships between them. Edges have a direction (e.g., src -> dest indicates data flow from src to dest).

Graph Edges

Two primary edge types exist:

  1. Lineage edges
    Represent the flow or copying of data between entities (e.g., src -> dest signifies data flows from src to dest).

  2. Usage edges
    Represent a relationship where one entity uses data from another without direct data copying (e.g., an ORDER BY clause).

For instance, consider the following SQL query:

CREATE TABLE dest_table AS (
  SELECT src_col AS dest_col
  FROM src_table
  ORDER BY order_col
);

This query creates the following edges:

  • Lineage edge: src_col -> dest_col

  • Usage edge: order_col -> dest_table

In this example, order_col influences how dest_table is built, but its values are not written to dest_table, so the relationship is recorded as a usage edge rather than a lineage edge. Note also that an edge can connect entities of different types, such as a column (order_col) and a table (dest_table).
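The two edges from the query above can be modeled as typed records. This is a minimal sketch: the EdgeType and Edge classes are hypothetical names chosen for illustration, not the API's schema.

```python
from dataclasses import dataclass
from enum import Enum

class EdgeType(Enum):
    LINEAGE = "lineage"   # data flows or is copied from src to dest
    USAGE = "usage"       # dest depends on src, but src's data is not copied

@dataclass(frozen=True)
class Edge:
    src: str
    dest: str
    edge_type: EdgeType

# The edges produced by the example query.
edges = [
    Edge("src_col", "dest_col", EdgeType.LINEAGE),
    Edge("order_col", "dest_table", EdgeType.USAGE),
]

def edges_of_type(edges, edge_type):
    """Keep only the edges of the requested type."""
    return [e for e in edges if e.edge_type is edge_type]

lineage_only = edges_of_type(edges, EdgeType.LINEAGE)
usage_only = edges_of_type(edges, EdgeType.USAGE)
```

Keeping the edge type on each record lets a consumer decide whether to follow only data-copying relationships or every dependency.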

Upstream/Downstream

These terms describe data dependencies:

  • Upstream: The upstream of an entity includes all entities with a path to that entity (i.e., its data sources).

  • Downstream: The downstream of an entity encompasses all entities with a path from that entity (i.e., its data consumers).

Upstream and downstream paths can be direct or indirect, spanning multiple intermediate entities. For example, in the lineage col1 -> col2 -> col3 -> col4, the upstream of col4 includes col1, col2, and col3. Likewise, the downstream of col1 includes col2, col3, and col4.
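Computing upstream and downstream sets amounts to graph reachability. The sketch below runs on an in-memory edge list for the col1 through col4 chain; the helper names are hypothetical, and real edges would come from the Lineage API rather than a hard-coded list.

```python
from collections import defaultdict

def build_indexes(edges):
    """Index edges in both directions so either traversal is a simple lookup."""
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dest in edges:
        forward[src].add(dest)
        backward[dest].add(src)
    return forward, backward

def reachable(start, neighbors):
    """Return every entity reachable from start by following neighbors transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

edges = [("col1", "col2"), ("col2", "col3"), ("col3", "col4")]
forward, backward = build_indexes(edges)

upstream_of_col4 = reachable("col4", backward)    # follows edges against their direction
downstream_of_col1 = reachable("col1", forward)   # follows edges along their direction
```

The same traversal serves both questions; only the direction of the index changes.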

Graph Versions

Foundational maintains a versioned lineage graph, creating a time series of graph states. A new graph version is generated whenever code changes are made to data pipelines or when a system scan (e.g., Snowflake, Postgres, Presto) is performed. This ensures that each code version or timestamp has a corresponding lineage graph version.
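To make the time-series idea concrete, the sketch below picks the graph version that was current at a given timestamp. The version records and identifiers are hypothetical; how the API actually exposes versions is not covered here.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical version records: (creation time, version id), sorted by creation time.
versions = [
    (datetime(2024, 1, 1, 9, 0), "v1"),
    (datetime(2024, 1, 3, 14, 30), "v2"),
    (datetime(2024, 1, 7, 8, 15), "v3"),
]

def version_at(versions, when):
    """Return the id of the latest version created at or before `when`, or None."""
    times = [created for created, _ in versions]
    idx = bisect_right(times, when)
    if idx == 0:
        return None   # no graph version existed yet at that time
    return versions[idx - 1][1]

current = version_at(versions, datetime(2024, 1, 5))
```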

Tutorial

To help you get started, see our sample Python script for querying our API here: https://github.com/foundational-io/lineage-api-examples

API Documentation

The Foundational API is documented using OpenAPI, so you can explore and test it directly in your browser.

For the complete documentation, and to live-test each API endpoint, visit the OpenAPI page for the Lineage API.
