Get started with the Lineage API

Updated yesterday

Introduction

The Foundational Lineage API provides programmatic access to real-time data lineage information.

Use this API to:

  • trace data flows

  • search and view details of specific entities

  • understand upstream and downstream dependencies across your data ecosystem.


Prerequisites

You need a Foundational API token before using the Lineage API. If you haven’t created one, check out the article Create API Tokens.


Typical use cases

The table below highlights common data lineage scenarios.

Use case | When to use
--- | ---
Automate PII labeling | Propagate PII labels automatically to all downstream entities derived from a PII column.
Apply automated access control | Apply permissions dynamically to tables based on the access rules of their upstream data sources.
Prioritize risks by downstream impact | Identify downstream impact and use it to prioritize risks automatically.

Use the Workflows and Alerts page to:

  • Extend automation, for example by assigning reviewers automatically for table modifications.

  • Send alerts or notifications to Slack, Microsoft Teams, or other tools when specific downstream dashboards are affected.


Concepts

Use the following concepts to understand how the Lineage API organizes and represents your data stack.

Entity

An entity represents a data object within your data stack.

It can be either:

  • A data-containing element (for example, column, table, S3 file, GCS file), or

  • A data-processing element (for example, Tableau query, Power BI dashboard, job, or executed query).

Entities are hierarchical. For example, in a data warehouse:

  • A column belongs to a table.

  • A table resides in a schema.

  • A schema belongs to a database.

This hierarchy makes entities easy to query. For example, you can find all tables under the prod database.
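The hierarchy can be pictured as a chain of parent links. The sketch below is a minimal in-memory model, not the Lineage API's actual schema: the Entity class, its fields, and the sample names (prod, sales, orders) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity model: a name, a kind, and an optional parent."""
    name: str
    kind: str  # e.g. "database", "schema", "table", "column"
    parent: Optional["Entity"] = None

    def ancestors(self):
        """Yield the parent, grandparent, and so on up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


# Illustrative sample data (not real API output).
prod = Entity("prod", "database")
sales = Entity("sales", "schema", prod)
orders = Entity("orders", "table", sales)
order_id = Entity("order_id", "column", orders)
dev = Entity("dev", "database")
scratch = Entity("scratch", "schema", dev)
tmp_orders = Entity("tmp_orders", "table", scratch)

entities = [prod, sales, orders, order_id, dev, scratch, tmp_orders]


def tables_under(database_name, entities):
    """Return every table whose ancestor chain contains the named database."""
    return [
        e for e in entities
        if e.kind == "table"
        and any(a.kind == "database" and a.name == database_name
                for a in e.ancestors())
    ]


prod_tables = [t.name for t in tables_under("prod", entities)]
print(prod_tables)
```

Walking the parent chain keeps the lookup independent of how many levels sit between a table and its database.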

Lineage graph

The lineage graph is a directed graph where:

  • Nodes represent entities.

  • Edges represent relationships between them…

Graph edges

Edges have a direction (e.g., src -> dest indicates data flow from src to dest).

There are two primary edge types:

  • Lineage edges: Represent the flow or copying of data between entities.

  • Usage edges: Represent a relationship where one entity uses data from another without direct data copying (e.g., an ORDER BY clause).

For instance, let’s take the following SQL query:

CREATE TABLE dest_table AS (
SELECT src_col AS dest_col
FROM src_table
ORDER BY order_col
)

This query creates the following edges:

  • Lineage edge: src_col -> dest_col

  • Usage edge: order_col -> dest_table

Usage edges capture relationships where data is used but not copied. In our example, order_col influences the creation of dest_table but its data isn’t directly transferred. Notably, edges in the lineage graph can connect entities of different types, such as a column (order_col) and a table (dest_table).
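To make the distinction concrete, the two edges from the query above can be written down as typed records. This is a sketch under assumptions: the Edge and EdgeType names and the sources_of helper are hypothetical and do not reflect how the Lineage API returns edges.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    LINEAGE = "lineage"  # data flows or is copied from src to dest
    USAGE = "usage"      # src is used to produce dest, but not copied


@dataclass(frozen=True)
class Edge:
    """Hypothetical directed edge: src -> dest, tagged with its type."""
    src: str
    dest: str
    type: EdgeType


# The two edges described for the example query.
edges = [
    Edge("src_col", "dest_col", EdgeType.LINEAGE),
    Edge("order_col", "dest_table", EdgeType.USAGE),
]


def sources_of(dest, edges, edge_type=None):
    """Entities with an edge into `dest`, optionally filtered by edge type."""
    return [
        e.src for e in edges
        if e.dest == dest and (edge_type is None or e.type is edge_type)
    ]


copied_into_dest_col = sources_of("dest_col", edges, EdgeType.LINEAGE)
used_by_dest_table = sources_of("dest_table", edges, EdgeType.USAGE)
print(copied_into_dest_col, used_by_dest_table)
```

Filtering by edge type lets you ask separately which data ends up in an entity and which data merely influenced it.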

Upstream and downstream

The terms upstream and downstream describe data dependencies. Both kinds of path can be direct or indirect, passing through multiple intermediate entities:

Term | Definition
--- | ---
Upstream | All entities that supply data to the current entity (its data sources).
Downstream | All entities that consume data from the current entity (its data consumers).

Example:

In the lineage col1 → col2 → col3 → col4, the upstream of col4 includes col1, col2, and col3, and the downstream of col1 includes col2, col3, and col4.
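Because upstream and downstream include indirect dependencies, computing them amounts to a graph traversal. The following is a minimal breadth-first sketch over the col1 to col4 chain. The reachable function and the plain (src, dest) tuple format are assumptions for illustration, not part of the Lineage API.

```python
from collections import defaultdict, deque

# The example chain as (src, dest) pairs: col1 -> col2 -> col3 -> col4.
EDGES = [("col1", "col2"), ("col2", "col3"), ("col3", "col4")]


def reachable(start, edges, direction):
    """Return all entities upstream or downstream of `start` (excluding it)."""
    adjacency = defaultdict(list)
    for src, dest in edges:
        if direction == "downstream":
            adjacency[src].append(dest)   # follow edges forward
        else:
            adjacency[dest].append(src)   # follow edges backward

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


upstream_of_col4 = reachable("col4", EDGES, "upstream")
downstream_of_col2 = reachable("col2", EDGES, "downstream")
print(sorted(upstream_of_col4), sorted(downstream_of_col2))
```

The same function serves both directions because upstream is simply downstream with every edge reversed.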

Graph versions

Foundational maintains versioned lineage graphs to track changes over time.

  • Foundational generates a new version of the graph whenever a change is made across the data stack, either in code or in data systems such as Tableau or Snowflake.

  • Each version captures the lineage graph as it existed at that point in time.
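One way to think about versions is as immutable snapshots of the edge set, which makes it straightforward to see what changed between two points in time. The sketch below assumes integer version numbers and edges stored as (src, dest) tuples. Both are hypothetical choices for illustration, and the entity names are made up.

```python
# Hypothetical snapshots: each version maps to the full set of edges
# that existed at that point in time.
GRAPH_VERSIONS = {
    1: frozenset({("src_table", "dest_table")}),
    2: frozenset({("src_table", "dest_table"), ("dest_table", "dashboard")}),
}


def diff_versions(old, new):
    """Return the edges added and removed between two snapshots."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


changes = diff_versions(GRAPH_VERSIONS[1], GRAPH_VERSIONS[2])
print(changes)
```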


Tutorial

To get started, see our sample Python project that queries the API: https://github.com/foundational-io/lineage-api-examples.


API documentation

The Foundational Lineage API is documented with OpenAPI. You can explore the full specification and test each endpoint directly in your browser.

Go to the OpenAPI page for the Lineage API to view and try the live documentation.
