OpenLineage Support in Foundational

How to ingest OpenLineage data into Foundational and export Foundational lineage to catalogs that support OpenLineage

Updated over a month ago

OpenLineage support is currently in Preview and available to selected customers. If you’re interested in using this feature, please contact the Foundational Support team at support@foundational.io.

What is OpenLineage?

OpenLineage is an open standard for collecting, analyzing, and sharing data lineage information across tools and systems in the modern data ecosystem. It provides a structured framework to track the flow of data using runtime information emitted from pipelines. By establishing a common language for lineage, OpenLineage enables interoperability between tools like Apache Airflow, Spark, dbt, and more, creating a unified view of data dependencies.

Benefits of OpenLineage

Because OpenLineage standardizes how lineage is captured and shared, any tool that reads the format can consume the same lineage events, which reduces silos between tools and teams. If you’ve already enabled OpenLineage for your pipelines, you can reuse the events they emit across a wide range of ecosystem tools, including Foundational.

This approach ensures customers avoid vendor lock-in and enables them to invest in a system that remains forward-compatible with the evolving data ecosystem.

Foundational Support for OpenLineage

Foundational is committed to supporting open standards to help customers avoid vendor lock-in. In alignment with our support for open data contracts, Foundational fully supports OpenLineage. This includes the ability to:

  1. Ingest runtime OpenLineage information for customers who already emit OpenLineage data from their running pipelines.

  2. Export Foundational code-time lineage—generated through static analysis of pipeline code—in OpenLineage format, making it compatible with other tools (e.g., your favorite data catalog).

By integrating OpenLineage, Foundational enhances its capabilities to provide a comprehensive view of data lineage, spanning both runtime and code-based workflows. This empowers data teams to maintain high data quality, perform impact analyses with precision, and confidently manage changes within the modern data stack.
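For readers who have not seen one, the runtime information mentioned above is carried in OpenLineage run events. The following is a minimal sketch of such an event built as a plain Python dict, based on the top-level fields of the OpenLineage specification. The producer URI, the job and dataset names, and the spec version in the schema URL are illustrative assumptions; how events are delivered to Foundational is not covered in this article and is not shown here.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumptions: placeholder producer URI and spec version; substitute your own.
PRODUCER = "https://example.com/hypothetical-pipeline"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def dataset(namespace, name):
    """Identify a dataset the way OpenLineage does: by namespace and name."""
    return {"namespace": namespace, "name": name}


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal run event describing one state change of one job run."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": list(inputs),
        "outputs": list(outputs),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Hypothetical job and dataset names, for illustration only.
event = build_run_event(
    "COMPLETE",
    "analytics",
    "daily_orders_rollup",
    inputs=[dataset("warehouse", "raw.orders")],
    outputs=[dataset("warehouse", "marts.daily_orders")],
)
print(json.dumps(event, indent=2))
```

In practice, pipelines rarely build these events by hand; integrations such as those for Airflow, Spark, and dbt emit them automatically.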

Combining Code-Time Lineage with Runtime Lineage Using OpenLineage

OpenLineage’s runtime lineage tracks data dependencies during pipeline execution, giving visibility into the transformations and data flows that actually ran. However, runtime lineage alone cannot account for all scenarios, such as rarely executed pipelines, ad hoc processes, or code paths triggered only under specific conditions. This is where code-time lineage becomes essential. By extracting lineage directly from code, organizations can capture potential data flows (what could happen if the code is executed), addressing gaps left by runtime lineage.

Combining code-time lineage with runtime lineage in OpenLineage format provides organizations with a comprehensive view of their data ecosystem. Runtime lineage tracks actual execution, while code-time lineage reveals all possible transformations encoded in the system, even those that occur infrequently. Together, these approaches ensure full coverage of active and dormant dependencies, reducing blind spots and mitigating the risk of unexpected data issues.
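The difference between the two views can be illustrated with simple set operations. The sketch below treats lineage as a set of (source, target) dataset edges and classifies each edge by where it was seen. The function name and the dataset names are hypothetical, and this is only an illustration of the idea, not a description of how Foundational merges lineage internally.

```python
def classify_edges(code_time_edges, runtime_edges):
    """Split lineage edges into active, dormant, and runtime-only groups.

    Each edge is a (source_dataset, target_dataset) tuple.
    """
    code_time = set(code_time_edges)
    runtime = set(runtime_edges)
    return {
        # Present in the code and observed during execution.
        "active": code_time & runtime,
        # Present in the code but never observed, e.g. rarely run pipelines.
        "dormant": code_time - runtime,
        # Observed at runtime but not found by static analysis.
        "runtime_only": runtime - code_time,
    }


# Hypothetical edges, for illustration only.
code_time = [
    ("raw.orders", "marts.daily_orders"),
    ("raw.orders", "marts.yearly_audit"),  # only runs once a year
]
runtime = [
    ("raw.orders", "marts.daily_orders"),
    ("tmp.adhoc_export", "marts.daily_orders"),  # ad hoc process
]

result = classify_edges(code_time, runtime)
for group in ("active", "dormant", "runtime_only"):
    print(group, sorted(result[group]))
```

The union of all three groups is the combined view: the dormant edges are the blind spots that runtime lineage alone would miss.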

OpenLineage’s support for static lineage, introduced in 2023, serves as the foundation for this integration. Using the Job object to represent code locations and facets like SourceCodeLocationJobFacet for metadata, OpenLineage seamlessly models both runtime and code-based lineage. This unified framework allows data teams to analyze lineage holistically, blending execution context with design-time insights to enhance governance, improve operational efficiency, and meet compliance requirements.
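As a rough illustration of the static model described above, the sketch below builds a job-level event with no run attached and a source code location facet pointing at the code that defines the job. Field names follow the OpenLineage specification as we understand it; the schema versions in the URLs, the producer URI, the repository URL, and the file path are assumptions for illustration and should be checked against the spec version you target. It does not represent Foundational's actual export output.

```python
import json
from datetime import datetime, timezone

# Assumptions: placeholder producer URI and schema versions; verify before use.
PRODUCER = "https://example.com/hypothetical-static-analyzer"
JOB_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent"
FACET_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-1/"
    "SourceCodeLocationJobFacet.json#/$defs/SourceCodeLocationJobFacet"
)


def build_job_event(namespace, name, repo_url, path, inputs, outputs):
    """Build a static (design-time) job event: it has a job but no run."""
    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "job": {
            "namespace": namespace,
            "name": name,
            "facets": {
                "sourceCodeLocation": {
                    "_producer": PRODUCER,
                    "_schemaURL": FACET_SCHEMA,
                    "type": "git",
                    "url": f"{repo_url}/blob/main/{path}",
                    "repoUrl": repo_url,
                    "path": path,
                }
            },
        },
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": JOB_EVENT_SCHEMA,
    }


# Hypothetical repository, path, and dataset names.
job_event = build_job_event(
    "analytics",
    "daily_orders_rollup",
    repo_url="https://example.com/acme/data-pipelines",
    path="models/daily_orders.sql",
    inputs=[("warehouse", "raw.orders")],
    outputs=[("warehouse", "marts.daily_orders")],
)
print(json.dumps(job_event, indent=2))
```

Unlike a run event, this event carries no run or eventType field, which is what distinguishes lineage derived from code from lineage observed during execution.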

By leveraging the combined power of code-time and runtime lineage, data teams can confidently manage changes, troubleshoot issues, and optimize pipelines, creating a robust and future-proof data infrastructure.
