Introduction
SQLAlchemy is an Object-Relational Mapping (ORM) library for Python. It allows developers to interact with databases using Python objects instead of raw SQL. SQLAlchemy provides a high-level ORM that maps Python classes to database tables, enabling developers to perform CRUD operations in a Pythonic way.
Why this framework matters in your data stack
Engineering teams often use SQLAlchemy to manage operational and production databases, especially in OLTP environments. These SQLAlchemy-defined tables are commonly ingested into cloud data warehouses such as BigQuery and Snowflake through ETL tools like Fivetran and Airbyte.
Once in the warehouse, transformation tools such as dbt and Google Dataform reshape the data. BI tools such as Tableau and Looker consume the transformed datasets. Downstream consumers also include data science teams, machine learning models, reverse ETL pipelines, and other analytics or operational systems.
Any change to an upstream SQLAlchemy model can impact the full data lifecycle. Renaming or removing a column may break dashboards, downstream transformations, machine learning features, or reverse ETL syncs. Teams depending on downstream assets should be notified when SQLAlchemy changes are planned. This proactive notification helps maintain data quality and prevents data incidents.
How Foundational analyzes this framework
Foundational automates SQLAlchemy schema and lineage extraction by examining SQLAlchemy code. It does not rely on deployed database schemas. This gives teams visibility into SQLAlchemy schema changes during development or while changes are still in a pending Pull Request.
Early visibility allows teams to review the potential downstream impact before deployment and reduce the risk of breaking downstream analytics and machine learning workflows.
Foundational’s process to extract schema and lineage
The process includes these steps:
Step 1: Identify relevant files
Foundational scans accessible repositories to locate SQLAlchemy definition files. It uses heuristics such as detecting Python classes that inherit from a declarative base class, for example the one returned by:
sqlalchemy.ext.declarative.declarative_base()
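As a rough illustration of this kind of heuristic, the sketch below uses Python's standard `ast` module to find classes that inherit from a name bound to a `declarative_base()` call. The helper name `find_declarative_models` and the sample source are hypothetical and are not Foundational's implementation. The sketch only handles direct inheritance within a single file; models written in the SQLAlchemy 2.0 style, which subclass `DeclarativeBase`, would need an additional check.

```python
import ast


def find_declarative_models(source):
    """Return the names of classes that inherit from a declarative base.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)

    # Pass 1: collect names bound to a declarative_base() call,
    # e.g. `Base = declarative_base()`.
    base_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Attribute):
                called = func.attr
            else:
                called = getattr(func, "id", None)
            if called == "declarative_base":
                base_names.update(
                    t.id for t in node.targets if isinstance(t, ast.Name)
                )

    # Pass 2: collect classes whose direct bases include one of those names.
    models = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and any(
            isinstance(b, ast.Name) and b.id in base_names for b in node.bases
        ):
            models.append(node.name)
    return models


SAMPLE = '''
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

class Helper:
    pass
'''

print(find_declarative_models(SAMPLE))  # prints ['User']
```

Because the source is only parsed and never executed, a scan like this can run over a whole repository without importing any of the project's dependencies.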
Step 2: Create a safe analysis environment
Foundational uses dynamic mocks in Python to create a sandbox. This environment loads SQLAlchemy definitions safely and ensures that imports and non-built-in functions are handled without executing unsafe code.
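The idea of dynamic mocks can be sketched with the standard library alone. The example below scans a source string for imports whose top-level package cannot be found, temporarily registers a `MagicMock` for each one in `sys.modules`, and then loads the source. The function names and the `company_lib` package are hypothetical. This is a minimal sketch of the mocking idea only: it still executes module-level code with `exec`, so it does not provide the isolation a real sandbox needs.

```python
import ast
import importlib.util
import sys
from unittest.mock import MagicMock, patch


def unresolved_imports(source):
    """Return dotted module names imported by `source` that cannot be found."""
    missing = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            parts = name.split(".")
            try:
                found = importlib.util.find_spec(parts[0]) is not None
            except (ImportError, ValueError):
                found = False
            if not found:
                # Mock every dotted prefix: "a", "a.b", "a.b.c".
                missing.update(
                    ".".join(parts[: i + 1]) for i in range(len(parts))
                )
    return missing


def load_with_mocks(source):
    """Execute `source` with unresolved imports replaced by MagicMock objects."""
    mocks = {name: MagicMock(name=name) for name in unresolved_imports(source)}
    namespace = {}
    # patch.dict restores sys.modules to its previous state on exit.
    with patch.dict(sys.modules, mocks):
        exec(compile(source, "<models>", "exec"), namespace)
    return namespace


SAMPLE = '''
from company_lib.config import get_settings  # hypothetical internal package
import json

DB_URL = get_settings().database_url
TABLE_PREFIX = "app_"
'''

ns = load_with_mocks(SAMPLE)
print(ns["TABLE_PREFIX"])            # prints app_
print(type(ns["DB_URL"]).__name__)   # prints MagicMock
```

Values derived from a mocked import become mocks themselves, while plain definitions in the file load unchanged. That is what lets the definitions be inspected even when the project's own dependencies are unavailable.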
Step 3: Load and analyze framework definitions
Inside the sandbox, Foundational loads SQLAlchemy classes and inspects each class using Python introspection. This method allows the system to extract SQLAlchemy schemas even when the models contain complex logic.
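To give a sense of what introspection can recover, the sketch below defines two small models and reads their columns back through the `metadata` object that SQLAlchemy attaches to the declarative base. The `Customer` and `Order` models and the `extract_schema` helper are hypothetical examples, not Foundational's code, and the sketch assumes SQLAlchemy 1.4 or later is installed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Customer(Base):  # hypothetical model
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    email = Column(String(255), nullable=False)


class Order(Base):  # hypothetical model
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
    note = Column(String(500))


def extract_schema(base):
    """Build a {table: {column: properties}} mapping from loaded models."""
    schema = {}
    for table_name, table in base.metadata.tables.items():
        schema[table_name] = {
            col.name: {
                "type": str(col.type),
                "nullable": col.nullable,
                "primary_key": col.primary_key,
                # Foreign keys hint at relationships between tables.
                "references": sorted(
                    fk.target_fullname for fk in col.foreign_keys
                ),
            }
            for col in table.columns
        }
    return schema


schema = extract_schema(Base)
for table_name, columns in schema.items():
    print(table_name, sorted(columns))
```

No database connection is needed at any point: the table and column metadata is built when the classes are defined, so it reflects the code rather than a deployed schema.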
Advantages of Foundational’s approach
Traditional lineage tools connect directly to a running database, such as a Postgres instance hosted on AWS RDS, and retrieve the deployed schema. This approach only shows schema changes after deployment, so it provides no early warning to downstream consumers. Dashboards may break, transformations may fail, and machine learning pipelines may error before anyone realizes an upstream change has occurred.
Schema drift also occurs. Developers may update SQLAlchemy models without deploying changes, or migration files may be incomplete or broken, which leads to mismatches between the code and the live database.
Foundational takes a different approach: it examines SQLAlchemy model definitions directly in the repository instead of depending on the deployed database. It identifies structural changes in the models and evaluates their potential downstream impact during development or while a Pull Request is still open. This gives teams early visibility and time to prepare for downstream impact before changes reach production.
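Identifying structural changes can be pictured as a comparison between the schema extracted from the base branch and the schema extracted from the Pull Request branch. The sketch below assumes each schema has already been reduced to a `{table: {column: type}}` mapping. The `diff_schemas` helper and the sample data are hypothetical and only illustrate the idea.

```python
def diff_schemas(before, after):
    """List structural changes between two {table: {column: type}} mappings."""
    changes = []
    for table in sorted(set(before) | set(after)):
        old_cols = before.get(table)
        new_cols = after.get(table)
        if old_cols is None:
            changes.append(("table_added", table, None))
            continue
        if new_cols is None:
            changes.append(("table_removed", table, None))
            continue
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in new_cols:
                changes.append(("column_removed", table, col))
            elif col not in old_cols:
                changes.append(("column_added", table, col))
            elif old_cols[col] != new_cols[col]:
                changes.append(("type_changed", table, col))
    return changes


# Hypothetical schemas extracted from the base branch and from a Pull Request.
BASE = {"users": {"id": "INTEGER", "email": "VARCHAR(255)", "signup_ts": "DATETIME"}}
PR = {"users": {"id": "BIGINT", "email": "VARCHAR(255)", "created_at": "DATETIME"}}

changes = diff_schemas(BASE, PR)
for change in changes:
    print(change)
```

In this simple form, a renamed column shows up as one removal plus one addition. Either way, the removal is the signal that matters to downstream consumers, because anything that still reads the old column name will break.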
Foundational’s approach provides several advantages:
Early visibility: Teams see schema changes during development.
Proactive communication: Downstream consumers can prepare for upcoming changes.
Reduced breakages: Dashboards, ML features, transformations, and pipelines are less likely to fail unexpectedly.
Better alignment across teams: Engineering and data teams stay coordinated when schemas evolve.
Example of SQLAlchemy model from our own code at Foundational
Why doesn’t Foundational analyze Alembic files?
Alembic is a migration tool commonly paired with SQLAlchemy to manage database schema changes. There are several reasons why Foundational does not use Alembic:
SQLAlchemy is the source of truth
Developers may update SQLAlchemy models without updating Alembic migration files. As a result, Alembic’s representation may fall out of sync with the real schema.
Migrations can be outdated or broken
Alembic relies on sequential migration files. If one migration in the sequence is invalid or missing, the entire migration chain becomes unreliable.
Scaling considerations
The number of SQLAlchemy model definitions scales with the size of the database schema, while the number of Alembic migration files scales with the number of historical changes. Over time, the number of migrations can far exceed the number of tables.
Set up SQLAlchemy lineage in Foundational
Setup is simple. Connect the repositories that contain your SQLAlchemy definitions. Foundational automatically identifies SQLAlchemy files, loads them safely, extracts schema from code, detects changes in Pull Requests, and evaluates downstream impact.
To connect to your source control, check out the relevant How-to article from the Help Center Connectors and Integrations category.
No additional configuration is required.


