Introduction
Java is the backbone of enterprise software. Whether you map Hibernate entities onto a Postgres OLTP database, manage persistence with Spring Data JPA, or embed raw SQL in JDBC calls, Java is often the upstream source of truth for the data that powers your analytics stack.
Because Java is statically typed and annotation-driven, tracing exactly which columns are read, written, and transformed across Java services requires deep static analysis. When data moves from a Java-managed operational database through an ETL pipeline into a data warehouse and then into BI dashboards, traditional runtime observability tools lose visibility — turning your Java data flows into a black box.
Foundational connects to your existing repositories and reads your Java code directly. It builds column-level lineage from the code itself, without requiring a running database or any changes to your application.
Why this framework matters in your data stack
Java applications can sit anywhere in your data stack: as the operational source that feeds ETL tools, as microservices that write to data lakes, or as batch jobs that populate warehouse tables. Java-managed schemas are the starting point for a chain of downstream dependencies.
When an upstream change occurs — such as a renamed column in a Hibernate entity or a removed field in a Spring Data JPA model — it can silently break downstream transformations, dashboards, and machine-learning pipelines. Data teams need a way to predict the downstream impact of these schema changes before they deploy.
Foundational CI, powered by comprehensive end-to-end lineage, thoroughly checks every pending Pull Request to ensure it doesn't disrupt downstream transformations, dashboards, ML models, and more.
How Foundational analyzes this framework
Foundational's code engine scans and extracts lineage straight from the Java source code. Data teams gain full visibility into any Java data flow, including legacy applications and custom JDBC code that do not rely on a common ORM framework.
This shift-left approach makes it possible to review data flow changes in pending Pull Requests, ensuring that any changes impacting your data stack do not lead to data incidents, pipeline disruptions, or compromises in data quality.
Foundational supports a wide range of Java data frameworks, including:
Hibernate ORM — entity classes annotated with @Entity, @Table, and @Column
Spring Data JPA — repository and entity definitions managed through the Spring ecosystem
MyBatis — XML mapper files and annotated mapper interfaces that define SQL queries and column mappings
JDBC — raw SQL strings embedded in Java code, analyzed for column-level read and write operations
jOOQ and QueryDSL — typesafe query builder frameworks that define SQL programmatically in Java
Other JPA providers and custom frameworks — Foundational's code engine is designed to handle a broad range of Java data access patterns, including in-house frameworks and less common ORMs
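To illustrate the mapper-file pattern, here is a hypothetical MyBatis XML mapper (the namespace, statement id, and table are invented for the example). A lineage engine can read the SQL inside such a file to record column-level reads on the orders table:

```xml
<mapper namespace="com.example.OrderMapper">
  <select id="findOrderTotals" resultType="map">
    SELECT customer_id, SUM(total_amount) AS total
    FROM orders
    GROUP BY customer_id
  </select>
</mapper>
```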
Multi-step extraction process
Foundational uses a multi-step process to track data:
Identify relevant files: Foundational scans accessible repositories to locate Java files that define database schemas. It uses heuristics to detect entity classes, mapper files, and embedded SQL strings across supported frameworks. For example:

```java
@Entity
@Table(name = "orders")
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "customer_id")
    private Long customerId;

    @Column(name = "total_amount")
    private BigDecimal totalAmount;

    @Column(name = "status")
    private String status;
}
```

Parse annotations and extract schema: Foundational parses Java source files using static analysis. It reads class-level and field-level annotations to extract table names, column names and types, primary and foreign key relationships (@Id, @JoinColumn, @ManyToOne, @OneToMany), and inheritance structures (@MappedSuperclass). When column names are not declared explicitly, Foundational applies the framework's default naming strategy — for example, converting camelCase field names to snake_case, as Hibernate does by default.
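The camelCase-to-snake_case default described above can be sketched as a tiny converter. This is a simplified illustration, not Foundational's or Hibernate's actual implementation; real naming strategies also handle digits, embedded acronyms, and identifier quoting:

```java
// Simplified sketch of the camelCase -> snake_case conversion applied
// when no explicit @Column name is declared on an entity field.
public class NamingSketch {
    static String toSnakeCase(String fieldName) {
        StringBuilder sb = new StringBuilder();
        for (char c : fieldName.toCharArray()) {
            if (Character.isUpperCase(c)) {
                // Uppercase letter marks a word boundary: insert '_' and lowercase it.
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A field named customerId maps to column customer_id by default.
        System.out.println(toSnakeCase("customerId"));  // customer_id
        System.out.println(toSnakeCase("totalAmount")); // total_amount
    }
}
```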
Analyze SQL and data flow: For MyBatis, JDBC, and other SQL-based access patterns, Foundational extracts lineage by analyzing SQL operations in mapper files and embedded query strings. It identifies which columns are read (SELECT), written (INSERT, UPDATE), and used for filtering or ordering. For example, this JDBC snippet:

```java
String sql = "INSERT INTO reporting.order_summary (customer_id, total_amount) " +
             "SELECT customer_id, SUM(total_amount) " +
             "FROM orders " +
             "GROUP BY customer_id";
statement.execute(sql);
```

produces the following column-level lineage:

orders.customer_id → reporting.order_summary.customer_id
orders.total_amount → reporting.order_summary.total_amount
Graph construction and linking: Because a single Java service rarely contains full schema context, Foundational merges code analysis results with schema definitions found elsewhere in your repositories or data systems. It resolves cross-module dependencies, handles wildcard references like SELECT * using known schema context, and assembles the complete column-level lineage graph — tracing data from the Java-managed operational database through the warehouse, into transformations, and out to BI tools and downstream consumers.
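Wildcard resolution against known schema context can be sketched as follows. The schema map, table names, and the expandWildcard helper are all hypothetical, chosen only to illustrate how a statement like "INSERT INTO archive.orders SELECT * FROM orders" expands into per-column lineage edges:

```java
import java.util.*;

// Hedged sketch: expanding a SELECT * reference using known schema context,
// as a lineage engine must do before it can emit column-level edges.
public class WildcardSketch {
    // Illustrative schema context: column lists discovered elsewhere in the repo.
    static final Map<String, List<String>> SCHEMA = Map.of(
        "orders", List.of("id", "customer_id", "total_amount", "status")
    );

    // Map every column of the source table onto the target table, in order.
    static List<String> expandWildcard(String sourceTable, String targetTable) {
        List<String> edges = new ArrayList<>();
        for (String col : SCHEMA.get(sourceTable)) {
            edges.add(sourceTable + "." + col + " -> " + targetTable + "." + col);
        }
        return edges;
    }

    public static void main(String[] args) {
        // "INSERT INTO archive.orders SELECT * FROM orders"
        expandWildcard("orders", "archive.orders").forEach(System.out::println);
    }
}
```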
Advantages of Foundational's approach
Foundational provides data teams with:
Early visibility: Shows how Java schema changes impact data flows during development, seamlessly integrated into your source control — GitHub, GitLab, Azure Repos, Bitbucket, and more.
Shift-left impact analysis: Detects breaking changes in open Pull Requests before they reach production, so downstream consumers can prepare in advance.
Broad Java framework coverage: Supports Hibernate, Spring Data JPA, MyBatis, JDBC, jOOQ, QueryDSL, and more — including embedded SQL and legacy code that runtime tools miss entirely.
Reduced breakages: Prevents dashboards, ML features, transformation pipelines, and reverse ETL syncs from breaking due to upstream Java schema changes.
Set up Java lineage in Foundational
Setup is seamless.
Connect the repositories that contain your Java application code. There is no need to manually annotate code, add instrumentation, or modify your application.
From there, the code engine automatically identifies Java data access patterns across supported frameworks, extracts schema and lineage, detects changes in Pull Requests, and evaluates downstream impact.
