Databricks LTAP Explained: Turning Postgres WAL into Lakehouse Storage
Li Shen
Jul 01, 2026
Databricks recently published a technical blog post on Lakebase and LTAP, "From monolith to Lakebase to LTAP: rethinking database storage". The post is interesting not because it introduces another acronym, but because it reframes a familiar data infrastructure problem:
Why do operational databases and analytical systems still feel like two different worlds?
Most production systems begin with an operational database: PostgreSQL, MySQL, Oracle, SQL Server, or a cloud-native variant. These systems handle user-facing writes: orders, payments, inventory updates, account changes, workflow state, application metadata.
But the moment a team wants analytics (dashboards, long-range aggregations, ML features, customer segmentation, agent context, fraud detection), the data is usually copied somewhere else: a warehouse, a data lake, or a lakehouse.
That copy is not free. It creates pipelines, lag, schema drift, validation jobs, replay logic, failure handling, and a permanent question: which system has the freshest and most correct view?
LTAP is Databricks’ attempt to move part of that boundary. Instead of treating operational data as something that must be exported into the lakehouse later, LTAP asks:
Can the storage layer of an operational database directly produce lakehouse-native columnar data?
That is the core idea. The rest of the architecture is a way to make that idea plausible.
1. The old split: OLTP wants rows, OLAP wants columns
Operational databases and analytical systems optimize for different access patterns.
An OLTP database serves short, high-concurrency transactions:
create an order
update a payment status
look up one user
decrement inventory
insert an event
fetch a workflow state record
These operations usually touch a small number of rows and must be fast, isolated, and durable.
An OLAP system serves large analytical queries:
revenue by region over 12 months
cohort retention
product-level conversion rates
feature generation for ML
historical context for AI agents
joins across large business tables
These queries often scan many rows but only a subset of columns.
That difference leads to different storage layouts.
A row-oriented layout stores one record together:
order_id | user_id | product_id | price | status | created_at
This is good when the application frequently reads or updates one full record.
A column-oriented layout stores values from the same column together:
order_id: 1, 2, 3, 4, ...
user_id: 9, 9, 5, 8, ...
price: 12.0, 25.0, 7.5, ...
status: paid, pending, paid, ...
created_at: ...
This is better for analytical scans. If a query only needs price, status, and created_at, the engine can skip the other columns entirely. Apache Parquet, for example, is explicitly designed as an open-source column-oriented file format for efficient storage and retrieval in analytical workloads [1].
So the basic tension is simple:
Applications prefer row-oriented transactional storage. Analytics prefers column-oriented scan storage.
Most data stacks resolve this by copying data from one world to the other.
This works, but it also means the analytics stack is downstream from the operational database. It has to catch up.
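To make the layout difference concrete, here is a minimal Python sketch using plain lists and dicts. It is only an illustration: the `product_id` values, the fourth price and status, and the `created_at` strings are invented, the helper names (`to_columns`, `get_order`, `paid_revenue`) are hypothetical, and real engines use far more compact encodings than Python objects.

```python
# Row-oriented: every field of one record is stored together.
# Values not shown in the text above are made up for illustration.
rows = [
    {"order_id": 1, "user_id": 9, "product_id": 101, "price": 12.0,
     "status": "paid", "created_at": "2026-01-01"},
    {"order_id": 2, "user_id": 9, "product_id": 102, "price": 25.0,
     "status": "pending", "created_at": "2026-01-02"},
    {"order_id": 3, "user_id": 5, "product_id": 101, "price": 7.5,
     "status": "paid", "created_at": "2026-01-02"},
    {"order_id": 4, "user_id": 8, "product_id": 103, "price": 40.0,
     "status": "refunded", "created_at": "2026-01-03"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def get_order(rows, order_id):
    """OLTP-style access: fetch one complete record by key."""
    for row in rows:
        if row["order_id"] == order_id:
            return row
    return None


def paid_revenue(columns):
    """OLAP-style access: aggregate while touching only two columns."""
    return sum(
        price
        for price, status in zip(columns["price"], columns["status"])
        if status == "paid"
    )


columns = to_columns(rows)
print(get_order(rows, 2))     # one full record
print(paid_revenue(columns))  # reads only price and status
```

The point lookup wants the whole record in one place, while the aggregate never looks at `order_id`, `user_id`, `product_id`, or `created_at`. In a columnar file those untouched columns would not need to be read at all.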
2. The key database primitive: WAL
To understand LTAP, we need one database concept: WAL, or Write-Ahead Log.
A database does not simply modify data files and hope for the best. Before a change is applied to the data files, the database first records that change in a log.
That log is WAL.
WAL matters for several reasons:
Durability: once a transaction commits, the database can recover it even if the machine crashes.
Recovery: after restart, the database can replay WAL to reconstruct a correct state.
Replication: replicas can follow the primary by consuming the WAL stream.
Change capture: downstream systems can use WAL as a chronological stream of data changes.
For LTAP, the fourth point is the important one.
If WAL already contains the ordered history of database changes, then it is a natural place to derive other representations of the data.
Traditional replication uses WAL to maintain another database copy. CDC uses WAL to feed another system. Neon-style storage uses WAL to reconstruct Postgres pages. LTAP pushes the idea further: use WAL to produce lakehouse-native columnar data.
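A toy Python sketch of that idea, under heavy simplification: the WAL here is an in-memory list of logical change records ordered by a log sequence number (LSN), whereas real Postgres WAL is a binary format that largely describes changes to pages. The record shape and the names `WalRecord`, `replay_to_rows`, and `replay_to_columns` are hypothetical and are not taken from Postgres, Neon, or Databricks. The sketch only shows that one ordered log can be replayed into a row-shaped state (as recovery or replication would) or into a column-shaped one.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical logical change record; not the real Postgres WAL format."""
    lsn: int                                 # position in the log, defines order
    op: str                                  # "insert", "update", or "delete"
    key: int                                 # primary key of the affected row
    values: Optional[Dict[str, Any]] = None  # new column values, if any


def replay_to_rows(wal: List[WalRecord], upto_lsn: Optional[int] = None):
    """Rebuild row-shaped state by applying records in LSN order."""
    table: Dict[int, Dict[str, Any]] = {}
    for rec in sorted(wal, key=lambda r: r.lsn):
        if upto_lsn is not None and rec.lsn > upto_lsn:
            break  # stop at the requested point in the log
        if rec.op == "insert":
            table[rec.key] = dict(rec.values or {})
        elif rec.op == "update":
            table[rec.key].update(rec.values or {})
        elif rec.op == "delete":
            table.pop(rec.key, None)
    return table


def replay_to_columns(wal: List[WalRecord], upto_lsn: Optional[int] = None):
    """Derive a column-shaped view of the same state from the same log."""
    rows = replay_to_rows(wal, upto_lsn)
    keys = sorted(rows)
    names = sorted({name for row in rows.values() for name in row})
    columns: Dict[str, List[Any]] = {"key": keys}
    for name in names:
        columns[name] = [rows[k].get(name) for k in keys]
    return columns


wal = [
    WalRecord(1, "insert", 1, {"price": 12.0, "status": "pending"}),
    WalRecord(2, "insert", 2, {"price": 25.0, "status": "pending"}),
    WalRecord(3, "update", 1, {"status": "paid"}),
    WalRecord(4, "delete", 2),
]

print(replay_to_rows(wal))                 # state after the full log
print(replay_to_columns(wal, upto_lsn=3))  # columnar view as of LSN 3
```

Because both views are derived from the same ordered log, they agree by construction at any given LSN. That shared ordering is the property the rest of the argument relies on.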
3. Lakebase starts by disaggregating Postgres
Lakebase builds on the architecture popularized by Neon: separate Postgres compute from the storage layer.
In a conventional Postgres deployment, the compute process, WAL, buffer cache, data pages, indexes, checkpoints, and local storage are tightly coupled around the database instance.
In a Neon-style architecture, the system is split into distinct services:
Postgres compute executes SQL and talks to applications.
Safekeepers durably replicate WAL and define the commit...