Snowflake with Iceberg: Lakekeeper, dbt, and some Sparks Flying


by Samuel Valente | fresha-data-engineering | May, 2026 | Medium


fresha-data-engineering

Data Engineering Blog


Samuel Valente

13 min read · 2 days ago


Snowflake, Iceberg, and the long road through the caveats.

TL;DR

It kinda works, but it’s far from perfect. It’s like asking your dishwasher to also do the laundry: with enough duct tape you can make it happen, but then you start questioning your life choices.

Background

Here at Fresha, we don’t pretend to use only the latest and greatest tools; we’re just as keen on the ones that are already proven and well established. We’ve been using Snowflake for a long time, and we’re happy with it. We ingest data from all our sources into Snowflake and then transform it with dbt.

Honestly, it’s a great platform for data warehousing and analytics. We have no plans to abandon it: we intend to keep using it where it shines and to complement it with other tools where it doesn’t.

If you’re interested in how we ingest data into Snowflake, check out the post where we peek into the architecture in more detail.

The Use Case

We need to leverage Snowflake data in other platforms, such as StarRocks, without duplicating data ingestion and processing efforts, and in a way that is efficient, scalable, and reliable. Portability and interoperability come into consideration whenever we work across different systems and tools.

Naturally, Iceberg is a great fit for this use case: it’s open source, scalable, reliable, and well supported.

To achieve our goal, we set up an Airflow DAG that waits for a dbt Snowflake job to finish in dbt Cloud, spins up a Spark job to ingest data from Snowflake into Iceberg, and then performs maintenance operations on the Iceberg table.
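To make the shape of that DAG concrete, here is a minimal sketch of the three stages in plain Python rather than actual Airflow code. Everything in it is an assumption for illustration: the status strings, the function names, and the three callables standing in for the dbt Cloud check, the Spark ingestion, and the Iceberg maintenance are hypothetical and not taken from the real pipeline.

```python
import time
from typing import Callable, List


def wait_for_dbt_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> None:
    """Poll a status callable until it reports success, fails, or we give up.

    The status values ("success", "error", "cancelled") are hypothetical
    placeholders, not the values returned by the dbt Cloud API.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "success":
            return
        if status in ("error", "cancelled"):
            raise RuntimeError(f"dbt job ended with status {status!r}")
        time.sleep(poll_seconds)
    raise TimeoutError("dbt job did not finish within the polling budget")


def run_pipeline(
    get_status: Callable[[], str],
    ingest_to_iceberg: Callable[[], None],
    run_maintenance: Callable[[], None],
    poll_seconds: float = 30.0,
) -> List[str]:
    """Run the three stages strictly in order and report which ones completed."""
    completed: List[str] = []

    # Stage 1: block until the upstream dbt job has finished.
    wait_for_dbt_job(get_status, poll_seconds=poll_seconds)
    completed.append("dbt_finished")

    # Stage 2: copy the Snowflake tables into Iceberg (a Spark job in the article).
    ingest_to_iceberg()
    completed.append("ingested")

    # Stage 3: table maintenance on the freshly written Iceberg table.
    run_maintenance()
    completed.append("maintained")

    return completed
```

Even in this stripped-down form, each stage is a separate system that can fail or stall on its own, which is where the moving parts and boilerplate come from.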

The current pipeline, meh.

While this works, it’s not ideal, and it’s not something we’re proud of. There are quite a few steps, moving parts, and boilerplate code that we had to write to make it work.

The Promise of Iceberg on Snowflake

But wait… Snowflake does support Iceberg tables! And it even supports them through an external REST catalog! This means that we can use the tools we want, when we want. Power to the Data Engineers! The prophecy is fulfilled!

Relevant docs: Apache Iceberg™ tables · REST catalog integration · Snowflake Open Catalog (Polaris)

The Raised Eyebrows

While this is a great promise, we’re cautious and skeptical. The Snowflake docs mostly focus on Snowflake’s own catalog, a managed version of Polaris, and while they mention the REST catalog, it doesn’t seem to be as polished as the proprietary one. Being familiar with Snowflake, we always expect a catch, normally in the form of some weird limitation. Besides, from a SaaS platform, would you expect anything else? At the end of the day, they don’t want you to move your data to another platform.

Yes, we support Open Table Formats!*
*But you better use our catalog.

Anyway, let’s give it a try for the sake of science.

The Ideal Pipeline

Before we dive into the implementation, let’s define the ideal pipeline.

The dbt-snowflake adapter supports Iceberg, so instead of hacking together the complicated pipeline we have now, we can simply use its Iceberg materialisation and add one more node to the dbt run that builds the Iceberg models, porting the Snowflake base tables to Iceberg.

Neat!

No Spark jobs, no waiting for this and that, no custom code, just a simple dbt run (which already runs in Snowflake, so no need to change that) and we’re done!

Infrastructure Setup

Before we get to the fun part, let’s talk about how we actually wired this up in Terraform.

Here’s the catch: the Snowflake Terraform provider, at the time of writing, does not support external volumes, catalog integrations, or linked databases. These are relatively new Snowflake features and the provider simply hasn’t caught up yet.

So what do you do? You use snowflake_execute.

snowflake_execute is Terraform’s polite way of saying “just run this SQL and trust the process”. You hand it a CREATE statement, a DROP for the revert, and a SHOW query so it can check whether the resource already exists. It works. It’s not pretty, but it works.

It’s the Terraform equivalent of // TODO: fix this properly, except you ship it, and then it runs in production, and then you stop thinking about it.
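As a rough illustration of the CREATE / DROP / SHOW pattern described above, the sketch below builds a snowflake_execute resource for an external volume and prints it in Terraform’s JSON syntax. Python is used only to keep the example self-contained; it is not how the resources were actually written. It assumes the resource’s arguments are named execute, revert, and query, and the volume name, bucket path, and role ARN are made-up placeholders.

```python
import json


def external_volume_resource(name: str, base_url: str, role_arn: str) -> dict:
    """Build a Terraform JSON resource that manages an external volume via raw SQL.

    All values passed in are illustrative placeholders, not a real setup.
    """
    # The statement Terraform runs on create.
    create_sql = (
        f"CREATE EXTERNAL VOLUME {name} "
        "STORAGE_LOCATIONS = (("
        f"NAME = '{name}_s3' "
        "STORAGE_PROVIDER = 'S3' "
        f"STORAGE_BASE_URL = '{base_url}' "
        f"STORAGE_AWS_ROLE_ARN = '{role_arn}'"
        "))"
    )
    return {
        "resource": {
            "snowflake_execute": {
                name.lower(): {
                    "execute": create_sql,
                    # The statement Terraform runs on destroy.
                    "revert": f"DROP EXTERNAL VOLUME IF EXISTS {name}",
                    # The query used to check whether the volume already exists.
                    "query": f"SHOW EXTERNAL VOLUMES LIKE '{name}'",
                }
            }
        }
    }


# Hypothetical values, for illustration only.
resource = external_volume_resource(
    name="ICEBERG_VOL",
    base_url="s3://example-bucket/iceberg/",
    role_arn="arn:aws:iam::123456789012:role/example-snowflake-role",
)
print(json.dumps(resource, indent=2))
```

Saved as a .tf.json file, the printed document would sit next to the regular Terraform configuration. The same three-statement shape would repeat for the catalog integration and the linked database, with different SQL in each slot.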

How It All Fits Together

Three Snowflake-side resources need to exist before dbt can write a single file: the external volume tells Snowflake which S3 path and IAM role to use for storage; the catalog integration gives Snowflake a connection to Lakekeeper (the REST catalog) and the OAuth credentials to authenticate; the linked...
