Fluent, not native: agents translating pandas to Polars<br>By Thijs Nieuwdorp on Thu, 25 Jun 2026
For a growing number of developers, the first Polars they ever see was written by a language model.<br>Some only ask a model for advice on how to tackle certain transformations, while others haven’t written a Polars query themselves in months.
One of the leading business cases is that LLMs have made migration a quick win.<br>A developer feeds in a pandas script, gets back Polars, and verifies that the output matches.<br>The result is typically a pipeline that runs an order of magnitude faster on smaller, cheaper machines.1<br>With a lower cost of translation, the return on investment is achieved even sooner.
The success of this business case depends on the quality of the translation an LLM can produce.<br>The Polars API is expressive, and a good translation should reach for the construct that states each intent most directly, rather than settle for code that merely runs.<br>We wanted to know whether current models do that, or whether they fall back on pandas-shaped habits that happen to be valid Polars.
We had Claude Opus 4.8 translate a pandas corpus to Polars and checked the results for idiomatic use of Polars.<br>Each case was translated in a fresh session: one shot.<br>Most translations ran, the outputs matched, and the code read as idiomatic Polars.<br>Two structural patterns remained: cases where the model reproduces the existing pandas structure instead of reaching for the Polars construct that expresses the intent directly.<br>We packaged fixes into a skill, re-ran the exercise with it loaded, and measured the difference.<br>The skill nudges the model in the right direction on those patterns, but it is not a magic bullet.<br>Fable 5, Anthropic’s newest model, is not available to us outside the United States, so we could not include it.
The setup
To measure where things stand today, we translated three bodies of pandas code:
the 22 PDS-H benchmark queries (a benchmark derived from TPC-H), using the pandas implementations from polars-benchmark. Because equivalent Polars queries are already available, supplying only the pandas queries as input gives us an easily verifiable benchmark.
an 11-case EDA corpus drawn from three public pandas teaching repositories2, covering window functions, resampling, as-of joins, reshaping, string processing, binning, and missing data. These advanced patterns are not found in the PDS-H benchmark.
two real-world ETL notebooks from Kaggle, used for initial manual exploration. These provide larger, real-life examples to compare against.
Each case was translated by Claude Opus 4.8 in a fresh session: one shot, with a prompt that explicitly asked it to translate idiomatically to Polars.<br>We verified every translation by comparing its output against the output of the original pandas code (and for PDS-H against the reference answers) with polars.testing.assert_frame_equal.<br>We ran the same corpus through Claude Sonnet 4.6 for comparison. Those results appear in the appendix.
Translation quality today
A year ago, LLMs frequently mixed the pandas and Polars APIs, reaching for deprecated methods like groupby and with_column, and producing code that did not run.<br>Today, 31 of 33 Opus 4.8 translations ran correctly and produced outputs that matched.<br>The two exceptions were minor: a missing import in one case and a column-reference error in another.<br>Opus attempts to translate to Polars more idiomatically than earlier models did, restructuring rather than translating line by line, although it occasionally introduces small errors.
The patterns that were problematic in the past all translated cleanly:
Basics like group_by, with_columns, and is_in land correctly.
Semi and anti joins are used where they fit.
Conditional aggregations stay native. map_elements did not appear in any of the PDS-H or EDA translations.
The lazy/eager distinction is respected and collect() appears where it should. In the lazy API you build up a query and Polars only runs it, with optimizations applied, once you call collect().
List-wrapped arguments, Python-side scalar arithmetic, and map_elements fallbacks did not appear. These were common tells in earlier models; the Sonnet 4.6 results in the appendix show what that looked like.
The patterns that remain are subtler.<br>The code runs and the output is correct, but in a small number of cases the solution is structured the way the original pandas code was structured.<br>The model knows the Polars API well, but does not always recognize when a different construct expresses the intent more directly.
Pattern 1: Translating structure instead of intent
This is the clearest case we found, and the hardest to eliminate.<br>In this example, the source notebook annotates per-minute stock prices with 5-minute window aggregates.<br>The notebook uses a...