Polars GPU Engine


Polars GPU engine — cudf 26.06.01 documentation

Polars GPU engine

cuDF provides GPU-accelerated execution engines for Python users of the Polars Lazy API. The engines support most of the core expressions and data types as well as a growing set of more advanced dataframe manipulations and data file formats. When a GPU engine is selected, Polars converts expressions into an optimized query plan and determines whether the plan is supported on the GPU. If it is not, execution transparently falls back to the standard Polars engine and runs on the CPU.

Install

Follow the RAPIDS installation guide and pick the cudf-polars package for your CUDA and Python versions. For example, with conda:

conda install -c rapidsai -c conda-forge -c nvidia cudf-polars

Or with pip (CUDA 13 wheels; use cudf-polars-cu12 for CUDA 12):

pip install cudf-polars-cu13

Quick start

RayEngine with no arguments uses every GPU visible to the process, so the same code runs on one GPU and scales to multi-GPU / multi-node setups automatically:

import polars as pl
from cudf_polars.engine.ray import RayEngine

# Build a lazy query; nothing executes until collect() is called.
query = (
    pl.scan_parquet("/data/dataset/*.parquet")
    .filter(pl.col("amount") > 100)
    .group_by("customer_id")
    .agg(pl.col("amount").sum())
)

# Execute the query on the GPU engine.
with RayEngine() as engine:
    result = query.collect(engine=engine)

See Usage for the full tutorial, Engines for a conceptual overview of the available engines, Configuration Options for the StreamingOptions configuration, and Understanding Memory Use in the GPU Streaming Engine for guidance on out-of-memory errors and memory tuning.

Benchmark

Polars delivers high performance across a wide range of data scales through multiple execution engines. The default CPU engine is highly optimized for interactive and medium-scale analytics on a single node. The Polars GPU engine lets you move seamlessly to GPU nodes, providing meaningful acceleration when your dataset grows to hundreds of gigabytes or more.

We ran the Polars Decision Support (PDS) benchmarks at larger scale factors to compare the Polars GPU engine with the CPU engine and to show the speedups the GPU engine delivers as dataset size grows:

[Figure: PDS-H (SF1K)]

[Figure: PDS-DS (SF1K)]

On a single GPU, you can run TB-scale workloads with significant speedups compared to running on CPU. You can also scale up to run on multiple GPUs for processing even larger workloads:

[Figure: PDS-H (SF3K)]

[Figure: PDS-DS (SF3K)]

For more information on the benchmarks being run, see the PDS-DS queries in the cuDF GitHub repository.
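To get a rough feel for the difference on your own queries, a small timing harness can help. The sketch below is not the PDS benchmark harness. `time_query` and `speedup` are hypothetical helper names, and the choice of a median over a few repeats after one warm-up run is an assumption about what makes a reasonable quick comparison.

```python
import statistics
import time
from typing import Callable


def time_query(run: Callable[[], object], repeats: int = 3, warmup: int = 1) -> float:
    """Return the median wall-clock seconds of `run` over `repeats` timed calls."""
    # Warm-up calls are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

With the quick-start query, one could pass `lambda: query.collect()` for the CPU baseline and `lambda: query.collect(engine=engine)` for the GPU run, then compare the two medians with `speedup`.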

Learn More

The GPU engine for Polars is now available in Open Beta and is undergoing rapid development. To learn more, visit the GPU Support page on the Polars website.

Contents:

Usage

Engines

Configuration Options

Profiling and Tracing

Other Engines

Understanding Memory Use in the GPU Streaming Engine

API Reference

Launch on Google Colab

Try out the GPU engine for Polars in a free GPU notebook environment. Sign in with your Google account and launch the demo on Colab.
