Kraid: A New Compiler for Panfrost

Posted on 25/06/2026 by Faith Ekstrand

The Panfrost compiler stack is getting a little long in the tooth. It was originally written for Bifrost, and Valhall support was sort of bolted on afterwards. It works, but the Bifrost compiler is still, at its heart, a Bifrost compiler, and we're running into the limitations of the original design.

After doing a thorough evaluation of what we actually want, where we are at now, and what it would take to get there, we came to the conclusion that we need a fresh start. In particular, what we're looking for in a new compiler is:

Proper 64-bit sources: The old IR borrowed the Bifrost convention where 64-bit ops take two registers, one for each half of the 64-bit value. On Valhall (v9) and all later Mali GPUs, a 64-bit source is a single source in the instruction encoding that has to be aligned to an even register. While a translation from the Valhall convention to the Bifrost convention would be easy, translating in the other direction requires a bunch of tracking everywhere to know which sources are paired. It's a headache all over the IR.
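To make the asymmetry between the two conventions concrete, here is a minimal sketch in Python. The class and function names are hypothetical and do not come from the Panfrost or Kraid sources, and the assumption that the low half lives in the even register and the high half in the next one is mine. It only illustrates why one direction is trivial and the other needs a check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSrc:
    """Bifrost-style 64-bit source: two separately named 32-bit registers."""
    lo: int
    hi: int


@dataclass(frozen=True)
class AlignedSrc:
    """Valhall-style 64-bit source: a single even-aligned base register."""
    base: int

    def __post_init__(self):
        if self.base % 2 != 0:
            raise ValueError("64-bit source must start at an even register")


def to_paired(src: AlignedSrc) -> PairedSrc:
    # Always possible: the halves are assumed to be base and base + 1.
    return PairedSrc(lo=src.base, hi=src.base + 1)


def to_aligned(src: PairedSrc) -> AlignedSrc:
    # Only possible if the two halves already form an even-aligned,
    # consecutive pair. Nothing in PairedSrc guarantees that, so every
    # consumer has to check (or track) it.
    if src.lo % 2 != 0 or src.hi != src.lo + 1:
        raise ValueError(f"r{src.lo}/r{src.hi} is not an aligned register pair")
    return AlignedSrc(base=src.lo)
```

In this sketch, `to_paired(AlignedSrc(4))` always succeeds, while `to_aligned(PairedSrc(3, 4))` is rejected because the pair starts at an odd register.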

16-bit SSA defs (and maybe 8-bit, too): The Bifrost IR is fundamentally a 32-bit IR. Each SSA value maps to one or more 32-bit registers. For mediump and other smaller bit sizes, we support them as vectors that pack multiple values into a 32-bit register. This works okay as long as everything nicely packs. However, our experience with mediump in OpenGL ES is that there are often scalar values which don't get nicely packed into vectors. Those scalars end up taking a full 32-bit register even if only 16 bits of data are used. If we can pack two scalar values together into the same register, we may be able to reduce the number of registers used by a shader and get higher occupancy.
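The register saving described above is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the model is deliberately crude: it assumes every value is live at the same time and that any two 16-bit scalars may share a register, which a real allocator would not always be able to arrange.

```python
def regs_without_packing(bit_sizes):
    """Registers needed when every scalar occupies a full 32-bit register."""
    for bits in bit_sizes:
        if bits not in (16, 32):
            raise ValueError(f"unsupported bit size: {bits}")
    return len(bit_sizes)


def regs_with_packing(bit_sizes):
    """Registers needed when two 16-bit scalars may share one 32-bit register."""
    for bits in bit_sizes:
        if bits not in (16, 32):
            raise ValueError(f"unsupported bit size: {bits}")
    halves = sum(1 for bits in bit_sizes if bits == 16)
    fulls = sum(1 for bits in bit_sizes if bits == 32)
    # Two halves fit in one register; an odd one out still needs its own.
    return fulls + (halves + 1) // 2
```

For three 16-bit scalars and two 32-bit values, `regs_without_packing([16, 16, 16, 32, 32])` gives 5 and `regs_with_packing` gives 4.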

Core IR definition separate from encoding: The Bifrost IR was originally built for Bifrost and it was based on an ISA definition in XML. Each instruction is a Bifrost hardware instruction and the translation from bi_instr to encoded bits is mostly auto-generated. This approach has its advantages but it means that the entire IR is intrinsically linked to the Bifrost instruction set, making problems like the 64-bit source issue very difficult to solve. Instead, we want to separate the core IR from the encoding so we can make choices in the IR that make sense from a compiler perspective and then map that IR onto the ISA as a second step. This will help us isolate hardware generation differences and make it easier to support any changes Arm throws at us in the future.

An encoder and [dis]assembler that's derived directly from the Arm XML: While the old IR was based on an ISA description in XML, the XML was hand-typed by Alyssa Rosenzweig as part of her reverse-engineering efforts. Over the years, we've found multiple errors in the reverse-engineered XML. Now that we're more tightly collaborating with Arm, we have the opportunity to do better. As part of this effort, Arm is providing XML files which are generated from their internal XML descriptions of the hardware. While nothing is perfect, this should help reduce the likelihood of running into these issues in the future.

A new SSA-based register allocator: While the old IR uses Static Single Assignment (SSA) form, it still uses an iterative register allocation and spilling approach. This is slow, especially for large shaders which need to spill more than a few values. Even worse, there are cases where the current register allocation and spilling algorithm will simply fail to compile the shader.
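As a rough illustration of why SSA form helps here (this is not Kraid's actual allocator, and all names are hypothetical), the Python sketch below handles a single straight-line block. Because each value is defined exactly once, register pressure can be computed up front, and if it fits in the register file, a single in-order pass assigns registers without any retry loop. Spilling, control flow, and vector or 64-bit values are all left out.

```python
def live_ranges(instrs):
    """instrs is a list of (dest, [sources]). Returns {value: (def, end)},
    where the value occupies a register at points def <= i < end."""
    ranges = {}
    for i, (dest, srcs) in enumerate(instrs):
        for s in srcs:
            start, _ = ranges[s]
            ranges[s] = (start, i)  # the register is free again at its last use
        ranges[dest] = (i, i + 1)   # an unused value still needs a register once
    return ranges


def max_pressure(instrs):
    """Largest number of simultaneously live values in the block."""
    ranges = live_ranges(instrs)
    return max(
        (sum(1 for start, end in ranges.values() if start <= i < end)
         for i in range(len(instrs))),
        default=0,
    )


def allocate(instrs, num_regs):
    """Assign registers in one pass over the block, in definition order."""
    ranges = live_ranges(instrs)
    free = list(range(num_regs))
    active = {}    # value -> register for currently live values
    assigned = {}  # value -> register for every value
    for i, (dest, _srcs) in enumerate(instrs):
        # Release registers of values whose last use is this instruction
        # or earlier, so the destination may reuse a dying source.
        for value in [v for v in active if ranges[v][1] <= i]:
            free.append(active.pop(value))
        free.sort()
        if not free:
            raise RuntimeError("pressure exceeds register file; spill first")
        reg = free.pop(0)
        active[dest] = reg
        assigned[dest] = reg
    return assigned


# Hypothetical five-instruction block used as an example.
EXAMPLE = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),
    ("d", ["c"]),
    ("e", ["c", "d"]),
]
```

For `EXAMPLE`, `max_pressure` reports 2, and `allocate(EXAMPLE, 2)` succeeds in a single pass. With only one register it raises, which is the point where a real allocator would decide what to spill before assigning anything.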

HW unit tests: One of the things I learned while writing the Nouveau compiler (for NVIDIA GPUs) is the value of a hardware unit test suite. Unlike software unit tests, these compile tiny shaders and execute them on the GPU. This makes it easy to test the exact behavior of specific hardware instructions. Even though we have access to documentation from Arm, there are still details that aren't documented. But when you're writing a compiler, details matter. The ability to ferret out hardware corner cases is often essential to writing a compiler that's actually correct.

Better generalization of opcodes across data types: The Mali instruction set often has multiple forms of each instruction that operate on different data types. For instance, IAND comes in 4 different variants:...
