Co-dfns versus BQN's implementation
(github) / BQN / implementation
Co-dfns versus BQN's implementation
Co-dfns is under active development so this document might not reflect its current state. Last update 2025-05-02.
The BQN self-hosted compiler is directly inspired by the Co-dfns project, a compiler for a subset of Dyalog APL. I'm very grateful to Aaron for showing that array-oriented compilation is even possible! In addition to the obvious difference of target language, BQN differs from Co-dfns both in goals and methods.
The shared goals of BQN and initial Co-dfns research are to implement a compiler for an array language with whole-array operations. This provides the theoretical benefit of a short critical path, which in practice means that both can make good use of a GPU or a CPU's vector instructions simply by providing an appropriate runtime. The two implementations also share a preference for working "close to the metal" by passing around arrays of numbers rather than creating abstract types to work with data. Objects are right out. These choices lead to a compact source code implementation, and may have some benefits in terms of how easy it is to write and understand the compiler.
More recent versions of Co-dfns are far less strict about the array style. The patterns are still usually parallel in some sense, but they might work with many small arrays and it's no longer clear that they are suitable for GPU evaluation, or fast in a classic APL interpreter. Co-dfns has never truly run on a GPU to my knowledge. The compiler has never been self-hosting, and the GPU measurements in Aaron's thesis were taken with a hand-translated version of the APL code.
Compilation strategy
The Co-dfns source separates tokenization and parsing, core compilation, and code generation from each other. Initial work focused on the core compiler only. The associated Ph.D. thesis and famous 17 lines figure refer to this section, which transforms the abstract syntax tree (AST) of a program to a lower-level form, and resolves lexical scoping by linking variables to their definitions. While all of Co-dfns is written in APL, other sections aren't necessarily data-parallel, and could perform differently. In contrast, the BQN compiler is entirely written in a data-parallel style. Its source splits out tokenization but mixes everything else together; code generation is much simpler because it only outputs IR, as discussed in the next section.
The core Co-dfns compiler is based on manipulating the syntax tree, which is mostly stored as parent and sibling vectors—that is, lists of indices of other nodes in the tree. BQN is less methodical, but generally treats the source tokens as a simple list. This list is reordered in various ways, allowing operations that behave like tree traversals to be performed with scans under the right ordering. This strategy allows BQN to be much stricter in the kinds of operations it uses. The compiler shown in Aaron's thesis regularly uses ⍣≡ (repeat until convergence) for iteration and creates nested arrays with ⌸ (Key, like Group). BQN has only a single instance of iteration to resolve quotes and comments, plus one complex but parallelizable scan for numeric literal processing. And it only uses Group to extract identifiers and strings, and collect metadata for final output. This means that most primitives, if we count simple reductions and scans as single primitives, are executed a fixed number of times during execution, making complexity analysis even easier than in Co-dfns.
As of 2025 I think Co-dfns has backed off a bit from the data-parallel array paradigm: even the core compiler slips in a few each-ish patterns that work on an arbitrary number of small arrays (it does rely less on nested arrays). These parts are still parallel with good complexity in a multi-core model, because the different iterations are independent, but they don't have the homogeneity that makes array primitives such a good fit for GPU and SIMD CPU implementations. For parsing, many different methods have been used historically, beginning with sequential ones like parser combinators and a parsing expression grammar. In 2021, tokenization and most of parsing were fairly close to the style used by the core compiler. In 2025 I would describe both the tokenizer and parser as using a hybrid system, which sticks to flat arrays when convenient, but also readily uses loops or occasionally nested arrays based on source code features like numeric literals, function headers, or expressions. These changes point to an emphasis on handling more APL features and improving the compiler output, recognizing compiler speed as less practically important.
Backends and optimization
Co-dfns was designed from the beginning to build GPU programs, and outputs code in ArrayFire (a C++ framework), which is then compiled. GPU programming is quite limiting, and as a result Co-dfns began with strict limitations in functionality that were slowly...