Why not tail recursion?
The Futhark Programming Language
High-performance purely functional data-parallel array programming
Overview<br>Examples<br>Docs<br>Publications<br>Gotta Go Fast!<br>Get Involved<br>Blog
Fork me on GitHub
Why not tail recursion?
Posted on January 20, 2026
by Troels Henriksen
This post was inspired by this discussion on<br>/r/ProgrammingLanguages.
Futhark is unusual among functional programming languages in not allowing<br>functions to be recursive. This is to some extent justified by Futhark being a<br>parallel language, and recursive looping being an intrinsically sequential<br>operation, and therefore less crucial. However, most nontrivial parallel<br>algorithms still involve some amount of sequential looping, and so this<br>capability cannot be left out entirely. The reason Futhark still does not allow<br>recursive functions is due to our goal of generating code for restricted<br>targets, most notably GPUs, where recursion is at best inefficient and often<br>impossible, due to restrictions on stack usage. Our solution to this problem is<br>to provide dedicated syntax for sequential looping that is semantically just a<br>local tail-recursive function:
loop acc = 1 for i do acc * (i+1)
This construct defines a loop parameter acc that is initially set to 1,<br>then n times evaluates the body acc * (i+1), each time re-binding the acc<br>parameter to the value returned by the previous iteration of the body. This is<br>equivalent to the tail-recursive function
def loop acc i = if i<br>then loop (acc * (i+1)) (i+1)<br>else acc
initially called as loop 1 0. A variation also exists for while loops. The<br>advantage of this design is that it obviously allows for looping without using<br>any stack space.
However, there are plenty of functional languages where recursion is the primary<br>or only way to loop. These languages usually perform tail call<br>elimination, in which the stack space<br>of the caller is reused by the callee. Why don’t we just do that in Futhark as<br>well?
One reason is that most Futhark backends do not produce machine code directly,<br>but rather C, which is then processed by some downstream C compiler. The pros<br>and cons of this design choice perhaps merits a dedicated blog post in the<br>future, but the main downside is that we can only express things that are easily<br>(and preferably portably) expressible in C, and tail calls are not such a<br>thing. There are well known techniques for implementing them, such as<br>trampolining,<br>but I am worried that they will confuse the kernel compilers provided by GPU<br>drivers. Futhark generally benefits from generating C code that looks like what<br>these compilers expect, and when we deviate, we often<br>suffer. As I have mentioned at<br>various times, the main goal of Futhark is not to run at all, but to run fast,<br>and we are not interested in features that we cannot run<br>fast. If your main goal is just to write a<br>nice functional program, then Futhark is not necessary.
Alright, so general tail call elimination is out, but what about merely tail<br>recursion? That can always be rewritten to a C for or while loop (which is<br>also what inspired loop in Futhark). Here the problem is that for Futhark,<br>optimising these tail calls is not just a nice optimisation, but would have to<br>be an ironclad guarantee - in those cases where the recursion is not in tail<br>position (and so can be eliminated), the program must be rejected, because<br>consuming stack space proportional to the recursion depth is not acceptable.
It turns out that most functional languages do not promise that tail recursion<br>is optimised, although they unsurprisingly tend to do a pretty good job of it in<br>practice. I think one of the first languages to make a promise in this area,<br>and to precisely specify the bounds of this promise, was Scheme. In the last<br>specification for Scheme that I understand, namely the fifth<br>one, tail calls are<br>defined in Section<br>3.5. The<br>rules therein are a detour from the rest of the otherwise famously elegant<br>definition, in that it is basically a list of syntax-driven rules for when an<br>application is in tail position. I think the rules are supposed to be applied<br>after macro expansion, but I do not see it explicitly mentioned.
Although these rules are simple and predictable, I dislike that they impose<br>constraints on superficial elements of your programming style. Consider a<br>tail-recursive factorial function written in Haskell:
fact acc i = if i == 0 then acc<br>else fact (acc*i) (i-1)
We can easily imagine some syntax-driven rules for which the recursive call to<br>fact is classified as a tail call. But Haskell programmers hate parentheses,<br>and might instead write it like this:
fact acc i = if i == 0 then acc<br>else fact (acc*i) $ i-1
The $ operator is a library definition (not a language builtin) that is just<br>function application with a low operator priority. It serves no semantic<br>purpose, but allows long chains of applications such as f (g (h x)) to instead<br>be written f $ g $ h x. It serves the same purpose as the pipeline operator<br>|> popularised...