Using GCC's Nested Functions with Wide Pointers and no Trampolines
Martin Uecker, 2026-01-06
Introduction
Nested functions are extremely useful, which is why basically any computer<br>language since ALGOL60 has them. Except C.
In particular, nested functions are useful for writing kernels that are<br>passed to higher-order functions. The kernels can access local state,<br>which is important because this state then does not have to be passed as a void<br>pointer through the higher-order function.
void tree_increase_all(tree(int) *t, int increment)<br>{<br>	void update(int *value) { (*value) += increment; }
	tree_walk(t, update);<br>}
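For contrast, this is roughly what the same pattern looks like in ISO C without nested functions. It is only a sketch: array_walk, update, array_increase_all and increase_all_demo are hypothetical names, and a plain array stands in for the tree so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical higher-order function: applies fn to every element and
   forwards an opaque data pointer that it does not interpret itself. */
static void array_walk(int *values, size_t n,
                       void (*fn)(int *value, void *data), void *data)
{
    for (size_t i = 0; i < n; i++)
        fn(&values[i], data);
}

/* The kernel has to live at file scope and recover its state from void *. */
static void update(int *value, void *data)
{
    const int *increment = data;
    *value += *increment;
}

static void array_increase_all(int *values, size_t n, int increment)
{
    /* The local state is passed by hand, with no type checking. */
    array_walk(values, n, update, &increment);
}

/* Usage sketch: every element grows by the same increment. */
int increase_all_demo(void)
{
    int values[3] = { 1, 2, 3 };
    array_increase_all(values, 3, 10);
    assert(values[0] == 11 && values[2] == 13);
    return values[1];
}
```

Every higher-order function needs the extra void pointer parameter, and every kernel has to cast it back to the right type. This is the bookkeeping that a nested function makes unnecessary.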
For very simple nested functions, it also makes sense to use a lambda<br>expression, i.e. an anonymous nested function.
void tree_increase_all(tree(int) *t, int increment)<br>{<br>	tree_walk(t, (void(int *value)){
		(*value) += increment;<br>	});<br>}
How does this work?
The nested function receives a link to its environment in a special register,<br>the static chain register. It can then access the<br>variables of the parent function(s) via this pointer. But how does the<br>calling function know what to pass in the static chain register? The obvious<br>solution is to have this information contained in the function pointer,<br>which would be allowed by the C standard. But on most platforms the standard<br>application binary interface (ABI) does not allow this. There are different<br>solutions, as we will see below.
GCC supports nested functions as an extension (but not lambdas). C++ has<br>lambdas with a different syntax, but those are actually objects where<br>each instance has its own unique type. C++'s lambdas are therefore most<br>useful when used as arguments to template functions. To be able to pass<br>one to a non-template function, its unique type needs to be erased<br>by wrapping it in std::function. C++'s lambdas are<br>an ingenious design, but have various properties which make them a<br>terrible fit for C. But this is a story for another day.
The Problem with GCC's Nested Functions
When creating a pointer to a nested function to pass it to another function,<br>a program compiled with GCC creates a small piece of code at runtime, a so-called<br>trampoline, based on an idea from the eighties (Breuel et al.). These trampolines<br>were traditionally placed on the stack, which then has to be executable. For<br>security reasons, it is commonly agreed that an executable stack should be avoided.<br>An alternative solution is to create the trampoline somewhere else, e.g. in memory<br>allocated on the heap. In fact, GCC now supports this using the flag<br>-ftrampoline-impl=heap. Unfortunately, this is computationally more<br>expensive and it can also lead to memory leaks when the cleanup path is skipped<br>via longjmp. It therefore makes sense to look for alternative<br>solutions.
Wide Pointers and Descriptors
Instead of a trampoline, one can use a special function pointer type, which is<br>essentially what C++ does with std::function and also what<br>Apple's Blocks language extension does using its special syntax.<br>There are essentially two ways this can work. In the first, the new function pointer type<br>is a wide pointer that consists of two parts: a pointer to the machine code of<br>the function, just as in a regular function pointer, and another pointer that<br>points to the environment of the function, i.e. the static chain pointer. When<br>the function is called, the value for the static chain is extracted from<br>the wide pointer and loaded into the static chain register (r10 on x86_64)<br>before the code pointer is loaded and the function is called.
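The mechanics of a wide pointer can be modelled in portable C. This is only a sketch under assumptions: all names (wide_fn, counter_env, add_to_total, wide_call, wide_pointer_demo) are hypothetical, and the static chain is passed as an explicit first argument, whereas a real implementation would load it into the static chain register.

```c
#include <assert.h>

/* Two-word "wide pointer": code address plus environment (static chain). */
struct wide_fn {
    int (*code)(void *chain, int arg);
    void *chain;
};

/* Environment of the enclosing function, laid out by hand. */
struct counter_env {
    int total;
};

/* The "nested" function, with the static chain as an explicit parameter. */
static int add_to_total(void *chain, int arg)
{
    struct counter_env *env = chain;
    env->total += arg;
    return env->total;
}

/* Calling through the wide pointer: take the chain, then call the code. */
static int wide_call(struct wide_fn f, int arg)
{
    return f.code(f.chain, arg);
}

int wide_pointer_demo(void)
{
    struct counter_env env = { .total = 0 };
    struct wide_fn f = { .code = add_to_total, .chain = &env };

    assert(wide_call(f, 5) == 5);
    return wide_call(f, 7);   /* env.total is now 12 */
}
```

The wide pointer is passed by value and needs no storage of its own beyond the two words, but it is twice the size of a regular function pointer.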
An alternative is to build a function descriptor that contains the code pointer<br>and the environment pointer, and then construct a regular pointer to this descriptor.<br>A major advantage is that the pointer has the same size as any regular pointer.<br>A disadvantage is that calling the function requires a second indirection<br>to first load code and data pointers from the descriptor. Another disadvantage<br>is that converting a regular function pointer then requires the construction<br>of a descriptor, which needs to be stored somewhere, while it could be<br>converted directly to a wide pointer simply by setting the static chain to<br>NULL.
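The descriptor variant can be modelled in the same way. Again this is a sketch with hypothetical names (fn_desc, fn_handle, desc_call, scale, negate, descriptor_demo), and the static chain is an explicit parameter here. The function negate stands in for a regular function: in a real ABI it would simply ignore the static chain register, which is modelled by an unused parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Function descriptor: code pointer plus environment pointer. */
struct fn_desc {
    int (*code)(void *chain, int arg);
    void *chain;
};

/* What gets passed around is an ordinary pointer to the descriptor. */
typedef const struct fn_desc *fn_handle;

static int desc_call(fn_handle h, int arg)
{
    /* Second indirection: code and chain are loaded from the descriptor. */
    return h->code(h->chain, arg);
}

struct scale_env {
    int factor;
};

/* "Nested" function that reads its environment through the chain. */
static int scale(void *chain, int arg)
{
    const struct scale_env *env = chain;
    return env->factor * arg;
}

/* Stand-in for a regular function without environment. */
static int negate(void *chain, int arg)
{
    (void)chain;
    return -arg;
}

int descriptor_demo(void)
{
    struct scale_env env = { .factor = 3 };

    /* Each descriptor needs storage that outlives all calls through its
       handle; here it lives in the caller's stack frame. */
    struct fn_desc scale_desc = { scale, &env };
    struct fn_desc negate_desc = { negate, NULL };

    fn_handle h1 = &scale_desc;
    fn_handle h2 = &negate_desc;

    assert(desc_call(h2, 2) == -2);
    return desc_call(h1, 4);   /* 3 * 4 */
}
```

Note that even the regular function needs its own descriptor object before it can be passed as a handle, which is the storage problem described above.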
Preliminary Implementation in GCC
Defining a wide pointer type for a function R(A) is not difficult.
#define wide(R, A) \<br>struct wideptr_##R##_##A { \<br>	void *chain; \<br>	R (*code)(A); \<br>}
Given such a wide pointer, we can then call this function using a<br>built-in function __builtin_call_with_static_chain that GCC<br>already provides to support other languages that make use of this ABI.
#define CALL(ptr, arg) \<br>	__builtin_call_with_static_chain((ptr).code(arg), (ptr).chain)
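As a small usage sketch, the two macros can already be tried with a regular function, whose wide pointer simply has a NULL static chain. The macros are repeated here with parenthesised arguments so that the block compiles on its own; twice and call_regular_via_wide are hypothetical names, and the example assumes a compiler that provides __builtin_call_with_static_chain.

```c
#include <assert.h>
#include <stddef.h>

/* Wide pointer type for a function R(A); R and A must be single tokens
   because they are pasted into the struct tag. */
#define wide(R, A) \
struct wideptr_##R##_##A { \
    void *chain; \
    R (*code)(A); \
}

/* Call the code pointer with the stored value as static chain. */
#define CALL(ptr, arg) \
    __builtin_call_with_static_chain((ptr).code(arg), (ptr).chain)

/* A regular function: it has no environment and ignores the static chain. */
static int twice(int x)
{
    return 2 * x;
}

int call_regular_via_wide(void)
{
    /* Converting a regular function: set the static chain to NULL. */
    wide(int, int) w = { .chain = NULL, .code = twice };

    int result = CALL(w, 21);
    assert(result == 42);
    return result;
}
```

This covers only the calling side. Filling in the chain and code fields for an actual nested function is the part that has no built-in support.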
But where do we get the two pointers when constructing the wideptr? Surprisingly,<br>there is no built-in function for this. In the past, I used a grotesque hack<br>to read the two pointers from the trampoline that GCC has created. But this<br>still required the creation of the trampoline even though it is then never used.<br>The technique is...