Cheney on the M.T.A.
CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.[1]
DRAFT for comp.lang.scheme.c Feb. 4, 1994<br>ACM Sigplan Notices 30, 9 (Sept. 1995), 17-20.
Henry G. Baker
Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436
(818) 501-4956 (818) 986-1360 (FAX)
Copyright (c) 1994 by Nimble Computer Corporation. All rights reserved.
Abstract
Previous Schemes for implementing full tail-recursion when compiling into C<br>have required some form of "trampoline" to pop the stack. We propose solving<br>the tail-recursion problem in the same manner as Standard ML of New Jersey, by<br>allocating all frames in the (garbage-collected) heap. The Scheme program is<br>translated into continuation-passing style, so the target C functions never<br>return. The C stack pointer then becomes the allocation pointer for a Cheney-style<br>copying garbage collection scheme. Our Scheme can use C function calls,<br>C arguments, C variable-arity functions, and separate compilation without<br>requiring complex block-compilation of entire programs.
Introduction
IEEE Scheme [IEEE90] requires that all functions be properly tail-recursive,<br>in order that tail-recursive programs not require an unbounded amount of stack<br>space. Several Scheme compilers have targeted the C language [Bartlett89],<br>because C is an efficient systems programming language which is available on<br>nearly every computer, thus ensuring portability of the compiled code.<br>Unfortunately, C compilers are not required to be properly tail-recursive, and<br>only the GNU C compilers [Stallman90] attempt to achieve tail recursion in<br>their implementations.
A popular method for achieving proper tail recursion in a non-tail-recursive C<br>implementation is a trampoline.[2] A trampoline is an outer function which<br>iteratively calls an inner function. The inner function returns the address<br>of another function to call, and the outer function then calls this new<br>function. In other words, when an inner function wishes to call another inner<br>function tail-recursively, it returns the address of the function it wants to<br>call back to the trampoline, which then calls the returned function. By<br>returning before calling, the stack is first popped so that it does not grow<br>without bound on a simple iteration. Unfortunately, the cost of such a<br>trampoline function call is 2-3 times slower than a normal C call, and it<br>requires that arguments be passed in global variables
[Tarditi92].
Appel's unpublished suggestion for achieving proper tail recursion in C uses a<br>much larger fixed-size stack, continuation-passing style, and also does not<br>put any arguments or data on the C stack. When the stack is about to<br>overflow, the address of the next function to call is longjmp'ed (or<br>return'ed) to a trampoline. Appel's method avoids making a large number of<br>small trampoline bounces by occasionally jumping off the Empire State<br>Building.
The Proposed Scheme
We propose to compile Scheme by converting it into continuation-passing style<br>(CPS), and then compile the resulting lambda expressions into individual C<br>functions. Arguments are passed as normal C arguments, and function calls are<br>normal C calls. Continuation closures and closure environments are passed as<br>extra C arguments. (Of course, calls to closures perform a C call on the code<br>portion of the closure, and pass the environment portion of the closure as an<br>additional argument.) Such a Scheme never executes a C return, so the stack<br>will grow and grow.
Since the C "stack" never contracts, we can allocate all of our closures and<br>user data structures on this stack as automatic/dynamic/local data. All<br>closures and user data structures whose sizes are known at compile time are<br>statically allocated in the C "stack" frame; dynamic arrays and other data<br>structures whose size is unknown at compile time can be allocated by C's<br>alloca primitive (or equivalent), which also obtains space from the "stack".[3]<br>Since our C "stack" is also the "heap", there is no distinction between stack<br>and heap allocation.
Since none of our C functions ever returns, the only live frame on this<br>"stack" is the top one. However, within many of the dead frames will be found<br>live closures and live user data objects. Eventually, the C "stack" will<br>overflow the space assigned to it, and we must perform garbage collection.<br>Garbage collection (GC) by copying is a relatively straight-forward process.<br>There are a number of static roots, as well as the latest continuation<br>closure, which is passed to the GC as an argument. (Forming an explicit<br>continuation closure for the GC avoids the necessity of scanning C stack<br>frames.) The live objects and live closures are all copied (and thereby<br>condensed) into another area, so that execution can be restarted with a<br>"stack" frame at the beginning of the C "stack" allocation area.
A key point is that since only live objects are traced--i.e., garbage<br>(including the C frames, which are all dead) is not traced--the GC does not<br>have to know the...