Cheney on the MTA (1994)

Cheney on the M.T.A.

CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.[1]

DRAFT for comp.lang.scheme.c Feb. 4, 1994 ACM Sigplan Notices 30, 9 (Sept. 1995), 17-20.

Henry G. Baker

Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436

(818) 501-4956 (818) 986-1360 (FAX)

Abstract

Previous Schemes for implementing full tail-recursion when compiling into C have required some form of "trampoline" to pop the stack. We propose solving the tail-recursion problem in the same manner as Standard ML of New Jersey, by allocating all frames in the (garbage-collected) heap. The Scheme program is translated into continuation-passing style, so the target C functions never return. The C stack pointer then becomes the allocation pointer for a Cheney-style copying garbage collection scheme. Our Scheme can use C function calls, C arguments, C variable-arity functions, and separate compilation without requiring complex block-compilation of entire programs.

Introduction

IEEE Scheme [IEEE90] requires that all functions be properly tail-recursive, in order that tail-recursive programs not require an unbounded amount of stack space. Several Scheme compilers have targeted the C language [Bartlett89], because C is an efficient systems programming language which is available on nearly every computer, thus ensuring portability of the compiled code. Unfortunately, C compilers are not required to be properly tail-recursive, and only the GNU C compilers [Stallman90] attempt to achieve tail recursion in their implementations.

A popular method for achieving proper tail recursion in a non-tail-recursive C implementation is a trampoline.[2] A trampoline is an outer function which iteratively calls an inner function. The inner function returns the address of another function to call, and the outer function then calls this new function. In other words, when an inner function wishes to call another inner function tail-recursively, it returns the address of the function it wants to call back to the trampoline, which then calls the returned function. By returning before calling, the stack is first popped so that it does not grow without bound on a simple iteration. Unfortunately, the cost of such a trampoline function call is 2-3 times slower than a normal C call, and it requires that arguments be passed in global variables

[Tarditi92].

Appel's unpublished suggestion for achieving proper tail recursion in C uses a much larger fixed-size stack, continuation-passing style, and also does not put any arguments or data on the C stack. When the stack is about to overflow, the address of the next function to call is longjmp'ed (or return'ed) to a trampoline. Appel's method avoids making a large number of small trampoline bounces by occasionally jumping off the Empire State Building.

The Proposed Scheme

We propose to compile Scheme by converting it into continuation-passing style (CPS), and then compile the resulting lambda expressions into individual C functions. Arguments are passed as normal C arguments, and function calls are normal C calls. Continuation closures and closure environments are passed as extra C arguments. (Of course, calls to closures perform a C call on the code portion of the closure, and pass the environment portion of the closure as an additional argument.) Such a Scheme never executes a C return, so the stack will grow and grow.

Since the C "stack" never contracts, we can allocate all of our closures and user data structures on this stack as automatic/dynamic/local data. All closures and user data structures whose sizes are known at compile time are statically allocated in the C "stack" frame; dynamic arrays and other data structures whose size is unknown at compile time can be allocated by C's alloca primitive (or equivalent), which also obtains space from the "stack".[3] Since our C "stack" is also the "heap", there is no distinction between stack and heap allocation.

Since none of our C functions ever returns, the only live frame on this "stack" is the top one. However, within many of the dead frames will be found live closures and live user data objects. Eventually, the C "stack" will overflow the space assigned to it, and we must perform garbage collection. Garbage collection (GC) by copying is a relatively straight-forward process. There are a number of static roots, as well as the latest continuation closure, which is passed to the GC as an argument. (Forming an explicit continuation closure for the GC avoids the necessity of scanning C stack frames.) The live objects and live closures are all copied (and thereby condensed) into another area, so that execution can be restarted with a "stack" frame at the beginning of the C "stack" allocation area.

A key point is that since only live objects are traced--i.e., garbage (including the C frames, which are all dead) is not traced--the GC does not have to know the...

Cheney on the MTA (1994)

Related Articles

Amazon, Facebook, FBI have access to a private intelligence-sharing network

SpaceX not the behemoth everyone thought

The Mirror Is Part of the Machine

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits