Understanding the Garbage Collector · OCaml Documentation
Introduction
Values and Functions
Basic Data Types and Pattern Matching
Loops and Recursions
Lists
Higher Order Functions
Labelled and Optional Arguments
Mutability and Imperative Control Flow
Module System
Modules
Functors
First-Class Modules
Libraries With Dune
Data Structures
Options
Arrays
Maps
Sets
Hash Tables
Sequences
CS3110
Memoization
CS3110
Monads
Advanced Topics
Preprocessors and PPXs
Operators
Objects
Runtime & Compiler
RWO
Memory Representation of Values
RWO
Understanding the Garbage Collector
RWO
Compiler Frontend
RWO
Compiler Backend
Understanding the Garbage Collector
This is an adaptation of the chapter Understanding the Garbage Collector from the book Real World OCaml, reproduced here with permission.
This chapter includes contributions from Stephen Weeks and Sadiq Jaffer.
We've described the runtime format of individual OCaml variables earlier, in<br>Memory Representation Of Values.<br>When you execute your program, OCaml manages the lifecycle of these variables<br>by regularly scanning allocated values and freeing them when they're no<br>longer needed. This in turn means that your applications don't need to<br>manually implement memory management, and it greatly reduces the likelihood<br>of memory leaks creeping into your code.
The OCaml runtime is a C library that provides routines that can be called<br>from running OCaml programs. The runtime manages a heap, which is a<br>collection of memory regions that it obtains from the operating system. The<br>runtime uses this memory to hold heap blocks that it fills up with OCaml<br>values in response to allocation requests by the OCaml program.
Mark and Sweep Garbage Collection
When there isn't enough memory available to satisfy an allocation request<br>from the pool of allocated heap blocks, the runtime system invokes the<br>garbage collector (GC). An OCaml program can't explicitly free a value when<br>it is done with it. Instead, the GC regularly determines which values are<br>live and which values are dead, i.e., no longer in use. Dead values are<br>collected and their memory made available for reuse by the application.
The GC doesn't keep constant track of values as they are allocated and used.<br>Instead, it regularly scans them by starting from a set of root values that<br>the application always has access to (such as the stack). The GC maintains a<br>directed graph in which heap blocks are nodes, and there is an edge from heap<br>block b1 to heap block b2 if some field of b1 is a pointer to b2.
All blocks reachable from the roots by following edges in the graph must be<br>retained, and unreachable blocks can be reused by the application. The<br>algorithm used by OCaml to perform this heap traversal is commonly known as<br>mark and sweep garbage collection, and we'll explain it further now.
Generational Garbage Collection
The usual OCaml programming style involves allocating many small<br>values that are used for a short period of time and then never<br>accessed again. OCaml takes advantage of this fact to improve<br>performance by using a generational GC.
A generational GC maintains separate memory regions to hold blocks based on<br>how long the blocks have been live. OCaml's heap is split into two such<br>regions:
A small, fixed-size minor heap where most blocks are initially allocated
A larger, variable-size major heap for blocks that have been live longer
A typical functional programming style means that young blocks tend to<br>die young and old blocks tend to stay around for longer than young<br>ones. This is often referred to as the generational<br>hypothesis.
OCaml uses different memory layouts and garbage-collection algorithms for the<br>major and minor heaps to account for this generational difference. We'll<br>explain how they differ in more detail next.
The Gc Module and OCAMLRUNPARAM
OCaml provides several mechanisms to query and alter the behavior of<br>the runtime system. The Gc module provides this functionality from<br>within OCaml code, and we'll frequently refer to it in the rest of the<br>chapter. As with several other standard library modules, Core alters<br>the Gc interface from the standard OCaml library. We'll assume that<br>you've opened Core in our explanations.
You can also control the behavior of OCaml programs by setting the<br>OCAMLRUNPARAM environment variable before launching your application. This<br>lets you set GC parameters without recompiling, for example to benchmark the<br>effects of different settings. The format of OCAMLRUNPARAM is documented in<br>the<br>OCaml manual.
The Fast Minor Heap
The minor heap is where most of your short-lived values are held. It consists<br>of one contiguous chunk of virtual memory containing a sequence of OCaml<br>blocks. If there is space, allocating a new block is a fast, constant-time<br>operation that requires just a couple of CPU instructions.
To garbage-collect the minor heap, OCaml uses copying collection to<br>move all live blocks in the minor heap to the major heap. This...