Understanding the Garbage Collector

jkxyz1 pts0 comments

Understanding the Garbage Collector · OCaml Documentation

Introduction

Values and Functions

Basic Data Types and Pattern Matching

Loops and Recursions

Lists

Higher Order Functions

Labelled and Optional Arguments

Mutability and Imperative Control Flow

Module System

Modules

Functors

First-Class Modules

Libraries With Dune

Data Structures

Options

Arrays

Maps

Sets

Hash Tables

Sequences

CS3110

Memoization

CS3110

Monads

Advanced Topics

Preprocessors and PPXs

Operators

Objects

Runtime & Compiler

RWO

Memory Representation of Values

RWO

Understanding the Garbage Collector

RWO

Compiler Frontend

RWO

Compiler Backend

Understanding the Garbage Collector

This is an adaptation of the chapter Understanding the Garbage Collector from the book Real World OCaml, reproduced here with permission.

This chapter includes contributions from Stephen Weeks and Sadiq Jaffer.

We've described the runtime format of individual OCaml variables earlier, in<br>Memory Representation Of Values.<br>When you execute your program, OCaml manages the lifecycle of these variables<br>by regularly scanning allocated values and freeing them when they're no<br>longer needed. This in turn means that your applications don't need to<br>manually implement memory management, and it greatly reduces the likelihood<br>of memory leaks creeping into your code.

The OCaml runtime is a C library that provides routines that can be called<br>from running OCaml programs. The runtime manages a heap, which is a<br>collection of memory regions that it obtains from the operating system. The<br>runtime uses this memory to hold heap blocks that it fills up with OCaml<br>values in response to allocation requests by the OCaml program.

Mark and Sweep Garbage Collection

When there isn't enough memory available to satisfy an allocation request<br>from the pool of allocated heap blocks, the runtime system invokes the<br>garbage collector (GC). An OCaml program can't explicitly free a value when<br>it is done with it. Instead, the GC regularly determines which values are<br>live and which values are dead, i.e., no longer in use. Dead values are<br>collected and their memory made available for reuse by the application.

The GC doesn't keep constant track of values as they are allocated and used.<br>Instead, it regularly scans them by starting from a set of root values that<br>the application always has access to (such as the stack). The GC maintains a<br>directed graph in which heap blocks are nodes, and there is an edge from heap<br>block b1 to heap block b2 if some field of b1 is a pointer to b2.

All blocks reachable from the roots by following edges in the graph must be<br>retained, and unreachable blocks can be reused by the application. The<br>algorithm used by OCaml to perform this heap traversal is commonly known as<br>mark and sweep garbage collection, and we'll explain it further now.

Generational Garbage Collection

The usual OCaml programming style involves allocating many small<br>values that are used for a short period of time and then never<br>accessed again. OCaml takes advantage of this fact to improve<br>performance by using a generational GC.

A generational GC maintains separate memory regions to hold blocks based on<br>how long the blocks have been live. OCaml's heap is split into two such<br>regions:

A small, fixed-size minor heap where most blocks are initially allocated

A larger, variable-size major heap for blocks that have been live longer

A typical functional programming style means that young blocks tend to<br>die young and old blocks tend to stay around for longer than young<br>ones. This is often referred to as the generational<br>hypothesis.

OCaml uses different memory layouts and garbage-collection algorithms for the<br>major and minor heaps to account for this generational difference. We'll<br>explain how they differ in more detail next.

The Gc Module and OCAMLRUNPARAM

OCaml provides several mechanisms to query and alter the behavior of<br>the runtime system. The Gc module provides this functionality from<br>within OCaml code, and we'll frequently refer to it in the rest of the<br>chapter. As with several other standard library modules, Core alters<br>the Gc interface from the standard OCaml library. We'll assume that<br>you've opened Core in our explanations.

You can also control the behavior of OCaml programs by setting the<br>OCAMLRUNPARAM environment variable before launching your application. This<br>lets you set GC parameters without recompiling, for example to benchmark the<br>effects of different settings. The format of OCAMLRUNPARAM is documented in<br>the<br>OCaml manual.

The Fast Minor Heap

The minor heap is where most of your short-lived values are held. It consists<br>of one contiguous chunk of virtual memory containing a sequence of OCaml<br>blocks. If there is space, allocating a new block is a fast, constant-time<br>operation that requires just a couple of CPU instructions.

To garbage-collect the minor heap, OCaml uses copying collection to<br>move all live blocks in the minor heap to the major heap. This...

ocaml heap blocks values garbage memory

Related Articles