Jane Street Blog - Using OxCaml to implement type-safe reference counting between OCaml and Python
Jun 15, 2026 | 14 min read
By: Nicolas Trangez
Jane Street is known for being an OCaml shop, but for years now Python has been our second major programming language, acting as the primary tool for data analysis and (especially importantly these days) machine learning. Most of our traders and researchers think and write in Python, even as the majority of our infrastructure is written in OCaml.
So it’s been important to support a bridge between these two languages. For that we developed PyOCaml, which lets authors expose a Python interface to their OCaml library. The trouble is that sometimes things fall off the bridge: in particular, when you represent a Python object as an OCaml value, the interactions between the different languages’ memory management systems can lead to object deallocations getting materially delayed. For simple, small-scale data types this is no big deal, but for programs working with huge data frames or scarce resources like GPU memory, it’s a real problem.
We ended up developing a solution that relied on some nifty features of OxCaml, a set of language extensions for OCaml intended to support high-performance programs and data-race free parallelism. These features have allowed us to encode prompt deallocation in a typesafe way. When PyOCaml library authors use these new features, the compiler can actually statically guarantee that Python programs written against them won’t have those promptness problems. This is a big win: in the old world, it was theoretically possible to write Python that avoided losing track of objects, but it required an impractical level of care and expertise. Now such offending programs are impossible to write by construction.
To understand how that works, it helps first to know how Python objects are allocated and GC’d; how they’re represented in OCaml; and how we can borrow the idea of “borrowing” to implement explicit and typesafe reference counting between the two.
A primer on Python objects and their lifecycle
In Python, every object is allocated in memory as a structure with a type-specific layout. The first fields of that structure are common to all object types and include information such as the object’s type and a reference count.
Unlike OCaml, whose garbage collector scans and moves objects, Python manages objects by reference counting. A freshly created object has reference count 1; when a new reference to the object is created (e.g., the object gets stored in a list), the count is incremented; when a reference goes away (e.g., a variable goes out of scope), the count is decremented. Once the count reaches 0, the object is deallocated:
my_list = []                  # Refcount of my_list is 1
my_dict = {}
my_dict["my_list"] = my_list  # Refcount of my_list is 2
del my_list                   # Refcount of my_list is 1
del my_dict["my_list"]        # Refcount of my_list went to 0, deallocated
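The counts in those comments can be checked approximately from Python itself. The sketch below assumes CPython and uses `sys.getrefcount`; because that function's own argument (and interpreter internals) can shift the absolute number, it records only differences from a baseline. The function name `refcount_deltas` and the `alias` variable are made up for illustration.

```python
import sys


def refcount_deltas():
    """Return how far the list's refcount sits above its starting value."""
    my_list = []
    baseline = sys.getrefcount(my_list)  # absolute value is version-dependent
    deltas = []

    my_dict = {"my_list": my_list}       # the dict now holds a reference
    deltas.append(sys.getrefcount(my_list) - baseline)

    alias = my_list                      # a second name is another reference
    deltas.append(sys.getrefcount(my_list) - baseline)

    del alias                            # drop the extra name
    del my_dict["my_list"]               # drop the dict's reference
    deltas.append(sys.getrefcount(my_list) - baseline)

    return deltas
```

Comparing against a baseline rather than asserting absolute counts keeps the sketch meaningful across interpreter versions.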
Borrowing and stealing
In Python, when you pass an argument into a function, there are two ways to ensure that its reference count is managed correctly. One is called “borrowing”: a function that borrows a reference to an object doesn’t increment the object’s reference count. For example, an argument object can be borrowed from the caller for the duration of a function call, as long as the borrowed reference doesn’t outlive the call:
def g():
    obj = object()  # We just made a new object.
    # Exactly one name (`obj`) points at it. Refcount = 1.

    res = f(obj)  # We call f, passing the object in.

def f(arg):
    # arg is just another name for the same object `obj`.
    # Counter is still 1, NOT 2.
    # That's "borrowing": passing into a function
    # does not increase the count.

    res = [arg]
    # Now we put it inside a list.
    # The list is a new, persistent container that points at the object.
    # That DOES bump the count: 1 -> 2.

    return res
Note that the code above merely illustrates the concept and isn’t literally accurate: the actual interpreter does things slightly differently, and the details even depend on the exact version. The description applies only to functions that are not implemented in pure Python, but are instead exposed to Python by an extension module written in, say, C.
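Given that caveat, one thing that can still be observed from pure Python is the relative effect of storing the argument. The sketch below assumes CPython and deliberately measures only the difference inside `f`, not the absolute count, since the latter depends on how a given interpreter version passes arguments. The tuple return values are an addition for illustration.

```python
import sys


def f(arg):
    before = sys.getrefcount(arg)  # whatever the interpreter holds right now
    res = [arg]                    # the list takes its own reference
    after = sys.getrefcount(arg)
    return res, after - before


def g():
    obj = object()
    res, delta = f(obj)
    # The list outlives the call and still points at the same object.
    return delta, res[0] is obj
```

Storing the argument in a container that outlives the call is exactly the point where a borrowed reference is no longer enough and a new reference has to be counted.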
Suppose g continued:
def g():
    obj = object()
    res = f(obj)  # res is the list. The list still points at obj.
    # Counter on obj is 2: one from `obj`, one from the list.

    del obj  # Drop the name `obj`. Counter: 2 -> 1.
    # The list still has it.

    del res  # Drop the list. The list is freed,
    # which releases its reference to obj.
    # Counter: 1 -> 0. obj is freed.
This is safe: we know inside f that the reference count of arg will not go to 0, because the caller g still holds a reference that’s valid at least until f returns.
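The lifetime described here can be watched with a weak reference, which observes an object without keeping it alive. This is a minimal sketch assuming CPython, where deallocation happens as soon as the count reaches 0; `Resource` is a hypothetical stand-in class (a bare `object()` cannot be weakly referenced), and `g` returns a timeline purely for illustration.

```python
import weakref


class Resource:
    """Hypothetical stand-in for an expensive object, e.g. a large buffer."""


def f(arg):
    # The caller's reference keeps arg alive for the whole call.
    return [arg]


def g():
    timeline = []
    obj = Resource()
    probe = weakref.ref(obj)              # does not affect the refcount

    res = f(obj)
    timeline.append(probe() is not None)  # alive: held by `obj` and the list

    del obj
    timeline.append(probe() is not None)  # alive: the list still holds it

    del res
    timeline.append(probe() is not None)  # gone: last reference was dropped
    return timeline
```

On interpreters without immediate reference counting, the last entry could differ, since deallocation may be deferred there.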
Meanwhile, some APIs...