Cross-Language Data Types - ekxide Blog | ekxide
Cross-Language Data Types
Andreas Weis - 15/06/2026

When using different programming languages like C++ and Rust in the same project, one problem that always comes up is how to share data across language boundaries.

In this article we will explore some of the options for sharing data between C++ and Rust code. Our working assumption is that we want to share data without copying it, so that large data sets can be shared efficiently.

Memory Representation

The first step in allowing such forms of data sharing is to ensure that our data type of choice can actually be represented in both C++ and Rust.

Usually, this boils down to restricting ourselves to the data types that are available in C: the elementary types for signed and unsigned integers, pointers, and floating-point numbers, plus the ability to build compound array and struct types from those elementary types. C is the de facto lingua franca when it comes to interoperability between programming languages, so whenever we want to ensure that data can be passed across language boundaries, we fall back to what is representable in C.

In C++, thanks to its backward compatibility with C, struct types follow the same memory layout as in C, as long as they don't use any C++-only language features that affect the layout. The C++ standard uses the term standard-layout class for such types. The C++ standard library also provides the std::is_standard_layout type trait to check whether a type upholds these constraints.

```cpp
#include <cstdint>
#include <type_traits>

// A point in 3D space
struct Point3 {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;
};

// Fails to compile if Point3 ever stops being a standard-layout type
static_assert(std::is_standard_layout_v<Point3>);
```
By default, Rust reserves much more freedom in choosing the exact layout of its data types. However, it provides the C representation, selected with #[repr(C)], which forces a type to use a memory layout that is compatible with C.

```rust
// A point in 3D space
#[repr(C)]
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}
```
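The C++ and Rust declarations are maintained by hand, and neither compiler can see the other one. One possible safeguard, offered here as a suggestion and not as a mechanism that links the two sides, is to pin the expected size, alignment and field offsets with compile-time assertions. The sketch below assumes Rust 1.77 or later, where `std::mem::offset_of!` is stable. It also assumes a target on which `i32` has 4-byte alignment, which holds on common platforms.

```rust
// A point in 3D space, using the C-compatible layout.
#[repr(C)]
pub struct Point3 {
    pub x: i32,
    pub y: i32,
    pub z: i32,
}

// Compile-time layout checks. If a field is reordered, retyped or removed,
// this constant fails to evaluate and the build breaks, instead of the
// mismatch silently corrupting data shared with C++.
const _: () = {
    assert!(std::mem::size_of::<Point3>() == 12);
    assert!(std::mem::align_of::<Point3>() == 4);
    assert!(std::mem::offset_of!(Point3, x) == 0);
    assert!(std::mem::offset_of!(Point3, y) == 4);
    assert!(std::mem::offset_of!(Point3, z) == 8);
};
```

On the C++ side, the same numbers can be checked with static_assert(sizeof(Point3) == 12) and static_assert(offsetof(Point3, z) == 8), where offsetof comes from <cstddef>. If either declaration drifts away from the agreed values, at least one of the two builds fails.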
Each language provides mechanisms for ensuring that a declared type uses a C-compatible memory layout. However, neither language has a built-in way to ensure that the C++ declaration and the Rust declaration are compatible with each other. We must be extra careful to keep the two declarations consistent. The reward is data types that have exactly the same layout in both languages. We can then send the raw bits from Rust to C++ (or vice versa), and the other language can access them directly with the same meaning.

Preserving Type Invariants

As long as our shared data is nothing more than a soup of numerical values, ensuring a consistent memory layout is all we need. For more complex data types, additional concerns may arise, in particular if a type relies on complex invariants regarding its state.

The valid values for a member of a type are often constrained, potentially depending on the values of other fields. For example, consider the following type representing rational numbers:

```rust
#[repr(C)]
struct Rational {
    numerator: i32,
    denominator: i32,
}
```
```cpp
#include <cstdint>

struct Rational {
    std::int32_t numerator;
    std::int32_t denominator;
};
```
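The following minimal Rust sketch shows what encapsulation can look like for the Rational type. It assumes two invariants: the denominator is never zero, and the fraction is always kept in reduced form. Several details are our own illustrative assumptions and do not come from an existing API: the normalization to a positive denominator, the hypothetical method names new, set, numerator and denominator, and the choice to report invalid input with None or false. Because the fields are private, code outside the defining module can only change a Rational through these checked operations.

```rust
use std::convert::TryFrom;

/// A rational number whose invariants are enforced through encapsulation:
/// the denominator is never zero, and the fraction is kept reduced with a
/// positive denominator (the sign normalization is an assumption of this sketch).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rational {
    // Private fields: code outside this module cannot break the invariants.
    numerator: i32,
    denominator: i32,
}

/// Greatest common divisor via Euclid's algorithm (inputs are non-negative).
fn gcd(mut a: i64, mut b: i64) -> i64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

impl Rational {
    /// Creates a reduced fraction. Returns None if `denominator` is 0 or if
    /// the reduced value does not fit into i32 (e.g. i32::MIN / -1).
    pub fn new(numerator: i32, denominator: i32) -> Option<Self> {
        if denominator == 0 {
            return None;
        }
        // Work in i64 so that negating i32::MIN cannot overflow.
        let (mut n, mut d) = (numerator as i64, denominator as i64);
        if d < 0 {
            n = -n;
            d = -d;
        }
        // d > 0 here, so g > 0 and the divisions below are safe.
        let g = gcd(n.abs(), d);
        Some(Rational {
            numerator: i32::try_from(n / g).ok()?,
            denominator: i32::try_from(d / g).ok()?,
        })
    }

    pub fn numerator(&self) -> i32 {
        self.numerator
    }

    pub fn denominator(&self) -> i32 {
        self.denominator
    }

    /// Setter that upholds the invariants: it rejects a zero denominator and
    /// re-reduces the fraction. On failure, `self` is left unchanged.
    pub fn set(&mut self, numerator: i32, denominator: i32) -> bool {
        match Rational::new(numerator, denominator) {
            Some(r) => {
                *self = r;
                true
            }
            None => false,
        }
    }
}
```

For example, Rational::new(6, -4) yields -3/2, and calling set(5, 0) on an existing value returns false and leaves it untouched. The #[repr(C)] layout still allows the raw bits to be shared with C++. However, this encapsulation only protects the Rust side. C++ code that receives the plain struct can still write the fields directly.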
The denominator must not be 0. Also, if the fraction is stored in reduced form, every change to one of the fields potentially requires a change to the other to maintain that form. Violating these constraints may result in a value that is no longer valid for this type.

Encapsulation addresses such problems well. It requires a set of operations to be shipped alongside the data. The data is not accessed directly. Instead, it is accessed only through the operations defined on the type, which in turn have been carefully designed (and tested) to uphold all type invariants. In the example above, a setter for the fraction could reject a denominator of 0 and take care of properly reducing the fraction when writing the fields.

For complex types, it is therefore not sufficient to ensure a consistent memory layout; we must also ensure that the surrounding program logic operating on such data is consistent across languages. There are generally two ways to address this.

Language Bindings

Instead of just sharing the data layout between languages, this approach shares code as well. We implement the methods that act on the underlying data once, in our programming language of choice, and then provide bindings for all the other languages that allow invoking these functions. Internally, these bindings use the foreign function interface (FFI) of the respective language.

The obvious advantage of this approach is that it is very easy to enforce consistency between implementations, as there is only a single implementation of the core logic interacting with the data. That single implementation also carries a large part of the maintenance and testing burden.

The downside of this approach is that the complete interface of the type will have to fit through the...