Zig structs of arrays | Andreas Hohmann
Zig structs of arrays
Andreas Hohmann May 04, 2024 #software #zig #data oriented programming #comptime
If you want to see the power of Zig's comptime, look no further than<br>MultiArrayList. This collection stores the data of a list of<br>structs in struct of array (SoA) rather than an array of structs (AoS). This<br>technique is a staple of data-oriented design and<br>array-oriented programming (long live APL) as used in<br>high-performance applications such as game engines, scientific computing, and<br>compilers. The latter is probably the reason why it already exists in Zig's<br>standard library (see Andrew Kelley's Practical Data-Oriented<br>Design presentation).
Generating a struct-of-arrays type for a struct is a non-trivial type<br>manipulation that I expect from type-oriented languages such as TypeScript or<br>Scala, but not from a low-level system programming language such as Zig. How<br>does Zig pull this off? It turns out that it's "just" compile-time execution<br>and reflection.
Types in Zig are compile time values. They can be assigned to constants, passed<br>as arguments to (compile-time) functions, and returned by those functions. This<br>is already visible in the syntax for type definitions. A named struct type, for<br>example, is defined by initializing a constant with an anonymous struct<br>declaration.
1const Token = struct {<br>2 kind: enum { id, string, number },<br>3 data: []const u8,<br>4};<br>6test "token size" {<br>7 try testing.expectEqual(24, @sizeOf(Token));<br>8 try testing.expectEqual(2400, @sizeOf([100]Token));<br>9}
As the test shows, this structure needs 24 bytes on my 64 bit machine, 2*8<br>bytes for the data slice (pointer and length) and 8 bytes for the kind enum<br>and alignment. An array of 100 Token structs needs 2400 bytes.
The MultiArrayList cuts the memory needed down to 1700 bytes:<br>1600 bytes for the data slices and 100 bytes for the kinds. The test below<br>demonstrates this by using a fixed buffer allocator with exactly 1704 bytes (I<br>don't know why the 4 extra bytes are needed).
1const TokenList = std.MultiArrayList(Token);<br>3test "token lists" {<br>4 var buffer: [1704]u8 = undefined;<br>5 var fba = std.heap.FixedBufferAllocator.init(&buffer);<br>6 const allocator = fba.allocator();<br>8 var tokens = TokenList{};<br>9 try tokens.setCapacity(allocator, 100);<br>10 defer tokens.deinit(allocator);<br>11<br>12 tokens.appendAssumeCapacity(.{ .kind = .number, .data = "1000" });<br>13 try testing.expectEqual(Token{ .kind = .number, .data = "1000" }, tokens.get(0));<br>14}
How does this work under the hood? Types are passed to a function using a<br>compile-time (comptime) parameter of type type. This is the foundation for<br>Zig's generic types such as the collections in the standard library.<br>Here is a minimal version of an array list of fixed size (see<br>array_list.zig for the real thing).
1pub fn FixedArrayList(comptime T: type) type {<br>2 return struct {<br>3 const Self = @This();<br>5 items: []T,<br>6 allocator: Allocator,<br>8 pub fn init(allocator: Allocator, length: usize) Allocator.Error!Self {<br>9 return .{<br>10 .allocator = allocator,<br>11 .items = try allocator.alloc(T, length),<br>12 };<br>13 }<br>14<br>15 pub fn deinit(self: *Self) void {<br>16 self.allocator.free(self.items.ptr[0..self.items.len]);<br>17 }<br>18 };<br>19}<br>20<br>21test "allocates array list" {<br>22 const allocator = testing.allocator;<br>23<br>24 const n = 10;<br>25 const PointList = FixedArrayList(Point);<br>26 var points = try PointList.init(allocator, n);<br>27 defer points.deinit();<br>28<br>29 points.items[5] = .{ .x = 10, .y = 20 };<br>30 try testing.expectEqual(20, points.get(5).y);<br>31}
This is not a very useful structure as it only wraps a slice that one could use<br>in most cases directly, but it demonstrates a simple type generator function.<br>The type argument T is only used once for the type of the items slice, and<br>we don't need to know anything about the inner structure of T.
To construct the struct-of-arrays for a given struct, we have to go a step<br>further. Our type construction function must be able to look at the original<br>struct's fields and their types to construct the new struct-of-arrays type.<br>Zig offers a compile-time reflection API for this purpose.
To get a feel for this API, let's define a type constructor for points with<br>explicit coordinates x1, x2, x3, and so forth. Our type function takes<br>the type T of the coordinates and the dimension n and returns a struct<br>with one field for each of the dimensions.
1pub fn PointN(comptime T: type, comptime N: comptime_int) type {<br>2 var fields: [N]std.builtin.Type.StructField = undefined;<br>3 for (0..N) |i| {<br>4 var num_buf: [128]u8 = undefined;<br>5 fields[i] = .{<br>6 .name = std.fmt.bufPrintZ(&num_buf, "x{d}", .{i + 1}) catch unreachable,<br>7 .type = T,<br>8 .default_value = null,<br>9 .is_comptime = false,<br>10 .alignment = @alignOf(T),<br>11 };<br>12 }<br>13 return @Type(.{<br>14 .Struct = .{<br>15 .is_tuple = false,<br>16 .layout = .Auto,<br>17 .decls = &.{},<br>18 .fields = &fields,<br>19 },<br>20 });<br>21}<br>22<br>23test "n-dimensional point" {<br>24 const P3 = PointN(i32, 3);<br>25 const p = P3{ .x1 = 10, .x2 = 20, .x3...