Zig Structs of Arrays (2024)

Zig structs of arrays | Andreas Hohmann

Zig structs of arrays

Andreas Hohmann May 04, 2024 #software #zig #data oriented programming #comptime

If you want to see the power of Zig's comptime, look no further than MultiArrayList. This collection stores the data of a list of structs in struct of array (SoA) rather than an array of structs (AoS). This technique is a staple of data-oriented design and array-oriented programming (long live APL) as used in high-performance applications such as game engines, scientific computing, and compilers. The latter is probably the reason why it already exists in Zig's standard library (see Andrew Kelley's Practical Data-Oriented Design presentation).

Generating a struct-of-arrays type for a struct is a non-trivial type manipulation that I expect from type-oriented languages such as TypeScript or Scala, but not from a low-level system programming language such as Zig. How does Zig pull this off? It turns out that it's "just" compile-time execution and reflection.

Types in Zig are compile time values. They can be assigned to constants, passed as arguments to (compile-time) functions, and returned by those functions. This is already visible in the syntax for type definitions. A named struct type, for example, is defined by initializing a constant with an anonymous struct declaration.

1const Token = struct { 2 kind: enum { id, string, number }, 3 data: []const u8, 4}; 6test "token size" { 7 try testing.expectEqual(24, @sizeOf(Token)); 8 try testing.expectEqual(2400, @sizeOf([100]Token)); 9}

As the test shows, this structure needs 24 bytes on my 64 bit machine, 2*8 bytes for the data slice (pointer and length) and 8 bytes for the kind enum and alignment. An array of 100 Token structs needs 2400 bytes.

The MultiArrayList cuts the memory needed down to 1700 bytes: 1600 bytes for the data slices and 100 bytes for the kinds. The test below demonstrates this by using a fixed buffer allocator with exactly 1704 bytes (I don't know why the 4 extra bytes are needed).

1const TokenList = std.MultiArrayList(Token); 3test "token lists" { 4 var buffer: [1704]u8 = undefined; 5 var fba = std.heap.FixedBufferAllocator.init(&buffer); 6 const allocator = fba.allocator(); 8 var tokens = TokenList{}; 9 try tokens.setCapacity(allocator, 100); 10 defer tokens.deinit(allocator); 11 12 tokens.appendAssumeCapacity(.{ .kind = .number, .data = "1000" }); 13 try testing.expectEqual(Token{ .kind = .number, .data = "1000" }, tokens.get(0)); 14}

How does this work under the hood? Types are passed to a function using a compile-time (comptime) parameter of type type. This is the foundation for Zig's generic types such as the collections in the standard library. Here is a minimal version of an array list of fixed size (see array_list.zig for the real thing).

1pub fn FixedArrayList(comptime T: type) type { 2 return struct { 3 const Self = @This(); 5 items: []T, 6 allocator: Allocator, 8 pub fn init(allocator: Allocator, length: usize) Allocator.Error!Self { 9 return .{ 10 .allocator = allocator, 11 .items = try allocator.alloc(T, length), 12 }; 13 } 14 15 pub fn deinit(self: *Self) void { 16 self.allocator.free(self.items.ptr[0..self.items.len]); 17 } 18 }; 19} 20 21test "allocates array list" { 22 const allocator = testing.allocator; 23 24 const n = 10; 25 const PointList = FixedArrayList(Point); 26 var points = try PointList.init(allocator, n); 27 defer points.deinit(); 28 29 points.items[5] = .{ .x = 10, .y = 20 }; 30 try testing.expectEqual(20, points.get(5).y); 31}

This is not a very useful structure as it only wraps a slice that one could use in most cases directly, but it demonstrates a simple type generator function. The type argument T is only used once for the type of the items slice, and we don't need to know anything about the inner structure of T.

To construct the struct-of-arrays for a given struct, we have to go a step further. Our type construction function must be able to look at the original struct's fields and their types to construct the new struct-of-arrays type. Zig offers a compile-time reflection API for this purpose.

To get a feel for this API, let's define a type constructor for points with explicit coordinates x1, x2, x3, and so forth. Our type function takes the type T of the coordinates and the dimension n and returns a struct with one field for each of the dimensions.

1pub fn PointN(comptime T: type, comptime N: comptime_int) type { 2 var fields: [N]std.builtin.Type.StructField = undefined; 3 for (0..N) |i| { 4 var num_buf: [128]u8 = undefined; 5 fields[i] = .{ 6 .name = std.fmt.bufPrintZ(&num_buf, "x{d}", .{i + 1}) catch unreachable, 7 .type = T, 8 .default_value = null, 9 .is_comptime = false, 10 .alignment = @alignOf(T), 11 }; 12 } 13 return @Type(.{ 14 .Struct = .{ 15 .is_tuple = false, 16 .layout = .Auto, 17 .decls = &.{}, 18 .fields = &fields, 19 }, 20 }); 21} 22 23test "n-dimensional point" { 24 const P3 = PointN(i32, 3); 25 const p = P3{ .x1 = 10, .x2 = 20, .x3...

Zig Structs of Arrays (2024)

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy