Three Years of Abstractionless C

gritzko

Three years with Abstractionless C


Recently, “to C or not to C” became a topic on HN, which is a nice excuse to spend a couple of hours on an ABC retrospective. The decision to work in C was rather natural: the author is a C/Go, not a C++/Rust kind of person, so once the Go runtime became a problem, C was the most straightforward answer. The dirty secret of both C++ and C is that these two are IKEA or LEGO languages: languages for building other languages. For example, virtually any serious C++ user has some sort of alternative standard library (Abseil, Qt, there are many). Normally, you don’t use C or C++ as-is. The C standard library is small by design, so for most use cases that is inevitable. The C and C++ standard libraries are a very mixed bag, effectively a chronicle of CS ideas from the last 40-50 years. If the C standard library is a manuscript chamber in a faraway monastery, the C++ one is more like the Library of Congress. Nobody knows it all, and most of the ideas recorded there are definitely not recommended today.

Abstractionless C resulted from many frustrations with C++ and its endless quirks. I needed generics, STL-like containers, disk and network serialization, and some standard algorithms, with no pointer arithmetic and no malloc/free headaches. Coming from Go, I clearly needed slices. That was the pragmatic problem statement. On a higher level, I wanted to avoid the tower-of-abstractions trap that I felt quite sharply in C++. There, the same bytes packaged differently become an entirely different, incompatible story (like std::string vs std::vector<char> vs std::vector<uint8_t>, etc). The fact that C++ char is neither signed char nor unsigned char, and all the other quirks that sound like a really strange religion: those drive me mad.

So the set of architectural choices was:

All primitive types have a specified bit width and layout; that gives serialization for free (u32, i64, sha256, etc).

Slices as arrays of two typed pointers, e.g. a byte slice is typedef u8* u8s[2]; and a slice is non-owning.

Memory-owning buffers as arrays of four pointers, so ring buffer logic or ptr/len/cap constructs are effectively built in.

Generics through C templates, a known technique, enough for STL-level containers.

Solid containers, pointer chasing and malloc be damned. Vectors, heaps, open addressed hash maps, LSM sorted sets.

Naming conventions to enforce module structure, e.g. void SHA1Sum(sha1* hash, u8csc from) declared in SHA1.h, implemented in SHA1.c, tested in test/SHA1.c, etc.
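To make the slice and buffer items a bit more tangible, here is a minimal sketch. Only the u8s typedef comes from the text above; the read-only u8cs spelling follows the later mention of u8cs, while the u8b name, the helper functions (u8csLen, u8csOfStr, u8bInit, u8bFeed, u8bData) and the exact meaning of the four buffer pointers are my assumptions for illustration, not necessarily what ABC does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;         /* fixed-width byte */
typedef u8 *u8s[2];         /* mutable slice: [0] = first byte, [1] = one past the last */
typedef u8 const *u8cs[2];  /* read-only slice, same shape */

/* Assumed buffer layout (hypothetical):
   [0] memory start, [1] data start, [2] data end, [3] memory end. */
typedef u8 *u8b[4];

/* Number of bytes a read-only slice covers. */
static inline size_t u8csLen(u8cs s) { return (size_t)(s[1] - s[0]); }

/* Make a non-owning slice over a C string, excluding the terminating zero. */
static inline void u8csOfStr(u8cs out, char const *str) {
    out[0] = (u8 const *)str;
    out[1] = out[0] + strlen(str);
}

/* Point a buffer at caller-provided memory; it starts out empty. */
static inline void u8bInit(u8b b, u8 *mem, size_t cap) {
    b[0] = mem;
    b[1] = mem;
    b[2] = mem;
    b[3] = mem + cap;
}

/* Append a slice to the buffered data; 0 on success, -1 if it does not fit. */
static inline int u8bFeed(u8b b, u8cs from) {
    size_t n = u8csLen(from);
    if (n > (size_t)(b[3] - b[2])) return -1;
    memcpy(b[2], from[0], n);
    b[2] += n;
    return 0;
}

/* View the buffered data as a non-owning read-only slice. */
static inline void u8bData(u8cs out, u8b b) {
    out[0] = b[1];
    out[1] = b[2];
}
```

With this shape, length and capacity are plain pointer differences, and handing a piece of a buffer to another function means passing two pointers, with no copying and no ownership transfer.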

Slices and generics are a bit unexpected in C; the rest is just another C style, nothing out of the ordinary. The obvious issue here is that C does not support slices in any of its standard APIs. But the C standard library is not that huge, and its usable part is even smaller, so unless a function is a syscall or gets preferential treatment from the compiler, what is the value of it? Vanishingly close to zero, especially in the LLM era. What has a lot of value is the toolchain that understands C, and the OS kernel. Those are true megaprojects.
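For readers who have not seen generics done with the preprocessor, a rough sketch follows. It stamps out a typed read-only slice and two functions per element type. The macro SLICE_TEMPLATE and the generated names (u32cs, u32csLen, u32csFind and so on) are hypothetical, and ABC's own templates may well be built differently, for instance by including a template header several times; this single-file variant only illustrates the general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef int64_t i64;

/* "Template": for an element type T, defines the slice type Tcs plus
   TcsLen (element count) and TcsFind (linear search that returns the
   index of the first match, or the length if there is none). */
#define SLICE_TEMPLATE(T)                                                   \
    typedef T const *T##cs[2];                                              \
    static inline size_t T##csLen(T##cs s) { return (size_t)(s[1] - s[0]); } \
    static inline size_t T##csFind(T##cs s, T needle) {                     \
        size_t i = 0, n = T##csLen(s);                                      \
        while (i < n && s[0][i] != needle) ++i;                             \
        return i;                                                           \
    }

/* Instantiate the template for two element types. */
SLICE_TEMPLATE(u32)
SLICE_TEMPLATE(i64)
```

Each instantiation is ordinary C after preprocessing, so the compiler type-checks u32csFind and i64csFind separately and the debugger sees plain functions.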

So, I sketched a skeleton of my (un)standard lib and started working with it. The “meat” slowly grew and the thing saw one or two refactors along the way, but it mainly remains a collection of small and focused modules with slice-based APIs and increasingly rare malloc use. The cases for malloc keep shrinking for the following reasons:

anything multiple-page sized can be mmapped directly,

smaller things can live on stack,

containers are solid (#1),

ABC buffers can work as arenas for variable-length content, so you deal with u8cs (two-pointer slice) and the bytes live in the arena,

there is a lot of mmapped file use (in-RAM bit layout matches on-disk layout, forget SPARCs and Alphas already),

the remaining cases are either malloc or something else.
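The arena point is perhaps the easiest one to show in code. Below is a small sketch, assuming a four-pointer buffer laid out as memory start, data start, data end, memory end (my guess at a reasonable layout, not a statement about ABC internals). The functions u8bKeep and u8bReset are hypothetical names. The backing memory here is a stack array; a mapped region would serve the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef u8 const *u8cs[2];  /* non-owning read-only slice */
typedef u8 *u8b[4];         /* assumed: [0] memory start, [1] data start,
                               [2] data end, [3] memory end */

/* Copy a C string into the arena and return a slice over the stored bytes.
   Returns 0 on success, -1 if the arena has no room left. */
static inline int u8bKeep(u8cs kept, u8b arena, char const *str) {
    size_t n = strlen(str);
    if (n > (size_t)(arena[3] - arena[2])) return -1;
    memcpy(arena[2], str, n);
    kept[0] = arena[2];
    kept[1] = arena[2] + n;
    arena[2] += n;  /* bump the end-of-data pointer */
    return 0;
}

/* Drop everything stored so far in one step; no per-item free. */
static inline void u8bReset(u8b arena) { arena[2] = arena[1]; }
```

The slices handed out stay valid for as long as the arena is not reset or its memory released, which takes the place of per-object malloc/free bookkeeping.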

Of the remaining burning questions, one may mention package and dependency management. Obviously, for C that is RPM, APT, apk, Brew and so on. I am not going to bring along second copies of curl, libsodium, and all the other usual suspects.

So for my purposes, it worked out fine. As L. Torvalds once said: “Standards are paper. Buy some and write your own.” Or something like that.
