Three Years with Abstractionless C

gritzko

Recently, “to C or not to C” became a topic on HN, which is a nice excuse to spend a couple of hours on an ABC retrospective. The decision to work in C was rather natural: the author is a C/Go, not a C++/Rust, kind of person, so once the Go runtime became a problem, C was the most straightforward answer. The dirty secret of both C++ and C is that these two are IKEA or LEGO languages: languages for creating other languages. For example, virtually any serious C++ user has some sort of alternative standard library (Abseil, Qt, there are many). You don’t use C or C++ as-is, normally. The C standard library is small by design, so that is inevitable for most use cases. The C++ and C standard libraries are a very mixed bag, effectively a chronicle of CS ideas from the last 40-50 years. If the C standard lib is a manuscript chamber in a faraway monastery, the C++ std lib is more like the Library of Congress. Nobody knows it all, and most of the ideas written there are definitely not recommended today.

Abstractionless C resulted from many frustrations with C++ and its endless quirks. I needed generics, STL-like containers, disk and network serialization, and some standard algorithms, with no pointer arithmetic and no malloc/free headaches. Coming from Go, I clearly needed slices. That was the pragmatic problem statement: things that improve productivity while doing systems programming.

On a higher, philosophical level, I wanted to avoid the cursed tower-of-abstractions trap that I felt quite sharply in C++. There, the same bytes packaged differently become entirely different, incompatible entities (like std::string vs std::vector vs std::valarray<> etc). I understand quite clearly what happens on the bit and byte level. Lawyering about pure abstractions always felt counter-productive to me, and C++ always had lots of that. Many of those abstractions abstracted away things that do not exist anymore, like big-endian CPUs and HDDs.

I did not want to play Jenga with imaginary bricks.

So the set of architectural choices was:

All primitive types have specified bit width and layout; that gives serialization for free (u32, i64, sha256, etc).

Slices as arrays of two typed pointers, e.g. a byte slice is typedef u8* u8s[2]; and a slice is non-owning.

Memory-owning buffers as arrays of four pointers, so that ring buffer logic or ptr/len/cap constructs are effectively built in.

Generics through C templates, a known technique, enough for STL-level containers: HEAPu64Pop(), HEAPu8csPop(), etc.

Solid containers, pointer chasing and malloc be damned. Vectors, heaps, open-addressed hash maps, LSM sorted sets: these are fundamentally arrays.

Naming conventions to enforce module structure, e.g. void SHA1Sum(sha1* hash, u8csc from) declared in SHA1.h, implemented in SHA1.c, tested in test/SHA1.c, etc.

Ragel parsers for all text formats, TLV for binary, straight mmap for solid containers.

Last but not least, the primitives must effortlessly recombine. u8csb is a buffer-of-const-byte-slices. sha256bMap() mmaps a buffer of hashes, which might be treated as a vector, a heap, or a hash set, e.g. with HASHsha256Put()/HASHsha256Get().
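To make the two-pointer slice and the four-pointer buffer concrete, here is a minimal sketch. It is not the actual ABC code. Only the `typedef u8* u8s[2];` shape and the `u8cs` name come from the text above; the buffer layout (memory start, data start, data end, memory end) and every helper name (`u8csLen`, `u8bInit`, `u8bFeed`, `u8bData`, and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;

/* A slice is an array of two pointers: [0] = first byte, [1] = one past the last. */
typedef u8 *u8s[2];
/* Slice of const bytes; it does not own the memory it points at. */
typedef const u8 *u8cs[2];
/* ASSUMED buffer layout: [0] memory start, [1] data start,
   [2] data end (= start of idle space), [3] memory end. */
typedef u8 *u8b[4];

/* Number of bytes a slice covers. */
static inline size_t u8csLen(u8cs s) { return (size_t)(s[1] - s[0]); }

/* Hypothetical helper: put an empty buffer over caller-provided memory. */
static inline void u8bInit(u8b b, u8 *mem, size_t cap) {
    b[0] = b[1] = b[2] = mem;
    b[3] = mem + cap;
}

/* Bytes currently stored / bytes still free. */
static inline size_t u8bDataLen(u8b b) { return (size_t)(b[2] - b[1]); }
static inline size_t u8bIdleLen(u8b b) { return (size_t)(b[3] - b[2]); }

/* Hypothetical helper: append a slice; returns 0 on success, -1 if it does not fit. */
static inline int u8bFeed(u8b b, u8cs from) {
    size_t n = u8csLen(from);
    if (n > u8bIdleLen(b)) return -1;
    memcpy(b[2], from[0], n);
    b[2] += n;
    return 0;
}

/* Hypothetical helper: view the stored data as a non-owning const slice. */
static inline void u8bData(u8cs out, u8b b) {
    out[0] = b[1];
    out[1] = b[2];
}
```

With this layout the ptr/len/cap triple is just pointer differences: length is `b[2] - b[1]`, capacity is `b[3] - b[0]`, and a slice is whatever pair of pointers you take out of the four.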

Slices and generics are a bit unexpected in C; the rest is just another C style with a funky notation, no biggie. The obvious issue here is that C does not support slices in any of its standard APIs. But the C standard library is not that huge, and its usable part is even smaller, so unless a function is a syscall or somehow gets special treatment from the compiler, what is the value of it? Vanishingly close to zero. Especially in the LLM era. What has a lot of value is the toolchain that understands C, and the OS kernel. Those are true megaprojects.
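For readers who have not seen generics done in plain C, a rough sketch of the idea follows. C templates are usually built by re-including a header with the element type set in a macro; to keep the example in one block, this version uses a single function-generating macro instead, which is a different variant of the same trick. The generated names mimic the `HEAPu64Pop()` convention mentioned above, but `HEAP_TEMPLATE`, the `Push` function, the signatures, and the min-heap behavior are all assumptions, not the ABC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Hypothetical "template": expands into a binary min-heap living in a
   caller-provided array of T. Push does not check capacity; the caller
   must make sure the array has room for one more element. */
#define HEAP_TEMPLATE(T)                                                   \
    static inline void HEAP##T##Push(T *heap, size_t *len, T value) {      \
        size_t i = (*len)++;                                               \
        heap[i] = value;                                                   \
        while (i > 0) { /* sift the new element up */                      \
            size_t parent = (i - 1) / 2;                                   \
            if (heap[parent] <= heap[i]) break;                            \
            T tmp = heap[parent];                                          \
            heap[parent] = heap[i];                                        \
            heap[i] = tmp;                                                 \
            i = parent;                                                    \
        }                                                                  \
    }                                                                      \
    static inline T HEAP##T##Pop(T *heap, size_t *len) {                   \
        assert(*len > 0);                                                  \
        T top = heap[0];                                                   \
        heap[0] = heap[--(*len)];                                          \
        size_t i = 0;                                                      \
        for (;;) { /* sift the moved element down */                       \
            size_t l = 2 * i + 1, r = l + 1, m = i;                        \
            if (l < *len && heap[l] < heap[m]) m = l;                      \
            if (r < *len && heap[r] < heap[m]) m = r;                      \
            if (m == i) break;                                             \
            T tmp = heap[m];                                               \
            heap[m] = heap[i];                                             \
            heap[i] = tmp;                                                 \
            i = m;                                                         \
        }                                                                  \
        return top;                                                        \
    }

/* One instantiation per element type, like explicit template instantiation. */
HEAP_TEMPLATE(u64)
HEAP_TEMPLATE(u32)
```

After the two instantiations, `HEAPu64Push`/`HEAPu64Pop` and `HEAPu32Push`/`HEAPu32Pop` are ordinary functions over ordinary arrays, so the heap stays a solid container with no per-element allocation.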

So, I sketched a skeleton of my (un)standard lib and started working with it. The “meat” slowly grew, and the thing saw one or two refactors along the way, but it mainly remains a collection of small and focused modules with slice-based APIs and increasingly rare malloc use. The cases for malloc go down for the following reasons:

anything multiple-page sized can be mmapped directly,

smaller things can live on stack,

containers are solid (#1),

ABC buffers can work as arenas for variable-length content, so you deal with u8cs (two-pointer slice) and the bytes live in the arena,

there is a lot of mmapped file use (in-RAM bit layout matches on-disk layout, forget SPARCs and Alphas already),

the remaining cases are either malloc or something else.
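Two of the points above, mmapping anything page-sized and using a buffer as an arena, combine naturally. The sketch below shows one way that could look on a POSIX system; it is an illustration under assumptions, not ABC code. The four-pointer layout (memory start, data start, data end, memory end) and the names `ArenaMap`, `ArenaUnmap`, and `ArenaPutStr` are made up for this example; `mmap`, `munmap`, and `memcpy` are the only real APIs used.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD/macOS spelling */
#endif

typedef uint8_t u8;
typedef const u8 *u8cs[2]; /* non-owning slice of const bytes */
typedef u8 *u8b[4];        /* ASSUMED: [0] mem start, [1] data start,
                              [2] data end, [3] mem end */

/* Hypothetical helper: back the buffer with anonymous mmapped pages.
   Returns 0 on success, -1 on failure. */
static inline int ArenaMap(u8b arena, size_t size) {
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    arena[0] = arena[1] = arena[2] = (u8 *)mem;
    arena[3] = (u8 *)mem + size;
    return 0;
}

/* Hypothetical helper: release the whole arena in one call. */
static inline int ArenaUnmap(u8b arena) {
    return munmap(arena[0], (size_t)(arena[3] - arena[0]));
}

/* Hypothetical helper: copy a string into the arena and return a slice
   over the copy. Returns 0 on success, -1 if the arena is full. */
static inline int ArenaPutStr(u8cs out, u8b arena, const char *str) {
    size_t n = strlen(str);
    if (n > (size_t)(arena[3] - arena[2])) return -1;
    memcpy(arena[2], str, n);
    out[0] = arena[2];
    arena[2] += n;
    out[1] = arena[2];
    return 0;
}
```

There is no per-string free here: the slices stay valid until the arena is unmapped, and a single `munmap` releases everything at once.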

Out of the remaining burning questions, one may mention package and dependency management. Obviously, for C that is RPM, APT, apk, Brew and so on. I am not going to bring along second copies of CURL, libsodium, and all the other usual suspects.

So for my purposes, it worked out fine in a 100KLoC project. As L. Torvalds once said: “Standards are paper. Buy some and write your own.” Or something like that.
