Everything in C is undefined behavior


If he had been a programmer, Cardinal Richelieu would have said “Give me six lines written by the hand of the most expert C programmer in the world, and I will find enough in them to trigger undefined behavior”.

Nobody can write correct C, or C++. And I say that as someone who’s written C and C++ on an almost daily basis for about 30 years. I listen to C++ podcasts. I watch C++ conference talks. I enjoy reading and writing C++.

C++ has served us well, but it’s 2026, and the environment of 1985 (C++) or 1972 (C) is not the environment of today.

I’m definitely not the first to say this. I remember reading a post by someone prominent about a decade ago saying that a good case can be made that use of C++ is a SOX (Sarbanes-Oxley) violation. And while I was not on board with the rest of their rant (nor their confusion about “its” vs “it’s”), I never disagreed with that point.

With time I found it to be more and more true. WAY more things are undefined behavior (UB) than you’d expect.

Everyone knows that double-free, use after free, accessing outside the bounds of an object (e.g. array), and accessing uninitialized memory are UB. After all, C/C++ is not a memory-safe language. And yet we as an industry seem to be unable to stop making even those mistakes over and over.

But there’s more. More subtle. More illogical.

It’s not about optimizations

Some people seem to think that as long as they don’t compile with optimizations turned on, undefined behavior can’t hurt them. They believe that the compiler is somehow being deliberately hostile, going “AHA! UB! I can do whatever I want here!”, and without optimizations turned on it won’t.

This is incorrect.

UB doesn’t mean that the compiler can take advantage of your sloppiness. UB means that the compiler can assume that your code is valid. It means that the intention of your code, oh so obvious when read by a human, doesn’t even have a way to be expressed between compiler stages or modules.

UB means that the compiler doesn’t even have to implement some special cases in its code generation, because they “can’t happen”.

The compiler, and really the underlying hardware too, is playing a game of telephone with your UB intentions. It may end up with what you wanted, but there’s no guarantee for now or in the future.

UB is everywhere

The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C/C++ code has UB.

Accessing an object which is not correctly aligned

As an example of this, take this code:

    int foo(const int* p) {
        return *p;
    }

If this function is called with a pointer not correctly aligned (probably meaning on an address that’s a multiple of sizeof(int), but who knows), this is UB. C23 6.3.2.3.
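The “who knows” can be pinned down a little: the requirement is `alignof(int)`, which on mainstream platforms happens to equal `sizeof(int)` but is not required to. Below is a minimal sketch of a check, assuming a flat address space where converting a pointer to `std::uintptr_t` keeps the low address bits (that mapping is implementation-defined). `is_aligned_for_int` and `aligned_buf` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports whether the address in `p` satisfies the
// alignment requirement of int. It only inspects the address; it never
// converts `p` to an int*, so it does not itself create a misaligned pointer.
// Assumption: the pointer-to-integer mapping preserves the low address bits,
// which holds on mainstream flat-memory platforms but is implementation-defined.
bool is_aligned_for_int(const void* p) {
    assert(p != nullptr);  // a null pointer has no storage to check
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// A byte buffer that is guaranteed to start on an int boundary.
alignas(int) unsigned char aligned_buf[2 * sizeof(int)] = {};
```

On a typical x86-64 build, `is_aligned_for_int(aligned_buf)` is true and `is_aligned_for_int(aligned_buf + 1)` is false.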

On Linux Alpha, in some cases this would merely trap to the kernel, which would software emulate what you intended. In other cases it would (probably) crash your program with a SIGBUS.

On SPARC it would cause a SIGBUS.

Sure, on x86/amd64 (henceforth just “x86”) this is likely fine. Hell, it’s probably even an atomic read. x86 is famously extremely forgiving about cache coherency subtleties.

So here we have three cases:

kernel gave a helping hand (Alpha for some loads)

crash (other Alpha loads, and SPARC)

not a problem (x86)

What about ARM, RISC-V, and others? What about future architectures? A future architecture could even have special int-pointer registers that do not populate the lowest bits, because such pointers cannot exist.

Even if it works, maybe the compiler one day changes from using one load instruction to another, and suddenly that’s no longer fixed up by the kernel.

Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers. Because it’s UB.
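One commonly used way to stay inside defined behavior is to copy the bytes into a properly aligned local with `std::memcpy` instead of dereferencing an `int*`. The sketch below assumes the bytes at `src` really hold an `int` in the machine’s native byte order; `load_int_unaligned` is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reads an int from storage that may not be int-aligned.
// No int* to the source is ever formed. The bytes are copied into a properly
// aligned local object, which memcpy may do from any valid byte address.
// Assumption: `src` points at sizeof(int) readable bytes in native byte order.
int load_int_unaligned(const void* src) {
    assert(src != nullptr);  // passing a null pointer to memcpy is itself UB
    int value;
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

Mainstream optimizing compilers typically turn a fixed-size `memcpy` like this into a plain load where the target allows it, so the defined version usually costs little or nothing.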

Or how about this:

    void set_it(std::atomic<int>* p) {
        p->store(123);
    }

    int get_it(std::atomic<int>* p) {
        return p->load();
    }

Is this operation atomic when the object is not correctly aligned? That’s the wrong question to ask. Mu, unask the question. It’s UB. (but also yes, in practice this can easily be an atomicity problem)

If you want to get even more convinced, you can try thinking about what happens if an object you thought you were reading atomically spans pages. But don’t think too much about it, or you may conclude that “it’s fine”. It’s not. It’s UB.

Actually, it was UB even before that

Don’t blame the foo() function, above. The act of dereferencing the pointer wasn’t the problem. Merely creating the pointer was enough to be a problem.

Example:

    bool parse_packet(const uint8_t* bytes) {
        const int* magic_intp = (const int*)bytes; // UB!
        int magic_raw = foo(magic_intp);           // Probably crashes on SPARC.
        int magic = ntohl(magic_raw);              // this is fine, at least.
        […]
    }

That cast is the problem, not foo().
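Here is a sketch of how the magic number could be read without ever creating a misaligned `int*`: assemble the 32-bit big-endian value from individual bytes. `read_be32`, `has_magic`, and the `kMagic` value are hypothetical names and constants for illustration, not part of the original parser, and the caller is assumed to guarantee at least four readable bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant for illustration; the real protocol's magic is unknown.
constexpr std::uint32_t kMagic = 0xCAFEF00Du;

// Builds a 32-bit value from four network-order (big-endian) bytes.
// Only uint8_t objects are accessed, so neither alignment nor host byte
// order matters. Assumption: `bytes` has at least 4 readable bytes.
std::uint32_t read_be32(const std::uint8_t* bytes) {
    assert(bytes != nullptr);
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}

// Hypothetical stand-in for the first step of parse_packet().
bool has_magic(const std::uint8_t* bytes) {
    return read_be32(bytes) == kMagic;
}
```

In this sketch the shifts stand in for both the cast and the `ntohl()` call, since they produce the same value on little-endian and big-endian hosts.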

It’s perfectly valid for the compiler to assign...
