Memory Safe Inline Assembly

NOTE: This is a pre-release feature. The Fil-C 0.679 release does not ship with this feature. To test this feature, you need to build from source.

GCC and clang both support an incredibly powerful inline assembly syntax. For example:

    unsigned rotate(unsigned x, unsigned char c)
    {
        asm("roll %1, %0" : "+r"(x) : "c"(c) : "cc");
        return x;
    }

This instructs the compiler to emit assembly based on the roll %1, %0 template, where %1 is filled in with %cl, %0 is filled in with whichever register holds x, and c is moved into the %ecx register just before the roll instruction. Additionally, the "+r" constraint tells the compiler that the instruction both reads and writes x, and the "cc" clobber tells it that the instruction changes the condition flags.
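To make the constraint mapping concrete, here is a minimal sketch that pairs the rotate function with a portable reference implementation. The name rotate_portable and the architecture guard are assumptions added for illustration, not part of Fil-C; the sketch also assumes a 32-bit unsigned and spells the keyword as __asm__ so it compiles in strict ISO C modes.

```c
#include <limits.h>

/* Hypothetical reference: rotate-left written in portable C.
   The count is reduced modulo the bit width, which matches what
   the x86 roll instruction does with %cl for a 32-bit operand. */
unsigned rotate_portable(unsigned x, unsigned char c)
{
    const unsigned bits = sizeof(unsigned) * CHAR_BIT;
    unsigned n = c % bits;
    return n ? (x << n) | (x >> (bits - n)) : x;
}

unsigned rotate(unsigned x, unsigned char c)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(x): x is read and written, in any general register.
       "c"(c):  c must be in the C register, so %1 prints as %cl.
       "cc":    the instruction changes the condition flags. */
    __asm__("roll %1, %0" : "+r"(x) : "c"(c) : "cc");
    return x;
#else
    /* Fallback for targets without the roll instruction. */
    return rotate_portable(x, c);
#endif
}
```

On an x86 target both functions should agree for every input, which makes the portable version a convenient oracle when testing the assembly version.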

This seems like it cannot possibly be safe! What if the programmer did something wrong, like omitted the + in "+r", or forgot the "cc" clobber? In Yolo-C, if you make such a mistake, the compiler happily miscompiles your code.

Yet Fil-C supports this inline assembly syntax and it's completely safe!

This document explains why Fil-C supports inline assembly at all and then goes into the details of how that support is achieved while maintaining both programmer intent (you still get the assembly template you asked for) and complete memory safety (if you do something wrong, you'll panic or get an illegal instruction trap, at worst).

Why Inline Assembly?

While reviewing folks' C and C++ code, I've found the following reasons for inline assembly, where 1 is most common:

1. Blank inline assembly to prevent compiler analysis. This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst). It works because we're telling the compiler that the inline assembly clobbers all memory, which forces the compiler to serialize memory accesses, just like a signal fence would have. The contract with the compiler is clear: the compiler must emit exactly the assembly we're asking it to emit (which is blank here) without second-guessing our claims about the clobbers. That is, the compiler must not infer that because the assembly is blank, there cannot be a memory clobber. We said memory clobber, so that's what the compiler sees. Similarly, folks do stuff like asm("" : "+r"(x)). This means: the assembly may read and then write x. The assembly is blank, so this incurs no cost other than forcing the compiler to assume that it doesn't know anything about x's value after the assembly executes. This kind of data flow fence is useful for writing constant-time crypto. Fil-C has long supported blank inline assembly since it's trivially safe. Fil-C even supports "+r" constraints on pointers, in which case both the intval and lower are threaded through their own "+r"-like constraints at the LLVM IR level.

2. cpuid and xgetbv. The inline assembly snippets for these two instructions occur most often in code that then goes on to use SIMD intrinsics. I think this is because the __get_cpuid API in cpuid.h is confusing to use and, as far as I can tell, does not work right in either GCC or clang. Hence, packages like zstd, simdutf, simdjson, and other SIMD-using programs tend to identify CPU features by using inline assembly that invokes cpuid. They often also use inline assembly to invoke xgetbv. In Fil-C, __get_cpuid is fixed, so you could use that, and zxgetbv is offered as an intrinsic. However, it's better to support those inline assembly snippets without requiring folks to change their code! And there's nothing unsafe about invoking cpuid and xgetbv so long as the code specifies the right clobbers and constraints.

3. Arithmetic over secrets in crypto code. A great example is OpenSSH's sntrup761 implementation, which wraps key arithmetic in inline assembly to ensure that it gets exactly the right instruction and not some instruction that might have varying execution time depending on inputs. Note that this kind of code often has fallbacks to try to get the compiler to emit constant-time code even if inline assembly is not supported, but those fallbacks are unlikely to be as rigorously validated, and often rely on "optimization blocking" idioms that hurt performance and could be circumvented by a sufficiently clever compiler. Hence, it's safest to support inline assembly snippets that do this. Luckily, these snippets are also completely safe, provided that the constraints and clobbers are correct.

4. Atomics. Compilers have long supported intrinsics for atomic instructions. Compilers also have a long history of implementing these intrinsics incorrectly! Most recently, clang had bugs in how it lowered CAS to LL/SC on ARM64. Hence, serious lock-free programmers tend to write their atomic instructions using inline assembly at least some of the time, like in those cases where they had encountered a miscompile and so dropping to assembly was their...
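Returning to the blank inline assembly idioms described in the first item of the list above, the following is a minimal sketch of both forms: the "memory" clobber used as a compiler-only barrier, and the "+r" data flow fence used in a constant-time select. The function names (compiler_barrier, value_barrier, publish, ct_select) are hypothetical and chosen for illustration; they are not taken from Fil-C or from any of the packages mentioned.

```c
#include <stdint.h>

/* Blank template plus a "memory" clobber: emits no instructions, but the
   compiler must assume all memory may have been read or written here.
   This orders accesses for the compiler only; it is not a CPU fence. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" : : : "memory");
}

/* Data flow fence: the blank template "may read and then write" x, so the
   compiler can no longer reason about the value of x after this point. */
static inline uint32_t value_barrier(uint32_t x)
{
    __asm__("" : "+r"(x));
    return x;
}

/* The compiler may not move the store to *data below the barrier, nor the
   store to *flag above it. */
void publish(int *data, volatile int *flag, int value)
{
    *data = value;
    compiler_barrier();
    *flag = 1;
}

/* Constant-time select: returns a when the low bit of bit is 1, else b.
   Hiding the mask behind value_barrier discourages the compiler from
   turning the bitwise arithmetic back into a data-dependent branch. */
uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
{
    uint32_t mask = value_barrier(0u - (bit & 1u)); /* all ones or zero */
    return (a & mask) | (b & ~mask);
}
```

Neither barrier emits any machine code, so the only cost is the optimization the compiler is forced to forgo. That is the property that makes these idioms trivially safe to support.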
