It's not me, it's the compiler


Parsa's Blog | bool as u32

It's not me, it's the compiler!

Every programmer has, at least once, thought to themselves, "it's not me, it's the compiler!" Usually we're wrong, but this is the story of the time I was actually right.

On a Saturday evening, as one usually does, I was refactoring my JavaScript engine's parser. Earlier I had written this code in the project:

    impl LexerConsumer {
        #[inline]
        pub fn consume(&mut self) {
            self.0 += 1;
        }

        #[inline]
        pub fn consume_test(&mut self, store: &LexStore, expected: TokenKind) -> bool {
            if self.peek(store) == expected {
                self.consume();
                true
            } else {
                false
            }
        }
    }

On its own this is a fairly common pattern in a parser, but I didn't love the shape of this code: each branch returns the value of the comparison, so what if I just returned that value myself?

I got to work and wrote this version instead. In isolation, I was happy with the aesthetics of the generated asm: it was a few bytes shorter and didn't have the branch.

    #[inline]
    pub fn consume_test(&mut self, store: &LexStore, expected: TokenKind) -> bool {
        let x = self.peek(store) == expected;
        self.0 += x as u32;
        x
    }

Before, with the branch:

    mov eax,DWORD PTR [rdi]
    mov ecx,eax
    and ecx,0x3f
    movzx ecx,BYTE PTR [rsi+rcx*1]
    cmp cl,dl
    jne 233263
    inc eax
    mov DWORD PTR [rdi],eax
    cmp cl,dl
    sete al
    ret

After, branchless:

    mov ecx,DWORD PTR [rdi]
    mov r8d,ecx
    and r8d,0x3f
    xor eax,eax
    cmp BYTE PTR [rsi+r8*1],dl
    sete al
    add ecx,eax
    mov DWORD PTR [rdi],ecx
    ret
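To make the two shapes easy to poke at outside the engine, here is a minimal, self-contained sketch. `Cursor`, `Store`, and `shapes_agree` are hypothetical stand-ins, not the project's real `LexerConsumer` and `LexStore`, and the `& 0x3f` index is only a guess read off the listings above. It compares the two functions on a toy buffer and says nothing about how the compiler treats them once inlined into a larger function.

```rust
// Hypothetical stand-ins for the post's LexerConsumer and LexStore; the real
// types are not shown in full. The `& 0x3f` index is a guess from the asm.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cursor(u32);

type Store = [u8; 64];

impl Cursor {
    fn peek(&self, store: &Store) -> u8 {
        store[(self.0 & 0x3f) as usize]
    }

    // Original shape: branch, then return a constant from each arm.
    fn consume_test_branchy(&mut self, store: &Store, expected: u8) -> bool {
        if self.peek(store) == expected {
            self.0 += 1;
            true
        } else {
            false
        }
    }

    // Rewritten shape: compute the comparison once and add it to the cursor.
    fn consume_test_branchless(&mut self, store: &Store, expected: u8) -> bool {
        let x = self.peek(store) == expected;
        self.0 += x as u32; // true adds 1, false adds 0
        x
    }
}

// Runs both shapes from the same starting position and reports whether the
// returned flag and the resulting cursor agree.
fn shapes_agree(store: &Store, start: u32, expected: u8) -> bool {
    let mut a = Cursor(start);
    let mut b = Cursor(start);
    let ra = a.consume_test_branchy(store, expected);
    let rb = b.consume_test_branchless(store, expected);
    ra == rb && a == b
}

fn main() {
    let mut store: Store = [0; 64];
    store[0] = 7;
    assert!(shapes_agree(&store, 0, 7)); // match: both advance
    assert!(shapes_agree(&store, 0, 9)); // mismatch: neither advances
    println!("both shapes agree");
}
```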

So I went ahead to try and parse a simple statement, and right at the top of my bash history was a command to parse a for-loop.

    > joe parse - 'for (var lol; false; false) {}'
    Error was found
    Diagnostic { kind: E079, flag: Flag(11529215046068469773), byte: 5, current_token: Var }
    Parse error

Wait! What just happened?! Did I somehow mess up the function? Maybe x as u32 has different semantics than I remember... let me just write it more explicitly.

    #[inline]
    pub fn consume_test(&mut self, store: &LexStore, expected: TokenKind) -> bool {
        let x = self.peek(store) == expected;
        self.0 += if x { 1 } else { 0 };
        x
    }

And somehow this version was working!

    > joe parse - 'for (var lol; false; false) {}'
    Parsed 8 nodes in 14ns

    Raw nodes:
    [0] POS=0 ScriptStart payload=0
    [1] POS=9 VariableDeclaration payload=0
    [2] POS=14 FalseLit payload=0
    [3] POS=21 FalseLit payload=0
    [4] POS=28 BlockStatementStart payload=0
    [5] POS=29 BlockStatement payload=0
    [6] POS=0 ForStatement payload=0
    [7] POS=0 Script payload=0

    POS=000 [7] Script
    POS=000 [6] ForStatement
    POS=009 [1] VariableDeclaration @A0
    POS=014 [2] FalseLit
    POS=021 [3] FalseLit
    POS=029 [5] BlockStatement

For a second I doubted my sanity, but I was fairly confident that I knew what bool as u32 does. I've written that exact cast hundreds of times, so at that moment I said what any desperate and mildly confused programmer would:

It's not me, it's the compiler!
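The confidence in the cast is well placed as far as the language goes: Rust defines a bool-to-integer cast to yield 0 for false and 1 for true, so the explicit if/else spelling should be interchangeable with it. A minimal check of that claim, with function names made up for illustration:

```rust
// Three spellings of "turn a bool into 0 or 1".
fn via_cast(b: bool) -> u32 {
    b as u32
}

fn via_branch(b: bool) -> u32 {
    if b { 1 } else { 0 }
}

fn via_from(b: bool) -> u32 {
    u32::from(b)
}

fn main() {
    // A bool has only two values, so this loop is an exhaustive test.
    for b in [false, true] {
        assert_eq!(via_cast(b), via_branch(b));
        assert_eq!(via_cast(b), via_from(b));
    }
    println!("{} {}", via_cast(false), via_cast(true)); // prints "0 1"
}
```

If all three agree in isolation, a difference in behavior between the spellings has to come from somewhere other than the cast's documented meaning.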

I had to look at what my for-statement parser was really doing. Lucky for me, with my in-house custom build system, looking at the assembly of any function is a command away. And no, it's not a cargo asm wrapper: my project doesn't use cargo, since I've made my own build system in TypeScript that runs on Deno. It also supports a --dry mode, so you can see what it runs under the hood. Yayyy, transparency!

    > x -b --fn ForOrInOfStatement 1 --dry
    MKDIR ./out
    RUN rustc src/main.rs --crate-name=joe --crate-type=staticlib
        --edition=2024 --out-dir=./out --target=x86_64-unknown-linux-gnu
        --cfg joe_no_libc --emit=link,obj -Crelocation-model=static
        -Copt-level=3 -Clto -Ccodegen-units=1
        -Cdebuginfo=line-tables-only --diagnostic-width=150
        -Cpanic=abort -Ztemps-dir=out/tmp -Zhuman-readable-cgu-names
        --extern proc=./out/libproc.so

    RUN mold -melf_x86_64 -o ./out/joe ./out/joe.o --static
        --package-metadata="Joe!" -zrelro -znoexecstack
        --discard-locals --build-id --gc-sections --no-undefined
        --icf=safe --compress-debug-sections=zlib

    RUN bash -c nm out/joe | rustfilt | grep ForOrInOfStatement
        | grep -oP '^[0-9a-f]+ t\s+\K.+' | sed -n '1p'
        | xargs -I {} objdump -WK -M intel -d --disassembler-color=on --visualize-jumps=color
        --demangle=rust ./out/joe --disassemble={} 2>/dev/null
        | grep -v 'Disassembly of section' | grep -v './out/joe:' | less -R

    # line breaks added for readability, btw. you're welcome.

Anyway... back to inspecting the assembly. But maybe you should see the Rust version first, so here is a trimmed-down version of the code (the real function is a few hundred lines long :D):

    fn ForOrInOfStatement(
        store: &mut Storage,
        mut lex: LexerConsumer,
        mut emit: EmitBuffer,
        mut stack: StateStack,
    ) -> Termination {
        debug_assert_eq!(lex.peek(store), TokenKind::For);
        lex.consume(); // `for`

        if lex.peek_test(store, TokenKind::Await/*=0x6e*/) {
            if !stack.has_flag(Flag::AWAIT) {
                return raise_diagnostic(store, lex, emit, stack, DiagnosticKind::E056);
            }
        }

        if !lex.consume_test(store, TokenKind::LParen) {
            return raise_diagnostic(store, lex, emit, stack, DiagnosticKind::E057);
        }

        wip!()
    }

if !lex.consume_test(store,...
