Breaking what a program is
June 1, 2026
A while ago I went looking for where the kernel draws the line between “program” and “data”. The way to find an edge is to push on it until something happens, so most of this post is me trying to break what a program is, from the kernel’s point of view: taking real programs apart and handing the kernel things it should not accept, until the rule it is enforcing becomes clear.
If you want to follow along, two references are worth keeping open: the ELF format is laid out in man 5 elf, and the kernel side (who is allowed to run what) lives in fs/exec.c and the fs/binfmt_*.c files in the kernel source.
Gutting a real one
Let’s start with ls: 155 KB of machine code, strings, and tables, and it obviously runs. But the kernel reads almost none of that to decide whether to run it, only the headers at the very front of the file. So let’s copy ls and start breaking fields in the ELF header, one at a time.
frn@debian:~$ cp /bin/ls myls
frn@debian:~$ readelf -h myls | grep -E 'Entry point|Number of program headers'
Entry point address: 0x6760
Number of program headers: 14
The program headers tell the kernel what to map into memory; the entry point tells it where to start once it has. To find these fields in the file, note that the ELF header is a fixed-layout struct (Elf64_Ehdr for a 64-bit binary), and man 5 elf gives the order and size of its fields. Counting through them puts the program-header count, e_phnum, at byte 56, two bytes wide.
First, the program headers. Open the copy, seek to byte 56, write two zero bytes over the count, then ask readelf whether it really went to zero:
frn@debian:~$ python3 -c 'f = open("myls", "r+b"); f.seek(56); f.write(b"\x00\x00")'
frn@debian:~$ readelf -h myls | grep 'Number of program headers'
Number of program headers: 0
All 155 KB of ls is still sitting in the file. But:
frn@debian:~$ ./myls
bash: ./myls: cannot execute binary file: Exec format error
That is ENOEXEC, the kernel’s way of saying “this is not a format I can run”.
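The byte offsets used here are not magic numbers; they fall straight out of the Elf64_Ehdr field list in man 5 elf. As a minimal sketch, here is a small Python script that decodes the header from that field list and computes each field's offset, so the numbers can be checked instead of memorized. It assumes a 64-bit little-endian file like the x86-64 ls above, and the script name ehdr.py and the helpers field_offsets and parse_ehdr are hypothetical, made up for illustration.

```python
import struct
import sys

# Elf64_Ehdr fields that follow the 16-byte e_ident array, in file order (man 5 elf).
# Struct codes: H = 2 bytes, I = 4 bytes, Q = 8 bytes.
FIELDS = [
    ("e_type", "H"), ("e_machine", "H"), ("e_version", "I"),
    ("e_entry", "Q"), ("e_phoff", "Q"), ("e_shoff", "Q"),
    ("e_flags", "I"), ("e_ehsize", "H"), ("e_phentsize", "H"),
    ("e_phnum", "H"), ("e_shentsize", "H"), ("e_shnum", "H"),
    ("e_shstrndx", "H"),
]
# "<" = little-endian, standard sizes, no padding (assumption: an x86-64 ELF file).
EHDR_TAIL = struct.Struct("<" + "".join(code for _, code in FIELDS))
EI_NIDENT = 16  # size of the e_ident array at the start of the header


def field_offsets():
    """Byte offset of each header field, counted from the start of the file."""
    offsets, pos = {}, EI_NIDENT
    for name, code in FIELDS:
        offsets[name] = pos
        pos += struct.calcsize("<" + code)
    return offsets


def parse_ehdr(data):
    """Decode a 64-bit little-endian ELF header from the first 64 bytes of data."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS must be ELFCLASS64, EI_DATA ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    values = EHDR_TAIL.unpack_from(data, EI_NIDENT)
    return {name: value for (name, _), value in zip(FIELDS, values)}


if __name__ == "__main__":
    off = field_offsets()
    for path in sys.argv[1:]:  # one report per file named on the command line
        with open(path, "rb") as f:
            hdr = parse_ehdr(f.read(EI_NIDENT + EHDR_TAIL.size))
        print(f"{path}: e_entry = {hdr['e_entry']:#x} (byte {off['e_entry']}), "
              f"e_phnum = {hdr['e_phnum']} (byte {off['e_phnum']})")
```

Run as python3 ehdr.py /bin/ls myls, it prints the entry point and program-header count of each file along with their byte offsets, which should agree with what readelf -h reports. Deriving the offsets from the field list, rather than hard-coding 24 and 56, is the point: the same walk over the struct is exactly what "man 5 elf gives the order of its fields" means in practice.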
With no program headers the kernel cannot tell which bytes are code or where they go in memory, so there is nothing for it to load. Now let’s put ls back and break the other field instead: the entry point, an eight-byte field 24 bytes into the file:
frn@debian:~$ cp /bin/ls myls
frn@debian:~$ python3 -c 'f = open("myls", "r+b"); f.seek(24); f.write(b"\x00" * 8)'
frn@debian:~$ readelf -h myls | grep 'Entry point'
Entry point address: 0x0
This time the program headers are intact, so the kernel loads the file happily, and then jumps to the entry point to start running, which is now zero, where there is no code:
frn@debian:~$ ./myls
Segmentation fault
It loaded, then died. So the rule looks like this: a program is something the kernel can map into memory and start executing, which takes a segment to load and an address to jump to. Two fields out of 155 KB.
Hijacking who runs first
Except the entry point is not quite the whole truth. ls is dynamically linked, and when you run a dynamically linked program the kernel does not jump to its entry point at all. It jumps to the interpreter (the dynamic linker) and lets that load the libraries and eventually call your code. The interpreter is just another field in the file, a path string, and readelf will point it out:
frn@debian:~$ readelf -p .interp /bin/ls
[ 0] /lib64/ld-linux-x86-64.so.2
So what if we point it somewhere else? Let’s write eight instructions of freestanding assembly that print a line and exit, and use that as the interpreter:
frn@debian:~$ cat fakeld.s
.global _start
_start:
    mov $1, %rax            # write(
    mov $2, %rdi            #   stderr,
    lea msg(%rip), %rsi     #   msg,
    mov $msglen, %rdx       #   len)
    syscall
    mov $60, %rax           # exit(
    mov $3, %rdi            #   3)
    syscall
msg: .ascii "[fakeld] all your base are belong to us\n"
msglen = . - msg
frn@debian:~$ gcc -nostdlib -no-pie -o /tmp/fakeld fakeld.s
(My first try used an ordinary static binary, and it segfaulted.
Being the interpreter is its own job, with its own expectations about how it gets started; a program that makes nothing but raw syscalls has none of them.)
The interpreter path sits at a fixed offset in the file (readelf -l lists it as the INTERP segment, at offset 0x394), so I can overwrite it in place, as long as the new path and its terminating NUL fit in the 28 bytes the old one occupied:
frn@debian:~$ cp /bin/ls lshj
frn@debian:~$ python3 -c 'f = open("lshj", "r+b"); f.seek(0x394); f.write(b"/tmp/fakeld\x00")'
frn@debian:~$ readelf -p .interp lshj
[ 0] /tmp/fakeld
frn@debian:~$ ./lshj
[fakeld] all your base are belong to us
ls never ran. I did not touch one byte of its code (all 155 KB is still there), but the kernel went to the interpreter first, and the interpreter was mine. What a “program” does is not always something inside the program.
Interpreters all the way down
Scripts lay that idea bare. A shell script is not machine code; it is text. It is “executable” only because its first line names an interpreter and the kernel hands the file to it. Which means the interpreter does not have to be a shell. So let’s point it at...