One bitmask in task_struct fixes 15 years of Linux signal conflicts

One bitmask in task_struct: how a 10-line kernel patch resolves 15 years of multi-runtime signal conflicts on Linux — Goga Koreli

gkoreli.com

One bitmask in task_struct: how a 10-line kernel patch resolves 15 years of multi-runtime signal conflicts on Linux

I spent the last two days debugging why a Bun server on Linux would permanently freeze the moment a Go shared library and a WebAssembly module coexisted in the same process. The strace showed 8,500 SIGPWR signals per second flooding the main thread. The event loop never recovered.

A fix is in progress — Bun's team is patching their WebKit fork to work around it. But the root cause isn't a bug in any one project. It's a kernel feature that doesn't exist yet — one that would take about 10 lines to implement.

The bug

A server process on Linux loads two things:

A Go CGo shared library via dlopen() (for authentication)

A WebAssembly module (for collaborative editing)

The first WASM function call permanently kills the event loop. setTimeout never fires. fetch never resolves. Microtasks still work (Promise.resolve is fine), but all macrotasks are dead. The process burns 100% CPU doing nothing useful.

Strace reveals the cause:

[pid 555498] tgkill(555498, 555498, SIGPWR) = 0
[pid 555498] tgkill(555498, 555498, SIGPWR) = 0
[pid 555498] tgkill(555498, 555498, SIGPWR) = 0
... (25,678 times in 3 seconds)

A compilation helper thread sends SIGPWR to the main thread in an infinite retry loop. The signal handler never acknowledges. The helper never stops.

Why it happens

Three facts about Linux signal delivery:

sigaction flags (including SA_ONSTACK) are process-wide. All threads share one signal disposition per signal.

sigaltstack is per-thread. Each thread can configure its own alternate signal stack.

The kernel delivers on the alt stack if and only if BOTH are true: SA_ONSTACK is set on the handler AND the receiving thread has a sigaltstack configured.
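The third rule is easy to check empirically. Here is a minimal C sketch, assuming Linux with glibc. The function name `handler_ran_on_altstack`, the 64KB stack size, and the use of SIGUSR1 as a stand-in for SIGPWR are my own illustrative choices, not anything taken from Bun, Go, or WebKit. The handler simply records whether one of its own locals lives inside the alternate stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)          /* illustrative size, well above the minimum */

static char alt_stack[ALT_STACK_SIZE];      /* this thread's alternate stack */
static volatile sig_atomic_t ran_on_alt;    /* written by the handler */

/* Records whether the handler's own frame lies inside alt_stack. */
static void probe_handler(int sig) {
    char marker;                            /* lives on whichever stack the kernel chose */
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;
    (void)sig;
    ran_on_alt = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Returns 1 if the handler ran on the alternate stack, 0 otherwise. */
int handler_ran_on_altstack(int with_sa_onstack, int with_altstack) {
    stack_t ss;
    struct sigaction sa;
    int rc;

    /* Per-thread state: enable or disable this thread's alternate stack. */
    memset(&ss, 0, sizeof ss);
    if (with_altstack) {
        ss.ss_sp = alt_stack;
        ss.ss_size = ALT_STACK_SIZE;
    } else {
        ss.ss_flags = SS_DISABLE;
    }
    rc = sigaltstack(&ss, NULL);
    assert(rc == 0);

    /* Process-wide state: the disposition and its flags. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = with_sa_onstack ? SA_ONSTACK : 0;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    ran_on_alt = 0;
    raise(SIGUSR1);                         /* handler runs before raise() returns */
    return ran_on_alt;
}
```

Of the four combinations, only `handler_ran_on_altstack(1, 1)` should report the alternate stack. Drop either the flag or the per-thread stack and the handler runs on the normal stack.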

Now the sequence:

Bun starts. Main thread calls sigaltstack(512KB) for its crash handler (needs alt stack to report stack overflows). Installs a SIGPWR handler without SA_ONSTACK — SIGPWR is used for thread suspension and must run on the normal stack for the handler's stack-position check to work.

Go .so loaded via dlopen. Go's runtime calls setsigstack() on every signal whose existing handler is neither SIG_DFL nor SIG_IGN. This reads the current sigaction, ORs in SA_ONSTACK, and reinstalls it. The call sits in Go's runtime/signal_unix.go:

// Even if we are not installing a signal handler,
// set SA_ONSTACK if necessary.
if fwdSig[i] != _SIG_DFL && fwdSig[i] != _SIG_IGN {
	setsigstack(i)
}
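To make the read-modify-write concrete, here is a rough C equivalent of what the text describes. It is a sketch of the idea, not a translation of Go's source: `add_onstack_flag`, `install_without_onstack`, and `has_onstack` are hypothetical helper names, and the example assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static void noop_handler(int sig) { (void)sig; }

/* Host side: install a handler that deliberately omits SA_ONSTACK. */
void install_without_onstack(int sig) {
    struct sigaction sa;
    int rc;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(sig, &sa, NULL);
    assert(rc == 0);
}

/* Hypothetical equivalent of the read-modify-write described above.
 * Returns 1 if the flag was added, 0 if nothing changed. */
int add_onstack_flag(int sig) {
    struct sigaction sa;
    int rc = sigaction(sig, NULL, &sa);     /* read the current disposition */
    assert(rc == 0);
    if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
        return 0;                           /* no user handler to adjust */
    if (sa.sa_flags & SA_ONSTACK)
        return 0;                           /* flag already present */
    sa.sa_flags |= SA_ONSTACK;
    rc = sigaction(sig, &sa, NULL);         /* reinstall the same handler */
    assert(rc == 0);
    return 1;
}

/* Returns 1 if the process-wide disposition carries SA_ONSTACK. */
int has_onstack(int sig) {
    struct sigaction sa;
    int rc = sigaction(sig, NULL, &sa);
    assert(rc == 0);
    return (sa.sa_flags & SA_ONSTACK) != 0;
}
```

The handler pointer never changes, which is why the host has no way to notice: after `install_without_onstack(SIGUSR2)` followed by `add_onstack_flag(SIGUSR2)`, the same function is still installed, but `has_onstack(SIGUSR2)` now returns 1 for every thread in the process.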

Next SIGPWR delivery. Kernel checks: SA_ONSTACK? Yes (Go added it). Thread has sigaltstack? Yes (Bun's crash handler). Delivers on the alt stack.

Handler runs on wrong stack. The handler's stack-position check fails (it's on the alt stack, not the normal stack). It doesn't acknowledge the suspension. The sender retries. Forever.
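The whole sequence fits in a single-threaded C sketch. Everything here is a simplification under stated assumptions: SIGUSR1 stands in for SIGPWR, `raise()` stands in for the helper thread's tgkill, the stack-position check is reduced to "am I off the alternate stack?", and the retry loop is bounded so the demo terminates. The names `host_startup`, `foreign_runtime_load`, and `request_suspend` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)

static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t acked;

/* Stand-in suspension handler: acknowledge only when running on the normal stack. */
static void suspend_handler(int sig) {
    char marker;
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;
    (void)sig;
    if (!(p >= lo && p < lo + ALT_STACK_SIZE))
        acked = 1;                          /* stack-position check passed */
}

/* Step 1: the host registers an alt stack and a handler WITHOUT SA_ONSTACK. */
void host_startup(void) {
    stack_t ss;
    struct sigaction sa;
    int rc;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    rc = sigaltstack(&ss, NULL);
    assert(rc == 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
}

/* Step 2: a second runtime ORs SA_ONSTACK into the existing disposition. */
void foreign_runtime_load(void) {
    struct sigaction sa;
    int rc = sigaction(SIGUSR1, NULL, &sa);
    assert(rc == 0);
    sa.sa_flags |= SA_ONSTACK;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
}

/* Steps 3 and 4: the sender retries until acknowledged.
 * Returns the number of attempts used, or -1 if it gave up. */
int request_suspend(int max_retries) {
    acked = 0;
    for (int i = 1; i <= max_retries; i++) {
        raise(SIGUSR1);
        if (acked)
            return i;
    }
    return -1;
}
```

After `host_startup()`, `request_suspend(1000)` succeeds on the first attempt. After `foreign_runtime_load()`, the same call exhausts all 1000 retries and returns -1. Remove the retry bound and you get the loop in the strace output.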

This isn't a bug in Go, Bun, or WebKit

Go's behavior is documented and intentional:

"If there is an existing signal handler, the Go runtime will turn on the SA_ONSTACK flag and otherwise keep the signal handler."

Go needs SA_ONSTACK because goroutine stacks are tiny: they start at 2KB in current Go releases and grow on demand. Without it, a signal handler running on a goroutine's stack could overflow it. Go configures a per-thread sigaltstack on its own threads, but the kernel requires SA_ONSTACK on the handler too. Otherwise the alt stack won't be used.

Bun needs sigaltstack on its main thread for crash reporting. Without it, a stack overflow followed by SIGSEGV would have no stack to run the crash handler on.

Both are correct. Both are necessary. They're incompatible because POSIX was designed for single-runtime processes — a world where one process meant one runtime with one signal handling policy.

The same bug, everywhere

Once I understood the mechanism, I found it recurring across the ecosystem:

| Year | Project | Issue | Impact |
| --- | --- | --- | --- |
| 2015 | Go | #13034 | Signal forwarding broken with embedders |
| 2016 | Linux kernel | bugzilla #153531 | AVX-512 overflows MINSIGSTKSZ → memory corruption (P1, still open) |
| 2025 | Go + .NET | #78883 | CoreCLR SIGSEGV when loaded with Go |
| 2026 | Bun + Go | #31158 | Event loop permanently dead |
| 2026 | Bun + Go + Prisma | #29843 | Database queries hang |
| | Valve/Proton | #6762 | Games crash on Linux |
| | Duplicati | #5793 | .NET + Go backup crashes |
| | AFLplusplus | #2545 | Fuzzer sigaltstack failure |
| | LLVM | #48092 | libFuzzer breaks ASAN stack-overflow detection |

Each team thought it was their bug. Each shipped their own workaround:

Bun: read the interrupted SP from ucontext instead of the handler's own SP (WebKit #235)

.NET: increase alt stack size (dotnet/runtime#110368)

LLVM: preserve SA_ONSTACK flag in libFuzzer

Go: "host must use SA_ONSTACK" (documentation, not a fix)

Valve: unfixed
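The first workaround in that list is worth a closer look, because it shows why reading the interrupted stack pointer sidesteps the problem. The sketch below is my own illustration of the general technique, not the WebKit patch: it assumes Linux with glibc on x86-64 or AArch64, uses SIGUSR1 in place of SIGPWR, and the helper names are hypothetical. The handler is forced onto the alternate stack, yet the stack pointer saved in its ucontext still points into the interrupted thread's normal stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)

static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t handler_sp_on_alt = -1;
static volatile sig_atomic_t interrupted_sp_on_alt = -1;

static int in_alt_stack(uintptr_t p) {
    uintptr_t lo = (uintptr_t)alt_stack;
    return p >= lo && p < lo + ALT_STACK_SIZE;
}

/* Stack pointer of the interrupted code, taken from the saved context. */
static uintptr_t interrupted_sp(const ucontext_t *uc) {
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RSP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.sp;
#else
#error "this sketch only covers x86-64 and AArch64"
#endif
}

static void probe_handler(int sig, siginfo_t *info, void *uctx) {
    char marker;                            /* the handler's own frame */
    (void)sig;
    (void)info;
    handler_sp_on_alt = in_alt_stack((uintptr_t)&marker);
    interrupted_sp_on_alt = in_alt_stack(interrupted_sp((const ucontext_t *)uctx));
}

/* Registers an alt stack, installs the handler WITH SA_ONSTACK, and raises once. */
void run_probe(void) {
    stack_t ss;
    struct sigaction sa;
    int rc;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    rc = sigaltstack(&ss, NULL);
    assert(rc == 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    raise(SIGUSR1);
}

int handler_frame_on_alt(void) { return handler_sp_on_alt; }
int interrupted_frame_on_alt(void) { return interrupted_sp_on_alt; }
```

After `run_probe()`, `handler_frame_on_alt()` returns 1 while `interrupted_frame_on_alt()` returns 0. A check written against the saved context gives the same answer no matter which stack the kernel picked for the handler, which is what makes it robust to another runtime flipping SA_ONSTACK.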

Nobody stepped back and asked: why does this keep happening?

The missing kernel primitive

The...
