Accepted proposal: a goroutine leak profile in the Go standard library | Redowan's Reflections
Go 1.27 is getting a goroutine leak detector in runtime/pprof. The proposal was accepted in April.

A few common goroutine leaks

A goroutine leaks when it blocks on a channel or lock that nothing will ever release, so it lingers for the life of the process. I’ve been using uber-go/goleak to catch them in tests.

One is an early return that strands a sender, which I covered in Early return and goroutine leak. It looks like this:

    func run(tasks []func() error) error {
        errs := make(chan error) // unbuffered
        var wg sync.WaitGroup
        for _, task := range tasks {
            wg.Go(func() { errs <- task() }) // (1)
        }
        for range tasks {
            if err := <-errs; err != nil {
                return err // (2)
            }
        }
        wg.Wait()
        return nil
    }

Here:

(1) each task sends its result on the unbuffered channel through wg.Go
(2) the first error returns early, so the tasks still queued to send block forever

Giving errs a buffer big enough for every task, or draining all the results before returning, keeps the sends from blocking.

A related leak shows up when you send a request to several replicas and keep only the first answer:

    func replicate(replicas []func() string) string {
        results := make(chan string) // unbuffered
        for _, r := range replicas {
            go func() { results <- r() }() // (1)
        }
        return <-results // (2)
    }

Here:

(1) every replica races to send its answer on the unbuffered channel
(2) the first answer returns, and the slower replicas block forever on their sends

Same as before, a buffer sized for every replica lets the slower ones send and exit.

Another is a forgotten close:

    func stream(work []int) {
        out := make(chan int)
        go func() {
            for v := range out { // (1)
                handle(v)
            }
        }()
        for _, v := range work {
            out <- v
        }
        // (2) no close(out)
    }

Here:

(1) the range keeps pulling from out until it’s closed
(2) stream returns without close(out), so the range never ends and the goroutine leaks

The fix is to close(out) after the last send, which ends the range and lets the goroutine return.

They’re obvious once you spot them, but easy to let slip past under an early return or once the surrounding code grows. goleak catches them in tests. In production you’ve got the regular /debug/pprof/goroutine profile. It shows what each goroutine is blocked on, not whether it will ever unblock, so you’re guessing which are stuck for good and which are just idle.

This list is nowhere near exhaustive, and not every leak is in your own code. A dependency, or one of its transitive deps, can leak one too. Uber catalogued the patterns across its Go monorepo.

The stdlib leak profile can now find them

The profile came out of Uber, the same place as goleak, and was designed by Vlad Saioc and Milind Chabbi. The detection rides on the garbage collector. A goroutine is leaked when it’s blocked on a channel or lock that no runnable goroutine can reach, directly or through another goroutine a runnable one could unblock. Nothing can ever wake it. The GC flags it.

Note

Read that as a reachability test. If a goroutine is blocked on primitive P, and P is unreachable from any runnable goroutine or from any goroutine those runnable ones could unblock, then P cannot be unblocked. The goroutine can never wake up.
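
To make the reachability test concrete, here is a toy model of it. This is only a sketch, not the runtime's code: the goroutine struct, its fields, and the leaked function are hypothetical names invented for illustration, and primitives are plain strings rather than real channels or locks. It starts from the runnable goroutines, marks every primitive they can reach, rescues any blocked goroutine waiting on a marked primitive, and repeats until nothing changes.

```go
package main

import "fmt"

// goroutine is a hypothetical stand-in for a real goroutine.
type goroutine struct {
	name      string
	blockedOn string   // primitive it waits on; "" means runnable
	refs      []string // primitives (channels, locks) it holds references to
}

// leaked returns the names of blocked goroutines whose primitive is
// unreachable from every runnable goroutine, directly or transitively.
func leaked(gs []goroutine) []string {
	live := make(map[string]bool)      // runnable, or could still be woken
	reachable := make(map[string]bool) // primitives reachable from live goroutines

	mark := func(g goroutine) {
		live[g.name] = true
		for _, p := range g.refs {
			reachable[p] = true
		}
	}

	// Seed with the goroutines that can run right now.
	for _, g := range gs {
		if g.blockedOn == "" {
			mark(g)
		}
	}

	// Rescue blocked goroutines until a full pass changes nothing.
	for changed := true; changed; {
		changed = false
		for _, g := range gs {
			if !live[g.name] && reachable[g.blockedOn] {
				mark(g)
				changed = true
			}
		}
	}

	var out []string
	for _, g := range gs {
		if !live[g.name] {
			out = append(out, g.name)
		}
	}
	return out
}

func main() {
	gs := []goroutine{
		{name: "main", refs: []string{"requests"}},
		{name: "worker", blockedOn: "requests", refs: []string{"requests", "done"}},
		{name: "logger", blockedOn: "done", refs: []string{"done"}},
		{name: "stranded", blockedOn: "errs", refs: []string{"errs"}},
	}
	fmt.Println(leaked(gs)) // prints [stranded]
}
```

main can reach requests, which rescues worker. worker can reach done, which rescues logger. Nothing live holds errs, so stranded is the only one reported. A reference held by a stuck goroutine doesn't count, which is why stranded pointing at its own channel doesn't save it.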

goleak and the profile answer different questions:

                     goleak                                     goroutineleak profile
    Asks             what’s still running you didn’t expect     what can never run again
    How it decides   a snapshot, no proof                       a reachability proof, via the GC
    Works in         tests, at teardown                         a live process
    False positives  yes, on a live server                      none, only provably stuck goroutines

The split is about where each one runs. At a test’s teardown nothing should be left running. Handing back whatever’s there is exactly what you want from goleak. A live server is the opposite. Most of its goroutines are blocked on purpose, waiting for the next request, and goleak can’t tell those from a real leak.

The profile proves it instead. It starts from the goroutines that can still run, follows what they can reach, and rescues any blocked goroutine whose channel or lock is still in play. Whatever’s left has nothing that could ever touch it. It’s stuck for good. Uber had already tried in-production leak detection with a sampling tool, but sampling flags by heuristic and turns up false alarms. The GC pass reports only goroutines it can prove are stuck. That’s the no false positives guarantee.

The profile ships without goleak’s VerifyNone(t) or VerifyTestMain(m). The test section shows how to roll your own.

The API is tiny. There’s no new type or function, just a profile named goroutineleak. It ships registered, and the standard pprof tooling reads it like any other profile.

You can pull the profile in the usual four ways

Note

For now the profile is behind a build flag. Run the examples below with GOEXPERIMENT=goroutineleakprofile, or pprof.Lookup("goroutineleak") returns nil. Go 1.27 will make it generally...