Announcing Silk: a silky smooth fiber runtime for ClickHouse | ClickHouse
James Cunningham and Vadim Skiping
Jun 25, 2026 · 16 minutes read
TL;DR
Silk is a stackful-fiber library and scheduler with a NUMA-aware work-stealing loop, io_uring as the I/O ground truth, and zero heap allocation in the steady-state hot path. We built it for ClickHouse, and the first place we aim to integrate it is our distributed cache.
What are fibers? What is Silk?
Fibers are lightweight user-space units of execution, somewhat like threads. Unlike threads, which the kernel schedules preemptively, fibers are scheduled cooperatively: instead of blocking while it waits (for example, on I/O), a fiber yields so that another fiber can run on the same thread. That behavior is well suited to asynchronous I/O, which is becoming more of a bottleneck in distributed systems as CPUs grow faster and clusters grow larger.
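To make the cooperative model concrete, here is a minimal single-threaded round-robin scheduler built on the POSIX `ucontext` API (`getcontext`, `makecontext`, `swapcontext`). It is a sketch under our own assumptions, not Silk's implementation or API: the names `Fiber`, `spawn`, `yield_now`, and `run_all` are hypothetical, stacks come from `std::vector` for brevity, and there is no I/O. Each fiber runs until it calls `yield_now()`, which puts it at the back of the run queue and switches back to the scheduler.

```cpp
#include <ucontext.h>

#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Illustrative sketch only: a cooperative round-robin scheduler on one thread.
// All names here are hypothetical; this is not Silk's fiber switch or API.
namespace coop {

struct Fiber {
    ucontext_t ctx{};
    std::vector<char> stack;  // each fiber has its own stack (heap-backed for brevity)
    void (*fn)(int) = nullptr;
    int arg = 0;
};

ucontext_t scheduler_ctx;                 // fibers switch back here on yield or exit
std::deque<Fiber*> ready;                 // FIFO run queue
std::vector<std::unique_ptr<Fiber>> all;  // owns fibers so their addresses stay stable
Fiber* current = nullptr;

void trampoline() {
    current->fn(current->arg);
    // Returning from here resumes uc_link, i.e. the loop in run_all().
}

void spawn(void (*fn)(int), int arg, std::size_t stack_size = 64 * 1024) {
    auto f = std::make_unique<Fiber>();
    f->stack.resize(stack_size);
    f->fn = fn;
    f->arg = arg;
    [[maybe_unused]] int rc = getcontext(&f->ctx);
    assert(rc == 0);
    f->ctx.uc_stack.ss_sp = f->stack.data();
    f->ctx.uc_stack.ss_size = f->stack.size();
    f->ctx.uc_link = &scheduler_ctx;
    makecontext(&f->ctx, trampoline, 0);
    ready.push_back(f.get());
    all.push_back(std::move(f));
}

// Cooperative yield: requeue the running fiber and switch to the scheduler.
void yield_now() {
    ready.push_back(current);
    swapcontext(&current->ctx, &scheduler_ctx);
}

void run_all() {
    while (!ready.empty()) {
        current = ready.front();
        ready.pop_front();
        swapcontext(&scheduler_ctx, &current->ctx);  // runs until it yields or returns
    }
    current = nullptr;
    all.clear();  // every fiber has finished, so their stacks can be freed
}

std::string trace;

void worker(int id) {
    for (int i = 0; i < 3; ++i) {
        trace += static_cast<char>('A' + id);
        trace += static_cast<char>('0' + i);
        trace += ' ';
        yield_now();  // give up the thread instead of blocking
    }
}

std::string run_demo() {
    trace.clear();
    spawn(worker, 0);
    spawn(worker, 1);
    run_all();
    return trace;
}

}  // namespace coop
```

Calling `coop::run_demo()` returns `"A0 B0 A1 B1 A2 B2 "`: the two workers interleave only at their explicit yield points, because nothing preempts them.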
Unlike threads, fibers lack a rich ecosystem of language support, which is why we created Silk. Silk is a C++ library that provides a cooperative fiber scheduler: one scheduler per CPU, using io_uring for asynchronous I/O and stealing work from other cores when its local queue runs dry. It excels at high-concurrency network I/O (hint hint: ClickHouse) and at high-concurrency file I/O (surprise surprise: also ClickHouse).
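The "steal work when local queues run dry" policy can be sketched with one queue per worker. This is a hedged illustration, not Silk's data structure: `WorkerQueue` and `next_task` are hypothetical names, and a mutex stands in for the lock-free deques (for example, Chase-Lev) that production schedulers commonly use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical sketch: one task queue per worker. The owner pushes and pops at
// the back (LIFO); idle workers steal from the front of a victim's queue (FIFO).
using Task = std::function<int()>;

class WorkerQueue {
public:
    void push(Task t) {
        std::lock_guard<std::mutex> lk(mu_);
        q_.push_back(std::move(t));
    }

    std::optional<Task> pop_local() {  // owner side: newest task first
        std::lock_guard<std::mutex> lk(mu_);
        if (q_.empty()) return std::nullopt;
        Task t = std::move(q_.back());
        q_.pop_back();
        return t;
    }

    std::optional<Task> steal() {  // thief side: oldest task first
        std::lock_guard<std::mutex> lk(mu_);
        if (q_.empty()) return std::nullopt;
        Task t = std::move(q_.front());
        q_.pop_front();
        return t;
    }

private:
    std::mutex mu_;
    std::deque<Task> q_;
};

// Next task for worker `self`: its own queue first, then the other workers in turn.
std::optional<Task> next_task(std::vector<WorkerQueue>& queues, std::size_t self) {
    assert(self < queues.size());
    if (auto t = queues[self].pop_local()) return t;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        std::size_t victim = (self + i) % queues.size();
        if (auto t = queues[victim].steal()) return t;
    }
    return std::nullopt;  // nothing anywhere: a real worker would park or poll for I/O
}
```

The owner takes its newest task, which is likely still warm in its cache, while a thief takes the oldest one, so the owner and a thief work at opposite ends of the deque.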
The name is a homage to Cilk, the 1994 MIT work-stealing scheduler whose own name was a portmanteau of "silk" and C. Silk deliberately places itself in that lineage; the fiber-as-silk-thread metaphor is a side benefit.
What made us write a runtime rather than reach for an existing one off the shelf is the combination of properties we needed from it:
A fiber that yields in tens of nanoseconds
Work stealing that respects CPU topology
No heap allocation in the steady state
io_uring treated as the I/O ground truth rather than as a backend bolted onto an older reactor design
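One way to read "work stealing that respects CPU topology" is a victim order that prefers nearby cores. The sketch below is an assumption about what that could look like, not Silk's policy: `steal_order` and `node_of` are hypothetical, and discovering each worker's NUMA node is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of topology-aware victim selection: when worker `self`
// runs dry, it tries workers on its own NUMA node before crossing to remote
// nodes, so stolen work (and the memory it touches) tends to stay node-local.
// node_of[w] is the NUMA node of worker w.
std::vector<std::size_t> steal_order(const std::vector<int>& node_of, std::size_t self) {
    assert(self < node_of.size());
    const std::size_t n = node_of.size();
    std::vector<std::size_t> local, remote;
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t w = (self + i) % n;  // rotate so workers don't all probe worker 0 first
        (node_of[w] == node_of[self] ? local : remote).push_back(w);
    }
    local.insert(local.end(), remote.begin(), remote.end());  // same node first, then remote
    return local;
}
```

With workers 0 and 1 on node 0 and workers 2 and 3 on node 1, worker 0 tries 1 first, then 2 and 3; worker 2 tries 3 first, then 0 and 1.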
None of the off-the-shelf options offers all four, so we wrote one that does. We ship it with the harness, GDB extension, and BPF profiler that show how seriously we intend to depend on it in ClickHouse.
Why fibers, why these fibers, and why now?
ClickHouse already has a concurrency model, and it works. It's the right model for the parts of the engine that look like query execution: long-running threads doing real CPU work, where the per-thread overhead is amortized over millions of rows of computation.
Yet we need Silk for the rest of the engine. If you trace a query through ClickHouse Cloud, the long pole is increasingly not "a thread did a lot of computation" but "ten thousand tiny operations completed in a particular order, and the slowest of them shaped the tail." That is what Silk targets: object-storage I/O, distributed cache lookups, replica coordination, and HTTP fan-out. These components are all I/O-bound, highly concurrent, and judged at the 99th and 99.9th percentiles. They are exactly the workloads where the cost of one in-flight request should be a stack pointer, not a kernel thread.
The argument for stackful fibers, over OS threads or stackless C++20 coroutines, is essentially this. OS threads are too expensive to use as the primary unit of concurrency in a database engine: a few microseconds per context switch, kilobytes of stack, and only so many of them before the kernel starts context-switching itself to death. Stackless coroutines are cheap but viral: every function on a suspension path has to become a coroutine that can be co_awaited, and the compiler's heap allocation elision optimization (HALO) reliably stops firing the moment the coroutine handle escapes to a real scheduler queue. Stackful fibers give you cheap suspension without the language footprint: any function can yield, and the stack is a normal stack.
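The "any function can yield" property is easy to see with a stackful context. In this sketch (again POSIX `ucontext`, with hypothetical names `yield_value`, `descend`, and `collect`), a plain recursive function suspends the whole fiber from three frames deep and later resumes with every frame's locals intact; none of the functions is a coroutine. With C++20 stackless coroutines, `descend` and each of its callers on the suspension path would have to become coroutines instead. `ucontext` is only a portable stand-in: on glibc, `swapcontext` also saves and restores the signal mask with a system call, so it is much slower than a nanosecond-scale switch.

```cpp
#include <ucontext.h>

#include <cassert>
#include <vector>

// Illustrative sketch, not Silk's API: ordinary functions suspending a fiber.
namespace stackful {

ucontext_t caller_ctx, fiber_ctx;
int yielded_value = 0;
bool finished = false;

// An ordinary function with no coroutine annotations; it suspends the whole fiber.
void yield_value(int v) {
    yielded_value = v;
    swapcontext(&fiber_ctx, &caller_ctx);
}

// Plain recursion: each frame's `depth` lives on the fiber's own stack and
// survives every suspension.
void descend(int depth, int max_depth) {
    if (depth > max_depth) return;
    yield_value(depth);             // suspend on the way down
    descend(depth + 1, max_depth);
    yield_value(-depth);            // suspend again on the way back up
}

void fiber_entry() {
    descend(1, 3);
    finished = true;  // returning resumes uc_link, i.e. the caller
}

std::vector<int> collect() {
    std::vector<char> stack(64 * 1024);
    finished = false;
    [[maybe_unused]] int rc = getcontext(&fiber_ctx);
    assert(rc == 0);
    fiber_ctx.uc_stack.ss_sp = stack.data();
    fiber_ctx.uc_stack.ss_size = stack.size();
    fiber_ctx.uc_link = &caller_ctx;
    makecontext(&fiber_ctx, fiber_entry, 0);

    std::vector<int> out;
    while (true) {
        swapcontext(&caller_ctx, &fiber_ctx);  // resume the fiber where it left off
        if (finished) break;
        out.push_back(yielded_value);
    }
    return out;
}

}  // namespace stackful
```

`stackful::collect()` returns `{1, 2, 3, -3, -2, -1}`: values on the way down, then on the way back up as each suspended frame resumes.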
The historical objection to stackful fibers, dating back to the Photon paper from Alibaba, is cache aliasing: fibers allocated from a slab can have stacks that map to the same L1 cache lines, producing pathological eviction. The Photon paper measured a 13% scheduler-level cost from this. Silk's response is that the problem is a property of slab-allocated stacks specifically, not of stackful fibers in general. Each fiber's stack is mmap'd from a per-fiber pool with guard pages on either side. There is no slab and no aliasing. The 13% cost does not appear in our benchmarks, because the precondition for it does not exist.
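A stack with a guard page on either side, recycled through a free list, might look like the sketch below. The names (`StackPool`, `acquire`, `release`) are hypothetical and this is not Silk's allocator; it only illustrates per-stack `mmap` regions with `PROT_NONE` guards, plus reuse so that, once the pool is warm, acquiring a stack needs no new mapping or heap allocation.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Silk's allocator. Each stack is its own anonymous
// mmap region laid out as [guard page][usable stack][guard page]; touching a
// guard page faults immediately instead of corrupting a neighbouring stack.
class StackPool {
public:
    struct Stack {
        char* base;        // start of the mapping (the lower guard page)
        std::size_t size;  // usable bytes between the two guard pages
        char* usable() const { return base + page_size(); }
    };

    explicit StackPool(std::size_t usable_bytes) : usable_(round_up(usable_bytes)) {}

    // Unmaps pooled stacks only; stacks still checked out are the caller's concern.
    ~StackPool() {
        for (const Stack& s : free_) munmap(s.base, s.size + 2 * page_size());
    }

    Stack acquire() {
        if (!free_.empty()) {  // warm path: reuse a stack, no mmap and no malloc
            Stack s = free_.back();
            free_.pop_back();
            return s;
        }
        const std::size_t total = usable_ + 2 * page_size();
        void* p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(p != MAP_FAILED);
        char* base = static_cast<char*>(p);
        // Make the first and last page of the mapping inaccessible.
        [[maybe_unused]] int lower_rc = mprotect(base, page_size(), PROT_NONE);
        [[maybe_unused]] int upper_rc =
            mprotect(base + page_size() + usable_, page_size(), PROT_NONE);
        assert(lower_rc == 0 && upper_rc == 0);
        ++mapped_;
        return Stack{base, usable_};
    }

    // The free list's vector only reallocates while the pool is warming up.
    void release(Stack s) { free_.push_back(s); }

    std::size_t mapped() const { return mapped_; }

    static std::size_t page_size() {
        static const std::size_t ps = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        return ps;
    }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t ps = page_size();
        return (n + ps - 1) / ps * ps;
    }

    std::size_t usable_;
    std::size_t mapped_ = 0;
    std::vector<Stack> free_;
};
```

Releasing a stack and acquiring again hands back the same mapping, so `mapped()` stays at 1. This sketch is single-threaded; a per-CPU runtime would also want per-CPU free lists rather than one shared vector.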
By our own benchmarks against the field, Silk delivers roughly the following:
About 3.6 nanoseconds per fiber yield with cross-CPU work stealing
About 7.6 microseconds for...