Cerebras’ wafer-scale AI bet delivers blockbuster IPO
Cerebras risked it all on dinner plate-sized AI accelerators a decade ago. Today it’s worth $66 billion
Here's a look at the tech powering the first big IPO of 2026
Tobias Mann
Systems Editor
Published Fri 15 May 2026 // 00:02 UTC
Cerebras Systems has done what many chip startups aspire to but few ever achieve. On Thursday, the long-time Nvidia rival raised $5.55 billion in an initial public offering (IPO), which valued the company at more than $66 billion on its first day of trading.
The milestone didn’t happen overnight. It took more than a decade, a radically different approach to chipmaking, and two separate attempts at an IPO to pull off.
Cerebras was founded in 2015 by former SeaMicro head Andrew Feldman, and its first chips looked nothing like the GPUs or AI accelerators of the time.
The bet that put Cerebras on the map
At the time, most high-end GPUs used dies measuring roughly 800 square mm that had been cut from a larger wafer. Eight or more of these GPUs would typically be stitched together by high-speed interconnects, like NVLink, which allowed them to pool their resources and behave like one big accelerator.
Rather than cutting up a wafer into smaller chips just to reconnect them again, Cerebras figured, why not etch all that compute into a single wafer-sized chip? And so the Wafer-Scale Engine (WSE), a giant chip measuring 46,225 square mm (about the size of a dinner plate), was born.
Cerebras' first chips weren’t just bigger; they were purpose-built for AI training and sported a novel compute engine designed to speed up the highly sparse matrix multiply-accumulate operations common in deep learning.
This hardware sparsity took advantage of the fact that large portions of a neural network’s parameters ultimately end up being zeros, allowing Cerebras to boost the effective computational output of its first-gen WSE accelerators from 2.65 petaFLOPS to 26.5 petaFLOPS at 16-bit precision.
Nvidia added support for sparsity in its Ampere generation a year later, but it only worked for a specific ratio (2:4), limiting its effectiveness to select use cases.
To train a model, up to 16 of these chips could be ganged together over a high-speed interconnect. This mattered because, unlike GPUs, which stored model weights in HBM or GDDR memory, Cerebras' chips were almost entirely reliant on on-chip SRAM.
Although SRAM is insanely fast, which is why it’s used for caches in basically every modern processor, it’s not particularly space efficient.
While Cerebras' first wafer-scale accelerator could theoretically reach 9 petabytes per second of memory bandwidth, it was limited to just 18 GB of capacity at a time when Nvidia was already at 32 GB per GPU and about to make the leap to 40 GB or even 80 GB per chip.
Still, the approach was performant enough that for its second-generation wafer-scale accelerator, launched in 2021, Cerebras doubled down on the architecture.
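The first-gen figures above can be sanity-checked with some quick arithmetic. The sketch below assumes an idealized model in which hardware that skips zero operands speeds up by 1 / (1 - sparsity). That model and the helper names are illustrative assumptions, not anything Cerebras or Nvidia has published; only the 2.65 and 26.5 petaFLOPS, 2:4, and 18 GB figures come from the article.

```python
# Back-of-envelope arithmetic for the first-gen WSE figures quoted above.
# Assumption (not from Cerebras): skipping a zero operand costs nothing, so
# the ideal speedup is 1 / (1 - sparsity). Real hardware falls short of this.


def ideal_speedup(sparsity: float) -> float:
    """Speedup if a fraction `sparsity` of multiply-accumulates is skipped."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return 1.0 / (1.0 - sparsity)


def implied_sparsity(dense_pflops: float, sparse_pflops: float) -> float:
    """Fraction of zeros needed to turn the dense figure into the sparse one."""
    return 1.0 - dense_pflops / sparse_pflops


def is_two_four_sparse(weights: list) -> bool:
    """True if every aligned group of 4 weights contains at least 2 zeros."""
    if len(weights) % 4 != 0:
        return False
    return all(
        sum(1 for w in weights[i:i + 4] if w == 0) >= 2
        for i in range(0, len(weights), 4)
    )


def fp16_params_that_fit(capacity_gb: float) -> float:
    """Number of 16-bit (2-byte) parameters that fit in `capacity_gb` (1 GB = 1e9 bytes)."""
    return capacity_gb * 1e9 / 2


wse1_sparsity = implied_sparsity(2.65, 26.5)   # fraction of work skipped
two_four_ceiling = ideal_speedup(0.5)          # 2:4 means 50 percent zeros
wse1_fp16_params = fp16_params_that_fit(18)    # weights only, nothing else
```

Under that idealized model, the jump from 2.65 to 26.5 petaFLOPS corresponds to skipping about 90 percent of operations, while a 2:4 pattern tops out at 2x. At 2 bytes per parameter, 18 GB holds roughly 9 billion 16-bit weights, before accounting for activations or optimizer state.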
While the WSE-2 wasn’t physically larger, the move to TSMC’s 7nm process tech allowed the company to more than double the transistor count, compute density, SRAM capacity, and bandwidth.
The chips also supported larger clusters, scaling up to 192 systems, though in practice these clusters were usually smaller, at between 16 and 32 systems per site.
It was also around this time that Cerebras caught the attention of United Arab Emirates-based cloud provider G42, which quickly became its largest financier. By mid-2023, the chip startup had secured orders worth $900 million for nine supercomputing sites with 36 exaFLOPS of sparse AI compute between them.
A year later, Cerebras made the jump to TSMC’s 5nm process with the WSE-3. While memory and bandwidth only saw modest gains, compute once again doubled, now topping out at 125 petaFLOPS of sparse (12.5 petaFLOPS dense) compute at 16-bit precision.
Cerebras’ CS-3 systems have since seen the widest deployment and now power the majority of the Condor Galaxy cluster it built for G42, as well as several new sites across North America and Europe.
Cerebras' inference inflection
Up to mid-2024, Cerebras' primary focus had been on training, but then the company announced a boutique inference-as-a-service offering to rival those from competing chip startups like Groq and SambaNova.
It turns out the massive SRAM capacity of Cerebras’ latest AI accelerators made them not only potent training accelerators but also particularly well suited to high-speed LLM inference.
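One common rule of thumb helps explain why fast memory matters for inference: when a model generates one token at a time for a single request, its weights are typically read from memory once per token, so tokens per second cannot exceed memory bandwidth divided by the size of the weights. The sketch below applies that heuristic. The 8-billion-parameter model is hypothetical, the 21 PB/s figure is the one this article quotes for the WSE-3, and the GPU bandwidth is simply that figure divided by the article's "nearly 1000x" ratio rather than an official spec. Whether the weights fit in on-chip memory is ignored, and real systems land well below these ceilings.

```python
# Rule-of-thumb ceiling on single-request decode speed for a memory-bound LLM.
# Assumption: every generated token requires reading all weights once, so
# tokens/s <= memory bandwidth / bytes of weights. Compute, interconnect,
# batching, and capacity limits are all ignored here.


def weight_bytes(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Size of the model weights in bytes (default: 16-bit parameters)."""
    return num_params * bytes_per_param


def max_decode_tokens_per_s(bandwidth_bytes_per_s: float, model_bytes: float) -> float:
    """Upper bound on tokens per second when decode is bandwidth-bound."""
    if model_bytes <= 0:
        raise ValueError("model_bytes must be positive")
    return bandwidth_bytes_per_s / model_bytes


WAFER_BANDWIDTH = 21e15                       # 21 PB/s, as quoted in the article
IMPLIED_GPU_BANDWIDTH = WAFER_BANDWIDTH / 1000  # derived from the "1000x" ratio

hypothetical_model = weight_bytes(8e9)        # hypothetical 8B-parameter model

wafer_ceiling = max_decode_tokens_per_s(WAFER_BANDWIDTH, hypothetical_model)
gpu_ceiling = max_decode_tokens_per_s(IMPLIED_GPU_BANDWIDTH, hypothetical_model)
```

Because the bound scales linearly with bandwidth, the ratio between the two ceilings equals the bandwidth ratio regardless of which model size is plugged in.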
In its third iteration, Cerebras' wafer-scale accelerators boasted more memory bandwidth than they could realistically use. At 21 PB/s, the chip’s memory is nearly 1,000x faster than that of Nvidia’s new Rubin GPUs.
This, along with a dash of speculative decoding, allowed Cerebras to generate tokens far faster than any GPU-based system of the time. Even today, Cerebras routinely ranks among the fastest inference providers in the...