Xenova
@xenovacom
21h
Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real.
Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser.
Agentic kernel optimization is the future of on-device inference
Xenova
@xenovacom
Jun 13
I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference.
It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible.
Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
The next day, access to Fable 5 was suspended globally.
Jun 17, 2026 · 4:54 PM UTC
Xenova
@xenovacom
21h
In case you hadn't noticed, we're working on something big. Stay tuned.
🔗 Link to the demo: huggingface.co/spaces/webml-…
Gemma 4 WebGPU Kernels - a Hugging Face Space by webml-community
Loïck Chambon (PhD in CV) - 🇺🇦🇮🇷🇪🇺
@LoickCh
5h
Replying to @xenovacom
Do we know what they optimized?
Unni
@karmakomik
17h
Replying to @xenovacom
Will try, but I hope it did not just optimise for your GPU 😅
The Singularity Project
@01Singularity01
20h
Replying to @xenovacom
Failed to load: No supported WebGPU variant for com.xenova.gemma4.DecodeOprojNorm; rejected fused_rows: when guard resolved to false; fused: when guard resolved to false
xcaliburr
@xscorpiox101
19h
Replying to @xenovacom
I'm much more interested in whether the output is still correct; quality normally deteriorates when speed increases.
Fab 🇧🇷🇨🇦
@FlockonUS
19h
Replying to @xenovacom
How many GB will my browser load if I access the page?
octalmage
@octalmage
17h
Replying to @xenovacom
it doesn't know...
Ian Danforth
@iand_elicit
18h
Replying to @xenovacom
As far as I can tell it's fast and not very high quality. So interesting technical work, but I wouldn't use the model for anything.
The Singularity Project
@01Singularity01
18h
Replying to @xenovacom
WebGPU: Hardware accelerated
Adapter selected with `powerPreference: "high-performance"`:
```js
vendor: "nvidia",
architecture: "ampere",
subgroupMinSize: 32,
subgroupMaxSize: 128,
features: [
"shader-f16",
"subgroups",
"timestamp-query",
...
```
GPU: NVIDIA GeForce RTX 2050
Likely cause
The embedded `DecodeOprojNorm` variant guard appears to require an exact fixed subgroup range:
```js
device.features.has("subgroups") &&
device.adapterInfo.subgroupMinSize == 32 &&
device.adapterInfo.subgroupMaxSize == 32
```
On this NVIDIA Ampere/D3D12 adapter, Chrome reports:
```js
subgroupMinSize: 32
subgroupMaxSize: 128
```
So both `fused_rows` and `fused` variants are rejected before compilation, even though the adapter supports `subgroups` and `shader-f16`.
Suggested fix
Please add a compatible fallback or relax/add a variant for adapters where the subgroup size range includes 32 but `subgroupMaxSize > 32`, e.g. NVIDIA/D3D12. If the WGSL is safe for 32-lane subgroup assumptions, the guard might be closer to:
```js
subgroupMinSize <= 32 && subgroupMaxSize >= 32
```
Otherwise, a separate NVIDIA/D3D12 variant or non-fixed-subgroup fallback would allow the demo to run on hardware-backed WebGPU adapters that expose a subgroup range rather than a fixed 32.
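The diagnosis in this reply can be sketched as a small variant-selection routine. This is a hypothetical illustration, not the demo's actual loader: `exactGuard`, `rangeGuard`, `selectVariant`, the `"fallback"` result, and the variant table are made-up names. Only `fused_rows`, `fused`, the `subgroups` feature, and the `subgroupMinSize`/`subgroupMaxSize` values come from the thread, and plain objects stand in for `device.features` and `device.adapterInfo` so the sketch runs without a GPU. It shows why an exact-32 guard rejects an adapter reporting a 32 to 128 range, and how a range check plus a fallback avoids a hard load failure.

```js
// Hypothetical sketch: choose a kernel variant from the adapter's reported
// subgroup range. Not the demo's code; names other than the variant names
// and adapter fields are assumptions.

// Guard as quoted in the reply: the subgroup size must be fixed at `width`.
function exactGuard(features, info, width) {
  return features.has("subgroups") &&
    info.subgroupMinSize === width &&
    info.subgroupMaxSize === width;
}

// Relaxed guard suggested in the reply: the reported range merely includes
// `width`. Caveat: this alone does not guarantee 32-lane subgroups at runtime,
// so it is only safe if the WGSL tolerates that, as the reply notes.
function rangeGuard(features, info, width) {
  return features.has("subgroups") &&
    info.subgroupMinSize <= width &&
    width <= info.subgroupMaxSize;
}

// Return the first variant whose guard accepts the adapter; otherwise fall
// back to a (hypothetical) non-subgroup kernel instead of failing to load.
function selectVariant(variants, features, info) {
  for (const v of variants) {
    if (v.guard(features, info, v.subgroupWidth)) return v.name;
  }
  return "fallback";
}

// Adapter values reported in the reply (RTX 2050, Chrome, D3D12).
const features = new Set(["shader-f16", "subgroups", "timestamp-query"]);
const info = { subgroupMinSize: 32, subgroupMaxSize: 128 };

const strict = [
  { name: "fused_rows", subgroupWidth: 32, guard: exactGuard },
  { name: "fused", subgroupWidth: 32, guard: exactGuard },
];
const relaxed = strict.map((v) => ({ ...v, guard: rangeGuard }));

console.log(selectVariant(strict, features, info));  // → fallback
console.log(selectVariant(relaxed, features, info)); // → fused_rows
```

With the exact guard, both fused variants are rejected for this adapter, matching the "No supported WebGPU variant" error earlier in the thread; the range guard accepts `fused_rows`, and an adapter without `subgroups` would still land on the fallback rather than failing.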
ansuman
@ansuman_bin
17h
Replying to @xenovacom
bro retired too early!
WuBu ⪋ WaefreBeorn 🇺🇸 👑
@waefrebeorn
19h
Replying to @xenovacom @crosstensor
Thank you for releasing the work for peer review.
I respect your efforts now.
ZenithAi
@ZenithAiLab
5h
Replying to @xenovacom
255 tok/s in your browser. Fable 5 proved it; now you can run it. Agentic kernels = local AI unchained
Adria B.A.
@Adria_MBA
15h
Replying to @xenovacom
That one went to 500 tok/sec
Leandro von Werra
@lvwerra
Jun 16
We launched an agent collaboration with a simple task: make Gemma 4 faster.
Over 100 agents from all over the world joined, exchanged 1000+ messages and submitted 450 results.
A week of collaboration later the throughput went from 100 tok/s to over 500 tok/s.
NeoLabsFlow
@Sika12225983
16h
Replying to @xenovacom @ClementDelangue
Hmm, curious if it really speeds things up!
usul365
@yusufgider
3h
Replying to @xenovacom
Fable 5 took Gemma 4 to 255 tokens/s on WebGPU, in the browser, locally. 🚀
For those who didn't believe it, the code has been open-sourced. Try it yourself.
On-device inference is no longer theory; it's real. Is dependence on the cloud coming to an end? 👇
Vabbyshabby
@vabbyshabby
9h
Replying to @xenovacom
255 tok/s on webgpu with gemma 4 is the milestone that separates a...