Apple rebuilt its on-device AI stack at WWDC 2026


Apple rebuilt its on-device AI stack at WWDC 2026 - Ziraph blog


WWDC 2026 brought no new silicon. What it brought instead was a structural rebuild of how AI runs on Apple silicon: a new inference framework, a new model format, a new generation of on-device models, and a noticeably different posture toward the cloud.

None of it was the headline - the headline was the consumer features. But the developer documentation, the session code, and one machine-learning-research post add up to a clearer roadmap than the keynote did, plus a few details that are genuinely odd.

I read this layer closely - I'm building a profiler for it - so here is what stood out: the major changes, the subtle tells, and the findings I had to double-check before believing. One ground rule up front: everything below is from Apple's own documentation, WWDC session pages, and research posts, quoted where it matters. Where something is an individual developer's claim or a forum reading rather than Apple's word, I say so. Where Apple simply does not say, I say that too.

And the biggest caveat of all: I'm in Europe, so I spent the night watching, reading, and researching - I'm sure I got something wrong due to lack of sleep. :-)

The big change: Core AI replaces Core ML for neural networks

For a decade, Core ML was the answer to “run a model on an iPhone.” At WWDC 2026 Apple introduced Core AI, and the framing is a handover, not an addition. Core AI's documentation sends the old cases back to Core ML:

“If your app uses model types other than neural networks, such as decision trees or tabular feature engineering, see Core ML.” - Apple, Core AI documentation

And Core ML's documentation sends the new ones forward:

“If your app integrates AI models using the latest architectures and inference techniques, see Core AI.” - Apple, Core ML documentation

Read together, that is a split: Core ML narrows to classic, non-neural machine learning (decision trees, tabular features), while neural networks and transformers move to Core AI, which Apple pitches as the home for current architectures on every compute unit:

“Core AI allows your app to use the latest model architectures and inference techniques across the CPU, GPU, and Neural Engine.” - Apple, Core AI documentation
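To make the split concrete, here is a minimal sketch of the routing rule as I read it from the two documentation pages. The family labels, the `Framework` enum, and the `pick_framework` helper are my own hypothetical names, not anything in Apple's SDKs; the code only encodes which framework each doc page points a model at.

```python
from enum import Enum


class Framework(Enum):
    CORE_AI = "Core AI"
    CORE_ML = "Core ML"


# Hypothetical labels. The Core AI docs send "decision trees or tabular
# feature engineering" to Core ML; neural networks go to Core AI.
NON_NEURAL_FAMILIES = {
    "decision_tree",
    "tree_ensemble",
    "tabular_feature_engineering",
}


def pick_framework(model_family: str) -> Framework:
    """Return the framework the WWDC 2026 docs point a model family at."""
    if model_family in NON_NEURAL_FAMILIES:
        return Framework.CORE_ML
    # Simplification: any family not listed above is treated as a neural
    # network (transformer, CNN, diffusion, ...) and routed to Core AI.
    return Framework.CORE_AI


print(pick_framework("decision_tree").value)  # Core ML
print(pick_framework("transformer").value)    # Core AI
```

The default branch is the interesting design choice: the documentation defines Core ML by exclusion now, so anything not explicitly classic ML falls through to Core AI.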

The subtle tell is in the tooling. Apple's new Core AI debug gauge carries a one-line restriction:

“The gauge does not support the Core ML framework.” - Apple, Core AI debug gauge documentation

The new instrumentation simply does not look at the old framework. Core ML is not deprecated - its APIs are intact, and there is real backward-compatibility value in that - but the center of gravity, and the tooling investment, has moved.

A new artifact: the .aimodel bundle

Core AI ships a new on-disk format, .aimodel, and the first odd thing about it is that it is not a file. It is a directory. Apple's open coreai-models repository treats it as one throughout - the Python exporter deletes an old one with a directory-only call, and the Swift runtime resolves it as a “.aimodel directory.”

Inside the surrounding model bundle is a plain-JSON metadata.json (schema version 0.2) that records the model kind (LLM, VLM, diffusion, segmenter), the tokenizer, vocabulary size, context length, the compression preset, and which file is the model. That JSON is documented and parseable. The weight payload itself - the part that would tell you the exact per-tensor bit-widths - is written by an opaque framework call, and its byte layout is not published anywhere I could find. So the format is half-open: a readable manifest wrapped around an undocumented blob.
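Because the manifest is plain JSON, a profiler can read it with nothing but the standard library. The sketch below assumes a surrounding bundle directory that holds metadata.json next to the .aimodel directory. The key names `schema_version`, `kind`, and `model` are hypothetical placeholders for whatever schema 0.2 actually uses, and the weight payload is deliberately left alone since its layout is unpublished.

```python
import json
from pathlib import Path

# Hypothetical key names: the real schema 0.2 fields may be spelled
# differently. The manifest is only known to record the schema version,
# model kind, tokenizer, vocabulary size, context length, compression
# preset, and which file is the model.
SCHEMA_KEY, KIND_KEY, MODEL_KEY = "schema_version", "kind", "model"
KNOWN_KINDS = {"llm", "vlm", "diffusion", "segmenter"}


def is_aimodel_bundle(path: Path) -> bool:
    """An .aimodel is a directory, so a plain file with that suffix does not count."""
    return path.suffix == ".aimodel" and path.is_dir()


def read_metadata(bundle_root: Path) -> dict:
    """Load and sanity-check metadata.json from the surrounding model bundle."""
    meta = json.loads((bundle_root / "metadata.json").read_text(encoding="utf-8"))
    # Accept the version as either the string "0.2" or the JSON number 0.2.
    if str(meta.get(SCHEMA_KEY)) != "0.2":
        raise ValueError(f"unexpected schema version: {meta.get(SCHEMA_KEY)!r}")
    if str(meta.get(KIND_KEY, "")).lower() not in KNOWN_KINDS:
        raise ValueError(f"unknown model kind: {meta.get(KIND_KEY)!r}")
    return meta


def locate_model(bundle_root: Path) -> Path:
    """Resolve the model entry named in the manifest and check it is a directory."""
    meta = read_metadata(bundle_root)
    model_path = bundle_root / meta[MODEL_KEY]
    if not is_aimodel_bundle(model_path):
        raise ValueError(f"{model_path} is not an .aimodel directory")
    # The weight payload inside is undocumented, so the sketch stops here.
    return model_path
```

The `is_dir()` check matters more than it looks: any tool that treats .aimodel as a single file (copying, hashing, or deleting it with file-only calls) will misbehave on the real artifact.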

Models are prepared with new Python tooling: Core AI Optimization (coreai-opt, the successor to coremltools) handles compression, and Core AI PyTorch Extensions (coreai-torch) exports straight from PyTorch into the format. Models can then optionally be compiled ahead of time with xcrun coreai-build compile into per-architecture .aimodelc assets.

The compression menu is wider than the GGUF world's: integer weights at 2, 4, and 8 bits; float micro-formats including FP8 (E4M3) and FP4 (E2M1); block-scaled MXFP8; and palettization from 1 to 8 bits. One forum reader (HN, opinion) noted that Apple is also pushing activation quantization such as w4a8 and w4a16 (4-bit weights with 8-bit or 16-bit activations); given Apple's install base, the formats it blesses could end up shaping how sub-100B models ship to everyone.
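For readers coming from GGUF, the formats are easier to compare once you see what the bits decode to. This is a generic sketch, not coreai-opt's implementation: it decodes the 16 FP4 (E2M1) codes using the standard layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), snaps weights onto that grid with a single caller-supplied scale, palettizes weights against a small lookup table, and computes packed storage size. The helper names are hypothetical, and a real compressor would typically use per-block or per-channel scales and a learned palette rather than a fixed scale and a hand-picked table.

```python
import math


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit FP4 (E2M1) code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: only 0 and 0.5 are representable.
        return sign * 0.5 * mantissa
    return sign * 2.0 ** (exponent - 1) * (1.0 + 0.5 * mantissa)


# The eight non-negative E2M1 magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
E2M1_GRID = sorted({abs(decode_e2m1(c)) for c in range(16)})


def quantize_fp4(weights, scale):
    """Snap weight/scale onto the E2M1 grid (saturating at 6) and dequantize.

    Ties go to the smaller magnitude because min() keeps the first match.
    """
    out = []
    for w in weights:
        x = min(abs(w) / scale, E2M1_GRID[-1])
        nearest = min(E2M1_GRID, key=lambda g: abs(g - x))
        out.append(math.copysign(nearest * scale, w))
    return out


def palettize(weights, palette):
    """Replace each weight with the index of its nearest palette entry.

    With n-bit indices the palette holds at most 2**n values (n = 1..8).
    """
    if len(palette) > 256:
        raise ValueError("palette needs more than 8 bits of index")
    return [min(range(len(palette)), key=lambda i: abs(palette[i] - w)) for w in weights]


def packed_bytes(num_weights: int, bits: int) -> int:
    """Bytes to pack num_weights values at `bits` each, ignoring scales and palettes."""
    return (num_weights * bits + 7) // 8


print(quantize_fp4([0.26, -2.9, 100.0], scale=1.0))  # [0.5, -3.0, 6.0]
print(palettize([0.1, -0.9, 0.55], [-1.0, 0.0, 0.5, 1.0]))  # [1, 0, 2]
print(packed_bytes(1000, 4), packed_bytes(1000, 2))  # 500 250
```

The contrast worth noticing: FP4 spends its 16 codes on a fixed, non-uniform grid (dense near zero, sparse near 6), while palettization lets the compressor choose the grid itself and pays for that with a lookup table per tensor or group.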

The hardware tell: the matmul moved into the GPU

No new chip, but WWDC 2026 made the M5 and A19 GPU story explicit, and it is the clearest hardware signal of the week. From Apple's M5/A19 tech talk:

“Neural accelerators are dedicated hardware in M5 purpose built for matrix multiplication. They're built into each shader core right alongside the other GPU pipelines such as ALU, raytracing... Each shader core has its own neural...
