The Third Generation of Apple's Foundation Models

Introducing the Third Generation of Apple’s Foundation Models

Apple Machine Learning Research (featured highlight)

June 8, 2026

Our next generation of Apple Intelligence is centered around our users, integrated deeply into our operating systems, and powered by a bold new architecture with privacy at its core.

At the heart of this architecture is our third generation of Apple Foundation Models (AFM), a family of five foundation models custom-built in collaboration with Google. These span from on-device models to server-based models running on Private Cloud Compute.

Apple Foundation Models are built to unlock a wide range of helpful experiences for our users, like an entirely new Siri and intelligent tools that make everyday apps smarter and more useful.

This family of models includes two on-device models:

AFM 3 Core, the next generation of our 3-billion-parameter dense model that delivers a step up in quality.

AFM 3 Core Advanced, our most powerful on-device model. It’s natively multimodal, enabling helpful features like expressive voices and higher-accuracy dictation. Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request. AFM 3 Core Advanced is unlocked by and optimized for our most capable Apple silicon systems.

Our latest Apple Foundation Models also include three server-based models running on Private Cloud Compute, which ensures that user data is never stored or shared with anyone, including Apple. These models are:

AFM 3 Cloud, our server-side workhorse, optimized for speed, efficiency, and performance.

ADM 3 Cloud (Image), for image generation and editing, which unlocks advanced photo-editing tools, the all-new Image Playground, and more.

AFM 3 Cloud Pro, our most capable server-based model, which powers our most demanding use cases, like agentic tool use and complex reasoning.

AFM 3 Core, AFM 3 Core Advanced, and AFM 3 Cloud, along with ADM 3 Cloud, are all purpose-built for Apple silicon.

For AFM 3 Cloud Pro, we worked with Google and NVIDIA to extend Private Cloud Compute to NVIDIA GPUs in Google Cloud, while maintaining the same guarantees to protect our users’ privacy. More details are available on our Security Research Website.

Our third generation of Apple Foundation Models delivers significant advancements across capabilities and quality. In the following overview, we’ll dive deeper to explore the scalable architectures powering our on-device and server-based models, our training methodologies, and more.

Model Architectures

We designed the architectures for both our on-device and server-side models to enable powerful Apple Intelligence experiences for our users, and we integrated our latest models deep into our operating systems.

Maximizing on-device AI capabilities

One area of deep innovation is our most powerful on-device model, AFM 3 Core Advanced. Traditional large language models—whether dense or sparsely activated—require all weights to reside in active memory (DRAM), creating a massive footprint that limits scalability on consumer hardware. To break this barrier, AFM 3 Core Advanced introduces a novel sparsely activated architecture built on Instruction-Following Pruning (IFP), a technique developed by Apple researchers (see Figure 1).
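To make the footprint argument concrete, here is a rough back-of-the-envelope sketch. The parameter counts (20 billion total, 1 to 4 billion active) are the ones quoted earlier in this post; the bits-per-weight value is purely an assumption for illustration, since the post does not state how the weights are stored, and the function names are hypothetical.

```python
# Back-of-the-envelope weight footprint. Parameter counts come from the
# article; BITS_PER_WEIGHT is a hypothetical value, not a published figure.

TOTAL_PARAMS = 20e9                 # full model, kept in flash (NAND)
ACTIVE_PARAMS_RANGE = (1e9, 4e9)    # active per request, resident in DRAM
BITS_PER_WEIGHT = 4                 # assumption for illustration only


def weight_gigabytes(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params, total_params):
    """Share of the full model that is active for one request."""
    return active_params / total_params


if __name__ == "__main__":
    full = weight_gigabytes(TOTAL_PARAMS, BITS_PER_WEIGHT)
    print(f"full model: {full:.1f} GB")
    for active in ACTIVE_PARAMS_RANGE:
        gb = weight_gigabytes(active, BITS_PER_WEIGHT)
        frac = active_fraction(active, TOTAL_PARAMS)
        print(f"active {active / 1e9:.0f}B: {gb:.1f} GB ({frac:.0%} of the model)")
```

Whatever the real storage format is, the ratio is what matters: only 5% to 20% of the parameters are active for a given request, so the DRAM-resident portion is a small fraction of the full model.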

Instead of forcing the entire model into DRAM, the full model is stored in flash memory (NAND). Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard mixture-of-experts (MoE) models require, AFM 3 Core Advanced makes routing decisions per prompt. A lightweight dense block selects a fixed set of experts during initial prompt processing and periodically reselects them during generation. To minimize data movement, the model relies on a high percentage of always-active “shared experts” alongside input-dependent “routed experts” that are swapped into DRAM only when needed.
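The per-prompt routing described above can be sketched in a few lines. This is a minimal illustration, not Apple's implementation: the mean-pooling step, the dot-product scoring, the top-k rule, and every function name here are assumptions chosen to show the idea of selecting experts once per prompt rather than once per token.

```python
# Illustrative sketch of prompt-level expert selection. All names, shapes,
# and the top-k rule are assumptions; this is not Apple's implementation.

def mean_pool(token_embeddings):
    """Average the prompt's token vectors into one prompt-level vector."""
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / count for d in range(dim)]


def score_experts(prompt_vector, router_weights):
    """One dot product per routed expert (stand-in for the dense selector block)."""
    return [sum(w * x for w, x in zip(row, prompt_vector)) for row in router_weights]


def select_experts(token_embeddings, router_weights, k):
    """Pick the k highest-scoring routed experts once for the whole prompt."""
    scores = score_experts(mean_pool(token_embeddings), router_weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def active_set(shared_experts, routed_choice):
    """Shared experts are always active; routed experts depend on the input."""
    return list(shared_experts) + [f"routed-{i}" for i in routed_choice]


if __name__ == "__main__":
    router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
    prompt = [[2.0, 0.0], [4.0, 2.0]]  # two toy token embeddings
    chosen = select_experts(prompt, router_weights, k=2)
    print(active_set(["shared-0", "shared-1"], chosen))
```

In this toy setup the selection runs once and the chosen experts stay fixed for every token of the prompt. Periodic reselection during generation would amount to calling select_experts again on more recent states, which the sketch leaves out.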

AFM 3 Core Advanced Model Architecture

Figure 1: An illustration of the AFM 3 Core Advanced model architecture. The vast majority of the model’s parameters are “expert” weights associated with the feed-forward (FFN) blocks in a stacked transformer architecture. Given a user query, the model selectively loads a small subset of experts and patches them with shared, static weights to form a dense model in DRAM. The model periodically reselects and updates the activated experts during the token generation process.

This design also introduces crucial inference-time elasticity. Rather than using a single model for all tasks or managing an ensemble of smaller models, AFM 3 Core Advanced uses a predetermined number of active parameters tailored to each specific use case. This allows weights to be loaded incrementally across requests of varying difficulty, scaling the model size far beyond traditional DRAM limits while minimizing latency.
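One way to picture incremental loading across requests is a small cache of DRAM-resident experts. The sketch below is an assumption-laden illustration: the ExpertCache class, its capacity parameter, and the least-recently-used eviction policy are all hypothetical, since the post does not describe how resident experts are tracked or evicted.

```python
# Illustrative sketch of incremental expert loading. The class, its capacity
# parameter, and the LRU eviction policy are assumptions, not Apple's design.

class ExpertCache:
    """Tracks which routed experts are resident in DRAM."""

    def __init__(self, capacity):
        self.capacity = capacity      # max routed experts held in DRAM
        self.resident = []            # expert ids, least recently used first
        self.loads_from_flash = 0     # running count of flash reads

    def activate(self, expert_ids):
        """Make expert_ids resident; return the ids that had to be read from flash."""
        if len(set(expert_ids)) > self.capacity:
            raise ValueError("request needs more experts than the DRAM budget")
        loaded = []
        for expert_id in expert_ids:
            if expert_id in self.resident:
                self.resident.remove(expert_id)   # already in DRAM: refresh recency
            else:
                loaded.append(expert_id)          # missing: simulate a flash read
                self.loads_from_flash += 1
            self.resident.append(expert_id)
        while len(self.resident) > self.capacity:
            self.resident.pop(0)                  # evict least recently used
        return loaded


if __name__ == "__main__":
    cache = ExpertCache(capacity=4)
    print(cache.activate([0, 3]))         # easy request: two experts loaded
    print(cache.activate([0, 3, 5, 7]))   # harder request: only the new ones load
    print(cache.activate([3, 5]))         # smaller request: nothing to load
    print(cache.loads_from_flash)
```

A harder request that needs more active parameters pays only for the experts that are not already resident, which is the sense in which loading is incremental rather than all-or-nothing.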

Scaling the server foundation

In addition to...
