Apple working to cram Gemini model into iPhone to power new Siri

Apple working to cram massive Gemini model into iPhone to power new Siri - Ars Technica

It’s impossible to totally avoid generative AI when interacting with technology anymore, but Apple has a bit less of it. That’s not entirely by choice, though. The iPhone maker has delayed the AI-enhanced Siri multiple times since first promising it in 2024, but a deal with Google will merge the iconic assistant with Gemini later this year. As we approach the Worldwide Developers Conference, Apple has been working to bring big AI smarts to the modest processing environment of a smartphone. Apple fans may not like the outcome, though.

Apple has long crowed about the privacy value of running AI locally, but a new report suggests that despite Apple’s best efforts, the iPhone’s Gemini makeover will lean heavily on Google and Nvidia in the cloud. The Information reports that Apple’s Gemini-infused Siri will run both on-device and in the cloud, an apparent reversal of its privacy-focused preference for local AI.

With every new chip announcement, we hear about how the silicon has been optimized for AI—even Apple does this with its focus on Neural Engine upgrades. You may think from the grandiose language that smartphones are equipped to handle beefy AI models, but that’s not necessarily the case. In fact, the GPUs in most phones can process more AI tokens than the AI-focused NPUs. Components like Apple’s Neural Engine are designed for contextual, efficient AI processing. Even if phones had faster AI processing, they lack the RAM to keep enormous models in memory.

Even the largest AI models are still middling assistants, and that makes local AI very challenging. The AI models that run on phones are much smaller, featuring at most a few billion parameters. Compare that to Google’s latest Gemini models, which have trillions of parameters, The Information reports. On-device AI models are also “quantized” to run at lower precision, which makes them faster and smaller in memory but reduces the accuracy of token generation. This all adds up to AIs that feel less smart than their cloud brethren, and even big cloud-based models can be pretty dumb sometimes.
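To make the quantization trade-off concrete, here is a minimal Python sketch of symmetric 8-bit quantization, plus the rough arithmetic for how much memory a model's weights occupy. Everything in it is an illustrative assumption: the function names and sample weights are hypothetical, the 3-billion and 1-trillion parameter counts are stand-ins for "a few billion" and "trillions," and the bit widths are common choices, not figures from the report.

```python
# Illustrative sketch only: symmetric 8-bit quantization of a few weights.
# Function names, sample weights, and parameter counts are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in q_weights]

def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

weights = [0.8123, -0.4071, 0.0312, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight differs from the original by at most half a step.
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q, max(errors))

print(weight_memory_gb(3e9, 16))   # 6.0    (assumed 3B params, 16-bit)
print(weight_memory_gb(3e9, 4))    # 1.5    (same model, 4-bit)
print(weight_memory_gb(1e12, 16))  # 2000.0 (assumed 1T params, 16-bit)
```

Under these assumptions, lower precision shrinks the memory footprint in direct proportion to the bit width, while the rounding step is where the lost accuracy comes from. A trillion-parameter model stays far out of reach for a phone's RAM at any of these precisions.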

The amazing, shrinking Gemini

Google has versions of Gemini optimized for mobile devices, which it calls Gemini Nano. However, these are designed for powering contextual features like Magic Cue and audio summarization. Siri, on the other hand, is supposed to be a conversational assistant—you talk to it and it does things. That’s a different experience that requires a different kind of model. On Android, Google doesn’t even bother trying to do that locally. Talking to Gemini always goes straight to the cloud.

After inking the Google deal, Apple apparently got to work distilling Google’s giant cloud-based Gemini models. Distillation is a process in which a small, less resource-intensive model learns to mimic a large, expensive one. With enough training, this can transfer many of the larger model’s useful capabilities into a model with far fewer weights. That may enable Siri to handle some tasks with private local compute, but a cloud component looks inevitable.
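For readers curious what "learning to mimic" means in practice, one common formulation of distillation trains the small model to match the large model's output probabilities, softened by a temperature. The Python sketch below computes that mismatch for a single prediction. It is a generic illustration, not a description of Apple's or Google's method, which the report does not detail; the logits, the temperature of 2.0, and the function names are all made up for the example.

```python
import math

# Illustrative sketch only: a generic distillation loss for one prediction.
# The logits and temperature are hypothetical values.

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]        # large model's scores for three candidate tokens
close_student = [3.8, 1.1, 0.4]  # small model that nearly agrees
far_student = [0.5, 4.0, 1.0]    # small model that prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is near zero when the student's probabilities track the teacher's and grows as they diverge, so minimizing it over many examples pushes the small model toward the large model's behavior.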

Processing users’ AI data in the cloud could be a problem for Apple. At WWDC, the company will probably promote its years of experience designing chips and how well that positions it for AI. However, The Information claims that Apple has struggled to even get Google’s massive undistilled Gemini models running on its custom Private Cloud Compute infrastructure, which is built on M-series Mac chips.

When the smarter Siri rolls out, it will probably route more complex tasks to Google’s cloud infrastructure instead of Apple’s, but it won’t be running on Google TPUs. Apple has reportedly signed a deal with Nvidia to use its Confidential Computing platform for this purpose. Confidential Computing keeps data encrypted on Nvidia GPUs while it’s being processed in the cloud, which could help Apple claim it’s still sensitive to user privacy concerns. It might even retain its own Private Cloud Compute branding for the system.

The iPhone probably won’t tell you which version of Gemini is handling individual Siri requests. Device makers designing hybrid systems that rely on local and cloud-based AI like to talk about making the experience feel “seamless.” There might be clues, though.

We’re all familiar with the sluggishness of big AI models, which can churn for a long time while they generate tokens. Nvidia’s fully encrypted Confidential Computing does slow processing compared to other AI options. Users may find it...
