Apple Silicon's on-device AI bet: same play, bigger range
Three days before WWDC opens in Cupertino, and at the end of a Computex week in which NVIDIA, Qualcomm and Intel all took turns on the Taipei stages arguing, each in its own words, for hybrid AI split between cloud and device, Apple's silicon team is making a quieter point: its bet on running AI on the device hasn't really moved.

The Neural Engine has been in Apple chips since 2017. Unified memory has been a foundational design choice since the first A-series part. What's changed in nine years isn't the philosophy; it's the silicon range that now delivers it, which today stretches from a $599 MacBook Neo to an M5 Max workstation chip with two dies bonded into a single package.

That, in essence, was Doug Brooks's argument in a briefing on the sidelines of Computex. Brooks is the senior product manager for Apple Silicon, and the conversation was on the record and product-focused: no demo, no marketing slide deck, no roadmap. Just the chip.

His opening line is worth quoting, because it's the spine of everything that followed:

"If you want to build a great device for AI, you need to build a great computer."

You can read that as evasion, with Apple deflecting the AI conversation back to the hardware it already makes, or you can read it as the actual thesis: on-device AI doesn't get bolted onto a phone or a Mac; it gets designed in from the silicon up, and Apple has been doing exactly that since the A11 Bionic introduced the Neural Engine in 2017. As Brooks put it, with a flicker of dryness, that was "well before the AI PC trend broke."

The same week, three silicon CEOs took the stage

Brooks's case is worth setting against the rest of Computex.
The show's keynote slots belonged to NVIDIA's Jensen Huang, Qualcomm's Cristiano Amon and Intel's Lip-Bu Tan, and each of them spent two hours arguing for the same thing: that AI is not going to live entirely in the cloud.

Huang announced the NVIDIA RTX Spark, a new Windows PC superchip built with MediaTek that uses NVLink-C2C to connect a Blackwell GPU die to a Grace CPU die with coherent memory across both. "The PC is being reinvented," he said. The same keynote put NVIDIA's Vera Rubin AI-factory platform into full production. The bet is plainly on both ends: cloud-scale data centres and on-device PC silicon, working as a continuum.

Amon's framing was sharper. He declared 2026 the "Year of the Agent," introduced Snapdragon C for $300 entry-level Windows AI PCs and Snapdragon X2 Elite for premium ones, and unveiled Dragonfly, a brand-new data-centre AI inference chip line that marks Qualcomm's full entry into the cloud silicon market. He called the underlying strategy the "Computing Continuum," in which workloads are dynamically allocated across devices, the edge, and the cloud. "The agent isn't tied to the device," he said. "It moves with the user."

Tan's Intel keynote landed the most explicit version of the case. On stage with Perplexity CEO Aravind Srinivas, Intel ran a live hybrid-inference demo on a Core Ultra Series 3 laptop: the local model flagged what was sensitive, kept that on the device, and sent only non-sensitive material to the cloud. Tan put it plainly afterwards: "privacy, security, compliance and cost are driving the need for hybrid compute."

Three companies, three stages, three flavours of the same argument. AI is not all going to live in the cloud.
Some of it has to live on the device, for privacy or for cost or for both, and the silicon has to deliver that across a range from a phone to a workstation.

That is essentially the case Apple has been making since the A11 Bionic introduced the Neural Engine in 2017, with the small detail that Apple did it without naming a strategy. Brooks did not use the phrases "Computing Continuum" or "hybrid compute" or "AI factories." His vocabulary was "scalable," "balanced," "unified," and "foundational," words Apple has used since the A11. Whether the rest of the industry has caught up by reinventing language or by adopting Apple's positions wholesale is the open question. What is no longer in question this Computex week is whether on-device AI matters.

The fundamentals haven't moved

Brooks returned repeatedly to four foundations: a scalable architecture, a balanced architecture, unified memory, and an insistence on performance per watt. None of those words have changed since the early days of Apple Silicon. What has changed is how aggressively each has been scaled.

The Neural Engine, in particular, has gone from a single block on a phone chip to a feature shared across A-series and M-series, with billions of Apple devices now AI-accelerated, by Brooks's count. The framework story rode along in parallel: Core ML in 2017, then today's MLX open-source project, Metal Performance Shaders, the Foundation Models API and the Apple Intelligence APIs. They all sit atop the Neural Engine, the CPU, or the GPU.

Asked about what's actually hard about this, what...