Run local agentic AI on the Mac using MLX - WWDC26 - Videos - Apple Developer
View in English
-->
More Videos
About
Summary
Transcript
Code
Run local agentic AI on the Mac using MLX
Run AI agents locally with privacy, low latency, and offline access. Dive into how MLX advancements and Mac hardware make powerful agentic workflows possible entirely on-device. You'll explore code agents such as OpenCode, see how they integrate into Xcode, learn techniques for multi-Mac scaling, and discover how to integrate tools seamlessly — without ever leaving your machine.
Chapters
0:00 - Introduction
0:32 - The chat and agentic loop
2:42 - Local agentic AI stack
4:36 - Setting up your own agent
5:39 - Making agents fast
6:53 - Concurrency and distributed inference
9:20 - More examples
13:01 - Next steps
Resources
MLX Swift LM on GitHub
MLX Swift Examples
MLX Examples
MLX Swift
MLX LM - Python API
MLX Explore - Python API
MLX Framework
MLX
HD Video
SD Video
Related Videos
WWDC26
Explore distributed inference and training with MLX
Explore numerical computing in Swift with MLX
WWDC25
Explore large language models on Apple silicon with MLX
Get started with MLX for Apple silicon
Search this video…
Hi, I'm Angelos, an engineer on the MLX team. Today I'm going to show you how to build and run agentic AI workflows entirely on your Mac using MLX. No cloud, no API keys, just your hardware doing the work. Over the past year, AI agents have gone from research prototypes to everyday productivity tools. But before we talk about agents, let's look at what we had before.<br>Here's the chat experience you're familiar with. You send a prompt to the language model. The model sends a response back. If you need to act on that response, run a command, check a file, or fix an error, that's on you. But now you're talking to an agent. The agent talks to the model to decide what to do. Then it calls tools to actually do it: running commands, reading files, hitting APIs — It observes the results and goes back to the model to figure out the next step. User to agent. Agent to model. Agent to tools. This is the agentic loop. And it keeps cycling until your task is done. What makes this particularly exciting on Apple silicon is that the entire loop can run locally. Your data stays on your machine; AI is available anywhere at any time and there are no usage costs. Let me now show you what this looks like in practice. Here I have an agent running locally on my Mac. On my screen you can see the setup: on the left, MLX running the model, and on the right the OpenCode agent I am interacting with. I asked it to fetch the recent pull requests from our MLX repository, summarize the changes, and identify anything that needs my attention. The model reasons about the request, calls the GitHub CLI to fetch PR data, reads through the diffs, and produces a concise summary. All of this is happening locally, the model runs on my hardware and only the git commands reach the network. Well it seems like I have a lot of work to do after finishing this video. Now that you've seen what's possible, let me walk you through how we'll get there today. We'll start by introducing the local agentic AI stack, the four layers that make all of this work, from MLX at the foundation all the way up to the agent. Then I'll show you step-by-step how to set up your own local agent. After that, we'll look at how MLX gets the most out of your hardware to make agents fast. And finally, we'll go through more live demos, including building a SwiftUI app from scratch and fixing a bug in Xcode. Let's start with the stack.<br>The stack that powers local agentic AI on the Mac has four layers. Let me walk you through each one, starting from the bottom. At the bottom is MLX, our open-source array framework purpose-built for Apple silicon. It handles all the low-level computation, Metal acceleration, and memory management. This is the foundation everything else is built on. One level up, we have the language model layer. MLX-LM provides everything you need to load, run, quantize, and fine-tune large language models. It supports thousands of models from HuggingFace and gives you both CLI tools and a Python API. If you saw our sessions last year, this is what we covered in depth. But to serve an agent, we need something more: a persistent server with a standard API. That's where MLX-LM Server comes in. This is an OpenAI-compatible HTTP server that exposes your local model through a standard API. It supports structured tool calling so the model can invoke functions reliably, and reasoning models that can analyze complex problems step-by-step before responding. It's a drop-in replacement for any cloud LLM API. And at the top of the stack, we have the agent itself. This can be any framework or tool that speaks the OpenAI chat completions protocol: Xcode, OpenCode, Pi agent, a custom script, or anything else. Because MLX-LM Server provides a standard interface, any agent...