Run local agentic AI on the Mac using MLX (WWDC 2026) [video]

sebiw1 pts0 comments

Run local agentic AI on the Mac using MLX - WWDC26 - Videos - Apple Developer

View in English

-->

More Videos

About

Summary

Transcript

Code

Run local agentic AI on the Mac using MLX

Run AI agents locally with privacy, low latency, and offline access. Dive into how MLX advancements and Mac hardware make powerful agentic workflows possible entirely on-device. You'll explore code agents such as OpenCode, see how they integrate into Xcode, learn techniques for multi-Mac scaling, and discover how to integrate tools seamlessly — without ever leaving your machine.

Chapters

0:00 - Introduction

0:32 - The chat and agentic loop

2:42 - Local agentic AI stack

4:36 - Setting up your own agent

5:39 - Making agents fast

6:53 - Concurrency and distributed inference

9:20 - More examples

13:01 - Next steps

Resources

MLX Swift LM on GitHub

MLX Swift Examples

MLX Examples

MLX Swift

MLX LM - Python API

MLX Explore - Python API

MLX Framework

MLX

HD Video

SD Video

Related Videos

WWDC26

Explore distributed inference and training with MLX

Explore numerical computing in Swift with MLX

WWDC25

Explore large language models on Apple silicon with MLX

Get started with MLX for Apple silicon

Search this video…

Hi, I'm Angelos, an engineer on the MLX team. Today I'm going to show you how to build and run agentic AI workflows entirely on your Mac using MLX. No cloud, no API keys, just your hardware doing the work. Over the past year, AI agents have gone from research prototypes to everyday productivity tools. But before we talk about agents, let's look at what we had before.<br>Here's the chat experience you're familiar with. You send a prompt to the language model. The model sends a response back. If you need to act on that response, run a command, check a file, or fix an error, that's on you. But now you're talking to an agent. The agent talks to the model to decide what to do. Then it calls tools to actually do it: running commands, reading files, hitting APIs — It observes the results and goes back to the model to figure out the next step. User to agent. Agent to model. Agent to tools. This is the agentic loop. And it keeps cycling until your task is done. What makes this particularly exciting on Apple silicon is that the entire loop can run locally. Your data stays on your machine; AI is available anywhere at any time and there are no usage costs. Let me now show you what this looks like in practice. Here I have an agent running locally on my Mac. On my screen you can see the setup: on the left, MLX running the model, and on the right the OpenCode agent I am interacting with. I asked it to fetch the recent pull requests from our MLX repository, summarize the changes, and identify anything that needs my attention. The model reasons about the request, calls the GitHub CLI to fetch PR data, reads through the diffs, and produces a concise summary. All of this is happening locally, the model runs on my hardware and only the git commands reach the network. Well it seems like I have a lot of work to do after finishing this video. Now that you've seen what's possible, let me walk you through how we'll get there today. We'll start by introducing the local agentic AI stack, the four layers that make all of this work, from MLX at the foundation all the way up to the agent. Then I'll show you step-by-step how to set up your own local agent. After that, we'll look at how MLX gets the most out of your hardware to make agents fast. And finally, we'll go through more live demos, including building a SwiftUI app from scratch and fixing a bug in Xcode. Let's start with the stack.<br>The stack that powers local agentic AI on the Mac has four layers. Let me walk you through each one, starting from the bottom. At the bottom is MLX, our open-source array framework purpose-built for Apple silicon. It handles all the low-level computation, Metal acceleration, and memory management. This is the foundation everything else is built on. One level up, we have the language model layer. MLX-LM provides everything you need to load, run, quantize, and fine-tune large language models. It supports thousands of models from HuggingFace and gives you both CLI tools and a Python API. If you saw our sessions last year, this is what we covered in depth. But to serve an agent, we need something more: a persistent server with a standard API. That's where MLX-LM Server comes in. This is an OpenAI-compatible HTTP server that exposes your local model through a standard API. It supports structured tool calling so the model can invoke functions reliably, and reasoning models that can analyze complex problems step-by-step before responding. It's a drop-in replacement for any cloud LLM API. And at the top of the stack, we have the agent itself. This can be any framework or tool that speaks the OpenAI chat completions protocol: Xcode, OpenCode, Pi agent, a custom script, or anything else. Because MLX-LM Server provides a standard interface, any agent...

agent model agentic local agents from

Related Articles