Putting Apple's on-device Foundation Models into three native Mac apps | Empiric Apps
Blog · June 10, 2026<br>Putting Apple's on-device Foundation Models into three native Mac apps<br>Over the last few weeks I added on-device AI to all three of my macOS apps - a Homebrew GUI (Tappie), a dual-pane file manager (EmpiricCommander), and a Docker and Kubernetes GUI (Zenithal). They run on Apple's Foundation Models framework, which exposes the same small language model that powers Apple Intelligence, running entirely on the device. No server, no API key, no per-token bill, and nothing leaves the Mac.<br>The interesting part was not the API. It was deciding what a small on-device model is actually allowed to do in a tool people use on their real files and containers. Here is what worked, the rule I applied everywhere, and the features I cut because the model could not do them honestly.<br>One rule everywhere: the model proposes, the app disposes<br>The framing that made all of this safe is simple. The model never performs an action and never produces a fact the app then trusts. It only ever proposes something, and the proposal is constrained to a set of options the app already knows how to validate. The user confirms through the same UI they would have used by hand. If the model returns garbage, the worst case is a proposal that fails validation and is discarded - never a renamed-into-oblivion folder or a hallucinated security finding.<br>Concretely, every feature is double-gated. It only appears when SystemLanguageModel.default.availability reports the model is present (Apple silicon, macOS 26, Apple Intelligence turned on), behind a compiler-enforced #available check. When the model is not there, the app behaves exactly as it always did. AI is strictly additive over the deterministic features that were already shipping.<br>Tappie: plain language into a real filter<br>Tappie is a GUI over Homebrew. It already had an advanced filter with structured predicates - installed state, outdated, cask vs formula, license, tap, and so on. 
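<br>To make "structured predicates" concrete, here is a minimal Swift sketch of what a filter of this kind could look like. The type names, fields, and sample packages (`PackageFilter`, `Package`, `alpha-app`, and so on) are hypothetical and are not Tappie's actual code; the sketch only shows that such a filter is plain data evaluated by a deterministic function.

```swift
// Hypothetical sketch: names and fields are assumptions, not Tappie's real types.
enum PackageKind: Equatable {
    case formula
    case cask
}

struct Package {
    let name: String
    let kind: PackageKind
    let isInstalled: Bool
    let isOutdated: Bool
    let license: String?
    let tap: String
}

// A filter is plain data: each predicate is optional, and nil means "don't care".
struct PackageFilter: Equatable {
    var kind: PackageKind? = nil
    var isInstalled: Bool? = nil
    var isOutdated: Bool? = nil
    var license: String? = nil
    var tap: String? = nil

    // Deterministic evaluation: a package matches only if every set predicate agrees.
    func matches(_ package: Package) -> Bool {
        if let kind, package.kind != kind { return false }
        if let isInstalled, package.isInstalled != isInstalled { return false }
        if let isOutdated, package.isOutdated != isOutdated { return false }
        if let license, package.license?.lowercased() != license.lowercased() { return false }
        if let tap, package.tap != tap { return false }
        return true
    }
}

// Made-up sample data, purely for illustration.
let packages = [
    Package(name: "alpha-app", kind: .cask, isInstalled: true, isOutdated: true, license: "MIT", tap: "homebrew/cask"),
    Package(name: "beta-app", kind: .cask, isInstalled: true, isOutdated: false, license: "MIT", tap: "homebrew/cask"),
    Package(name: "gamma-tool", kind: .formula, isInstalled: true, isOutdated: true, license: "MIT", tap: "homebrew/core"),
    Package(name: "delta-app", kind: .cask, isInstalled: true, isOutdated: true, license: "Apache-2.0", tap: "homebrew/cask"),
]

// Outdated casks under the MIT license.
let filter = PackageFilter(kind: .cask, isOutdated: true, license: "mit")
let names = packages.filter(filter.matches).map(\.name)
```

Because the filter is a value with a fixed set of fields, anything that fills it in, whether a form or a model, can only express what the engine already knows how to evaluate.<br>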
The AI feature, Smart Filter, does one thing: it turns a phrase like "casks with updates, MIT licensed" into that existing structured filter.<br>It does not return a list of packages. It returns a filter object, which the deterministic engine then evaluates and previews. The resolved filter shows up as an editable chip, so the user sees exactly what was applied and can adjust it. The model is a parser from English to a schema I already trusted - not the thing deciding which packages match.<br>EmpiricCommander: classify into a closed set, never compose the dangerous part<br>The batch rename feature is where this mattered most. Renaming files in bulk is exactly the kind of operation where a clever-but-wrong AI suggestion does real damage. So before adding any AI, I rebuilt the rename engine itself. It used to take raw modes (find/replace, regex, sequential). I replaced those with first-class operations: add prefix or suffix, remove text, change case, number sequentially, change extension - plus a regex escape hatch for power users.<br>The AI command bar only classifies a plain-language instruction into that closed set of operations and extracts literal arguments (the prefix string, the casing, the start number). It never composes a regular expression. That is a deliberate boundary: the model cannot author the one construct that could silently mangle a thousand filenames. A destructive rule is structurally impossible because the model has no path to produce one - it can only fill in arguments to operations the engine already validates and previews.<br>The second feature is read-only by construction: AI Summary in the file preview. Select text files - code, config, JSON, CSV, Markdown - and get a short private summary of each without opening them. The input is capped to the model's context window and the app discloses when it truncated. 
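<br>As an illustration of capping and disclosing, here is a minimal Swift sketch. It assumes a character budget as a rough stand-in for the context window (the real limit is measured in tokens, and the budget the app actually uses is not stated here); `CappedInput` and `capInput` are hypothetical names.

```swift
// Hypothetical sketch: a character budget stands in for the model's
// token-based context window. Names and numbers are assumptions.
struct CappedInput: Equatable {
    let text: String
    let wasTruncated: Bool
}

// Keeps at most `maxCharacters` characters and records whether anything was cut,
// so the UI can disclose truncation next to the summary.
func capInput(_ text: String, maxCharacters: Int) -> CappedInput {
    // A negative budget is treated as "no room at all" rather than crashing.
    guard maxCharacters >= 0 else {
        return CappedInput(text: "", wasTruncated: !text.isEmpty)
    }
    let capped = String(text.prefix(maxCharacters))
    return CappedInput(text: capped, wasTruncated: capped.count < text.count)
}

let short = capInput("key = value", maxCharacters: 100)
let long = capInput(String(repeating: "x", count: 250), maxCharacters: 100)
```

In this sketch the `wasTruncated` flag is the piece that would drive a "summary covers only the start of this file" notice, so the disclosure is computed by the app rather than left to the model.<br>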
Read-only means there is nothing to undo and nothing to get wrong beyond the summary text itself.<br>Zenithal: explaining failures and triaging scans<br>Zenithal manages Docker and Kubernetes. Two AI surfaces shipped. The first is "Explain this error" on a failed Docker build: it takes the build output and returns a plain-language explanation of what went wrong and how to fix it. Pure text-in, text-out, no actions taken.<br>The second is more interesting and was my first use of guided generation with @Generable: "Triage with AI" over a Trivy or Grype vulnerability scan. The model produces a structured, prioritized summary of the findings. The catch is obvious - a model summarizing a security report could invent a CVE that sounds plausible. So there is an anti-hallucination guard: every CVE the model cites is validated against the actual set of findings in the scan. Anything it made up is dropped before it reaches the UI. The model reorders and explains what is really there; it is not allowed to add to the list.<br>What I cut, and why<br>The honest part. A small on-device model is not a frontier model, and pretending otherwise...