Beyond Fable: Can a Local LLM Replace Cloud AI for Security Code Reviews

SRLabs Research

Software Assurance • Code Audit • Secure Development • AI
2026-06-22 • 20 minute read

Karsten Nohl (@karsten-nohl), Chief Innovation Officer, Allurity

The Problem

Security code review is one of the most valuable — and traditionally labor-intensive — services in cyber security. LLMs have become tireless wingmen in this process: They scan thousands of lines of code, cross-reference CWE databases, and surface patterns that even experienced reviewers might miss. But there's a catch.

Many pentest recipients do not want their source code shared with cloud-hosted services — particularly in finance, government, and critical infrastructure. Sending proprietary code to a third-party LLM creates confidentiality and data residency risks that contractual safeguards with the LLM provider alone cannot fully mitigate.

The resulting dilemma: the best LLMs are cloud-hosted, and the companies that need security reviews the most often forgo these leading capabilities.

How big is the lead of cloud-hosted models really? We set out to answer a practical question: can a locally hosted open-weight model produce security findings comparable to those of frontier cloud models?

Conclusion

We find the answer is: almost — but only with the right scaffolding.

We ran a series of experiments testing the limits of local LLMs and found that they work best in tandem with cloud-based frontier models, without disclosing any source code to the cloud:

A Qwen3.6-35B-A3B model with only ~3B active parameters, running entirely on a Mac laptop with no source code leaving the machine, produced finding sets comparable in size to frontier cloud models (GLM-5, Claude Opus 4.6) on both a fintech app and a voting app, with some unique findings of its own. It required zero human nudges and completed each codebase in under 90 minutes. For the central task — reading code, discovering vulnerabilities, classifying severity, triaging CVE output — a local model is now in the same league as frontier models.

A caveat: Finding count parity is not capability parity. The claim is that a local model is competitive enough to be useful as part of the pipeline, and that its findings are perceived as equally impactful by experts. This study focuses on the quantitative side, but finding quality was validated by both pentest experts and a developer team.
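One simple way to look beyond raw counts is to check how much two finding sets actually overlap. The sketch below is not the study's methodology; it assumes each finding can be reduced to a hypothetical `(CWE id, file path)` key, and the example findings are invented placeholders, not study data.

```python
def compare_findings(local, cloud):
    """Split two sets of finding keys into shared and model-specific findings."""
    shared = local & cloud
    union = local | cloud
    return {
        "shared": sorted(shared),
        "local_only": sorted(local - cloud),
        "cloud_only": sorted(cloud - local),
        # Count parity: how similar the set sizes are.
        "count_ratio": len(local) / len(cloud) if cloud else None,
        # Overlap (Jaccard index): how similar the set contents are.
        "jaccard": len(shared) / len(union) if union else None,
    }


# Placeholder finding keys, invented for illustration only.
local_findings = {
    ("CWE-89", "app/models/user.rb"),
    ("CWE-79", "app/views/show.erb"),
    ("CWE-352", "config/routes.rb"),
}
cloud_findings = {
    ("CWE-89", "app/models/user.rb"),
    ("CWE-79", "app/views/show.erb"),
    ("CWE-798", "config/secrets.yml"),
}

result = compare_findings(local_findings, cloud_findings)
```

In this made-up example both sets have the same size, yet each side holds one finding the other missed, which is why equal counts alone say little about equal capability.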

What a local model does not yet do as well is design the review and consolidate the results. The most effective pipeline we found delegates both of these orchestration tasks to a cloud frontier model, but in neither stage does the cloud see source code. We call this Source-local: the proprietary source code never leaves the machine. Metadata does cross to the cloud (file tree, schema, routes, dependency manifests, and the generated step prompts), which can carry internal names, directory structure, and architecture. "No source leaves the building" is the accurate promise; "nothing leaves" is not.
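To make the metadata boundary concrete, here is a minimal sketch of collecting a payload that contains file paths and dependency manifests but no source file contents. The function names, the manifest allowlist, and the skipped directories are assumptions for illustration; schema and route extraction are framework-specific and left out.

```python
import json
from pathlib import Path

# Assumed allowlist of manifest file names; adjust per ecosystem.
MANIFEST_NAMES = {"Gemfile.lock", "package.json", "package-lock.json", "requirements.txt"}
# Assumed directories that are not worth listing.
SKIP_DIRS = {".git", "node_modules"}


def collect_metadata(repo_root):
    """Return file paths and manifest contents; never read other source files."""
    root = Path(repo_root)
    file_tree, manifests = [], {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        file_tree.append(rel.as_posix())
        if path.name in MANIFEST_NAMES:
            # Only manifests are opened; all other files contribute a path only.
            manifests[rel.as_posix()] = path.read_text(encoding="utf-8", errors="replace")
    return {"file_tree": file_tree, "dependency_manifests": manifests}


def to_payload(metadata):
    """Serialize the metadata as the JSON document that would cross to the cloud."""
    return json.dumps(metadata, indent=2)
```

Even this reduced payload exposes internal names and directory structure, so it is worth reviewing `to_payload(collect_metadata(repo))` before anything is sent.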

The scaffolding that makes this work has three parts:

1. Structured decomposition and prompt generation: a cloud model breaks the review into focused steps and creates step prompts from metadata only (file tree, schema, routes; no source code).

2. Local tool and LLM output: the prompts execute locally, run standard security tools (e.g., bundler-audit, npm audit, Semgrep, Brakeman), and feed their JSON output to the local model for contextual triage and additional bug hunting.

3. Report consolidation: a final cloud pass merges the step-level findings into a delivery-ready report.

Parts 1 and 3 require no source code exposure to the cloud; Part 2 runs entirely locally.
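The local execution step can be sketched as assembling one prompt from a cloud-generated step prompt, a tool's JSON output, and the bundled source files. This is an illustration under assumptions, not the pipeline's actual code: `build_triage_prompt` and the section headings are hypothetical, and the tool output is treated as opaque JSON rather than any specific tool's schema.

```python
import json


def build_triage_prompt(step_prompt, tool_name, tool_output_json, bundled_files):
    """Combine a step prompt, tool JSON, and source files into one local prompt.

    Everything assembled here stays on the local machine.
    """
    # Parse first so malformed tool output fails early instead of confusing the model.
    tool_output = json.loads(tool_output_json)
    parts = [
        step_prompt,
        f"## Output of {tool_name} (JSON)\n{json.dumps(tool_output, indent=2)}",
    ]
    for path, text in sorted(bundled_files.items()):
        parts.append(f"## File: {path}\n{text}")
    return "\n\n".join(parts)


# Placeholder inputs for illustration; not real tool output.
example_prompt = build_triage_prompt(
    "Review the bundled files for SQL injection.",
    "semgrep",
    '{"results": []}',
    {"app/models/user.rb": "class User; end"},
)
```

The resulting string would then be passed to whichever local inference runtime hosts the model; that call is omitted here because it depends on the runtime.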

The resulting best practice is: cloud for prompt engineering, local for execution, cloud for consolidation. The cloud model never sees source code — it designs the review. The local model never needs broad architectural reasoning — it executes focused checks against bundled files.

Figure 1. The Source-local pipeline. The cloud orchestrator designs the review (stage 1) and consolidates findings into a report (stage 3) from metadata only; the local model reads the source and runs the security tools (stage 2). Only step prompts and step-level findings cross the trust boundary — the source code never leaves the machine.
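A guard at the trust boundary could check outbound step-level findings for verbatim source lines before they are sent. The following is a rough sketch under stated assumptions: `leaked_lines` and the `min_len` threshold are hypothetical, and an exact-substring check only catches literal copies, not paraphrased or reformatted code.

```python
def leaked_lines(outbound_text, source_texts, min_len=20):
    """Return source lines that appear verbatim in text bound for the cloud."""
    leaks = []
    for text in source_texts:
        for line in text.splitlines():
            candidate = line.strip()
            # Skip short lines such as "end" or "}" that would match by accident.
            if len(candidate) >= min_len and candidate in outbound_text:
                leaks.append(candidate)
    return leaks


# Placeholder source and findings, invented for illustration.
source = 'def find(id)\n  User.where("id = #{params[:id]}").first\nend\n'
safe_finding = "High: SQL injection in the user lookup (CWE-89), app/models/user.rb"
unsafe_finding = 'High: SQL injection, see User.where("id = #{params[:id]}").first'

safe_leaks = leaked_lines(safe_finding, [source])
unsafe_leaks = leaked_lines(unsafe_finding, [source])
```

A non-empty result would be a reason to hold the payload back and have the local model restate the finding without quoting code.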

Leveraging Fable 5: the cloud-based orchestration layer is model-agnostic. The orchestrator in stages 1 and 3 need not be an unrestricted frontier model; a model with cybersecurity guardrails handles the job just fine. Claude Fable 5, which ships with deliberate cyber restrictions, designs the review prompts and consolidates the findings with no refusal and no loss of quality, fully matching Claude Opus 4.8 in those roles. This is unsurprising: designing and consolidating a defensive review is knowledge-and-structure work, not...
