Frame: Grounding LLM Vulnerability Detection with a Sound Separation-Logic Core

codelion1 pts0 comments

Frame: Grounding LLM Vulnerability Detection with a Sound Separation-Logic Core — Lambda Security Skip to content<br>Abstract

Static application security testing lives with a tension between recall and precision. Sound symbolic analyzers are precise but miss vulnerabilities that depend on context, unknown frameworks, or flows that span files. Pattern scanners cast a wider net and pay for it in false positives. Pure-LLM scanners are broad but ungrounded and non-deterministic. Frame is a neuro-symbolic SAST built around one idea: keep a sound symbolic engine as the backbone, add an LLM for breadth, and have the symbolic engine ground and verify what the LLM proposes. The symbolic core does taint analysis and separation-logic verification with Z3. An optional LLM layer detects vulnerabilities the core misses, including cross-file flows, via an agentic loop that reads and greps the repository. It checks each proposal against the core’s own sink model and tiers findings so LLM output is never confused with proven results. A final pass triages away false positives. On the Endor Labs real-world corpus (five production applications, 193 model-judged pooled vulnerabilities) Frame’s full mode reaches 0.67 recall at 0.51 precision (F1 0.58) against Semgrep OSS’s 0.52 / 0.40 / 0.45. The LLM layer recovers about 65 confirmed vulnerabilities that both the symbolic engine and Semgrep miss, across Java, JavaScript/TypeScript, and C#, including a C# application where both traditional engines report nothing. The whole layer runs on a local model. We report the limitations of the evaluation candidly, including the fact that the pooled ground truth is enriched by Frame’s own LLM detection.

1. Introduction

A security scanner that a developer trusts has to be both precise and thorough. In practice tools sit at one end of that trade-off. Sound symbolic analysis proves properties and stays precise, but it only finds what its models can express: a taint spec it lacks, a framework it does not know, or a flow that crosses a file boundary the analysis did not follow. Pattern-based scanners match syntactic signatures across everything, so they surface more but bury the real findings in noise. A third option, prompting a large language model to find bugs, is broad and context-aware but ungrounded: it hallucinates flows, it is non-deterministic, and it offers no proof.

Frame takes a neuro-symbolic position. The symbolic engine is the sound backbone and stays the default. An LLM is added for what it is good at, breadth and context, and the symbolic engine is used to ground it. Concretely:

The LLM detects vulnerabilities the symbolic core misses. For flows that span files it runs as an agent with read_file and grep tools over the repository.

Every LLM finding is checked against Frame’s own sink model. A finding whose claimed sink is one the symbolic engine recognizes is promoted to a higher-confidence tier. LLM output and proven symbolic output are never merged into one undifferentiated list.

A triage pass drops confident false positives, recovering precision.

This report describes the architecture, the local-LLM implementation, and an evaluation on real-world code. Contributions:

A neuro-symbolic SAST design where the symbolic engine grounds and tiers LLM findings rather than the LLM overriding the symbolic result.

An agentic, cross-file detection loop driven by a local, OpenAI-compatible model.

Symbolic sink-verification of LLM claims, including the cross-file case, which yields a high-precision subset.

A real-world evaluation with an explicit, honest account of its limitations.

The headline is one table. Everything after it is how we got there and why the numbers should be read with care.

ScannerRecallPrecisionF1Frame, symbolic core only0.370.450.41Frame, + LLM detection0.710.460.56Frame, + LLM detection + triage0.670.510.58Semgrep OSS (p/default)0.520.400.45Endor AI SAST (published, different ground truth)0.440.500.47<br>2. Motivating examples

Precision: a finding that is real-looking but not real. Frame’s symbolic layer, run over a real application, flags a regular-expression denial-of-service pattern inside a bundled, minified JavaScript library it ships. The pattern is there, but the library code is not reachable from any request the application serves. Across the corpus, 264 of Frame’s 371 symbolic false positives are exactly this class: CWE-1333 (ReDoS) and CWE-1321 (prototype pollution) matched inside vendored libraries. A tool that reports them is not wrong about the syntax; it is wrong about the context. Triage exists to remove them.

Recall: a vulnerability the symbolic engine cannot express. Consider this handler from the C# application in the corpus (DotNetFlicks.Web/Controllers/MovieController.cs):

/*Sprinkle on some CSRF<br>*[ValidateAntiForgeryToken]<br>*/<br>[HttpPost]<br>public ActionResult Edit(EditMovieViewModel vm)<br>...<br>This is a state-changing POST action whose anti-forgery-token attribute has been commented out. It is...

symbolic frame engine core real sound

Related Articles