

How We Cut Semgrep’s Taint Analysis Time by 75%

Semgrep Pro Engine 1.158.0 ships a redesigned taint analysis engine that makes full scans up to 75% faster. By rearchitecting how taint analysis runs across files, we cut P95 scan times from 10 minutes to 7 minutes 30 seconds, made P99 significantly more consistent, and saw 3x+ improvements on some large repos.

Austin Theriault

June 5th, 2026

Starting with version 1.158.0, Semgrep Pro Engine ships a redesigned taint analysis engine that speeds up full scans by as much as 75%. This is the second post in a three-part series on improving Semgrep's performance.

In the previous post, we introduced Pyro Caml, a continuous profiler for OCaml that we released. We built it to improve the performance of Semgrep, whose core analysis engine is written in OCaml. In this post, we'll show how we used it to confirm where we believed our biggest bottleneck was, and how doing something once instead of twice is a great way to speed up a program. Specifically, the 95th-percentile (P95) Semgrep scan time dropped from 10 minutes to 7 minutes 30 seconds, and P99 went from a very noisy average of about 45 minutes to a much more consistent 35 minutes. The number of scans hitting the maximum allowed scan time also dropped significantly.

Motivation: Why Taint Analysis Was Costing a Third of Our CPU Time

How Taint Analysis Works: Sources, Sinks, and Data Flow

The Semgrep engine includes a matching engine that runs a set of rules. There are plenty of good explanations of matching out there, but in short, Semgrep takes a pattern like do_thing(...) and flags code such as do_thing(arg1, arg2) or do_thing(x, y). Matching alone is useful for finding all sorts of vulnerabilities, but it became far more powerful when we added support for taint analysis, all the way back in 2021. Taint analysis lets you find code patterns such as:

const s = new Sandbox();
var user_input = "lol(" + req.query.userInput + ")";
var code = Math.random() > 0.5 ? user_input : "all good";
// ruleid:express-sandbox-code-injection
s.run(code);
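To make the source-to-sink idea concrete, here is a minimal sketch, assuming a hand-written, straight-line representation of the snippet above. The `program` array, `isSource`, `isSink`, and `findTaintedSinks` are hypothetical names for illustration, not Semgrep internals. It treats `req.query.*` as the source and `s.run` as the sink, and marks a variable tainted if any value flowing into it might be tainted, which is why the ternary still taints `code` even though one branch is a constant string.

```javascript
// Toy intraprocedural taint tracker (hypothetical; not Semgrep's implementation).
// The snippet above is hand-translated into a tiny straight-line IR.
const program = [
  { kind: "assign", target: "s", uses: [] },                                 // const s = new Sandbox();
  { kind: "assign", target: "user_input", uses: ["req.query.userInput"] },    // "lol(" + ... + ")"
  { kind: "assign", target: "code", uses: ["Math.random()", "user_input"] },  // either ternary branch may flow in
  { kind: "call", callee: "s.run", args: ["code"] },                          // s.run(code);
];

// Source and sink predicates, standing in for a rule's pattern-sources / pattern-sinks.
const isSource = (name) => name.startsWith("req.query.");
const isSink = (callee) => callee === "s.run";

function findTaintedSinks(stmts) {
  const tainted = new Set();
  const findings = [];
  for (const stmt of stmts) {
    const isTainted = (v) => isSource(v) || tainted.has(v);
    if (stmt.kind === "assign") {
      // May-analysis: the target is tainted if any operand might carry tainted data.
      if (stmt.uses.some(isTainted)) tainted.add(stmt.target);
      else tainted.delete(stmt.target); // reassigning clean data clears taint
    } else if (stmt.kind === "call" && isSink(stmt.callee) && stmt.args.some(isTainted)) {
      findings.push(stmt.callee);
    }
  }
  return findings;
}

console.log(findTaintedSinks(program)); // the returned array is ["s.run"]
```

A real engine works over a control-flow graph with branches and loops; this sketch only handles straight-line code, which is enough to show taint flowing from a source, through assignments, into a sink.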

In other words, you could write rules that track how data flows through a program from sources, such as user input, to sinks, such as a function that evaluates code, and flag the places where tainted data reaches a sink. In the example above, user input from req.query.userInput may flow into s.run, an eval-like function.

This first version of taint analysis was intraprocedural, so within a single file we could usually detect how data flowed. However, if the user input flowed into a function defined in another file, and that function passed it to eval, we wouldn't detect it. Catching that requires interfile analysis, so named for obvious reasons.

Many users had been asking for exactly this, so two years later, in February 2023, we released Semgrep Pro Engine, the first version of Semgrep capable of tracking data flows across files.

Here's roughly how that interfile analysis worked. First, we'd take the source code and build a naming environment: a mapping that tells us the function foo called in file1 is the same foo defined in file2. With that in place, we'd compute the taint configs for the rule. Every...
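The naming-environment step, and how it lets taint cross a file boundary, can be sketched as follows. This is a hypothetical illustration in JavaScript, not Semgrep's OCaml implementation: the `files` layout, `buildNamingEnv`, `summarize`, and `findInterfileFlows` are invented for this example, and the per-function summaries (which parameters reach a sink) are an assumption, since the description of the remaining steps is cut off. It models the case from the text: `foo` is called in file1 and defined in file2, where it passes its argument to `eval`.

```javascript
// Hypothetical interfile taint sketch; these structures are illustrative, not Semgrep's.
// file1 imports foo from file2 and calls it with user input; foo passes it to eval.
const files = {
  "file1.js": {
    imports: { foo: "file2.js" }, // import { foo } from "./file2.js"
    functions: {
      handler: { params: ["req"], body: [{ callee: "foo", args: ["req.query.q"] }] },
    },
  },
  "file2.js": {
    imports: {},
    functions: {
      foo: { params: ["x"], body: [{ callee: "eval", args: ["x"] }] },
    },
  },
};

const isSource = (arg) => arg.startsWith("req.query.");
const SINKS = new Set(["eval"]);

// Step 1: naming environment. Maps "file:name" as seen at a call site
// to the "file:name" of the definition it refers to.
function buildNamingEnv(files) {
  const env = new Map();
  for (const [file, mod] of Object.entries(files)) {
    for (const fn of Object.keys(mod.functions)) env.set(`${file}:${fn}`, `${file}:${fn}`);
    for (const [name, from] of Object.entries(mod.imports)) env.set(`${file}:${name}`, `${from}:${name}`);
  }
  return env;
}

// Step 2 (assumed): summarize which parameter indices of a function reach a sink,
// following calls through the naming environment. Memoized; a recursive cycle
// sees an empty placeholder summary.
function summarize(key, env, files, memo) {
  if (memo.has(key)) return memo.get(key);
  memo.set(key, new Set()); // conservative placeholder while recursing
  const [file, name] = key.split(":");
  const fn = files[file].functions[name];
  const reaching = new Set();
  for (const call of fn.body) {
    const calleeKey = env.get(`${file}:${call.callee}`);
    call.args.forEach((arg, i) => {
      const p = fn.params.indexOf(arg);
      if (p < 0) return; // argument is not one of this function's parameters
      if (SINKS.has(call.callee)) reaching.add(p);
      else if (calleeKey && summarize(calleeKey, env, files, memo).has(i)) reaching.add(p);
    });
  }
  memo.set(key, reaching);
  return reaching;
}

// Step 3: report call sites that pass a source to a sink, directly or via a callee.
function findInterfileFlows(files) {
  const env = buildNamingEnv(files);
  const memo = new Map();
  const findings = [];
  for (const [file, mod] of Object.entries(files)) {
    for (const [name, fn] of Object.entries(mod.functions)) {
      for (const call of fn.body) {
        const calleeKey = env.get(`${file}:${call.callee}`);
        call.args.forEach((arg, i) => {
          if (!isSource(arg)) return;
          const reachesSink = SINKS.has(call.callee) ||
            (calleeKey !== undefined && summarize(calleeKey, env, files, memo).has(i));
          if (reachesSink) findings.push(`${file}:${name} -> ${calleeKey ?? call.callee}`);
        });
      }
    }
  }
  return findings;
}

console.log(findInterfileFlows(files)); // the returned array is ["file1.js:handler -> file2.js:foo"]
```

Without the naming environment, the call to `foo` in file1 resolves to nothing and the flow is missed, which is the single-file limitation described above.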
