How Semgrep Cut Taint Analysis Time by 75% | Semgrep
How We Cut Semgrep’s Taint Analysis Time by 75%
Semgrep Pro Engine 1.158.0 ships a redesigned taint analysis engine delivering up to 75% faster full scans. By rearchitecting how taint analysis runs across files, P95 scan times dropped from 10 minutes to 7:30, P99 became significantly more consistent, and some large repos saw 3x+ improvements.
Austin Theriault
June 5th, 2026
Semgrep Pro Engine 1.158.0 and later ships a redesigned taint analysis engine, resulting in speedups of up to 75% on full scans. This is the second post in a three-part series on improving the performance of Semgrep.<br>In the previous blog post we discussed Pyro Caml, a new continuous profiler we released for OCaml. We built it so we could improve the performance of Semgrep, whose core analysis engine is written in OCaml. In this post we’ll explore how we used it to validate where we thought our biggest bottleneck was, and how doing something once instead of twice is a great way to improve the performance of your programs. Specifically, the 95th percentile of Semgrep scan times went from 10 minutes to 7 minutes 30 seconds, and our P99 went from a very noisy average of ~45 minutes to a much more consistent 35 minutes. Additionally, the number of scans hitting the maximum allowable scan time dropped significantly.<br>Motivation: Why Taint Analysis Was Costing a Third of Our CPU Time<br>How Taint Analysis Works: Sources, Sinks, and Data Flow<br>The Semgrep engine includes a matching engine that runs a set of rules. There are plenty of great explanations of matching out there, but in short, Semgrep takes patterns like do_thing( … ) and flags code such as do_thing(arg1, arg2) or do_thing(x, y). This is useful for finding all sorts of vulnerabilities, but it was supercharged when we released support for taint analysis, all the way back in 2021. Taint analysis lets you find code patterns such as:
const s = new Sandbox();<br>var user_input = "lol(" + req.query.userInput + ")";<br>var code = Math.random() > 0.5 ? user_input : "all good";<br>// ruleid:express-sandbox-code-injection<br>s.run(code);
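To make the flow in that snippet concrete, here is a minimal sketch of taint propagation over straight-line code. It is a toy model for illustration, not how the Semgrep engine is implemented: the statement encoding, the `findTaintedSinks` name, and the hand-written source and sink sets are all assumptions.

```javascript
// Hypothetical toy model (not Semgrep's internal representation).
// Each statement is either:
//   { kind: "assign", target, uses: [...] }  a variable built from other values
//   { kind: "call", callee, args: [...] }    a function call
function findTaintedSinks(statements, sources, sinks) {
  const tainted = new Set(); // variables currently known to carry tainted data
  const findings = [];

  statements.forEach((stmt, index) => {
    if (stmt.kind === "assign") {
      // The target is tainted if anything it is built from is a source or an
      // already-tainted variable; otherwise the assignment clears its taint.
      const isTainted = stmt.uses.some((u) => sources.has(u) || tainted.has(u));
      if (isTainted) tainted.add(stmt.target);
      else tainted.delete(stmt.target);
    } else if (stmt.kind === "call" && sinks.has(stmt.callee)) {
      // Report a finding when tainted data reaches a sink call.
      const hits = stmt.args.filter((a) => tainted.has(a) || sources.has(a));
      if (hits.length > 0) findings.push({ index, callee: stmt.callee, args: hits });
    }
  });

  return findings;
}

// The snippet from the post, hand-encoded in the toy format.
const program = [
  { kind: "assign", target: "user_input", uses: ["req.query.userInput"] },
  // The ternary may pick user_input, so code is conservatively treated as tainted.
  { kind: "assign", target: "code", uses: ["user_input"] },
  { kind: "call", callee: "s.run", args: ["code"] },
];

const findings = findTaintedSinks(
  program,
  new Set(["req.query.userInput"]), // assumed source
  new Set(["s.run"])                // assumed sink
);
console.log(findings);
```

The sketch treats the ternary conservatively: because one branch can carry user input, `code` is considered tainted, and the `s.run(code)` call is reported.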
Specifically, you could write patterns that track the flow of data through a program and flag the places that data might “taint” along the way, such as user input flowing into an arbitrary eval function, as in the s.run(code) example above.<br>This first pass of taint analysis was only intra-procedural, meaning that within a file we could usually detect how the data flows. If the user input flowed into some function that was defined in another file, though, and that other function had an eval, we wouldn’t detect it. Catching that case requires interfile analysis, named for obvious reasons.<br>Many users had been asking for exactly this, and so two years later, in February 2023, we released the Pro Engine, the first version of Semgrep capable of tracking data flows across files.<br>Here's roughly how that interfile analysis worked. First, we'd take the source code and build a naming environment: a mapping that tells us the function foo called in file1 is the same foo defined in file2. With that in place, we'd compute the taint configs for the rule. Every...