Semgrep Scan Speed: How We Cut Hours Off File Targeting
Making Semgrep rip: How Ripgrep inspired us to shave hours off (some) scans
Semgrep 1.162.0 fixes a file targeting bottleneck that caused some scans on large repos to take hours. By replacing expensive regex calls with string comparisons and a constant-time hash table lookup, P99 diff scan times dropped from ~60 minutes to under 12 minutes, and one customer repo's file targeting step improved from 7.5 hours to under 2 minutes.
Ben Kettle
June 10th, 2026
TL;DR: Semgrep's file targeting step, which filters files against ignore patterns before scanning, had edge cases that could cause some scans to take hours. Tracing data showed the culprit: for large repos, Semgrep was making millions of regex calls to match file paths against ignore patterns. We fixed this by replacing most regex lookups with equivalent string comparisons and by building a hash table index that checks a path against arbitrarily many patterns in constant time. With these changes, a customer repo that previously spent 7.5 hours in the file targeting step now spends under 2 minutes there. Across our fleet, the 99th percentile diff scan duration dropped from nearly an hour to under 12 minutes. These improvements shipped in Semgrep 1.162.0 and are available in both the open-source Semgrep CE and Semgrep Pro.

Semgrep reads ignore patterns from .gitignore, .semgrepignore, our web backend, and a list of default ignores. These can be customized to tune findings: if you don't care about findings in test files, you can exclude tests/; if you don't care about findings in generated protobuf code, you can exclude protos/; and if you only use Python for scripting and never deploy it, you can exclude .py files. Semgrep also ignores certain low-signal files, such as .min.js files, out of the box.

To handle these ignores, one of the first steps of a Semgrep scan is file targeting. This step collects all files in the project and applies the .gitignore and .semgrepignore filters to narrow them down to the set of targets that should be scanned.

Unfortunately, tracing data from our scanning fleet showed that this step was taking inordinately long on many repos. The median duration of the file targeting step is only 30 ms, but the 90th percentile is around a minute, the 99th percentile is in the tens of minutes, and the maximum is measured in hours.

While most scans are not affected, this slowness has a real effect on both customers and Semgrep's bottom line. Customers depend on Semgrep to quickly scan their developers' PRs. And with Semgrep Managed Scans, Semgrep pays for far more compute minutes on beefy scan instances than we feel this trivial pattern matching deserves.

What is .gitignore? What is .semgrepignore?

Semgrep respects both .gitignore and .semgrepignore: if a file is ignored by either of these, it will not be scanned. Git reads .gitignore files from the filesystem and uses the patterns inside them to determine which files Git should version. Semgrep follows the .gitignore specification for its own ignore file, .semgrepignore, which lets users exclude files from scans without also excluding them from Git. Semgrep also supports ignoring...