Making Semgrep rip: How Ripgrep inspired us to shave hours off (some) scans

Semgrep 1.162.0 fixes a file targeting bottleneck that caused scans on large repos to take hours. By replacing expensive regex calls with string comparisons and a constant-time hash table lookup, P99 diff scan times dropped from ~60 minutes to under 12, with one customer repo improving from 7.5 hours to under 2 minutes.

Ben Kettle

June 10th, 2026

TL;DR: Semgrep's file targeting step, which filters files against ignore patterns before scanning, had edge cases which could cause some scans to take hours. Tracing data showed the culprit: for large repos, Semgrep was making millions of regex calls to match file paths against ignore patterns. We fixed this by replacing most regex lookups with equivalent string comparisons and building a hash table index to check arbitrarily many patterns in constant time. With these changes, a customer repo that previously spent 7.5 hours on the problematic file targeting step now spends under 2 minutes. Across our fleet, the 99th percentile diff scan duration dropped from nearly an hour to under 12 minutes. These improvements shipped in Semgrep 1.162.0 and are available in both the open-source Semgrep CE and Semgrep Pro.

Semgrep supports ignores from .gitignore, .semgrepignore, our web backend, or from a list of default ignores. These can be customized to tune findings: if you don't care about findings in test files you can exclude tests/, if you don't care about findings in generated protobuf code you can exclude protos/, or if you know that you only use Python for scripting and never deploy it you can exclude .py. Semgrep ignores certain low-signal files like .min.js out of the box.

To handle these ignores, one of the first steps of a Semgrep scan is file targeting. This step collects all files in the project and applies gitignore and semgrepignore filters to narrow them down to the set of targets that should be scanned.

Unfortunately, tracing data from our scanning fleet showed that this file targeting step was taking inordinately long on many repos. While the median duration of the file targeting step is only 30ms, the 90th percentile duration for the step is around a minute, the 99th percentile is in the tens of minutes, and the maximum is in hours.

While most scans are not affected, this slowness has a real effect on both customers and Semgrep's bottom line. Customers depend on Semgrep to quickly scan their developers' PRs and, with Semgrep Managed Scans, Semgrep pays for far more compute minutes on beefy scan instances than we feel this trivial pattern matching deserves.

What is .gitignore? What is .semgrepignore?

Semgrep respects both .gitignore and .semgrepignore: if a file is ignored by either of these, it will not be scanned. Git reads .gitignore files from the filesystem and uses the patterns contained inside to determine which files should be versioned by git. Semgrep follows the .gitignore specification for its own ignore file, .semgrepignore, that allows users to exclude files from scans that they do not want excluded by Git. Semgrep also supports ignoring...
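To make the idea from the TL;DR concrete, here is a minimal Python sketch of swapping per-pattern regex calls for string comparisons and hash set lookups. It is not Semgrep's implementation: the `IgnoreIndex` class, its fields, and the pattern classification are hypothetical names chosen for illustration. It also simplifies gitignore semantics heavily (no negation with `!`, no anchoring, no `**`), and the regex fallback uses Python's `fnmatch`, whose `*` also matches `/`.

```python
import fnmatch
import re
from pathlib import PurePosixPath

GLOB_CHARS = set("*?[")


class IgnoreIndex:
    """Hypothetical ignore-pattern index (illustrative, not Semgrep's code)."""

    def __init__(self, patterns):
        self.dir_names = set()  # "tests/"   -> matches a directory component
        self.names = set()      # "Makefile" -> matches any path component
        self.suffixes = set()   # "*.min.js" -> stored as ".min.js"
        self.fallback = []      # everything else stays a compiled regex
        for pat in patterns:
            body = pat.rstrip("/")
            if not body or pat.startswith("#"):
                continue  # skip blank lines and comments
            simple = "/" not in body
            if simple and not (GLOB_CHARS & set(body)):
                target = self.dir_names if pat.endswith("/") else self.names
                target.add(body)
            elif simple and body.startswith("*.") and not (GLOB_CHARS & set(body[1:])):
                self.suffixes.add(body[1:])
            else:
                # Approximation: fnmatch is not a full gitignore matcher.
                self.fallback.append(re.compile(fnmatch.translate(body)))

    def is_ignored(self, path):
        parts = PurePosixPath(path).parts
        if not parts:
            return False
        # One hash lookup per path component, independent of pattern count.
        if any(p in self.dir_names for p in parts[:-1]):
            return True
        if any(p in self.names for p in parts):
            return True
        # One hash lookup per dot in the file name ("a.min.js" -> ".min.js", ".js").
        name = parts[-1]
        for i, ch in enumerate(name):
            if ch == "." and name[i:] in self.suffixes:
                return True
        # Only the patterns that could not be indexed still cost a regex call.
        return any(rx.match(path) for rx in self.fallback)


index = IgnoreIndex(["tests/", "protos/", "*.py", "*.min.js", "src/gen_*.go"])
targets = [p for p in ["web/app.js", "web/app.min.js", "tests/a.js"]
           if not index.is_ignored(p)]
```

In this sketch, the cost of checking one path against the indexed patterns depends on the length of the path rather than on how many patterns were loaded, which is the property the TL;DR describes. Only patterns that cannot be reduced to a literal name or suffix fall through to the regex list.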
