SafeRE: Building a production-quality regex library with agents


Eddie Aftandilian

June 20, 2026. Part of the SafeRE series.

This is the first in a series of blog posts about SafeRE, my linear-time regular expression library for Java.

A few months ago, I was having coffee with a friend, and we were talking about how good AI agents had gotten. Recent frontier models like Opus 4.6 and GPT-5.5 felt like a step change: not just better at small coding tasks, but much more capable of working through complex, long-running tasks. I started to wonder: could I build a substantial, production-quality software project purely with agents, with no human-written code at all? How would I ensure correctness if I wasn’t writing every line myself? Could agents make a project like this feasible to attempt in my spare time?

I decided to try an experiment.

When I worked on the Java team at Google, we considered building a linear-time regular expression library in pure Java. A bit of background: many popular regular expression libraries use backtracking engines, which can take exponential time on some patterns and inputs. Attackers can exploit that behavior by sending inputs that cause a service to burn huge amounts of CPU evaluating a regex – a class of attacks known as regular expression denial of service, or ReDoS. A linear-time regex library avoids that failure mode by ensuring that matching time grows linearly with the size of the input. While this might sound like a niche concern, it was a real problem at Google.
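To make the backtracking failure mode concrete, here is a minimal sketch that times java.util.regex on a pattern with nested quantifiers. The pattern `(a+)+b`, the class name `BacktrackingDemo`, and the input sizes are illustrative choices, not taken from SafeRE or from the original post. How quickly the time grows depends on the JDK version, so treat any numbers you see as indicative only.

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // Nested quantifiers: a classic shape that can trigger heavy backtracking.
    static final Pattern NESTED = Pattern.compile("(a+)+b");

    // A run of 'a' with no trailing 'b', so a full match must fail.
    static String input(int n) {
        return "a".repeat(n);
    }

    // Returns elapsed nanoseconds for one failed full-match attempt.
    static long timeMatch(int n) {
        String s = input(n);
        long start = System.nanoTime();
        boolean matched = NESTED.matcher(s).matches();
        long elapsed = System.nanoTime() - start;
        if (matched) {
            throw new IllegalStateException("unexpected match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Keep n small: on a backtracking engine each extra 'a' can
        // roughly double the work for this pattern.
        for (int n = 10; n <= 22; n += 2) {
            System.out.printf("n=%d: %d us%n", n, timeMatch(n) / 1_000);
        }
    }
}
```

The input is only a couple of dozen characters long, which is why this class of problem is attractive to attackers: the request is tiny, but the CPU cost is not.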

Building a new regex library would have been a lot of work. We estimated it at roughly two engineer-years. We couldn’t justify the investment, so we never built it. But I could never fully let go of the idea. It’s the kind of project that’s the reason I got into this field: using computer science to solve a real-world problem.

Perhaps naively, I thought I could build this library in my spare time with agents doing the bulk of the work. So I decided to try it.

The outcome is SafeRE, which is open-source and available at github.com/eaftan/safere.

When I say SafeRE was built with agents, I don’t mean that I told an agent “go build a regex engine” and came back a week later to a finished project. I mean that agents wrote the code, while I directed the work: breaking down tasks, reviewing code, steering the agents when they went in the wrong direction, and shaping how I wanted them to approach the problem. My role was somewhere between tech lead and pair programmer.

Suitability for agents

I initially chose this project because it seemed well-suited to agents. In reality, it turned out to be much harder than I expected. I was overly optimistic at the start.

Why did it seem well-suited?

While it’s technically difficult to build a linear-time regular expression library, the core ideas are well understood. There are existing libraries, RE2 in particular, that SafeRE could learn from. Russ Cox, the author of RE2, also wrote an excellent series of blog posts explaining the ideas behind it. So while the work is difficult, it is not research. We don’t have to invent new techniques to do this.
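One of those well-understood ideas is to track every NFA state the matcher could be in at once, rather than exploring one path and backtracking. The sketch below is not SafeRE's or RE2's code. It is a hand-built automaton for the single pattern `(a|aa)+b`, with state names (`START`, `MID`, `LOOP`, `ACCEPT`) invented for illustration, and it only shows why the work per input character stays bounded by the number of states.

```java
public class StateSetDemo {
    // Hand-built NFA for (a|aa)+b. State numbering is illustrative only.
    static final int START = 0;   // nothing consumed yet
    static final int MID = 1;     // consumed the first 'a' of an "aa" alternative
    static final int LOOP = 2;    // finished at least one alternative
    static final int ACCEPT = 3;  // consumed the final 'b'

    // Returns a bitmask of the states reachable from `state` on character c.
    static int step(int state, char c) {
        switch (state) {
            case START:
            case LOOP:
                // 'a' may be a whole alternative (LOOP) or half of "aa" (MID).
                if (c == 'a') return (1 << LOOP) | (1 << MID);
                if (c == 'b' && state == LOOP) return 1 << ACCEPT;
                return 0;
            case MID:
                return c == 'a' ? (1 << LOOP) : 0;
            default:
                return 0; // ACCEPT has no outgoing transitions
        }
    }

    // Full-match check: one pass over the input, at most 4 states per character.
    static boolean matches(String input) {
        int current = 1 << START;
        for (int i = 0; i < input.length() && current != 0; i++) {
            int next = 0;
            for (int s = 0; s <= ACCEPT; s++) {
                if ((current & (1 << s)) != 0) {
                    next |= step(s, input.charAt(i));
                }
            }
            current = next;
        }
        return (current & (1 << ACCEPT)) != 0;
    }
}
```

Because the set of live states is a fixed-size bitmask, an input of n characters costs on the order of n steps here, regardless of how many ways the pattern could be split across the input. A real engine builds the automaton from the pattern and handles captures, anchors, and Unicode, which is where most of the difficulty lies.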

SafeRE owes a huge debt to RE2. The project started as a Java port of RE2, and I intentionally kept RE2’s license and license header to make that lineage clear. As the project evolved, SafeRE diverged from RE2 because the goal shifted from “RE2 in Java” to drop-in compatibility with java.util.regex, whose semantics are often different. But RE2 was the starting point, both technically and intellectually.

Regular expression engines are also unusually testable. They are deterministic and self-contained. You don’t have to wire together a distributed system to test them. There are also extensive open-source test suites that can be reused or adapted, where licenses permit and with appropriate attribution.

Why was it hard?

This is the part where I was overconfident. Regular expressions are a type of programming language, and they are very widely used. The popular implementations are incredibly battle-tested. My stated goal was for SafeRE to be a drop-in replacement for java.util.regex. That meant SafeRE had to be in the same neighborhood as the Java standard library’s regex implementation for correctness.

java.util.regex has been around since Java 1.4 in 2002 and has widespread usage. SafeRE was built from scratch. To be viable for production usage, I was going to have to polish it to an incredibly high standard. This turned out to be where I spent most of my time on the project.

A concrete example: SafeRE inherited support for POSIX bracket classes from RE2. In RE2, expressions like [[:lower:]] and [[:digit:]] have special meaning. Java’s regex library accepts those strings, but doesn’t treat them as POSIX bracket classes. In Java, POSIX-style character properties are written with escapes like \p{Lower}. So this was not a parser error or a missing feature. It was...
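The java.util.regex side of this difference can be checked directly. The sketch below uses only the standard library; the class name `PosixBracketDemo` and the helper `full` are made up for illustration. It relies on java.util.regex parsing `[[:lower:]]` as a character class that contains the nested class `[:lower:]`, which is simply the set of literal characters `:`, `l`, `o`, `w`, `e`, `r`.

```java
import java.util.regex.Pattern;

public class PosixBracketDemo {
    // True if the whole input matches the regex under java.util.regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Nested-class reading: only ':', 'l', 'o', 'w', 'e', 'r' match.
        System.out.println(full("[[:lower:]]", "a"));  // false
        System.out.println(full("[[:lower:]]", "w"));  // true
        System.out.println(full("[[:lower:]]", ":"));  // true

        // The Java spelling of the POSIX lower-case class.
        System.out.println(full("\\p{Lower}", "a"));   // true
        System.out.println(full("\\p{Lower}", ":"));   // false
    }
}
```

Both spellings compile without error in Java, so a mismatch like this only shows up when the two engines are compared on actual inputs.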

