GitHub - mnemnion/mvzr: Minimum Viable Zig Regex · GitHub
mvzr: The Minimum Viable Zig Regex Library
Finding myself in need of a regular expressions library for a Zig project, and needing it to build regex at runtime, not just comptime, I ended up speedrunning a little library for just that purpose.
This is that library. It's a simple bytecode-based VM, inspired by LPEG. Under 2000 lines of load-bearing code, no dependencies other than std.
The provided Regex type allows 64 'operations' and 8 unique ASCII character sets. If you would like more, or less, you can call SizedRegex(num_ops, num_sets) to customize the type.
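As a rough illustration of the sizing knob described above, here is a minimal sketch of a custom-sized regex type. The name BigRegex, the sizes, and the pattern are made up for the example. The call regex.match(haystack) returning an optional mvzr.Match is an assumption about the matching entry point, and the import name "mvzr" assumes the package has been wired into the build under that name.

```zig
const std = @import("std");
const mvzr = @import("mvzr"); // assumption: module imported under this name

// Hypothetical larger type: 128 operations and 16 character sets,
// versus the 64 and 8 of the default mvzr.Regex.
const BigRegex = mvzr.SizedRegex(128, 16);

pub fn main() !void {
    // compile returns an optional; null means the pattern was rejected.
    const regex = BigRegex.compile("(foo|bar|baz)+[0-9a-f]*") orelse
        return error.BadPattern;

    // Assumption: match(haystack) returns ?mvzr.Match.
    if (regex.match("xx foobar42 yy")) |m| {
        // .slice, .start and .end are the Match fields.
        std.debug.print("matched '{s}' at {d}..{d}\n", .{ m.slice, m.start, m.end });
    }
}
```

If a pattern that looks valid fails to compile with the default type, trying a larger SizedRegex is a reasonable first step, since the operation and set budgets are fixed by the type.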
Installation
Drop the file into your project, or use the Zig build system:
zig fetch --save "https://github.com/mnemnion/mvzr/archive/refs/tags/v0.3.9.tar.gz"
I'll do my best to keep that URL fresh, but it pays to check the repository's tags for the latest release version.
v0.3.9 only differs from v0.3.8 in metadata, marking it as Zig 0.16 compatible. It works fine with Zig 0.15.2, but has the .minimum_zig_version field in the Zon file set higher to cooperate with modern practices.
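After zig fetch --save records the dependency in build.zig.zon, it still has to be wired into build.zig. A minimal sketch follows, under two assumptions worth checking against your own build.zig.zon and the library's build.zig: that the dependency was saved under the name mvzr, and that the package exposes a module also named mvzr. The executable name and source path are placeholders.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Assumption: the dependency key in build.zig.zon is "mvzr".
    const mvzr_dep = b.dependency("mvzr", .{
        .target = target,
        .optimize = optimize,
    });

    // Placeholder executable; substitute your own name and root file.
    const exe = b.addExecutable(.{
        .name = "my-app",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // Assumption: the package exports a module named "mvzr".
    // This makes @import("mvzr") available in src/main.zig.
    exe.root_module.addImport("mvzr", mvzr_dep.module("mvzr"));

    b.installArtifact(exe);
}
```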
Features
Zero allocation, comptime and runtime compiling and matching
Configurable number of operations per regex (64 by default)
Configurable number of character sets per regex (8 by default)
Greedy quantifiers: *, +, ?
Lazy quantifiers: *?, +?, ??
Possessive/eager quantifiers: *+, ++, ?+
Alternation: foo|bar|baz
Grouping: foo|(bar|baz)+|quux
Sets: [abc], [^abc], [a-z], [^a-z], [\w+-], [\x04-\x1b]
Built-in character groups (ASCII): \w, \W, \s, \S, \d, \D
Escape sequences: \t, \n, \r, \xXX hex format
Same set as Zig: if you need the weird C ones, use \x format
Begin and end anchors: ^ and $
Word boundaries: \b, \B
Counted repetition: {M}, {M,}, {M,N}, {,N}
Limitations and Quirks
Minimal multibyte / Unicode support
This has improved somewhat. A regex like λ? now matches an optional lambda, not just an optional final byte. Additionally, ranges of bytes greater than 0x7f are now supported; with some care, this can match certain sets. For instance, (\xce[\x91-\xa9])+ will match a string of uppercase Greek letters, \xc2[\x80-\x9f] matches a C1 control code, and so on. But you'll still need to work at the byte level, and use \x format, to do these tasks.
No fancy modifiers (you want case-insensitive, great, lowercase your string)
. matches any one byte. [^\n\r] works fine if that's not what you want
Or split into lines first, divide and conquer
Note: $ permits a final newline, but ^ matches only at the beginning of the string, and $ does not match before any newline other than a final one.
Backtracks (sorry; for this design to work without backtracking, we need async back)
Compiler does some best-effort validation but I haven't really pounded on it
No capture groups. Divide and conquer
As long as you color within the lines, it should be fine.
This library is not intended for use where an attacker could conceivably control the regex pattern.
Much like managing your own memory, if you know your tools and are smart about it, you can get a lot done with mvzr.
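The byte-level patterns in the Unicode note above work because of how UTF-8 lays out code points: the uppercase Greek letters U+0391 through U+03A9 all encode as the lead byte 0xCE followed by a second byte in 0x91 to 0xA9. The sketch below checks that using std only, with no mvzr involved. The helper name isUpperGreekBytes is hypothetical; it hand-rolls the same test that (\xce[\x91-\xa9])+ expresses, applied to the whole string.

```zig
const std = @import("std");

/// Returns true if `s` is a non-empty run of two-byte sequences, each
/// consisting of 0xCE followed by a byte in 0x91...0xA9.
/// This mirrors the byte-level pattern (\xce[\x91-\xa9])+ over the whole input.
fn isUpperGreekBytes(s: []const u8) bool {
    if (s.len == 0 or s.len % 2 != 0) return false;
    var i: usize = 0;
    while (i < s.len) : (i += 2) {
        if (s[i] != 0xce) return false;
        if (s[i + 1] < 0x91 or s[i + 1] > 0xa9) return false;
    }
    return true;
}

test "uppercase Greek letters are 0xCE followed by 0x91...0xA9" {
    // U+0391 (Alpha) through U+03A9 (Omega) fall inside the range.
    try std.testing.expect(isUpperGreekBytes("ΑΒΓΩ"));
    // Lowercase alpha is 0xCE 0xB1, so the second byte is out of range.
    try std.testing.expect(!isUpperGreekBytes("αβγ"));
    // ASCII never starts with 0xCE.
    try std.testing.expect(!isUpperGreekBytes("AB"));
}
```

The same reasoning applies to any other block you want to match: look up the UTF-8 encoding of its first and last code points, and write the lead byte and the range of trailing bytes in \x format.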
Interface
mvzr.Regex is available at comptime or runtime, and matching returns an mvzr.Match, consisting of a .slice field containing the match, as well as the .start and .end locations in the haystack. This is a borrowed slice; to own it, call match.toOwnedMatch(allocator) and deallocate later with match.deinit(allocator), or just free the .slice.
Similarly, if you need to store a Regex or SizedRegex for later, call regex.toOwnedRegex(allocator), freeing later with allocator.destroy(heap_regex).
// aka SizedRegex(64, 8)
const regex: mvzr.Regex = mvzr.compile(patt_str).?;
// or mvzr.Regex.compile(patt_str)
const match: mvzr.Match =...