LPeg – Parsing Expression Grammars for Lua


Parsing Expression Grammars For Lua, version 1.1

Introduction

LPeg is a pattern-matching library for Lua, based on Parsing Expression Grammars (PEGs). This text is a reference manual for the library. For those starting with LPeg, "Mastering LPeg" presents a good tutorial. For a more formal treatment of LPeg, as well as some discussion about its implementation, see "A Text Pattern-Matching Tool based on Parsing Expression Grammars". You may also be interested in my talk about LPeg given at the III Lua Workshop.

Following the Snobol tradition, LPeg defines patterns as first-class objects. That is, patterns are regular Lua values (represented by userdata). The library offers several functions to create and compose patterns. With the use of metamethods, several of these functions are provided as infix or prefix operators. On the one hand, the result is usually much more verbose than the typical encoding of patterns using the so-called regular expressions (which typically are not regular expressions in the formal sense). On the other hand, first-class patterns allow much better documentation (as it is easy to comment the code, to break complex definitions into smaller parts, etc.) and are extensible, as we can define new functions to create and compose patterns.
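Here is a small illustration of patterns as first-class values. The sketch assumes LPeg is installed and loadable with require "lpeg". It names each piece in a local variable, comments it, and defines a new pattern-building function. The helper list_of is hypothetical and was written only for this example. It is not part of LPeg.

```lua
local lpeg = require "lpeg"

-- Small named building blocks, each easy to document on its own.
local digit  = lpeg.R("09")        -- a single decimal digit
local number = digit^1             -- one or more digits
local space  = lpeg.S(" \t")^0     -- optional blanks or tabs

-- Hypothetical helper (not part of LPeg): builds "item (sep item)*",
-- allowing blanks around each separator.
local function list_of(item, sep)
  local s = space * sep * space
  return item * (s * item)^0
end

-- A comma-separated list of numbers that must span the whole subject
-- (-1 succeeds only at the end of the subject).
local numbers = list_of(number, ",") * -1

print(lpeg.match(numbers, "10, 20,30"))  --> 10
print(lpeg.match(numbers, "10,,20"))     --> nil
```

Because list_of returns an ordinary pattern, the same helper can build lists with any item or separator.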

For a quick glance at the library, the following table summarizes its basic operations for creating patterns:

| Operator | Description |
|---|---|
| lpeg.P(string) | Matches string literally |
| lpeg.P(n) | Matches exactly n characters |
| lpeg.S(string) | Matches any character in string (Set) |
| lpeg.R("xy") | Matches any character between x and y (Range) |
| lpeg.utfR(cp1, cp2) | Matches a UTF-8 code point between cp1 and cp2 |
| patt^n | Matches at least n repetitions of patt |
| patt^-n | Matches at most n repetitions of patt |
| patt1 * patt2 | Matches patt1 followed by patt2 |
| patt1 + patt2 | Matches patt1 or patt2 (ordered choice) |
| patt1 - patt2 | Matches patt1 if patt2 does not match |
| -patt | Equivalent to ("" - patt) |
| #patt | Matches patt but consumes no input |
| lpeg.B(patt) | Matches patt behind the current position, consuming no input |
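The following sketch exercises most rows of the table. It assumes LPeg 1.1 is installed, since lpeg.utfR is listed for this version. Each assertion checks what lpeg.match returns: the position just after the match, or nil when the pattern fails. The subjects are made up for illustration.

```lua
local lpeg = require "lpeg"
local P, S, R = lpeg.P, lpeg.S, lpeg.R

-- lpeg.P(string) and lpeg.P(n)
assert(lpeg.match(P("lua"), "lua!") == 4)
assert(lpeg.match(P(3), "ab") == nil)            -- fewer than 3 characters

-- lpeg.S (set) and lpeg.R (range)
local vowel = S("aeiou")
local lower = R("az")
assert(lpeg.match(vowel, "e") == 2)
assert(lpeg.match(lower, "Q") == nil)

-- Repetition: patt^n (at least n) and patt^-n (at most n)
assert(lpeg.match(lower^2, "a") == nil)
assert(lpeg.match(lower^-2, "abcd") == 3)        -- stops after 2 letters

-- Sequence and ordered choice: the first alternative that matches wins,
-- even if a later one would match more.
assert(lpeg.match(P("a") * P("b"), "ab") == 3)
assert(lpeg.match(P("a") + P("ab"), "ab") == 2)

-- Difference: a lowercase letter that is not a vowel
local consonant = lower - vowel
assert(lpeg.match(consonant, "a") == nil)
assert(lpeg.match(consonant, "b") == 2)

-- Predicates consume no input: #patt (and) and -patt (not)
assert(lpeg.match(#P("x"), "xyz") == 1)
assert(lpeg.match(-P("x"), "abc") == 1)

-- lpeg.B looks behind the current position (here, starting at position 2)
assert(lpeg.match(lpeg.B("a") * "b", "ab", 2) == 3)
assert(lpeg.match(lpeg.B("a") * "b", "cb", 2) == nil)

-- lpeg.utfR: a code point from Greek small alpha (U+03B1) to omega (U+03C9);
-- "\206\177" is the 2-byte UTF-8 encoding of alpha.
local greek = lpeg.utfR(0x3B1, 0x3C9)
assert(lpeg.match(greek, "\206\177") == 3)
```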

As a very simple example, lpeg.R("09")^1 creates a pattern that matches a non-empty sequence of digits. As a not so simple example, -lpeg.P(1) (which can be written as lpeg.P(-1), or simply -1 for operations expecting a pattern) matches an empty string only if it cannot match a single character; so, it succeeds only at the end of the subject.
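Both examples from this paragraph can be checked directly. The sketch assumes LPeg is available via require "lpeg". The names digits, eos and only_digits are just local variables chosen for the illustration.

```lua
local lpeg = require "lpeg"

local digits = lpeg.R("09")^1   -- a non-empty sequence of digits
local eos    = -lpeg.P(1)       -- same as lpeg.P(-1): end of subject

assert(lpeg.match(digits, "2024-01") == 5)   -- stops at "-"
assert(lpeg.match(digits, "x42") == nil)     -- "x" is not a digit

-- Requiring the end of the subject turns a prefix match into a full match.
local only_digits = digits * eos
assert(lpeg.match(only_digits, "12345") == 6)
assert(lpeg.match(only_digits, "123a") == nil)

-- Where a pattern is expected, the plain number -1 works too.
assert(lpeg.match(digits * -1, "7") == 2)
assert(lpeg.match(lpeg.P(-1), "") == 1)
```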

LPeg also offers the re module, which implements patterns following a regular-expression style (e.g., [0-9]+). (This module is 270 lines of Lua code, and of course it uses LPeg to parse regular expressions and translate them to regular LPeg patterns.)
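Here is a minimal sketch of the re module next to plain LPeg. It assumes both lpeg and re (the re.lua file distributed with LPeg) are on the module path. It uses re.compile, which turns a pattern string into an ordinary LPeg pattern, and re.match, which takes the subject first and the pattern string second. In re syntax a range is written with a hyphen, so [09] would mean just the two characters 0 and 9.

```lua
local lpeg = require "lpeg"
local re   = require "re"   -- re.lua, distributed with LPeg

-- The re syntax [0-9]+ and the LPeg expression lpeg.R("09")^1
-- describe the same language: one or more decimal digits.
local by_re   = re.compile("[0-9]+")
local by_lpeg = lpeg.R("09")^1

for _, s in ipairs({ "123abc", "7", "abc", "" }) do
  assert(lpeg.match(by_re, s) == lpeg.match(by_lpeg, s))
end

-- re.match compiles the pattern string and matches it in one call.
assert(re.match("123abc", "[0-9]+") == 4)

-- re.compile returns an ordinary LPeg pattern, so it composes with others.
local signed = lpeg.S("+-")^-1 * by_re
assert(lpeg.match(signed, "-42") == 4)
```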

Functions

lpeg.match (pattern, subject [, init])

The matching function. It attempts to match the given pattern against the subject string. If the match succeeds, it returns the index in the subject of the first character after the match, or the captured values (if the pattern captured any value). If the match fails, it returns nil.

An optional numeric argument init makes the match start at that position in the subject string. As in the Lua standard libraries, a negative value counts from the end.
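The return values and the init argument described above can be seen in a few calls. The sketch assumes LPeg is installed. It uses lpeg.C, LPeg's simple capture (covered in the manual's section on captures), only to show a match that returns a captured value instead of a position.

```lua
local lpeg = require "lpeg"

local hello = lpeg.P("hello")

-- No captures: the result is the position just after the match.
assert(lpeg.match(hello, "hello world") == 6)

-- With a capture, the captured value is returned instead.
local word = lpeg.C(lpeg.R("az")^1)
assert(lpeg.match(word, "hello world") == "hello")

-- init: start matching at position 7, where "world" begins.
assert(lpeg.match(lpeg.P("world"), "hello world", 7) == 12)

-- A negative init counts from the end: in this 11-character subject,
-- -5 is also position 7.
assert(lpeg.match(lpeg.P("world"), "hello world", -5) == 12)

-- On failure, lpeg.match returns nil.
assert(lpeg.match(hello, "goodbye") == nil)
```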

Unlike typical pattern-matching functions, match works only in anchored mode; that is, it tries to match the pattern with a prefix of the given subject string (at position init), not with an arbitrary substring of the subject. So, if we want to find a pattern anywhere in a string, we must either write a loop in Lua or write a pattern that matches anywhere. This second approach is easy and quite efficient; see the examples.
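Both ways of searching past the anchored position can be sketched as follows, assuming LPeg is installed. The helpers find_loop and find_pattern are hypothetical names for this illustration. find_pattern uses one common idiom, (1 - p)^0 * p, which may differ from the manual's own examples. It adds lpeg.Cp position captures to report where the match starts and ends.

```lua
local lpeg = require "lpeg"

local word = lpeg.P("world")

-- Anchored mode: matching starts at position 1, where "h" is not "w".
assert(lpeg.match(word, "hello world") == nil)

-- Hypothetical helper 1: a Lua loop that retries the anchored match at
-- every position. Assuming p has no captures, it returns the start and
-- the position after the match.
local function find_loop(p, s)
  for i = 1, #s do
    local e = lpeg.match(p, s, i)
    if e then return i, e end
  end
  return nil
end

-- Hypothetical helper 2: a single pattern. (1 - p)^0 skips characters
-- at which p does not match; lpeg.Cp() captures the current position
-- just before and just after p.
local function find_pattern(p)
  p = lpeg.P(p)
  return (1 - p)^0 * lpeg.Cp() * p * lpeg.Cp()
end

print(find_loop(word, "hello world"))                   --> 7  12
print(lpeg.match(find_pattern(word), "hello world"))    --> 7  12
print(lpeg.match(find_pattern("xyz"), "hello world"))   --> nil
```

The loop performs up to one anchored match per position. The single pattern lets LPeg do the scanning in one call.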

lpeg.type (value)

If the given value is a pattern, it returns the string "pattern". Otherwise, it returns nil.

lpeg.version

A string (not a function) with the running version of LPeg.
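Here is a quick check of lpeg.type and lpeg.version, assuming LPeg 1.1 is installed, where lpeg.version is a string field as stated above. Note that lpeg.type returns nil for a string, even though operations that expect a pattern also accept strings.

```lua
local lpeg = require "lpeg"

-- lpeg.type distinguishes patterns from other Lua values.
assert(lpeg.type(lpeg.P("a")) == "pattern")
assert(lpeg.type("a") == nil)   -- accepted where a pattern is expected,
                                -- but not itself a pattern
assert(lpeg.type(42) == nil)

-- lpeg.version is read as a field; it is not called.
print("LPeg version:", lpeg.version)
```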

lpeg.setmaxstack (max)

Sets a limit for the size of the backtrack stack used by LPeg to track calls and choices. (The default limit is 400.) Most well-written patterns need few backtrack levels, so you seldom need to change this limit; before changing it, you should try to rewrite your pattern to avoid the need for extra space. Nevertheless, a few useful patterns may overflow. Also, with recursive grammars, subjects with deep recursion may need larger limits.
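The effect of the limit can be sketched with a recursive grammar for balanced parentheses and a deliberately deep subject. The sketch assumes LPeg is installed. Grammars built with lpeg.P on a table and lpeg.V are covered in the manual's section on grammars. The depth of 1000 and the new limit of 5000 are arbitrary values chosen for the illustration, and the exact error message may vary between versions.

```lua
local lpeg = require "lpeg"

-- Balanced parentheses as a recursive grammar: "(" followed by any
-- number of nested groups, then ")". Each open nesting level keeps a
-- call entry (and a choice entry) on LPeg's backtrack stack.
local balanced = lpeg.P{ "(" * lpeg.V(1)^0 * ")" }

local depth = 1000
local subject = string.rep("(", depth) .. string.rep(")", depth)

-- With the default limit, this deep subject overflows the stack and
-- lpeg.match raises an error (caught here with pcall).
local ok, err = pcall(lpeg.match, balanced, subject)
assert(not ok)
print(err)

-- Raising the limit lets the same match succeed; the new limit stays
-- in effect for later matches.
lpeg.setmaxstack(5000)
assert(lpeg.match(balanced, subject) == 2 * depth + 1)
```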

Basic Constructions

The following operations build patterns. All operations that expect a pattern as an argument may also receive strings, tables, numbers, booleans, or functions, which are translated to patterns according to the rules of function lpeg.P.
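Here is a short sketch of these automatic conversions, assuming LPeg is installed. A string, a number and a boolean appear as operands of pattern operators and are converted as if by lpeg.P. Tables (grammars) and functions are left out to keep the example small.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- "b" (a string) and 2 (a number) are converted as if by lpeg.P:
-- this matches "ab" followed by any two characters.
local p = P("a") * "b" * 2
assert(lpeg.match(p, "abXY") == 5)
assert(lpeg.match(p, "abX") == nil)   -- only one character after "ab"

-- Booleans: true always succeeds without consuming input,
-- so P("x") + true makes the "x" optional.
local opt_x = P("x") + true
assert(lpeg.match(opt_x, "x") == 2)
assert(lpeg.match(opt_x, "y") == 1)

-- false always fails.
assert(lpeg.match(P("x") * false, "x") == nil)
```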

lpeg.P (value)

Converts the given value into a proper pattern, according to the following rules:

If the argument is a pattern, it is returned unmodified.

If the argument is a string, it is translated to a pattern that matches the string literally.

If the argument is a non-negative number n, the result is a pattern that matches exactly n...
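The rules listed so far can be checked directly, assuming LPeg is installed. Only the pattern, string and non-negative number cases are shown. For the number case, the sketch relies on the summary table's description of lpeg.P(n) as matching exactly n characters.

```lua
local lpeg = require "lpeg"

-- A pattern is returned unmodified: the very same object comes back.
local p = lpeg.P("abc")
assert(lpeg.P(p) == p)

-- A string matches literally: "." and "*" have no special meaning.
local literal = lpeg.P("a.b*")
assert(lpeg.match(literal, "a.b*") == 5)
assert(lpeg.match(literal, "axbb") == nil)

-- A non-negative number n matches exactly n characters,
-- so lpeg.P(0) succeeds even on an empty subject.
assert(lpeg.match(lpeg.P(2), "xyz") == 3)
assert(lpeg.match(lpeg.P(0), "") == 1)
```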
