I-Regexp: An Interoperable Regular Expression Format


RFC 9485
I-Regexp: An Interoperable Regular Expression Format

Stream: Internet Engineering Task Force (IETF)
RFC: 9485
Category: Standards Track
Published: October 2023
ISSN: 2070-1721
Authors: C. Bormann (Universität Bremen TZI), T. Bray (Textuality)

Abstract

This document specifies I-Regexp, a flavor of regular expression that is limited in scope with the goal of interoperation across many different regular expression libraries.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9485.

Copyright Notice

Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

1. Introduction

This specification describes an interoperable regular expression (abbreviated as "regexp") flavor, I-Regexp.

I-Regexp does not provide advanced regular expression features such as capture groups, lookahead, or backreferences. It supports only a Boolean matching capability, i.e., testing whether a given regular expression matches a given piece of text.
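As an informal illustration of the Boolean matching capability, the following Python sketch returns only true or false for a pattern and a piece of text. The function name `iregexp_matches` is hypothetical and is not defined by this specification. The sketch assumes whole-string matching (as in XSD regular expressions) and assumes the pattern uses only constructs that Python's `re` module interprets the same way; characters such as "^", "$", and "." are treated differently by `re`, so such patterns would need translation first.

```python
import re


def iregexp_matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`, else False.

    Assumption: `pattern` is limited to constructs that Python's `re`
    reads identically (literal letters and digits, grouping, "|", and
    quantifiers). No translation of the pattern is attempted here.
    """
    # fullmatch anchors at both ends, so a partial match does not count.
    # Only the yes/no outcome is exposed; any groups are ignored.
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print(iregexp_matches("ab*c", "abbbc"))
    print(iregexp_matches("ab*c", "xabbbc"))
```

The caller never sees match positions or captured substrings, which mirrors the restriction to a Boolean result described above.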

I-Regexp supports the entire repertoire of Unicode characters (Unicode scalar values); both the I-Regexp strings themselves and the strings they are matched against are sequences of Unicode scalar values (often represented in UTF-8 encoding form [STD63] for interchange).
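The notion of a sequence of Unicode scalar values can be illustrated with a short Python sketch. Unicode scalar values are all code points except the surrogate range U+D800 to U+DFFF. The helper names `is_scalar_value_string` and `to_utf8` are hypothetical and serve only as illustration.

```python
def is_scalar_value_string(s: str) -> bool:
    """Return True if every code point in `s` is a Unicode scalar value.

    Python strings may hold lone surrogates (U+D800..U+DFFF), which are
    code points but not scalar values.
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def to_utf8(s: str) -> bytes:
    """Encode `s` as UTF-8 for interchange, rejecting surrogates."""
    if not is_scalar_value_string(s):
        raise ValueError("string contains a surrogate code point")
    return s.encode("utf-8")


if __name__ == "__main__":
    print(is_scalar_value_string("Universit\u00e4t"))
    print(is_scalar_value_string("\ud800"))
    print(to_utf8("\u00e4"))
```

A check of this kind could be applied both to the I-Regexp string and to the text it is matched against, since both are required to be sequences of scalar values.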

I-Regexp is a subset of XML Schema Definition (XSD) regular expressions [XSD-2].

This document includes guidance for converting I-Regexps for use with several well-known regular expression idioms.

The development of I-Regexp was motivated by the work of the JSONPath Working Group (WG). The WG wanted to include support for the use of regular expressions in JSONPath filters in its specification [JSONPATH-BASE], but was unable to find a useful specification for regular expressions that would be interoperable across the popular libraries.

1.1. Terminology

This document uses the abbreviation "regexp" for what is usually called a "regular expression" in programming. The term "I-Regexp" is used as a noun meaning a character string (sequence of Unicode scalar values) that conforms to the requirements in this specification; the plural is "I-Regexps".

This specification uses Unicode terminology; a good entry point is provided by [UNICODE-GLOSSARY].

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

The grammatical rules in this document are to be interpreted as ABNF, as described in [RFC5234] and [RFC7405], where the "characters" of Section 2.3 of [RFC5234] are Unicode scalar values.

2. Objectives

I-Regexps should handle the vast majority of practical cases where a matching regexp is needed in a data-model specification or a query-language expression.

At the time of writing, an editor of this document conducted a survey of the regexp syntax used in recently published RFCs. All examples found there should be covered by I-Regexps, both syntactically and with their intended semantics. The exception is the use of multi-character escapes, for which workaround guidance is provided in Section 5.

3. I-Regexp Syntax

An I-Regexp MUST conform to the ABNF specification in Figure 1.

i-regexp = branch *( "|" branch )
branch = *piece
piece = atom [ quantifier ]
quantifier = ( "*" / "+" / "?" ) / range-quantifier
range-quantifier = "{" QuantExact [ "," [ QuantExact ] ] "}"
QuantExact = 1*%x30-39 ; '0'-'9'

atom = NormalChar / charClass / ( "(" i-regexp ")" )
NormalChar = ( %x00-27 / "," / "-" / %x2F-3E ; '/'-'>'
 / %x40-5A ; '@'-'Z'
 / %x5E-7A ; '^'-'z'
 / %x7E-D7FF ; skip surrogate code points
 / %xE000-10FFFF )
charClass = "." /...
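Two of the rules shown above, NormalChar and range-quantifier, can be sketched informally in Python. This covers only those two productions and is not a parser for the full grammar, which is truncated here. The function names `is_normal_char` and `is_range_quantifier` are hypothetical.

```python
import re

# Code point ranges copied from the NormalChar rule (inclusive bounds).
_NORMAL_CHAR_RANGES = (
    (0x00, 0x27),
    (0x2C, 0x2C),       # ","
    (0x2D, 0x2D),       # "-"
    (0x2F, 0x3E),       # '/'-'>'
    (0x40, 0x5A),       # '@'-'Z'
    (0x5E, 0x7A),       # '^'-'z'
    (0x7E, 0xD7FF),     # stops before the surrogate code points
    (0xE000, 0x10FFFF),
)

# "{" QuantExact [ "," [ QuantExact ] ] "}" with QuantExact = 1*DIGIT
_RANGE_QUANTIFIER = re.compile(r"\{[0-9]+(,[0-9]*)?\}")


def is_normal_char(ch: str) -> bool:
    """Return True if the single character `ch` is a NormalChar."""
    if len(ch) != 1:
        return False
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _NORMAL_CHAR_RANGES)


def is_range_quantifier(s: str) -> bool:
    """Return True if `s` as a whole matches the range-quantifier rule."""
    return _RANGE_QUANTIFIER.fullmatch(s) is not None


if __name__ == "__main__":
    print([c for c in "a(*,-.^$" if is_normal_char(c)])
    print(is_range_quantifier("{2,5}"), is_range_quantifier("{,5}"))
```

Note that "^" and "$" fall inside NormalChar ranges and so are ordinary characters under this rule, while characters such as "(", "*", ".", and "|" are excluded because they carry syntactic meaning.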
