What does grep stand for, and the 75 year history of the regular expression

Mart Traagel

Joe Condon and Ken Thompson working on Belle, 1977. Picture courtesy of Bell Laboratories. Nine years before this photo, Thompson had implemented Stephen Kleene's 1951 regex notation in the QED editor on MIT's CTSS; four years before this photo, he extracted the g/re/p command from QED's Unix successor ed as a standalone utility.

The name grep is shorter than the explanation. It comes from a single command in the ed line editor: g/re/p, meaning globally search for a regular expression and print every matching line. Ken Thompson extracted that command into a standalone program in 1973 and the name travelled with the binary. Almost every Linux distribution in 2026 ships the same four letters, doing roughly the same thing, against text streams that no longer look anything like the punched cards Thompson was running it on. The reason the name survived is the same reason the underlying notation survived: in 1951 a mathematician wrote down a tiny grammar for describing patterns, and seventy-five years of computing have not produced a better one.

Kleene's 1951 memo

The mathematical foundation predates Unix by twenty years. In 1951, Stephen Cole Kleene published a RAND Corporation memorandum titled Representation of Events in Nerve Nets and Finite Automata. The memo was expanded in 1956 into a chapter of the Automata Studies volume from Princeton. Kleene was extending Warren McCulloch and Walter Pitts' 1943 paper on neural nets, an attempt to formalise how a network of binary-threshold neurons could compute. His contribution was a notation for describing the set of inputs a finite automaton would accept. He called the sets regular events; the algebra he invented to describe them used three operations: union (now |), concatenation (implicit), and the star (*) for "zero or more repetitions".
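The three operations can be tried directly in any modern regex engine. The sketch below uses Python's standard re module purely as an illustration; the helper name accepts and the sample patterns are assumptions made for this example, not anything from Kleene's memo. It uses fullmatch so that a pattern is tested against the whole string, which is the sense in which an automaton "accepts" an input.

```python
import re

# Kleene's three operations, written in modern regex syntax.
union = re.compile(r"a|b")          # union: "a" or "b"
concatenation = re.compile(r"ab")   # concatenation: "a" followed by "b"
star = re.compile(r"(?:ab)*")       # star: zero or more repetitions of "ab"


def accepts(pattern, text):
    """Return True when the whole string belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None


print(accepts(union, "b"))            # True
print(accepts(concatenation, "ab"))   # True
print(accepts(star, ""))              # True: zero repetitions is allowed
print(accepts(star, "ababab"))        # True
print(accepts(star, "aba"))           # False: trailing "a" is not a full "ab"
```

Every other regex feature that stays within regular languages, such as + or ?, can be rewritten in terms of these three.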
Those three operators are enough to describe any language recognisable by a finite automaton, and the proof that they suffice is Kleene's theorem. The mathematical category exists exactly because Kleene drew the line: a regular language is one a finite automaton can recognise, and Kleene's algebra is the notation for spelling out which one. The whole thing was an exercise in theoretical computer science. Nobody implemented it on a computer for seventeen years.

Thompson and CTSS, 1968

Ken Thompson read Kleene's work in the 1960s while at Bell Labs and recognised that the algebra was the right notation for pattern matching in a text editor. He wrote up the implementation in Regular Expression Search Algorithm, published in CACM in June 1968: the first published software implementation of regex. The trick was a just-in-time compiler. Thompson translated each regex into native IBM 7094 machine code at the moment the user typed it, ran the compiled matcher against the text, and discarded the code at the end of the search. The technique is now known as Thompson's construction; it underlies the linear-time matching algorithms that modern engines still use nearly six decades later.

Thompson built the matcher into QED, the time-sharing editor on the Compatible Time-Sharing System at MIT, and from QED it travelled into ed, the line editor that shipped with the first Unix in 1971. Inside ed, the command g/regex/p would print every line in the buffer that matched a given pattern. The command was idiomatic enough that in 1973 Thompson extracted it as a standalone Unix utility, named after the syntax: grep.

What grep actually stands for

The Unix lore here is more boring than the modern repackaging suggests. The four letters are not an acronym for any phrase. They are the literal characters of the ed command, in order: g, the global modifier; re, the regex; p, the print action. Strip the slashes and what is left is grep.
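The behaviour of g/re/p is small enough to sketch in a few lines. The version below is a minimal illustration in Python, not Thompson's implementation: the function name g_re_p and the sample buffer are hypothetical, and Python's re module is a backtracking engine rather than the compiled matcher described above.

```python
import re


def g_re_p(lines, pattern):
    """Return every line that contains a match for the pattern, in order.

    g  -> look at every line in the buffer (global)
    re -> test each one against the regular expression
    p  -> keep (print) the lines that match
    """
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# A stand-in for the editor buffer (hypothetical sample text).
buffer = [
    "ken wrote ed",
    "grep came later",
    "ed ships with Unix",
]

for line in g_re_p(buffer, r"\bed\b"):
    print(line)
# ken wrote ed
# ed ships with Unix
```

Like the ed command, the sketch matches anywhere within a line rather than requiring the whole line to match, which is why it uses search and not fullmatch.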
The name preserves the command's lineage rather than describing the program's purpose. Three later variants kept the naming game:

egrep (1973): extended grep, supporting alternation and grouping that the original grep did not support. Now usually a synonym for grep -E.
fgrep: fixed-string grep, for literal-string searches without regex compilation. Now grep -F.
pgrep: process grep, for matching processes by name. A 1990s addition, not from Thompson.

The grep name is older than most engineers who reach for it daily. The name will outlive the binary.

POSIX standardises regex, twice

By 1988 every Unix had a regex implementation and every implementation differed slightly. The POSIX standardisation effort (the IEEE 1003 working group also responsible for codifying the rest of the Unix interface) formalised regex in IEEE 1003.2 in 1992 under two specifications:

POSIX BRE (Basic Regular Expressions): the older grep flavour. Special characters like +, ?, (, ) are literal unless escaped with backslash. Mostly preserved for...
