(How to Write a (Lisp) Interpreter (in Python))
(How to Write a (Lisp) Interpreter (in Python))
This page has two purposes: to describe how to implement computer<br>language interpreters in general, and in particular to build an interpreter<br>for most of the Scheme<br>dialect of Lisp using Python 3 as the implementation language.<br>I call my language and interpreter Lispy (lis.py ). Years ago, I showed how to write a semi-practical Scheme interpreter Java and in in Common<br>Lisp). This time around the goal is to demonstrate, as concisely<br>and simply as possible, what<br>Alan Kay called "Maxwell's Equations of Software."
Why does this matter? As Steve<br>Yegge said, "If you don't know how compilers work, then you<br>don't know how computers work." Yegge describes 8 problems that<br>can be solved with compilers (or equally well with interpreters, or<br>with Yegge's<br>typical heavy dosage of cynicism).
Syntax and Semantics of Scheme Programs
The syntax of a language is the arrangement of characters to form correct statements or expressions; the<br>semantics is the meaning of those statements or expressions. For example, in the language of<br>mathematical expressions (and in many programming languages), the syntax for adding one plus two is "1 +<br>2" and the semantics is the application of the addition operation to the two numbers, yielding the value 3. We say we<br>are evaluating an expression when we determine its<br>value; we would say that "1 + 2" evaluates to 3, and write<br>that as "1 + 2" ⇒ 3.
Scheme syntax is different from most other programming languages. Consider:
Java Scheme
if (x.val() > 0) {
return fn(A[i] + 3 * i,
new String[] {"one", "two"});
(if (> (val x) 0)
(fn (+ (aref A i) (* 3 i))
(quote (one two)))
Java has a wide variety<br>of syntactic conventions (keywords, infix operators, three kinds of brackets,<br>operator precedence, dot notation, quotes, commas,<br>semicolons), but Scheme syntax is much simpler:
Scheme programs consist solely of expressions. There is no statement/expression distinction.<br>Numbers (e.g. 1) and symbols (e.g. A) are called atomic expressions;<br>they cannot be broken into pieces. These are similar to their Java counterparts, except that in<br>Scheme, operators such as + and > are symbols too, and are treated the same<br>way as A and fn.<br>Everything else is a list expression: a "(", followed by zero or more expressions,<br>followed by a ")". The first element of the list determines what it means:
A list starting with a keyword, e.g. (if ...), is a special form;<br>the meaning depends on the keyword.<br>A list starting with a non-keyword, e.g. (fn ...), is a function call.
The beauty of Scheme is that the full language only needs 5 keywords and 8 syntactic<br>forms. In comparison, Python has 33 keywords and 110<br>syntactic forms,<br>and Java has 50 keywords and 133 syntactic forms.<br>All those parentheses<br>may seem intimidating, but Scheme syntax has the virtues of<br>simplicity and consistency. (Some have joked that "Lisp" stands for<br>"L ots<br>of I rritating S illy P arentheses"; I think it stand for<br>"L isp<br>I s S yntactically P ure".)
In this page we will cover all the important points of the Scheme language and its interpretation<br>(omitting some minor details), but we will take two steps to get there,<br>defining a simplified language first, before defining the near-full Scheme language.
Language 1: Lispy Calculator
Lispy Calculator is a subset of Scheme using only five syntactic forms (two atomic, two special forms, and the procedure call).<br>Lispy Calculator lets you do any computation you could do on a typical calculator—as long as you are comfortable with prefix notation.<br>And you can do two things that are not offered in typical calculator languages: "if" expressions, and the definition of new variables.<br>Here's an example program, that computes the area of a circle of radius 10, using the formula π r2:
(define r 10)<br>(* pi (* r r))
Here is a table of all the allowable expressions:
ExpressionSyntaxSemantics and Example
variable referencesymbolA symbol is interpreted as a variable name;<br>its value is the variable's<br>value.<br>Example: r ⇒ 10 (assuming r was previously defined to be 10)
constant<br>literalnumberA number<br>evaluates to itself.<br>Examples: 12 ⇒ 12 or<br>-3.45e+6 ⇒ -3.45e+6
conditional(if test conseq<br>alt) Evaluate test; if true,<br>evaluate and return conseq; otherwise<br>alt.<br>Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6
definition<br>(define symbol exp)<br>Define a new variable and give it<br>the value of evaluating the expression exp.
Examples: (define r 10)
procedure<br>call(proc arg...) If proc is<br>anything other than one of the symbols if, define,<br>or quote then it is treated as a procedure. Evaluate proc<br>and all the args, and then the procedure is applied to the list of arg values.<br>Example: (sqrt (* 2 8)) ⇒ 4.0
In the Syntax column of this table, symbol must be a<br>symbol,<br>number must be an integer or floating point number,<br>and the other italicized words can be any<br>expression. The notation arg... means zero or more...