(An ((Even Better) Lisp) Interpreter (in Python))
(An ((Even Better) Lisp) Interpreter (in Python))
In a previous essay I showed how to write a simple<br>Lisp interpreter in 90 lines of Python: lis.py .<br>In this essay I make the implementation, lispy.py , three times more complicated,<br>but more complete. Each section handles an addition.
(1) New data types: string, boolean, complex, port
Adding a new data type to Lispy has three parts: the internal<br>representation of the data, the procedures that<br>operate on it, and the syntax for reading and writing it. Here we<br>add four types (using Python's native representation for all but input ports):
strings : string literals are enclosed in double-quotes. Within a<br>string, a \n means a newline and a \" means a double-quote.<br>booleans : The syntax is #t and #f for True<br>and False, and the predicate is boolean?.<br>complex numbers : we use the functions in the<br>cmath module rather than the math module to<br>support complex numbers. The syntax allows constants like 3+4i.<br>ports : No syntax to add, but procedures port?, load, open-input-file, close-input-port,<br>open-output-file, close-output-port, read, read-char, write and display. Output ports are represented as Python file objects, and input ports<br>are represented by a class, InputPort which wraps a file object and<br>also keeps track of the last line of text read. This is convenient because<br>Scheme input ports need to be able to read expressions as well as raw characters and our tokenizer works on a whole line, not individual characters.
Now, an old data type that becomes new:<br>symbol :<br>In the previous version of Lispy, symbols were implemented<br>as strings. Now that we have strings, symbols will be implemented<br>as a separate class (which derives from str). That means we<br>no longer can write if x[0] == 'if', because 'if'<br>is now a string, not a symbol. Instead we write if x[0] is<br>_if and define _if as Sym('if'), where Sym<br>manages a symbol table of unique symbols.
Here is the<br>implementation of the new Symbol class:
class Symbol(str): pass
def Sym(s, symbol_table={}):<br>"Find or create unique Symbol entry for str s in symbol table."<br>if s not in symbol_table: symbol_table[s] = Symbol(s)<br>return symbol_table[s]
_quote, _if, _set, _define, _lambda, _begin, _definemacro, = map(Sym,<br>"quote if set! define lambda begin define-macro".split())
_quasiquote, _unquote, _unquotesplicing = map(Sym,<br>"quasiquote unquote unquote-splicing".split())
We'll show the rest soon.
(2) New syntax: strings, comments, quotes, # literals
The addition of strings complicates tokenization. No longer can<br>spaces delimit tokens, because spaces can appear inside strings.<br>Instead we use a complex regular expression to break the input into<br>tokens. In Scheme a comment consists of a semicolon to the end of<br>line; we gather this up as a token and then ignore the token. We also<br>add support for six new tokens: #t #f ' ` , ,@<br>The tokens #t<br>and #f are the True and False literals, respectively.<br>The single quote mark serves to quote the following expression. The syntax<br>'exp is completely equivalent to (quote<br>exp). The backquote character ` is called<br>quasiquote in Scheme; it is similar to ' except that<br>within a quasiquoted expression, the notation ,exp<br>means to insert the value of exp (rather than the literal exp),<br>and ,@exp means that exp should evaluate to a<br>list, and all the items of the list are inserted.
In the previous version of Lispy, all input was read from strings. In<br>this version we have introduced ports (also known as file objects or<br>streams) and will read from them. This makes the<br>read-eval-print-loop (repl) much more convenient: instead of insisting that<br>an input expression must fit on one line, we can now read tokens until we get a complete expression, even if it spans several lines. Also, errors<br>are caught and printed, much as the Python interactive loop does. Here is the InPort (input port) class:
class InPort(object):<br>"An input port. Retains a line of chars."<br>tokenizer = r'''\s*(,@|[('`,)]|"(?:[\\].|[^\\"])*"|;.*|[^\s('"`,;)]*)(.*)'''<br>def __init__(self, file):<br>self.file = file; self.line = ''<br>def next_token(self):<br>"Return the next token, reading new text into line buffer if needed."<br>while True:<br>if self.line == '': self.line = self.file.readline()<br>if self.line == '': return eof_object<br>token, self.line = re.match(InPort.tokenizer, self.line).groups()<br>if token != '' and not token.startswith(';'):<br>return token
The basic design for the read function follows a suggestion (with working code) from Darius Bacon (who contributed several other improvements as well).
eof_object = Symbol('#') # Note: uninterned; can't be read
def readchar(inport):<br>"Read the next character from an input port."<br>if inport.line != '':<br>ch, inport.line = inport.line[0], inport.line[1:]<br>return ch<br>else:<br>return inport.file.read(1) or eof_object
def read(inport):<br>"Read a Scheme expression from an input port."<br>def read_ahead(token):<br>if '(' == token:<br>L = []<br>while True:<br>token = inport.next_token()<br>if...