(An ((Even Better) Lisp) Interpreter (In Python))

In a previous essay I showed how to write a simple Lisp interpreter in 90 lines of Python: lis.py. In this essay I make the implementation, lispy.py, three times more complicated, but more complete. Each section handles an addition.

(1) New data types: string, boolean, complex, port

Adding a new data type to Lispy has three parts: the internal representation of the data, the procedures that operate on it, and the syntax for reading and writing it. Here we add four types (using Python's native representation for all but input ports):

strings : String literals are enclosed in double-quotes. Within a string, a \n means a newline and a \" means a double-quote.

booleans : The syntax is #t and #f for True and False, and the predicate is boolean?.

complex numbers : We use the functions in the cmath module rather than the math module to support complex numbers. The syntax allows constants like 3+4i.

ports : No syntax to add, but procedures port?, load, open-input-file, close-input-port, open-output-file, close-output-port, read, read-char, write and display. Output ports are represented as Python file objects, and input ports are represented by a class, InPort, which wraps a file object and also keeps track of the last line of text read. This is convenient because Scheme input ports need to be able to read expressions as well as raw characters, and our tokenizer works on a whole line, not individual characters.
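As a rough illustration of the reading side of these types, here is a minimal sketch of how a single token might be turned into a Python value. The helper name atom and its details are assumptions made for this example, not necessarily what lispy.py does: it decodes only the \n and \" escapes, and it returns unrecognized tokens as plain strings rather than symbols.

```python
def atom(token):
    """Convert one token to a Python value (illustrative sketch only)."""
    if token == '#t':
        return True
    if token == '#f':
        return False
    if token.startswith('"'):
        # Strip the enclosing quotes, then decode the two escapes named above.
        return token[1:-1].replace('\\"', '"').replace('\\n', '\n')
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    # Scheme writes the imaginary unit as i; Python's complex() expects j.
    # Only try this for tokens that look numeric, so a name like i stays a name.
    if token.endswith('i') and token[0] in '+-.0123456789':
        try:
            return complex(token[:-1] + 'j')
        except ValueError:
            pass
    return token  # anything else is treated as a name


print(atom('#t'), atom('3+4i'), repr(atom('"a\\nb"')), atom('foo'))
```

Trying int before float before complex keeps the most specific numeric type for each constant.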

Now, an old data type that becomes new:

symbol : In the previous version of Lispy, symbols were implemented as strings. Now that we have strings, symbols will be implemented as a separate class (which derives from str). That means we can no longer write if x[0] == 'if', because 'if' is now a string, not a symbol. Instead we write if x[0] is _if and define _if as Sym('if'), where Sym manages a symbol table of unique symbols.

Here is the implementation of the new Symbol class:

class Symbol(str): pass

def Sym(s, symbol_table={}):
    "Find or create unique Symbol entry for str s in symbol table."
    if s not in symbol_table: symbol_table[s] = Symbol(s)
    return symbol_table[s]

_quote, _if, _set, _define, _lambda, _begin, _definemacro, = map(Sym, "quote if set! define lambda begin define-macro".split())

_quasiquote, _unquote, _unquotesplicing = map(Sym, "quasiquote unquote unquote-splicing".split())

We'll show the rest soon.

(2) New syntax: strings, comments, quotes, # literals

The addition of strings complicates tokenization. Spaces can no longer delimit tokens, because spaces can appear inside strings. Instead we use a complex regular expression to break the input into tokens. In Scheme a comment runs from a semicolon to the end of the line; we gather this up as a token and then ignore the token. We also add support for six new tokens: #t #f ' ` , ,@

The tokens #t and #f are the True and False literals, respectively. The single quote mark serves to quote the following expression. The syntax 'exp is completely equivalent to (quote exp). The backquote character ` is called quasiquote in Scheme; it is similar to ' except that within a quasiquoted expression, the notation ,exp means to insert the value of exp (rather than the literal exp), and ,@exp means that exp should evaluate to a list, and all the items of the list are inserted.
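The shorthand described above can be sketched as a small table plus a recursive parser over an already tokenized list. This is a simplified illustration under stated assumptions: the names QUOTES and parse are hypothetical, symbol names are kept as plain strings for brevity, and unbalanced input is not handled gracefully.

```python
# Reader shorthand and the long form each one abbreviates.
QUOTES = {"'": 'quote', '`': 'quasiquote', ',': 'unquote', ',@': 'unquote-splicing'}


def parse(tokens):
    """Parse a list of tokens into nested Python lists, expanding quote shorthand."""
    token = tokens.pop(0)
    if token in QUOTES:
        # 'exp becomes (quote exp), `exp becomes (quasiquote exp), and so on.
        return [QUOTES[token], parse(tokens)]
    if token == '(':
        items = []
        while tokens[0] != ')':
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return items
    return token


print(parse(["'", 'x']))
print(parse(['`', '(', 'a', ',', 'b', ',@', 'c', ')']))
```

Each shorthand token consumes exactly one following expression, which is why ,@c inside the list becomes a two-element sublist of its own.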

In the previous version of Lispy, all input was read from strings. In this version we have introduced ports (also known as file objects or streams) and will read from them. This makes the read-eval-print-loop (repl) much more convenient: instead of insisting that an input expression must fit on one line, we can now read tokens until we get a complete expression, even if it spans several lines. Also, errors are caught and printed, much as the Python interactive loop does. Here is the InPort (input port) class:

import re

class InPort(object):
    "An input port. Retains a line of chars."
    tokenizer = r'''\s*(,@|[('`,)]|"(?:[\\].|[^\\"])*"|;.*|[^\s('"`,;)]*)(.*)'''
    def __init__(self, file):
        self.file = file; self.line = ''
    def next_token(self):
        "Return the next token, reading new text into line buffer if needed."
        while True:
            if self.line == '': self.line = self.file.readline()
            if self.line == '': return eof_object
            token, self.line = re.match(InPort.tokenizer, self.line).groups()
            if token != '' and not token.startswith(';'):
                return token
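To get a feel for what the tokenizer pattern does, here is a small sketch that applies the same regular expression repeatedly to a single line. The names TOKENIZER and tokenize_line are hypothetical, and the guard against input the pattern cannot consume (such as an unterminated string) is an addition made for this example.

```python
import re

# Same pattern as InPort.tokenizer: group 1 is the next token, group 2 the rest of the line.
TOKENIZER = r'''\s*(,@|[('`,)]|"(?:[\\].|[^\\"])*"|;.*|[^\s('"`,;)]*)(.*)'''


def tokenize_line(line):
    """Split one line of Scheme text into tokens, skipping comments."""
    tokens = []
    while line != '':
        token, rest = re.match(TOKENIZER, line).groups()
        if token == '' and rest == line:
            # Nothing was consumed; stop rather than loop forever.
            raise SyntaxError('cannot tokenize: ' + line)
        line = rest
        if token != '' and not token.startswith(';'):
            tokens.append(token)
    return tokens


print(tokenize_line('(define s "a b") ; a comment'))
print(tokenize_line("`(a ,b ,@c)"))
```

The alternatives are tried in order, so ,@ is listed before the single-character tokens; otherwise the comma alone would match first.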

The basic design for the read function follows a suggestion (with working code) from Darius Bacon (who contributed several other improvements as well).

eof_object = Symbol('#<eof-object>') # Note: uninterned; can't be read

def readchar(inport):
    "Read the next character from an input port."
    if inport.line != '':
        ch, inport.line = inport.line[0], inport.line[1:]
        return ch
    else:
        return inport.file.read(1) or eof_object
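Because next_token and readchar share the same line buffer, token reads and character reads can be interleaved on one port. The sketch below repeats the definitions above so that it runs on its own, and uses io.StringIO as a stand-in for a real file, which is an assumption made for the demonstration.

```python
import io
import re


class Symbol(str):
    pass


eof_object = Symbol('#<eof-object>')


class InPort(object):
    "An input port. Retains a line of chars."
    tokenizer = r'''\s*(,@|[('`,)]|"(?:[\\].|[^\\"])*"|;.*|[^\s('"`,;)]*)(.*)'''

    def __init__(self, file):
        self.file = file
        self.line = ''

    def next_token(self):
        "Return the next token, reading new text into line buffer if needed."
        while True:
            if self.line == '':
                self.line = self.file.readline()
            if self.line == '':
                return eof_object
            token, self.line = re.match(InPort.tokenizer, self.line).groups()
            if token != '' and not token.startswith(';'):
                return token


def readchar(inport):
    "Read the next character from an input port."
    if inport.line != '':
        ch, inport.line = inport.line[0], inport.line[1:]
        return ch
    return inport.file.read(1) or eof_object


port = InPort(io.StringIO('(a b)\nxy'))
first = port.next_token()    # token read: fills the line buffer
second = port.next_token()
ch = readchar(port)          # character read: served from the same buffer
third = port.next_token()
print(repr(first), repr(second), repr(ch), repr(third))
```

Once the buffer is empty, readchar falls back to reading directly from the underlying file.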

def read(inport):
    "Read a Scheme expression from an input port."
    def read_ahead(token):
        if '(' == token:
            L = []
            while True:
                token = inport.next_token()
                if...
