Bicameral, Not Homoiconic
1 (Weak) Homoiconicity
2 (Strong) Homoiconicity
3 The Parsing Pipeline
4 The Bicameral Analogy
5 Bicameral Syntax
6 Back to Lisps
7 What About Other Languages?
If you spend enough time reading internet discussions of programming languages, you’ll learn that Lispy languages have a special property: they are homoiconic. This property is vested with mystical powers that both enrich Lisps and debase their competitors.
I have programmed in, and built, Lisps since the late 1980s. My blog is called “parenthetically speaking”. And yet I’m here to tell you that this term is mostly nonsense. However, there is something special about Lisps, something far less mystical but also very powerful and useful. It’s worth understanding what that is and transporting its essence to other languages.
1 (Weak) Homoiconicity
What, supposedly, is homoiconicity? You will hear it described as “a property of some programming languages that allows programs to be represented as data within the language”, or the same with “represented” replaced by “manipulated”, or more simply as “code as data”.
Let’s tease these apart a bit. Consider the following Python code:
hello = 1
This is clearly a program. But can I represent this as a datum within the language? Sure:
'hello = 1'
is a perfectly good representation. (Well, it may be good but it’s not great; we’ll return to that!) Can I manipulate it? Sure, I can concatenate strings to create it:
'hello' + ' = ' + '1'
will produce that program, and
'hello = 1'.split(' ')
will take it apart into constituent pieces.
Does that make Python homoiconic?
Of course, there’s nothing special about Python here. We can use JavaScript to represent and manipulate JavaScript programs, C to do the same to C programs, and so on. Essentially, any programming language with a string datatype seems to be homoiconic. Heck, we didn’t even need strings: we could just as well have represented the programs as numbers (e.g., using Gödel numbering).
One of the traits of a good definition is that it be non-trivial: it must capture some things but it must also exclude some things. It’s not clear that this notion of homoiconicity excludes much of anything.
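To make the remark about numbers concrete, here is a minimal sketch that stores a program's source text as a single Python integer and recovers it. It uses a plain byte encoding as a stand-in for Gödel numbering (it is not Gödel's prime-power construction), and the helper names `program_to_number` and `number_to_program` are invented for this illustration.

```python
def program_to_number(source: str) -> int:
    """Encode source text as one integer (its UTF-8 bytes read as a big-endian number)."""
    data = source.encode("utf-8")
    # Prefix a 0x01 marker byte so that leading zero bytes, and the empty
    # program, survive the round trip.
    return int.from_bytes(b"\x01" + data, "big")


def number_to_program(number: int) -> str:
    """Decode an integer produced by program_to_number back into source text."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "big")
    return data[1:].decode("utf-8")  # drop the 0x01 marker byte


program = "hello = 1"
as_number = program_to_number(program)
assert isinstance(as_number, int)
assert number_to_program(as_number) == program

# "Manipulating" the program is now just arithmetic: appending one byte
# is multiply-by-256-and-add.
longer = as_number * 256 + ord("0")
assert number_to_program(longer) == "hello = 10"
```

By the weak definition, this integer is as much a "representation of the program as data" as the string was, which is the sense in which the definition excludes almost nothing.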
2 (Strong) Homoiconicity
But there’s a reasonable objection to what we wrote above. All we’ve done is write, combine, and take apart strings. But strings are not necessarily programs; strings are just strings, a form of data. Data are data, but programs (entities that we can run) seem to be a separate thing.
How do we turn data into programs? We do need some language support for that. We need something that will take some agreed-upon data representation of a program and treat it like a program, i.e., do whatever the program would have done. Typically, this is a function called eval: it evaluates the datum, performing the effects described in the datum, just as if it were a program. (Note that eval really treats “data as code”, not “code as data”.)
So maybe eval is the real characteristic of homoiconic languages? Maybe. It’s certainly true that eval is a distinctive feature, and some languages have it while others don’t: that is, it non-trivially distinguishes between languages. But it’s worth noting:
Many languages, including Python and JavaScript, have an eval. If they’re all homoiconic, then clearly this isn’t a particularly Lispy trait.
eval interacts poorly with its lexical environment, thereby making it hard even to program with effectively. We showed that JavaScript’s eval is not one but four operations and there are eight contexts that determine which of the four to use. This kind of complexity is overwhelming.
The complexity might be worth it if eval were a good idea, but it’s often a bad idea in programs! It makes code statically invisible, making every other aspect of program management (static analysis, compilation, security checking, and more) much, much harder (or, for some important and useful kinds of analysis, impossible).
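Two of these points can be seen in a short Python sketch: `exec` and `eval` really do treat data as code, yet they interact awkwardly with the lexical environment. The function names below are made up for illustration, and the behavior shown is that of CPython 3; treat it as a sketch rather than a full account of Python's scoping rules.

```python
# Data as code: run the string program from earlier in an explicit namespace.
namespace = {}
exec("hello = 1", namespace)
assert namespace["hello"] == 1
assert eval("hello + 1", namespace) == 2


def rebind_local():
    x = 1
    exec("x = 2")  # the assignment does not reach the function's real local
    return x


def read_enclosing():
    secret = 42

    def inner():
        # `inner` never uses the enclosing variable as a name in its own
        # code, so it captures nothing, and the evaluated string cannot
        # find it either.
        try:
            return eval("secret")
        except NameError:
            return "not visible"

    return inner()


assert rebind_local() == 1
assert read_enclosing() == "not visible"
```

Neither function behaves the way the same code would if it were written directly in place of the string, which is a small taste of the environment problems described above.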
This seems like a disappointing way to end: homoiconic languages are ones that have a complex, excessively powerful feature that we probably shouldn’t use, and that is found anyway in many languages that are not Lispy at all… which certainly doesn’t seem to be a good way to describe what makes Lispy languages distinctive.
But this just shows why we shouldn’t be talking about homoiconicity at all. Let’s talk about what’s actually interesting instead.
3 The Parsing Pipeline
Let’s talk briefly about the classical parsing pipeline. For decades, we’ve been taught to think of parsing a program as having two phases: tokenization (sometimes colloquially called “lexing”) followed by parsing (not colloquially called...