Introduction - Create Your Own Programming Language with Rust
Create Your Own Programming Language with Rust
Materials in this book are distributed under the terms of Creative Commons BY-NC-SA 4.0
Motivations and Goals
This book arises from my frustration at not finding modern, clear, and concise teaching materials that are readily accessible to beginners like me who want to learn how to create their own programming language.
“If you don’t know how compilers work, then you don’t know how computers work” 1
“If you can’t explain something in simple terms, you don’t understand it” 2
Pedagogically, one of the most effective methods of teaching is co-creating interactively. Introducing the core aspects around the simplest example (here, our calculator language) helps build knowledge and confidence. We use mature technologies instead of reinventing the wheel.
Getting Started
This book assumes basic knowledge of Rust. If you’re new to Rust, start with the official Rust book.
The code and materials are available on GitHub. To follow along:
git clone https://github.com/ehsanmok/create-your-own-lang-with-rust
cd create-your-own-lang-with-rust
Calculator and Firstlang (stable Rust)
These projects work with stable Rust 1.70+ and require no external system dependencies such as LLVM:
# Calculator - interpreter mode
cd calculator
cargo run --bin main examples/simple.calc

# Calculator - VM mode
cargo run --bin main --features vm examples/simple.calc

# Firstlang - interpreter
cd firstlang
cargo run -- examples/fibonacci.fl
cargo run # REPL
Secondlang and Thirdlang (nightly Rust + LLVM)
These projects require nightly Rust and LLVM for JIT compilation:
# Install nightly Rust
rustup toolchain install nightly

# Install LLVM (macOS)
brew install llvm

# Install LLVM (Debian/Ubuntu) - see https://apt.llvm.org/
Check your LLVM version with llvm-config --version and update the inkwell dependency in Cargo.toml to match:
| LLVM Version | inkwell feature |
|--------------|-----------------|
| 20.x         | llvm20-1        |
| 19.x         | llvm19-1        |
| 18.x         | llvm18-1        |
For example, with LLVM 20:
inkwell = { version = "0.7", features = ["llvm20-1"] }
# Secondlang
cd secondlang
rustup run nightly cargo run -- examples/fibonacci.sl
rustup run nightly cargo run -- --ir examples/fibonacci.sl # view LLVM IR

# Thirdlang
cd thirdlang
rustup run nightly cargo run --bin thirdlang -- examples/point.tl
rustup run nightly cargo run --bin thirdlang -- examples/counter.tl
Learning Progression
We build four languages, each extending the concepts of the previous one:
| Language   | Grammar   | New Concepts                                     | Execution              |
|------------|-----------|--------------------------------------------------|------------------------|
| Calculator | 18 lines  | PEG basics, AST, operators                       | Interpreter, VM, JIT   |
| Firstlang  | 70 lines  | Variables, functions, control flow, recursion    | Tree-walking interpreter |
| Secondlang | 77 lines  | Types, type inference, optimization passes       | LLVM JIT compilation   |
| Thirdlang  | 140 lines | Classes, methods, constructors, memory management | LLVM JIT compilation   |
Part I: Calculator
We start with the simplest possible language: integer arithmetic with + and -. The grammar fits in 18 lines:
Program = _{ SOI ~ Expr ~ EOF }
Expr = { UnaryExpr | BinaryExpr | Term }
Term = _{Int | "(" ~ Expr ~ ")" }
...
This minimal language lets us focus on the fundamentals: what is a grammar? How does pest generate a parser? What is an AST? We explore three different backends (interpreter, bytecode VM, JIT) to show that the same AST can be executed in multiple ways.
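To make the "one AST, several backends" idea concrete, here is a minimal sketch in plain Rust. It is not the book's calculator code: the names `Expr`, `Op`, `Instr`, `eval`, `compile`, `run`, and `sample` are hypothetical, the AST is built by hand rather than by a pest parser, and the JIT backend is left out. It only illustrates how a tree-walking interpreter and a small stack-based VM can execute the same tree.

```rust
// Hypothetical names throughout; none of this is taken from the book's
// calculator crate. The AST is built by hand instead of being parsed.

/// The two operators of the calculator language.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Add,
    Sub,
}

/// A tiny AST: integers, unary expressions, and binary expressions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Unary(Op, Box<Expr>),
    Binary(Box<Expr>, Op, Box<Expr>),
}

/// Backend 1: a tree-walking interpreter that evaluates the AST directly.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Int(n) => *n,
        Expr::Unary(Op::Add, e) => eval(e),
        Expr::Unary(Op::Sub, e) => -eval(e),
        Expr::Binary(l, Op::Add, r) => eval(l) + eval(r),
        Expr::Binary(l, Op::Sub, r) => eval(l) - eval(r),
    }
}

/// Instructions for a minimal stack machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    Push(i64),
    Neg,
    Add,
    Sub,
}

/// Backend 2, step 1: flatten the AST into postfix bytecode.
fn compile(expr: &Expr, out: &mut Vec<Instr>) {
    match expr {
        Expr::Int(n) => out.push(Instr::Push(*n)),
        Expr::Unary(Op::Add, e) => compile(e, out),
        Expr::Unary(Op::Sub, e) => {
            compile(e, out);
            out.push(Instr::Neg);
        }
        Expr::Binary(l, op, r) => {
            compile(l, out);
            compile(r, out);
            out.push(match op {
                Op::Add => Instr::Add,
                Op::Sub => Instr::Sub,
            });
        }
    }
}

/// Backend 2, step 2: execute the bytecode on an operand stack.
/// Returns `None` if the bytecode underflows the stack.
fn run(code: &[Instr]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for instr in code {
        match instr {
            Instr::Push(n) => stack.push(*n),
            Instr::Neg => {
                let v = stack.pop()?;
                stack.push(-v);
            }
            Instr::Add | Instr::Sub => {
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(if *instr == Instr::Add { lhs + rhs } else { lhs - rhs });
            }
        }
    }
    stack.pop()
}

/// Hand-built AST for `1 + (2 - -3)`; a real parser would produce this.
fn sample() -> Expr {
    let neg_three = Expr::Unary(Op::Sub, Box::new(Expr::Int(3)));
    let inner = Expr::Binary(Box::new(Expr::Int(2)), Op::Sub, Box::new(neg_three));
    Expr::Binary(Box::new(Expr::Int(1)), Op::Add, Box::new(inner))
}

fn main() {
    let ast = sample();
    let mut code = Vec::new();
    compile(&ast, &mut code);
    println!("interpreter: {}", eval(&ast));
    println!("vm:          {:?}", run(&code));
}
```

Both backends should agree on the result for any well-formed tree. The VM trades the recursive walk for a flat instruction list, which is the usual reason to introduce bytecode in the first place.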
Part II: Firstlang
With the basics understood, we add features that make a real programming language. The grammar grows to 70 lines:
// Statements instead of just expressions
Stmt = { Function | Return | Assignment | Expr }

// Functions with parameters
Function = { "def" ~ Identifier ~ "(" ~ Params? ~ ")" ~ Block }

// Control flow
Conditional = { "if" ~ "(" ~ Expr ~ ")" ~ Block ~ "else" ~ Block }
WhileLoop = { "while" ~ "(" ~ Expr ~ ")" ~ Block }
We focus on a single backend (tree-walking interpreter) to deeply understand scoping, call stacks, and recursion. The culminating example is computing Fibonacci recursively.
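As a rough illustration of how scoping and recursion interact in a tree-walking interpreter, the sketch below evaluates a recursive Fibonacci function. It is a simplified stand-in, not Firstlang's implementation: the `Expr`, `Function`, `Interpreter`, and `Scope` types and the helper functions are hypothetical, the AST is expression-only and built by hand, and booleans are represented as the integers 0 and 1. Each call gets a fresh scope, so the host Rust call stack plays the role of the language's call stack.

```rust
use std::collections::HashMap;

// Hypothetical, simplified AST; the real Firstlang grammar also has
// statements, assignments, and loops, which are omitted here.
#[derive(Debug, Clone)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

struct Function {
    params: Vec<String>,
    body: Expr,
}

struct Interpreter {
    functions: HashMap<String, Function>,
}

/// One scope per function call: parameter name -> value.
type Scope = HashMap<String, i64>;

impl Interpreter {
    fn eval(&self, expr: &Expr, scope: &Scope) -> Result<i64, String> {
        match expr {
            Expr::Int(n) => Ok(*n),
            Expr::Var(name) => scope
                .get(name)
                .copied()
                .ok_or_else(|| format!("undefined variable `{name}`")),
            Expr::Add(l, r) => Ok(self.eval(l, scope)? + self.eval(r, scope)?),
            Expr::Sub(l, r) => Ok(self.eval(l, scope)? - self.eval(r, scope)?),
            Expr::Less(l, r) => {
                let lhs = self.eval(l, scope)?;
                let rhs = self.eval(r, scope)?;
                Ok(i64::from(lhs < rhs)) // booleans as 0/1 in this sketch
            }
            Expr::If(cond, then_e, else_e) => {
                if self.eval(cond, scope)? != 0 {
                    self.eval(then_e, scope)
                } else {
                    self.eval(else_e, scope)
                }
            }
            Expr::Call(name, args) => {
                let func = self
                    .functions
                    .get(name)
                    .ok_or_else(|| format!("undefined function `{name}`"))?;
                if args.len() != func.params.len() {
                    return Err(format!("wrong number of arguments to `{name}`"));
                }
                // Arguments are evaluated in the caller's scope...
                let mut frame = Scope::new();
                for (param, arg) in func.params.iter().zip(args) {
                    frame.insert(param.clone(), self.eval(arg, scope)?);
                }
                // ...while the body sees only its own fresh frame.
                self.eval(&func.body, &frame)
            }
        }
    }
}

fn var(name: &str) -> Box<Expr> {
    Box::new(Expr::Var(name.to_string()))
}

fn int(n: i64) -> Box<Expr> {
    Box::new(Expr::Int(n))
}

fn call_fib(arg: Expr) -> Box<Expr> {
    Box::new(Expr::Call("fib".to_string(), vec![arg]))
}

/// Builds `fib(n) = if n < 2 { n } else { fib(n - 1) + fib(n - 2) }` by hand.
fn fib_program() -> Interpreter {
    let body = Expr::If(
        Box::new(Expr::Less(var("n"), int(2))),
        var("n"),
        Box::new(Expr::Add(
            call_fib(Expr::Sub(var("n"), int(1))),
            call_fib(Expr::Sub(var("n"), int(2))),
        )),
    );
    let mut functions = HashMap::new();
    functions.insert(
        "fib".to_string(),
        Function { params: vec!["n".to_string()], body },
    );
    Interpreter { functions }
}

fn main() {
    let interp = fib_program();
    let call = Expr::Call("fib".to_string(), vec![Expr::Int(10)]);
    println!("{:?}", interp.eval(&call, &Scope::new()));
}
```

Evaluating `fib(10)` this way should yield 55. Because the body only sees its own frame, a variable defined in the caller is invisible to the callee, which is the simplest form of lexical scoping for functions without closures.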
Part III: Secondlang
We add static types and compile to native code. The grammar changes are minimal (just 7 more lines), but the compiler grows significantly:
// Type annotations
Type = { IntType | BoolType }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }

// Functions now have types
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }
Types are primarily a semantic addition, not a syntactic one. The grammar changes are small, but we need new compiler phases (type checking, type inference) and can now generate efficient native code via LLVM.
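The sketch below hints at what a type-checking phase might look like for a two-type language with `Int` and `Bool`. It is an assumption-laden illustration rather than Secondlang's actual checker: the `Type`, `Expr`, `check`, `expect`, and `sample` names are hypothetical, the AST is a small hand-built subset, and type inference and LLVM code generation are not shown.

```rust
use std::collections::HashMap;

// Hypothetical types for illustration; not Secondlang's real AST.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    Bool,
}

#[derive(Debug)]
enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Maps variable names to their declared types (e.g. from `x: int`).
type TypeEnv = HashMap<String, Type>;

/// Computes the type of an expression or reports the first mismatch.
fn check(expr: &Expr, env: &TypeEnv) -> Result<Type, String> {
    match expr {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown variable `{name}`")),
        Expr::Add(l, r) => {
            expect(l, Type::Int, env)?;
            expect(r, Type::Int, env)?;
            Ok(Type::Int)
        }
        Expr::Less(l, r) => {
            expect(l, Type::Int, env)?;
            expect(r, Type::Int, env)?;
            Ok(Type::Bool)
        }
        Expr::If(cond, then_e, else_e) => {
            expect(cond, Type::Bool, env)?;
            // Both branches must agree on a single result type.
            let then_ty = check(then_e, env)?;
            expect(else_e, then_ty, env)?;
            Ok(then_ty)
        }
    }
}

/// Checks `expr` and requires its type to equal `want`.
fn expect(expr: &Expr, want: Type, env: &TypeEnv) -> Result<(), String> {
    let got = check(expr, env)?;
    if got == want {
        Ok(())
    } else {
        Err(format!("expected {want:?}, found {got:?}"))
    }
}

/// Hand-built AST for `if (x < 10) { x + 1 } else { 0 }`.
fn sample() -> Expr {
    let x = || Box::new(Expr::Var("x".to_string()));
    Expr::If(
        Box::new(Expr::Less(x(), Box::new(Expr::Int(10)))),
        Box::new(Expr::Add(x(), Box::new(Expr::Int(1)))),
        Box::new(Expr::Int(0)),
    )
}

fn main() {
    let mut env = TypeEnv::new();
    env.insert("x".to_string(), Type::Int);

    println!("{:?}", check(&sample(), &env));

    // `x + true` mixes Int and Bool and should be rejected.
    let bad = Expr::Add(
        Box::new(Expr::Var("x".to_string())),
        Box::new(Expr::Bool(true)),
    );
    println!("{:?}", check(&bad, &env));
}
```

Note that nothing here depends on new syntax beyond the annotations that populate the environment, which matches the point that types are mostly a semantic addition.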
Part IV: Thirdlang
Finally, we add object-oriented programming with classes, methods, and memory management. The grammar grows to 140 lines:
// Class definitions
ClassDef = { "class" ~ Identifier ~ "{" ~ ClassBody ~...