Create Your Own Programming Language with Rust

Materials in this book are distributed under the terms of Creative Commons BY-NC-SA 4.0

Motivations and Goals

This book arises from my frustration at not finding modern, clear, and concise teaching materials that are readily accessible to beginners like me who want to learn how to create their own programming language.

“If you don’t know how compilers work, then you don’t know how computers work” 1

“If you can’t explain something in simple terms, you don’t understand it” 2

Pedagogically, one of the most effective methods of teaching is co-creating interactively. Introducing the core aspects around the simplest example (here, our calculator language) helps build knowledge and confidence. We use mature technologies instead of reinventing the wheel.

Getting Started

This book assumes basic knowledge of Rust. If you’re new to Rust, start with the official Rust book.

The code and materials are available on GitHub. To follow along:

git clone https://github.com/ehsanmok/create-your-own-lang-with-rust
cd create-your-own-lang-with-rust

Calculator and Firstlang (stable Rust)

These projects work with stable Rust 1.70+ and require no external dependencies:

# Calculator - interpreter mode
cd calculator
cargo run --bin main examples/simple.calc

# Calculator - VM mode
cargo run --bin main --features vm examples/simple.calc

# Firstlang - interpreter
cd firstlang
cargo run -- examples/fibonacci.fl
cargo run # REPL

Secondlang and Thirdlang (nightly Rust + LLVM)

These projects require nightly Rust and LLVM for JIT compilation:

# Install nightly Rust
rustup toolchain install nightly

# Install LLVM (macOS)
brew install llvm

# Install LLVM (Debian/Ubuntu) - see https://apt.llvm.org/

Check your LLVM version with llvm-config --version and update the inkwell dependency in Cargo.toml to match:

| LLVM Version | inkwell feature |
|--------------|-----------------|
| 20.x         | llvm20-1        |
| 19.x         | llvm19-1        |
| 18.x         | llvm18-1        |

For example, with LLVM 20:

inkwell = { version = "0.7", features = ["llvm20-1"] }
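The version-to-feature lookup above can be written as a small helper. This is only a sketch: `inkwell_feature` is a hypothetical name, not part of inkwell or the book's code, and it covers only the three rows listed in the table.

```rust
/// Hypothetical helper: maps the output of `llvm-config --version`
/// (for example "20.1.0") to the inkwell feature named in the table above.
/// Versions not listed in the table yield `None`.
fn inkwell_feature(llvm_version: &str) -> Option<&'static str> {
    // Take the major component before the first dot and parse it.
    let major: u32 = llvm_version.trim().split('.').next()?.parse().ok()?;
    match major {
        20 => Some("llvm20-1"),
        19 => Some("llvm19-1"),
        18 => Some("llvm18-1"),
        _ => None, // not covered by the table in this chapter
    }
}
```

Calling `inkwell_feature("19.1.7")` would return `Some("llvm19-1")`, which is the string to place in the `features` list.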

# Secondlang
cd secondlang
rustup run nightly cargo run -- examples/fibonacci.sl
rustup run nightly cargo run -- --ir examples/fibonacci.sl # view LLVM IR

# Thirdlang
cd thirdlang
rustup run nightly cargo run --bin thirdlang -- examples/point.tl
rustup run nightly cargo run --bin thirdlang -- examples/counter.tl

Learning Progression

We build four languages, each extending the concepts of the previous one:

| Language   | Grammar   | New Concepts                                      | Execution                |
|------------|-----------|---------------------------------------------------|--------------------------|
| Calculator | 18 lines  | PEG basics, AST, operators                        | Interpreter, VM, JIT     |
| Firstlang  | 70 lines  | Variables, functions, control flow, recursion     | Tree-walking interpreter |
| Secondlang | 77 lines  | Types, type inference, optimization passes        | LLVM JIT compilation     |
| Thirdlang  | 140 lines | Classes, methods, constructors, memory management | LLVM JIT compilation     |

Part I: Calculator

We start with the simplest possible language: integer arithmetic with + and -. The grammar fits in 18 lines:

Program = _{ SOI ~ Expr ~ EOF }
Expr = { UnaryExpr | BinaryExpr | Term }
Term = _{Int | "(" ~ Expr ~ ")" }
...

This minimal language lets us focus on the fundamentals: what is a grammar? How does pest generate a parser? What is an AST? We explore three different backends (interpreter, bytecode VM, JIT) to show that the same AST can be executed in multiple ways.
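To make the "one AST, several backends" idea concrete, here is a minimal sketch using only the standard library. The `Expr`, `Op`, and `Instr` types and all function names are assumptions made for illustration; they are not the book's actual definitions, and the AST is built by hand instead of being produced by pest.

```rust
// Hypothetical AST for + and - over integers (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Plus, Minus }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Unary { op: Op, child: Box<Expr> },
    Binary { op: Op, lhs: Box<Expr>, rhs: Box<Expr> },
}

// Backend 1: a tree-walking interpreter evaluates the AST directly.
fn interpret(expr: &Expr) -> i64 {
    match expr {
        Expr::Int(n) => *n,
        Expr::Unary { op, child } => match op {
            Op::Plus => interpret(child),
            Op::Minus => -interpret(child),
        },
        Expr::Binary { op, lhs, rhs } => {
            let (l, r) = (interpret(lhs), interpret(rhs));
            match op { Op::Plus => l + r, Op::Minus => l - r }
        }
    }
}

// Backend 2: compile the same AST to bytecode for a tiny stack machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr { Push(i64), Add, Sub, Neg }

fn compile(expr: &Expr, code: &mut Vec<Instr>) {
    match expr {
        Expr::Int(n) => code.push(Instr::Push(*n)),
        Expr::Unary { op, child } => {
            compile(child, code);
            // Unary plus is a no-op, so only negation emits an instruction.
            if *op == Op::Minus { code.push(Instr::Neg); }
        }
        Expr::Binary { op, lhs, rhs } => {
            compile(lhs, code);
            compile(rhs, code);
            code.push(match op { Op::Plus => Instr::Add, Op::Minus => Instr::Sub });
        }
    }
}

// Runs the bytecode; returns None if the stack underflows or ends empty.
fn run(code: &[Instr]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for instr in code {
        match instr {
            Instr::Push(n) => stack.push(*n),
            Instr::Neg => { let v = stack.pop()?; stack.push(-v); }
            Instr::Add => { let r = stack.pop()?; let l = stack.pop()?; stack.push(l + r); }
            Instr::Sub => { let r = stack.pop()?; let l = stack.pop()?; stack.push(l - r); }
        }
    }
    stack.pop()
}

// Hand-built AST for the expression 1 + -(5 - 2).
fn sample() -> Expr {
    let inner = Expr::Binary {
        op: Op::Minus,
        lhs: Box::new(Expr::Int(5)),
        rhs: Box::new(Expr::Int(2)),
    };
    Expr::Binary {
        op: Op::Plus,
        lhs: Box::new(Expr::Int(1)),
        rhs: Box::new(Expr::Unary { op: Op::Minus, child: Box::new(inner) }),
    }
}
```

Both `interpret(&sample())` and compiling with `compile` followed by `run` should agree on the result, which is the property that lets a language swap backends without touching the parser.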

Part II: Firstlang

With the basics understood, we add features that make a real programming language. The grammar grows to 70 lines:

// Statements instead of just expressions
Stmt = { Function | Return | Assignment | Expr }

// Functions with parameters
Function = { "def" ~ Identifier ~ "(" ~ Params? ~ ")" ~ Block }

// Control flow
Conditional = { "if" ~ "(" ~ Expr ~ ")" ~ Block ~ "else" ~ Block }
WhileLoop = { "while" ~ "(" ~ Expr ~ ")" ~ Block }

We focus on a single backend (tree-walking interpreter) to deeply understand scoping, call stacks, and recursion. The culminating example is computing Fibonacci recursively.
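As a rough illustration of how scoping and recursion fall out of a tree-walking interpreter, the sketch below evaluates a recursive Fibonacci function. It is an assumption-laden simplification, not Firstlang's implementation: the `Expr` variants, `Function`, `Env`, `eval`, and `fib_program` are hypothetical names, everything is an expression, and booleans are encoded as 0 and 1.

```rust
use std::collections::HashMap;

// Hypothetical expression-only AST (illustrative, not the book's).
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
}

struct Function { params: Vec<String>, body: Expr }

// One scope: the local variables of the function currently executing.
type Env = HashMap<String, i64>;

fn eval(expr: &Expr, env: &Env, funcs: &HashMap<String, Function>) -> Result<i64, String> {
    match expr {
        Expr::Int(n) => Ok(*n),
        Expr::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("undefined variable '{name}'")),
        Expr::Add(l, r) => Ok(eval(l, env, funcs)? + eval(r, env, funcs)?),
        Expr::Sub(l, r) => Ok(eval(l, env, funcs)? - eval(r, env, funcs)?),
        Expr::Less(l, r) => Ok((eval(l, env, funcs)? < eval(r, env, funcs)?) as i64),
        Expr::If(cond, then, otherwise) => {
            if eval(cond, env, funcs)? != 0 {
                eval(then, env, funcs)
            } else {
                eval(otherwise, env, funcs)
            }
        }
        Expr::Call(name, args) => {
            let f = funcs
                .get(name)
                .ok_or_else(|| format!("undefined function '{name}'"))?;
            if f.params.len() != args.len() {
                return Err(format!("'{name}' expects {} argument(s)", f.params.len()));
            }
            // Arguments are evaluated in the caller's scope, then bound in a
            // fresh frame, so the callee cannot see the caller's locals.
            let mut frame = Env::new();
            for (param, arg) in f.params.iter().zip(args) {
                frame.insert(param.clone(), eval(arg, env, funcs)?);
            }
            // The host (Rust) call stack doubles as the language's call stack.
            eval(&f.body, &frame, funcs)
        }
    }
}

// Encodes, roughly: fib(n) = if n < 2 then n else fib(n - 1) + fib(n - 2)
fn fib_program() -> HashMap<String, Function> {
    let n = || Box::new(Expr::Var("n".to_string()));
    let call = |k: i64| {
        Expr::Call("fib".to_string(), vec![Expr::Sub(n(), Box::new(Expr::Int(k)))])
    };
    let body = Expr::If(
        Box::new(Expr::Less(n(), Box::new(Expr::Int(2)))),
        n(),
        Box::new(Expr::Add(Box::new(call(1)), Box::new(call(2)))),
    );
    let mut funcs = HashMap::new();
    funcs.insert("fib".to_string(), Function { params: vec!["n".to_string()], body });
    funcs
}
```

Evaluating `Expr::Call("fib".to_string(), vec![Expr::Int(10)])` against `fib_program()` with an empty `Env` walks the tree once per call. Creating a new frame per call is the design choice that keeps each recursive invocation's `n` separate.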

Part III: Secondlang

We add static types and compile to native code. The grammar changes are minimal (just 7 more lines), but the compiler grows significantly:

// Type annotations
Type = { IntType | BoolType }
TypedParam = { Identifier ~ ":" ~ Type }
ReturnType = { "->" ~ Type }

// Functions now have types
Function = { "def" ~ Identifier ~ "(" ~ TypedParams? ~ ")" ~ ReturnType? ~ Block }

Types are primarily a semantic addition, not a syntactic one. The grammar changes are small, but we need new compiler phases (type checking, type inference) and can now generate efficient native code via LLVM.
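To show why types are mostly a semantic concern, here is a minimal sketch of a type-checking pass over a tiny expression AST with the two types the grammar names. The `Type`, `Expr`, `type_of`, and `expect` names are hypothetical, and the sketch only checks types; it does not attempt inference or code generation.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Int, Bool }

// Hypothetical expression AST (illustrative, not Secondlang's).
enum Expr {
    Int(i64),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

// Computes the type of an expression bottom-up, or reports the first mismatch.
fn type_of(expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Add(l, r) => {
            expect(l, Type::Int)?;
            expect(r, Type::Int)?;
            Ok(Type::Int)
        }
        Expr::Less(l, r) => {
            expect(l, Type::Int)?;
            expect(r, Type::Int)?;
            Ok(Type::Bool)
        }
        Expr::If(cond, then, otherwise) => {
            expect(cond, Type::Bool)?;
            // Both branches must agree, and that shared type is the result.
            let then_ty = type_of(then)?;
            expect(otherwise, then_ty)?;
            Ok(then_ty)
        }
    }
}

// Checks that `expr` has type `want`.
fn expect(expr: &Expr, want: Type) -> Result<(), String> {
    let got = type_of(expr)?;
    if got == want {
        Ok(())
    } else {
        Err(format!("expected {want:?}, found {got:?}"))
    }
}
```

Note that nothing here depends on the parser: the pass runs after parsing and before code generation, rejecting programs such as `1 + true` that are syntactically fine.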

Part IV: Thirdlang

Finally, we add object-oriented programming with classes, methods, and memory management. The grammar grows to 140 lines:

// Class definitions
ClassDef = { "class" ~ Identifier ~ "{" ~ ClassBody ~...
