GitHub - whitecell-dev/cljys: A pragmatic Clojure-to-YAMLScript transpiler toolchain that uses an LLM canonical front-end pass to safely compile expressive Lisp queries into deterministic, zero-overhead native binaries for semantic IR analysis · GitHub
clj-to-yamlscript-transpiler
Why this matters
Leverage LLMs for what they're actually good at: this toolchain uses an LLM to drastically reduce the amount of code you need to write when adapting a deterministic system to accept highly expressive syntax or natural language. Instead of hand-writing parsers and normalization passes, you use the LLM as an intent-preserving front-end and let deterministic tooling handle the rest.
A hybrid compilation toolchain that transpiles a subset of Clojure data queries into YAMLScript (YS), which then compiles to ultra-fast, zero-dependency native binaries using GraalVM/SCI.
This project solves the "expressivity gap" between Clojure's vocabulary and rigid syntax transpilers by using a dual-pass architecture: an LLM as a structural, intent-preserving front-end normalizer, followed by a 100% deterministic local Python emitter pass.
Architecture & Pipeline
Handling every edge case of an expressive Lisp in a simple local parser quickly makes that parser complex. This project splits the work by matching tools to their strengths:
The Semantic Front-End (LLM): Takes idiomatic Clojure and normalizes it into a strict, flat canonical intermediate representation. The canonical form has no destructuring and wraps print arguments in an explicit (str ...) form.
The Emitter Middle-End (Python): A rigid, deterministic tokenizer and parser that maps the canonical Clojure AST directly onto YAMLScript structural forms.
The Native Back-End (YAMLScript): Compiles the clean .ys script into a native, standalone binary using ys -c.

    [Expressive Clojure Query]
        v  (LLM Normalization Pass: flattens destructuring & prints)
    [Canonical Clojure Subset]
        v  (clj_to_ys.py: Local Deterministic AST Map)
    [Valid YAMLScript Code]
        v  (ys -c compilation)
    [Native Executable Binary]
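To make the middle-end stage concrete, here is a minimal sketch of a tokenizer and S-expression reader of the kind described above. It is not the code in clj_to_ys.py; the names `tokenize`, `read_form`, and `parse`, and the choice to represent lists as Python lists and vectors as Python tuples, are assumptions made for illustration. It only handles parentheses, square brackets, string literals, `;` comments, and plain symbols or numbers.

```python
import re

# Assumed token grammar: comments, brackets, string literals, then bare atoms.
# Commas are skipped because Clojure treats them as whitespace.
TOKEN_RE = re.compile(r''';[^\n]*|[()\[\]]|"(?:\\.|[^"\\])*"|[^\s,()\[\]";]+''')

CLOSERS = {"(": ")", "[": "]"}


def tokenize(source):
    """Split Clojure source into tokens, dropping ; comments."""
    return [t for t in TOKEN_RE.findall(source) if not t.startswith(";")]


def read_form(tokens, pos):
    """Read one form starting at tokens[pos]; return (form, next_pos)."""
    tok = tokens[pos]
    if tok in CLOSERS:
        closer = CLOSERS[tok]
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != closer:
            form, pos = read_form(tokens, pos)
            items.append(form)
        if pos >= len(tokens):
            raise SyntaxError("missing closing " + closer)
        # (...) becomes a list, [...] becomes a tuple.
        return (items if tok == "(" else tuple(items)), pos + 1
    if tok in (")", "]"):
        raise SyntaxError("unexpected " + tok)
    return tok, pos + 1  # atoms stay as raw strings


def parse(source):
    """Parse every top-level form in the source string."""
    tokens = tokenize(source)
    forms, pos = [], 0
    while pos < len(tokens):
        form, pos = read_form(tokens, pos)
        forms.append(form)
    return forms


forms = parse('(println (str "Total nodes found: " (count active-relations)))')
```

Keeping atoms as raw strings (quotes included) means a later emitter can tell a string literal from a symbol by its first character, without a separate type tag.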
Quick Start
1. Normalize & Transpile
Feed your expressive Clojure query into the normalization pipeline, then pass it to the Python emitter:

    python3 clj_to_ys.py input.clj -o output.ys
2. Compile to Native Binary
Turn the resulting YAMLScript code into a standalone native binary that executes in milliseconds with zero JVM boot overhead:

    ys -c output.ys
The Canonical Clojure Specification
To ensure deterministic compilation without syntax errors or broken scalar mappings in YAMLScript, the incoming Clojure must conform to this strict subset. The LLM front-end prompt handles this normalization automatically.
1. No Parameter Destructuring
Complex sequence destructuring like (fn [[k v]] ...) or (doseq [[k v] ...]) is forbidden. Bind the collection element to a single variable and extract items explicitly using first, second, or nth inside a let block:

    ;; Good / Canonical
    (doseq [kv (take 5 active-relations)]
      (let [k (first kv)
            v (second kv)]
        (println (str "Module ID: " k))))
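A rule like this can also be checked mechanically before the emitter runs. The sketch below is a hypothetical validator, not part of the repository: it assumes forms have already been parsed into nested Python structures, with Clojure lists as Python lists, vectors as tuples, and symbols as strings. It flags any fn parameter vector or doseq/let binding vector whose binding target is not a plain symbol. Named fn forms such as (fn name [args] ...) and map destructuring are out of scope here.

```python
BINDING_FORMS = ("doseq", "let")


def find_destructuring(form, found=None):
    """Return every binding vector in `form` that destructures its target."""
    if found is None:
        found = []
    if isinstance(form, list) and len(form) > 1 and isinstance(form[1], tuple):
        head, bindings = form[0], form[1]
        if head == "fn":
            # Every fn parameter must be a plain symbol.
            targets = bindings
        elif head in BINDING_FORMS:
            # Binding vectors alternate target, value: targets sit at even positions.
            targets = bindings[0::2]
        else:
            targets = ()
        if any(not isinstance(t, str) for t in targets):
            found.append(bindings)
    if isinstance(form, (list, tuple)):
        for child in form:
            find_destructuring(child, found)
    return found


# (doseq [[k v] (take 5 active-relations)] (println k))
non_canonical = ["doseq", (("k", "v"), ["take", "5", "active-relations"]),
                 ["println", "k"]]

# The canonical example from this section.
canonical = ["doseq", ("kv", ["take", "5", "active-relations"]),
             ["let", ("k", ["first", "kv"], "v", ["second", "kv"]),
              ["println", ["str", '"Module ID: "', "k"]]]]

violations = find_destructuring(non_canonical)
```

A check like this gives a deterministic safety net behind the LLM pass: if the normalized output still contains a destructuring target, the pipeline can reject it instead of emitting broken YAMLScript.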
2. Single-Argument Explicit Print Concatenation
println and print calls must take exactly one argument; the variadic, multi-argument form is not allowed. Wrap any multi-part output in a single explicit (str ...) form:

    ;; Good / Canonical
    (println (str "Total nodes found: " (count active-relations)))
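The README assigns this rewrite to the LLM front-end; the sketch below only illustrates the target shape with a deterministic transform. It is hypothetical and assumes forms are already parsed into nested Python lists (Clojure lists), tuples (vectors), and strings (atoms, with string literals keeping their quotes). Because multi-argument println separates its arguments with spaces while str concatenates them directly, the sketch interposes a `" "` literal between arguments. Edge cases still differ: for example, println prints nil as "nil", while str turns nil into an empty string.

```python
PRINT_FNS = {"println", "print"}


def wrap_print_args(form):
    """Rewrite multi-argument print calls into the single-argument (str ...) shape."""
    if isinstance(form, tuple):
        return tuple(wrap_print_args(child) for child in form)
    if not isinstance(form, list):
        return form  # atom: leave untouched
    items = [wrap_print_args(child) for child in form]
    is_print = bool(items) and isinstance(items[0], str) and items[0] in PRINT_FNS
    if is_print and len(items) > 2:
        args = items[1:]
        joined = [args[0]]
        for arg in args[1:]:
            # Preserve the space that variadic println would have printed.
            joined += ['" "', arg]
        return [items[0], ["str", *joined]]
    return items


# (println "Total nodes found:" (count active-relations))
variadic = ["println", '"Total nodes found:"', ["count", "active-relations"]]
normalized = wrap_print_args(variadic)
```

Calls that already have a single argument pass through unchanged, so the transform is safe to apply to output that is already canonical.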
3. Explicit Vector Index Lookups
Sequence indexing must use the explicit (nth vector index) form rather than calling the vector itself as a function, as in (modules 49):

    ;; Good / Canonical
    (def module-49 (nth modules 49))
Components
clj_to_ys.py: The core tokenizer, S-expression parser, and code emitter written in Python. It maps native...