Clojure → YAMLScript transpiler: using LLMs for normalization, SCI for execution

MaykonMan1 pts0 comments

GitHub - whitecell-dev/cljys: A pragmatic Clojure-to-YAMLScript transpiler toolchain that uses an LLM canonical front-end pass to safely compile expressive Lisp queries into deterministic, zero-overhead native binaries for semantic IR analysis · GitHub


Repository files: .gitignore, LICENSE, README.md, calyx_mcp_v6.json, clj-to-ys.py, m.ys, test.clj

clj-to-yamlscript-transpiler

Why this matters

LLMs are used here for the part they handle well. Instead of hand-writing parsers and normalization passes to cover every expressive syntax or natural-language variation, you use the LLM as an intent-preserving front-end and let deterministic tooling handle the rest. This sharply reduces the amount of code needed to put a deterministic system behind an expressive input language.

A hybrid compilation toolchain that transpiles a subset of Clojure data queries into YAMLScript (YS), which is then compiled into fast, zero-dependency native binaries via GraalVM/SCI.

This project solves the "expressivity gap" between Clojure's vocabulary and rigid syntax transpilers by using a dual-pass architecture: an LLM as a structural, intent-preserving front-end normalizer, followed by a 100% deterministic local Python emitter pass.

Architecture & Pipeline

Handling every edge case of an expressive Lisp in a simple local parser quickly makes that parser complex. This project splits the work by matching tools to their strengths:

The Semantic Front-End (LLM): Takes idiomatic Clojure and normalizes it into a strict, flat canonical intermediate representation. No destructuring, explicit (str) wrapping.

The Emitter Middle-End (Python): A rigid, deterministic tokenizer and parser that maps the canonical Clojure AST directly onto YAMLScript structural forms.

The Native Back-End (YAMLScript): Compiles the clean .ys script into a native, standalone binary using ys -c.

```
[Expressive Clojure Query]
        v  (LLM Normalization Pass: flattens destructuring & prints)
[Canonical Clojure Subset]
        v  (clj_to_ys.py: Local Deterministic AST Map)
[Valid YAMLScript Code]
        v  (ys -c compilation)
[Native Executable Binary]
```
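To make the middle-end's "rigid, deterministic tokenizer and parser" concrete, here is a minimal sketch of what such a pass could look like. It is not the code in clj_to_ys.py: the names `tokenize`, `read_form`, and `parse`, the token regex, and the choice to model `(...)` forms as Python lists and `[...]` vectors as tuples are all assumptions made for illustration. Maps, sets, keywords with special handling, and reader macros are deliberately left out.

```python
import re

# One token per match: a ;-comment, a bracket, a string literal, or a bare atom.
TOKEN_RE = re.compile(
    r'''\s*(?:(;[^\n]*)|([()\[\]])|("(?:\\.|[^"\\])*")|([^\s()\[\]";]+))'''
)
CLOSERS = {"(": ")", "[": "]"}


def tokenize(src):
    """Split canonical Clojure source into a flat list of token strings."""
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            if src[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise SyntaxError(f"cannot tokenize at offset {pos}")
        pos = match.end()
        comment, bracket, string, atom = match.groups()
        if comment is None:  # comments are dropped
            tokens.append(bracket or string or atom)
    return tokens


def read_form(tokens, i):
    """Read one form starting at tokens[i]; return (form, next_index)."""
    tok = tokens[i]
    if tok in (")", "]"):
        raise SyntaxError(f"unexpected {tok!r}")
    if tok not in CLOSERS:
        return tok, i + 1  # atom: symbol, number, or quoted string
    items, i = [], i + 1
    while i < len(tokens) and tokens[i] != CLOSERS[tok]:
        form, i = read_form(tokens, i)
        items.append(form)
    if i == len(tokens):
        raise SyntaxError(f"missing {CLOSERS[tok]!r}")
    # Lists model (...) forms, tuples model [...] vectors.
    return (items if tok == "(" else tuple(items)), i + 1


def parse(src):
    """Parse every top-level form in src."""
    tokens, forms, i = tokenize(src), [], 0
    while i < len(tokens):
        form, i = read_form(tokens, i)
        forms.append(form)
    return forms


if __name__ == "__main__":
    print(parse('(println (str "Total nodes found: " (count active-relations)))'))
```

Because the front-end has already flattened the input into the canonical subset, a parser of this size can stay strict: anything it cannot tokenize or balance is rejected with a `SyntaxError` instead of being guessed at.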

Quick Start

1. Normalize & Transpile

Feed your expressive Clojure query into the normalization pipeline, then pass it to the Python emitter:

```sh
python3 clj_to_ys.py input.clj -o output.ys
```

2. Compile to Native Binary

Turn the resulting YAMLScript code into a standalone native binary that executes in milliseconds with zero JVM boot overhead:

```sh
ys -c output.ys
```

The Canonical Clojure Specification

To ensure deterministic compilation without syntax errors or broken scalar mappings in YAMLScript, the incoming Clojure must conform to this strict subset. This is handled automatically by the LLM front-end prompt.

1. No Parameter Destructuring

Complex sequence destructuring like (fn [[k v]] ...) or (doseq [[k v] ...]) is forbidden. Bind the collection element to a single variable and extract items explicitly using first, second, or nth inside a let block:

```clojure
;; Good / Canonical
(doseq [kv (take 5 active-relations)]
  (let [k (first kv)
        v (second kv)]
    (println (str "Module ID: " k))))
```
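Since the normalization is done by an LLM, it can be useful to verify this rule deterministically before emitting anything. The sketch below is one hypothetical way to do that and is not taken from the project. It assumes the code has already been parsed into nested Python values where lists are `(...)` forms, tuples are `[...]` vectors, and strings are symbols or literals; the function names and the sets of binding forms are illustrative. Map destructuring, docstrings, and multi-arity functions are not covered.

```python
# Assumed AST shape (not from the project): Python lists are (...) forms,
# tuples are [...] vectors, and plain strings are symbols or literals.
PAIR_BINDING_FORMS = {"let", "doseq", "for", "loop"}  # [name value ...]
PARAM_FORMS = {"fn", "defn"}                          # [param ...]


def binding_targets(form):
    """Return the names bound by a form, or () if it binds nothing."""
    head = form[0]
    vectors = [part for part in form[1:] if isinstance(part, tuple)]
    if not vectors:
        return ()
    first_vector = vectors[0]
    if head in PAIR_BINDING_FORMS:
        return first_vector[0::2]  # every other item is a binding target
    if head in PARAM_FORMS:
        return first_vector        # every item is a parameter
    return ()


def find_destructuring(form):
    """Collect a message for every binding target that is not a plain symbol."""
    problems = []
    if isinstance(form, list) and form and isinstance(form[0], str):
        for target in binding_targets(form):
            if not isinstance(target, str):
                problems.append(f"{form[0]}: destructuring target {target!r}")
    if isinstance(form, (list, tuple)):
        for child in form:
            problems.extend(find_destructuring(child))
    return problems


if __name__ == "__main__":
    # (doseq [[k v] (take 5 active-relations)] (println k))
    rejected = ["doseq", (("k", "v"), ["take", "5", "active-relations"]),
                ["println", "k"]]
    for message in find_destructuring(rejected):
        print(message)
```

A non-empty result means the front-end output is not yet canonical and should be sent back for another normalization pass rather than handed to the emitter.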

2. Single-Argument Explicit Print Concatenation

println and print calls must take exactly one argument; multiple variadic arguments are not allowed. Build multi-part output inside a single explicit (str ...) form:

```clojure
;; Good / Canonical
(println (str "Total nodes found: " (count active-relations)))
```
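This rule is also easy to check mechanically. The sketch below is a hypothetical validator, not part of the project, and it assumes the same illustrative AST shape as before: Python lists for `(...)` forms, tuples for `[...]` vectors, strings for atoms. It only reports violations instead of rewriting them, because Clojure's println joins multiple arguments with spaces while str concatenates them directly, so an automatic wrap in (str ...) would change the printed output.

```python
# Assumed AST shape (not from the project): lists are (...) forms,
# tuples are [...] vectors, strings are symbols or literals.
PRINT_FORMS = {"println", "print"}


def check_print_arity(form):
    """Report every print/println call that does not take exactly one argument."""
    problems = []
    if isinstance(form, list) and form and isinstance(form[0], str):
        if form[0] in PRINT_FORMS:
            argc = len(form) - 1
            if argc != 1:
                problems.append(
                    f"({form[0]} ...) takes {argc} arguments, expected exactly 1"
                )
    if isinstance(form, (list, tuple)):
        for child in form:  # descend into nested forms and vectors
            problems.extend(check_print_arity(child))
    return problems


if __name__ == "__main__":
    # (println "Total nodes found:" (count active-relations))
    rejected = ["println", '"Total nodes found:"', ["count", "active-relations"]]
    # (println (str "Total nodes found: " (count active-relations)))
    canonical = ["println",
                 ["str", '"Total nodes found: "', ["count", "active-relations"]]]
    print(check_print_arity(rejected))
    print(check_print_arity(canonical))
```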

3. Explicit Vector Index Lookups

Sequence indexing must use the explicit (nth vector index) form rather than calling the vector itself as a function, as in (modules 49):

```clojure
;; Good / Canonical
(def module-49 (nth modules 49))
```

Components

clj_to_ys.py: The core tokenizer, S-expression parser, and code emitter written in Python. It maps native...
