clear/docs/retrospective/can-ai-unslop-itself.md at master · cuzzo/clear · GitHub
Can AI Un-Slop Itself?
Everyone knows that LLMs can, at least sometimes, create slop.
The interesting question isn’t whether they can create slop. It’s: can they un-slop themselves?
Problem
I dreamed of a programming language for 10 years. After Gemini 3.1-pro, I figured LLMs were good enough that I should finally see what this AI "vibe-coding" craze was all about.
I set out on a 6-month journey to build a programming language.
Within 2 months, I had a custom runtime built in Zig "competitive" with Go & Tokio.
Within 3 months, I had an affine-ownership-based "memory safe" language like Rust, but (in my opinion) much more intuitive.
The problem is: this only barely worked, and no one wants a barely working programming language!
What didn’t work
WRT the language:
Architecturally, the LLMs built it without a MIR pass (a mid-level intermediate representation between the AST and code generation).
Without boring you too much, it’s virtually impossible to guarantee affine ownership / memory safety without a MIR-pass.
So I had a memory safe language that WASN’T memory safe...
Nobody wants a Rust that regularly leaks memory and still has UAF and double free bugs...
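To make the ownership point concrete, here is a minimal sketch of the kind of check a lowered, flat representation makes easy. Everything in it is an assumption for illustration: the instruction names (`:let`, `:move`, `:use`, `:drop`), the `check_ownership` helper, and the idea that drops are explicit in the IR are hypothetical and not taken from the actual compiler. It also only handles straight-line code; branches and loops would need a control-flow graph and a dataflow pass, which is exactly the part that is hard to do without a MIR.

```ruby
# Hypothetical mini-IR, one instruction per array:
#   [:let,  :a]      introduce an owned value named a
#   [:move, :a, :b]  transfer ownership from a to b
#   [:use,  :a]      read a
#   [:drop, :a]      free a
# These names are illustrative only, not the real compiler's IR.
def check_ownership(instructions)
  state  = {} # variable name => :owned, :moved, or :dropped
  errors = []

  instructions.each_with_index do |(op, src, dst), index|
    case op
    when :let
      state[src] = :owned
    when :use, :move, :drop
      case state[src]
      when :moved   then errors << "#{index}: #{op} of #{src} after move"
      when :dropped then errors << "#{index}: #{op} of #{src} after drop"
      when nil      then errors << "#{index}: #{op} of undefined #{src}"
      else
        if op == :move
          state[src] = :moved
          state[dst] = :owned
        elsif op == :drop
          state[src] = :dropped
        end
      end
    else
      raise ArgumentError, "unknown op #{op.inspect}"
    end
  end

  # Anything still owned at the end was never freed. This assumes the
  # compiler is supposed to have inserted explicit drops by this stage.
  leaks = state.select { |_name, status| status == :owned }.keys
  [errors, leaks]
end

program = [
  [:let, :a],
  [:move, :a, :b],
  [:use, :a],   # use after move
  [:drop, :b],
  [:drop, :b],  # double free
  [:let, :c]    # never dropped
]

errors, leaks = check_ownership(program)
puts errors
puts "leaked: #{leaks.inspect}"
```

The sketch flags the use-after-move, the double free, and the leak in one linear walk, because every ownership event is a separate, ordered instruction. Doing the same thing directly on a nested syntax tree means re-deriving that ordering in every place that touches ownership.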
WRT the runtime:
More of the same
There was no TSan testing in place
There was no memory ordering testing in place
The LLMs built a fiber runtime - complete with hammer tests... and never bothered to test whether it was actually thread-safe...
Spoiler: it wasn't!
What happened next?
I told the LLMs to fix their slop!
In about 35 minutes, they screamed:
“Done, you have a complete MIR pass in place! Everything is memory safe.”
They proceeded to feed me this same line of crap for ~1000 commits over 2 months, with me constantly asking:
“If the MIR pass is done, then how come X is still there and Y is still segfaulting?”
LLMs:
“Oh, yes, there’s just that one small part left. Okay, now it’s done!”
What happened next?
The definition of insanity is doing the same thing over and over and expecting different results!
I gave up on LLMs' ability to tell me the truth about the state of a non-trivial codebase.
I got a wild idea...
If I can’t trust LLMs...
Can I trust them to build me tools & systems that give me more trust in them???
I had LLMs build me a set of tools that don’t exist in Ruby (it's turtles all the way down).
I built the compiler in Ruby because this initially started off as a project I was building by hand.
The initial scope was merely to play around with syntax and see if I could develop something I liked.
Never did I dream that I would actually get this far.
Everyone knows that a good compiler eventually self-hosts, and the goal was that, like Crystal, my language would be quite close to Ruby (so LLMs should be able to migrate it easily).
Anyway, Ruby has decent tooling around “code health” like SimpleCov / Flay / Flog / Reek / Debride, etc.
The problem is, if you’re building a compiler with LLMs, these aren’t the core metrics you need.
LLMs repeat themselves constantly... but then forget to do something the right way in one place... and then also don’t test it.
Especially in a dynamically-typed language, LLMs will randomly decide to use strings or symbols or nil to represent something, and then make thousands of defensive checks throughout your codebase... rather than fixing the problem at the source.
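As a rough illustration of "fixing the problem at the source", the sketch below normalizes a value once, at the point where it enters the system, so nothing downstream has to guess whether it is a string, a symbol, or nil. The `Visibility` module, the `Node` struct, and the choice of `:private` as the default are all hypothetical names and assumptions made up for this example; they are not from the actual codebase.

```ruby
# Hypothetical attribute that might arrive as "public", :public, or nil.
module Visibility
  ALL     = %i[public private].freeze
  DEFAULT = :private # assumed default, for illustration only

  # Single entry point: past this call, the value is always a symbol in ALL.
  def self.parse(raw)
    return DEFAULT if raw.nil?

    sym = raw.to_s.strip.downcase.to_sym
    return sym if ALL.include?(sym)

    raise ArgumentError, "unknown visibility: #{raw.inspect}"
  end
end

Node = Struct.new(:name, :visibility) do
  # Construct nodes only through build, so normalization cannot be skipped.
  def self.build(name, raw_visibility)
    new(name, Visibility.parse(raw_visibility)).freeze
  end

  def public?
    visibility == :public # no string or nil checks needed here
  end
end

puts Node.build("foo", "public").public?
puts Node.build("bar", :public).public?
puts Node.build("baz", nil).public?
```

With one strict parser at the boundary, an unexpected representation fails loudly in a single place instead of being silently tolerated by defensive checks scattered across the codebase.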
LLMs will regularly “refactor” your code and remove 99% of usage, but still leave 1% and then move on - leaving you with competing systems... and then later use the old one that doesn’t do everything you need, and get that usage back up to 10% (plus...