Can AI Un-Slop Itself? - NewsHub

clear/docs/retrospective/can-ai-unslop-itself.md at master · cuzzo/clear · GitHub

Can AI Un-Slop Itself?

Everyone knows that LLMs can, at least sometimes, create slop.

The interesting question isn’t whether they can create slop. It’s: can they un-slop themselves?

Problem

I had dreamed of a programming language for 10 years. After Gemini 3.1-pro, I figured LLMs were good enough that I should finally see what this AI "vibe-coding" craze was all about.

I set out on a 6-month journey to build a programming language.

Within 2 months, I had a custom runtime built in Zig "competitive" with Go & Tokio.

Within 3 months, I had an affine-ownership-based "memory safe" language like Rust - but (in my opinion) much more intuitive.

The problem is: this only barely worked, and no one wants a barely working programming language!

What didn’t work

WRT the language:

Architecturally, the LLMs built it without a MIR (mid-level intermediate representation) pass.

Without boring you too much, it’s virtually impossible to guarantee affine ownership / memory safety without a MIR pass.

So I had a memory safe language that WASN’T memory safe...

Nobody wants a Rust that regularly leaks memory and still has use-after-free (UAF) and double-free bugs...
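To make the MIR point a bit more concrete, here is a toy sketch in Ruby (the language the compiler is written in). It is not the actual pass from this project: the instruction format and the `check_moves` name are made up for illustration, and it only handles straight-line code with no branches or loops. The idea it shows is that once every move, use, and drop is an explicit instruction in a flat list, "was this value already given away?" becomes a simple walk.

```ruby
# Hypothetical flat IR, one instruction per entry:
#   [:let,  name]       introduce a binding that owns a value
#   [:move, dst, src]   transfer ownership from src to dst
#   [:use,  name]       read a binding
#   [:drop, name]       free a binding
# Assumption: straight-line code only. A real pass must also follow control flow.
def check_moves(instructions)
  gone = {}    # name => :moved or :dropped
  errors = []

  instructions.each_with_index do |(op, a, b), i|
    case op
    when :let
      gone.delete(a)               # (re)binding makes the name usable again
    when :use
      errors << "#{i}: use of #{a} after it was #{gone[a]}" if gone.key?(a)
    when :move                     # a is the destination, b is the source
      if gone.key?(b)
        errors << "#{i}: move of #{b} after it was #{gone[b]}"
      else
        gone[b] = :moved
        gone.delete(a)
      end
    when :drop
      if gone.key?(a)
        errors << "#{i}: drop of #{a} after it was #{gone[a]}"
      else
        gone[a] = :dropped
      end
    end
  end

  errors
end

program = [
  [:let,  :buf],
  [:move, :owner, :buf],   # ownership goes to owner
  [:use,  :buf],           # use after move
  [:drop, :owner],
  [:drop, :owner],         # double free
]

puts check_moves(program)
```

On this input the sketch reports the use of `buf` at index 2 and the second drop of `owner` at index 4. Doing the same check directly on a nested syntax tree means re-deriving evaluation order and scopes at every node, which is where checks tend to get missed.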

WRT the runtime:

More of the same

There was no ThreadSanitizer (TSan) testing in place

There was no memory ordering testing in place

The LLMs built a fiber runtime - complete with hammer tests... and never bothered to test whether it was actually thread safe...

Spoiler: it wasn't!

What happened next?

I told the LLMs to fix their slop!

In about 35 minutes, they screamed:

“Done, you have a complete MIR pass in place! Everything is memory safe.”

They proceeded to feed me this same line of crap for ~1000 commits over 2 months, while I kept asking:

“If the MIR pass is done, then how come X is still there and Y is still segfaulting?”

LLMs:

“Oh, yes, there’s just that one small part left. Okay, now it’s done!”

What happened next?

The definition of insanity is trying the same thing and expecting different results!

I gave up on LLMs' ability to tell me the truth about the state of a non-trivial codebase.

I got a wild idea...

If I can’t trust LLMs...

Can I trust them to build me tools & systems that give me more trust in them???

I had LLMs build me a set of tools that don’t exist in Ruby (it's turtles all the way down).

I built the compiler in Ruby because this initially started off as a project I was building by hand.

The initial scope was merely to play around with syntax and see if I could develop something I liked.

Never did I dream that I would actually get this far.

Everyone knows that a good compiler eventually self-hosts, and the goal was that, like Crystal, my language would be quite close to Ruby (so LLMs should be able to migrate the compiler easily).

Anyway, Ruby has decent tooling around “code health” like SimpleCov / Flay / Flog / Reek / Debride, etc.

The problem is, if you’re building a compiler with LLMs, these aren’t the core metrics you need.

LLMs repeat themselves constantly... but then forget to do something the right way in one place... and then also don’t test it.

Especially in a dynamically-typed language, LLMs will randomly decide to use strings or symbols or nil to represent something, and then make thousands of defensive checks throughout your codebase... rather than fixing the problem at the source.
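Here is a minimal sketch of what "fixing the problem at the source" can look like in Ruby. The `TypeName` class, the `integer?` helper, and the `:int` / `:str` / `:bool` names are hypothetical and not taken from this project. The idea is to accept a string or symbol exactly once, at the boundary where the value enters the system, and to reject nil loudly there, so downstream code can rely on a single representation.

```ruby
# Hypothetical value object: the one place a raw type name is normalized.
class TypeName
  KNOWN = %i[int str bool].freeze

  attr_reader :sym

  def initialize(raw)
    # Accept a String or Symbol here, once; reject nil and anything else loudly.
    unless raw.is_a?(String) || raw.is_a?(Symbol)
      raise ArgumentError, "type name must be a String or Symbol, got #{raw.inspect}"
    end
    @sym = raw.to_sym
    raise ArgumentError, "unknown type #{@sym.inspect}" unless KNOWN.include?(@sym)
    freeze
  end

  def ==(other)
    other.is_a?(TypeName) && sym == other.sym
  end
  alias eql? ==

  def hash
    sym.hash
  end

  def to_s
    sym.to_s
  end
end

# Downstream code compares TypeName values and needs no
# `x.to_s == "int" || x == :int || x.nil?` style checks.
def integer?(type_name)
  type_name == TypeName.new(:int)
end
```

With something like this in place, a stray nil or a misspelled name fails at construction time with a clear message, instead of being papered over by defensive checks scattered across the codebase.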

LLMs will regularly “refactor” your code and remove 99% of usage, but still leave 1% and then move on - leaving you with competing systems... and then later use the old one that doesn’t do everything you need, and get that usage back up to 10% (plus...
