Deterministic vs. Probabilistic Code Generation

The Tech Enabler

Noah Hall May 17, 2026

Bun recently vibe-coded a million-line change to their codebase, turning Zig into Rust. While they might see this as a magical win, I see it as the collapse of software engineering.

Deterministic code generation
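
What determinism means for a code generator can be checked mechanically: run it on the same input many times and compare the outputs. The following is a minimal sketch, assuming Python 3.9+ and its standard ast module as a stand-in for an AST-based transpiler. The xrange-to-range rewrite and the names XrangeToRange and transpile are illustrative, and this is not how 2to3 or any of the tools discussed below are implemented.

```python
import ast
import hashlib


class XrangeToRange(ast.NodeTransformer):
    """Rewrite every reference to the name xrange as range."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == "xrange":
            return ast.copy_location(ast.Name(id="range", ctx=node.ctx), node)
        return node


def transpile(source: str) -> str:
    """Parse source into an AST, transform the tree, and emit new source."""
    tree = ast.parse(source)
    tree = XrangeToRange().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Python 2 style input that still parses under Python 3 syntax.
SOURCE = "total = 0\nfor i in xrange(5):\n    total += i\n"

# Run the same input through the transpiler 100 times and hash each output.
outputs = [transpile(SOURCE) for _ in range(100)]
digests = {hashlib.sha256(out.encode()).hexdigest() for out in outputs}

print(outputs[0])
print("distinct outputs:", len(digests))
```

Every run emits the same text, so the set of digests has exactly one member. That property is what lets the output of such a tool be reviewed once rather than on every run.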

A deterministic system, when given the same set of inputs, will perform the same operations. Programming languages are largely deterministic. Some languages allow for undefined behaviour, but on the whole, every time code is run, it operates in the same way. Uncertainty or confusion in behaviour leads to bugs, often security bugs.

There are automated yet deterministic ways to convert code from one language to another. The majority of the languages I create have transpiling support out of the box. Derw can produce JavaScript, TypeScript, or English. Tegan can produce JavaScript or Go. Mojie can produce JavaScript, Python, or English. json-to-elm produced JSON parsing code for multiple versions of Elm, with optional library support. These all work by parsing code into an AST and producing new code from that AST. This is a deterministic process: given the same input, you’ll get the same output.

Deterministic tooling has existed for years. Python’s 2to3 is well known: it automates conversion from Python 2 to Python 3 in a deterministic way. The same Python 2 script run through 2to3 will produce the same Python 3 script. Languages that transpile, like Elm, PureScript, and TypeScript, all target JavaScript and produce the same JavaScript each time. That makes them predictable.

Deterministic systems have a forced structure: their behaviour will be consistent. Consistency is crucial in technical systems. If a bug is consistent, we can fix it. If a bug is inconsistent, it becomes exponentially more difficult to fix. Simply reproducing an inconsistent bug will take more time. It is the role of software engineering to make systems consistent. Even small inconsistencies can lead to severe damage to a system.

Even with deterministic code generation, I still do not trust the process to be fully automated. There will always be edge cases. It still requires validation and correction.

Probabilistic code generation
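
The practical effect of sampling can be imitated without calling any model. The following is a toy sketch, assuming a hypothetical generate function that picks at random from three hand-written candidate implementations of an add function. The candidates, generate, and compile_and_run are all invented for illustration, and a real LLM has a far larger output space.

```python
import random

# Hypothetical bodies a model might emit for a function whose only
# specification is the comment "Add two numbers together".
CANDIDATES = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    result = a\n    result += b\n    return result\n",
    "def add(a, b):\n    return a - b\n",  # plausible-looking but wrong
]


def generate(prompt: str, rng: random.Random) -> str:
    """Stand-in for an LLM call: samples one candidate and ignores the prompt."""
    return rng.choice(CANDIDATES)


def compile_and_run(source: str) -> int:
    """Execute the generated source and call add(2, 3)."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["add"](2, 3)


rng = random.Random()  # unseeded: a fresh generation on every "compile"
builds = [generate("Add two numbers together", rng) for _ in range(200)]

distinct_sources = set(builds)
results = {compile_and_run(src) for src in distinct_sources}

print("distinct sources:", len(distinct_sources))
print("results of add(2, 3):", sorted(results))
```

With an unseeded generator, 200 builds will almost always include all three candidates, among them the one that subtracts. The same prompt yields different programs, and some of them are wrong.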

Generative AI takes an input and produces an output. However, that output varies. Sometimes it’s A, other times it’s B. This introduces uncertainty into the process. It is no longer consistent. Code generally should be predictable. APIs should be intuitive. It is impossible to build intuition about LLM-generated code which you did not review, because it could be different each time.

I created neuro-lingo 3 years ago: a programming language where a human only writes function signatures and comments, and the implementation code is entirely generated by LLMs.

    function add(a: number, b: number): number {
      // Add two numbers together
    }

    function main() {
      // Print "Hello World" to the console
      // Print the result of add(2, 3)
    }

An example from neuro-lingo.

Every time neuro-lingo is compiled, the code is generated afresh by the LLMs. It’s slightly different each time. Sometimes it introduces bugs. Sometimes it’s clean and simple. Sometimes it’s chaotic. Neuro-lingo was intended as a parody, but fully AI-driven flows for producing code are doing the exact same thing.

When code is shipped, humans are accountable for that code. Not always legally, but morally and ethically. While open source licenses intentionally provide no warranty, the fact remains: code which is pushed into the open source ecosystem has an impact on the industry, both in open source and in corporate enterprises. The Harvard Business Review estimated the economic worth of open source to be $8.8 trillion.

It is not possible for a human to review 1 million lines of changes in 9 days. Let’s be clear about that: Bun has not reviewed the code they have merged to master.

The “there are tests” fallacy
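
Why a green test suite is weak evidence on its own can be seen in a small sketch. It assumes a hypothetical is_leap_year function of the kind a generator might plausibly emit, plus three hand-picked tests. Both are invented for illustration and come from no real project.

```python
def is_leap_year(year: int) -> bool:
    """Plausible-looking implementation that forgets the 400-year rule."""
    return year % 4 == 0 and year % 100 != 0


# A small suite that exercises every outcome of the condition above.
TESTS = {
    2024: True,   # divisible by 4
    2023: False,  # not divisible by 4
    1900: False,  # divisible by 100 but not by 400
}

failures = [year for year, expected in TESTS.items() if is_leap_year(year) != expected]
print("failing tests:", failures)

# The case nobody wrote a test for: 2000 was a leap year.
print("is_leap_year(2000) =", is_leap_year(2000))
```

The suite passes with nothing left uncovered, yet the function is wrong for the year 2000. A reviewer reading the one-line body has a chance of spotting the missing rule, while the tests, as written, cannot.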

Tests have never been enough to single-handedly measure the quality of code. Consider SQLite, widely considered to be the most tested codebase:

As of version 3.42.0 (2023-05-16), the SQLite library consists of approximately 155.8 KSLOC of C code. (KSLOC means thousands of “Source Lines Of Code” or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 590 times as much test code and test scripts - 92053.1 KSLOC.

They list a wide range of different tests they have:

Four independently developed test harnesses

100% branch test coverage in an as-deployed configuration

Millions and millions of test cases

Out-of-memory tests

I/O error tests

Crash and power loss tests

Fuzz tests

Boundary value tests

Disabled optimization tests

Regression tests

Malformed database tests

Extensive use of assert() and run-time checks

Valgrind analysis

Undefined behavior checks

Checklists

And yet they do not automate the entire process. They create tools for humans to...
