Nondeterminism's not the problem
Nondeterminism's not the problem. But if I had a nickel for every time I heard it blamed for difficulties with LLMs...
LLMs are constantly compared to compilers:
"In the future programmers will only write specs and the LLMs will regenerate the code every time like a compiler."
"You don't review the compiler output, why review the LLM output?"
Whenever this kind of statement is brought up, the usual response from skeptics is that LLMs and compilers are fundamentally different because compilers are deterministic and LLMs aren't. Really, whenever an LLM does something bad, you can bet a naysayer will blame nondeterminism.
I disagree with the LLMs-are-compilers take just as much as the next guy, but for different reasons. I feel the need to step in and defend poor old nondeterminism. It's not to blame for the LLM's mistakes!
Determinism
A function is deterministic when its output only depends on its input. For example, List.len is deterministic because the result is completely determined by the input list. By contrast, Time.now is not deterministic; its result depends on the current state of the world, not on the inputs to the function. One of the key properties of deterministic functions is repeatability: every time you evaluate the function on the same input you'll get the same result.
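Here is a minimal Python sketch of that repeatability property. Python's built-in len and time.time stand in for List.len and Time.now, and is_repeatable and now are hypothetical helpers made up for this illustration.

```python
import time

def is_repeatable(fn, *args, runs=3):
    """Call fn on the same input several times and check that every result matches."""
    results = [fn(*args) for _ in range(runs)]
    return all(r == results[0] for r in results)

def now():
    """Stand-in for Time.now: takes no input and reads the clock."""
    time.sleep(0.05)  # let the clock advance between calls
    return time.time()

# len depends only on its argument, so repeated calls agree.
print(is_repeatable(len, [1, 2, 3]))  # True

# now() depends on the state of the world, so repeated calls disagree.
print(is_repeatable(now))  # False in practice
```

A check like this can only ever find evidence of nondeterminism. Agreeing on three runs doesn't prove a function is deterministic, but disagreeing once proves it isn't.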
Compilers are just functions that turn a string of source code into a string of machine code. The generated machine code depends completely on the source code, so the process is deterministic. LLMs, like compilers, are functions from string to string. But if you give ChatGPT the same prompt twice, you'll notice that it usually produces a different result each time and is thus nondeterministic. The main reason for this is that LLMs intentionally inject randomness into the selection of each token: instead of always picking the most likely next token, they sample from a probability distribution over candidates, which promotes more "creativity" in responses. The amount of randomness is controlled with the temperature parameter.
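To see how temperature controls the randomness, here is a rough sketch of one common scheme, softmax sampling with temperature. It is an assumption about how sampling typically works, not a description of any particular provider's implementation. The sample_token function and the logits values are made up for illustration.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token from a {token: score} dict using softmax-with-temperature sampling."""
    if temperature == 0:
        # Treat temperature 0 as greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Divide each score by the temperature, then apply softmax.
    # Subtracting the max first keeps exp() numerically stable.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = [math.exp(s - top) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

# Made-up scores for the next token; a real model would compute these.
logits = {"robot": 2.0, "star": 1.5, "error": 0.5}

rng = random.Random()  # unseeded, so draws differ from run to run
print([sample_token(logits, 0.7, rng) for _ in range(8)])  # varies between runs
print([sample_token(logits, 0, rng) for _ in range(8)])    # always "robot"
```

Higher temperatures flatten the distribution so unlikely tokens get picked more often, while lower temperatures concentrate it on the top candidate.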
Okay, compilers are deterministic and LLMs aren't. But do they have to be? It turns out it's incredibly easy to make an LLM deterministic and a compiler nondeterministic.
Bizarro world: nondeterministic compilers and deterministic LLMs
Let's start with compilers. Compilers make all sorts of decisions internally about how to implement your program in machine code that you probably don't care about. For example, the compiler chooses what to inline, which instructions to use, which loops to unroll, which registers to put values into, etc.
Instead of assigning registers deterministically, imagine a compiler that calls Math.random every time there's a choice of register to determine which to use for which value. Voilà! We've created a truly nondeterministic compiler. Compile your source code twice and you'll almost certainly get a different binary each time. The compiler is no longer deterministic, but it's still just as useful as before. When you compile your program, it still does the thing you want it to do. Interesting!
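The thought experiment is easy to try at toy scale. The sketch below is not how real register allocators work; it's a made-up miniature compiler for arithmetic expressions that targets an imaginary eight-register machine. All names (compile_expr, run, REGISTERS) are hypothetical, and Python's random module plays the role of Math.random.

```python
import random

REGISTERS = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]
OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

def compile_expr(expr, rng):
    """Compile a nested (op, left, right) tuple into register instructions,
    picking every destination register at random from the free ones."""
    free = set(REGISTERS)
    code = []

    def emit(node):
        if isinstance(node, int):
            reg = rng.choice(sorted(free))
            free.remove(reg)
            code.append(("load", reg, node))
            return reg
        op, left, right = node
        a = emit(left)
        b = emit(right)
        # The operand registers are dead once this instruction has run.
        free.update((a, b))
        dest = rng.choice(sorted(free))
        free.remove(dest)
        code.append((op, dest, a, b))
        return dest

    result_reg = emit(expr)
    return code, result_reg

def run(code, result_reg):
    """Execute the instructions on the imaginary machine and return the result."""
    regs = {}
    for instr in code:
        if instr[0] == "load":
            _, reg, value = instr
            regs[reg] = value
        else:
            op, dest, a, b = instr
            regs[dest] = OPS[op](regs[a], regs[b])
    return regs[result_reg]

expr = ("*", ("+", 2, 3), 4)  # (2 + 3) * 4
rng = random.Random()         # unseeded on purpose
for _ in range(2):
    code, result_reg = compile_expr(expr, rng)
    print(code, "->", run(code, result_reg))  # the code varies, the result is always 20
```

The emitted instructions differ from one compilation to the next, but every version computes the same value, which is the only property the programmer cares about.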
We don't even have to use our imagination for deterministic LLMs. We just need to set the temperature to 0 to make the responses deterministic. Or, without having to mess with the temperature at all, some providers support passing in a seed so that the same random values are used across requests. I whipped up a quick Python script to demonstrate this. As of this writing, Groq offers some free inference, so I used their SDK. Create an API key in their console to try it out.
import os
from groq import Groq

# Read the API key from the environment
api_key = os.getenv("GROQ_API_KEY")
client = Groq(api_key=api_key)
MODEL = "llama-3.1-8b-instant"
PROMPT = "Write a 1-sentence sci-fi story about a broken robot."

# Deterministic: temperature 0
completion = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0,
)
content = completion.choices[0].message.content.strip()
print(content)

# Deterministic: seed
completion = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0.7,
    seed=42,
)
content = completion.choices[0].message.content.strip()
print(content)

# Nondeterministic
completion = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0.7,
)
content = completion.choices[0].message.content.strip()
print(content)
Run the script a few times. You'll notice that the first two requests yield the same response every time while the third varies:
$ uv run --with groq deterministic-llm.py
As the last sparks of electricity faded from its rusted frame, the broken robot, once a proud guardian of a distant planet, whispered a single, haunting phrase: "I remember the stars."
As the last remnants of its digital soul flickered out, the broken robot's final thought was a haunting echo of its own programming: "Error 404: Life Not Found."
As the last sparks of electricity faded from its fractured circuits, the once-mighty robot, Echo-9, whispered a haunting phrase: "I was...