This tiny piece of math prevents perfect coding agents
Heap Hopping
Contrary to popular belief, coding hasn't been solved yet, and it might never be, even with the advent of general intelligence and superintelligence.
Sohan Basak<br>May 21, 2026
We have all been using coding agents more and more. We have detailed SKILL.md files, context management systems, memory layers and so on. I’m not here to tell you AI is bad or that it won’t replace jobs. That ship has sailed, and frankly, if we’re not using these tools in 2026, we’re handicapping ourselves.
A scheduler that works perfectly, or does it?
Consider a chore scheduler. We have three roommates, and we have to rotate chores fairly, except we first have to decide what counts as “fair”. It is the kind of thing any of us might ask an AI agent to build.
The rules are simple:
No one repeats the same chore consecutively
People with fewer total assignments get higher priority
Recently assigned people get a cooldown
Some people are unavailable on certain days
Here’s the Rust code for the core assignment logic:
pub fn assign(&mut self, people: &[Person], chore: &str, day: &str) -> Option<Person> {
    // Whoever did this chore most recently may not take it again right away.
    let last_person_for_chore = self.history.iter().rev()
        .find(|a| a.chore == chore)
        .map(|a| a.person.clone());

    // Drop anyone marked unavailable on this day.
    let mut candidates: Vec<Person> = people.iter()
        .filter(|p| {
            !self.unavailable.get(*p)
                .map(|days| days.contains(&day.to_string()))
                .unwrap_or(false)
        })
        .cloned()
        .collect();

    // Order by remaining cooldown first, then by completed assignments.
    candidates.sort_by_key(|p| {
        (
            self.cooldowns.get(p).copied().unwrap_or(0),
            self.completed.get(p).copied().unwrap_or(0),
        )
    });

    // Take the first candidate who did not do this chore last time.
    candidates.into_iter()
        .find(|p| Some(p.clone()) != last_person_for_chore)
}
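The sort step in `assign` ranks candidates on two numbers at once. Assuming the key is the tuple `(cooldown, completed)`, Rust compares it lexicographically: the completed count only breaks ties between people whose cooldown is equal. The sketch below isolates that behaviour. The `rank` helper and the sample numbers are hypothetical and are not taken from the repository.

```rust
use std::collections::HashMap;

/// Hypothetical helper: order people by (cooldown, completed), lowest first.
/// Missing entries count as 0, mirroring the `unwrap_or(0)` in `assign`.
fn rank<'a>(
    people: &[&'a str],
    cooldowns: &HashMap<&'a str, u32>,
    completed: &HashMap<&'a str, u32>,
) -> Vec<&'a str> {
    let mut candidates = people.to_vec();
    candidates.sort_by_key(|p| {
        (
            cooldowns.get(*p).copied().unwrap_or(0),
            completed.get(*p).copied().unwrap_or(0),
        )
    });
    candidates
}

fn main() {
    // Made-up state: Cara has done the least work but is still cooling down.
    let cooldowns = HashMap::from([("Alice", 0), ("Bob", 0), ("Cara", 2)]);
    let completed = HashMap::from([("Alice", 5), ("Bob", 3), ("Cara", 1)]);

    let order = rank(&["Alice", "Bob", "Cara"], &cooldowns, &completed);

    // Cooldown is compared first, so Cara sorts last despite the lowest count.
    assert_eq!(order, vec!["Bob", "Alice", "Cara"]);
    println!("{:?}", order);
}
```

With these inputs, the person with the fewest completed chores is ranked last because cooldown is compared before workload. Whether that interaction is what drives the simulation result is something to check by running the repository, not something this sketch establishes.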
The code is clean, well structured, and follows common best practices. There are 12 tests covering basic assignment, consecutive repeat prevention, unavailability, cooldown mechanics, workload balancing, history bounds, and edge cases. All pass. Green across the board.
Verify it yourself:
The full codebase is at github.com/ronniebasak/chore-scheduler-rice-theorem.
git clone git@github.com:ronniebasak/chore-scheduler-rice-theorem.git
cd chore-scheduler-rice-theorem

# All 12 tests pass
cargo test

# Run the 9-week simulation and see the unfair distribution
cargo run
Now here’s the output after 9 weeks of simulation:
=== Final Workload Distribution ===
Alice: 28 assignments (44.4%)
Bob: 26 assignments (41.3%)
Cara: 9 assignments (14.3%)

Ideal (perfectly fair): 21.0 each
Actual spread: 9 to 28 (delta: 19)
⚠️ UNFAIR: The spread of 19 exceeds reasonable bounds.
Alice is doing roughly three times as much work as Cara. The scheduler follows every single rule it was given. Every test passes. The code is correct by every measurable standard that most AI tools or beginner-level humans would think to check. And it produces a deeply unfair outcome.
Rice’s theorem, in plain terms
In 1953, Henry Gordon Rice proved something that sounds almost too strong to be true:
No algorithm can decide any non-trivial semantic property of programs.
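The standard argument behind this statement is a reduction from the halting problem. Here is a minimal sketch of the idea, using "fairness" as the property and entirely hypothetical names (`Schedule`, `fair_scheduler`, `probe`). It also assumes that a program which never produces a schedule does not count as fair. If a perfect fairness checker existed, pointing it at `probe(p)` would reveal whether `p` halts, and no algorithm can decide that in general.

```rust
/// Hypothetical output type: who does the chore on each day.
type Schedule = Vec<&'static str>;

/// A scheduler we agree is fair: strict round-robin over three people.
fn fair_scheduler(days: usize) -> Schedule {
    let people = ["Alice", "Bob", "Cara"];
    (0..days).map(|d| people[d % people.len()]).collect()
}

/// Put an arbitrary computation `p` in front of the fair scheduler.
/// The result behaves exactly like `fair_scheduler` if `p` halts,
/// and never produces a schedule at all if `p` runs forever.
fn probe<P: Fn()>(p: P) -> impl Fn(usize) -> Schedule {
    move |days| {
        p(); // may never return
        fair_scheduler(days)
    }
}

fn main() {
    // With a `p` that halts, the probe is indistinguishable from the fair scheduler.
    let halting = probe(|| {
        let _ = (1..=10u32).sum::<u32>();
    });
    assert_eq!(halting(6), fair_scheduler(6));
    println!("{:?}", halting(6));

    // A `p` containing `loop {}` would make the probe never answer.
    // A checker that decided "is this program fair?" for every probe would
    // therefore also decide "does p halt?", which is the contradiction.
}
```

The sketch only runs the halting case. The non-halting case is left as a comment because, by construction, it would never finish.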
Let me unpack that statement. A “semantic property” is something about what a program does, not what it looks like. “Does this program halt?” is semantic. “Does this program produce fair outputs?” is semantic. “Does this variable name start with a lowercase letter?” is syntactic (and trivially checkable).
“Non-trivial” just means the property is true for some programs and false for others. “Is this scheduler fair?” qualifies, because some schedulers are fair and some aren’t.
Rice’s theorem says: for any such property, there is no general algorithm that can look at arbitrary source code and always tell you correctly whether that property holds. Not “it’s hard.” Not “we haven’t found one yet.” It’s provably, mathematically impossible.
That doesn’t mean we should stop doing code reviews, human or AI-powered. The more reviews a piece of code gets, the more confident we can be that it is correct. We just can’t claim it is perfect.
Both humans and AI hit this wall
Here’s where we need to be careful, because this isn’t an anti-AI argument.
Most of us wouldn’t catch the unfairness in this scheduler immediately either. We write the rules, we write the tests, we feel satisfied. It takes running the simulation over 63 days and actually looking at the distribution to notice. A human reviewer could easily miss this too.
And that’s exactly the point. Rice’s theorem doesn’t discriminate between carbon and silicon. It’s a statement about computation itself. No algorithm, no matter how sophisticated the LLM (or brain) behind it, can reliably determine “is this program fair” for arbitrary programs.
This applies equally to:
“Does this function match what the user actually intended?”
“Will this system behave reasonably under realistic long-term usage?”
“Is this optimization actually an improvement in the ways that matter?”
These are all...