Tiny piece of math prevents perfect coding agents


Heap Hopping


Despite popular belief, coding hasn't yet been solved and might never be solved, even with the advent of general and super intelligence.

Sohan Basak May 21, 2026

We have all been using coding agents more and more. We have detailed SKILL.md files, context management systems, memory layers and so on. I’m not here to tell you AI is bad or that it won’t replace jobs. That ship has sailed, and frankly, if we’re not using these tools in 2026, we’re handicapping ourselves.

A scheduler that works perfectly, or does it?

Consider a chore scheduler. We have three roommates, and we have to rotate chores fairly, except we also have to decide what counts as “fair”. It’s the kind of thing any of us might ask an AI agent to build.

The rules are simple:

No one repeats the same chore consecutively

People with fewer total assignments get higher priority

Recently assigned people get a cooldown

Some people are unavailable on certain days

Here’s the Rust code for the core assignment logic:

    pub fn assign(&mut self, people: &[Person], chore: &str, day: &str) -> Option<Person> {
        // Whoever did this chore most recently may not do it again right away.
        let last_person_for_chore = self.history.iter().rev()
            .find(|a| a.chore == chore)
            .map(|a| a.person.clone());

        // Drop anyone who is unavailable on this day.
        let mut candidates: Vec<Person> = people.iter()
            .filter(|p| {
                !self.unavailable.get(*p)
                    .map(|days| days.contains(&day.to_string()))
                    .unwrap_or(false)
            })
            .cloned()
            .collect();

        // Lowest cooldown first, then fewest completed assignments.
        candidates.sort_by_key(|p| (
            self.cooldowns.get(p).copied().unwrap_or(0),
            self.completed.get(p).copied().unwrap_or(0),
        ));

        // First candidate who is not the previous holder of this chore.
        candidates.into_iter()
            .find(|p| Some(p.clone()) != last_person_for_chore)
    }
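The excerpt reads several fields and types that it does not define. As a rough sketch of what they might look like, here is one possible set of definitions. The field names (`history`, `cooldowns`, `completed`, `unavailable`, `person`, `chore`) come from the excerpt; every type below, and the `priority` helper, is an assumption made for illustration and may differ from the actual repository.

```rust
use std::collections::HashMap;

// Assumption: a person is identified by name. The post does not show this type.
pub type Person = String;

// Assumption: one history entry per assignment, holding the two fields the excerpt reads.
#[derive(Debug, Clone, PartialEq)]
pub struct Assignment {
    pub person: Person,
    pub chore: String,
}

// Hypothetical scheduler state. Field names follow the excerpt; the types are guesses.
#[derive(Debug, Default)]
pub struct Scheduler {
    pub history: Vec<Assignment>,                  // past assignments, oldest first
    pub cooldowns: HashMap<Person, u32>,           // remaining cooldown per person
    pub completed: HashMap<Person, u32>,           // total assignments per person
    pub unavailable: HashMap<Person, Vec<String>>, // days on which each person is away
}

impl Scheduler {
    // Hypothetical helper: the same (cooldown, completed) key the excerpt sorts by,
    // with missing entries treated as zero. Lower tuples sort first.
    pub fn priority(&self, p: &Person) -> (u32, u32) {
        (
            self.cooldowns.get(p).copied().unwrap_or(0),
            self.completed.get(p).copied().unwrap_or(0),
        )
    }
}
```

Because the key is a tuple, cooldown always dominates: under these assumptions, a person with no cooldown and many completed chores still sorts ahead of a person with any cooldown and none.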

The code looks clean and well structured, and it follows common best practices. There are 12 tests covering basic assignment, consecutive repeat prevention, unavailability, cooldown mechanics, workload balancing, history bounds, and edge cases. All pass. Green across the board.

Verify it yourself: the full codebase is at github.com/ronniebasak/chore-scheduler-rice-theorem.

    git clone git@github.com:ronniebasak/chore-scheduler-rice-theorem.git
    cd chore-scheduler-rice-theorem

    # All 12 tests pass
    cargo test

    # Run the 9-week simulation and see the unfair distribution
    cargo run

Now here’s the output after 9 weeks of simulation:

    === Final Workload Distribution ===
    Alice: 28 assignments (44.4%)
    Bob: 26 assignments (41.3%)
    Cara: 9 assignments (14.3%)

    Ideal (perfectly fair): 21.0 each
    Actual spread: 9 to 28 (delta: 19)

    ⚠️ UNFAIR: The spread of 19 exceeds reasonable bounds.

Alice is doing roughly three times as much work as Cara. The scheduler follows every single rule it was given. Every test passes. The code is correct by every measurable standard that most AI agents or beginner-level programmers would think to check. And it produces a deeply unfair outcome.

Rice’s theorem, in plain terms

In 1953, Henry Gordon Rice proved something that sounds almost too strong to be true: No algorithm can decide any non-trivial semantic property of programs.
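For readers who prefer notation, the same statement can be written a little more formally. This is the standard textbook formulation rather than anything taken from the post, and the symbols $\varphi_e$, $\mathcal{P}$, $\mathcal{C}$ and $I_{\mathcal{C}}$ are introduced here only for this purpose.

```latex
Let $\varphi_e$ be the partial function computed by program $e$, and let
$\mathcal{P}$ be the set of all partial computable functions. For any set
$\mathcal{C}$ with $\emptyset \neq \mathcal{C} \subsetneq \mathcal{P}$,
\[
  I_{\mathcal{C}} = \{\, e : \varphi_e \in \mathcal{C} \,\}
\]
is undecidable.
```

Membership in $I_{\mathcal{C}}$ depends only on what the program computes, which is the “semantic” part; requiring $\mathcal{C}$ to be neither empty nor all of $\mathcal{P}$ is the “non-trivial” part.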

Let me unpack that. A “semantic property” is something about what a program does, not what it looks like. “Does this program halt?” is semantic. “Does this program produce fair outputs?” is semantic. “Does this variable name start with a lowercase letter?” is syntactic (and trivially checkable).

“Non-trivial” just means it’s true for some programs and false for others. “Is this scheduler fair?” qualifies, because some schedulers are fair and some aren’t.

Rice’s theorem says: for any such property, there is no general algorithm that can look at arbitrary source code and reliably tell you whether that property holds. Not “it’s hard.” Not “we haven’t found one yet.” It’s provably, mathematically impossible.

That doesn’t mean we should stop doing code reviews, human or AI-powered. The more reviews we get, the more confident we can be that the code is correct. We just can’t claim it is perfect.

Both humans and AI hit this wall

Here’s where we need to be careful, because this isn’t an anti-AI argument. Most of us wouldn’t catch the unfairness in this scheduler immediately either. We write the rules, we write the tests, we feel satisfied. It takes running the simulation over 63 days and actually looking at the distribution to notice. A human reviewer could easily miss this too.

And that’s exactly the point. Rice’s theorem doesn’t discriminate between carbon and silicon. It’s a statement about computation itself. No algorithm, no matter how sophisticated the LLM (or brain) behind it, can reliably determine “is this program fair” for arbitrary programs. This applies equally to:

“Does this function match what the user actually intended?”

“Will this system behave reasonably under realistic long-term usage?”

“Is this optimization actually an improvement in the ways that matter?”

These are all...
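The check that exposed the scheduler's problem earlier, running the simulation and looking at the distribution, is easy to write once someone thinks to ask for it. Here is a minimal sketch, assuming per-person assignment counts are already available as plain integers. `Spread`, `spread`, `is_fair_enough` and the `max_delta` tolerance are hypothetical names; the post does not say what bound its simulation treats as “reasonable”.

```rust
// Summary of how evenly assignments were spread across people.
#[derive(Debug, PartialEq)]
pub struct Spread {
    pub ideal: f64, // total assignments divided by number of people
    pub min: u32,   // fewest assignments any one person received
    pub max: u32,   // most assignments any one person received
    pub delta: u32, // max - min
}

// Computes the spread for a list of per-person counts.
// Returns None when the list is empty.
pub fn spread(counts: &[u32]) -> Option<Spread> {
    let min = *counts.iter().min()?;
    let max = *counts.iter().max()?;
    let total: u32 = counts.iter().sum();
    Some(Spread {
        ideal: f64::from(total) / counts.len() as f64,
        min,
        max,
        delta: max - min,
    })
}

// `max_delta` is an assumed tolerance chosen by the caller, not a value from the post.
// An empty list is treated as trivially fair.
pub fn is_fair_enough(counts: &[u32], max_delta: u32) -> bool {
    spread(counts).map_or(true, |s| s.delta <= max_delta)
}
```

With the counts reported in the simulation output (28, 26 and 9), `spread` gives an ideal of 21.0 and a delta of 19, matching the numbers above. A check like this inspects one run of one program on one input. It says nothing about arbitrary programs, which is where the theorem applies.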
