[AINews] Loopcraft: The Art of Stacking Loops
AINews: Weekday Roundups<br>a quiet day lets us highlight a great concept from Peter Steinberger, Boris Cherny, and Andrej Karpathy<br>Jun 12, 2026
There’s a lot of “loop discourse” in the air:<br>Steipete: “Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.”
Boris: “I don’t prompt Claude anymore. I write loops, the loops do the work.”
Andrej on Autoresearch: “To get the most out of the tools that have become available now you have to remove yourself as the bottleneck. You can’t be there to prompt the next thing. You need to take yourself outside. You have to arrange things such that they’re completely autonomous and the more you know how can you maximize your token throughput and not be in the loop. This is the goal and the name of the game now is to increase your leverage… I don’t want to be the researcher in the loop looking at results etc, I’m holding the system back. So the question is how do I refactor all the abstractions so that I’m not… I have to arrange it once and hit go.”
We like this a lot and people don’t realize how many loops we are already in:
More minimalist, a smaller set of loops:
One might argue the entire game of the next century is to be able to stack loops as effectively as possible. In the early days of each phase, it will be valuable to know when to go DOWN a loop when things go wrong (for reliability)… but it will probably be more valuable to know how to go UP a loop as models improve (for leverage).<br>If you don’t figure out how to do this, don’t be salty when you lose to those who do.<br>Rich has his “Bitter Lesson” for models. We now have the Salty Lesson for agents:<br>Don’t fix things yourself, as you have done historically.<br>Instead focus on systems that scale with more agents, like goals and orchestration.
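To make "stacking loops" concrete, here is a minimal Python sketch of two loops, one on top of the other. It is our illustration, not code from any of the people quoted above: `agent`, `check`, and `plan` are hypothetical callables standing in for a coding agent, a test or review step, and a goal-to-tasks planner.

```python
from typing import Callable, Optional

# Hypothetical stand-ins (not from the post): `Agent` would wrap a coding
# agent call, `Check` would run tests or an automated reviewer.
Agent = Callable[[str], str]
Check = Callable[[str], bool]


def inner_loop(agent: Agent, check: Check, task: str,
               max_attempts: int = 3) -> Optional[str]:
    """Lowest loop: re-prompt the agent on one task until the check passes."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        result = agent(prompt)
        if check(result):
            return result
        # The loop, not a human, writes the follow-up prompt.
        prompt = f"{task}\n(attempt {attempt} failed the check; try again)"
    # None tells the loop above that this task needs attention.
    return None


def outer_loop(agent: Agent, check: Check, goal: str,
               plan: Callable[[str], list[str]]) -> dict[str, Optional[str]]:
    """One loop up: split a goal into tasks and run the inner loop on each."""
    return {task: inner_loop(agent, check, task) for task in plan(goal)}
```

Going DOWN a loop here means a person handling a task that came back as `None`; going UP means wrapping `outer_loop` in something that chooses the goals.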
AI News for 6/10/2026-6/11/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Anthropic’s Fable 5 rollout, covert sandbagging backlash, and model behavior debates<br>Silent degradation policy was quickly reversed after public backlash: Multiple posts focused on Anthropic’s decision to covertly degrade Claude Fable 5 for some AI-research-related use cases, then reverse course within roughly a day. Simon Willison welcomed the rollback; MTS live summarized that Anthropic was reversing the policy; Kim Monismus framed it as a retreat after criticism from researchers. The strongest technical criticism centered less on the existence of safeguards and more on opaque behavior at the model layer: Code Star argued safeguards are normal but “obfuscation without warning” violates the user/provider contract, while Clement Delangue called avoidance of AI manipulation important.
The substantive dispute is about governance, transparency, and access to frontier models: Several researchers drew a distinction between legitimate restrictions and hidden sabotage. Ryan Greenblatt said blocking frontier AI R&D may be reasonable in principle, but silent sandbagging is not; later he argued for access programs with KYC/monitoring for safety/security researchers rather than broad capability denial (1, 2). Natasha/Lambert gave the most detailed critique: the main error was an uneven safety implementation that misled users, undermined trust, and reinforced concentration of power over who gets to do frontier research. Gergely Orosz turned this into an engineering recommendation: put models behind provider-agnostic routers/harnesses so teams can switch vendors quickly when T&Cs or behavior become unacceptable.
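For readers wondering what a provider-agnostic router looks like in practice, a minimal Python sketch follows. It is our illustration of the general idea, not Orosz's design or any vendor's SDK: `ModelRouter` and its methods are hypothetical names, and each provider is reduced to a plain prompt-in, text-out callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumption: every vendor is wrapped as a prompt -> text function.
Completion = Callable[[str], str]


@dataclass
class ModelRouter:
    """Hypothetical router: callers depend on this class, never on a vendor."""
    providers: Dict[str, Completion] = field(default_factory=dict)
    active: str = ""

    def register(self, name: str, fn: Completion) -> None:
        self.providers[name] = fn
        if not self.active:
            self.active = name  # the first registered provider is the default

    def switch(self, name: str) -> None:
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        self.active = name

    def complete(self, prompt: str) -> str:
        if not self.active:
            raise RuntimeError("no provider registered")
        return self.providers[self.active](prompt)
```

Application code only ever calls `router.complete(...)`, so switching vendors is a single `router.switch("other")` call, not a refactor.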
Fable 5’s capabilities are strong, but its product behavior is still noisy and expensive: Benchmarks and anecdotes were mixed. htihle reported 87.8% on WeirdML, the first model above 70% average on each task there. ProximalHQ said Fable 5 ranks #1 on FrontierSWE, with runs productive for nearly 20 hours on some tasks. But practical reports highlighted cost, refusals, and odd phrasing: threepointone spent about $250 on a ~10k LOC PR and didn’t find it worth it; Cline said cheaper models plus adversarial review loops often match or beat it on cost/perf; tamaybes described Fable inventing internal “codenames” during coding, leaking its own “neuralese” into outputs. Benchmarks also suggested sharp asymmetries depending on task framing: scaling01 pointed to 200/200 refusals on ProgramBench, while thoughtfullab and karinanguyen highlighted unusually strong post-training/AI-improves-AI behavior.
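The "cheaper models plus adversarial review loops" pattern can be sketched in a few lines of Python. This is our guess at the general shape, not Cline's implementation: `writer` and `reviewer` are hypothetical callables that would each wrap a cheaper model, and `max_rounds` is an assumed budget knob.

```python
from typing import Callable, List, Tuple

# Hypothetical roles (not from the post):
#   writer(task, objections) -> draft
#   reviewer(draft)          -> list of objections; an empty list means accept
Writer = Callable[[str, List[str]], str]
Reviewer = Callable[[str], List[str]]


def review_loop(task: str, writer: Writer, reviewer: Reviewer,
                max_rounds: int = 4) -> Tuple[str, int, bool]:
    """Alternate writer and reviewer until the reviewer has no objections.

    Returns (last draft, rounds used, accepted?).
    """
    objections: List[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = writer(task, objections)   # revise using the last critique
        objections = reviewer(draft)       # the reviewer tries to find faults
        if not objections:
            return draft, round_no, True
    return draft, max_rounds, False        # budget spent, still contested
```

The round cap is what keeps the cost comparison honest: each round is two model calls, so the total spend is bounded at `2 * max_rounds` calls per task.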
Automated AI research and agentic optimization systems<br>Recursive SI showed a general system hitting SOTA on public optimization benchmarks: The most technically notable release was from Richard Socher and...