How Ångstrom (YC S24) used Claude Code to train a model that beat Meta's UMA-OMC | anycloud
Ångstrom AI (YC S24), with the University of Cambridge (the Csanyi group) and AstraZeneca, released the paper "DFT Accuracy on Crystal Structure Prediction with Machine Learning Interatomic Potentials". The paper presented CSP-MACE-Å, a machine learning model designed to replace DFT, the expensive quantum mechanical calculation at the heart of crystal structure prediction, with the same accuracy but a 10,000x speedup.
CSP-MACE-Å also significantly outperformed UMA-OMC on crystal-structure prediction benchmarks. UMA is Meta's general-purpose model for atoms and molecules; UMA-OMC is the version adapted for organic molecular crystals.
Ångstrom built CSP-MACE-Å on anycloud, a CLI that runs GPU jobs across your own cloud accounts. Ångstrom pointed Claude Code at anycloud: the agent called the anycloud CLI to drive the experiment loop, resulting in roughly 100,000 GPU jobs, almost entirely on multi-cloud spot, on their own cloud accounts.
Why CSP-MACE-Å matters to AstraZeneca
Crystal structure prediction (CSP) answers a deceptively simple question: given a molecule, what solid crystal structures can it form? It matters because one molecule can pack into different crystal structures (known as polymorphs) with different physical characteristics. This creates a major risk for pharmaceutical development, especially when late-appearing forms emerge during manufacturing or storage and alter product performance. In 1998, exactly this nearly sank the HIV drug ritonavir. The drug had to be pulled and reformulated when an unexpected, more stable crystal form of the same molecule appeared two years after market release. This cost Abbott more than $250 million. Veritasium tells the story well in the video The Crystal That Could Destroy All Medicine. Drugmakers therefore need to map all the possible crystal forms of a molecule before release, to reduce the risk of an unexpected shift to a more stable form that could render the drug unusable once it has been distributed.
The workhorse of CSP is DFT (density functional theory). DFT is a quantum-mechanical calculation that serves as the gold standard for CSP in industry and academia. However, DFT is extremely expensive and slow. The calculations for one molecule can take days to weeks, which slows down the scientists using it and caps how many structures they can explore.
Ångstrom’s machine learning model, CSP-MACE-Å, is 10,000 times faster than DFT. Calculations go from taking weeks with DFT to minutes with CSP-MACE-Å. Not only does this save scientists time, but it ultimately means that far more candidate crystal structures may be evaluated, providing greater confidence when derisking crystal forms.
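As a rough sanity check on the "weeks to minutes" claim, the arithmetic can be sketched in a few lines. The two-week DFT runtime below is an illustrative assumption (the text only says "days to weeks"), and the function name is hypothetical.

```python
from datetime import timedelta

SPEEDUP = 10_000  # speedup factor quoted in the article


def accelerated_runtime(dft_runtime: timedelta, speedup: int = SPEEDUP) -> timedelta:
    """Divide a DFT runtime by the speedup factor."""
    return dft_runtime / speedup


# Illustrative input only: a two-week DFT calculation.
fast = accelerated_runtime(timedelta(weeks=2))
print(f"{fast.total_seconds() / 60:.1f} minutes")  # prints "2.0 minutes"
```

Under that assumption, a calculation that would occupy a DFT code for two weeks finishes in about two minutes, which is consistent with the claim above.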
CSP-MACE-Å was also shown to outperform Meta's UMA-OMC model across Ångstrom’s and AstraZeneca's evaluation suites. Meta's UMA-OMC was the previous state-of-the-art machine learning interatomic potential for CSP, but its accuracy was inferior to gold-standard DFT. CSP-MACE-Å is the first model to demonstrate the accuracy of DFT for CSP, delivering a massive speed improvement without sacrificing accuracy.
The agent-driven experiment loop
The bottleneck in developing CSP-MACE-Å at Ångstrom was how quickly the team could iterate on the loop that underlies the work of many AI research orgs: forming a hypothesis, deciding what computational experiments to run to test it, launching the GPU jobs, pulling results back, analyzing the results, and deciding on the next hypothesis to test. All the while, the team also had to keep GPU costs down and manage hardware failures (and bugs!).
Ångstrom researchers used Claude Code in that loop. They talked through what computational experiments to run, which batches of jobs to launch, what outputs to compare, and what plots/metrics would answer the current question. Claude then turned that plan into concrete work: launching batches of anycloud jobs, monitoring status, downloading results, and generating plots and summaries for the next research decision.
Claude used the same local anycloud CLI and cloud configuration the team used by hand. The researchers stayed focused on the experiment plan and interpretation; Claude handled the execution: the fan-out and bookkeeping between decisions. However, the same fan-out that made the loop fast also made it dangerous: the wrong batch of GPU jobs could become thousands of dollars of real spend before anyone noticed.
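To make that risk concrete, here is a minimal sketch of a budget check applied before a batch of jobs is launched. It is not anycloud's implementation or API: `Job`, `SpendGuard`, `BudgetExceeded`, the hourly price, and the budget figure are all hypothetical names and numbers chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one GPU job (not an anycloud type)."""
    name: str
    gpus: int
    max_hours: float
    usd_per_gpu_hour: float

    @property
    def worst_case_cost(self) -> float:
        # Upper bound: every GPU runs for the full time limit.
        return self.gpus * self.max_hours * self.usd_per_gpu_hour


class BudgetExceeded(Exception):
    """Raised when a batch would push the session past its budget."""


class SpendGuard:
    """Hypothetical per-session budget tracker."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.committed_usd = 0.0

    def approve(self, batch: list[Job]) -> float:
        """Reserve the batch's worst-case cost, or raise without reserving."""
        cost = sum(job.worst_case_cost for job in batch)
        if self.committed_usd + cost > self.budget_usd:
            raise BudgetExceeded(
                f"batch needs ${cost:.2f}, "
                f"only ${self.budget_usd - self.committed_usd:.2f} left"
            )
        self.committed_usd += cost
        return cost


# Illustrative numbers only: 100 single-GPU jobs, 2 h limit, $1.50 per GPU-hour.
guard = SpendGuard(budget_usd=500.0)
batch = [Job(f"run-{i}", gpus=1, max_hours=2.0, usd_per_gpu_hour=1.5) for i in range(100)]
print(guard.approve(batch))  # first batch fits within the budget
try:
    guard.approve(batch)  # an identical second batch would exceed it
except BudgetExceeded as err:
    print(f"blocked: {err}")
```

The design choice in this sketch is to reserve the worst-case cost up front, so that an over-large fan-out is rejected before any job starts rather than discovered on the bill afterwards.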
How anycloud kept the AI research experiment loop under control
“anycloud gives me the confidence to really let my agents loose without stressing that they will burn through all our compute. These days they continue to work throughout night, autonomously managing my research experiments, while I sleep.”
Laurence Midgley, Co-founder & CTO, Ångstrom AI
The most recent anycloud feature that Ångström loves is spend controls scoped to the agent session....