To be trustworthy, LLMs need to show their work

The AI Chemist: To be trustworthy, LLMs need to show their work




Good scientists reveal how they do their experiments and report their results; so should any machine-driven research

John Trant, special to C&EN

June 16, 2026


A version of this story appeared in

Volume 104, Issue


An illustration of a person standing with their back to the camera. The person is pressing a button on a machine that’s on top of a bench. A ribbon-style protein structure is coming out of one side of the machine. It looks like the protein is being transported away from the machine by a conveyor belt.

Credit: Yang H. Ku/C&EN/Shutterstock

Columns

Commentary on issues of enduring interest to the chemistry community, written by experts.

Introducing the AI Chemist

Artificial intelligence and large language models offer promising methods to interpret vast amounts of data but also more than a few cautions. This C&EN column will cover what the technologies can do now, what they could do in the future, and what they shouldn’t tackle—all written by expert contributors.

Drug discovery is really, really hard, and most drug candidates fail: humans are variable, the animals we use for testing aren’t humans, pharmacokinetics and pharmacodynamics are hard to predict, and unsuspected off-target effects cause toxicity. The emergence of artificial intelligence and machine learning tools such as AlphaFold has raised excitement around the potential to accelerate early-stage drug discovery. Even AI skeptics, who professionally criticize the utility and ethics of ChatGPT and other large language models (LLMs), will often say, “But of course, AlphaFold is helping cure cancer, so it’s not all bad.”

I’m not sure I agree that software like AlphaFold is an exception.

Computer-aided drug design (CADD) uses models of proteins and chemical compounds to prioritize compounds for investigation as possible drugs. This approach accelerates the early stage of drug development, as it helps focus attention on the likeliest candidates while considering frankly enormous numbers of candidates—as many as trillions of compounds. To be fair, medicinal chemists have always considered protein structure when designing drugs, but the tools and the availability of protein structures were more limited in the past. Over the past 50 years, slowly (and then ever more quickly), the computational tools addressing docking, molecular dynamics, free-energy perturbation calculations, and single-point quantum mechanics calculations have improved. The computing power available to run these calculations has improved, and so has the availability of experimentally confirmed protein conformations. But both the use of these tools and the acquisition of the data they rely on require a lot of expertise.

When AlphaFold arrived, suddenly all structures of all proteins were available to everyone at the click of a button. Then LLM-guided docking arrived, which simplified protein-ligand CADD docking, greatly democratizing the screening of protein-ligand interactions. While I was writing this column, a high school student approached me and my colleagues, asking us to help them test some proposed drug candidates they had identified through “vibe CADDing,” performing CADD by conversing with an AI assistant. You can now do all the steps of CADD without the collaboration of a bunch of PhDs from different disciplines and without having studied any organic or physical, let alone quantum or medicinal, chemistry at all.

The problem is that in practice, conducting these studies still does require extensive collaboration. A structure from the Protein Data Bank (PDB) is a single frozen conformation of a highly dynamic protein. You can’t use that structure directly in a CADD study without considering the protein’s dynamics, its native environment, the influence of the ligands that the protein was soaked with to reduce motion enough to make a crystal or get a good cryo-electron microscopy ensemble, and more.

Structural biologists know these considerations: many proteins have dozens (or even hundreds) of different entries in the PDB. Those entries exist because the structures are not all the same. Computational, all-atom simulations can be used to model some of this dynamism back in, but the assumptions used in these calculations are abstractions of physical reality. And when we assume . . . that leads to weird errors for you and me. But if you do this process regularly, you know what to look for and how to validate...
