Why the Human Genome’s Tangled Physicality May Confound AI | Quanta Magazine
An editorially independent publication supported by the Simons Foundation.
By Philip Ball, Contributing Writer
June 18, 2026
Our genetic heritage is not a blueprint or an algorithm, as many biologists have imagined, but something else entirely.
Samuel Velasco and Hannah Waters/Quanta Magazine
Introduction
Since its molecular structure was deduced in the 1950s, DNA has been hailed by many biologists as the secret of life. They’ve read and studied the information stored in the DNA found in the cells of living organisms, known as their genomes, and claimed that this genetic database must be some kind of blueprint, code script, or computer. But if DNA really does harbor some greater secret about how life works, biologists have yet to find it.
In fact, the human genome is less a script than a puzzle that gets harder the closer biologists look. Knowing the entire sequence (the order of all 3 billion or so of our DNA’s chemical building blocks, nearly fully deduced by the international Human Genome Project between 1990 and 2003) hasn’t helped much. That investigation showed that barely 2% of the human genome consists of actual genes, the information-coding sequences of DNA.
It’s now clear that understanding the human genome is no longer a matter of figuring out what each gene does. The deeper and much harder question is how those genes are used, or regulated, a question that seems to involve some and perhaps much of the rest of the genome. By switching suites of genes on and off, the many different cell types in our bodies can all be created from the same material. Cells also regulate their genes from moment to moment in response to a constant inflow of signals from their neighbors and surroundings. But the processes that govern gene regulation are proving so complex that some biologists wonder whether a full understanding of it — of how the genome really works — will ever be within the grasp of our puny minds.
Some are counting on outsourcing the analysis to artificial intelligence. Genomic “foundation models” such as Evo 2, Genos, and Google DeepMind’s AlphaGenome are trained on vast quantities of genomic data, and biologists use them to predict how differences in DNA sequence affect biological processes and ultimately the traits (including disease risk) of a whole organism. These algorithms don’t worry about the complicated regulatory stuff going on; all of that is supposedly subsumed by the algorithm’s “training,” through which it deduces correlations from cases we already know about.
This approach is likely to be useful, but for those who crave real understanding of how the genome, and ultimately life itself, works, a computational black box will never suffice. And perhaps more to the point, the genome might not submit to the kind of straightforward input-output approach that such AI models ultimately assume.
That’s because the genome is no blueprint or algorithm. It is something else.
The Old View
Given that it’s the product of around 4 billion years of evolution, perhaps it’s not surprising that our genome is complicated. The surprise has been what those complications are. “Our genome is not what we might make it if we sat down at the drawing board,” said the biologist Karen Adelman, who studies gene regulation at Harvard Medical School.
“We’ve stopped thinking about the genome as a linear piece of DNA code.”
Wendy Bickmore, University of Edinburgh
The traditional view posits that a small proportion of our DNA holds the code for making the protein molecules that orchestrate our cells’ chemistry. Each instruction for a protein is held in a corresponding gene — we have around 20,000 of these — and gene sequences can range in length from a couple of dozen to almost 3 million DNA “letters” (representing molecules called nucleotides). Making a protein from its gene is a two-stage affair. First the DNA is read, letter by letter, by an enzyme called a polymerase, which creates a copy of that code in a related molecule called messenger RNA (mRNA). This is called transcription. The mRNA is then read...