Open Source Model Predicts Shape of 1B Proteins


Move over, AlphaFold: open-source model predicts shape of 1 billion proteins


The AI tool designed proteins that would bind to cytotoxic T-lymphocyte-associated protein 4 (CTLA-4). Credit: Molekuul/SPL

The known protein universe just got a lot bigger. A newly released artificial-intelligence tool has generated an atlas of more than one billion predicted protein structures and billions more protein sequences.

The database, known as the ESM Atlas, was unveiled today by researchers at the Chan Zuckerberg Initiative’s Biohub, a biomedical institute created in San Francisco, California, by Facebook founder Mark Zuckerberg and his wife, physician and educator Priscilla Chan.

The atlas eclipses the AlphaFold database of predicted protein structures by more than 800 million entries, and a previous ESM Atlas by some 300 million.

The predictions were made using ESMFold2, an AI model that Biohub says surpasses the performance of AlphaFold3, the latest version of Google DeepMind’s system, and other protein-structure prediction AIs. The atlas is described in a report released today.

“What this atlas does is it shows the totality of protein biology and especially the parts that are most unknown,” says Biohub science head Alex Rives, who led the effort. “We think it’s going to be a really powerful substrate for the discovery of new biology.”

Other scientists are impressed with the results, and especially with the fact that ESMFold2 is fully open source. But the Biohub model enters an increasingly crowded field, in which competing open-source and proprietary protein models are making gains at breakneck speed.

Antibody predictions

ESMFold2 is based on a ‘protein language’ model that Rives’s team unveiled in 2024, which was trained on billions of proteins from across the tree of life. It includes ‘metagenomic’ sequences from soil, ocean and other environments.

Rives’s team says that ESMFold2 outperforms or competes with existing methods, including AlphaFold3, at determining the correct structure of complexes of interacting proteins, including antibody molecules binding to their antigen molecular targets.

In the report, the researchers describe how they used ESMFold2 to design new antibodies and other proteins that can strongly attach to proteins implicated in cancers and immunological conditions. When created and tested in the lab, a high proportion of the designs worked as predicted.

Rives’s team used the tool to create an atlas containing 1.1 billion predicted protein structures as well as information on the sequences of 6.8 billion proteins. Most of these come from metagenomic sequences that have not been fully characterized. Rives hopes that the atlas, which will be freely accessible, will help scientists to make connections between the known and the unknown parts of the protein universe. Using the atlas, the researchers found structural similarities between CRISPR microbial defence proteins and a gene-editing protein that was identified in a soil fungus in 2023 and is found in other eukaryotic species [1].

Supplementary database


doi: https://doi.org/10.1038/d41586-026-01686-3

Updates & Corrections

Correction 29 May 2026: A previous version of this article misstated that AlphaFold did not include marine eukaryotic and soil reference catalogues.

References

1. Saito, M. et al. Nature 620, 660–668 (2023).

2. Yeo, J. et al. Preprint at bioRxiv https://doi.org/10.1101/2025.04.23.650224 (2025).
