How AI Solved the 50-Year-Old Obstacle of Protein Structure Prediction

Just under two decades ago, scientists worldwide were celebrating the complete sequencing of the human genome. The work took 13 years and was coordinated through an international effort known as the Human Genome Project. It required countless hours, roughly $2.7 billion, and more than 2,800 researchers, and it resulted in the successful sequencing of 3 billion base pairs. Now, as we enter the 18th year since the project’s completion, its results, with the help of artificial intelligence (AI), have helped diagnose genetic diseases worldwide.

The Next Biological Problem

However, even after the Human Genome Project, another of biology’s most intriguing problems still perplexed scientists: how a protein transforms from a lengthy chain of amino acids into a functional three-dimensional shape. Scientists have long known that the amino acids in a protein’s sequence interact with one another, and that the resulting forces of attraction and repulsion cause the protein to fold. Far less has been known about the three-dimensional shapes themselves.

Shape is an extremely important characteristic of a protein. The way a protein folds exposes particular amino acids and binding sites. Through these binding sites, proteins interact with other molecules and carry out their primary functions. Detailed knowledge of protein shape can help researchers create biopharmaceuticals designed to interact with these binding sites. With this knowledge, researchers can also create synthetic proteins that replicate natural protein shapes for biotechnologies that require proteins of a particular shape and function.

How AI Found the Solution

For more than 50 years, researchers have recognized that knowing the full amino acid sequence of a protein should, in principle, make it possible to predict its shape. However, progress was slow. In the 1990s, a competition was established to speed things along. The Critical Assessment of Protein Structure Prediction (CASP) assigned contestants 100 proteins whose structures remained unknown. Some groups used laboratory methods to determine the structures experimentally, while others produced computerized predictions. The two sets of results were then compared, and the computerized predictions were scored on a scale of zero to 100 depending on how closely they matched the experimental structures.
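To make the zero-to-100 scoring concrete, here is a minimal sketch in Python. It assumes the score in question is CASP's Global Distance Test (GDT_TS), which this article does not name, and it simplifies that test considerably: the two structures are assumed to be already lined up on top of each other, and each amino acid is reduced to a single point in space. The function name and the coordinates are made up for illustration.

```python
import math

# Distance cutoffs, in angstroms, averaged by the GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_like(predicted, experimental, cutoffs=CUTOFFS):
    """Return a simplified GDT_TS-style score between 0 and 100.

    Both arguments are lists of (x, y, z) positions, one per amino acid.
    Assumption: the structures are already superimposed. The real test
    also searches for the superposition that gives the best score.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and the same length")

    # How far each predicted amino acid sits from its experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # For each cutoff, the fraction of amino acids placed within that distance.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the four fractions and rescale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: the prediction is off by
# 0.5, 1.5, 3.0, and 10.0 angstroms at successive positions.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = gdt_ts_like(predicted, experimental)
print(score)  # 56.25
```

A perfect prediction scores 100 under this sketch, and a prediction with every amino acid more than 8 angstroms out of place scores zero. Averaging over several cutoffs rewards predictions that are roughly right everywhere as well as those that are very precise in places.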

In 2020, DeepMind, a UK-based AI company, entered the competition with a deep learning system called AlphaFold, which had been trained on enormous swaths of protein sequence and structural data. Drawing on what it learned from 170,000 known proteins, AlphaFold was able to piece together amino acid links in small pockets before connecting the pieces in ways that made structural sense. The result? Excellent scores: a median of 92.4 overall, and an impressive 87 on the most challenging proteins, 25 points ahead of the rest of the field.

Future Impacts of AI

Machine learning and AI have accomplished what decades of research with techniques such as X-ray crystallography were still working toward: accurate structure predictions for many of the world’s most mysterious proteins. As a condition of participation, DeepMind has released information about the AlphaFold method. Even now, others may be adapting their strategies to use AI to analyze and predict the protein structures of new disease-causing pathogens. The result may be a quicker path to therapeutic compounds that interact with those proteins.

While many questions remain unanswered, one thing is sure: AI has played a critical role in unraveling a problem that has been around for over 50 years. There’s no telling what problems it could tackle next.