DeepMind's artificial intelligence can predict the structure of almost any known protein
As of today, nearly the entire three-dimensional universe of known proteins can be explored at a glance. One year after its initial release, the AlphaFold database, built by Google's DeepMind and the European Molecular Biology Laboratory (EMBL) around DeepMind's artificial intelligence system for predicting protein structures, has grown more than 200-fold: the 3D structures of over 200 million proteins have been released and made openly available to the scientific community. Almost every protein known to science is now in the database. Since its launch, AlphaFold has helped accelerate research in many areas of the life sciences, from developing malaria vaccines to studying ways to eliminate plastic pollution.
The importance of protein structure
In proteins, which are among the basic building blocks of life, three-dimensional structure is closely tied to function. Although proteins are built from only 20 different units, the amino acids, arranged in precise sequences, millions of different proteins exist in nature, each with its own characteristics and functions. This is possible because amino acids arrange themselves in space in different ways depending on their chemical properties, so that different amino acid sequences give rise to different structures. Knowing the three-dimensional configuration of a protein is fundamental in research, because it provides information about the protein's function and about how to modify, block or regulate it. Over the years, the study of 3D protein structures has proven useful in many areas of the life sciences, such as the discovery of new drugs.
Yet while a protein's amino acid sequence is fairly easy to determine (the sequences of the vast majority of known proteins are collected in a dedicated database), deriving the three-dimensional structure from it is far from straightforward. This is generally done with experimental techniques, which are complex and time-consuming. Computational prediction was already possible with earlier bioinformatics methods, but only for limited parts of a protein. What researchers were looking for was a way to obtain, starting from the amino acid sequence alone, a reliable, high-resolution prediction of the structure of the whole protein.
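To put the point about "only 20 amino acids, millions of proteins" into numbers, here is a minimal sketch of the underlying arithmetic. It assumes that any of the 20 amino acids can occupy any position in a chain, which is a simplification for illustration; the function name `possible_sequences` is hypothetical and does not come from AlphaFold or any library.

```python
# Illustrative arithmetic only: counts amino acid sequences, not real proteins.
AMINO_ACID_COUNT = 20  # the 20 amino acids mentioned in the article


def possible_sequences(length: int) -> int:
    """Return how many distinct sequences of exactly `length` amino acids exist,
    assuming every position can hold any of the 20 amino acids."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return AMINO_ACID_COUNT ** length


if __name__ == "__main__":
    # Even very short chains allow an enormous number of distinct sequences.
    for length in (2, 5, 10):
        print(f"{length} residues: {possible_sequences(length):,} possible sequences")
```

Since the number of possible sequences grows by a factor of 20 with every added position, a small alphabet of building blocks is no obstacle to the diversity of proteins found in nature.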
"Googling" the structures
This is where the AlphaFold artificial intelligence comes in, thanks to a collaboration between DeepMind, which developed the algorithm, and EMBL. Drawing on machine learning, bioinformatics and structural biology, the team built a database that works, according to its creators, like a Google search: you enter the name, gene, amino acid sequence or source organism of the protein of interest, and the tool gives instant access to the predicted three-dimensional structure with atomic-level precision. This cuts the time scientists need to learn the likely conformation of the protein they are studying and speeds up their experimental work. DeepMind and EMBL launched the AlphaFold database in July 2021 with over 350,000 protein structure predictions, including those covering the entire human proteome. Subsequent updates added 27 new proteomes, bringing the database to roughly one million protein structures, which have been accessed by more than 500,000 researchers from over 190 countries.
"We have been surprised by the speed with which AlphaFold has already become an essential tool for hundreds of thousands of scientists in laboratories and universities around the world," said Demis Hassabis, founder and CEO of DeepMind. "AlphaFold now offers a three-dimensional view of the protein universe," added Edith Heard, director general of EMBL.
That was not the end of it: after a year of work the database has been updated and now holds over 200 million predicted structures, covering almost every organism on Earth whose genome has been sequenced (plants, bacteria, animals and other organisms) and opening new avenues for research in the life sciences. DeepMind and EMBL hope that the expanded database will accelerate the work of researchers tackling global challenges, from fighting neglected diseases to safeguarding the environment.
"This computational work represents an extraordinary advance on the protein structure problem, a challenge in biology that has stood for fifty years," commented Venki Ramakrishnan, winner of the 2009 Nobel Prize in Chemistry for his work on the structure and function of the ribosome. "It will be exciting to see the many ways in which it will radically change biological research."
AlphaFold in action
Since its launch, AlphaFold has already proven itself in the work of numerous research groups. For example, a team of scientists at the University of Oxford in the United Kingdom studied a protein considered one of the most promising candidates for a malaria vaccine, analyzing its structure to understand where the most effective antibodies could bind to block transmission of the parasite. The technology has been used in basic biology research and well beyond it: in studies on plastic pollution, Parkinson's disease, the health of honey bees, ice formation, neglected diseases (such as Chagas disease and leishmaniasis) and human evolution, touching nearly every aspect of the life sciences.
"In the last year alone, over a thousand articles have been published on a wide range of research topics using AlphaFold structures," said Sameer Velankar, team leader of the Protein Data Bank in Europe at EMBL-EBI. "And that's just the impact of one million predictions: imagine the impact of having over 200 million protein structure predictions openly accessible in the database."