A study shows that artificial intelligence can decipher the function of unknown proteins
This is the first study to demonstrate that these tools can classify previously unknown functions in great detail.
This collaborative work between two CSIC centers (CABD and IBE) allows for the identification of genes and exploration of proteins that may be of biomedical and biotechnological interest, among other research avenues.
A study conducted by the Andalusian Center for Developmental Biology (CABD-CSIC-UPO) together with the Institute of Evolutionary Biology (IBE: CSIC-UPF) in Barcelona has applied advanced artificial intelligence techniques to protein analysis. Using this methodology, the research team has shown that it is possible to identify and describe in detail what proteins do, even without prior information. The work makes it possible to apply these methods at scale to understand proteins in less-studied organisms, identify new gene functions, and explore which proteins may be of biomedical and biotechnological interest with much greater precision than traditional methods.
In nature, the information contained in DNA is used to build proteins, which are the molecules that act in cells. In this project, led by CABD researchers Ildefonso Cases and Ana M. Rojas, together with Rosa Fernández from the IBE, two deep learning-based methods were used to analyze proteins in various model organisms, such as yeast, mice, and fruit flies. The comparison showed that language models (Transformers) are more effective than convolutional networks, providing more accurate and more informative insights into the proteins of the studied species. Additionally, language models can retrieve functional information from RNA data (RNA is the molecule that carries DNA instructions for making proteins in cells).
"We are at a critical moment due to the enormous number of sequencing projects on unknown organisms, which produce millions of sequences whose function we cannot predict using traditional methods," explains Ana Rojas (CABD). This work opens up new research avenues related to higher precision in protein function analysis and classification models.
New lines of research
This new study, published in the journal ‘NAR Genomics and Bioinformatics’, lays the groundwork for the use of artificial intelligence in other applications. "These deep learning tools will allow us to tackle new problems in computational biology. We are working on applying these techniques to other objectives, such as custom promoters, single-cell annotation, or protein engineering."
Meanwhile, IBE researcher Rosa Fernández emphasizes that this research is crucial in the field of biodiversity, where new protein sequences with unknown functions are published daily, because it makes it possible to address the problem of dark proteome annotation. "To this end, we are using these tools on thousands of transcriptomes from the animal kingdom, work that is under review. The more information we have on the functions of new sequences, the faster we will decipher the molecular mechanisms of biological processes related to biodiversity and regeneration, with potential biotechnological (food industry) and biomedical (pharmaceutical industry) applications," concludes the researcher.
Referenced article:
Israel Barrios-Núñez, Gemma I Martínez-Redondo, Patricia Medina-Burgos, Ildefonso Cases, Rosa Fernández, Ana M Rojas, Decoding functional proteome information in model organisms using protein language models, NAR Genomics and Bioinformatics, Volume 6, Issue 3, September 2024, lqae078,
https://doi.org/10.1093/nargab/lqae078