Accurate prediction of protein function using statistics informed graph networks
2024-08-14
From:
Mabnus

Background

Knowing what a protein does is essential for explaining the mechanisms behind many key biological activities, and it has profound implications for medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and existing computational approaches rely heavily on protein structural information to predict functional annotations.

On August 4, 2024, researchers from the University of Oxford published a study titled "Accurate prediction of protein function using statistics-informed graph networks" in Nature Communications. The study presents PhiGnet, a statistics-informed learning method for annotating protein function and identifying functional sites. The method inherently characterizes evolutionary features, which allows a quantitative assessment of how important each residue is for a specific function. Compared with other methods, PhiGnet not only performs better but also narrows the sequence-function gap, even in the absence of structural information. The results demonstrate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both known properties and novel functions of proteins in research and biomedicine.


PhiGnet for protein function annotation

PhiGnet predicts protein function from sequence alone, without structural information. Given a sequence, it uses stacked graph convolutional networks (GCNs) to learn from a precomputed sequence embedding together with evolutionary couplings (EVCs) and residue communities (RCs), and from these it infers functional annotations. As an example, the researchers calculated the RCs for the serine-aspartate repeat-containing protein D (SdrD). The two RCs mapped onto a complete β-sheet fold that binds three Ca2+ ions, and most residues identified from the EVCs work together with those ions to stabilize the SdrD fold. This suggests that although EVCs are sparsely distributed within the RCs, they carry essential information for inferring the functional roles of residues.

PhiGnet was also used to calculate activation scores for functional sites in the MglA protein. An activation score measures the importance of a residue: the higher the score, the more likely it is that the residue plays a functional role in biological activity.
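To make the idea of a graph convolution over residues more concrete, here is a minimal sketch of one generic GCN layer in plain Python. It is not PhiGnet's implementation: the normalization rule (the common symmetric form with self-loops), the toy coupled pairs, the two-dimensional "embeddings", and the weights are all assumptions chosen for illustration. Residues are nodes, and an edge stands for a hypothetical coupled residue pair.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    # Start from the identity so every residue keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Toy input (all values are made up for illustration):
# 4 residues, with hypothetical coupled pairs (0,1), (1,2), (2,3).
edges = [(0, 1), (1, 2), (2, 3)]
a_hat = normalized_adjacency(4, edges)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[0.5, -0.5], [0.25, 0.75]]

# Each residue's new representation mixes its own features with its neighbours'.
hidden = gcn_layer(a_hat, features, weights)
```

Stacking means feeding `hidden` into another `gcn_layer` call with a second weight matrix, so that information travels further along the graph with each layer.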


PhiGnet annotation of protein functional sites

Because the functional contribution of an amino acid can vary significantly from one function to another, a key feature of PhiGnet is its ability to quantify the importance of individual amino acids for a specific function, which makes it possible to identify residues associated with distinct biological activities. The researchers used PhiGnet to assess residue importance in nine proteins. The activation scores pointed to essential ligand- and ion-contacting residues, demonstrating that learning from different levels of evolutionary knowledge can identify binding interfaces at the residue level.
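As a rough illustration of how per-residue activation scores might be read, the sketch below ranks residues and keeps those at or above a cutoff. The function name `top_residues`, the 0.5 cutoff, the assumption that scores lie between 0 and 1, and the example sequence and scores are all hypothetical and are not taken from the study.

```python
def top_residues(sequence, scores, threshold=0.5):
    """Return (position, residue, score) for residues scoring at or above
    the threshold, highest score first. Positions are 1-based."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    hits = [(i + 1, aa, s)
            for i, (aa, s) in enumerate(zip(sequence, scores))
            if s >= threshold]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Made-up six-residue sequence and made-up scores, for illustration only.
sequence = "MKTGDS"
scores = [0.05, 0.92, 0.20, 0.71, 0.10, 0.33]

candidates = top_residues(sequence, scores)
for position, residue, score in candidates:
    print(f"{residue}{position}\t{score:.2f}")
```

In practice the cutoff is a judgment call: a lower value recovers more candidate sites at the cost of more false positives.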


PhiGnet outperforms other methods

On two test sets, PhiGnet showed strong predictive power in assigning functional annotations, including accurate Enzyme Commission (EC) numbers. In predicting Gene Ontology (GO) terms, it ranked first among the compared methods in both accuracy and robustness. It also generalizes robustly when tested on proteins at different sequence identity thresholds relative to the training set. Because PhiGnet predicts function from the amino acid sequence without structural information and quantifies the importance of each residue for a specific function, it is well suited to identifying functionally critical sites.


PhiGnet driven by evolutionary traits

Through extensive experiments, the researchers found that the residues predicted by PhiGnet are consistent with experimentally determined functional sites. Evolutionary information, particularly that contained in RCs, is sufficient to assign protein function and to quantitatively characterize residues at functional sites. The results also indicate that RCs capture evolutionary knowledge at a higher level, while EVCs capture it at a lower level, and that the information in RCs contributes substantially to PhiGnet's ability to identify functionally relevant sites at the residue level.


The researchers also applied PhiGnet to several proteins of unknown function. Experimental verification showed that its high-confidence predictions were highly consistent with the experimental annotations, suggesting that the method will be useful in computational efforts to annotate proteins that currently have no functional labels.

Summary

The study presents PhiGnet, a statistics-informed learning method for annotating protein function and identifying functional sites. The method inherently captures evolutionary features, which allows a quantitative assessment of how important each residue is for a specific function. PhiGnet outperformed other methods and was able to quantify the importance of individual residues for specific functions even in the absence of structural data. The results demonstrate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both known properties and new functions of proteins in research and biomedicine.
