EGRET

This repository contains the official implementation of our paper "EGRET: Edge Aggregated GRaph Attention NETworks and Transfer Learning Improve Protein-Protein Interaction Site Prediction" published in the journal Briefings in Bioinformatics.

If you use any part of this repository, we would be grateful if you cite our paper.

Usage

PyTorch and DGL installation

We implemented our method using PyTorch and Deep Graph Library (DGL). Please install both before running our code. Installation instructions are available at the following links:

  1. Python 3.9.x
  2. PyTorch 2.2.x
  3. Deep Graph Library 2.3.x
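
After installation, the installed versions can be compared against the list above. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it assumes the packages were installed under the usual distribution names "torch" and "dgl".

```python
from importlib import metadata
import sys

# Major.minor versions from the list above. The distribution names
# "torch" and "dgl" are an assumption about how the packages were installed.
EXPECTED = {"torch": "2.2", "dgl": "2.3"}


def check_environment(expected=EXPECTED):
    """Return {name: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, prefix in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and installed.startswith(prefix + ".")
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "check version"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

A "check version" result only means the installed version differs from the one listed here; it does not by itself mean the code will fail.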

Download pretrained-model weights:

ProtBERT model weight

  1. Please download the pretrained model weight-file "pytorch_model.bin" from here.
  2. Place this weight-file in the folder "EGRET/inputs/ProtBert_model". If you use this pretrained model in a publication, please cite the paper "ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing".

EGRET model weight

  1. Please download the pretrained model weight-file "egret_model_weight.dat" from here.
  2. Place this weight-file in the folder "EGRET/models".
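
Before running inference, it can help to confirm that both weight files are where the steps above put them. This is a small sketch under the assumption that "EGRET" is the path of your local clone; the helper name `missing_weights` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative locations taken from the download steps above.
REQUIRED_WEIGHTS = (
    Path("inputs") / "ProtBert_model" / "pytorch_model.bin",
    Path("models") / "egret_model_weight.dat",
)


def missing_weights(repo_root):
    """Return the required weight files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_WEIGHTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # "EGRET" is assumed to be the path of the local clone.
    for rel in missing_weights("EGRET"):
        print("missing:", rel)
```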

Input Data

To provide the input data, navigate to the folder "EGRET/inputs" and follow these steps:

  1. Store the PDB files of the isolated proteins to be used for prediction in the folder "pdb_files". Rename the PDB files in the format "<an arbitrary name>_<chain IDs>". Please see the example PDB files provided in this folder. The chain IDs after the underscore ("_") must be the real chain IDs, as they appear in the PDB file. (In the provided examples, <an arbitrary name> is the PDB ID of a complex in which the input protein is one of the subunits. This is not mandatory.)
  2. List all the protein-names in the file "protein_list.txt".
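
The two steps above can be sketched as a naming check followed by writing the list. This is an illustration, not the repository's own tooling, and it rests on assumptions you should compare against the example files: that the PDB files carry a ".pdb" extension, that chain IDs are alphanumeric, and that "protein_list.txt" holds one name per row without the extension. The function names are hypothetical.

```python
import re
from pathlib import Path

# "<an arbitrary name>_<chain IDs>"; chain IDs are assumed to be alphanumeric.
NAME_PATTERN = re.compile(r"^(?P<name>.+)_(?P<chains>[A-Za-z0-9]+)$")


def collect_protein_names(pdb_dir):
    """Split the stems of the .pdb files in pdb_dir into (valid, invalid) lists."""
    valid, invalid = [], []
    for path in sorted(Path(pdb_dir).iterdir()):
        # Assumption: input structures use the ".pdb" extension.
        if not path.is_file() or path.suffix.lower() != ".pdb":
            continue
        stem = path.stem  # assumption: names are listed without the extension
        (valid if NAME_PATTERN.match(stem) else invalid).append(stem)
    return valid, invalid


def write_protein_list(names, list_path):
    """Write one protein name per row."""
    Path(list_path).write_text("".join(f"{name}\n" for name in names))
```

For example, `valid, invalid = collect_protein_names("EGRET/inputs/pdb_files")` followed by `write_protein_list(valid, "EGRET/inputs/protein_list.txt")` would list every correctly named file; anything returned in `invalid` needs renaming first.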

Run inference to predict the numeric interaction propensity of each residue

  1. From the command line, navigate to the folder "EGRET" (where the "run_egret.py" file is located).
  2. Run the following command:

    python run_egret.py

  3. The command above will generate the results in the "EGRET/outputs" folder.

Output format

  1. The output generated by running EGRET is stored as a pickle file in the "EGRET/outputs" folder. To open the pickle file, run the following commands in the Python interpreter:

    import pickle
    # Open the file in a context manager so that it is closed after loading.
    with open('EGRET/outputs/prediction_and_attention_scores.pkl', 'rb') as f:
        output = pickle.load(f)

  2. In the above commands, the "output" variable is a Python dictionary with four keys: 'pred', 'protein_info', 'edges', 'attention_scores'.
  • To access the predicted numeric propensity, please run the following commands:

    prediction = output['pred']  
    protein_index = 0  
    print(prediction[protein_index])  

    In the above commands, "protein_index" is the index of the protein-name in the "protein_list.txt" file. You can set it to any valid index; e.g., for the protein-name at index 2 (the third row of "protein_list.txt"), set protein_index=2.
    These commands print the predicted numeric propensities of all the residues in the protein at index "0" of the "protein_list.txt" file, in the order in which the residues appear in the input PDB file of this protein.

  • To access general information about the input proteins, please run the following commands:

    protein_information = output['protein_info']  
    protein_index = 0  
    print(protein_information[protein_index])    

    These commands will print a Python dictionary corresponding to the protein at index "0" of the "protein_list.txt" file. This dictionary contains the number of residues in the protein, stored under the key 'seq_length'.

  • To access the edges of the graph representations of the input proteins, please run the following commands:

    graph_edges = output['edges']  
    protein_index = 0  
    print(graph_edges[protein_index])  

    These commands will print a numpy array corresponding to the protein at index "0" of the "protein_list.txt" file. Each row of this array is a neighborhood: it contains the indices of the neighboring nodes (residues) of one residue, the center of the neighborhood (please see our paper for more details). The row index is the index of the center residue. The following example command prints the neighborhood (neighboring residue indices) of the residue with index 2:

    center_node = 2  
    print(graph_edges[protein_index][center_node])  
  • To access the attention scores associated with the edges, please run the following commands:

    attention_scores = output['attention_scores']  
    protein_index = 0  
    print(attention_scores[protein_index])  

    These commands will print a numpy array corresponding to the protein at index "0" of the "protein_list.txt" file. Each row of this array contains the attention scores associated with the edges of the corresponding neighborhood. In the following example, the center of the neighborhood is the residue at index 2, and the command prints the attention scores associated with the edges from its neighboring residues (nodes) to this residue:

    center_node = 2  
    print(attention_scores[protein_index][center_node])  
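
The lookups above can be combined into a few small helpers. The following is a minimal sketch, not part of the repository; the function names are hypothetical. It assumes one scalar propensity per residue, and that the i-th attention score in a row belongs to the i-th neighbor in the same row of 'edges'. Please check both assumptions against your own output before relying on it.

```python
import pickle

# The four keys documented above.
EXPECTED_KEYS = {"pred", "protein_info", "edges", "attention_scores"}


def load_output(path):
    """Load the pickle and check that the four documented keys are present."""
    with open(path, "rb") as handle:
        output = pickle.load(handle)
    missing = EXPECTED_KEYS - set(output)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return output


def top_residues(propensities, k=5):
    """Return the indices of the k highest-propensity residues, best first."""
    # Assumption: each entry converts to a single float.
    scores = [float(p) for p in propensities]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def neighbors_by_attention(edges, attention, center_node):
    """Pair each neighbor of center_node with its attention score, highest first."""
    # Assumption: edges[center_node][i] and attention[center_node][i]
    # describe the same edge.
    pairs = zip(edges[center_node], attention[center_node])
    return sorted(
        ((int(n), float(a)) for n, a in pairs),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

For example, with `output = load_output('EGRET/outputs/prediction_and_attention_scores.pkl')`, calling `top_residues(output['pred'][0])` gives the highest-scoring residue indices of the first protein, and `neighbors_by_attention(output['edges'][0], output['attention_scores'][0], 2)` lists the neighbors of residue 2 ordered by attention score.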

Please reach out if you face any issues while trying to run the code!

Citation

Sazan Mahbub, Md Shamsuzzoha Bayzid, EGRET: edge aggregated graph attention networks and transfer learning improve protein–protein interaction site prediction, Briefings in Bioinformatics, 2022, bbab578, https://doi.org/10.1093/bib/bbab578

BibTeX:

@article{10.1093/bib/bbab578,
    author = {Mahbub, Sazan and Bayzid, Md Shamsuzzoha},
    title = "{EGRET: edge aggregated graph attention networks and transfer learning improve protein–protein interaction site prediction}",
    journal = {Briefings in Bioinformatics},
    year = {2022},
    month = {01},
    abstract = "{Protein–protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET’s network behavior to provide insights about the causes of its decisions. EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET.}",
    issn = {1477-4054},
    doi = {10.1093/bib/bbab578},
    url = {https://doi.org/10.1093/bib/bbab578},
    note = {bbab578},
    eprint = {https://academic.oup.com/bib/advance-article-pdf/doi/10.1093/bib/bbab578/42350487/bbab578.pdf},
}
