Software to predict the occurrence of programmed ribosomal frameshifting in bacterial, phage, and viral genomes

deprekate/prfect

prfect

PRFect is a tool to predict programmed ribosomal frameshifting in eukaryotic, prokaryotic, and viral genomes

The published manuscript is available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-024-05701-0


PRFect takes as input the genome and its annotated CoDing Sequences (CDS) as a GenBank file.
       *  If you only have a FASTA file, we recommend our new gene caller Genotate, which is
          the only gene caller that can call gene fragments.

PRFect searches through a GenBank file looking for 8 different slippery site motifs associated with backwards (-1) frameshifts and two motifs associated with forward (+1) frameshifts. When a motif is encountered, various cellular properties and factors are assessed and a prediction is made whether the site is involved in programmed ribosomal frameshifting.
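
As a rough illustration of the motif search, the sketch below scans a sequence for one heptamer pattern: three identical bases, then three identical bases, then any base (XXXYYYZ). This pattern matches the tttaaac and gggaaag sites reported in the examples in this README. The function name and the pattern definition are assumptions made for illustration; PRFect's actual motif definitions and scoring are not reproduced here.

```python
import re

# Hypothetical pattern (an assumption, not PRFect's definition):
# XXX YYY Z, wrapped in a lookahead so overlapping sites are also reported.
HEPTAMER = re.compile(r"(?=(([acgt])\2{2}([acgt])\3{2}[acgt]))")

def find_heptamers(seq):
    """Return (0-based start, heptamer) for every XXXYYYZ site in seq."""
    seq = seq.lower()
    return [(m.start(), m.group(1)) for m in HEPTAMER.finditer(seq)]

print(find_heptamers("ccgtttaaacgg"))  # [(3, 'tttaaac')]
```

A real predictor would go on to score each hit; this only covers the initial scan for candidate sites.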


To install:

python3 -m pip install prfect

To run:

prfect.py input.gbk

An example genome for SARS-CoV-2 is provided in the test folder. The SARS-CoV-2 genome contains 12 genes, the first of which is a PRF gene and is denoted as such with the join keyword. Any gene in the input GenBank file that already uses the join keyword is split into its two parts, predicted anew, and then tagged with the /label=1 feature tag to indicate a TruePositive. When the genome is run through PRFect, the known PRF gene is correctly predicted to use programmed ribosomal frameshifting.

$ prfect.py test/covid19.gbk 

     CDS             join(266..13468,13468..21555)
                     /ribosomal_slippage
                     /direction=-1
                     /motif=is_threethree
                     /slippery_sequence=tttaaac
                     /label=1
                     /locus=NC_045512
                     /product="ORF1ab polyprotein"
                     /product="ORF1ab polyprotein"
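
The join location itself encodes the shift: the second segment starts on the same base where the first one ends, so that base is read twice, which corresponds to a -1 frameshift. Here is a minimal sketch of that arithmetic. The helper is hypothetical, is not part of PRFect, and handles only simple two-part forward-strand joins.

```python
import re

def shift_from_join(location):
    """Infer the frameshift from a two-part join(a..b,c..d) location string.

    Hypothetical helper: a contiguous gene would have c == b + 1,
    so the shift is taken to be c - (b + 1).
    """
    m = re.fullmatch(r"join\((\d+)\.\.(\d+),(\d+)\.\.(\d+)\)", location.strip())
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    _, first_end, second_start, _ = map(int, m.groups())
    return second_start - (first_end + 1)

print(shift_from_join("join(266..13468,13468..21555)"))  # -1
```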

Another example is bacteriophage lambda, whose tail assembly chaperone gene (geneG and geneGT) is known to frameshift. The current GenBank annotation file (NC_001416) does not denote the gene with the join keyword, so the two pieces appear as separate CDS features. When the genome is run through PRFect, the two pieces are correctly identified as a single PRF gene, with /label=0 to indicate that it is an UnknownPositive.

$ prfect.py test/lambda.gbk

     CDS             join(9711..10115,10115..10549)
                     /ribosomal_slippage
                     /direction=-1
                     /motif=is_threethree
                     /bases=gggaaag
                     /label=0
                     /locus=NC_001416
                     /product="minor tail protein G"
                     /product="tail assembly protein T"

You can list every slippery site that PRFect checked, both to confirm that a given site was evaluated and to see whether there were any near hits. The --dump flag shows the calculated cellular properties at each potential slippery site:

$ prfect.py test/lambda.gbk --dump | head
LOCUS      SLIPSITE   LOC  LABEL  N  DIR RBS1 RBS2  A0     A1     LF50    HK50    LF100   HK100  PRED  PROB  MOTIF
NC_001416  gcaaaacgc  4278   0  159   1   13   1.8  0.015  0.025  -0.24   -0.236  -0.523  -0.306   0    1.0  three
NC_001416  ggaaagtgt  10115  0   18  -1    2     0  0.004  0.024  -0.313  -0.287  -0.668  -0.404  -1   0.88  threethree  
NC_001416  gcgaaagca  31034  0   30   1    2   1.0  0.029  0.032  -0.282  -0.243  -0.477  -0.326   0    1.0  three
NC_001416  tggaaacgc  33370  0   72   1    1     0  0.015  0.028  -0.124  -0.118  -0.482  -0.36    0    1.0  three
NC_001416  cgtaaatta  33388  0   90   1    0     0  0.009  0.012  -0.15   -0.138  -0.291  -0.237   0    1.0  three
NC_001416  gcagggtgg  33442  0  144   1    0     0  0.017  0.021  -0.092  -0.039  -0.388  -0.274   0    1.0  three
NC_001416  gaaaaggag  42081  0   42  -1    0     0  0.027  0.013  -0.246  -0.149  -0.176  -0.105   0    1.0  twofour
NC_001416  aaaaccttc  42206  0   66  -1    0     0  0.015  0.014  -0.403  -0.266  -0.367  -0.249   0    1.0  fivetwo
NC_001416  cgaaaaaat  43240  0    6   1    2     0  0.019  0.023  -0.513  -0.245  -0.395  -0.294   0   0.98  four

The columns are:

LOCUS     id of the sequence
SLIPSITE  bases of the slippery site
LOC       location of the slippery site within the sequence (base position)
LABEL     whether the slippery site is already annotated: 0 not a joined gene, 1 a joined gene, -1 a joined gene but is >10bp away 
N         distance of the slippery site from the in-frame stop codon
DIR       direction of the shift
RBS1      Prodigal like ribosomal binding site interference score
RBS2      RAST like ribosomal binding site interference score
A0        frequency of the A-site codon usage in all genes
A1        frequency of the +1 A-site codon usage in all genes
LF50      normalized LinearFold minimum free energy calculation of the downstream 50bp window
HK50      normalized HotKnots minimum free energy calculation of the downstream 50bp window
LF100     normalized LinearFold minimum free energy calculation of the downstream 100bp window
HK100     normalized HotKnots minimum free energy calculation of the downstream 100bp window
PRED      type of shift predicted by PRFect to occur: -1 backwards, 0 no shift, +1 forwards
PROB      how sure PRFect was for the predicted (PRED) type
MOTIF     slippery sequence motif
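
Because the --dump output is a whitespace-delimited table, it is straightforward to post-process. The sketch below is not part of PRFect. It parses two rows copied from the lambda dump into dictionaries and keeps the rows where PRED is nonzero. The function name parse_dump is hypothetical, and the code assumes that no column value contains spaces.

```python
def parse_dump(text):
    """Parse whitespace-delimited --dump text into a list of row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]

# Header and two rows copied from the lambda example above
dump = """\
LOCUS      SLIPSITE   LOC  LABEL  N  DIR RBS1 RBS2  A0     A1     LF50    HK50    LF100   HK100  PRED  PROB  MOTIF
NC_001416  gcaaaacgc  4278   0  159   1   13   1.8  0.015  0.025  -0.24   -0.236  -0.523  -0.306   0    1.0  three
NC_001416  ggaaagtgt  10115  0   18  -1    2     0  0.004  0.024  -0.313  -0.287  -0.668  -0.404  -1   0.88  threethree
"""

rows = parse_dump(dump)
# Keep only the sites where a shift was predicted
predicted = [r for r in rows if int(r["PRED"]) != 0]
for r in predicted:
    print(r["LOCUS"], r["LOC"], r["PRED"], r["PROB"], r["MOTIF"])
```

To process a full run, the same function could be fed the text read from standard input instead of the inline string.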

You can also use the -s flag to scale the MFE calculations to account for extreme GC content, temperature, or salinity:

$ prfect.py test/lambda.gbk -s 1.5 --dump | head -n 2
LOCUS      SLIPSITE   LOC  LABEL  N  DIR RBS1 RBS2  A0     A1     LF50    HK50    LF100   HK100  PRED  PROB  MOTIF
NC_001416  gcaaaacgc  4278   0  159   1   13   1.8  0.015  0.025  -0.36   -0.354  -0.785  -0.459   0    1.0  three
NC_001416  ggaaagtgt  10115  0   18  -1    2     0  0.004  0.024  -0.47   -0.431  -1.002  -0.606  -1  0.999  threethree  

Notice that the MFE values are 50% larger in magnitude than in the dump above (each is multiplied by 1.5). This also made the trained model more confident in the backward -1 prediction (PRED) at location (LOC) 10115: PROB rose from 0.88 to 0.999.
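
The numbers in the two dumps are consistent with a plain multiplication of each MFE value by the -s factor. The check below is a sketch under that assumption, since this README does not show how PRFect applies the factor internally. It compares the LOC 10115 values from both dumps, with a tolerance for the rounding in the displayed values.

```python
import math

scale = 1.5
# LF50, HK50, LF100, HK100 for LOC 10115, copied from the two dumps above
unscaled = [-0.313, -0.287, -0.668, -0.404]
scaled = [-0.47, -0.431, -1.002, -0.606]

# Displayed values are rounded, so compare with a small absolute tolerance
consistent = all(
    math.isclose(before * scale, after, abs_tol=1e-3)
    for before, after in zip(unscaled, scaled)
)
print(consistent)  # True
```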
