Critical Assessment of Structure Prediction (CASP), sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment in protein structure prediction that has taken place every two years since 1994.[1][2] CASP gives research groups an opportunity to test their structure prediction methods objectively and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users. Although the primary goal of CASP is to help advance methods of identifying a protein's three-dimensional structure from its amino acid sequence, many view the experiment more as a "world championship" in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis, and it is not uncommon for entire groups to suspend their other research for months while they focus on getting their servers ready for the experiment and on performing the detailed predictions.

A target structure (ribbons) and 354 template-based predictions superimposed (gray Calpha backbones); from CASP8

Selection of target proteins


To ensure that no predictor has prior information about a protein's structure that would put them at an advantage, the experiment is conducted in a double-blind fashion: neither the predictors nor the organizers and assessors know the structures of the target proteins at the time when predictions are made. Targets for structure prediction are either structures soon to be solved by X-ray crystallography or NMR spectroscopy, or structures that have just been solved (mainly by one of the structural genomics centers) and are kept on hold by the Protein Data Bank. If the given sequence is found to be related by common descent to a protein sequence of known structure (called a template), comparative protein modeling may be used to predict the tertiary structure. Templates can be found using sequence alignment methods (e.g. BLAST or HHsearch) or protein threading methods, which are better at finding distantly related templates. Otherwise, de novo protein structure prediction must be applied (e.g. Rosetta), which is much less reliable but can sometimes yield models with the correct fold (usually for proteins of fewer than 100–150 amino acids). Truly new folds are becoming quite rare among the targets,[3][4] making that category smaller than desirable.
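The choice between comparative modeling and de novo prediction described above can be illustrated with a minimal Python sketch. It is not part of any CASP tooling: the TemplateHit record, the function names, and the E-value cutoff of 1e-3 are all assumptions made for illustration, and the step of running and parsing a search tool such as BLAST or HHsearch is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    """Hypothetical record for one search hit against known structures."""
    template_id: str  # illustrative identifier of the known structure
    e_value: float    # significance reported by the search tool


def best_template(hits: List[TemplateHit],
                  max_e_value: float = 1e-3) -> Optional[TemplateHit]:
    """Return the most significant hit, or None if no hit passes the cutoff.

    The default cutoff is an illustrative assumption, not a CASP rule.
    """
    significant = [h for h in hits if h.e_value <= max_e_value]
    if not significant:
        return None
    return min(significant, key=lambda h: h.e_value)


def choose_approach(hits: List[TemplateHit],
                    max_e_value: float = 1e-3) -> str:
    """Pick comparative modeling when a usable template exists."""
    if best_template(hits, max_e_value) is not None:
        return "comparative modeling"
    return "de novo prediction"


if __name__ == "__main__":
    hits = [TemplateHit("templateA", 2e-20), TemplateHit("templateB", 0.5)]
    print(choose_approach(hits))  # a significant template is available
    print(choose_approach([]))    # no template found
```

In practice the boundary is not this sharp: threading methods may find distant templates that a simple significance cutoff on sequence search results would miss.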

Evaluation


The primary method of evaluation[5] is a comparison of the predicted model's α-carbon positions with those in the target structure. The comparison is shown visually by cumulative plots of distances between pairs of equivalent α-carbons in the alignment of the model and the structure, as shown in the figure (a perfect model would stay at zero all the way across), and is assigned a numerical score, GDT-TS (Global Distance Test, Total Score), describing the percentage of well-modeled residues in the model with respect to the target.[6] Free modeling (template-free, or de novo) is also evaluated visually by the assessors, since the numerical scores do not work as well for finding loose resemblances in the most difficult cases.[7] High-accuracy template-based predictions were evaluated in CASP7 by whether they worked for molecular-replacement phasing of the target crystal structure,[8] with successes followed up later,[9] and in CASP8 by full-model (not just α-carbon) quality and full-model match to the target.[10]
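As a rough illustration of the numerical score, the following Python sketch computes a simplified GDT-TS as the average of the percentages of α-carbons lying within 1, 2, 4 and 8 Å of their target positions. It assumes the model and target are already aligned residue by residue and superimposed once; the actual GDT procedure searches over many superpositions to maximize the count at each cutoff, so this simplified version can only underestimate the real score. The function names and example coordinates are made up for illustration.

```python
import math

# Distance cutoffs in angstroms averaged by GDT-TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def ca_distances(model, target):
    """Distances between equivalent alpha-carbons.

    Assumes both coordinate lists are in the same residue order and
    already superimposed (a simplification of the real procedure).
    """
    if len(model) != len(target):
        raise ValueError("model and target must have the same length")
    return [math.dist(m, t) for m, t in zip(model, target)]


def fraction_within(distances, cutoff):
    """Fraction of residues whose distance is at most `cutoff`."""
    return sum(d <= cutoff for d in distances) / len(distances)


def gdt_ts(distances):
    """Simplified GDT-TS on a 0-100 scale for one fixed superposition."""
    total = sum(fraction_within(distances, c) for c in GDT_TS_CUTOFFS)
    return 100.0 * total / len(GDT_TS_CUTOFFS)


if __name__ == "__main__":
    # Made-up coordinates: four residues displaced by 0.5, 1.5, 3 and 9 angstroms.
    target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.5), (1.0, 0.0, 1.5), (2.0, 0.0, 3.0), (3.0, 0.0, 9.0)]
    print(gdt_ts(ca_distances(model, target)))  # 56.25
```

In the example, 25%, 50%, 75% and 75% of residues fall within the four cutoffs, giving an average of 56.25. A perfect model would score 100.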

Evaluation of the results is carried out in the following prediction categories:

The tertiary structure prediction category was further subdivided into:

  • homology modeling
  • fold recognition (also called protein threading; note that this naming is incorrect as threading is a method)
  • de novo structure prediction, now referred to as 'New Fold', because many methods apply evaluation or scoring functions (such as artificial neural networks) that are biased by knowledge of native protein structures.

Starting with CASP7, categories have been redefined to reflect developments in methods. The 'template-based modeling' category includes all former comparative modeling, homologous fold-based models and some analogous fold-based models. The 'template-free modeling (FM)' category includes models of proteins with previously unseen folds and hard analogous fold-based models. Because template-free targets are quite rare, the so-called CASP ROLL was introduced in 2011. This continuous (rolling) CASP experiment aims at more rigorous evaluation of template-free prediction methods through assessment of a larger number of targets outside of the regular CASP prediction season. Unlike LiveBench and EVA, this experiment is in the blind-prediction spirit of CASP, i.e. all the predictions are made on yet unknown structures.[11]

The CASP results are published in special supplement issues of the scientific journal Proteins, all of which are accessible through the CASP website.[12] A lead article in each of these supplements describes specifics of the experiment[13][14] while a closing article evaluates progress in the field.[15][16]

AlphaFold


In December 2018, CASP13 made headlines when it was won by AlphaFold, an artificial intelligence program created by DeepMind.[17] In November 2020, an improved version 2 of AlphaFold won CASP14.[18] According to John Moult, one of CASP's co-founders, AlphaFold scored around 90 on a 100-point scale of prediction accuracy for moderately difficult protein targets.[19] AlphaFold was made open source in 2021. DeepMind did not enter CASP15 in 2022, but virtually all of the high-ranking teams used AlphaFold or modifications of it.[20]


References

  1. ^ "Home - CASP15". predictioncenter.org. Retrieved 2022-12-14.
  2. ^ Moult J, Pedersen JT, Judson R, Fidelis K (November 1995). "A large-scale experiment to assess protein structure prediction methods". Proteins. 23 (3): ii–v. doi:10.1002/prot.340230303. PMID 8710822. S2CID 11216440.
  3. ^ Tress ML, Ezkurdia I, Richardson JS (2009). "Target domain definition and classification in CASP8". Proteins. 77 Suppl 9 (Suppl 9): 10–7. doi:10.1002/prot.22497. PMC 2805415. PMID 19603487.
  4. ^ Zhang Y, Skolnick J (January 2005). "The protein structure prediction problem could be solved using the current PDB library". Proceedings of the National Academy of Sciences of the United States of America. 102 (4): 1029–34. Bibcode:2005PNAS..102.1029Z. doi:10.1073/pnas.0407152101. PMC 545829. PMID 15653774.
  5. ^ Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A (2009). "Evaluation of template-based models in CASP8 with standard measures". Proteins. 77 Suppl 9 (Suppl 9): 18–28. doi:10.1002/prot.22561. PMC 4589151. PMID 19731382.
  6. ^ Zemla A (July 2003). "LGA: A method for finding 3D similarities in protein structures". Nucleic Acids Research. 31 (13): 3370–4. doi:10.1093/nar/gkg571. PMC 168977. PMID 12824330.
  7. ^ Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y (2009). "Assessment of CASP8 structure predictions for template free targets". Proteins. 77 Suppl 9 (Suppl 9): 50–65. doi:10.1002/prot.22591. PMID 19774550. S2CID 16517118.
  8. ^ Read RJ, Chavali G (2007). "Assessment of CASP7 predictions in the high accuracy template-based modeling category". Proteins. 69 Suppl 8 (Suppl 8): 27–37. doi:10.1002/prot.21662. PMID 17894351. S2CID 33172629.
  9. ^ Qian B, Raman S, Das R, Bradley P, McCoy AJ, Read RJ, Baker D (November 2007). "High-resolution structure prediction and the crystallographic phase problem". Nature. 450 (7167): 259–64. Bibcode:2007Natur.450..259Q. doi:10.1038/nature06249. PMC 2504711. PMID 17934447.
  10. ^ Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, et al. (2009). "The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models". Proteins. 77 Suppl 9 (Suppl 9): 29–49. doi:10.1002/prot.22551. PMC 2877634. PMID 19731372.
  11. ^ Kryshtafovych A, Monastyrskyy B, Fidelis K (February 2014). "CASP prediction center infrastructure and evaluation measures in CASP10 and CASP ROLL". Proteins. 82 Suppl 2 (2): 7–13. doi:10.1002/prot.24399. PMC 4396618. PMID 24038551.
  12. ^ "CASP Proceedings".
  13. ^ Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A (2007). "Critical assessment of methods of protein structure prediction-Round VII". Proteins. 69 Suppl 8 (Suppl 8): 3–9. doi:10.1002/prot.21767. PMC 2653632. PMID 17918729.
  14. ^ Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A (2009). "Critical assessment of methods of protein structure prediction - Round VIII". Proteins. 77 Suppl 9 (Suppl 9): 1–4. doi:10.1002/prot.22589. PMID 19774620. S2CID 9704851.
  15. ^ Kryshtafovych A, Fidelis K, Moult J (2007). "Progress from CASP6 to CASP7". Proteins. 69 Suppl 8 (Suppl 8): 194–207. doi:10.1002/prot.21769. PMID 17918728. S2CID 40200832.
  16. ^ Kryshtafovych A, Fidelis K, Moult J (2009). "CASP8 results in context of previous experiments". Proteins. 77 Suppl 9 (Suppl 9): 217–28. doi:10.1002/prot.22562. PMC 5479686. PMID 19722266.
  17. ^ Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 19 July 2019.
  18. ^ "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  19. ^ Callaway, Ewen (2020). "'It will change everything': DeepMind's AI makes gigantic leap in solving protein structures". Nature. 588 (7837): 203–204. doi:10.1038/d41586-020-03348-4. PMID 33257889. S2CID 227243204.
  20. ^ Schreiner, Maximilian (2022-12-14). "CASP15: AlphaFold's success spurs new challenges in protein-structure prediction". The Decoder. Retrieved 2023-02-13.

Result ranking


Automated assessments for CASP15 (2022)

Automated assessments for CASP14 (2020)

Automated assessments for CASP13 (2018)

Automated assessments for CASP12 (2016)

Automated assessments for CASP11 (2014)

Automated assessments for CASP10 (2012)

Automated assessments for CASP9 (2010)

Automated assessments for CASP8 (2008)

Automated assessments for CASP7 (2006)