Optimizing protein language models with Sentence Transformers - ADOPT2

PeptoneLtd/contrastive-finetuning-plms

Contrastive Finetuning protein Language Models

This repo contains the data and scripts showing how Sentence-Transformers can be used with protein language models, in particular ESM models, as described in the paper Optimizing protein language models with Sentence Transformers, presented at the NeurIPS Workshop on Machine Learning in Structural Biology (2023).

Setup

Please note that this implementation requires GPUs.

git clone https://github.com/PeptoneLtd/contrastive-finetuning-plms.git
cd contrastive-finetuning-plms
pip install -r full_env.txt
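
Since the implementation requires GPUs, it can help to confirm that one is visible before launching a training script. The snippet below is a minimal sketch, assuming the installed environment provides PyTorch (an assumption, not something this README states). The helper name `gpu_available` is hypothetical and not part of this repo.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device.

    Hypothetical helper, not part of this repo. Assumes the environment
    is PyTorch-based; returns False when PyTorch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    if gpu_available():
        print("CUDA GPU detected")
    else:
        print("No CUDA GPU detected")
```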

Usage

Two minimal examples are provided, showing how to train a solubility predictor and a disorder predictor.

  • scripts/solubility_search_seeds.py
  • scripts/disorder_st_avg.py

Note that the scripts read their data from the data folder, so the paths might need adjusting depending on your environment. For the disorder task, in the case of a large-scale search, consider caching the frozen residue-level representations from ESM, as the script currently downloads them from Hugging Face on the fly.
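
The caching idea mentioned above can be sketched as a small on-disk cache keyed by model name and sequence. This is an illustrative sketch only, not the repo's implementation: `cached_embedding`, `embed_fn`, `model_name`, and the `embedding_cache` directory are all hypothetical names, and `embed_fn` stands in for whatever function computes the frozen residue-level representations.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def cached_embedding(
    sequence: str,
    embed_fn: Callable[[str], Any],
    model_name: str,
    cache_dir: str = "embedding_cache",
) -> Any:
    """Return the embedding for `sequence`, computing it at most once.

    Hypothetical helper: `embed_fn` is any callable that maps a protein
    sequence to its (frozen) residue-level representation.
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Key on both model and sequence so different models never collide.
    key = hashlib.sha256(f"{model_name}\n{sequence}".encode("utf-8")).hexdigest()
    file_path = cache_path / f"{key}.pkl"

    # Cache hit: load the stored representation from disk.
    if file_path.exists():
        with file_path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: compute once, store, and return.
    embedding = embed_fn(sequence)
    with file_path.open("wb") as f:
        pickle.dump(embedding, f)
    return embedding
```

Because the representations are frozen, a cached value stays valid across runs as long as the model is unchanged, which is why the model name is part of the key. Pickle files should only be loaded from a cache directory you created yourself.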

Citations

If you use this work in your research, please cite the relevant paper:

@inproceedings{adopt2,
  title     = {Optimizing protein language models with Sentence Transformers},
  author    = {Istvan Redl and Fabio Airoldi and Sandro Bottaro and Albert Chung and Oliver Dutton and Carlo Fisicaro and Patrik Foerch and Louie Henderson and Falk Hoffmann and Michele Invernizzi and Benjamin M J Owens and Stefano Ruschetta and Kamil Tamiola},
  booktitle = {Proceedings of the NeurIPS Workshop on Machine Learning in Structural Biology},
  year      = {2023},
  note      = {Workshop Paper},
  url       = {https://www.mlsb.io/papers_2023/Optimizing_protein_language_models_with_Sentence_Transformers.pdf}
}

License

This source code is licensed under the Apache 2.0 license found in the LICENSE file in the root directory of this source tree.
