CompGenomeLab/Phyloseek

Phyloseek

This repository contains code for computing Pdiff matrices for every residue of the 10.2 million proteins covered by PHACT (Kuru et al., 2022) trees.

These matrices are then fed, residue by residue, into a VQ-VAE (Oord et al., 2018) to obtain a lower-dimensional representation of the data. The result is a k-letter alphabet, similar to the 20-letter Foldseek (van Kempen et al., 2023) alphabet.
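
To illustrate how a VQ-VAE turns continuous per-residue data into letters, here is a minimal sketch of the quantization step only: each residue's latent vector is assigned the index of its nearest codebook vector, and that index becomes the residue's letter. It is not this repository's implementation. The function names (`quantize`, `to_letters`), the 2-dimensional latents, and the 3-entry codebook are hypothetical toy choices, and the encoder that would map a Pdiff matrix to a latent vector is omitted.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each row of `latents`, the index of the nearest codebook row.

    latents:  (n_residues, d) array, one latent vector per residue (assumed shape).
    codebook: (k, d) array, one vector per letter of the k-letter alphabet.
    """
    # Pairwise squared Euclidean distances, shape (n_residues, k).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)


def to_letters(indices: np.ndarray, alphabet: str) -> str:
    """Map codebook indices to characters of a k-letter alphabet."""
    return "".join(alphabet[int(i)] for i in indices)


# Toy data (hypothetical): k = 3 codebook vectors and 4 residues in a 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7], [0.2, 0.1]])

codes = quantize(latents, codebook)
sequence = to_letters(codes, alphabet="ABC")
print(codes, sequence)
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder; here it is fixed by hand so that the nearest-neighbor assignment is easy to follow.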

Note: If you have sufficient computational resources, you can run the PHACT pipeline to compute Pdiff matrices for the rest of UniRef50.
