Author: Mattia Colbertaldo
Date: February 2024
This repository contains code and data for a thesis project focused on simulating phylogenetic trees under different models and inferring their parameters using machine learning (ML) techniques.
The main goal of this thesis is to replace the slow Maximum Likelihood Estimation (MLE) methods traditionally used for parameter inference with ML methods. A further goal is to build a model predictor that, given a phylogenetic tree, automatically identifies the model that generated it and infers its parameters.
- Simulations:
  - The `simulations.r` file simulates phylogenetic trees under different models using functions from the diversitree package. The trees are then saved to file for further analysis.
  - In the `ranges.r` file, ranges of parameters for simulating trees are obtained through MLE inference on phylogenetic trees from real-world data.
- Parameter Inference:
  - The `CDV_full_tree.py` script encodes the trees with the CDV representation, preparing them for ML analysis.
  - The `Summary_Statistics.py` script encodes the trees with the summary statistics (SS) representation.
  - In the `AllModels_SS.ipynb` notebook, the encoded trees are read and preprocessed, then used to train a Convolutional Neural Network (CNN) that infers the parameters of the model. The same code works for all models in the diversitree package.
- Model Predictor:
  - The `ModelPredictor.ipynb` notebook trains different neural networks to infer the model given a tree. I suggest looking at the SS version, since it is the most up to date.
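
To give an idea of what a summary statistics encoding looks like, here is a minimal sketch that turns a small tree into a fixed set of numbers. It is not the code in `Summary_Statistics.py`: the tree encoding (nested `(branch_length, [children])` tuples), the function name `summary_statistics`, and the particular statistics chosen are all assumptions made for this example.

```python
from statistics import mean, pvariance

# Hypothetical tree encoding for this sketch: node = (branch_length, [children]).
# A tip is a node with an empty list of children.


def summary_statistics(tree):
    """Compute a few simple statistics from a rooted tree.

    The root edge is ignored. Assumes the root has at least one child.
    """
    branch_lengths = []  # length of every non-root branch
    tip_depths = []      # root-to-tip distance of every tip

    def visit(node, parent_depth):
        length, children = node
        depth = parent_depth + length
        branch_lengths.append(length)
        if not children:
            tip_depths.append(depth)
        for child in children:
            visit(child, depth)

    _root_length, root_children = tree
    for child in root_children:
        visit(child, 0.0)

    return {
        "n_tips": len(tip_depths),
        "height": max(tip_depths),
        "total_length": sum(branch_lengths),
        "mean_branch": mean(branch_lengths),
        "var_branch": pvariance(branch_lengths),
    }


# A three-tip example tree: ((A:0.5, B:0.5):0.5, C:1.0)
example_tree = (0.0, [(1.0, []), (0.5, [(0.5, []), (0.5, [])])])
stats = summary_statistics(example_tree)
print(stats)
```

A vector built from such a dictionary (with the keys in a fixed order) is the kind of input a network can be trained on, regardless of the size of the original tree.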
We explore various models for simulating phylogenetic trees, including:
- BD
- BiSSE
- MuSSE
- QuaSSE
- GeoSSE
- BiSSEness
- ClaSSE
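
For the model predictor, the model names above act as class labels. The following sketch shows one way the labels could be encoded and a network output decoded back into a model name. The helper names (`one_hot`, `softmax`, `predict_model`) and the ordering of the classes are assumptions for illustration, not taken from `ModelPredictor.ipynb`.

```python
import math

# Class labels, in an arbitrary but fixed order (assumed for this sketch).
MODELS = ["BD", "BiSSE", "MuSSE", "QuaSSE", "GeoSSE", "BiSSEness", "ClaSSE"]
MODEL_TO_INDEX = {name: i for i, name in enumerate(MODELS)}


def one_hot(model_name):
    """Encode a model name as a one-hot training target."""
    target = [0.0] * len(MODELS)
    target[MODEL_TO_INDEX[model_name]] = 1.0
    return target


def softmax(logits):
    """Convert raw network outputs into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_model(logits):
    """Return the most probable model name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return MODELS[best], probs[best]


# Example with made-up network outputs favouring the fourth class.
name, prob = predict_model([0.1, 0.3, 0.2, 2.5, 0.0, 0.4, 0.1])
print(name, round(prob, 3))
```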
Additionally, we compare simulations with the Constant Birth-Death model.
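
As a rough illustration of what simulating under the constant birth-death model involves, the sketch below draws parameters from a range and runs a Gillespie simulation of the number of lineages over time. It is not the repository's R code, which relies on diversitree, and it tracks only lineage counts rather than the tree topology. The names `PARAM_RANGES`, `sample_parameters`, and `simulate_bd_lineages`, as well as the range values, are hypothetical.

```python
import random

# Hypothetical parameter ranges; the repository derives its own in ranges.r.
PARAM_RANGES = {"lambda": (0.1, 1.0), "mu": (0.0, 0.1)}


def sample_parameters(ranges, rng):
    """Draw one value per parameter uniformly from its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def simulate_bd_lineages(birth, death, max_time, rng):
    """Gillespie simulation of a constant-rate birth-death process.

    Starts from a single lineage and returns a list of (time, n_lineages)
    pairs, one per event. Stops at max_time or at extinction.
    """
    t, n = 0.0, 1
    history = [(t, n)]
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate == 0:
            break  # no events can happen
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > max_time:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


rng = random.Random(42)
params = sample_parameters(PARAM_RANGES, rng)
history = simulate_bd_lineages(params["lambda"], params["mu"], max_time=5.0, rng=rng)
print(params, "final lineages:", history[-1][1])
```

Fixing the seed of the random generator keeps a simulated data set reproducible, which helps when comparing inference methods on the same trees.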
Feel free to explore the code and data provided in this repository. For detailed instructions on running simulations, inferring parameters, and training model predictors, refer to the respective files and notebooks.
For further reading and understanding of the models, methods, and ML techniques used, please refer to the cited references.
If you have any questions or suggestions, don't hesitate to reach out. Happy exploring!