This repository contains the code for the gene expression alignment presented in "Building a Translational Cancer Dependency Map for The Cancer Genome Atlas". Part of the code is incorporated from Celligner.
The preprint version of the manuscript is available at https://www.biorxiv.org/content/10.1101/2022.03.24.485544v2.
The predicted gene essentiality scores for TCGADEPMAP (Table S5), GTEXDEPMAP (Table S14), and PDXEDEPMAP (Table S12) are available on figshare.
The TCGA expression data are available from the Treehouse dataset.
The DEPMAP cell line expression data were downloaded from the DEPMAP portal.
The metadata for the analysis were downloaded from https://figshare.com/articles/Celligner_data/11965269.
- The run_expression_alignment_TCGADEPMAP.R script aligns the gene expression profiles of TCGA tumors and DEPMAP cell lines.
- The build_model_predict_TCGADEPMAP.R script builds elastic-net models from DEPMAP gene essentiality data and transfers them to the TCGA data.
- The run_SL_pipeline_Lasso.R script runs the synthetic lethality (SL) pipeline on DEPMAP data. It can also be applied to other datasets by changing the input data.
- The code to run the greedy version of UNCOVER is available at https://github.com/VandinLab/UNCOVER.
- The biomaRt package is available from Bioconductor at https://bioconductor.org/packages/release/bioc/html/biomaRt.html.
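As a rough illustration of how the three analysis scripts might be run one after another from R, here is a minimal sketch. It assumes the scripts live in the repository root, that their input data (Treehouse TCGA expression, DEPMAP expression, Celligner metadata) are already where the scripts expect them, and that running them in the listed order is appropriate. The helper name run_scripts is hypothetical and not part of the repository.

```r
# Hypothetical helper: run a sequence of R scripts, each in a fresh
# environment whose parent is the global environment, so objects created by
# one script do not leak into the next (similar to a clean top-level run).
run_scripts <- function(scripts, repo_dir = ".") {
  paths <- file.path(repo_dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing script(s): ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p, local = new.env(parent = globalenv()))
  }
  invisible(paths)
}

# Scripts named in this README; the order is an assumption
pipeline_scripts <- c("run_expression_alignment_TCGADEPMAP.R",
                      "build_model_predict_TCGADEPMAP.R",
                      "run_SL_pipeline_Lasso.R")
```

Usage, from the repository root: run_scripts(pipeline_scripts). Sourcing each script in its own environment is a design choice for the sketch; calling Rscript on each file in a separate process would isolate the steps even more strictly.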
We built a Shiny app for exploring the data. Its code is in shinyapp/app.R. The app requires the TCGADEPMAP R image data, which must be placed in the same directory as app.R. The R image data can be downloaded from https://figshare.com/s/5e09e93d10892afa63c8.
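A minimal sketch for launching the app from an R session, assuming the working directory is the repository root, the shiny package is installed (it is not in the dependency list below), and the TCGADEPMAP R image data has already been saved next to app.R. The function name run_tcgadepmap_app is hypothetical.

```r
# Hypothetical helper: launch the Shiny app stored in shinyapp/app.R.
# Assumes the TCGADEPMAP R image data from figshare is already in app_dir.
run_tcgadepmap_app <- function(app_dir = "shinyapp", ...) {
  app_file <- file.path(app_dir, "app.R")
  if (!file.exists(app_file)) {
    stop("Could not find ", app_file, "; run this from the repository root.")
  }
  if (!requireNamespace("shiny", quietly = TRUE)) {
    stop("The 'shiny' package is required: install.packages(\"shiny\")")
  }
  # shiny::runApp() accepts a directory containing app.R; extra arguments
  # such as port or launch.browser are passed through via `...`
  shiny::runApp(app_dir, ...)
}
```

Usage: run_tcgadepmap_app() or, for example, run_tcgadepmap_app(port = 8080).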
The code depends on the following R packages, which are available from CRAN or Bioconductor ('grDevices' ships with base R): 'here', 'tidyverse', 'reshape2', 'plyr', 'data.table', 'Seurat', 'pheatmap', 'pdist', 'gridExtra', 'ggpubr', 'grDevices', 'RColorBrewer', 'FNN', 'ggrepel', 'ggridges', 'irlba', 'viridis', 'limma', 'edgeR', 'batchelor', 'BiocParallel', 'sva', 'GSEABase', 'piano', 'fgsea', 'preprocessCore'
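One way to install these dependencies is sketched below. It assumes internet access and uses BiocManager (a CRAN package that is not part of the list above), because BiocManager::install() can install both CRAN and Bioconductor packages. The helper names missing_packages and install_missing are hypothetical.

```r
# Dependencies listed above; 'grDevices' is omitted because it ships with base R
deps <- c("here", "tidyverse", "reshape2", "plyr", "data.table", "Seurat",
          "pheatmap", "pdist", "gridExtra", "ggpubr", "RColorBrewer", "FNN",
          "ggrepel", "ggridges", "irlba", "viridis", "limma", "edgeR",
          "batchelor", "BiocParallel", "sva", "GSEABase", "piano", "fgsea",
          "preprocessCore")

# Return the subset of `pkgs` whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only what is missing, bootstrapping BiocManager if needed
install_missing <- function(pkgs = deps) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}
```

Usage: install_missing(). Checking which packages are already installed first avoids reinstalling packages that are present.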