Malformations of cortical development (MCD) are neurological conditions characterized by focal disruption of cortical architecture and cellular organization arising during embryogenesis. This repository contains the data-processing pipelines and the code for data analysis and plotting used in the large-scale MCD analysis. Data for this project are available from the NIMH Data Archive (NDA) under study accession 1484 and from the NCBI Sequence Read Archive (SRA) under accession number PRJNA821916. Raw single cell RNA-seq data are provided on the Single Cell Portal.
WES data were processed with the BSMN common pipeline; the WES part is also described in the BSMN common experiment paper.
Alignment and pre-processing of the MPAS data were derived from the published MPAS pipeline.
The WES and MPAS pipelines were further implemented as a generalized Snakemake pipeline.
Processed WES and MPAS data were run through several variant-calling pipelines, and candidate mosaic variants were collected from each. Sample-specific variants were called in paired mode with the BSMN common pipeline. Sample-shared and single-mode variants were called with GATK HaplotypeCaller at ploidy 50 following the BSMN common pipeline (WES only), with MosaicHunter, or with MuTect2 in single mode followed by MosaicForecast or DeepMosaic.
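As a rough illustration of collecting candidates from several callers, the sketch below merges per-caller call sets into one table keyed by variant and records which callers reported each one. It is not the repository's implementation: the function name `collect_candidates`, the tuple layout `(chrom, pos, ref, alt)`, and the toy coordinates are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def collect_candidates(calls_by_caller):
    """Merge per-caller variant calls into a dict: variant -> sorted caller names.

    `calls_by_caller` maps a caller label to an iterable of
    (chrom, pos, ref, alt) tuples. This layout is an assumption made
    for illustration, not the format used by the actual pipelines.
    """
    merged = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for chrom, pos, ref, alt in variants:
            merged[(chrom, pos, ref, alt)].add(caller)
    # Sort variants and caller names so the output is deterministic.
    return {variant: sorted(callers) for variant, callers in sorted(merged.items())}


# Toy input with made-up coordinates (not real data from the study).
calls = {
    "MosaicHunter": [("chr1", 12345, "A", "G")],
    "MuTect2+MosaicForecast": [("chr1", 12345, "A", "G"), ("chr2", 67890, "C", "T")],
    "MuTect2+DeepMosaic": [("chr2", 67890, "C", "T")],
}

candidates = collect_candidates(calls)
for variant, callers in candidates.items():
    print(variant, callers)
```

Keeping the caller labels next to each variant makes it possible to see later which candidates were supported by more than one method.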
Variants that passed were further annotated with a pipeline we previously described. Annotations including COSMIC v89, gnomAD genome, avsnp150, CADD 1.3, Eigen, and FATHMM were added with the ANNOVAR command:
./table_annovar.pl input.avinput /humandb/ -buildver hg19 -out output_annotated -remove -protocol refGene,gnomad_genome,avsnp150,cosmic89,cadd13,eigen,fathmm -operation g,f,f,f,f,f,f -nastring .
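In the ANNOVAR command above, each entry in `-protocol` is paired by position with one entry in `-operation` (`g` for gene-based, `f` for filter-based). The sketch below rebuilds the same argument list from explicit (protocol, operation) pairs so the two lists cannot drift apart. The helper name `build_annovar_command` is hypothetical; the paths and flags are copied from the command above.

```python
import shlex

# (protocol, operation) pairs taken from the command above:
# "g" = gene-based annotation, "f" = filter-based annotation.
PROTOCOLS = [
    ("refGene", "g"),
    ("gnomad_genome", "f"),
    ("avsnp150", "f"),
    ("cosmic89", "f"),
    ("cadd13", "f"),
    ("eigen", "f"),
    ("fathmm", "f"),
]


def build_annovar_command(input_path, db_dir, out_prefix, buildver="hg19"):
    """Return the table_annovar.pl argument list (hypothetical helper)."""
    protocol = ",".join(name for name, _ in PROTOCOLS)
    operation = ",".join(op for _, op in PROTOCOLS)
    return [
        "./table_annovar.pl", input_path, db_dir,
        "-buildver", buildver,
        "-out", out_prefix,
        "-remove",
        "-protocol", protocol,
        "-operation", operation,
        "-nastring", ".",
    ]


cmd = build_annovar_command("input.avinput", "/humandb/", "output_annotated")
print(shlex.join(cmd))  # shell-ready string; requires Python 3.8+
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the ANNOVAR directory, assuming the listed databases have already been downloaded into the database folder.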
Code and strategies for TASeq are available on GitHub.
Code and inputs for the plots in the project.
oncoplot.ipynb contains the code for the oncoplot presented in Fig. 1e.
oncoplot.maf contains the source data.
Genotype_phenotype_association.ipynb contains the R code for Fig. 4.
Supplementary Table 4 is the source data for the association analysis.
Single-nuclei RNA sequencing (snRNAseq) in the fetal cortex from Nowakowski et al 2017 Science
Fetal_cortex_snRNAseq.ipynb contains the R code used to generate Fig. 5.
MCD_snRNAseq.ipynb contains the R code used to generate Fig. 6, Extended Data Fig. 7, Extended Data Fig. 9, and Supplementary Table 5.
Code, along with some intermediate files and scripts, for the statistical analysis of the project.
📧 Changuk Chung: [email protected]
📧 Xiaoxu Yang: [email protected], [email protected]
📧 Joseph Gleeson: [email protected], or the Gleeson lab [email protected]
Chung C & Yang X, et al., Gleeson JG. Comprehensive multi-omic profiling of somatic mutations in malformations of cortical development. 2022. (Nat. Genet., DOI:10.1038/s41588-022-01276-9)