snpless-nf - A Nextflow pipeline for time-course analysis with bacterial NGS whole-genome data.
- QC
  - FASTQC FastQC
  - TRIM Trimmomatic
  - PEAR pear
  - GENMAP GenMap
- ASSEMBLY
- MAPPING
  - BRESEQ breseq >> SAMTOOLS samtools add read group
  - MINIMAP2 minimap2 >> SAMBLASTER samblaster remove duplicates
  - BWA BWA >> SAMBLASTER samblaster remove duplicates
  - COVERAGE samtools
- SNPCALLING
  - FREEBAYES freebayes >> VCFFILTER vcflib >> VT Vt normalize >> decompose
  - BCFTOOLS bcftools mpileup, call, vcfutils.pl varFilter >> VT Vt normalize >> decompose
  - LOFREQ LoFreq indelqual, index, call-parallel
  - VARSCAN varscan mpileup2snp, mpileup2indel
  - MPILEUP samtools >> parse_mpileup.py >> annotate_pvalues
  - GDCOMPARE gdtools
- SVCALLING
- FILTERING/MERGING
  - BEDTOOLS bedtools
- ANNOTATION
  - SNPEFF SnpEff
- PLOTTING
  - PLOT R
Additional tools used for data conversion and data analysis:
- HTSLIB htslib
- trajectory_pvalue_cpp_code https://github.com/benjaminhgood/LTEE-metagenomic/tree/master/trajectory_pvalue_cpp_code compiled into annotate_pvalues
- create_timecourse.py https://github.com/benjaminhgood/LTEE-metagenomic/blob/master/cluster_scripts/create_timecourse.py used in parse_mpileup.py
- Install Nextflow (>=21.10.0)
Install Nextflow by using the following command:
curl -s https://get.nextflow.io | bash
or
Install Nextflow by using conda:
conda create -n nf python=3
conda activate nf
conda install -c bioconda nextflow
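After either installation route, it can be useful to confirm that the installed Nextflow meets the stated minimum (>=21.10.0). The following is a minimal sketch, not part of the pipeline: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (GNU coreutils), and it assumes that `nextflow -version` prints a banner containing a version string of the form `X.Y.Z`.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; the required version must sort first (or be equal).
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="21.10.0"

if command -v nextflow >/dev/null 2>&1; then
    # Assumption: the first X.Y.Z pattern in the banner is the Nextflow version.
    installed="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "Nextflow $installed is recent enough."
    else
        echo "Nextflow version '$installed' is older than $required or could not be parsed." >&2
    fi
else
    echo "nextflow not found on PATH." >&2
fi
```

If the check reports an older version, re-running one of the install commands above should fetch a current release.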
- Download the pipeline
git clone https://github.com/kullrich/snpless-nf.git
- Test the pipeline on a minimal dataset with a single command:
Using nextflow conda environment:
conda activate nf
nextflow run snpless-nf -profile test
- Start running your own analysis:
Check the necessary input files!
nextflow run snpless-nf --input <samples.tsv> --reference <genome.fna> --gff3 <genome.gff3> --proteins <genome.gbff>
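One way to check the input files before launching is a small pre-flight script. This is a sketch under stated assumptions: `check_inputs` is a hypothetical helper that only tests that each file exists, is readable and is non-empty (it does not validate file contents), and the four file names are placeholders standing in for your own paths.

```shell
# Hypothetical helper: succeed only if every given path is a readable, non-empty file.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ] || [ ! -r "$f" ]; then
            echo "Missing, unreadable or empty input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder paths; replace them with your own files.
samples="samples.tsv"
reference="genome.fna"
gff3="genome.gff3"
proteins="genome.gbff"

if check_inputs "$samples" "$reference" "$gff3" "$proteins"; then
    # Only start the pipeline when all four inputs are present.
    nextflow run snpless-nf --input "$samples" --reference "$reference" \
        --gff3 "$gff3" --proteins "$proteins"
else
    echo "Fix the inputs listed above before starting the pipeline." >&2
fi
```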
Download the example dataset via wget:
cd snpless-nf/examples
wget -O behringer2018.tar.gz https://owncloud.gwdg.de/index.php/s/fqD9ik2s3FReOUn/download
tar -xvf behringer2018.tar.gz
Download via weblink:
behringer2018 - samples 113, 129, 221
Using nextflow conda environment:
conda activate nf
nextflow run snpless-nf --input behringer2018/behringer2018_113.txt --reference behringer2018/GCF_000005845.2_ASM584v2_genomic.fna --gff3 behringer2018/GCF_000005845.2_ASM584v2_genomic.gff --proteins behringer2018/GCF_000005845.2_ASM584v2_genomic.gbff
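To process all three example samples (113, 129, 221) in turn, the command above could be wrapped in a loop. This is only a sketch: it assumes that the sample sheets for 129 and 221 follow the same naming as `behringer2018_113.txt`, and that the reference, GFF and GenBank files all sit in the `behringer2018` directory after extraction. `sample_sheet` is a hypothetical helper. How outputs of successive runs are separated is not covered here; see the parameters description.

```shell
# Sample IDs listed for the behringer2018 example dataset.
samples="113 129 221"
data_dir="behringer2018"

# Assumption: every sample sheet is named like behringer2018_113.txt.
sample_sheet() {
    echo "${data_dir}/behringer2018_$1.txt"
}

for id in $samples; do
    sheet="$(sample_sheet "$id")"
    # Skip samples whose sheet is not present instead of failing midway.
    if [ ! -f "$sheet" ]; then
        echo "Skipping sample $id: $sheet not found." >&2
        continue
    fi
    nextflow run snpless-nf \
        --input "$sheet" \
        --reference "${data_dir}/GCF_000005845.2_ASM584v2_genomic.fna" \
        --gff3 "${data_dir}/GCF_000005845.2_ASM584v2_genomic.gff" \
        --proteins "${data_dir}/GCF_000005845.2_ASM584v2_genomic.gbff"
done
```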
see a detailed description here: usage
see a detailed description here: parameters
see a detailed description here: output
MIT (see LICENSE)
If you would like to contribute to snpless-nf, please file an issue so that one can establish a statement of need, avoid redundant work, and track progress on your contribution.
Before you do a pull request, you should always file an issue and make sure that someone from the snpless-nf developer team agrees that it’s a problem, and is happy with your basic proposal for fixing it.
Once an issue has been filed and we've identified how to best orient your contribution with package development as a whole, fork the main repo, branch off a feature branch from master, commit and push your changes to your fork and submit a pull request for snpless-nf:master.
By contributing to this project, you agree to abide by the Code of Conduct terms.
Please report any errors or requests regarding snpless-nf to Kristian Ullrich ([email protected])
This repository adheres to the Contributor Covenant code of conduct for any interactions you have within this project (see Code of Conduct).
See also the policy against sexualized discrimination, harassment and violence for the Max Planck Society Code-of-Conduct.
By contributing to this project, you agree to abide by its terms.
- Behringer, Megan G., et al. "Escherichia coli cultures maintain stable subpopulation structure during long-term evolution." Proceedings of the National Academy of Sciences 115.20 (2018): E4642-E4650. https://www.pnas.org/content/115/20/E4642.short
- Good, Benjamin H., et al. "The dynamics of molecular evolution over 60,000 generations." Nature 551.7678 (2017): 45-50. link
- Di Tommaso, Paolo, et al. "Nextflow enables reproducible computational workflows." Nature biotechnology 35.4 (2017): 316-319. link
- Andrews, Simon. "FastQC: a quality control tool for high throughput sequence data. 2010." (2017): W29-33. link
- Bolger, Anthony M., Marc Lohse, and Bjoern Usadel. "Trimmomatic: a flexible trimmer for Illumina sequence data." Bioinformatics 30.15 (2014): 2114-2120. link
- Zhang, Jiajie, et al. "PEAR: a fast and accurate Illumina Paired-End reAd mergeR." Bioinformatics 30.5 (2014): 614-620. link
- Pockrandt, Christopher, et al. "GenMap: ultra-fast computation of genome mappability." Bioinformatics 36.12 (2020): 3687-3692. link
- Wick, Ryan R., et al. "Unicycler: resolving bacterial genome assemblies from short and long sequencing reads." PLoS computational biology 13.6 (2017): e1005595. link
- Seemann, Torsten. "Prokka: rapid prokaryotic genome annotation." Bioinformatics 30.14 (2014): 2068-2069. link
- Deatherage, Daniel E., and Jeffrey E. Barrick. "Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq." Engineering and analyzing multicellular systems. Humana Press, New York, NY, 2014. 165-188. link
- Li, Heng. "Minimap2: pairwise alignment for nucleotide sequences." Bioinformatics 34.18 (2018): 3094-3100. link
- Li, Heng. "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM." arXiv preprint arXiv:1303.3997 (2013). link
- Faust, Gregory G., and Ira M. Hall. "SAMBLASTER: fast duplicate marking and structural variant read extraction." Bioinformatics 30.17 (2014): 2503-2505. link
- Li, Heng, et al. "The sequence alignment/map format and SAMtools." Bioinformatics 25.16 (2009): 2078-2079. link
- Garrison, Erik, and Gabor Marth. "Haplotype-based variant detection from short-read sequencing." arXiv preprint arXiv:1207.3907 (2012). link
- Wilm, Andreas, et al. "LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets." Nucleic acids research 40.22 (2012): 11189-11201. link
- Koboldt, Daniel C., et al. "VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing." Genome research 22.3 (2012): 568-576. link
- Ye, Kai, et al. "Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads." Bioinformatics 25.21 (2009): 2865-2871. link
- Cameron, Daniel L., et al. "GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing." bioRxiv (2021): 2020-07. link
- Quinlan, Aaron R., and Ira M. Hall. "BEDTools: a flexible suite of utilities for comparing genomic features." Bioinformatics 26.6 (2010): 841-842. link
- Cingolani, Pablo, et al. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3." Fly 6.2 (2012): 80-92. link
- Wickham, Hadley. "ggplot2." Wiley Interdisciplinary Reviews: Computational Statistics 3.2 (2011): 180-185. link