Snakemake has recently become a favorite workflow management system for much of the bioinformatics community. This repo can serve as a base for adding rules/modules to suit different workflow requirements.
A basic read QC step is included, since it is the starting point for most NGS analyses. It accepts both paired-end and single-end reads in FASTQ (.fq) format, as listed in the units.tsv file.
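As a rough illustration of how a units table can distinguish paired-end from single-end reads, here is a minimal Python sketch. The column names (sample, unit, fq1, fq2), the example file paths, and the helper names read_units and is_paired are assumptions made for this example and may not match the units.tsv shipped with this repo; check the actual file before relying on them.

```python
import csv
import io

# Hypothetical units.tsv content. The column names (sample, unit, fq1, fq2)
# and the file paths are assumptions for illustration only.
EXAMPLE_UNITS = (
    "sample\tunit\tfq1\tfq2\n"
    "A\t1\treads/A_R1.fq\treads/A_R2.fq\n"
    "B\t1\treads/B.fq\t\n"
)


def read_units(handle):
    """Return {(sample, unit): [fastq paths]} from a tab-separated table."""
    units = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        fastqs = [row["fq1"].strip()]
        # An empty or missing fq2 column marks the unit as single-end.
        fq2 = (row.get("fq2") or "").strip()
        if fq2:
            fastqs.append(fq2)
        units[(row["sample"], row["unit"])] = fastqs
    return units


def is_paired(units, sample, unit):
    """True when the unit has two FASTQ files (paired-end)."""
    return len(units[(sample, unit)]) == 2


units = read_units(io.StringIO(EXAMPLE_UNITS))
for (sample, unit), fastqs in units.items():
    layout = "paired-end" if is_paired(units, sample, unit) else "single-end"
    print(sample, unit, layout, fastqs)
```

To read a real table, replace the io.StringIO object with an open file handle, for example open("units.tsv", newline="").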
Using conda and Snakemake wrappers/APIs makes it easy to configure tool requirements, so there is no need to set up individual tools.
Install Snakemake>=5.7.0 in a global environment using pip3:
pip3 install snakemake
or create an isolated environment using conda and activate it:
conda create -c bioconda -c conda-forge -n snakemake snakemake=5.7
conda activate snakemake
git clone https://github.com/codingene/snakemake-base.git
cd snakemake-base
snakemake -n
snakemake --use-conda --cores 10
It should produce a qc/multiqc.html report in the current directory.
If you are working on a server, open the HTML report as follows.
From the current directory, run
python -m http.server 8000
then browse the HTML file at
http://0.0.0.0:8000/qc/multiqc.html
A specific workflow can be created by adding rules/modules (.smk files). For example, see here for alignment.
Take advantage of the following when writing Snakemake files.
Some commonly used tools can be called directly through a wrapper, without writing the full syntax.
A further advantage is that, with the --use-conda flag, Snakemake automatically downloads the corresponding wrapper and sets up the tool environment it needs.
Details: The Snakemake Wrappers repository doc
Snakemake provides some API functionality that makes it easier to deal with common workflow problems.
Details: Snakemake-API reference doc
Some additional utilities from Snakemake.
Details: Snakemake Utils doc
The best practices for writing Snakemake workflows are taken from snakemake-workflows.