SRA_Analysis_WDL

An Auto Pipeline
Robustness
Designed for Bio-os

Here, we need a script, a program or an other things, to meet our need.

What we need?

We have a platform built for fetch raw sequencing data to our system. Once the data is under our gover, we will get our pipline on woring. Given the complexity of the situation, our tools should be packaged as stand-alone toolkits, or they should take advantage of infrastructure that is readily available.

What we have,now

10X Cellranger count WDL
10X Cellranger ATAC count WDL
10X Cellranger VDJ WDL
10X Spaceranger WDL
10X Cellranger multi WDL (for GEX VDJ-T/VDJ-B or both of them)
SeqWell & Drop-seq & BD WDL (STARsolo)
SMART-seq WDL (STARsolo, too) Praise the god of STAR
Dockers at here: https://hub.docker.com/repositories/ooaahhdocker

Update

2024.10.10 : let's have a try

Try using rust as an encapsulation for the command part of the wdl, replacing python. The results are here: _SRAtoFastqgz/2.0_rust.

I'm looking forward to this being a start, a start to be able to judge the type of vdj files (or any other files) quickly.

2024.10.9 : ATTENTON!

All of those images' name should be replaced as followed.

registry-vpc.miracle.ac.cn/gznl/
- registry-vpc.miracle.ac.cn/gznl/ooaahhdocker/ooaahhdocker/python_pigz:1.0
- registry-vpc.miracle.ac.cn/gznl/ooaahhdocker/py39_scanpy1-10-1
- registry-vpc.miracle.ac.cn/gznl/ooaahhdocker/starsolo2:3.0
- registry-vpc.miracle.ac.cn/gznl/ooaahhdocker/starsolo2:2.0
- registry-vpc.miracle.ac.cn/gznl/python:3.9.19-slim-bullseye

2024.5.14 : Function added

For cellranger count WDL, updated naming conventions of h5ad files.
- filtered_feature_bc_matrix.h5ad convert to ~{sample}_filtered_feature_bc_matrix.h5ad

2024.5.11 : Refining Code Logic

When handling the extraction of SRA files to fastq files, an issue with file attribution was encountered, making it difficult to accurately determine the correct naming of the extracted fastq files. Solution: The R1 data contains a large number of duplicate sequences composed of barcodes and UMIs. When performing high compression ratio file compression, the size of the R1 data file should be smaller than that of R2.
Simply reverse the order of the file compression and renaming logic.

2024.5.9 : multi need to set NA as []

Array[File] Cannot accept input with an empty string, use [] as insted.

2024.5.4 : Updated naming logic for files

The extent of the impact "SRA > fastq.gz"

2024.4.28 : Added unplanned WDL files

10X Cellranger multi WDL

2024.4.28 : Bugs fix

For VDJ files(SRA), we have to use parameters: "--split-file combined with --include-technologies".
ps. For SpaceRanger, we need to use parameters --split-3. Therefore, in the case of 10X, we need to choose the appropriate workflow for the specific situation.

2024.4.26 : Function added

For local fastq files, I had added cellranger_singleFile.wdl.

2024.4.23 : Function added

Increased the output of h5ad&bam files as much as possible.

2024.4.22 : Added STARsolo WDL files, which could used in BD&SeqWell&Dropseq, without umitools.

ps. Set --soloBarcodeReadLength=0 to skip the barcode and umi checks.
Docker pull: ooaahhdocker/starsolo2:3.0, with python3.9/scanpy1.10.1/star2.7.11 inside.
Attention!
- To make the agreement between STARsolo and CellRanger even more perfect, you can add
args_dict['--genomeSAsparseD'] = ['3']
- CellRanger 3.0.0 use advanced filtering based on the EmptyDrop algorithm developed by Lun et al. This algorithm calls extra cells compared to the knee filtering, allowing for cells that have relatively fewer UMIs but are transcriptionally different from the ambient RNA. In STARsolo, this filtering can be activated by:
args_dict['--soloCellFilter'] =['EmptyDrops_CR']

2024.4.16 : Must come with full image information, slide number, etc.

For spaceranger, complete image information is a must, and the data provided by some authors is incomplete.

2024.4.12 : The technical roadmap has been updated, and sra files are now reused using fasterq-dump

Docker pull: ooaahhdocker/python_pigz:1.0 with python3.9/pigz, which meet fastq file to fastq compressed file fast implementation.

2024.4.11 : Resolving compatibility issues

Lower versions of cellranger(2.9.6) are unable to handle newer 10X scRNA-seq data.
Added a way to externally import the cellranger package

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
_CellRanger_ATAC		_CellRanger_ATAC
_CellRanger_VDJ		_CellRanger_VDJ
_CellRanger_count		_CellRanger_count
_CellRanger_multi		_CellRanger_multi
_SRAtoFastqgz		_SRAtoFastqgz
_SeqWell&DropSeq&BD		_SeqWell&DropSeq&BD
_SmartSeq_manifest_file		_SmartSeq_manifest_file
_SpaceRanger		_SpaceRanger
docs		docs
docs_rtd		docs_rtd
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SRA_Analysis_WDL

What we need?

What we have,now

Update

2024.10.10 : let's have a try

2024.10.9 : ATTENTON!

2024.5.14 : Function added

2024.5.11 : Refining Code Logic

2024.5.9 : multi need to set NA as []

2024.5.4 : Updated naming logic for files

2024.4.28 : Added unplanned WDL files

2024.4.28 : Bugs fix

2024.4.26 : Function added

2024.4.23 : Function added

2024.4.22 : Added STARsolo WDL files, which could used in BD&SeqWell&Dropseq, without umitools.

2024.4.16 : Must come with full image information, slide number, etc.

2024.4.12 : The technical roadmap has been updated, and sra files are now reused using fasterq-dump

2024.4.11 : Resolving compatibility issues

About

Releases

Packages

Languages

rnacentre/WDL_Tools

Folders and files

Latest commit

History

Repository files navigation

SRA_Analysis_WDL

What we need?

What we have,now

Update

2024.10.10 : let's have a try

2024.10.9 : ATTENTON!

2024.5.14 : Function added

2024.5.11 : Refining Code Logic

2024.5.9 : multi need to set NA as []

2024.5.4 : Updated naming logic for files

2024.4.28 : Added unplanned WDL files

2024.4.28 : Bugs fix

2024.4.26 : Function added

2024.4.23 : Function added

2024.4.22 : Added STARsolo WDL files, which could used in BD&SeqWell&Dropseq, without umitools.

2024.4.16 : Must come with full image information, slide number, etc.

2024.4.12 : The technical roadmap has been updated, and sra files are now reused using fasterq-dump

2024.4.11 : Resolving compatibility issues

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages