Summary

[TOC]

Summary

It's a library to generate analysis pipeline by simple yaml-format config file. The command will generate a bash script which could run in anyway you want and as a history log.

Basic usage

To use this library, you have to to these steps:

clone this library to your local, by git clone
clone another repo named small_tools_of_bioinformatics, which server for this repo, by git clone .
add a envrionment variable $SCRIPTS= "the repo path". You can add this in .bashrc like below, with you own path instead:

export SCRIPTS="/home/user/scripts/small_tools_of_bioinformatics"
generate a config file you need, which is already created by author. Take chipseq.yamal as example
```
python tdas.py generate -n "configname" -o "you destination dir"
```
generate a seqpair table, for example, try.seqpair.table

seqpair table is a special table defined here, it describes you raw info. It's character-delimter table, with define in the config yaml you choose
Generally, it should contain four columns: project(row name);idir(where you input file is);iseq1(the name of R1 in pair-end seq. can be others if you define it in the yaml);iseq2(R2 fastq.gz name)
finally, you can get output we want, by command below
```
python tdas.py cmd -c chipseq.yaml -bf try.seqpair.table 
```
by the command, you will get a dir created with name defined in the config yaml, and a bash script is in the dir, you can treat it in anyway you want. Enjoy your time~

Advanced

config format

config file here used is a yaml format file.
It consists of two parts: head info and process body:

In the head part, it defines how your seqpair table look like, order to process each record, and how many config part you can use
In the process body, many config part is defined detail.

Config part has three parts: header, workflow, and stat functions. header part about output info, process order

work flow part defined steps used in head. Each steps will apply to every record in seqpair table, if the step is used in head order module.

stats function part define some post-process steps, such as basic QC summmarise or file-info summary.
Or you can do anythings you want.

self-defined functions

For some complex functions which can't describe by yaml config file, you can add you own process function within the frame we provided here. see spfunctions.py.
**Don't forget add self-defined functions to the bottom dicts! **

add you own keywords

Key words are key-items in the yaml config file. We define the process in Block.py.
It's a complex work so not suggested. But you are welcome to do this if you really to do so. I am willing to provide helps as your request!

Any problem, contact me: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 123 Commits
TDAS		TDAS
readme.md		readme.md
tdas.py		tdas.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Summary

Basic usage

Advanced

config format

self-defined functions

add you own keywords

About

Releases

Packages

Languages

yang-dongxu/totalDataAnalysisWorkflow

Folders and files

Latest commit

History

Repository files navigation

Summary

Basic usage

Advanced

config format

self-defined functions

add you own keywords

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages