Skip to content

Commit

Permalink
Merge branch 'develop'
Browse files Browse the repository at this point in the history
  • Loading branch information
cokelaer committed Dec 26, 2017
2 parents 9fe02d9 16fa95b commit f7c1c1b
Show file tree
Hide file tree
Showing 33 changed files with 2,151 additions and 97 deletions.
10 changes: 10 additions & 0 deletions doc/Changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 12,17 @@ Changelog
* BUGS:

* Fix regression bug (https://github.com/sequana/sequana/issues/484)
* Fix missing N_final column in table of the quality_control multi-summary
page
* Remove phix174.fa requirements in RNAseq pipeline config file
* Fix path starting with tilde (https://github.com/sequana/sequana/issues/486)

* NEWS:

* add isoseq Class
* add vcf_filter module back to help in filtering VCF files created with
mpileup for instance
* add sequana_vcf_filter standalone

0.6.1
~~~~~~~~~~
Expand Down
15 changes: 5 additions & 10 deletions doc/developers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -616,17 616,12 @@ and then used as::
singularity exec sequana-sequana-master.img sequana_coverage --help

One issue here is that we should push on master only when we are sure that the
repository is not bugged. Since people may use the container, we should no
remove them... but the will be lots of containers.
repository is not bugged. Since people may use the container, we should not
remove them... but there will be lots of containers.

Instead, we could create a specific tag e.g. release_0_5_2 , activate the
branches in the singularity hub page, push an empty commit and disable that
branch. In doing so, we have a unique container for the release 0_5_2
Instead, we could create a specific tag e.g. release_0_5_2, activate the
branch in the singularity hub page, push an empty commit and disable that
branch. In doing so, we have a unique container for the release 0_5_2.


======= ========= ==================
branch version sequana version
======= ========= ==================
0_5_2 3253 0.5.2
======= ========= ==================

4 changes: 2 additions & 2 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 10,7 @@ Sequana documentation


<a href="http://bioconda.github.io/recipes/sequana/README.html">
<img src="https://img.shields.io/badge/install with-bioconda-brightgreen.svg?style=flat-square">
<img src="https://img.shields.io/badge/install with-bioconda-brightgreen.svg?style=flat-square"></a>

<a href="https://pypi.python.org/pypi/sequana">
<img src="https://badge.fury.io/py/sequana.svg"></a>
Expand Down Expand Up @@ -39,7 39,7 @@ Sequana documentation
:Source: See `http://github.com/sequana/sequana <https://github.com/sequana/sequana/>`_.
:Issues: Please fill a report on `github <https://github.com/sequana/sequana/issues>`_
:How to cite: For Sequana in general including the pipelines, please use
the `JOSS DOI (10.21105/joss.00352) <http://www.doi2bib.org/#/doi/10.21105/joss.00352>`_
the `JOSS DOI (10.21105/joss.00352) <http://www.doi2bib.org/bib/10.21105/joss.00352>`_


For the **genome coverage** tool (sequana_coverage): Dimitri Desvillechabrol,
Expand Down
14 changes: 7 additions & 7 deletions doc/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 10,8 @@ available on **bioconda**. Note, however, that
releases of Sequana are also available on Pypi so you could also use **pip**.

If you just want to test **Sequana** or **Sequanix** or one of the Sequana
standalone, we have started to provide **Singularity** containers since version
0.5.2. This is a great solution for reproducibility as well. Containers are
standalone, we also provide **Singularity** containers. This is a great
solution for reproducibility as well. Containers are
available on https://singularity-hub.org/collections/114/.


Expand All @@ -20,7 20,7 @@ Overview of installation methods

We support 3 types of installations:

#. Singularity. Strictly speaking, there is no compilation. This method is for testing and production. It downloads an image / container that is ready-to-use::
#. Singularity (tested with version 2.4.2) . Strictly speaking, there is no compilation. This method is for testing and production. It downloads an image / container that is ready-to-use::

singularity pull --name sequana.img shub://sequana/sequana
singularity shell sequana.img
Expand Down Expand Up @@ -160,10 160,10 @@ environment) and would work for Linux users only for the time being. The main
reason being that under Mac and windows a virtualbox is used by Singularity
preventing a X connection. This should be solved in the near future.

First, install singularity (http://singularity.lbl.gov/). For example ,here is
how to download version 2.4.0 and install::
First, install singularity (http://singularity.lbl.gov/). You must use at least
version 2.4. We tested this recipe with version 2.4.2 (Dec 2017)::

VERSION=2.4
VERSION=2.4.2
wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
tar xvf singularity-$VERSION.tar.gz
cd singularity-$VERSION
Expand All @@ -177,7 177,7 @@ Second, download a Sequana image. For instance, for the latest master version::

or for the release 0.6.1::

singularity pull --name sequana.img shub://sequana/sequana@release_0_6_1
singularity pull --name sequana_0_6_1.img shub://sequana/sequana@release_0_6_1


Do not interrupt the download (1.5Go). Once downloaded,
Expand Down
9 changes: 2 additions & 7 deletions environment_rtd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 11,9 @@ dependencies:
- numpy
- pandas
- matplotlib
- scipy
- pysam
- krona
- kraken
- pillow
- sphinx
- pysam==0.12.0.1
- reports
- easydev>=0.9.34
- easydev>=0.9.36
- bioservices
- pip
- shustring
Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 3,7 @@ bioservices>=1.4.14
biokit>=0.4.1
colorlog
docutils
easydev>=0.9.34
easydev>=0.9.36
matplotlib>=2.0.0
mock
multiqc==1.0
Expand Down
2 changes: 1 addition & 1 deletion requirements_pipelines.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,4 23,4 @@ snpeff
spades
star
subread
multiqc
multiqc==1.0
17 changes: 8 additions & 9 deletions sequana/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 5,14 @@
except:
version = ">=0.20.0"

import colorlog as logger
def sequana_debug_level(level="WARNING"):
"""A deubg level setter at top level of the library"""
assert level in ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
logging_level = getattr(logger.logging.logging, level)
logger.getLogger().setLevel(logging_level)

try:
from easydev.logging_tools import Logging
logger = Logging("sequana", "WARNING")
except:
import colorlog
logger = colorlog.getLogger("sequana")



from easydev import CustomConfig
Expand Down Expand Up @@ -53,6 55,3 @@ def _download_biokit_taxon():
tt._load_flat_file()
try: _download_biokit_taxon()
except: pass



108 changes: 108 additions & 0 deletions sequana/busco.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,3 109,111 @@ def download(self, uncompress=True):
execute("rm -f %s" % ( target))


class BuscoAnalysis(object):

def __init__(self, bin_path=None, config_path=None, species=None,
sample_name=None, outpath=None):
"""
:param bin_path: path to busco binary file.
:param config_path: is the path to augustus config path. See example
in sequana/resources/busco
:param species: one of busco dataset (e.g. bacteria_odb9)
:param sample_name: prefix used for all files
:param outpath: main output path
mode = config['busco']['mode_choice'],
options = config['busco']['options'],
wkdir = __busco__workdir,
species = config['busco']['species_choice'],
If bin_path or config_path are none, uses CONDA_PREFIX or
CONDA_ENV_PATH.
"""
assert outpath is not None
assert sample_name is not None
assert species is not None

import os
#sample_name = params.wkdir.split(os.sep)[0]
if bin_path is None:
# try conda
try:
self.conda_bin_path = os.environ['CONDA_PREFIX'] os.sep "bin"
except:
self.conda_bin_path = os.environ['CONDA_ENV_PATH'] os.sep "bin"
else:
self.conda_bin_path = bin_path

if config_path is None:
try:
self.augustus_config_path = os.environ['CONDA_PREFIX'] os.sep "config"
except KeyError:
self.augustus_config_path = os.environ['CONDA_ENV_PATH'] os.sep "config"
else:
self.augustus_config_path = config_path

self.species = species # TODO make it an attribute that calls set_config
self.sample_name = sample_name
self.outpath = outpath


def _set_config(self):
config = BuscoConfig(
self.species,
sample_name=self.sample_name,
outpath=self.outpath,
conda_bin_path=self.conda_bin_path,
Rscript_bin_path="", # not required by our analysis
tmp_path="./tmp_{}".format(self.sample_name)
)
from easydev import mkdirs
mkdirs(self.outpath)
self.config_filename = self.outpath "/config.ini"
config.save_config_file(self.config_filename)

def run(self, infile, threads=4, mode="genome"):
"""
mode_choice: genome, transcriptomics, proteins
species: name of a BUSCO dataset. (use Sequanix to get the list)
options: any options understood by busco
"""
self._set_config()
# -f to force overwritten existing files
cmd = "export BUSCO_CONFIG_FILE={};".format(self.config_filename)
cmd = "export AUGUSTUS_CONFIG_PATH={};".format(self.augustus_config_path)
cmd = "run_busco -i {infile} -m {mode} -c {threads} -f 2>>{log};"
cmd = cmd.format(**{"infile":infile, 'mode':mode, "threads":threads, "log":"log"})

# Note that execute() does not keep the shell info
from easydev import execute as shell
from snakemake import shell
shell(cmd)
shell("rm -rf ./tmp_{}".format(self.sample_name))


def multirun(self, infile, threads=4, mode="genome", species=[]):
"""
datasets = BuscoDownload().filenames
res = b.multirun("canu_test10X.contigs.fasta", threads=1,
species=datasets)
"""
res = {}
buffer_ = self.species
for this in species:
self.species = this
try:
self.run(infile, threads=threads, mode=mode)
from sequana.assembly import BUSCO
ab = BUSCO("{}/run_{}/full_table_{}.tsv".format(
self.outpath, self.sample_name, self.sample_name))
res[this] = ab.score
except:
pass
self.species = buffer_

return res
24 changes: 15 additions & 9 deletions sequana/cigar.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,16 39,22 @@ class Cigar(object):
Possible CIGAR types are:
- "M" for MATCH (0)
- "I" for Insertion (1)
- "D" for deletion 2
- "N" for reference skipped 3
- "S" for soft clipping 4
- "H" for hard clipping 5
- "P" for padding
- "M" for alignment MATCH (0)
- "I" for Insertion to the reference (1)
- "D" for deletion from the reference 2
- "N" for skipped region from the reference 3
- "S" for soft clipping (clipped sequence present in seq) 4
- "H" for hard clipping (clipped sequence NOT present in seq) 5
- "P" for padding (silent deletion from padded reference)
- "=" for equal
- "X" for diff
- "B" for back
- "X" for diff (sequence mismatched)
- "B" for back !!!! could be also NM ???
!!! BWA MEM get_cigar_stats returns list with 11 items
Last item is
!!! what is the difference between M and = ???
Last item is I S X
!!! dans BWA, mismatch (X) not provided... should be deduced from last item - I - S
.. note:: the length of the query sequence based on the CIGAR is calculated
by adding the M, I, S, =, or X and other operations are ignored.
Expand Down
10 changes: 7 additions & 3 deletions sequana/fastq.py
Original file line number Diff line number Diff line change
@@ -1,6 1,4 @@
"""Utilities to manipulate FASTQ and Reads
"""
"""Utilities to manipulate FASTQ and Reads"""
import io
import time
import zlib
Expand Down Expand Up @@ -686,6 684,7 @@ def _to_fasta(self, output_filename="test.fasta", level=6, CHUNKSIZE=65536):
if tozip is True: self._gzip(output_filename)
"""

def filter(self, identifiers_list=[], min_bp=None, max_bp=None,
progressbar=True, output_filename='filtered.fastq', remove=True):
"""Filter reads
Expand Down Expand Up @@ -782,6 781,11 @@ def to_krona(self, k=7, output_filename="fastq.krona"):
letters = "\t".join([x for x in index.decode()])
fout.write("%s\t" % count letters "\n")

def stats(self):
self.rewind()
data = [len(read['sequence']) for read in self]
return {"mean_read_length": pylab.mean(data), "N": len(data)}

def __eq__(self, other):
if id(other) == id(self):
return True
Expand Down
6 changes: 6 additions & 0 deletions sequana/freebayes_bcf_filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 186,10 @@ def _filter_line(self, bcf_line):
Return line if all filters are passed.
"""
# SRF="Number of reference observations on the forward strand
# SRR="Number of reference observations on the reverse strand
# SAF="Number of alternate observations on the forward strand
# SAR=Number of alternate observations on the reverse strand
if bcf_line.qual < self.filters["freebayes_score"]:
return False

Expand Down Expand Up @@ -285,6 289,7 @@ def strand_ratio(number1, number2):
return 0
return division


def compute_freq(bcf_line):
""" Compute frequency of alternate allele.
alt_freq = Count Alternate Allele / Depth
Expand All @@ -295,6 300,7 @@ def compute_freq(bcf_line):
for count in bcf_line.info["AO"]]
return alt_freq


def compute_strand_bal(bcf_line):
""" Compute strand balance of alternate allele include in [0,0.5].
strand_bal = Alt Forward / (Alt Forward Alt Reverse)
Expand Down
Loading

0 comments on commit f7c1c1b

Please sign in to comment.