Package: deblur (1.1.1-3)
deconvolution for Illumina amplicon sequencing
Deblur is a greedy deconvolution algorithm for amplicon sequencing based on Illumina MiSeq/HiSeq error profiles. The authors recommend using Deblur via the QIIME2 plugin q2-deblur. Examples of its use can be found within the plugin itself. However, Deblur itself does not depend on QIIME2.
The input to the Deblur workflow is a directory of FASTA or FASTQ files (one per sample) or a single demultiplexed FASTA or FASTQ file. These files can be gzip-compressed. The output directory will contain three BIOM tables in which the observation IDs are the Deblurred sequences. The outputs depend on the reference databases used; a more focused discussion of them is in the subsequent README section titled "Positive and Negative Filtering." The output files are as follows:
* reference-hit.biom : contains only Deblurred reads matching the positive filtering database. By default, a reference composed of 16S sequences is used, and the resulting table retains only those reads that recruit to it at a coarse level. Reads are also filtered against the negative reference, which by default removes any read that appears to be PhiX or adapter.
* reference-hit.seqs.fa : a fasta file containing all the sequences in reference-hit.biom
* reference-non-hit.biom : contains only Deblurred reads that did not align to the positive filtering database. Negative filtering is also applied to this table, so by default, PhiX and adapter are removed.
* reference-non-hit.seqs.fa : a fasta file containing all the sequences in reference-non-hit.biom
* all.biom : contains all Deblurred reads. This file represents the union of the "reference-hit.biom" and "reference-non-hit.biom" tables.
* all.seqs.fa : a fasta file containing all the sequences in all.biom
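The relationship between the three outputs can be checked on the FASTA files with the standard library alone. The following is a minimal sketch, assuming the output directory uses the file names listed above; the helper names `read_fasta_sequences` and `check_union` and the mock sequences are hypothetical and are not part of Deblur.

```python
import tempfile
from pathlib import Path


def read_fasta_sequences(path):
    """Return the set of sequences in a FASTA file (multi-line records are joined)."""
    sequences = set()
    current = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current:
                    sequences.add("".join(current))
                current = []
            else:
                current.append(line)
    if current:
        sequences.add("".join(current))
    return sequences


def check_union(output_dir):
    """Return True if all.seqs.fa is the union of the hit and non-hit FASTA files."""
    output_dir = Path(output_dir)
    hit = read_fasta_sequences(output_dir / "reference-hit.seqs.fa")
    non_hit = read_fasta_sequences(output_dir / "reference-non-hit.seqs.fa")
    everything = read_fasta_sequences(output_dir / "all.seqs.fa")
    return everything == (hit | non_hit)


# Demo on a mock output directory (made-up sequences, not real Deblur output).
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "reference-hit.seqs.fa").write_text(">s1\nACGT\n>s2\nGGCC\n")
    (tmp_path / "reference-non-hit.seqs.fa").write_text(">s3\nTTAA\n")
    (tmp_path / "all.seqs.fa").write_text(">s1\nACGT\n>s2\nGGCC\n>s3\nTTAA\n")
    union_ok = check_union(tmp_path)

print(union_ok)
```

The same comparison could be made on the observation IDs of the BIOM tables, since those IDs are the Deblurred sequences themselves.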
Deblur uses two types of filtering on the sequences:
* Negative mode - removes known artifact sequences (i.e. sequences aligning to PhiX or Adapter with >=95% identity and coverage).
* Positive mode - keeps only sequences similar to a reference database (by default known 16S sequences). SortMeRNA is used, and any sequence with an e-value <= 10 is retained. Deblur also outputs a BIOM table without this positive filtering step (named all.biom).
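The two filtering rules above can be summarised as a small decision function. This is an illustrative sketch only, not Deblur's implementation: it assumes the alignment statistics (percent identity and coverage against the artifact database, and the e-value against the positive reference) have already been computed by an external aligner, and the function and parameter names are hypothetical.

```python
# Thresholds as stated in the description above.
ARTIFACT_THRESHOLD = 95.0   # percent identity and coverage for negative filtering
MAX_EVALUE = 10.0           # e-value cutoff for positive filtering


def classify_sequence(artifact_identity, artifact_coverage, reference_evalue):
    """Decide where a sequence ends up, given precomputed alignment statistics.

    Pass None for a statistic when the sequence had no hit in that database.
    """
    # Negative mode: remove sequences that match a known artifact (PhiX or adapter).
    if (
        artifact_identity is not None
        and artifact_coverage is not None
        and artifact_identity >= ARTIFACT_THRESHOLD
        and artifact_coverage >= ARTIFACT_THRESHOLD
    ):
        return "removed"
    # Positive mode: keep sequences similar to the reference (16S by default).
    if reference_evalue is not None and reference_evalue <= MAX_EVALUE:
        return "reference-hit"
    # Everything else is still reported, but only in the non-hit table and all.biom.
    return "reference-non-hit"


examples = [
    classify_sequence(99.0, 98.0, None),    # looks like PhiX or adapter
    classify_sequence(None, None, 1e-20),   # strong hit to the positive reference
    classify_sequence(80.0, 99.0, None),    # no artifact match, no reference hit
]
print(examples)
```

Note that a sequence needs both identity and coverage at or above the threshold to be treated as an artifact, which is why the third example is not removed.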
The FASTA files for both of these filtering steps can be supplied via the --neg-ref-fp and --pos-ref-fp options. By default, the negative database is composed of PhiX and adapter sequence and the positive database of known 16S sequences.
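As a sketch of how custom databases might be passed, the snippet below assembles a command line in Python without running it. Only `--pos-ref-fp` and `--neg-ref-fp` come from the description above; the `workflow` subcommand with `--seqs-fp`, `--output-dir`, and `-t` (trim length) is an assumption based on the upstream README and should be checked against `deblur workflow --help`. All file names are hypothetical.

```python
import shlex


def build_deblur_command(seqs_fp, output_dir, trim_length,
                         pos_ref_fp=None, neg_ref_fp=None):
    """Assemble an argument list for a Deblur run (does not execute anything)."""
    # Assumed base invocation; verify the option names with `deblur workflow --help`.
    cmd = [
        "deblur", "workflow",
        "--seqs-fp", str(seqs_fp),
        "--output-dir", str(output_dir),
        "-t", str(trim_length),
    ]
    # Optional custom reference databases, as named in the description above.
    if pos_ref_fp is not None:
        cmd += ["--pos-ref-fp", str(pos_ref_fp)]
    if neg_ref_fp is not None:
        cmd += ["--neg-ref-fp", str(neg_ref_fp)]
    return cmd


command = build_deblur_command(
    "all_samples.fna", "deblur_output", 150,
    pos_ref_fp="my_positive_ref.fa",   # hypothetical file name
    neg_ref_fp="my_artifacts.fa",      # hypothetical file name
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the options have been confirmed for the installed version. Leaving both reference arguments as `None` keeps the default databases.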
Deblur uses negative mode filtering to remove known artifacts (i.e. PhiX and adapter sequences) prior to denoising. The output of Deblur contains three files: all.biom, which includes all sOTUs; reference-hit.biom, which contains the sOTUs that pass positive filtering (by default, only sOTUs similar to 16S sequences); and reference-non-hit.biom, which contains only the sOTUs that fail positive filtering (by default, only non-16S sOTUs).
Other packages related to deblur
- dep: python3
- interactive high-level object-oriented language (default python3 version)
- dep: python3-biom-format
- Biological Observation Matrix (BIOM) format (Python 3)
- dep: python3-click
- Command-Line Interface Creation Kit - Python 3.x
- dep: python3-h5py (>= 2.2.0)
- general-purpose Python interface to hdf5
- dep: python3-numpy
- Fast array facility to the Python language (Python 3)
- dep: python3-scipy
- scientific tools for Python 3
- dep: python3-skbio
- Python3 data structures, algorithms, educational resources for bioinformatic