qcbc is a Python package to quality control synthetic barcode sequences for orthogonal sequencing-based assays such as:
The latest release can be installed with
pip install qcbc
The development version can be installed with
pip install git+https://github.com/pachterlab/qcbc
Run qcbc on your own barcode list
qcbc consists of five subcommands:
$ qcbc
usage: qcbc [-h] [--verbose] <CMD> ...
qcbc 0.0.2: Format sequence specification files
positional arguments:
<CMD>
ambiguous find barcodes with shared subsequence
content compute base distribution (A,T,C,G counts/frequencies)
homopolymer
compute homopolymer distribution (length > 2)
pdist compute pairwise distance
volume compute size of barcode space
Barcode files are expected to contain both the barcode sequence and a name associated with the barcode, separated by a tab. For example
$ cat barcodes.txt
AGCAGTTACAG tag1
CTTGTACCCAG tag2
$ cat -t barcodes.txt
AGCAGTTACAG^Itag1
CTTGTACCCAG^Itag2
Note that cat -t file.txt prints each tab character as ^I and can be used to verify that the file is properly set up.
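The same check can be done programmatically. The following is a minimal sketch, not part of qcbc: the function name `read_barcodes` is hypothetical, and rejecting characters other than A, C, G, and T is an assumption about what a valid barcode looks like.

```python
import io

def read_barcodes(handle):
    """Parse tab-separated lines into (sequence, name) pairs."""
    records = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
        seq, name = fields
        # Assumption: barcodes contain only the four canonical bases.
        if set(seq.upper()) - set("ACGT"):
            raise ValueError(f"line {lineno}: unexpected characters in {seq!r}")
        records.append((seq, name))
    return records

example = io.StringIO("AGCAGTTACAG\ttag1\nCTTGTACCCAG\ttag2\n")
print(read_barcodes(example))
# [('AGCAGTTACAG', 'tag1'), ('CTTGTACCCAG', 'tag2')]
```

A file that separates the columns with spaces instead of a tab raises a ValueError naming the offending line.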
Find barcodes that share subsequences of a given length.
qcbc ambiguous -l <length> <bc_file>
- optionally, -rc can be used to check the reverse complement of the subsequences.
- <length> corresponds to the subsequence length used to evaluate ambiguity between barcodes.
- <bc_file> corresponds to the barcode file.
# check ambiguous barcodes by subsequences of length 3
$ qcbc ambiguous -l 3 barcodes.txt
CAG tag1,tag1,tag2
TAC tag1,tag2
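To illustrate what this output means, here is a minimal sketch of one way to find shared subsequences. It is not qcbc's implementation: the function name `shared_subsequences` is hypothetical, and keeping only subsequences that occur in more than one distinct barcode is an assumption inferred from the example above. The reverse-complement option is not covered.

```python
from collections import defaultdict

def shared_subsequences(barcodes, length):
    """Map each subsequence of the given length to the names it occurs in.

    A name is listed once per occurrence, and only subsequences found in
    more than one distinct barcode are kept.
    """
    hits = defaultdict(list)
    for seq, name in barcodes:
        # Slide a window of the requested length along the barcode.
        for i in range(len(seq) - length + 1):
            hits[seq[i:i + length]].append(name)
    return {sub: names for sub, names in hits.items() if len(set(names)) > 1}

barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for sub, names in shared_subsequences(barcodes, 3).items():
    print(sub, ",".join(names))
# CAG tag1,tag1,tag2
# TAC tag1,tag2
```

CAG lists tag1 twice because it occurs at two positions in that barcode.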
Compute the base distribution within each barcode.
qcbc content <bc_file>
- optionally, specify --frequency to return the base distribution as fractions.
- optionally, specify --entropy to return the entropy of the base distribution relative to the maximum entropy.
- <bc_file> corresponds to the barcode file.
$ qcbc content -e barcodes.txt
name seq ent
tag1 AGCAGTTACAG 0.67
tag2 CTTGTACCCAG 0.67
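For reference, the per-barcode base counts and fractions can be sketched as follows. This is an illustration only, not qcbc's code: the function name `base_content` and its `frequency` parameter are hypothetical, and the entropy calculation is deliberately left out because the source does not spell out the formula qcbc uses.

```python
from collections import Counter

def base_content(seq, frequency=False):
    """Return counts (or fractions) of A, T, C, G in a non-empty sequence."""
    counts = Counter(seq)
    result = {base: counts.get(base, 0) for base in "ATCG"}
    if frequency:
        # Divide by the sequence length to get fractions that sum to 1
        # when the sequence contains only A, T, C, and G.
        return {base: n / len(seq) for base, n in result.items()}
    return result

print(base_content("AGCAGTTACAG"))
# {'A': 4, 'T': 2, 'C': 2, 'G': 3}
```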
Count the homopolymers (runs of a single repeated base) of length two or greater in each barcode.
qcbc homopolymer <bc_file>
- <bc_file> corresponds to the barcode file.
$ qcbc homopolymer barcodes.txt
name seq homopolymer_length
tag1 AGCAGTTACAG 1,0,0,0,0,0,0,0,0,0
tag2 CTTGTACCCAG 1,1,0,0,0,0,0,0,0,0
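One reading of this output is that the i-th comma-separated value counts the runs of length i + 1, starting at length two. That reading is an assumption inferred from the example, and the sketch below (hypothetical function name `homopolymer_counts`, not qcbc's implementation) reproduces the two rows under it.

```python
from itertools import groupby

def homopolymer_counts(seq):
    """Count runs of identical bases; index 0 holds runs of length 2."""
    counts = [0] * (len(seq) - 1)  # one slot per run length 2..len(seq)
    for _, run in groupby(seq):
        run_length = len(list(run))
        if run_length >= 2:
            counts[run_length - 2] += 1
    return counts

print(",".join(map(str, homopolymer_counts("AGCAGTTACAG"))))
# 1,0,0,0,0,0,0,0,0,0
print(",".join(map(str, homopolymer_counts("CTTGTACCCAG"))))
# 1,1,0,0,0,0,0,0,0,0
```

The first barcode contains one run of length two (TT). The second contains TT and one run of length three (CCC).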
Compute the pairwise Hamming distance between barcodes.
qcbc pdist <bc_file>
- optionally, -rc can be used to check the reverse complement of the subsequences.
- <bc_file> corresponds to the barcode file.
$ qcbc pdist barcodes.txt
AGCAGTTACAG tag1 CTTGTACCCAG tag2 8.0
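The Hamming distance is the number of positions at which two equal-length sequences differ. A minimal sketch follows; the function names `hamming` and `pairwise_distances` are hypothetical, this is not qcbc's implementation, and the reverse-complement option is not covered.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(barcodes):
    """Yield (seq1, name1, seq2, name2, distance) for every barcode pair."""
    for (s1, n1), (s2, n2) in combinations(barcodes, 2):
        yield s1, n1, s2, n2, hamming(s1, s2)

barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for row in pairwise_distances(barcodes):
    print(*row)
# AGCAGTTACAG tag1 CTTGTACCCAG tag2 8
```

The two example barcodes differ at their first eight positions and agree on the final CAG. The sketch prints an integer where qcbc prints 8.0.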
Compute the fraction of barcode space occupied by the given barcodes.
qcbc volume <bc_file>
- <bc_file> corresponds to the barcode file.
$ qcbc volume barcodes.txt
2 out of 4,194,304 possible unique barcodes representing 0.0000%
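The figure 4,194,304 is 4^11, the number of distinct sequences of length 11 over the four bases. The arithmetic can be sketched as below. The function name `barcode_volume` is hypothetical, and the sketch assumes all barcodes share the length of the first one; how qcbc handles mixed lengths is not stated in the source.

```python
def barcode_volume(barcodes):
    """Return (unique barcodes, possible barcodes, percent of space used)."""
    length = len(barcodes[0][0])  # assumption: all barcodes have this length
    possible = 4 ** length        # four choices (A, C, G, T) per position
    unique = len({seq for seq, _ in barcodes})
    return unique, possible, 100 * unique / possible

barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
unique, possible, percent = barcode_volume(barcodes)
print(f"{unique} out of {possible:,} possible unique barcodes representing {percent:.4f}%")
# 2 out of 4,194,304 possible unique barcodes representing 0.0000%
```

The percentage is about 0.00005%, which rounds to 0.0000% at four decimal places.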
Thank you for wanting to improve qcbc. If you have found a bug in qcbc, please create an issue. The issue should contain
- the qcbc command that was run,
- the error message, and
- the qcbc and Python versions.
If you'd like to add assay sequence specifications or make modifications to the qcbc tool, please do the following:
- Fork the project.
# Press "Fork" at the top right of the GitHub page
- Clone the fork and create a branch for your feature
git clone https://github.com/<USERNAME>/qcbc.git
cd qcbc
git checkout -b cool-new-feature
- Make changes, add files, and commit
# make changes, add files, and commit them
git add path/to/file1.py path/to/file2.py
git commit -m "I made these changes"
- Push changes to GitHub
git push origin cool-new-feature
- Submit a pull request
If you are unfamiliar with pull requests, you can find more information on the GitHub help page.