BioKanga is an integrated toolkit of high performance bioinformatics subprocesses targeting the challenges of next generation sequencing analytics. Kanga is an acronym standing for 'K-mer Adaptive Next Generation Aligner'.
Compared with other widely used aligners, BioKanga provides substantial gains in both the proportion and quality of aligned sequence reads at competitive or increased computational efficiency. Unlike most other aligners, BioKanga utilises Hamming distances between putative alignments to the targeted genome assembly for any given read as the discrimative acceptance criteria rather than relying on sequencer generated quality scores.
Another primary differentiator for BioKanga is that this toolkit can process billions of reads against targeted genomes containing 100 million contigs and totalling up to 100Gbp of sequence.
The BioKanga toolset contains a number of subprocesses, each of which is targeting a specific bioinformatics analytics task. Primary subprocesses provide functionality for:
- Generate simulated NGS datasets
- Quality check the raw NGS reads to identify potential processing issues
- Filter NGS reads for sequencer errors and/or exact duplicates
- de Novo assemble filtered reads into contigs
- Scaffold de Novo assembled contigs
- Blitz local alignments
- Generate index over genome assembly or sequences
- NGS reads alignment-less K-mer derived marker sequences generation
- NGS reads alignment-less prefix K-mer derived marker sequences generation
- Concatenate sequences to create pseudo-genome assembly
- Align NGS reads to indexed genome assembly or sequences
- Scaffold assembly contigs using PE read alignments
- Identify SSRs in multifasta sequences
- Map aligned reads loci to known features
- RNA-seq differential expression analyser with optional Pearsons generation
- Generate tab delimited counts file for input to DESeq or EdgeR
- Extract fasta sequences from multifasta file
- Merge PE short insert overlap reads
- SNP alignment derived marker sequences identification
- Remap alignment loci
- Locate and report regions of interest
- Generate marker sequences from SNP loci
- Generate SQLite Marker Database from SNP markers
- Generate SQLite SNP Database from aligner identified SNPs
- Generate SQLite DE Database from RNA-seq DE
- Generate SQLite Blat alignment PSL database
To build on linux, clone this repository, run autoreconf
, configure
and make
. The following example will install the biokanga toolkit to a bin
directory underneath the user's home directory.
git clone https://github.com/csiro-crop-informatics/biokanga.git
cd biokanga
autoreconf -f -i
./configure --prefix=$HOME
make install
Alternatively, the binary built for the appropriate platform can be used directly.
To build on Windows, the current version requires Visual Studio 2015 or 2017 with build tools v140.
- Open the
biokanga.sln
file in Visual Studio. - Under the Build menu, select Configuration Manager.
- For Active solution platform, select x64.
- The project can then be built. By default, executables will be copied into the
Win64
directory.
Alternatively, the windows binaries can be used directly.
Documentation for the core functionality of biokanga and pacbiokanga is available under the Docs
directory.
BioKanga is maintained by the Crop Bioinformatics and Data Science team at CSIRO in Canberra, Australia.
Contributions are most welcome. To contribute, follow these steps.
- Fork biokanga into your own repository (more information)
- Clone and enter the repository to your development machine
- Checkout the
dev
branch - Make and checkout a new branch for your work (
git checkout -b great-new-feature
) - Make regular commits on your new branch
- Push your branch back to your github repository (
git push origin great-new-feature
) - Create a pull request to the
dev
branch of the csiro-crop-informatics/biokanga repository (more information) - If you're work is related to an existing issue, refer to the issue in the pull request comment
Please report issues on the github project.
BioKanga has been developed by Dr Stuart Stephen, with contributions from other team member in CSIRO.