BioKanga

BioKanga is an integrated toolkit of high-performance bioinformatics subprocesses targeting the challenges of next-generation sequencing (NGS) analytics. Kanga is an acronym for 'K-mer Adaptive Next Generation Aligner'.

Why YAL (Yet Another Aligner)

Compared with other widely used aligners, BioKanga provides substantial gains in both the proportion and quality of aligned sequence reads at competitive or better computational efficiency. Unlike most other aligners, BioKanga uses the Hamming distances between a read's putative alignments to the targeted genome assembly as the discriminative acceptance criterion, rather than relying on sequencer-generated quality scores.
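The acceptance idea described above can be illustrated with a minimal C++17 sketch. It assumes the candidate loci for a read have already been located and extracted as reference substrings of the same length as the read; the function names and the maxSubs and minDelta thresholds are hypothetical and are not BioKanga's actual interface or algorithm. A read is accepted only when its best candidate is within maxSubs mismatches and every other candidate is at least minDelta mismatches worse, so ambiguous placements are rejected without consulting base quality scores.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hamming distance between two equal-length sequences: the number of
// positions at which the bases differ.
std::size_t hamming(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

// Hypothetical acceptance rule (an illustration, not BioKanga's code):
// accept the best candidate locus only if
//   1. its Hamming distance to the read is at most maxSubs, and
//   2. the runner-up is at least minDelta mismatches further away.
// Returns the index of the accepted candidate, or std::nullopt.
std::optional<std::size_t> acceptAlignment(const std::string& read,
                                           const std::vector<std::string>& candidates,
                                           std::size_t maxSubs,
                                           std::size_t minDelta) {
    const std::size_t none = std::numeric_limits<std::size_t>::max();
    std::size_t best = none, second = none, bestIdx = 0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].size() != read.size()) continue; // skip malformed candidates
        const std::size_t d = hamming(read, candidates[i]);
        if (d < best) {          // new best: previous best becomes runner-up
            second = best;
            best = d;
            bestIdx = i;
        } else if (d < second) { // ties with best also land here
            second = d;
        }
    }
    if (best > maxSubs) return std::nullopt;  // no candidate close enough
    if (second != none && second - best < minDelta)
        return std::nullopt;                  // ambiguous: runner-up too close
    return bestIdx;
}
```

For example, with maxSubs = 3 and minDelta = 2, the read ACGTACGTAC against candidates ACGTACGTAC, ACGTTCGTAA and TTTTACGTAC (0, 2 and 3 mismatches) is accepted at index 0, whereas two candidates that each differ by a single mismatch cause the read to be rejected as ambiguous.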

Another primary differentiator for BioKanga is that the toolkit can process billions of reads against targeted genomes containing up to 100 million contigs and totalling up to 100 Gbp of sequence.

Toolset Components

The BioKanga toolset contains a number of subprocesses, each targeting a specific bioinformatics analytics task. Primary subprocesses provide functionality to:

Generate simulated NGS datasets
Quality check the raw NGS reads to identify potential processing issues
Filter NGS reads for sequencer errors and/or exact duplicates
De novo assemble filtered reads into contigs
Scaffold de novo assembled contigs
Blitz local alignments
Generate index over genome assembly or sequences
Generate K-mer derived marker sequences from NGS reads without alignment
Generate prefix K-mer derived marker sequences from NGS reads without alignment
Concatenate sequences to create pseudo-genome assembly
Align NGS reads to indexed genome assembly or sequences
Scaffold assembly contigs using paired-end (PE) read alignments
Identify SSRs in multi-FASTA sequences
Map aligned read loci to known features
Analyse RNA-seq differential expression, with optional Pearson correlation generation
Generate tab-delimited counts file for input to DESeq or edgeR
Extract FASTA sequences from a multi-FASTA file
Merge overlapping short-insert PE reads
Identify SNP alignment derived marker sequences
Remap alignment loci
Locate and report regions of interest
Generate marker sequences from SNP loci
Generate SQLite Marker Database from SNP markers
Generate SQLite SNP Database from aligner identified SNPs
Generate SQLite DE Database from RNA-seq DE
Generate SQLite database from BLAT PSL alignments

Build and installation

Linux

To build on Linux, clone this repository, run autoreconf, then configure and make. The following example installs the BioKanga toolkit into a bin directory under the user's home directory.

git clone https://github.com/csiro-crop-informatics/biokanga.git
cd biokanga
autoreconf -f -i
./configure --prefix=$HOME
make install

Alternatively, a prebuilt binary for the appropriate platform can be used directly.

Windows

To build on Windows, the current version requires Visual Studio 2015 or 2017 with build tools v140.

Open the biokanga.sln file in Visual Studio.
Under the Build menu, select Configuration Manager.
For Active solution platform, select x64.
The project can then be built. By default, executables will be copied into the Win64 directory.

Alternatively, the prebuilt Windows binaries can be used directly.

Documentation

Documentation for the core functionality of biokanga and pacbiokanga is available under the Docs directory.

Contributing

BioKanga is maintained by the Crop Bioinformatics and Data Science team at CSIRO in Canberra, Australia.

Contributions are most welcome. To contribute, follow these steps:

Fork biokanga into your own GitHub account
Clone your fork to your development machine and enter the repository directory
Check out the dev branch
Create and check out a new branch for your work (git checkout -b great-new-feature)
Make regular commits on your new branch
Push your branch back to your GitHub repository (git push origin great-new-feature)
Create a pull request to the dev branch of the csiro-crop-informatics/biokanga repository
If your work is related to an existing issue, refer to the issue in the pull request comment

Issues

Please report issues on the GitHub project.

Authors

BioKanga has been developed by Dr Stuart Stephen, with contributions from other team members at CSIRO.
