
Example genomics data for tool developers


omgenomics/bio-data-zoo


Bio Data Zoo

This repo contains example data in various genomics file formats. It is intended to make software testing easier for bioinformatics tool developers, and includes examples of valid files, edge cases, and invalid files.

Browse

Browse the data on 42basepairs: https://42basepairs.com/browse/r2/bio-data-zoo

Download

Download this repo as a zip file: https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip
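For scripted test setups, the same archive can be fetched and unpacked programmatically. The sketch below uses only the Python standard library and assumes network access to GitHub; `download_zoo` and `extract_zip_bytes` are hypothetical helper names, not part of this repo.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Archive URL from this README (snapshot of the main branch).
ZIP_URL = "https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip"


def extract_zip_bytes(data: bytes, dest: Path) -> list:
    """Unpack an in-memory zip archive under dest and return its member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return zf.namelist()


def download_zoo(dest: Path, url: str = ZIP_URL) -> list:
    """Fetch the repository archive over HTTPS, then unpack it under dest."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    return extract_zip_bytes(data, dest)
```

For example, `download_zoo(Path("bio-data-zoo"))` would unpack the snapshot under `bio-data-zoo/`. Keeping the unpacking step in its own function lets it be tested without a network connection.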

Formats included

Format   Extensions
FASTA    .fa, .fa.gz
FASTQ    .fastq, .fastq.gz
BAM      .bam, .bam.bai, .bam.csi, .sam, .sam.gz, .sam.gz.csi, .sam.gz.tbi
VCF      .vcf, .vcf.gz, .vcf.gz.csi, .vcf.gz.tbi, .bcf, .bcf.csi
BED      .bed, .bed.gz, .bed.gz.csi, .bed.gz.tbi
CRAM     TODO (not yet included): .cram, .crai, different CRAM versions
GFF      TODO (not yet included): .gff3, .gtf, .gff, .gff.gz, .gff.gz.tbi
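Tools that choose a parser from the filename need to match compound suffixes such as `.vcf.gz.tbi`, not just the last extension. A minimal sketch, assuming the format groupings exactly as listed in the table above (implemented rows only, so CRAM and GFF are left out) and case-insensitive matching; `FORMAT_EXTENSIONS` and `classify` are hypothetical names:

```python
from typing import Optional, Tuple

# Hypothetical mapping transcribed from the "Formats included" table.
FORMAT_EXTENSIONS = {
    "FASTA": [".fa", ".fa.gz"],
    "FASTQ": [".fastq", ".fastq.gz"],
    "BAM": [".bam", ".bam.bai", ".bam.csi", ".sam", ".sam.gz", ".sam.gz.csi", ".sam.gz.tbi"],
    "VCF": [".vcf", ".vcf.gz", ".vcf.gz.csi", ".vcf.gz.tbi", ".bcf", ".bcf.csi"],
    "BED": [".bed", ".bed.gz", ".bed.gz.csi", ".bed.gz.tbi"],
}


def classify(filename: str) -> Optional[Tuple[str, str]]:
    """Return (format, extension) for the longest matching suffix, or None."""
    name = filename.lower()  # assumption: extensions are matched case-insensitively
    best = None
    for fmt, exts in FORMAT_EXTENSIONS.items():
        for ext in exts:
            # Keep the longest suffix that matches the end of the filename.
            if name.endswith(ext) and (best is None or len(ext) > len(best[1])):
                best = (fmt, ext)
    return best
```

Matching whole compound suffixes avoids the pitfall of `os.path.splitext`, which only splits off the final component (`.tbi` for `basic.vcf.gz.tbi`). Preferring the longest match is a safeguard in case shorter, overlapping suffixes are added to the mapping later.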

Data Source

Path                    Source                   Preview file             Download file
basic_R1.fastq          s3://1000genomes         Preview on 42basepairs   Download
basic.bam               s3://1000genomes         Preview on 42basepairs   Download
basic_multisample.vcf   s3://human-pangenomics   Preview on 42basepairs   Download
basic.vcf               s3://human-pangenomics   Preview on 42basepairs   Download
basic.bed               s3://human-pangenomics   Preview on 42basepairs   Download

Contributing

See the CONTRIBUTING docs.
