Daniel Baker dnbaker

Hi, I'm Daniel 👋

Software Engineer at Roche. Previously, I was a Senior Scientist at Pacific Biosciences (PacBio), which I joined after earning my PhD in the Department of Computer Science at Johns Hopkins University. Before that, I was a Bioinformatics Scientist at ARUP Laboratories, where I worked on cell-free circulating tumor DNA (ctDNA) analysis and clinical genomics, following my training in Physics (BS) and Biophysics/Computational Biology (MS). I've worked with biological data (sequence, molecular modeling, metabolomics, transcriptomics, metagenomics) and telecommunications data, as well as on graph algorithms, machine learning, and numerical optimization.

🔭 I've worked on similarity search, clustering, and indexing for large-scale biological data, as well as SIMD/GPU-accelerated and randomized algorithms. Most recently, I've been developing methods for human genetics, including long-read RNA-seq, VNTRs (variable number tandem repeats), and haplotype phasing.

😄 Pronouns: He/Him/His

A quick tour of my interests

Practical randomized algorithms

This ranges from libraries providing sketch data structures and coresets to projects using random projections and DCI.
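To make "sketch data structure" concrete, here is a minimal Count-Min sketch: it summarizes item frequencies in fixed memory and answers point queries that can overestimate but never underestimate the true count. This is an illustrative C++17 sketch only; the class name and the splitmix64-based per-row hashing are my assumptions, not code taken from any of the libraries mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Count-Min sketch with `depth` rows of `width` counters. Each row hashes the
// key independently (here via a seeded splitmix64 mixer, an illustrative
// choice). A point query takes the minimum over rows: collisions only ever
// add to a counter, so the estimate is >= the true count.
class CountMinSketch {
public:
    CountMinSketch(std::size_t width, std::size_t depth)
        : width_(width), depth_(depth), table_(width * depth, 0) {}

    void add(std::uint64_t key, std::uint64_t count = 1) {
        for (std::size_t r = 0; r < depth_; ++r)
            table_[r * width_ + index(key, r)] += count;
    }

    std::uint64_t estimate(std::uint64_t key) const {
        std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
        for (std::size_t r = 0; r < depth_; ++r)
            best = std::min(best, table_[r * width_ + index(key, r)]);
        return best;
    }

private:
    // splitmix64 finalizer: a fast, well-mixing 64-bit integer hash.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    // Row-specific bucket: mixing the row number into the key gives each row
    // a different hash function.
    std::size_t index(std::uint64_t key, std::size_t row) const {
        return static_cast<std::size_t>(mix(key ^ mix(row)) % width_);
    }

    std::size_t width_, depth_;
    std::vector<std::uint64_t> table_;
};
```

Wider rows reduce the overestimate caused by collisions, and more rows make a large overestimate less likely; the classic analysis sizes `width` around e/ε and `depth` around ln(1/δ) for an additive error of εN with probability at least 1 − δ.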

My work on coresets and clustering is primarily part of the minicore project, which aims to provide a standard utility for coreset construction and weighted clustering, especially for exponential-family models and shortest-path metrics.
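Sensitivity-sampling coreset constructions typically start from a rough clustering of the (possibly weighted) input, and k-means++ (D²) seeding is a common choice for that step. The sketch below shows weighted D² seeding under squared Euclidean distance only; `kmeanspp_seed` and `Point` are hypothetical names, and this is a generic illustration rather than minicore's API or its support for other dissimilarities.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double sqdist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Weighted k-means++ (D^2) seeding. The first center is drawn with
// probability proportional to its weight; each later center with probability
// proportional to weight * squared distance to the nearest chosen center.
// Returns indices into `pts`; stops early if no point has positive mass left.
std::vector<std::size_t> kmeanspp_seed(const std::vector<Point>& pts,
                                       const std::vector<double>& weights,
                                       std::size_t k, std::mt19937_64& rng) {
    const std::size_t n = pts.size();
    std::vector<double> mind(n, std::numeric_limits<double>::max());
    std::vector<double> probs(weights);  // first draw: proportional to weight
    std::vector<std::size_t> centers;
    while (centers.size() < k) {
        double total = 0.0;
        for (double p : probs) total += p;
        if (total <= 0.0) break;  // every remaining point has zero mass
        std::discrete_distribution<std::size_t> pick(probs.begin(), probs.end());
        const std::size_t c = pick(rng);
        centers.push_back(c);
        // Update each point's distance to its nearest center and its mass.
        for (std::size_t i = 0; i < n; ++i) {
            mind[i] = std::min(mind[i], sqdist(pts[i], pts[c]));
            probs[i] = weights[i] * mind[i];
        }
    }
    return centers;
}
```

A chosen center's own distance drops to zero, so it cannot be drawn again, and zero-weight points are never selected.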

Computational Biology

The bonsai project provides methods for metagenomic analysis, along with k-mer encoding/decoding and I/O, while Dashing performs scalable sketching and comparison of sequence data.
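For readers new to k-mer encoding: a common convention packs each nucleotide into 2 bits, so any k-mer with k ≤ 32 fits in one 64-bit integer and can be hashed, compared, or reverse-complemented with integer operations. The sketch below assumes the A=0, C=1, G=2, T=3 mapping; the function names are hypothetical and this is not bonsai's API.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Pack a DNA k-mer (k <= 32) into a 64-bit integer, 2 bits per base,
// first base in the most significant position. Returns std::nullopt for
// k > 32 or any non-ACGT character (e.g. 'N').
std::optional<std::uint64_t> encode_kmer(std::string_view s) {
    if (s.size() > 32) return std::nullopt;
    std::uint64_t v = 0;
    for (char c : s) {
        std::uint64_t code;
        switch (c) {
            case 'A': case 'a': code = 0; break;
            case 'C': case 'c': code = 1; break;
            case 'G': case 'g': code = 2; break;
            case 'T': case 't': code = 3; break;
            default: return std::nullopt;
        }
        v = (v << 2) | code;
    }
    return v;
}

// Inverse of encode_kmer for a k-mer of length k.
std::string decode_kmer(std::uint64_t v, unsigned k) {
    static constexpr char lut[] = "ACGT";
    std::string s(k, 'A');
    for (unsigned i = k; i-- > 0;) {
        s[i] = lut[v & 3];
        v >>= 2;
    }
    return s;
}

// Reverse complement in the packed form: with this mapping the complement
// of code x is 3 - x, and reading the bases back-to-front reverses them.
std::uint64_t revcomp_kmer(std::uint64_t v, unsigned k) {
    std::uint64_t r = 0;
    for (unsigned i = 0; i < k; ++i) {
        r = (r << 2) | (3 - (v & 3));
        v >>= 2;
    }
    return r;
}
```

For example, `encode_kmer("ACGT")` yields 27 (binary 00 01 10 11), and the reverse complement of "AACG" decodes to "CGTT".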

BMFtools performs molecular demultiplication on barcoded sequencing data, reducing error rates while eliminating redundant information. Designed for ctDNA, it can reduce error rates by orders of magnitude, enabling confident detection of very rare events.
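Barcode-based error suppression rests on a simple idea: reads carrying the same molecular barcode derive from the same original molecule, so a random sequencing error in one read can be outvoted by its siblings. The following is a simplified sketch of that idea only, assuming roughly equal-length reads and a plain per-position majority vote. The `Read` struct and `collapse_families` function are hypothetical; BMFtools' actual consensus procedure is not reproduced here, and a production tool may also weigh base qualities.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read record: a molecular barcode plus the read sequence.
struct Read {
    std::string barcode;
    std::string seq;
};

// Groups reads by barcode ("molecular family") and builds one consensus per
// family by per-position majority vote over A/C/G/T. Positions with a tie or
// no A/C/G/T support become 'N'. The consensus length is that of the first
// read in each family; shorter reads simply stop voting past their end.
std::map<std::string, std::string> collapse_families(const std::vector<Read>& reads) {
    std::map<std::string, std::vector<const std::string*>> families;
    for (const Read& r : reads) families[r.barcode].push_back(&r.seq);

    std::map<std::string, std::string> consensus;
    for (const auto& [barcode, seqs] : families) {
        const std::size_t len = seqs.front()->size();
        std::string out(len, 'N');
        for (std::size_t i = 0; i < len; ++i) {
            std::array<int, 4> counts{};  // votes for A, C, G, T
            for (const std::string* s : seqs) {
                if (i >= s->size()) continue;
                switch ((*s)[i]) {
                    case 'A': ++counts[0]; break;
                    case 'C': ++counts[1]; break;
                    case 'G': ++counts[2]; break;
                    case 'T': ++counts[3]; break;
                    default: break;  // N or other symbols cast no vote
                }
            }
            int best = -1, best_count = 0;
            bool tie = false;
            for (int b = 0; b < 4; ++b) {
                if (counts[b] > best_count) {
                    best = b;
                    best_count = counts[b];
                    tie = false;
                } else if (counts[b] == best_count && best_count > 0) {
                    tie = true;
                }
            }
            if (best >= 0 && !tie) out[i] = "ACGT"[best];
        }
        consensus.emplace(barcode, std::move(out));
    }
    return consensus;
}
```

With three reads in one family reading ACGT, ACGA, and ACGT, the single error at the last position is outvoted and the family collapses to ACGT; redundant copies are replaced by one consensus record.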

scavenger provides Rust implementations (using tch-rs) of variational autoencoders (VAEs) for count-based data, applied to single-cell transcriptomics.

I also co-developed pbfusion, a fast tool for characterizing transcriptional abnormalities.

General C++

Most of my projects fall into this category: reusable tools that I build on across my other work.

Some of my favorites:

vec provides type-generic abstractions over x86-64 vectorization, making it easy to write fast, portable code.
kspp is an RAII-based variant of kstring from klib, with extra niceties that make appending printf-style formatted output easy.
aesctr provides STL-style random number generators built on fast AES-CTR and wyhash.
circularqueue provides a range-based circular queue container that uses power-of-two sizes.
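The benefit of power-of-two sizes in a circular queue is that wrap-around becomes a bitmask (`i & mask`) rather than a modulo. The sketch below illustrates just that trick; `RingQueue` is a hypothetical name, not the circularqueue API, and unlike that project it offers no range/iterator interface and requires `T` to be default-constructible.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Fixed-capacity FIFO ring buffer. Capacity is rounded up to a power of two
// so that slot = counter & mask. head_ and tail_ are free-running counters;
// unsigned wrap-around keeps tail_ - head_ equal to the element count.
template <typename T>
class RingQueue {
public:
    explicit RingQueue(std::size_t min_capacity) {
        std::size_t cap = 1;
        while (cap < min_capacity) cap <<= 1;
        buf_.resize(cap);
        mask_ = cap - 1;
    }

    // Appends at the tail; returns false if the queue is full.
    bool push(T value) {
        if (size() == buf_.size()) return false;
        buf_[tail_++ & mask_] = std::move(value);
        return true;
    }

    // Removes from the head; returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        if (size() == 0) return std::nullopt;
        return std::move(buf_[head_++ & mask_]);
    }

    std::size_t size() const { return tail_ - head_; }
    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<T> buf_;
    std::size_t mask_ = 0, head_ = 0, tail_ = 0;
};
```

For example, `RingQueue<int> q(5)` rounds its capacity up to 8, and after eight pushes and one pop, the next push lands in slot 0 again.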
