Because many classes of analysis problems in biology involve searching, manipulating or otherwise moving around large amounts of file data, progress in systems programming will have a large impact on therapeutics development and basic biology research alike. But while the pace of data generation from experimental techniques is increasing, there are actually few assays where the computational step is the bottleneck in the end-to-end research workflow. In industry, high-quality open source tools and widespread access to cloud infrastructure are often sufficient to distill GBs of raw data into small tables of values and plots with clear biological interpretation. Consider bulk RNA-seq and genetic sequencing as two representative examples: both have well-maintained aligners with strong consensus amongst computational biologists. So even when the data becomes quite large (deep WGS can yield raw files 100s of GBs in size), the use of computers is a small component of a workflow dominated by the wet lab.
Spatial single cell epigenetics is an exception: an experimental technique where the existing software tools simply break and computational steps are rate limiting towards interpretable results. This assay provides the biomolecular state of tissue with geometric structure, building beautiful pictures of cell state progression, gene programs and mechanisms that elucidate disease pathogenesis and basic biology. Unfortunately, the multi-modal nature of spatial epigenetics and the volume of data generated from modern kits are breaking the fragmented ecosystem of tools that has emerged to process it.
Most analysis workflows of this data type lean heavily on core objects and operations defined in the Seurat or Scanpy packages, popular frameworks for the exploratory analysis of single cell data. These codebases were born in academic biology labs and started as small R and Python projects in the early days of single cell biology.
Spatial assays routinely generate >1M cells' worth of sequencing data, and this amount is doubling every few years as assay developers shrink the feature size on tissue slides, much as semiconductor manufacturing did decades earlier. Working with this volume of single cell data is very difficult and reveals the seams in the existing academic codebases. We are sitting down with AtlasXomics Inc., a talented group of scientists and engineers who are building these kits. If you are a strong programmer and want to contribute to the pace of experimental biology in practice, this is a good place to look. If you are building a biotech platform or working in the ivory tower and want to understand what state-of-the-art spatial workflows look like and the rich biology they uncover, please tune in. https://lnkd.in/gEVjCmG2