
Summary

This project is part of the Genome in a Bottle (GIAB) Consortium hosted by NIST and is dedicated to the authoritative characterization of benchmark cancer genomes. The whole genome characterization in this project complements NIST's current DNA copy number Reference Materials for HER2, as well as EGFR and MET. Sign up for the General GIAB and Analysis Team email lists.

Updates September 2024:

  1. HG008 currently available data include twenty whole genome-scale datasets from thirteen collaborating laboratories, representing thirteen distinct measurement technologies, and more are coming.
  2. Preprint describing the current HG008 Dataset, "Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair for somatic benchmarks," is now available at https://doi.org/10.1101/2024.09.18.613544.
  3. The tumor cell line (HG008-T) has been received at NIST, where we are exploring differences between passages and developing single-cell clones of HG008-T.

Interested in job opportunities or collaborations with us? Contact justin.zook [at] nist.gov (Justin Zook).
See the GIAB FAQ.

Description


Goals:

This project is an extension of the Genome in a Bottle Consortium to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of cancer genome sequencing to clinical practice and innovations in technologies. The priority of GIAB is authoritative characterization of human genomes for use in benchmarking, including analytical validation and technology development, optimization, and demonstration.

Reference Samples:

NIST has been collaborating with Andrew Liss at MGH to develop new tumor cell lines with paired normal samples that are explicitly consented for fully public dissemination of genomic data and cell lines. The first tumor cell line (HG008-T) is from a pancreatic ductal adenocarcinoma; for this line we have paired normal pancreatic (HG008-N-P) and duodenal (HG008-N-D) tissue for sequencing, but no normal cell line. We are currently collecting the extensive genomic data described below and are working towards making these cell lines available in public repositories. We plan to add another pancreatic tumor cell line with a paired normal cell line in the near future, but these lines are still under development. We also welcome additional collaborations on tumor and normal cell line pairs that are explicitly consented for fully public dissemination of genomic data and cell lines.

Benchmark (or "High-confidence") Variant Calls and Regions:

We are working with the GIAB community to develop benchmark variants for the tumor and normal samples, using assembly-based and mapping-based approaches. We welcome collaborations in this new project.

Whole Genome Scale Data:

Starting in fall 2023, we began collecting a diverse set of whole genome scale measurements for the GIAB HG008 samples (Figure 1). We are making the data public, without embargo, as we collect them. The data being collected are described in Table 1. We welcome collaborations to analyze these data.

Figure 1  HG008 whole genome scale measurement technologies

This figure shows the various genome scale measurements made for the HG008 samples. Broadly, these measurements include short-read, long-read, single cell and Hi-C whole genome sequencing as well as cytogenetic analysis and whole genome mapping.
Credit: Jennifer McDaniel

Data Access:

These contributed data can be accessed through the public GIAB FTP as they become available. Data can also be browsed through 42basepairs (https://42basepairs.com/browse/web/giab/data_somatic/HG008/), which allows high-level exploration and preview of the sequencing data.

To help navigate the available data, we provide the Cancer GIAB Data Manifest, which lists the tumor and normal data currently available on the FTP. To explore the manifest, you can create a filter view by 1) selecting the entire spreadsheet and 2) choosing Data → Filter views → Create new filter view. Please note that tumor data collected in year 1 (2022) are from an earlier passage of tumor cells and are highlighted in RED. Most of the tumor data being collected are from a single batch of tumor cells, known as 0823p23. Please take these passages into account when choosing which tumor datasets to explore.
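For readers who prefer to filter the manifest programmatically rather than through a filter view, a minimal Python sketch of selecting tumor datasets from batch 0823p23 follows. It assumes the manifest has been exported to CSV with hypothetical columns named `dataset_id`, `sample` and `batch`; the real manifest's column names may differ, and the RED highlighting of 2022 data does not survive a CSV export.

```python
import csv
import io
from typing import Iterable

# Batch named on this page as the source of most tumor data.
PRIMARY_TUMOR_BATCH = "0823p23"


def load_manifest(path: str) -> list[dict]:
    """Read a CSV export of the manifest (hypothetical export; column names assumed)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def select_tumor_rows(rows: Iterable[dict], batch: str = PRIMARY_TUMOR_BATCH) -> list[dict]:
    """Keep tumor (HG008-T) rows from a single passage batch, using the
    assumed `sample` and `batch` columns."""
    return [
        row for row in rows
        if row.get("sample", "").startswith("HG008-T") and row.get("batch") == batch
    ]


if __name__ == "__main__":
    # Toy rows for illustration only; these are not real manifest entries.
    toy_csv = io.StringIO(
        "dataset_id,sample,batch\n"
        "D1,HG008-T,0823p23\n"
        "D2,HG008-T,2022\n"
        "D3,HG008-N-D,\n"
    )
    rows = list(csv.DictReader(toy_csv))
    print([row["dataset_id"] for row in select_tumor_rows(rows)])  # ['D1']
```

Passing `batch="2022"` instead would isolate the earlier-passage tumor data, which is a simple way to keep passages separate when comparing datasets.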

Table 1  Available datasets for GIAB HG008 tumor and normal samples

HG008 datasets available on the GIAB FTP have been QC'd and are annotated with estimated coverage and read lengths. The Dataset IDs correspond to those in the Cancer GIAB Data Manifest. Coverage estimates reflect the expected coverage of diploid regions of the normal and tumor samples, assuming no whole genome duplication has occurred. Note that most tumor data come from a large batch of cells (batch 0823p23), but some data are from other passages of the cell line (* batch 0823p23, ** 2022 passages, *** 2024 NIST passage 18). This table is updated as data are received; last update 2024-09-24.
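One way to see why the no-whole-genome-duplication assumption matters for coverage estimates is sketched below in Python. This is an illustration under stated assumptions, not GIAB's QC procedure: mean coverage is taken as total sequenced bases divided by an approximate 3.1 Gb haploid human reference length, and reads are assumed to be distributed in proportion to DNA content. The function names and the 3.1 Gb constant are hypothetical.

```python
# Hypothetical helpers for illustration; not part of any GIAB tooling.
# Assumption: approximate haploid human reference length (~3.1 Gb).
HAPLOID_GENOME_LENGTH = 3_100_000_000


def mean_coverage(total_bases: int, genome_length: int = HAPLOID_GENOME_LENGTH) -> float:
    """Average fold coverage: total sequenced bases / reference length."""
    return total_bases / genome_length


def coverage_at_copy_number(
    total_bases: int,
    copy_number: int,
    mean_ploidy: float = 2.0,
    genome_length: int = HAPLOID_GENOME_LENGTH,
) -> float:
    """Expected coverage of a region with `copy_number` copies, assuming reads
    are spread in proportion to DNA content and the sample's average ploidy
    is `mean_ploidy` (2.0 corresponds to no whole genome duplication)."""
    return mean_coverage(total_bases, genome_length) * copy_number / mean_ploidy


if __name__ == "__main__":
    yield_bases = 100_000_000_000  # e.g. 100 Gb of sequence
    print(f"{coverage_at_copy_number(yield_bases, 2):.1f}x")  # no WGD assumed
    print(f"{coverage_at_copy_number(yield_bases, 2, mean_ploidy=4.0):.1f}x")  # if WGD
```

With 100 Gb of sequence, this prints 32.3x for diploid regions under the default assumption and 16.1x if the average ploidy were 4 after a whole genome duplication, which is why a duplicated tumor genome would leave its two-copy regions with roughly half the quoted coverage.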

Research Opportunities:

NIST-NRC Postdoctoral Fellowship: 2-year fellowship at NIST, U.S. citizens only, ~$75,000 salary plus benefits, relocation expenses included, application deadlines Feb. 1 and Aug. 1, requires a 10-page research proposal. Contact Justin Zook if you are interested in writing a proposal on a genomics research project. We have opportunities posted for metrology in Cancer Genomics, Diploid Assembly, Epigenomics and Transcriptomics, Biological Data Science/Machine Learning, and Precision Medicine.

GIAB Email Lists:
General announcements
Analysis Team

Created October 18, 2023, Updated September 24, 2024