BioData Catalyst
Introduction
NHLBI’s BioData Catalyst is a shared virtual space where scientists can access and work with the digital objects of biomedical research, such as data and software. It is a cloud-based platform for tools, applications, and workflows, and it provides secure workspaces to share, store, cross-link, and analyze large sets of data generated from biomedical and behavioral research, while also ensuring patient privacy.
Why is BioData Catalyst Important?
BioData Catalyst will meet the needs of the NHLBI and our research community by enhancing access to NHLBI data, such as data from NHLBI's Trans-Omics for Precision Medicine (TOPMed) Program, which are among the first datasets available in BioData Catalyst. The platform will also provide access to tools for analyzing a range of data types, including phenotypic, genomic, other omics, and imaging data.
AT A GLANCE
- BioData Catalyst will improve FAIR-ness—the findability, accessibility, interoperability, and reusability—of NHLBI data.
- BioData Catalyst will accelerate research and engagement to drive discovery of new diagnostics, treatments, and prevention strategies for heart, lung, blood, and sleep (HLBS) conditions.
- It supports data democratization, making NHLBI data accessible to and understandable by researchers and citizen scientists as they work to accelerate discovery.
- Because of its interoperability, BioData Catalyst will be able to exchange information with other components of the Data Commons.
- Scientists will be able to use the platform’s capabilities to integrate NHLBI imaging data with TOPMed data.
- The platform includes chest images from the COPDGene study and whole genome sequences from the TOPMed Program.
How does BioData Catalyst contribute to scientific discoveries?
BioData Catalyst directly addresses the NHLBI Strategic Vision objective of leveraging emerging opportunities in data science to open new frontiers in heart, lung, blood, and sleep (HLBS) research.
BioData Catalyst will offer specialized search functions, controlled access to data, and analytic tools via widely available programming interfaces. With these capabilities, NHLBI researchers and other scientists will be able to use NHLBI datasets for scientific discovery.
HLBS research projects will be used to test and expand the platform. In the long term, BioData Catalyst will integrate massive datasets from NHLBI-supported clinical, population-based, and genomic studies to support NHLBI efforts toward precision medicine.
BioData Catalyst leverages the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, which partners with commercial cloud providers, to make data storage and computation more efficient. It is also collaborating with the NIH Office of Data Science Strategy to support that office's efforts to connect data systems across NIH.
Who is developing BioData Catalyst?
BioData Catalyst is a joint effort of the NHLBI and data science experts in academic institutions, research organizations, and industry. Harvard Medical School, Seven Bridges Genomics, the Renaissance Computing Institute, the University of California, Santa Cruz, the Broad Institute, and the University of Chicago are working closely with the NHLBI to develop the platform.
BioData Catalyst is governed by a steering committee that includes the development teams, NHLBI staff, and data producers and consumers. An external panel of experts provides guidance to the NHLBI during the development and implementation of BioData Catalyst.
How does BioData Catalyst work?
The BioData Catalyst development team is building the platform through the following activities:
- Constructing and enhancing annotated metadata for NHLBI datasets so that the datasets comply with FAIR data principles.
- Designing and testing search and analysis tools tailored to the unique characteristics of NHLBI datasets, including tools that group data by shared characteristics so that researchers can test hypotheses.
- Establishing and supporting secure workspaces for collaborative analysis specialized for NHLBI datasets and HLBS research, using a platform that brings the computation to the data, not the data to the computation.
- Developing and integrating analytic tools and workflows, as well as data analysis pipelines that will enable other researchers to repeat analyses and confirm findings.