Data Resources

A collection of datasets for the HOPE project

The HOPE project relies on high-quality, publicly available datasets for research and development. Below are the details of key datasets utilized in this project:

  • MIMIC-III (Medical Information Mart for Intensive Care III): A large, freely available database of de-identified health data for over 40,000 critical care patients. It gives a comprehensive view of each patient, including demographics, vital signs, laboratory results, and medications.

    • Key Features:
      • Over 58,000 hospital admissions.
      • Data spanning from 2001 to 2012.
      • Includes ICU data such as prescriptions, procedures, and diagnostic codes.
    • Access:
      • MIMIC-III Website
      • License: Requires successful completion of a Data Use Agreement (DUA) and CITI “Data or Specimens Only Research” training.
  • MIMIC-IV (Medical Information Mart for Intensive Care IV): The latest iteration of the MIMIC database, featuring updated data from the Beth Israel Deaconess Medical Center (BIDMC). It improves on MIMIC-III with more structured and expanded records.

    • Key Features:
      • Covers data from 2008 to 2019.
      • Improved modularity and linkage of data tables for better usability.
      • Includes high-resolution waveform data for advanced analysis.
    • Access:
      • MIMIC-IV Website
      • License: Requires a Data Use Agreement (DUA) and completion of CITI training, similar to MIMIC-III.
  • All of Us Research Program: A historic initiative by the National Institutes of Health (NIH) to collect health data from one million or more people across the United States. It offers a diverse range of health-related data aimed at advancing precision medicine.

    • Key Features:
      • Data from a diverse and large cohort.
      • Includes genetic, environmental, and lifestyle data.
      • Enables studies on how individual differences influence health outcomes.
    • Access:
  • Epic COSMOS: Epic’s extensive dataset for studying patient care trends. It aggregates real-world clinical data from millions of patients across Epic’s client organizations.

    • Key Features:
      • Real-world clinical data from more than 160 million patients.
      • Includes longitudinal records across multiple healthcare organizations.
      • Supports advanced analytics and machine learning applications.
    • Access:
      • COSMOS Website
      • License: Requires agreements through participating Epic institutions.
  • EMRBots: A synthetic Electronic Medical Record (EMR) dataset designed to simulate clinical scenarios for research and analysis. It lets researchers test machine learning models in healthcare contexts without the privacy concerns of real-world patient data.

    • Key Features:
      • Synthetic EMR data that mimics real-world patient scenarios.
      • Includes structured and unstructured data for various clinical use cases.
      • Useful for machine learning applications in healthcare without requiring patient data access.
    • Access:
