Data Description

Description and additional resources for each dataset

MIMIC-III (Medical Information Mart for Intensive Care III)

MIMIC-III is a large, freely available database of de-identified health data associated with over 40,000 critical care patients. It covers demographics, vital signs, laboratory results, medications, and more.

  • Key Features:
    • Over 58,000 hospital admissions.
    • Data spanning from 2001 to 2012.
    • Includes ICU data such as prescriptions, procedures, and diagnostic codes.

Resources

  • Database Schema: By Joseph Miles
  • Notebooks: Examples of data extraction and analysis in R Markdown and Jupyter notebooks
  • Tutorials: Focused on explaining concepts to new users

Tools

  • Bloatectomy (paper) - A Python-based package for removing duplicate text in clinical notes
  • Medication categories - Python script for extracting medications from free-text notes
  • MIMIC Extract (paper) - A Python-based package for transforming MIMIC-III data into a machine-learning-friendly format
  • FIDDLE (paper) - A Python-based package implementing a FlexIble Data-Driven pipeLinE (FIDDLE) that transforms structured EHR data into a machine-learning-friendly format

Papers

  • DRGCoder: An explainability-enhanced clinical claim coding system for the early prediction of medical severity DRGs (MS-DRGs).
    • Introduces a novel multi-task Transformer model for MS-DRG prediction
    • Allows visualization of salient words for better explainability
    • Identifies diseases for comparison with other discharge summaries
  • Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization (Code)
    • Covers four distinct clinical summarization tasks (radiology reports, patient questions, progress notes, and doctor-patient dialogue)
    • 10 physicians evaluated summary completeness, correctness, and conciseness. LLM summaries were rated equivalent (46%) or superior (36%) compared with summaries from medical experts.
  • GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs (Code)
    • Tasks: mortality prediction, length-of-stay prediction, readmission prediction, and drug recommendation
    • Explicitly uses the MIMIC-III and MIMIC-IV datasets

MIMIC-IV (Medical Information Mart for Intensive Care IV)

MIMIC-IV is the latest iteration of the MIMIC database, featuring updated data from the Beth Israel Deaconess Medical Center (BIDMC). It improves upon MIMIC-III with more structured and expanded records.

  • Key Features:
    • Covers data from 2008 to 2019.
    • Improved modularity and linkage of data tables for better usability.
    • Includes high-resolution waveform data for advanced analysis.

Modules

  • Core
    • HOSP: Hospital-level data collected from hospital electronic health record systems (patients, admissions, labs, medication, and billing information).
    • ICU: Data collected from Clinical Information Systems in the ICU.
  • ED: Data related to the emergency department.
  • CXR: Provides lookup tables for linking patients with MIMIC-CXR (links patients' chest X-rays to the MIMIC-IV modules).
  • ECG: Provides waveform data, along with lookup tables to link with other MIMIC-IV modules.
  • Note: Contains de-identified free-text clinical notes for hospitalized patients.
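
The linkage between modules can be illustrated with a small sketch. The example below is not taken from the MIMIC-IV documentation: it builds an in-memory SQLite database with tiny synthetic stand-ins for the HOSP tables patients and admissions and the ICU table icustays, then joins them on the shared identifiers subject_id and hadm_id. The rows, the reduced column set, and the helper names build_demo_db and admissions_with_icu_count are assumptions made for illustration; real MIMIC-IV tables have many more columns and require credentialed access.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with synthetic stand-ins for three tables.

    All rows are made up for illustration; none come from MIMIC-IV.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT);
        CREATE TABLE admissions (
            hadm_id INTEGER PRIMARY KEY, subject_id INTEGER, admittime TEXT
        );
        CREATE TABLE icustays (
            stay_id INTEGER PRIMARY KEY, subject_id INTEGER,
            hadm_id INTEGER, intime TEXT
        );
        """
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M")])
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?, ?)",
        [
            (100, 1, "2150-01-01 08:00:00"),
            (101, 1, "2150-03-10 12:30:00"),
            (200, 2, "2151-06-05 23:15:00"),
        ],
    )
    conn.executemany(
        "INSERT INTO icustays VALUES (?, ?, ?, ?)",
        [(9000, 1, 101, "2150-03-10 14:00:00")],
    )
    conn.commit()
    return conn


def admissions_with_icu_count(conn):
    """Return one row per admission with the number of linked ICU stays."""
    query = """
        SELECT p.subject_id, p.gender, a.hadm_id, COUNT(i.stay_id) AS n_icu_stays
        FROM patients AS p
        JOIN admissions AS a ON a.subject_id = p.subject_id
        LEFT JOIN icustays AS i ON i.hadm_id = a.hadm_id
        GROUP BY p.subject_id, p.gender, a.hadm_id
        ORDER BY a.hadm_id
    """
    return conn.execute(query).fetchall()


rows = admissions_with_icu_count(build_demo_db())
for row in rows:
    print(row)
```

The LEFT JOIN keeps admissions that never reached the ICU, so hospital-level and ICU-level records can be analysed together without dropping patients.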

Resources


Extra

Papers we can look into

Additional Resources