Data Description
Description and additional resources for each dataset
MIMIC-III (Medical Information Mart for Intensive Care III)
MIMIC-III is a large, freely available database containing de-identified health-related data for over 40,000 critical care patients. It provides a comprehensive view of patient data, including demographics, vital signs, laboratory results, medications, and more.
- Key Features:
- Over 58,000 hospital admissions.
- Data spanning from 2001 to 2012.
- Includes ICU data such as prescriptions, procedures, and diagnostic codes.
Resources
Quick Links
- Database Schema: By Joseph Miles
- Notebooks: Examples of data extraction and analysis in R Markdown and Jupyter notebooks
- Tutorials: Focused on explaining concepts to new users
Tools
- Bloatectomy (paper) - A Python-based package for removing duplicate text in clinical notes
- Medication categories - Python script for extracting medications from free-text notes
- MIMIC Extract (paper) - A Python-based package for transforming MIMIC-III data into a machine-learning-friendly format
- FIDDLE (paper) - A Python-based FlexIble Data-Driven pipeLinE (FIDDLE) that transforms structured EHR data into a machine-learning-friendly format
Papers
- DRGCoder: An explainability-enhanced clinical claim coding system for the early prediction of medical severity DRGs (MS-DRGs).
- Introduces a novel multi-task Transformer model for MS-DRG prediction
- Allows visualization of salient words for better explainability
- Identifies diseases for comparison across other discharge summaries
- Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization (Code)
- Covers four distinct clinical summarization tasks (radiology reports, patient questions, progress notes, and doctor-patient dialogue)
- 10 physicians evaluated summary completeness, correctness, and conciseness; the adapted LLM summaries were rated equivalent (46%) or superior (36%) to summaries from medical experts
- GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs - (Code)
- Mortality Prediction, Length of Stay Prediction, Readmission Prediction, Drug Recommendation
- Explicitly uses MIMIC-III and MIMIC-IV datasets
MIMIC-IV (Medical Information Mart for Intensive Care IV)
MIMIC-IV is the latest iteration of the MIMIC database, featuring updated data from the Beth Israel Deaconess Medical Center (BIDMC). It improves upon MIMIC-III with more structured and expanded records.
- Key Features:
- Covers data from 2008 to 2019.
- Improved modularity and linkage of data tables for better usability.
- Includes high-resolution waveform data for advanced analysis.
Modules
- Core
- ED: Data related to the emergency department.
- CXR: Provides lookup tables for linking patients with MIMIC-CXR (links patients' chest X-rays to the MIMIC-IV modules).
- ECG: Provides waveform data and lookup tables to link with other MIMIC-IV modules.
- Note: Contains de-identified free-text clinical notes for hospitalized patients.
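The modules are designed to be linked through shared identifiers. The following is a minimal sketch of that idea in Python, assuming each module's rows carry a `subject_id` patient identifier. The rows are synthetic, and the helper names (`index_by_subject`, `link_modules`) and the other column names are hypothetical choices made for this illustration, not part of any MIMIC tooling.

```python
from collections import defaultdict

# Synthetic rows for illustration only; these are NOT real MIMIC-IV records.
# Column names other than subject_id are assumptions made for this sketch.
patients = [
    {"subject_id": 1, "gender": "F"},
    {"subject_id": 2, "gender": "M"},
]
ed_stays = [
    {"subject_id": 1, "stay_id": 100},
    {"subject_id": 1, "stay_id": 101},
]
notes = [
    {"subject_id": 2, "note_id": "n-1"},
]


def index_by_subject(rows):
    """Group a module's rows by their subject_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["subject_id"]].append(row)
    return grouped


def link_modules(patients, **modules):
    """Attach each module's rows to the matching patient record."""
    indexed = {name: index_by_subject(rows) for name, rows in modules.items()}
    linked = []
    for patient in patients:
        record = dict(patient)
        for name, groups in indexed.items():
            # Patients with no rows in a module get an empty list.
            record[name] = groups.get(patient["subject_id"], [])
        linked.append(record)
    return linked


linked = link_modules(patients, ed=ed_stays, note=notes)
for record in linked:
    print(record["subject_id"], len(record["ed"]), len(record["note"]))
```

In practice the same join would more likely be done in SQL or with a dataframe library against the actual module tables; the dictionary-based version is used here only to keep the example dependency-free.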
Resources
Important Links
- MIMIC-IV Tables Overview by Alistair Johnson
- Reproduction of a study in MIMIC-IV by Alistair Johnson - (Code)
- Notebook: Shows how to use the data in MIMIC-IV
Extra
Papers we can look into
- MedDialog: Large-scale Medical Dialogue Datasets
- Contains a Chinese dataset with 3.4 million conversations and an English dataset with 0.26 million conversations between patients and doctors
- Could be useful in medical dialogue systems
- Development and validation of a machine-learning model for predicting the risk of death in sepsis patients with acute kidney injury
- Explicitly uses MIMIC-III and MIMIC-IV datasets
Additional Resources
- Standardized Data: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open community data standard designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.
- Collection of Graph-based Deep Learning Literature