
This repository contains the official code for the paper "Real-time prediction of intensive care unit patient acuity and therapy requirements using state-space modelling" (Nature Communications), which presents a deep learning framework for real-time patient acuity prediction using EHR data.


iheallab/apricotM

APRICOT-Mamba: Acuity Prediction in Intensive Care Unit (ICU)

License: GPL v3 Python 3.8+ PyTorch arXiv DOI

💻 Code implementation for the paper "Real-time prediction of intensive care unit patient acuity and therapy requirements using state-space modelling".

📝 The paper was accepted for publication in Nature Communications (📄 PDF).


📘 Overview

APRICOT-Mamba is a deep learning framework designed to continuously predict patient acuity in the ICU using Electronic Health Records (EHR). It extends the APRICOT family by integrating Mamba-based state space models and Transformer architectures, enabling real-time, interpretable predictions of patient stability and transitions.

This repository includes:

  • Data preprocessing pipelines for retrospective and prospective ICU cohorts.
  • Training and evaluation scripts for APRICOT-Mamba, APRICOT-Transformer, GRU, CatBoost, and Transformer baselines.
  • Post-hoc analysis tools for calibration, feature attribution, and prospective validation.

📂 Project Structure

├── README.md
└── main/
    ├── analyses/             # Post-training analyses (calibration, performance, etc.)
    │   ├── calibration/
    │   ├── confusion_matrix/
    │   ├── integrated_gradients/  # Feature importance analysis
    │   └── ...
    ├── baseline_models/      # Baseline models (CatBoost, GRU, Transformer)
    │   ├── catboost/
    │   ├── gru/
    │   └── transformer/
    ├── datasets/             # Data loading and description
    │   ├── README.md
    │   ├── eicu/
    │   ├── mimic/
    │   └── uf/
    ├── models/               # Core model implementations (APRICOT-Mamba, APRICOT-T)
    │   ├── apricotm/         # APRICOT-Mamba model
    │   ├── apricott/         # APRICOT-Transformer model
    │   ├── model_comparison.py
    │   └── variables.py      # Configuration variables
    ├── prospective_cohort/   # Prospective cohort data processing
    ├── retrospective_cohort/ # Retrospective cohort data processing
    ├── sofa_baseline/        # SOFA score baseline calculation
    └── summary/              # Summary generation scripts

βš™οΈ Requirements

Software

  • Python ≥ 3.8
  • Package Manager: pip or conda
  • Key Python Libraries:
    • pandas
    • numpy
    • scikit-learn
    • h5py
    • torch (PyTorch)
    • optuna
    • catboost
    • captum

Install all dependencies with:

pip install -r requirements.txt

Note: For GPU support with PyTorch, refer to the official installation guide.
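Once the dependencies are installed, checking that the key libraries are importable can catch problems before a long training run. The sketch below uses only the standard library. The helper name `missing_packages` is illustrative and not part of the repository; `sklearn` is the import name of scikit-learn.

```python
from importlib.util import find_spec

# Import names for the key libraries listed in this README.
# scikit-learn is imported as "sklearn".
KEY_MODULES = [
    "pandas", "numpy", "sklearn", "h5py",
    "torch", "optuna", "catboost", "captum",
]


def missing_packages(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_packages(KEY_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All key packages are importable.")
```

This only confirms that the modules can be located; it does not verify versions or GPU support.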

Hardware

  • CPU: Multi-core processor
  • RAM: ≥ 16 GB
  • GPU: NVIDIA GPU with CUDA support (recommended for training deep learning models)

πŸ₯ Data Sources

This project utilizes EHR data from:

  • eICU Collaborative Research Database: A multi-center ICU database with high granularity data for over 200,000 admissions. Access requires credentialed approval.

  • MIMIC-IV: A large critical care database comprising de-identified health-related data associated with over 60,000 ICU admissions. It is free of charge, but access requires PhysioNet credentialing.

  • University of Florida Health (UFH): Internal EHR data from UF Health. Note: This dataset is not publicly available at this time.

Data processing scripts are located in:

  • main/datasets/
  • main/retrospective_cohort/
  • main/prospective_cohort/

The primary data format for training and evaluation is HDF5 (.h5). The script main/retrospective_cohort/5_build_hdf5.py demonstrates the structure of the final dataset.h5 file, which includes training, validation, external test, and temporal test sets with features (X), static data (static), and labels (y_main, y_trans).
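To confirm what a generated dataset.h5 actually contains, it can help to list every dataset with its shape. The following is a minimal sketch, not code from the repository: `describe` and `describe_file` are hypothetical helper names, and the group name and shapes in the stand-in are invented for illustration. The walker relies only on the dict-like `items()` interface that `h5py.File` and `h5py.Group` objects provide, so it can be tried on plain dictionaries without h5py installed.

```python
from types import SimpleNamespace


def describe(node, prefix=""):
    """Recursively list datasets under an HDF5-like node as (path, shape) pairs.

    Works on h5py.File / h5py.Group objects, which behave like nested
    mappings, and on the plain dicts used below as a stand-in.
    """
    rows = []
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "shape"):   # leaf: a dataset (or array-like object)
            rows.append((path, tuple(child.shape)))
        else:                         # branch: a group
            rows.extend(describe(child, path))
    return rows


def describe_file(path):
    """Open an HDF5 file read-only and describe its contents (requires h5py)."""
    import h5py  # imported lazily so the walker can be tried without h5py

    with h5py.File(path, "r") as handle:
        return describe(handle)


# Stand-in with HYPOTHETICAL group names and shapes; the real layout is
# defined by 5_build_hdf5.py and may differ.
toy = {
    "training": {
        "X": SimpleNamespace(shape=(100, 3)),
        "static": SimpleNamespace(shape=(10, 5)),
        "y_main": SimpleNamespace(shape=(10,)),
        "y_trans": SimpleNamespace(shape=(10,)),
    },
}

for path, shape in describe(toy):
    print(path, shape)
```

On a real file, `describe_file("dataset.h5")` would return the same kind of list for whatever groups the build script wrote.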

Refer to main/datasets/README.md for detailed information on data sources and initial setup.


🚀 Getting Started

1. Data Preparation

Process raw EHR data to generate the required dataset.h5 file:

python main/retrospective_cohort/5_build_hdf5.py

Note: Adjust paths and parameters as needed in the script.

2. Model Training

Navigate to the desired model directory and run the training script:

cd main/models/apricotm/
python 1_train.py

This script performs hyperparameter optimization using optuna, trains the model with PyTorch, and saves:

  • Best hyperparameters: best_params.pkl
  • Model weights: apricotm_weights.pth
  • Model architecture: apricotm_architecture.pth
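The saved hyperparameters can be inspected without retraining. The sketch below assumes that best_params.pkl is an ordinary pickled dictionary, which this README does not state explicitly; the helper `load_best_params` and the example keys are hypothetical. A small stand-in file is written to a temporary directory so the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_best_params(path):
    """Load a pickled hyperparameter object from disk.

    Only unpickle files you created or trust: pickle can execute
    arbitrary code while loading.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Write a stand-in file with HYPOTHETICAL keys, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "best_params.pkl"
    with open(demo_path, "wb") as handle:
        pickle.dump({"learning_rate": 1e-3, "d_model": 128}, handle)
    params = load_best_params(demo_path)

for key, value in sorted(params.items()):
    print(f"{key}: {value}")
```

The two .pth files are PyTorch artifacts and would be loaded through PyTorch rather than pickle directly.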

Training duration is approximately 2 hours on an NVIDIA A100 GPU.

Repeat the process for other models as needed.

3. Model Evaluation

Evaluate the trained model on test sets:

python 2_eval.py

Evaluation results are saved in the results subdirectory within the model's directory.

4. Prospective Run

If prospective data is prepared, apply the trained model:

python 3_prospective.py

5. Post-hoc Analyses

Perform analyses on model predictions:

python main/analyses/calibration/1_calibration.py
python main/analyses/integrated_gradients/1_integrated_gradients_table.py
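As background for the calibration analysis, the sketch below shows one common way to quantify calibration: bin predicted probabilities, compare the mean prediction in each bin with the observed event rate, and summarize the gaps as an expected calibration error. This is a generic illustration in pure Python under assumed binary labels, not the repository's implementation; the function names and toy data are hypothetical.

```python
def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width bins.

    Returns one (bin_index, count, mean_prediction, observed_rate) tuple
    per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    table = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        table.append((index, len(members), mean_pred, observed))
    return table


def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted average of |mean prediction - observed rate| over bins."""
    total = len(probs)
    return sum(
        count / total * abs(mean_pred - observed)
        for _, count, mean_pred, observed in reliability_table(probs, labels, n_bins)
    )


# Toy predictions (HYPOTHETICAL values) for a binary outcome.
toy_probs = [0.1, 0.1, 0.9, 0.9]
toy_labels = [0, 0, 1, 1]

for row in reliability_table(toy_probs, toy_labels):
    print(row)
print(round(expected_calibration_error(toy_probs, toy_labels), 3))
```

A lower value means predicted probabilities track observed frequencies more closely; the result depends on the number of bins chosen.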

6. Expected Output

Results are written to a path built from the user-defined home directory (HOME_DIR), the time window in hours (time_window), and the model name (model):

{HOME_DIR}/deepacu/main/{time_window}h_window/model/{model}/results
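The path template can be assembled programmatically, for example when collecting results from several models. This is a small sketch using `pathlib`; the function name `results_dir` and the example values ("/data", 4, "apricotm") are assumptions for illustration only.

```python
from pathlib import Path


def results_dir(home_dir, time_window, model):
    """Build the results directory following the template above."""
    return (
        Path(home_dir)
        / "deepacu"
        / "main"
        / f"{time_window}h_window"
        / "model"
        / model
        / "results"
    )


# Example values are HYPOTHETICAL; substitute your own HOME_DIR and settings.
print(results_dir("/data", 4, "apricotm").as_posix())
# prints /data/deepacu/main/4h_window/model/apricotm/results
```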

📊 Results & Performance

APRICOT-Mamba demonstrates high performance in predicting patient acuity, with AUROC scores comparable to state-of-the-art models. Detailed performance metrics, calibration plots, and feature importance analyses are available in the results directories and can be visualized using the provided analysis scripts.


πŸ§‘β€πŸ’» Contributing

We welcome contributions from the community! To contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or bugfix.
  3. Commit your changes with clear messages.
  4. Submit a pull request detailing your changes.

📄 License

This project is licensed under the GNU General Public License v3.0.


📚 Citation

If you use this work in your research, please cite:

@article{contreras2025real,
  author    = {Miguel Contreras and Brandon Silva and Benjamin Shickel and Andrea Davidson and Tezcan Ozrazgat-Baslanti and Yuanfang Ren and Ziyuan Guan and Jeremy Balch and Jiaqing Zhang and Sabyasachi Bandyopadhyay and Tyler Loftus and Kia Khezeli and Gloria Lipori and Jessica Sena and Subhash Nerella and Azra Bihorac and Parisa Rashidi},
  title     = {Real-time prediction of intensive care unit patient acuity and therapy requirements using state-space modelling},
  journal   = {Nature Communications},
  year      = {2025},
  month     = {July},
  doi       = {10.1038/s41467-025-62121-1},
}

📬 Contact

