TY - JOUR
T1 - A Reinforcement Learning Framework for Personalized Anticoagulation Dosing in Critical Care
T2 - Integrating Batch-Constrained Policy Optimization and Off-Policy Evaluation
AU - Lim, Yooseok
AU - Park, In Beom
AU - Lee, Sujee
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Precise medication dosing in the intensive care unit (ICU) is vital for patient survival. Heparin, a widely used anticoagulant, requires careful administration due to patient-specific variability, and inappropriate dosing can cause severe complications such as stroke or hemorrhage. This study introduces a reinforcement learning (RL)-based decision-support framework for heparin dosing, integrating offline RL algorithms with rigorous evaluation. We employ Batch-Constrained deep Q-Learning (BCQ) to learn an optimal dosing policy from retrospective data, addressing distributional shift inherent in offline settings. The dosing policies are trained on the MIMIC-III database and evaluated on the MIMIC-IV database, and vice versa. Policy effectiveness is assessed through multiple off-policy evaluation (OPE) methods, demonstrating higher expected returns than clinician-derived strategies. Interpretability is enhanced through t-SNE visualization, showing that Q-values are well aligned with therapeutic aPTT targets. To our knowledge, this is the first study to combine BCQ, multi-metric OPE, and interpretability analysis for anticoagulation management across two large-scale ICU cohorts. By advancing both methodological rigor and clinical relevance, this work provides a foundation for reliable RL-based decision-support systems in critical care.
AB - Precise medication dosing in the intensive care unit (ICU) is vital for patient survival. Heparin, a widely used anticoagulant, requires careful administration due to patient-specific variability, and inappropriate dosing can cause severe complications such as stroke or hemorrhage. This study introduces a reinforcement learning (RL)-based decision-support framework for heparin dosing, integrating offline RL algorithms with rigorous evaluation. We employ Batch-Constrained deep Q-Learning (BCQ) to learn an optimal dosing policy from retrospective data, addressing distributional shift inherent in offline settings. The dosing policies are trained on the MIMIC-III database and evaluated on the MIMIC-IV database, and vice versa. Policy effectiveness is assessed through multiple off-policy evaluation (OPE) methods, demonstrating higher expected returns than clinician-derived strategies. Interpretability is enhanced through t-SNE visualization, showing that Q-values are well aligned with therapeutic aPTT targets. To our knowledge, this is the first study to combine BCQ, multi-metric OPE, and interpretability analysis for anticoagulation management across two large-scale ICU cohorts. By advancing both methodological rigor and clinical relevance, this work provides a foundation for reliable RL-based decision-support systems in critical care.
KW - Reinforcement learning
KW - batch-constrained policy
KW - medical information mart for intensive care
KW - off-policy evaluation
KW - personalized heparin dosing policy
UR - https://www.scopus.com/pages/publications/105023313047
U2 - 10.1109/ACCESS.2025.3638417
DO - 10.1109/ACCESS.2025.3638417
M3 - Article
AN - SCOPUS:105023313047
SN - 2169-3536
VL - 13
SP - 203145
EP - 203157
JO - IEEE Access
JF - IEEE Access
ER -