Enhancing Hospital Resource Management: Predicting Patient Length of Stay Using Machine Learning
ISSN No:-2456-2165
Abstract:- This project aims to enhance hospital management by predicting patients' length of stay using the MIMIC dataset, ultimately resulting in substantial cost savings and improved resource allocation. In our initial approach, we categorized the target variable, "length of stay", into three classes: short, medium, and long. Employing classification models including Logistic Regression, Random Forests, and Gradient Boosting, we attempted to predict patient outcomes. However, the initial results were unsatisfactory, prompting us to refine our methodology. We expanded the target variable classes to five: very short, short, medium, long, and very long, leading to improved accuracy in predicting short hospital stays. In the second approach, we treated the length of stay as a continuous variable and employed Multiple Linear Regression for modeling. Unfortunately, this approach yielded sub-optimal results compared to the classification techniques. We analyze the limitations encountered and propose future steps to enhance the efficiency and accuracy of prediction models, ultimately contributing to more effective hospital resource management.

Keywords:- Length of Stay, MIMIC III, Classification, Random Forest, Healthcare.

I. INTRODUCTION

The rising demand for healthcare, especially in developed countries, is driven by an aging population. Policymakers and healthcare organizations aim to align financial incentives with best practices to improve patient outcomes and healthcare affordability. Chronic diseases, linked to changing lifestyles and dietary habits, pose a significant challenge, being the leading cause of mortality and disability in the US [1]. Conditions like obstructive pulmonary disease, type 2 diabetes, cancer, and cardiovascular diseases are burdening healthcare systems [2]. Long-term hospital stays have surged over the past decade due to the prevalence of chronic illnesses [2]. In the US, hospitals spend over $377.5 billion annually on patient admissions and stays [3]. Prolonged hospitalizations increase the risk of hospital-acquired conditions [4]. Accurately predicting a patient's hospital stay length is crucial for efficient resource management, cost reduction, and improving patient care [4]. Machine learning and data mining techniques, particularly in intensive care, show promise in optimizing healthcare resource management [5]. This project utilizes the MIMIC database [6], known for its applicability in various healthcare analyses.

II. MOTIVATION

The motivation driving this project stems from the immense financial strain imposed on the healthcare system in the United States. In 2020, the nation's healthcare expenditure surpassed an astonishing $4 trillion, with nearly a third of this allocated to hospital charges and services. Understanding the substantial costs associated with patient care, especially concerning the duration of hospital stays, emphasizes the need for efficient resource management.

The outbreak of the COVID-19 pandemic significantly disrupted routine healthcare services [7] [8]. Lockdowns and the healthcare system's focus on COVID-19 treatment led to the halting of critical vaccination programs against diseases like measles, polio, and meningitis. This interruption in regular healthcare protocols endangered millions of children, underscoring the urgency of efficient patient care and resource allocation.

The pandemic highlighted the importance of categorizing patients based on symptom severity. Hospitals faced overwhelming patient influxes, necessitating a system to prioritize admissions and allocate resources effectively [9]. This underscored the necessity of a predictive model that could categorize patients into "short", "medium" and "long" stays based on various metrics. Such a model would ensure appropriate care and resource distribution, particularly for the most critical cases.

It is crucial to emphasize that our project does not intend to replace medical professionals; rather, it aims to complement their expertise. Recognizing that the pivotal period for a patient's treatment begins upon hospital admission, our model provides crucial insights for medical staff to optimize resource utilization and deliver timely care. By achieving this, we strive to alleviate the strain on healthcare resources and ultimately save both time and lives.

III. OBJECTIVE

Our objective is to construct two distinct models utilizing data extracted from the MIMIC database [6]. These models aim to provide predictive and exploratory insights. The first model is designed to forecast the probability of a categorical outcome, specifically a patient's length of stay. Patients will be categorized into various classes of length of stay based on their individual characteristics and
Fig. 2: Patient length of stay (LOS) by age, across various patient ethnicities
short stay at the hospital, whereas both males and females with a median age above 65 had the longest stay at the hospital.

C. Modelling Approach

Approach 1: We built a classification model to predict the output variable LOS (length of stay). Various classification models, such as Logistic Regression [15] [16], Random Forest [17], and Boosting [18], were implemented on the unseen dataset.

Approach 2: As the target variable is continuous, we can also use regression techniques such as multiple linear regression. Multiple linear regression [19] uses several explanatory variables to predict the outcome of a response variable. Hence, the patient's length of stay can be modelled as a linear function of multiple variables.

VI. RESULTS

Initially, the target variable (LOS) had 3 classes: short (0-5 days), medium (5-10 days), and long (>10 days). The performance of all three models, Logistic Regression, Random Forests, and Boosting, is presented below. With three target classes, all models exhibited subpar performance, with accuracy below 50%. The Gradient Boosting model slightly outperformed the other two models with an accuracy of 45.71%, an error rate of 54.28%, and a 95% confidence interval of 0.4523 to 0.4620. The confusion matrix metrics, such as precision and recall, also displayed poor results, with none of them surpassing the 70% threshold in any of the three models. To further enhance these results, we expanded the target variable by two additional classes, resulting in a total of 5 classes. In this scenario, the performance improved significantly for all three models. As seen in the table below, Random Forests outperformed the other two models with an accuracy of 87.11%, an error rate of 18.37%, and a 95% confidence interval of 0.8661 to 0.8760, as seen in Fig. 6. This clearly indicates that our Random Forest model is successfully predicting the hospital length of stay for the short days class. Finally, we considered the target variable as continuous and applied a linear regression model for prediction. However, the outcomes from this model, as depicted below, were notably inferior compared to the classification models with 3 classes. The Multiple R-squared value was computed to be 0.0971, while the adjusted R-squared was 0.09702.

Since the classification models outperformed the regression model, we opted to use classification as our final model.
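To make the class construction and the reported accuracy figures concrete, the following is a minimal Python sketch and not the code used in this study. It assumes that each bin's upper bound is inclusive (the text does not say which class owns exactly 5 or 10 days), uses hypothetical cut points for the five-class scheme (the actual ones are not stated), and computes a normal-approximation (Wald) interval, which may differ from the interval method behind the values reported above. The names bin_los and accuracy_with_ci are illustrative only.

```python
import math
from bisect import bisect_left

# Upper edges (in days) of the three-class scheme described in the Results:
# short (0-5), medium (5-10), long (>10). Treating each upper edge as
# inclusive is an assumption made for this sketch.
THREE_CLASS_EDGES = [5.0, 10.0]
THREE_CLASS_LABELS = ["short", "medium", "long"]

# Hypothetical five-class edges, for illustration only; the cut points
# actually used for the five-class target are not given in the text.
FIVE_CLASS_EDGES = [2.0, 5.0, 10.0, 20.0]
FIVE_CLASS_LABELS = ["very short", "short", "medium", "long", "very long"]


def bin_los(days, edges, labels):
    """Map a length of stay in days to a class label using right-closed bins."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_left(edges, days)]


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Return (accuracy, lower, upper) with a Wald interval; z=1.96 gives ~95%."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Toy lengths of stay (days); these are made-up values, not MIMIC records.
los_days = [1.5, 4.0, 5.0, 7.2, 10.0, 12.0, 30.0]
true_classes = [bin_los(d, THREE_CLASS_EDGES, THREE_CLASS_LABELS) for d in los_days]

# Placeholder predictions standing in for a classifier's output.
predicted = ["short", "short", "medium", "medium", "medium", "long", "medium"]

acc, lower, upper = accuracy_with_ci(true_classes, predicted)
print(true_classes)
print(f"accuracy={acc:.4f}, error={1 - acc:.4f}, CI=({lower:.4f}, {upper:.4f})")
```

Under this definition the error rate is one minus the accuracy, and the interval narrows as the number of evaluated patients grows. Passing FIVE_CLASS_EDGES and FIVE_CLASS_LABELS to bin_los yields the five-class target in the same way.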
VII. DISCUSSION