B22IT031-report[2]
on
MACHINE LEARNING IN HEALTHCARE
Submitted in
Partial fulfilment of the requirements
for the award of
Bachelor of Technology
in
Information Technology
By
D. SATHWIKA
B22IT031
CERTIFICATE
ACKNOWLEDGEMENT
I wish to take this opportunity to express my deep gratitude to all the people who have
extended their cooperation in various ways during my Seminar. It is my pleasure to acknowledge
the help of all those individuals.
I thank Dr. K. Ashoka Reddy, Principal of Kakatiya Institute of Technology & Science,
Warangal, for his strong support.
All our faculty members cooperated generously and guided me in every aspect of completing this
Seminar successfully. Your guidance helped me a lot, and I am very grateful to you.
D. SATHWIKA
B22IT031
ABSTRACT
CONTENTS
ABSTRACT i
CONTENTS ii
1 CHAPTER 1: INTRODUCTION 1
1.1 Introduction 1
3 CHAPTER 3: METHODOLOGIES 8
7 CONCLUSION 18
8 REFERENCES 19
LIST OF FIGURES
5.1.1 Top Performance Metrics in Machine Learning 14
CHAPTER 1: INTRODUCTION
Machine learning is changing healthcare: algorithms and statistical models analyze many kinds of
complex medical data and produce predictions that offer actionable insight. By learning patterns
and relationships within the data, they help healthcare professionals make informed decisions,
enhance patient care, and optimize clinical workflows.
1. Improved Diagnosis: Early and accurate detection of diseases (for example, cancer and
Alzheimer's), including analysis of medical images to identify abnormalities
3. Personalized Medicine: Treatment planning from the genetics and history of the patient
Despite its vast potential, machine learning deployed in healthcare faces key challenges: data
privacy, integration into workflows, bias within algorithms, and compliance with regulations such
as HIPAA and GDPR. Still, advances in federated learning, explainable AI, and ethical AI
practices are expected to address these challenges and support wider adoption. As machine
learning continues to evolve, it holds the potential to reshape the healthcare landscape, making
it more efficient, accessible, and patient-centric.
Machine learning plays a transformative role in health care by enabling data-driven decision
making, improvement of diagnostics, optimization of operations, and improvement in patient
outcomes. It helps to analyze large, highly complex datasets, such as electronic health records,
medical images, genomic sequences, and wearable device data, to find patterns and make
accurate predictions.
Machine learning is also advancing medical imaging: convolutional neural networks (CNNs) make it
possible to interpret X-rays, MRIs, and CT scans more reliably. Beyond clinical care, machine
learning improves operational efficiency by automating clerical work, supporting resource
allocation, and enabling remote monitoring through IoT-enabled devices.
CHAPTER 2: LITERATURE SURVEY
models, emphasizing metrics like Precision, Recall, F-Measure, Accuracy, and Area Under the
Curve (AUC).
MRI-based brain tumor detection using convolutional deep learning methods and chosen machine
learning techniques.

The primary objective of the study was to enhance the detection and classification of brain
tumors using Magnetic Resonance [...] convolutional deep learning methods and traditional machine
learning techniques. The authors aimed to develop a robust computational approach for diagnosing
three tumor types (glioma, meningioma, and pituitary gland tumors) as well as distinguishing them
from healthy brains. Recognizing the limitations of manual biopsy, which requires invasive
surgery, the study sought to create an automated, non-invasive diagnostic tool with high accuracy
and efficiency.

The study utilized a dataset of 3264 T1-weighted contrast-enhanced MRI images, categorized into
glioma, meningioma, pituitary gland tumors, and healthy brains. [...] and augmenting the dataset
through rotations and vertical flipping, resulting in 9792 samples. Two deep learning models were
designed: a 2D CNN and a Convolutional Auto-Encoder Network. The 2D CNN featured a hierarchical
structure with eight convolutional layers, four pooling layers, and batch normalization to
stabilize learning.

The study encountered several limitations. The dataset was relatively small for training deep
neural networks, with only 3264 images expanded to 9792 through augmentation, which may [...]
lacked diversity, as they were obtained from a single source and may not generalize well to
broader populations or other imaging modalities. Labeling issues were noted; annotating MRI data
requires expertise and is time-consuming, restricting the dataset size. The proposed models,
though optimized for accuracy, had execution times constrained by computational complexity,
especially for the auto-encoder, which required more resources.
Machine Learning Classification [...]

The research aims to develop a robust machine learning framework for predicting and classifying
liver diseases to reduce the workload on healthcare [...] algorithms and evaluates their
performance to create a model that can reliably identify the likelihood and severity of liver
diseases. The ultimate goal is to build a system that processes patient data and predicts
outcomes with an accuracy benchmark of 90%. By analyzing liver function test data and other
patient-specific attributes, the study seeks to enhance the decision-making process in clinical
settings.

Features such as age, bilirubin levels, albumin, and enzyme counts were normalized to enhance
uniformity. The Particle Swarm Optimization (PSO) feature selection technique was employed to
identify critical attributes that significantly [...] Neighbor (KNN), Random Forest (RF), and
Multilayer Perceptron (MLP), were implemented to classify liver disease conditions effectively.

The dataset also presents feature correlations, such as between Direct and Total Bilirubin, which
might lead to biased model outcomes or overfitting. Moreover, while machine learning models like
Random Forest and MLP provide high accuracy, their lack [...] learning for image-based diagnosis
and does not incorporate temporal data for tracking disease progression over time. Additionally,
the absence of real-time prediction capabilities and integration with clinical workflows limits
the practical applicability of the models. Future expansions aim to overcome these limitations by
incorporating more diverse and comprehensive datasets from global populations, balancing gender
representation, and addressing dataset imbalance through advanced sampling techniques.
Predicting Mental Health Illness using Machine Learning Algorithms

The primary objective of the study is to leverage machine learning techniques to predict mental
health issues and classify various mental health disorders efficiently. This approach aims to
address the global challenge of early [...]

The study employed a structured data science workflow beginning with data collection, utilizing a
dataset with 27 attributes and 1,259 entries. Data preprocessing was conducted to clean the
dataset by addressing missing values, errors, and inconsistencies, ensuring its readiness for
analysis. This was followed by data encoding, [...]

This study's key limitations include the small dataset size, which may restrict the
generalizability of the results and the model's ability to perform well on diverse populations.
The dataset's reliance on structured attributes may fail to capture complex mental health
conditions that arise from unstructured data such as text, speech, or behavioral [...]
Synergistic Analysis of Lung Cancer's Impact on Cardiovascular Disease using ML-based Techniques
Predictive Ability of Machine-Learning Methods for Vitamin D Deficiency Prediction by
Anthropometric Parameters.
Intelligent and Interactive Healthcare System (I²HS) Using Machine Learning
Prediction of Muscular Paralysis Disease Based on Hybrid Feature Extraction with Machine Learning
Technique for COVID-19 and Post-COVID-19 Patients
Machine Learning to Predict Pregnancy Outcomes: A Systematic Review, Synthesizing Framework and
Future Research Agenda
Literature survey: Several key contributions have been identified in the area of chronic kidney
disease (CKD) prediction using machine learning. These studies established the utility of
traditional models, such as Logistic Regression and Random Forests, for binary classification of
CKD risk. Later work introduced advanced ensemble methods, such as Gradient Boosting and XGBoost,
which performed better on rich feature sets. In addition, neural networks and deep learning have
been used to model complex relationships in large datasets, although they typically lack
interpretability.
Feature selection techniques, including Recursive Feature Elimination, are often used to reduce
datasets to important clinical parameters, such as blood urea levels and glomerular filtration
rate. Preprocessing techniques include imputation for missing values and oversampling to counter
class imbalance; both are important for making the final model robust.
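The two preprocessing steps named above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the surveyed studies: the function names, the toy records, and the choice of mean imputation and plain random oversampling (rather than, say, SMOTE) are all assumptions made for the example.

```python
import random
from collections import Counter


def impute_mean(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class has the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(list(rng.choice(pool)))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy records: [blood urea, glomerular filtration rate]; None marks a missing value.
X = [[40.0, 90.0], [None, 85.0], [120.0, None], [38.0, 95.0]]
y = [0, 0, 1, 0]  # 1 = CKD, 0 = no CKD (illustrative labels)

X_filled = impute_mean(X)
X_bal, y_bal = random_oversample(X_filled, y)
```

In practice, oversampling is usually applied to the training split only, so that duplicated records do not leak into the evaluation data.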
2. MRI-based brain tumor detection using convolutional deep learning methods and
chosen machine learning techniques.
The literature review describes the machine learning and deep learning techniques applied to the
detection and classification of brain tumors, focusing on the evolution and limitations of these
methods. Initial research used traditional machine learning algorithms like SVM and Random
Forests, which work well with structured datasets but struggle with the high dimensionality and
complexity of medical images. This opened the door for Convolutional Neural Networks (CNNs),
which have since become a dominant tool for analyzing MRI images because they are better at
extracting meaningful features.
Other developments are hybrid methodologies. For example, Gumaei et al. combined GIST feature
extraction with Extreme Learning Machines to improve classification performance, but did not
compare directly against modern deep learning models. Capsule networks have been used in
segmentation and classification applications, with reportedly robust performance in some studies.
However, classification accuracy remains suboptimal compared with advanced CNN architectures,
such as those based on deeper networks like VGG16 and AlexNet.
The paper reviews the application of ML algorithms for predicting mental health conditions,
emphasizing their potential to overcome shortcomings in traditional methods of diagnosis. Five
ML techniques (Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, and
Stacking) are evaluated; Stacking achieved the highest accuracy, 81.75%. The literature survey
discusses recent advancements in ML, particularly ensemble classifiers and preprocessing
methodologies like ILIOU, which enhance model performance.
Studies indicate ML's utility in detecting psychiatric illnesses, such as depression and
schizophrenia, from a variety of data sources like social media and wearables. However,
challenges remain, including small dataset sizes, ethical issues, and integration with clinical
workflows.
5. Improved Prediction of Thyroid Diseases with Machine Learning Method
A literature survey in this paper explores various machine learning techniques used in
predicting thyroid diseases while highlighting the urgency of accurate and efficient
diagnostic tools in healthcare. Support Vector Machines, Random Forest, Logistic
Regression, and k-Nearest Neighbors are some of the traditional algorithms explored for
thyroid disease classification. These techniques have often been applied to clinical and
biochemical data, with some showing high accuracy when classifying patients by thyroid
condition.
Although these models perform well, data imbalance, feature selection, and the complexity of
clinical datasets remain significant issues. Several studies have incorporated data
preprocessing techniques, such as class balancing with the Synthetic Minority Oversampling
Technique (SMOTE) and feature ranking, to enhance the performance of these models. Recent
advances further involve ensemble methods such as Rotation Forest and Gradient Boosting, and
deep learning techniques such as Convolutional Neural Networks (CNNs), which can process and
learn from complex, high-dimensional data like medical images.
The paper on the Intelligent and Interactive Healthcare System (I²HS) Using Machine Learning
discusses how artificial intelligence and machine learning techniques can be combined to develop
adaptive healthcare systems that are both intelligent and user-centric. It analyzes existing
approaches to predictive modeling, decision support systems, and personalized treatment plans,
with a focus on real-time processing of data from different sources, including electronic health
records, wearable devices, and patient feedback.
The study presents challenges such as ensuring data privacy, handling heterogeneous data formats,
and achieving explainability in machine learning models.
7. Predictive Ability of Machine-Learning Methods for Vitamin D Deficiency
Prediction by Anthropometric Parameters.
Other unsupervised learning methods, like K-means clustering and principal component analysis
(PCA), have been applied to patient segmentation and feature extraction from multi-modal
healthcare datasets. Large datasets, such as those in the UCI repository or in clinical records,
allow robust ML models to be trained.
9. Prediction of Muscular Paralysis Disease Based on Hybrid Feature Extraction
with Machine Learning Technique for COVID-19 and Post-COVID-19 Patients
The literature survey of the paper "Prediction of Muscular Paralysis Disease Based on
Hybrid Feature Extraction with Machine Learning Technique for COVID-19 and Post-
COVID-19 Patients" focuses on the use of machine learning (ML) models to predict
muscular paralysis, especially among patients suffering from COVID-19 and its
aftereffects. The survey enumerates previous works that used supervised learning
techniques, among them, SVM, RF, and LR, for diagnosing many illnesses, including
neurological diseases. It also discusses how the hybrid method of feature extraction,
using biochemical markers along with anthropometric data, improves prediction
accuracy.
The literature survey in the paper "Machine Learning to Predict Pregnancy Outcomes: A Systematic
Review, Synthesizing Framework and Future Research Agenda" describes the rapid growth of machine
learning in the prediction of pregnancy outcomes. It discusses and analyzes several ML
techniques used to predict pregnancy-related complications, including gestational diabetes,
preeclampsia, and preterm birth: support vector machines (SVM), random forests (RF), artificial
neural networks (ANNs), and logistic regression (LR). Several studies mentioned in the survey
focus on clinical data, including maternal age, blood pressure, glucose levels, and ultrasound
measurements, which are considered key features for prediction accuracy. The survey points out
data preprocessing considerations that improve model performance, including feature selection,
handling imbalanced datasets, and missing data imputation.
CHAPTER 3: METHODOLOGIES
Machine learning has emerged as a powerful tool for healthcare applications in diagnostics,
treatment planning, patient monitoring, and predictive analytics. The methodologies used span
several learning paradigms, each suited to different types of healthcare problems. A few key
methodologies include the following:
1. Supervised Learning
Definition: Supervised learning involves training a model on labeled data wherein the input
features as well as the target labels, or the outcomes, are known.
Techniques:
Classification: Used for applications such as disease diagnosis (e.g., predicting the presence of
cancer or classifying patients as having or lacking diabetes).
Algorithms: Decision Trees, Random Forests, Support Vector Machines (SVM), k-Nearest
Neighbors (k-NN), Logistic Regression.
Regression: Applied to predict continuous outcomes, including predicting patient age, hospital
stay length, or blood pressure.
Algorithms: Linear Regression, Ridge and Lasso Regression, Decision Trees for Regression.
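As an illustration of supervised classification, the sketch below implements k-Nearest Neighbors, one of the algorithms listed above, in plain Python. The feature choice, the toy values, and the function name `knn_predict` are assumptions made for the example, not data from any study discussed in this report.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: [fasting glucose (mg/dL), BMI]; label 1 = diabetic, 0 = not.
train_X = [[85, 22.0], [90, 24.5], [95, 23.0], [150, 31.0], [165, 33.5], [140, 29.0]]
train_y = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(train_X, train_y, [92, 23.5])    # query near the non-diabetic group
pred_high = knn_predict(train_X, train_y, [155, 32.0])  # query near the diabetic group
```

Because k-NN relies on raw distances, features on different scales are normally standardized first; that step is omitted here to keep the sketch short.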
Applications in Healthcare:
2. Unsupervised Learning
Definition: Unsupervised learning is used when the data has no output labels; the aim is to find
hidden patterns or structure in the data.
Techniques:
Clustering: Groups patients or data points into clusters by similarity (for example, grouping
patients with similar symptoms or risk factors).
Dimensionality Reduction: Reduces the number of features while retaining most of the important
information, which makes visualization possible and can improve model performance.
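Clustering can be illustrated with a minimal k-means sketch (Lloyd's algorithm). The patient values are invented for the example, and using the first k points as initial centroids is a simplification chosen so the run is deterministic; practical implementations use randomized or k-means++ initialization.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical unlabeled data: [systolic blood pressure, cholesterol] for six patients.
patients = [[118, 180], [165, 260], [122, 190], [170, 250], [115, 175], [160, 255]]
labels, centers = kmeans(patients, k=2)
```

No outcome labels are used anywhere: the two groups emerge only from the similarity of the measurements.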
Healthcare Applications:
Patient subgroup detection to implement more personalized treatments, such as different types of
cancer or rare diseases.
Analysis of large, complex clinical datasets to identify hidden correlations and trends.
3. Semi-Supervised Learning
Definition: Semi-supervised learning combines a small amount of labeled data with a large set of
unlabeled data, thereby improving accuracy when labeled data is limited.
Methods:
Applications in Healthcare:
Labeling a small number of medical images and propagating those labels to a larger, unlabeled
collection
Prediction of patient outcomes where only partial or noisy labels are available
4. Reinforcement Learning
Definition: Reinforcement learning is learning through trial and error by interacting with an
environment in order to maximize a cumulative reward.
Techniques:
Applications in Healthcare:
Robotic surgery: Teaching robots to optimize the surgery sequences based on its past
5. Deep Learning
Definition: Deep learning is a subset of machine learning that focuses on neural networks with
multiple layers, i.e., deep networks. It excels at processing large, unstructured datasets like
images, text, and time-series data.
Techniques:
Convolutional Neural Networks (CNNs): Used primarily in processing image data, such as
medical imaging (MRI, CT scans) to identify and classify diseases.
Recurrent Neural Networks (RNNs): Applied in sequential data, such as the monitoring data of
patients or clinical time-series data, for example ECG.
Autoencoders: Applied to anomaly detection and feature learning from high-dimensional data,
for instance identifying abnormal patterns in medical records.
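The core operation of a CNN layer can be sketched as a small filter sliding over an image. The example below is an illustration only: the 4x4 "scan" and the hand-written edge kernel are invented, whereas a real CNN learns its kernels from data and stacks many such layers with nonlinearities and pooling.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 grayscale patch with a dark left half and a bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
```

The resulting feature map is non-zero only in the column where the intensity changes, which is the sense in which convolutional filters extract local features such as edges.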
Applications in Healthcare
Medical Imaging: Detecting tumors, fractures, or other issues from X-rays, MRIs, and CT scans
by using CNNs.
Predictive Analytics: Forecasting the progression or outcome of a disease from sequential data,
for example using RNNs to forecast a patient's vital signs or to monitor how a disease advances.
Natural Language Processing (NLP): Applied to clinical notes or medical literature to extract
pertinent information or to support diagnoses from text.
6. Transfer Learning
Definition: Transfer learning takes a pre-trained model and fine-tunes it on a new but related
dataset, saving valuable time and computational resources.
Techniques:
Using deep learning models pre-trained on large general datasets like ImageNet and then adapting
them for specific healthcare applications, for example, detecting disease in medical images.
Applications in Healthcare:
Medical Imaging: Using models trained on general image datasets and fine-tuning them for specific
medical conditions, such as tumor detection in mammograms.
Text Classification: Adapting NLP models trained on massive text corpora to the specific medical
terminology found in electronic health records.
7. Ensemble Learning
Definition: Ensemble methods combine multiple models for improved accuracy and robustness of
predictions.
Techniques
Bagging: Reduces variance by training multiple models on different subsets of the original data.
Example: Random Forest.
Boosting: Improves weak models through iteratively adding corrections that gradually improve
accuracy. Examples: Gradient Boosting, AdaBoost.
Stacking: Blends predictions from many models using another model to make the actual
prediction.
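Bagging, the first technique above, can be sketched with very simple base models. In this illustration each base model is a one-feature threshold rule (a decision stump) trained on a bootstrap sample, and the ensemble predicts by majority vote. The data, the function names, and the choice of stumps instead of full decision trees are assumptions for the example; Random Forest additionally randomizes the features considered at each split.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Pick the threshold on a single feature that misclassifies the fewest samples."""
    best = None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    """Majority vote over the individual stump predictions (1 if x >= threshold)."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical single risk score per patient; label 1 = adverse outcome.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = bagging_fit(xs, ys)
low = bagging_predict(ensemble, 2)
high = bagging_predict(ensemble, 9)
```

Each stump sees a slightly different sample, so individual thresholds vary; voting averages out that variation, which is the variance reduction that bagging aims for.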
Healthcare Applications
Combining multiple models for patient outcome prediction. Examples include predicting survival
ratios or disease progression.
8. Federated Learning
Techniques:
Federated Averaging (FedAvg): Each participating site updates its local model independently, and
only the model weights are shared and averaged. The underlying data is never transferred.
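The aggregation step of Federated Averaging can be sketched as a weighted mean of the weight vectors reported by each site, with each site weighted by its number of local training samples (the usual FedAvg formulation). The hospital weights and sample counts below are invented for illustration, and local training, communication, and security are left out entirely.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighting each client by its local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical model weights reported by three hospitals after one round of local training.
hospital_weights = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
hospital_sizes = [100, 300, 600]  # number of local training samples at each site

# Only these weight vectors reach the server; patient records stay at each hospital.
global_weights = federated_average(hospital_weights, hospital_sizes)
```

The server would then send `global_weights` back to the sites for the next round of local training.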
Applications in Healthcare:
CHAPTER 4: MACHINE LEARNING APPLICATIONS
2. Predictive Care: Machine learning predicts the rates of rehospitalization for patients,
enhancing care management.
4. Health Support: ML-based chatbots and virtual assistants provide health information and
guidance to users.
5. Data privacy and fairness: Maintaining the privacy of patient data and ensuring fair use of
ML technology.
CHAPTER 5: COMPARATIVE EVALUATION
1. Evaluation Metrics
Several evaluation metrics are used to compare machine learning models. The choice depends on the
type of problem (classification, regression, etc.) and the purpose of the task at hand. Typical
evaluation metrics include:
Precision, recall, and F1-score are especially useful for imbalanced datasets. Precision is the
fraction of predicted positives that are true positives; recall is the fraction of actual
positives that are correctly identified; the F1-score is the harmonic mean of precision and
recall.
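These three definitions translate directly into code. The sketch below computes them from true and predicted labels; the function name and the toy screening result are assumptions made for the example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical imbalanced screening result: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here the classifier is right on 7 of 10 cases, yet it finds only half of the actual positives (2 true positives, 1 false positive, 2 false negatives), which is the kind of gap that accuracy alone hides on imbalanced data.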
2. Model Comparison Criteria
When comparing machine learning models, the following criteria should be taken into
consideration:
Accuracy/Performance:
How well does the model perform on the task? For classification tasks, does it minimize false
positives or false negatives effectively? For regression, how close are the predictions to actual
values?
Computational Efficiency:
How much time and computational resources does the model require for training and prediction?
Some models, for example deep learning models, may need much more computation than a simpler
model like logistic regression.
FUTURE SCOPE IN MACHINE LEARNING
1. Explainable AI (XAI)
Current challenge: Most machine learning and deep learning models appear to be "black boxes";
their decision-making process is hard to interpret.
Future scope: The development of Explainable AI (XAI) aims to make machine learning models
more transparent and interpretable, allowing users to understand how and why a model arrived at
a particular decision. This is critical for applications in sensitive areas such as healthcare,
finance, and legal systems, where model transparency is necessary for trust and accountability.
2. Federated Learning
Current challenge: There is significant concern over sharing medical, financial, or personal
data, which hinders the development of global machine learning systems.
Future prospects: With Federated Learning, models can be trained on devices or servers holding
local data in decentralized locations without transferring the data itself. This enables
collaborative machine learning while preserving the privacy and security of sensitive
information.
Example: Healthcare systems collaborating across hospitals to develop robust disease prediction
models while keeping patient data local and secure.
Personalized Medicine: ML algorithms will analyze genetic data, medical history, and lifestyle
to create tailored treatment plans.
Early Diagnosis and Predictive Analytics: ML can help detect diseases like cancer, Alzheimer's,
and heart disease earlier by identifying patterns in medical data such as imaging, lab results, and
patient history.
Drug Discovery: ML can speed up the discovery and testing of drugs, saving time and money by
predicting how drugs will interact with the body and with disease processes.
Algorithmic Trading: ML algorithms will become even more important for automating and optimizing
trading decisions by identifying patterns in the market and forecasting stock trends.
Fraud Detection: ML systems will become more competent in detecting fraud by analyzing
transaction patterns, customer behavior, and other external data sources.
Climate Modeling: ML will improve the prediction of climate change by analyzing enormous amounts
of environmental data.
Smart Grids: AI will enable better management of the delivery of electricity and minimize waste.
6. Education
Personalized Learning: ML will create tailored learning experiences, varying in terms of content
and pace to the requirements of each student.
Automated Grading: Through ML, the grading of assignments, essays, and exams will be
automated, allowing for easier feedback on a broader scale.
CONCLUSION
REFERENCES
[1] American Brain Tumor Association (ABTA), “Brain tumor statistics,” accessed Aug. 12,
2017. http://www.abta.org/about-us/news/braintumor-statistics/
[2] World Health Organization, Mental health: a call for action by world health ministers.
Geneva: World Health Organization, Department of Mental Health and Substance Dependence,
2001. http://www.foxnews.com/health/2012/02/10/hypothyroidismversuhyperthyroidism.html
(accessed Dec. 2015)
[3] Trends in maternal mortality 2000 to 2017: estimates by WHO, UNICEF, UNFPA, World Bank Group
and the United Nations Population Division.
https://www.unfpa.org/featured-publication/trends-maternal-mortality2000-2017. Accessed 10 Jan 2021.
[4] Esteva, A., et al. (2019). A guide to deep learning in healthcare. Nature Medicine, 25(1),
24-29.
[5] Razzak, M. I., Imran, M., & Xu, G. (2018). Big data analytics for predictive healthcare: A
survey. Journal of Healthcare Engineering, 2018, 1-17.
[6] Rajkomar, A., et al. (2018). Scalable and accurate deep learning for electronic health records.
NPJ Digital Medicine, 1(1), 18.
[7] Li, Y., et al. (2020). A review of machine learning in healthcare: Applications, challenges,
and opportunities. IEEE Access, 8, 132118-132131.
[8] Choi, E., et al. (2016). Doctor AI: Predicting clinical events via recurrent neural networks.
Proceedings of the 2016 International Conference on Health Informatics, 301-307.