Stroke Prediction Using Machine Learning
Abstract: A stroke is a medical condition in which blood vessels in the brain rupture and damage brain tissue. It can
also occur when the flow of blood and nutrients to the brain is interrupted. According to the World Health Organization
(WHO), stroke is the leading cause of death and disability globally. Early recognition of the warning signs of a
stroke can help reduce its severity. Different machine learning (ML) models have been developed to predict
the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine
learning algorithms, namely Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF)
Classification, and a Voting Classifier, to train four different models for reliable prediction. Random Forest was the
best-performing algorithm for this task, with an accuracy of approximately 96 percent. The dataset used to develop
the method was the open-access Stroke Prediction dataset. The accuracy of the models used in this
investigation is significantly higher than that reported in previous studies, which suggests that these models are
more reliable. Numerous model comparisons have established their robustness, and the scheme can be deduced from the
study analysis.
Keywords: Stroke; machine learning; logistic regression; decision tree classification; random forest classification;
k-nearest neighbors; support vector machine; Naïve Bayes classification.
1. INTRODUCTION
According to the Centers for Disease Control and Prevention (CDC), stroke is the fifth-leading cause of death in the
United States. Stroke is a non-communicable disease that is responsible for approximately 11% of total deaths. Every
year, more than 795,000 people in the United States have a stroke. It is the fourth-leading cause of death
in India. With the advancement of technology in the medical field, the occurrence of a stroke can be predicted
using Machine Learning. Machine Learning algorithms are useful for making accurate predictions
and providing correct analysis. Previous work on stroke has mostly addressed heart stroke prediction;
very little work has addressed brain stroke. This paper is based on predicting the occurrence of a brain stroke
using Machine Learning. A key result of the approaches used is that, among the five
different classification algorithms used, Naïve Bayes performed best, obtaining a higher accuracy than the others. The
limitation of this model is that it is trained on textual data and not on real-time brain images. The paper shows
the implementation of six Machine Learning classification algorithms. This work can be further extended by
implementing all the current machine learning algorithms. A dataset with various physiological
traits as its attributes is chosen from Kaggle for this task.
These traits are later analyzed and used for the final prediction. The dataset is first cleaned and prepared so that the
machine learning model can use it. This step is called Data Preprocessing. In this step, the dataset is checked for null
values, which are then filled. Label encoding is then performed to convert string values into integers, followed by
one-hot encoding if necessary. After Data Preprocessing, the dataset is split into train and test data. A model is then
built on this new data using various Classification Algorithms. Accuracy is calculated for all these algorithms and
compared to find the best-trained model for prediction. After training the model and calculating the accuracy, an HTML
page and a Flask application are developed. The web page lets the user enter the values for prediction. Flask is
the framework that connects the trained model to the web page. After proper analysis, the paper concludes which
algorithm is most appropriate for the prediction of stroke.
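The preprocessing steps described above (filling null values, label encoding, one-hot encoding, and the train/test split) can be sketched as follows. This is a minimal illustration that uses only the Python standard library so that each step is visible; it is not the code used in this work. The column names, the toy records, the choice of the column mean as the fill value, and the helper function names are all assumptions made for the example.

```python
import random

def fill_missing(rows, column):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]

def label_encode(rows, column):
    """Map each distinct string in a column to an integer (alphabetical order)."""
    mapping = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: mapping[r[column]]} for r in rows], mapping

def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy records: illustrative values only, not rows from the actual dataset.
records = [
    {"age": 67, "bmi": 36.6, "gender": "Male", "work_type": "Private", "stroke": 1},
    {"age": 61, "bmi": None, "gender": "Female", "work_type": "Self-employed", "stroke": 1},
    {"age": 30, "bmi": 22.0, "gender": "Female", "work_type": "Private", "stroke": 0},
    {"age": 45, "bmi": 27.4, "gender": "Male", "work_type": "Govt_job", "stroke": 0},
]

records = fill_missing(records, "bmi")                   # null handling
records, gender_codes = label_encode(records, "gender")  # strings -> integers
records = one_hot_encode(records, "work_type")           # multi-valued category
train, test = train_test_split(records, test_fraction=0.25, seed=0)
```

Label encoding suits two-valued columns, while one-hot encoding avoids implying an order among categories with more than two values. In practice, libraries such as pandas and scikit-learn provide equivalent operations.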
© IARJSET This work is licensed under a Creative Commons Attribution 4.0 International License 141
IARJSET ISSN (O) 2393-8021, ISSN (P) 2394-1588
• There is limited previous work on using machine learning algorithms to estimate perfusion parameters. That
work presents a novel bi-input convolutional neural network (bi-CNN) to approximate four perfusion parameters
without using an explicit deconvolution method.
• These bi-CNNs produced good approximations for all four parameters, with relative average root Mean-Square
Errors (MSE) and Mean Absolute Error (MAE) less than equal of the maximum values.
• These results show that machine learning techniques are a promising tool for perfusion parameter estimation
without requiring a standard deconvolution process.
ADVANTAGES
• Early prediction of stroke can be done.
• The cost of medication will be minimized.
• Accuracy rate will be high.
• High performance.
1) Random forest
2) Decision tree
3) Logistic regression
1. RANDOM FOREST
The classification algorithm chosen was RF classification. RFs are composed of numerous
independent decision trees, each trained individually on a random sample of the data. These trees are created during
training, and the decision trees' outputs are collected. A process termed voting is used to determine the final prediction.
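The voting idea can be illustrated with a small sketch. It is a simplified stand-in rather than the model trained in this work: each "tree" here is a one-split decision stump fitted on a bootstrap sample and a randomly chosen feature, and the final label is the majority vote. The function names, the toy (age, average glucose) samples, and the number of trees are assumptions made for illustration. Library implementations differ in detail; for example, scikit-learn's RandomForestClassifier grows full trees and averages the trees' class probabilities instead of counting hard votes.

```python
import random
from collections import Counter

def train_stump(samples, feature):
    """Fit a one-split tree on one feature: keep the split with the fewest training errors."""
    best = None
    for threshold in sorted({x[feature] for x, _ in samples}):
        for label_above in (0, 1):
            errors = sum(
                (label_above if x[feature] > threshold else 1 - label_above) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    _, threshold, label_above = best
    return lambda x: label_above if x[feature] > threshold else 1 - label_above

def train_forest(samples, n_trees=25, seed=0):
    """Train each stump independently on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(train_stump(bootstrap, feature))
    return forest

def predict(forest, x):
    """Collect one vote per tree and return the most common label."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy samples: ((age, average glucose level), stroke label). Illustrative only.
samples = [
    ((25, 85), 0), ((32, 90), 0), ((41, 95), 0), ((38, 100), 0),
    ((66, 180), 1), ((72, 210), 1), ((79, 170), 1), ((81, 230), 1),
]

forest = train_forest(samples, n_trees=25, seed=0)
high_risk = predict(forest, (85, 240))
low_risk = predict(forest, (22, 80))
```

Because every tree sees a different random sample, individual trees make different mistakes, and the vote tends to cancel them out.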
[Figure: Flowchart for the logistic regression model.]
In the supervised learning approach, LR is one of the most commonly
used ML algorithms. It is a prediction method that uses a set of independent variables to predict a categorical
dependent variable. As a result,
the output must be discrete or categorical in nature. It may be yes or no, 0 or 1, true or false, etc., but the model
itself produces probability values between 0 and 1, which are then mapped to one of the classes. Classification problems
are addressed with LR, whereas regression problems are addressed
using linear regression. Instead of a regression line, LR fits an S-shaped logistic function whose output is bounded
by the two limiting values 0 and 1.
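The S-shaped logistic function and the step from a probability to a discrete class can be sketched as follows. The weights, the bias, the two input features, and the 0.5 decision threshold are hypothetical values chosen for illustration; they are not parameters fitted in this work.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete 0/1 output."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters for two features: (age, average glucose level).
weights = [0.05, 0.01]
bias = -5.0

p_older = predict_proba((70, 200), weights, bias)    # z = 0.5
p_younger = predict_proba((30, 90), weights, bias)   # z = -2.6
label_older = predict_label((70, 200), weights, bias)
label_younger = predict_label((30, 90), weights, bias)
```

The threshold does not have to be 0.5; lowering it trades more false positives for fewer missed strokes.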
CONCLUSION
This paper presented a model for predicting stroke using machine learning algorithms. Such a model encourages medical
users to take a more active role in managing their health and to change their health behaviors. After thoroughly
reviewing various IEEE papers, we selected five different models, including decision tree, random forest, and logistic
regression. Key attributes/features were selected under the guidance of medical practitioners. Visualizing health data
allows professionals to present key/common trends and information via graphs, charts, and visuals that help even data
analysts understand the dataset. Hence, data visualization was our main objective. We used libraries such as pandas,
matplotlib, seaborn, and PyWaffle for informative and attractive representation of the data. Predictive analytics is a
popular business intelligence trend. It helps doctors make data-driven decisions quickly and can even help predict and
prevent deadly diseases. In this project, we carried out categorical feature analysis, numerical feature analysis, and
multicollinearity analysis, and applied the different models to the dataset. A comparative study of the five models
showed that random forest, logistic regression, and k-nearest neighbors each had an accuracy of 95.5%, whereas decision
tree was 91.13% accurate and support vector machine had an accuracy of 92.43%. Finally, Random Forest was chosen as the
best model because it combined high accuracy with fewer false negatives. To facilitate seamless use of the application,
a Graphical User Interface (GUI) was created using Tkinter.
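When several models reach the same accuracy, the number of false negatives (strokes the model misses) can serve as the tie-breaker, as in the comparison above. The following sketch shows how both quantities are computed from true and predicted labels. The label lists and model names are toy values made up for illustration, not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

# Toy labels: 1 = stroke, 0 = no stroke. Illustrative only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # misses one stroke
    "model_b": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # raises one false alarm
}

summary = {}
for name, y_pred in predictions.items():
    _, _, _, fn = confusion_counts(y_true, y_pred)
    summary[name] = (accuracy(y_true, y_pred), fn)

# Prefer higher accuracy; among equal accuracies, prefer fewer false negatives.
best_model = max(summary, key=lambda name: (summary[name][0], -summary[name][1]))
```

In this toy case both models are equally accurate, so the selection falls to the one that misses fewer strokes.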
FUTURE SCOPE
Stroke depends on lifestyle attributes as well as past medical history. In this paper, we have considered
seven lifestyle attributes and three medical conditions. In the future, for better performance of the model, more medical
attributes can be considered, such as systolic blood pressure, diastolic blood pressure, pulse pressure, mean blood
pressure, and the minimum, maximum, and mean values of the pulse. In addition, the mRS score, NIHSS score, and CHADS2
score can be added to obtain a more accurate and precise output.