Interpreting Cardiovascular Disease Using Random Forest and Explainable AI
https://doi.org/10.22214/ijraset.2023.52922
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue V May 2023- Available at www.ijraset.com
Abstract: In recent years, artificial intelligence and machine learning have shown remarkable performance in many tasks, from image processing to natural language processing, and with the rapid pace of research they have spread into many other fields and disciplines. Some of these fields demand a high degree of accountability and therefore transparency, for example the clinical domain. Research into Explainable Artificial Intelligence (XAI) has been increasing in recent years as a response to the need for greater transparency and trust in AI. This is especially important because AI is used in sensitive domains with societal, ethical, and safety implications, where reasons for system decisions and predictions are needed to justify their reliability. This requires greater interpretability, which often means we must understand the mechanism underlying the algorithms. By applying the same categorization to interpretability in clinical research, it is hoped that (1) clinicians and practitioners can approach these techniques with appropriate caution, (2) insights into interpretability will emerge with greater consideration for clinical practice, and (3) initiatives to push forward data-based, mathematically and technically grounded clinical education will be encouraged.
I. INTRODUCTION
Artificial Intelligence (AI) has become increasingly popular in the healthcare industry for disease diagnosis, analysis, and treatment. However, the "black box" nature of some AI algorithms has raised concerns regarding the interpretability and transparency of the results produced [1]. This lack of interpretability is particularly problematic for complex diseases such as cardiovascular disease (CVD), which is the leading cause of death globally. To address this problem, Explainable AI (XAI) has emerged as a new paradigm in AI research that focuses on developing models that can offer a clear and comprehensible explanation of their decision-making process.
Explainable AI (XAI) refers to the development of artificial intelligence systems that can provide clear and understandable explanations for their decision-making processes.
The purpose of XAI is to make AI systems more transparent and accountable, so that people can trust them and make informed decisions based on their output. XAI aims to improve the transparency and interpretability of AI algorithms, making them more accessible and useful for clinicians and patients. XAI is an active area of research, and there are numerous different approaches to developing explainable AI systems.
Some strategies include building AI systems that use simpler models or are trained on smaller, more comprehensible datasets, or creating visualization tools that let people see the internal workings of an AI system. Other strategies involve developing algorithms that can generate natural language explanations for their output or that can highlight the most relevant features in the data that contributed to a particular decision.
In general, XAI is a vital area in the development of accountable and ethical AI systems, and it is likely to play an increasingly important role as AI becomes more integrated into our everyday lives.
The Explainable AI (XAI) program aims to create a suite of machine learning techniques that:
1) Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy)
2) Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.
Explainable AI (XAI) is an emerging field that aims to make artificial intelligence (AI) more transparent and understandable to human beings. In the context of cardiovascular disease analysis [1], XAI can help clinicians and researchers understand how AI models arrive at their predictions, which can lead to better decision-making and patient outcomes. Some of the goals of XAI for cardiovascular disease analysis include:
a) Transparency: XAI can help open the black box of AI models, allowing clinicians and researchers to see the reasoning behind the model's predictions.
b) Interpretability: XAI can help explain how AI models arrive at their predictions, allowing clinicians and researchers to understand the factors that contribute to the model's output.
c) Trust: By making AI more transparent and interpretable, XAI can help build trust between clinicians and AI models, which is crucial for the adoption of AI in healthcare.
d) Clinical decision-making: XAI can help clinicians make more informed decisions by providing them with the reasoning behind the AI model's predictions.
e) Patient outcomes: By enabling better decision-making, XAI can ultimately lead to improved patient outcomes in the diagnosis, treatment, and management of cardiovascular disease.
A. Dataset
For our system we have used the heart disease dataset collected from the UCI repository. Details of the dataset attributes are given in the following table.
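A minimal sketch of how such a dataset could be loaded and inspected is shown below; the file name "heart.csv" and the label column name "target" are illustrative assumptions rather than details taken from the table.

```python
# Minimal sketch: loading and inspecting the heart disease dataset
# (file name and label column are illustrative assumptions).
import pandas as pd

df = pd.read_csv("heart.csv")        # UCI heart disease data exported as CSV
print(df.shape)                      # number of records and attributes
print(df.dtypes)                     # attribute names and types
print(df["target"].value_counts())   # class balance of the assumed binary label
```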
A. LIME
LIME (Local Interpretable Model-agnostic Explanations) is an explainable AI technique used to interpret the predictions made by a
machine learning model at the local level. It provides an explanation of how a particular prediction is made by the model by
highlighting the important features that contributed to the prediction. In our system we use a random forest classifier as the model whose predictions are explained.
LIME fits a surrogate model on perturbed samples generated in the vicinity of the instance whose prediction is to be explained. The surrogate model is usually a linear model or a decision tree that is easier to interpret. The important features are identified by perturbing the input features and observing the impact on the prediction. The features that have the greatest impact on the prediction are considered the most important.
LIME can be applied to a wide range of machine learning models, including complex models such as neural networks. It is model-
agnostic, which means it can be used with any machine learning algorithm, including black box models.
The output of LIME is a set of local feature importance scores that indicate the importance of each input feature in the prediction for
a particular instance. These scores can be visualized using various techniques such as bar charts or heat maps, to help users
understand how the model makes predictions[2].
LIME is useful for explaining individual predictions and gaining insights into how a machine learning model works. It can help
increase the transparency and trust in the model by providing interpretable explanations for its predictions.
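The sketch below shows how such a local explanation could be produced with the lime library for a random forest trained on the heart disease data; the names X_train, X_test, y_train, the class labels, and the model variable rf_model are illustrative assumptions for a standard train/test split rather than our exact code.

```python
# Minimal sketch: LIME explanation for one prediction of a random forest
# (assumes X_train/X_test are pandas DataFrames from the heart disease dataset).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

explainer = LimeTabularExplainer(
    training_data=np.array(X_train),
    feature_names=list(X_train.columns),
    class_names=["no disease", "disease"],  # assumed 0/1 target encoding
    mode="classification",
)

# Explain a single test instance: which features pushed the prediction up or down?
exp = explainer.explain_instance(
    data_row=np.array(X_test.iloc[0]),
    predict_fn=rf_model.predict_proba,
    num_features=5,
)
print(exp.as_list())  # list of (feature condition, weight) pairs
```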
Mathematical formula
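For reference, the LIME objective as defined by Ribeiro et al. [2] is:

ξ(x) = argmin_{g ∈ G} L(f, g, π_x) + Ω(g)

where f is the original model, g is an interpretable surrogate drawn from a family G, π_x weights perturbed samples by their proximity to the instance x, L measures how poorly g approximates f in that neighbourhood, and Ω(g) penalizes the complexity of g.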
B. SHAP
Shapley Additive exPlanations (SHAP) values determine how each attribute contributes to the model’s prediction. The technique
assigns an importance value to each feature or risk factor based on how much it contributes to the prediction made by the model.
This importance value is then used to create an explanation for the model's prediction. In the context of cardiovascular disease
analysis, SHAP can help identify the most important risk factors associated with the disease. These risk factors may include age,
gender, blood pressure, cholesterol levels, smoking status, and family history of heart disease. By understanding the importance of
these risk factors, doctors and healthcare professionals can make more informed decisions about treatment and prevention strategies
for patients.
SHAP uses the model's mean prediction as a base value and quantifies the relative contribution of every feature to the deviation of the prediction from that base. It can provide both local and global explanations [3].
Mathematical Formula:
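For reference, the Shapley value attribution used by SHAP, as defined by Lundberg and Lee [3], is:

φ_i = Σ_{S ⊆ F\{i}} [ |S|! (|F| − |S| − 1)! / |F|! ] · [ f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S) ]

where F is the set of all features, S ranges over subsets of features not containing feature i, and f_S is the model restricted to the features in S. The resulting explanation model is additive: the prediction equals the base value φ_0 plus the sum of all feature attributions φ_i.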
1) Local Explanations
We compute SHAP values for several instances to show how they behave as a local explanation method. Local explanations refer to the explanation of a single prediction made by a machine learning model. Specifically, this involves calculating the contribution of each feature (or variable) to the prediction made by the model for a particular instance (or observation) in the dataset.
2) Global Explanations
The collective SHAP values show how much each feature contributes to the final prediction and whether that contribution is positive or negative. Global SHAP explanations describe how the model behaves on average across the entire dataset. This means that for each feature, the global SHAP value represents its average contribution to the model output over all the instances in the dataset. Global SHAP values can be visualized using a SHAP summary plot, which shows the features sorted by importance and the direction of their impact on the model's output.
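A minimal sketch of how these local and global SHAP explanations could be generated for the random forest, reusing the assumed rf_model and X_test from the LIME sketch; note that the shape of the returned values can differ between SHAP versions.

```python
# Minimal sketch: local and global SHAP explanations for the random forest.
import shap

explainer = shap.TreeExplainer(rf_model)
# For tree-ensemble classifiers this typically returns one array per class.
shap_values = explainer.shap_values(X_test)

# Local explanation: contribution of each feature to one prediction (class 1 = disease).
shap.force_plot(explainer.expected_value[1], shap_values[1][0, :],
                X_test.iloc[0, :], matplotlib=True)

# Global explanation: average impact of every feature across the whole test set.
shap.summary_plot(shap_values[1], X_test)
```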
Fig 5: Positive and negative impact of each feature on the prediction (SHAP summary plot)
C. What-if Tool
The What-If Tool (WIT) provides an easy-to-use, interactive visual interface for exploring and understanding a black-box classification or regression ML model. It allows users to perform "what-if" analysis on a trained model by modifying input features and observing the effect on the model's predictions in real time.
The What-If Tool provides several features that help users to understand how the model works and how to improve its performance.
Some of the key features of the What-If Tool include:
1) Data exploration: Users can visualize and explore the dataset used to train the model, and filter data based on different criteria.
2) Model analysis: Users can examine the model's performance on different subsets of the data, and evaluate its accuracy and bias.
3) Input modification: Users can modify the input features to the model and observe the effect on the prediction. This allows users
to test different scenarios and understand how the model responds.
4) Counterfactual analysis: Users can generate counterfactual examples by modifying input features to achieve a desired outcome.
This can help users understand how to improve the model's performance and make it more fair and transparent.
The What-If Tool is open-source software developed by Google and can be used with a wide range of machine learning models and input data types. It can be applied in domains such as healthcare, finance, and marketing to provide insights and recommendations for decision-making.
In the proposed system, what-if analysis is performed on a model trained with the TensorFlow Decision Forests library.
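A minimal sketch of how the What-If Tool might be launched in a notebook for a model of this kind; the use of a scikit-learn random forest via a custom prediction function, the inclusion of the "target" column, and the widget configuration below are illustrative assumptions rather than the exact setup used.

```python
# Minimal sketch: launching the What-If Tool in a Jupyter notebook
# on the heart disease test set with a custom prediction function.
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

def custom_predict(examples_to_infer):
    # WIT passes each datapoint as a list of raw feature values (label assumed last),
    # so drop the target column before asking the model for class probabilities.
    model_inputs = [ex[:-1] for ex in examples_to_infer]
    return rf_model.predict_proba(model_inputs)

test_examples = X_test.copy()
test_examples["target"] = y_test.values

config_builder = (
    WitConfigBuilder(test_examples.values.tolist(), list(test_examples.columns))
    .set_custom_predict_fn(custom_predict)
    .set_target_feature("target")
)
WitWidget(config_builder, height=800)
```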
D. ROC Curve
A Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classifier. The ROC
curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) for different threshold values of the
classifier.
The true positive rate (TPR) is the proportion of actual positive samples that are correctly classified as positive by the classifier. The
false positive rate (FPR) is the proportion of actual negative samples that are incorrectly classified as positive by the classifier.
The area under the ROC curve (AUC) is a widely used metric to evaluate the overall performance of a binary classifier. The AUC is
a number between 0 and 1, with a higher value indicating better classifier performance. An AUC of 1 corresponds to a perfect
classifier, while an AUC of 0.5 corresponds to a random classifier.
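A short sketch of how the ROC curve and AUC could be computed for the random forest with scikit-learn, reusing the assumed rf_model, X_test, and y_test from above:

```python
# Minimal sketch: ROC curve and AUC for the random forest classifier.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Probability of the positive class ("disease") for each test instance.
y_score = rf_model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f"Random Forest (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random classifier (AUC = 0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.legend()
plt.show()
```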
E. Confusion Matrix
A confusion matrix summarizes a binary classifier's predictions against the actual labels in four counts:
1) True positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
2) True negatives (TN): We predicted no, and they don't have the disease.
3) False positives (FP): We predicted yes, but they don't actually have the disease.
4) False negatives (FN): We predicted no, but they actually do have the disease.
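These four counts can be obtained directly with scikit-learn, as in the sketch below (same assumed rf_model and test split as above):

```python
# Minimal sketch: confusion matrix counts for the random forest predictions.
from sklearn.metrics import confusion_matrix

y_pred = rf_model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```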
Fig 7. Datapoint editor in WIT showing actual and counterfactual values for every feature
Fig 9. Scatter plot of ST-slope and exercise angina colored by Chest pain type
III. RESULTS
From the analysis of the above algorithms on the dataset, we observed that accuracy improved substantially with Random Forest after preprocessing the data using normalization, label encoding, and k-fold cross-validation.
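A condensed sketch of this preprocessing and evaluation pipeline is shown below; the DataFrame name df, the label column name "target", and the choice of 10 folds are illustrative assumptions rather than the exact configuration used.

```python
# Minimal sketch: label encoding, normalization, and k-fold evaluation
# of a random forest on the heart disease data (column names illustrative).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Encode categorical columns (e.g. chest pain type, ST slope) as integers.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X, y = df.drop(columns=["target"]), df["target"]

# Normalize features and evaluate with 10-fold cross-validation.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=100, random_state=42))
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"Mean 10-fold accuracy: {scores.mean():.3f}")
```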
Random Forest is an ensemble learning method that creates multiple decision trees and aggregates their predictions to make a final
prediction. The feature importance score is calculated based on the reduction in impurity that is achieved when a feature is used to
split the data. The feature importance score can be used to identify the most important features in a dataset and to understand which
features have the greatest impact on the model's predictions.
In scikit-learn, the feature importance score is normalized to add up to 1, and can be accessed through the feature_importances_
attribute of the trained RandomForestRegressor or RandomForestClassifier object. This attribute returns an array of importance
scores, with the index of each score corresponding to the index of the input feature.
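For example, the importance scores can be read off a fitted model as follows, reusing the assumed rf_model and X_train from the earlier sketches:

```python
# Minimal sketch: ranking features by impurity-based random forest importance.
import pandas as pd

importances = pd.Series(rf_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))  # scores are normalized to sum to 1
```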
IV. CONCLUSION
One application of this concept is in the healthcare sector, where it can provide additional information about a disease on the basis of previous data. The explanations generated can be correct or incorrect, and in both cases useful insights are gained. If the explanations are correct, trust in the model increases, encouraging the use of XAI and its methodologies in healthcare. If the explanations are incorrect, it implies that the model is not sufficiently trained, which gives scope for improvement, better models, and further experimentation. Clinicians can gain better insight into a health condition by examining the data alongside these explanations.
REFERENCES
[1] U. Pawar, D. O'Shea, S. Rea, and R. O'Reilly, "Explainable AI in Healthcare".
[2] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier".
[3] S. M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions".
[4] S. Wachter, B. Mittelstadt, and C. Russell, "Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR".
[5] D. Saraswat, P. Bhattacharya, A. Verma, V. K. Prasad, S. Tanwar, G. Sharma, P. N. Bokoro, and R. Sharma, "Explainable AI for Healthcare 5.0: Opportunities and Challenges".
[6] V. Wagle, K. Kaur, P. Kamat, S. Patil, and K. Kotecha, "Explainable AI for Multimodal Credibility Analysis: Case Study of Online Beauty Health (Mis)-Information," IEEE Access, vol. 9, pp. 127985-128022, 2021.
[7] J. Wexler et al., "The What-If Tool: Interactive Probing of Machine Learning Models".