Project report (1)

Uploaded by

khushbuuu90
Copyright
© All Rights Reserved

A

Project Report
on
“A Hybrid approach towards heart attack prediction”

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology in
Computer Science and Engineering
by

Isha Raghav (2100970100055)
Kajal (2100970100059)
Km. Khushbu (2200970109006)

Under the Supervision of Ms. Anandpreet Kaur

Galgotias College of Engineering & Technology
Greater Noida, Uttar Pradesh, India-201306

Affiliated to

Dr. A.P.J. Abdul Kalam Technical University
Lucknow, Uttar Pradesh, India-226031

CERTIFICATE

This is to certify that the project report entitled “A Hybrid approach towards heart attack prediction”,
submitted by Isha Raghav (2100970100055), Kajal (2100970100059) and Km. Khushbu (2200970109006)
to the Galgotias College of Engineering & Technology, Greater Noida, Uttar Pradesh, affiliated to
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, in partial fulfillment of the
requirements for the award of the Degree of Bachelor of Technology in Computer Science & Engineering, is a
bonafide record of the project work carried out by them under my supervision during the year 2024-2025.

Ms. Anandpreet Kaur
Project Guide, Professor
Dept. of CSE

ACKNOWLEDGEMENT

We have put considerable effort into this project. However, it would not have been possible without the kind
support and help of many individuals and organizations. We would like to extend our sincere thanks to all of them.

We are highly indebted to Ms. Anandpreet Kaur for her guidance and constant supervision, for providing the
necessary information regarding the project, and for her support in completing it.

We are extremely indebted to Dr. Pushpa Chaudhary, HOD, Department of Computer Science and Engineering,
GCET, and Dr. Jaya Sinha / Mr. Manish Kumar Sharma, Project Coordinator, Department of Computer
Science and Engineering, GCET, for their valuable suggestions and constant support throughout our
project tenure. We would also like to express our sincere thanks to all faculty and staff members of the
Department of Computer Science and Engineering, GCET, for their support in completing this project on time.

We also express our gratitude to our parents for their kind co-operation and encouragement, which helped us
complete this project. Our thanks and appreciation also go to our friends who helped in developing the
project and to all the people who have willingly helped us with their abilities.

Isha Raghav
Kajal
Km. Khushbu

ABSTRACT

Heart attacks remain one of the leading causes of mortality worldwide, emphasizing the need for accurate
and early prediction systems. This project develops a Machine Learning-based Heart Attack
Prediction System using classification algorithms such as Decision Tree, Random Forest, and
XGBoost (Extreme Gradient Boosting).

The system processes clinical data such as age, cholesterol levels, heart rate, and exercise-induced
angina to predict heart attack risks. The project leverages the UCI Heart Disease Dataset for
training and testing machine learning models, enabling a data-driven approach to predictive
healthcare.

The implemented models include:

• Decision Tree: An interpretable model that makes decisions based on features like blood pressure
and cholesterol levels.
• Random Forest: An ensemble method that aggregates multiple decision trees to enhance
prediction stability and reduce overfitting.
• XGBoost: A boosting algorithm known for its high accuracy and efficiency in handling complex
datasets.

Model performance is evaluated using metrics such as Accuracy, Precision, Recall, F1-score, and
Confusion Matrices. The results demonstrate that XGBoost outperforms other models with
superior prediction accuracy, making the system suitable for healthcare-based predictive
analysis.

The system’s modular design ensures ease of implementation in clinical data analysis, aiding healthcare
providers in making informed decisions. Future enhancements could include advanced models
and broader datasets to improve prediction reliability and address various cardiovascular
conditions.

CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF FIGURES

CHAPTER 1: INTRODUCTION
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: PROBLEM FORMULATION
CHAPTER 4: PROPOSED WORK
CHAPTER 5: SYSTEM DESIGN
CHAPTER 6: IMPLEMENTATION
CHAPTER 7: RESULT ANALYSIS
CHAPTER 8: CONCLUSION, LIMITATION, AND FUTURE SCOPE
REFERENCES

LIST OF FIGURES

Figure  Title

1  Heart Attack Prediction Model Flowchart
2  Level 0 DFD
3  Higher-Level DFD (Level 1)
4  Use Case Diagram
5  Component / Deployment Diagram
CHAPTER 1

INTRODUCTION

Heart attack prediction involves identifying individuals at risk of experiencing a heart attack based on their health
data. This task is critical in reducing the mortality and morbidity associated with cardiovascular diseases,
a leading cause of death worldwide. By leveraging advancements in machine learning (ML), it is possible
to analyze complex health data and provide accurate predictions for early intervention.

Different Approaches to Heart Attack Prediction

Heart attack prediction can be approached using various techniques, which primarily fall into two categories:

1. Statistical Methods
Traditional approaches rely on statistical analysis of patient data to identify risk factors such as age,
blood pressure, cholesterol levels, and smoking history. These models often use logistic regression to
calculate the probability of a heart attack.

2. Machine Learning Models
ML techniques use large datasets and sophisticated algorithms to uncover patterns in health data.
Common models include Decision Trees, Random Forests, XGBoost, and hybrid approaches that combine
multiple algorithms, such as K-Nearest Neighbors (KNN) and Logistic Regression. These models can offer
higher accuracy and better scalability than traditional statistical methods, particularly on larger datasets.
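The sketch below illustrates the hybrid idea mentioned above. It combines KNN and Logistic Regression using scikit-learn's `VotingClassifier` with soft voting, which averages the two models' predicted probabilities. Several choices here are assumptions, not the project's configuration:

- The data is synthetic. It is generated with `make_classification` to match only the shape of the UCI data (303 rows, 13 features).
- Soft voting, `n_neighbors=7` and the scaling step were chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 303 "patients" x 13 numeric features (NOT real clinical data).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Both members are sensitive to feature scale, so each is wrapped with a StandardScaler.
hybrid = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # average the members' predicted class probabilities
)
hybrid.fit(X_train, y_train)

risk_proba = hybrid.predict_proba(X_test)[:, 1]  # estimated P(class 1) for each test row
accuracy = hybrid.score(X_test, y_test)
print(f"Hybrid test accuracy: {accuracy:.3f}")
```

Soft voting lets a confident member outweigh an uncertain one. Hard voting (`voting="hard"`) would count only the predicted labels.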

Machine Learning-Based Prediction

This section focuses on the application of ML models in analyzing health data, identifying critical parameters, and
classifying patients based on their risk levels. By using advanced algorithms, these models can process
complex datasets to provide accurate predictions, enabling healthcare providers to prioritize early
interventions.

Motivation and Perspective

The increasing prevalence of heart attacks and their significant impact on global health underline the necessity for
effective early detection systems. Traditional methods are often limited by their reliance on manual
interpretation and smaller datasets. Machine learning, with its ability to handle large and dynamic
datasets, provides a transformative approach to heart attack prediction, offering higher reliability and
adaptability.

Description of Theoretical Concepts

Heart attack prediction systems leverage core data science concepts to enhance prediction capabilities:

• Supervised Learning
Algorithms are trained on labeled datasets containing health metrics and corresponding outcomes,
enabling them to classify risk levels accurately.
• Feature Selection
Critical health parameters such as chest pain type, blood pressure, cholesterol levels, and blood sugar
levels are identified and prioritized for prediction.
• Model Evaluation
Metrics such as accuracy, precision, recall, and F1 score are used to assess the performance of ML
models, ensuring their reliability in practical applications.

CHAPTER 2

LITERATURE REVIEW

The use of machine learning (ML) algorithms in healthcare, particularly for predicting heart attack risks, has
gained significant attention due to its ability to analyze complex datasets and generate accurate
predictions. Various studies have explored algorithms such as Decision Trees, Random Forest, and
XGBoost, emphasizing their effectiveness in medical diagnosis.

2.1 Machine Learning Techniques for Heart Attack Prediction

1. Decision Tree

Decision Trees are widely used due to their interpretability and ability to model complex decision-making
processes. Researchers have demonstrated that Decision Trees are suitable for predicting heart diseases
by analyzing risk factors such as age, cholesterol levels, and exercise-induced angina. However, Decision
Trees are prone to overfitting when trained on small datasets.
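The overfitting tendency can be seen directly by comparing an unrestricted tree with a depth-limited one. This sketch uses synthetic data with 10% label noise (`flip_y=0.1`), and the depth limit of 3 is an arbitrary illustrative choice. On continuous features, an unrestricted tree memorizes its training set and reaches a training accuracy of 1.0. The quantity to watch is therefore the gap between training and test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy synthetic dataset: the setting in which single trees overfit most easily.
X, y = make_classification(n_samples=200, n_features=13, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for depth in (None, 3):  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, test={scores[depth][1]:.2f}")
```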

2. Random Forest

Random Forest is an ensemble learning method that builds multiple decision trees on bootstrap samples of the
data and combines their results for more accurate and stable predictions. Studies report that Random Forest
copes well with missing data and is much less prone to overfitting than a single tree, making it well suited
to healthcare prediction tasks. Its ability to classify complex patterns has been extensively validated in
cardiovascular risk prediction models.
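A minimal scikit-learn sketch of a Random Forest on synthetic data is shown below. Each tree is trained on a bootstrap sample, so some rows are left out of each tree (its out-of-bag rows). Setting `oob_score=True` scores every row using only the trees that never saw it, which gives a built-in estimate of generalization accuracy. The 300 trees and the data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the clinical data (303 rows, 13 features).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=1)

forest = RandomForestClassifier(
    n_estimators=300,  # number of trees whose predictions are combined
    bootstrap=True,    # each tree sees a bootstrap resample of the rows (the default)
    oob_score=True,    # score each row using only the trees that did not train on it
    random_state=1,
)
forest.fit(X, y)
print(f"Trees: {len(forest.estimators_)}, out-of-bag accuracy: {forest.oob_score_:.3f}")
```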

3. XGBoost (Extreme Gradient Boosting)

XGBoost has emerged as one of the most accurate machine learning algorithms for classification tasks. It uses
boosting: shallow trees (weak learners) are added one at a time, each fitted to the negative gradient of the loss
of the ensemble built so far, so that it corrects the remaining errors. Several studies have reported that
XGBoost achieves higher accuracy in predicting heart attack risks due to its ability to handle both linear and
non-linear relationships in the dataset.
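To keep the example runnable without extra dependencies, the sketch below uses scikit-learn's `GradientBoostingClassifier`. This is a different implementation of the same gradient-boosting idea; XGBoost adds regularization and engineering optimizations on top of it. The `staged_predict` method exposes the ensemble after each added tree, which makes the stage-by-stage nature of boosting visible. The synthetic data and hyperparameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2)

# 100 boosting stages; each adds a depth-3 regression tree fitted to the negative gradient
# (pseudo-residuals) of the log-loss of the current ensemble, scaled by the learning rate.
booster = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3,
                                     random_state=2)
booster.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 100 trees.
staged_accuracy = [accuracy_score(y_test, pred) for pred in booster.staged_predict(X_test)]
print(f"Test accuracy after 1 tree: {staged_accuracy[0]:.2f}, "
      f"after {len(staged_accuracy)} trees: {staged_accuracy[-1]:.2f}")
```

With the `xgboost` package installed, `xgboost.XGBClassifier` accepts similar `n_estimators`, `learning_rate` and `max_depth` arguments through the same fit/predict interface.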

2.2 Comparison of Related Works

1. Accuracy and Performance

Research comparing Decision Trees, Random Forest, and XGBoost shows that XGBoost consistently outperforms
others in terms of prediction accuracy, especially when tuned with appropriate hyperparameters. Random
Forest provides a good balance of accuracy and interpretability, while Decision Trees offer simplicity but
may lack precision in large datasets.

2. Data Handling and Feature Selection

Studies have emphasized the importance of feature selection when using these models. Critical features like age,
blood pressure, cholesterol levels, and exercise-induced angina have been identified as key predictors of
heart attacks. Feature selection techniques such as correlation analysis and recursive feature elimination
improve model performance.
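Recursive feature elimination can be sketched with scikit-learn's `RFE`. It works in rounds:

1. Fit the model.
2. Drop the weakest feature, judged by coefficient magnitude.
3. Repeat until the requested number of features remains.

The data below is synthetic. The UCI column names are attached only as labels, so the features it keeps say nothing about real clinical predictors. Keeping five features is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# UCI-style names used purely as labels for 13 synthetic columns.
FEATURE_NAMES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

X, y = make_classification(n_samples=303, n_features=13, n_informative=4,
                           n_redundant=2, random_state=3)

# Drop one feature per round (step=1) until five remain; ranking is by |coefficient|.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
selector.fit(X, y)

kept = [name for name, keep in zip(FEATURE_NAMES, selector.support_) if keep]
ranking = dict(zip(FEATURE_NAMES, selector.ranking_))  # 1 = kept; higher = eliminated earlier
print("Kept features:", kept)
```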

2.3 Key Findings from Literature


• Ensemble Models are Superior: Studies confirm that ensemble models like Random Forest and
XGBoost outperform single classifiers due to their robustness against overfitting and high predictive
accuracy.
• Data Preprocessing is Crucial: Effective data preprocessing, including handling missing values, data
normalization, and feature scaling, is essential for achieving high model accuracy.
• Clinical Relevance: Machine learning models trained on clinical datasets have shown great potential in
assisting healthcare providers by automating early diagnosis and supporting clinical decision-making.

2.4 Research Gaps Identified

Despite progress in heart attack prediction using machine learning models, several gaps remain:

• Limited Data Diversity: Many models are trained on limited datasets, reducing generalizability to
broader populations.
• Model Interpretability: Although XGBoost provides high accuracy, its predictions are harder to interpret
than those of simpler models like Decision Trees.
• System Integration: Few studies have explored deploying these models in healthcare environments
without relying on real-time monitoring, which is a focus of this project.

CHAPTER 3

PROBLEM FORMULATION

3.1 Description of the Problem Domain

Heart disease, particularly heart attacks, is a major cause of mortality worldwide, contributing to millions of
deaths annually. Early detection can significantly reduce fatalities and improve survival rates. However,
traditional diagnostic methods rely on manual assessments and standard statistical models, which often
fail to capture complex relationships between multiple health factors.

Machine learning algorithms provide a promising alternative by learning patterns from historical health data. By
leveraging clinical datasets, machine learning models can identify individuals at high risk of heart attacks
based on factors such as age, cholesterol levels, chest pain type, and exercise-induced angina. However,
challenges such as data accuracy, feature selection, and model interpretability remain.

3.2 Problem Statement

To design and implement a heart attack prediction system using Decision Tree, Random Forest, and XGBoost
classifiers. The system should predict the likelihood of a heart attack by analyzing critical patient health
data from structured datasets. It must provide accurate classification results using appropriate evaluation
metrics while ensuring model scalability, interpretability, and reliability.

3.3 Depiction of the Problem Statement

The heart attack prediction system is conceptualized as a data-driven machine learning model with the following
input, processing and output stages:

• Input: Clinical features such as age, gender, cholesterol levels, chest pain type, and ECG results.
• Process: Preprocessing the data, selecting relevant features, and applying classification algorithms
(Decision Tree, Random Forest, XGBoost).
• Output: A prediction indicating whether a patient is at risk of experiencing a heart attack (high or low
risk).

System Workflow Overview:

1. Data Collection: Historical clinical data sourced from the UCI Heart Disease dataset.
2. Data Preprocessing: Cleaning the data, handling missing values, and scaling features for optimal model
performance.
3. Feature Selection: Identifying important features such as age, cholesterol, and maximum heart rate.
4. Model Training: Training classifiers using labeled data.
5. Prediction and Evaluation: Using metrics like accuracy, precision, recall, and F1-score to evaluate the
model's performance.
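The five workflow steps map naturally onto a scikit-learn `Pipeline`. A pipeline also prevents information from the test folds leaking into imputation, scaling or feature selection. The sketch below is an illustration under these assumptions:

- synthetic data with about 2% of values blanked out;
- median imputation;
- `SelectKBest` keeping 8 features;
- a Random Forest as the classifier;
- 5-fold cross-validated F1 as the evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1 stand-in: synthetic data with roughly 2% of entries blanked out as "missing".
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=4)
rng = np.random.default_rng(4)
X[rng.random(X.shape) < 0.02] = np.nan

workflow = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # step 2: missing values
    ("scale", StandardScaler()),                           # step 2: feature scaling
    ("select", SelectKBest(score_func=f_classif, k=8)),    # step 3: keep 8 top-scoring features
    ("model", RandomForestClassifier(n_estimators=200, random_state=4)),  # step 4
])

# Step 5: every fold re-fits imputation, scaling and selection on its own training part.
f1_scores = cross_val_score(workflow, X, y, cv=5, scoring="f1")
print(f"5-fold F1: {f1_scores.mean():.3f} (std {f1_scores.std():.3f})")
```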

3.4 Objectives

The key objectives of this project are as follows:

1. Develop a Machine Learning-Based Prediction System:


o Build an accurate prediction system using Decision Tree, Random Forest, and XGBoost classifiers.
o Ensure scalability and ease of deployment.
2. Improve Prediction Accuracy:
o Use feature selection methods to identify the most critical features in heart attack prediction.
o Perform hyperparameter tuning to enhance model performance.
3. Design a Robust Evaluation Framework:
o Use evaluation metrics such as accuracy, precision, recall, F1-score, and confusion matrix to measure
system effectiveness.
4. Achieve Interpretability and Transparency:
o Ensure that model decisions can be explained using feature importance scores and decision trees.
5. Ensure Model Scalability:
o Design a system that can be expanded by integrating larger datasets in future work.
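The hyperparameter tuning in Objective 2 can be sketched with `GridSearchCV`. It tries every combination in a grid using cross-validation on the training split only, then refits the best combination. The held-out split is used once, at the end. The grid values and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (303 rows, 13 features).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=5)

param_grid = {
    "n_estimators": [100, 200],   # number of trees
    "max_depth": [None, 3, 6],    # None lets trees grow fully
}
search = GridSearchCV(
    RandomForestClassifier(random_state=5),
    param_grid,
    cv=5,          # 5-fold cross-validation on the training split only
    scoring="f1",
)
search.fit(X_train, y_train)  # 6 combinations x 5 folds, then a refit of the best one

print("Best parameters:", search.best_params_)
held_out_f1 = search.score(X_test, y_test)  # uses the same F1 scorer
print(f"Held-out F1: {held_out_f1:.3f}")
```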

3.5 Scope of the Project

The project focuses on creating a predictive model for heart attack risks using existing clinical data without
real-time monitoring. This allows for:

• Early Risk Detection: Enabling early diagnosis through accurate predictions.
• Clinical Decision Support: Assisting healthcare providers in making data-driven decisions.
• Future Integration: Preparing the system for integration with more comprehensive healthcare platforms.

CHAPTER 4

PROPOSED WORK

4.1 Introduction

Heart attack prediction is a critical area of research in healthcare, aiming to reduce the mortality rates associated
with cardiovascular diseases. While traditional methods rely on expert knowledge and manual
interpretation, the integration of machine learning (ML) offers the potential for more accurate, data-
driven predictions. This project proposes a heart attack prediction system based on machine learning
algorithms, specifically Decision Tree, Random Forest, and XGBoost, that utilizes historical clinical
data to predict the likelihood of a heart attack.

The goal of the proposed work is to build an effective and scalable machine learning model that provides accurate
predictions using clinical health parameters such as age, cholesterol levels, ECG results, chest pain type,
and exercise-induced angina. The system will be trained on the UCI Heart Disease dataset, which
includes critical features influencing heart health, and will use various ML techniques to identify at-risk
individuals.

4.2 Proposed Methodology/Algorithm

The proposed methodology consists of several key steps, from data collection to model evaluation, as outlined
below:

Step 1: Data Collection

The system will rely on a well-known clinical dataset, the UCI Heart Disease Dataset, which contains
information about patients’ health and medical history. This dataset includes various features, such as:

• Age, Sex, Resting Blood Pressure, Cholesterol, ECG Results
• Chest Pain Type, Maximum Heart Rate, Exercise-Induced Angina, ST Depression, etc.

The data will be collected from a publicly available dataset, ensuring reproducibility and ease of comparison with
existing heart attack prediction models.
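A minimal loading sketch follows. It assumes the processed Cleveland file from the UCI repository, which has this layout:

- there is no header row;
- `?` marks a missing value;
- the diagnosis is in the final column, `num` (0 = no disease, 1 to 4 = disease present), which is commonly collapsed to a binary target.

The two inline rows are illustrative values in that format, not records from the dataset. For the real file, pass its local path instead of the `StringIO` object.

```python
import io

import pandas as pd

# Column names for the 14 attributes used from the processed Cleveland file.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def load_heart_data(source) -> pd.DataFrame:
    """Read header-less, comma-separated rows; '?' is treated as a missing value."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Collapse the 0-4 diagnosis into 0 = absence, 1 = presence of heart disease.
    df["target"] = (df["num"] > 0).astype(int)
    return df.drop(columns="num")


# Illustrative rows only (not taken from the dataset); the second has two missing values.
sample = io.StringIO(
    "54.0,1.0,3.0,140.0,239.0,0.0,0.0,160.0,0.0,1.2,1.0,0.0,3.0,0\n"
    "61.0,0.0,4.0,130.0,?,0.0,2.0,120.0,1.0,1.0,2.0,?,7.0,2\n"
)
heart = load_heart_data(sample)
print(heart.shape)                     # (2, 14)
print(int(heart.isna().sum().sum()))   # 2
print(heart["target"].tolist())        # [0, 1]
```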

Step 2: Data Preprocessing

Preprocessing is an essential step in any machine learning pipeline to ensure that the data is clean, normalized,
and ready for analysis. The following tasks will be performed:

• Handling Missing Data: Missing values will be imputed using mean or median imputation techniques to
avoid losing valuable information.
• Data Normalization: Features such as cholesterol and blood pressure, which vary in scale, will be
normalized so that scale-sensitive algorithms treat all features comparably. (Tree-based models such as
Decision Tree, Random Forest and XGBoost split on thresholds and are largely insensitive to feature scaling,
so this step matters mainly for models such as KNN or Logistic Regression.)
• Encoding Categorical Data: Categorical variables such as ‘Chest Pain Type’ will be encoded using
techniques like one-hot encoding to convert them into a numerical format that can be processed by
machine learning algorithms.
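The three preprocessing tasks can be combined in one scikit-learn `ColumnTransformer`. The sketch below makes these assumptions:

- a four-row hypothetical DataFrame using UCI-style column names;
- median imputation plus standardization for the numeric columns;
- one-hot encoding for the chest pain type (`cp`, coded 1 to 4).

In a full pipeline, the transformer would be fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical patient rows using UCI-style names; NaN marks a missing measurement.
patients = pd.DataFrame({
    "age": [63, 54, 41, 57],
    "trestbps": [145.0, 140.0, np.nan, 130.0],  # resting blood pressure (mm Hg)
    "chol": [233.0, 239.0, 204.0, np.nan],      # serum cholesterol (mg/dl)
    "cp": [1, 3, 2, 4],                          # chest pain type, categories 1-4
})
numeric_cols = ["age", "trestbps", "chol"]
categorical_cols = ["cp"]

preprocess = ColumnTransformer(
    [
        # Numeric: fill gaps with the column median, then standardize to mean 0, std 1.
        ("numeric", Pipeline([("impute", SimpleImputer(strategy="median")),
                              ("scale", StandardScaler())]), numeric_cols),
        # Categorical: one 0/1 indicator column per chest pain type.
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)
X_prepared = preprocess.fit_transform(patients)
print(X_prepared.shape)  # (4, 7): 3 numeric columns + 4 chest-pain indicators
```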

Step 3: Feature Selection

Effective feature selection ensures that the most relevant attributes are used to build the model, improving both
the model's accuracy and interpretability. The steps in feature selection include:

• Correlation Analysis: Identifying features that are strongly correlated with heart attack risk, such as
cholesterol levels, age, and maximum heart rate.
• Feature Importance: Using tree-based algorithms (like Decision Trees and Random Forest) to rank
features based on their importance in predicting heart attack risk.
• Removing Redundant Features: Features that add little information to the prediction (e.g., those highly
correlated with other features) will be removed to reduce noise and complexity.
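The redundant-feature rule can be sketched in three steps:

1. Compute the absolute pairwise Pearson correlation matrix.
2. Look only at its upper triangle, so each pair is counted once.
3. Drop the later column of any pair above a threshold.

The 0.9 threshold and the synthetic columns are illustrative assumptions. The columns include a cholesterol column duplicated in mmol/L, using the conversion factor of 38.67 mg/dL per mmol/L.

```python
import numpy as np
import pandas as pd


def find_redundant(features: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return columns whose |Pearson r| with an earlier column exceeds `threshold`."""
    corr = features.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


rng = np.random.default_rng(0)
features = pd.DataFrame({
    "age": rng.integers(30, 78, size=100),
    "chol": rng.normal(245, 50, size=100),      # synthetic cholesterol, mg/dl
    "thalach": rng.normal(150, 22, size=100),   # synthetic maximum heart rate
})
features["chol_mmol"] = features["chol"] / 38.67  # same information in mmol/L

to_drop = find_redundant(features)
print(to_drop)  # ['chol_mmol']
reduced = features.drop(columns=to_drop)
```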

Step 4: Model Training

Once the data is preprocessed and features are selected, the system will train three machine learning models:

• Decision Tree: A simple and interpretable model that builds a tree-like structure based on decisions made
from the data features. It is easy to understand and visualize.
• Random Forest: An ensemble learning method that aggregates multiple decision trees to reduce
overfitting and improve prediction accuracy. It is well suited to datasets with complex
relationships.
• XGBoost: A gradient boosting algorithm known for its strong performance in classification tasks. It is
effective in handling non-linear relationships and large datasets, making it a good candidate for
high-stakes applications like heart attack prediction.

Each model will be trained using a train-test split where 70% of the data is used for training, and the remaining
30% is used for testing the models' generalization performance.
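The training step can be sketched as a stratified 70/30 split followed by fitting the three classifiers. The data is synthetic and the hyperparameters are illustrative. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` (a different gradient-boosting implementation) so that it still runs. The printed scores therefore say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

try:
    from xgboost import XGBClassifier
    boosted_name = "XGBoost"
    boosted = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
except ImportError:  # fall back to scikit-learn's own gradient boosting
    boosted_name = "Gradient Boosting (fallback)"
    boosted = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)

X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=7)
# 70% training / 30% testing, preserving the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=7)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=7),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=7),
    boosted_name: boosted,
}
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")
```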

Step 5: Model Testing and Evaluation

After training the models, the next step is to evaluate their performance. The models will be tested on the unseen
test dataset, and several performance metrics will be used to assess the accuracy and robustness of the
models:

• Accuracy: Measures the proportion of correctly predicted instances (heart attack vs. no heart attack).
• Precision and Recall: Precision measures the proportion of true positive predictions out of all positive
predictions, while recall measures the ability of the model to correctly identify all positive cases.
• F1-Score: The harmonic mean of precision and recall, useful for evaluating performance when the classes
are imbalanced.
• Confusion Matrix: A detailed breakdown of true positive, true negative, false positive, and false negative
predictions, providing insights into the types of errors the models are making.
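The four metrics can be checked by hand on a small example. The labels below are hypothetical: 10 test patients, with 1 = heart attack / high risk. The sketch computes the confusion matrix with scikit-learn, derives each metric from its definition, and then confirms the values against scikit-learn's own metric functions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical ground truth and predictions for 10 test patients (1 = heart attack).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 7 / 10 = 0.70
precision = tp / (tp + fp)                           # 3 / 5  = 0.60
recall = tp / (tp + fn)                              # 3 / 4  = 0.75
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, about 0.667

# scikit-learn's implementations agree with the definitions above.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=4 FP=2 FN=1 TP=3
```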

Step 6: Prediction and Risk Classification

Once the models are trained and evaluated, the system will use them to predict the heart attack risk for new
patients. The predictions will be classified into two categories:

• Low Risk (0): Patients who are less likely to experience a heart attack.
• High Risk (1): Patients who are at a higher risk of experiencing a heart attack.
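Mapping a model's output to these two categories usually means thresholding the predicted probability of class 1. For scikit-learn-style models, that probability is `predict_proba(X)[:, 1]`. The sketch below assumes a threshold of 0.5, which is a design choice rather than something fixed by the report. Lowering the threshold flags more patients as high risk, trading precision for recall.

```python
import numpy as np


def classify_risk(probabilities, threshold=0.5):
    """Return (codes, labels): 1/'High Risk' where P(heart attack) >= threshold, else 0/'Low Risk'."""
    probabilities = np.asarray(probabilities, dtype=float)
    codes = (probabilities >= threshold).astype(int)
    labels = np.where(codes == 1, "High Risk", "Low Risk")
    return codes, labels


# Hypothetical predicted probabilities for three new patients.
codes, labels = classify_risk([0.12, 0.50, 0.87])
print(codes.tolist())   # [0, 1, 1]
print(labels.tolist())  # ['Low Risk', 'High Risk', 'High Risk']
```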

The predictions will be displayed through a user-friendly interface, enabling healthcare providers to use the
model's predictions as a decision-making tool.

Step 7: Model Deployment and Future Work

While this project will focus on training and testing the models, future work may include deploying the system in
a healthcare environment. The model can be integrated into an existing electronic health record (EHR)
system, providing healthcare professionals with real-time decision support based on patient data.
Additionally, more complex models or hybrid systems combining various algorithms can be explored to
improve prediction accuracy.

4.3 Justification from Literature Survey

The literature review reveals that machine learning algorithms such as Random Forest and XGBoost are highly
effective in classifying heart attack risks. Studies by Rani et al. (2021) and Tama et al. (2020) show that
these ensemble methods significantly outperform simpler models like Logistic Regression. Moreover, the
combination of these models, as suggested by Rani et al., has the potential to enhance the predictive
accuracy further.

This project draws on established ensemble learning and feature selection techniques so that the models can
perform well across a variety of datasets and clinical features. The system's ability to integrate and process
large amounts of medical data could make it a valuable tool for predictive healthcare.

CHAPTER 5

SYSTEM DESIGN

5.1 Functional Specification of the System

The Heart Attack Prediction System aims to predict the likelihood of a heart attack based on various patient
health parameters using machine learning algorithms. The system is designed to be efficient, scalable,
and easily interpretable for healthcare professionals. The system consists of several key components that
work together to collect, process, analyze, and predict heart attack risks.

System Components:

1. Data Collection Module:


o Function: Collects health-related data from a structured dataset (such as the UCI Heart Disease dataset)
containing clinical parameters (age, cholesterol levels, blood pressure, etc.).
o Input: Raw data from the dataset.
o Output: Raw data loaded into a structured (tabular) format, ready for preprocessing.
2. Data Preprocessing Module:
o Function: Cleans and normalizes the dataset to ensure the data is in a format that can be used for
machine learning.
o Input: Raw, unclean data.
o Output: Preprocessed data with missing values handled, features normalized, and categorical data
encoded.
3. Feature Selection Module:
o Function: Identifies and selects the most relevant features for heart attack prediction. This helps to
reduce dimensionality and improve model accuracy by focusing on the most important features.
o Input: Preprocessed data.
o Output: A set of selected features that will be used to train the machine learning models.
4. Model Training Module:
o Function: Trains machine learning models (Decision Tree, Random Forest, XGBoost) using the selected
features. This module uses various classification algorithms to train the models.
o Input: Selected features and labeled data.
o Output: Trained machine learning models ready for evaluation.
5. Model Evaluation and Prediction Module:
o Function: Evaluates the trained models using various performance metrics such as accuracy, precision,
recall, F1-score, and confusion matrix. After evaluation, the best-performing model is selected to predict
heart attack risks.
o Input: Trained models and test data.
o Output: Model evaluation results, including the selected model and its performance metrics.
6. Risk Classification and Alert Generation Module:
o Function: Classifies the risk level of a patient into high or low based on the trained model’s prediction.
If the risk is high, an alert is generated for the healthcare provider.
o Input: Patient data and model predictions.
o Output: Risk classification (low or high) and an alert for healthcare providers if necessary.

5.2 Data Flow Diagrams (DFD)


Level 0 DFD (Context Diagram)

The Level 0 DFD represents the entire system as a single process and shows the data flow between external
entities and the main system.

• External Entities:
o Patients: Provide health data such as age, cholesterol levels, and exercise history.
o Healthcare Providers: Receive risk predictions and alerts about the patient's heart attack risk.
o Dataset: The source of clinical data used to train and test the model.
• Main Process:
o Heart Attack Prediction System: Collects data, preprocesses it, trains machine learning models, and
generates predictions and alerts.
• Data Flow:
o Input: Patient health data and historical clinical data.
o Output: Risk predictions (low or high) and alerts for healthcare providers.

Level 1 DFD (Detailed Data Flow Diagram)

The Level 1 DFD breaks down the system into smaller sub-processes, providing more detailed insights into the
system’s functions.

• Sub-processes:
1. Data Collection: Collects raw patient data from various clinical sources.
2. Data Preprocessing: Handles missing values, normalizes features, and encodes categorical variables.
3. Feature Selection: Selects the most significant features based on correlation and feature importance.
4. Model Training: Trains machine learning models on the processed data.
5. Model Evaluation and Testing: Tests the models and evaluates their performance using metrics like
accuracy, precision, recall, etc.
6. Risk Classification: Classifies heart attack risks as low or high.
7. Alert Generation: Sends alerts to healthcare providers if the model classifies the risk as high.


5.3 System Architecture

The system architecture involves several layers of interaction between data collection, preprocessing, machine
learning models, and user interfaces. It can be visualized as a multi-layered architecture:

1. Data Layer (Input):


o The data layer consists of patient information that is either provided directly by healthcare professionals
or imported from the dataset. The system ingests structured clinical data such as cholesterol levels, ECG
results, and age.
2. Processing Layer:
o This layer includes all the preprocessing and machine learning model training tasks. It performs the
following:
• Data Preprocessing: Cleans and prepares the dataset for training.
• Feature Selection: Filters out irrelevant features to enhance model performance.
• Model Training and Evaluation: Applies various machine learning models to the data to build the
predictive system.
3. Output Layer:
o Once the models are trained and evaluated, this layer makes the final predictions and provides alerts. The
results are sent to healthcare providers through a user interface or dashboard.
o User Interface: A simple web or desktop application displays the model’s output, providing risk
classification (low or high) and alerts to healthcare providers. This interface allows for easy integration
into existing healthcare systems.

5.4 Structural and Dynamic Modeling of the System

Class/Object Diagrams

Class diagrams illustrate the system’s structure by showing the relationships between the various components or
classes involved in the system.

• Classes and Their Functions:
1. Data: Represents the dataset containing patient health information.
   • Attributes: Age, cholesterol, heart rate, ECG results, etc.
   • Methods: GetData(), CleanData(), ValidateData()
2. Preprocessing: Handles data preprocessing tasks.
   • Attributes: CleanedData, NormalizedData
   • Methods: Normalize(), HandleMissingValues()
3. Model: Represents the trained machine learning models (Decision Tree, Random Forest, XGBoost).
   • Attributes: ModelType, Accuracy, TrainedData
   • Methods: Train(), Test(), Predict()
4. Risk: Represents the risk prediction for each patient.
   • Attributes: RiskLevel (Low, High)
   • Methods: PredictRisk(), GenerateAlert()
5. Alert: Generates alerts for healthcare providers when high risk is detected.
   • Attributes: AlertType (SMS, Email)
   • Methods: SendAlert(), ScheduleAlert()
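The partial Python sketch below shows how the Model, Risk and Alert classes could fit together. Its departures from the diagram are assumptions, not the project's code:

- Method names are translated to Python's snake_case; for example, `Train()` becomes `train()` and `PredictRisk()` becomes `predict_risk()`.
- The Data and Preprocessing classes are omitted.
- The alert only records a message in memory instead of contacting a real SMS or e-mail service.
- The wiring between the classes is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


@dataclass
class Alert:
    alert_type: str = "Email"  # diagram: AlertType (SMS, Email)
    outbox: List[Tuple[str, str, str]] = field(default_factory=list)

    def send_alert(self, patient_id: str, message: str) -> None:
        # Hypothetical delivery: store the message instead of calling an SMS/e-mail gateway.
        self.outbox.append((self.alert_type, patient_id, message))


@dataclass
class Model:
    model_type: str            # e.g. "Decision Tree", "Random Forest", "XGBoost"
    estimator: Any             # any object with fit / predict / score methods
    accuracy: Optional[float] = None

    def train(self, X, y) -> "Model":
        self.estimator.fit(X, y)
        return self

    def test(self, X, y) -> float:
        self.accuracy = float(self.estimator.score(X, y))
        return self.accuracy

    def predict(self, X) -> np.ndarray:
        return self.estimator.predict(X)


@dataclass
class Risk:
    model: Model
    alert: Alert

    def predict_risk(self, patient_id: str, features) -> str:
        """Classify one patient (a single-row 2-D array) as 'High' or 'Low' risk."""
        level = "High" if self.model.predict(features)[0] == 1 else "Low"
        if level == "High":
            self.generate_alert(patient_id)
        return level

    def generate_alert(self, patient_id: str) -> None:
        self.alert.send_alert(patient_id, "High heart attack risk predicted; please review.")


# Usage on synthetic data (not clinical records).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = Model("Decision Tree", DecisionTreeClassifier(max_depth=3, random_state=0)).train(X, y)
model.test(X, y)
risk = Risk(model, Alert())
levels = [risk.predict_risk(f"P{i}", X[i:i + 1]) for i in range(10)]
print(levels, len(risk.alert.outbox))
```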

Use Case Diagram

Use case diagrams illustrate the interactions between users (e.g., healthcare providers, patients) and the system:

• Actors:
o Patient: Provides health data, receives risk predictions.
o Healthcare Provider: Receives alerts, reviews risk predictions, and makes medical decisions.
• Use Cases:
1. Patient: Submit health data.
2. Healthcare Provider: Receive alerts, review risk predictions.

Component/Deployment Diagram

A Component/Deployment Diagram shows the physical components involved in the system’s architecture, such
as wearable devices, cloud servers, and user interfaces.

• Components:
1. Wearable Devices (Optional): Could be integrated in future versions of the system to collect real-time
data.
2. Server: The central location where machine learning models are hosted and predictions are made.
3. Database: Stores the dataset and processed results.
4. User Interface: Displays the results and alerts to healthcare providers.

5.5 Conclusion

The system is designed to be modular and scalable, ensuring that it can handle more extensive datasets in the
future and integrate into real-world healthcare systems. By utilizing machine learning algorithms such as
Decision Tree, Random Forest, and XGBoost, this system provides accurate heart attack predictions and
serves as a decision-support tool for healthcare providers.

CHAPTER 6

IMPLEMENTATION

6.1 Overview

The Heart Attack Prediction System is implemented using Streamlit for the front-end user
interface, XGBoost for model prediction, and Pandas for data manipulation. This section describes the
implementation of the system, including the steps for loading the trained machine learning model,
capturing user input, and generating predictions. The system predicts the likelihood of a heart attack
based on clinical data entered by the user and displays the result in real time.
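The load-then-predict flow described above can be sketched without the Streamlit layer. The sketch rests on these assumptions:

- The trained model is persisted with `joblib`, under a hypothetical file name.
- The front end collects one value per UCI-style feature into a dictionary, which is turned into a one-row Pandas DataFrame.
- A Random Forest fitted on synthetic data stands in for the project's XGBoost model, so the printed probability has no clinical meaning. An `XGBClassifier` can be saved and loaded the same way.

In a Streamlit app, the dictionary values would come from input widgets such as `st.number_input`.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

# Stand-in "trained model": fitted on synthetic data, so its outputs have no clinical meaning.
X, y = make_classification(n_samples=303, n_features=len(FEATURES), random_state=0)
stand_in = RandomForestClassifier(n_estimators=100, random_state=0)
stand_in.fit(pd.DataFrame(X, columns=FEATURES), y)

model_path = os.path.join(tempfile.mkdtemp(), "heart_model.joblib")  # hypothetical file name
joblib.dump(stand_in, model_path)


def predict_from_form(model, form_values: dict) -> float:
    """Turn one patient's form input into a one-row DataFrame and return P(class 1)."""
    row = pd.DataFrame([form_values], columns=FEATURES)  # column order must match training
    return float(model.predict_proba(row)[0, 1])


loaded_model = joblib.load(model_path)
form_values = {"age": 58, "sex": 1, "cp": 3, "trestbps": 140, "chol": 240, "fbs": 0,
               "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1.2, "slope": 2,
               "ca": 0, "thal": 3}  # hypothetical patient
probability = predict_from_form(loaded_model, form_values)
print(f"Predicted probability of class 1: {probability:.2f}")
```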

6.2 Software Requirements

Software Tools Used

The implementation of the heart attack prediction system uses the following software tools and libraries:

• Python: Python is the primary programming language used for developing the heart attack
prediction system. It is widely used for machine learning tasks due to its simplicity and robust ecosystem
of libraries.

• NumPy: NumPy is a Python library for numerical computing. It is used for handling large
arrays and matrices, which are essential for performing mathematical operations on datasets. It helps in
manipulating and analyzing health data efficiently.

• Pandas: Pandas is used for data manipulation and analysis. It is particularly useful for loading,
processing, and cleaning datasets in tabular form (e.g., CSV files). It allows efficient handling of missing
data, outliers, and categorical variables.

• Scikit-Learn: Scikit-Learn is a powerful Python library for machine learning. It provides implementations of
algorithms like SVM, KNN, Decision Trees, and Random Forest, along with utilities for model training,
evaluation, and cross-validation. Scikit-learn is essential for building, training, and testing machine
learning models.

• Matplotlib: Matplotlib is used for visualizing the data and results. It helps in plotting graphs such as feature
distributions, accuracy curves, and confusion matrices. Visualizations help in understanding patterns in
the data and assessing model performance.

• Seaborn: Seaborn is built on top of Matplotlib and provides a higher-level interface for creating attractive and
informative statistical graphics. It is used for visualizing correlations between different features,
distributions of data, and performance metrics of the model.

• TensorFlow/Keras (Optional): For more advanced or deep learning approaches, TensorFlow or Keras can be
used. These libraries provide tools for implementing neural networks, which could potentially improve
heart attack prediction accuracy by learning complex patterns in large datasets.

Dataset Description

6.2.1 Source of Dataset

The dataset used for training and testing the heart attack prediction system is derived from the publicly available
Heart Disease dataset from the UCI Machine Learning Repository. It contains medical attributes of patients both
with and without heart disease, and it is widely used to train machine learning models that predict heart
disease risk from health metrics.
6.2.2 Size (No. of Samples) and Description of Attributes

• Number of Samples: The dataset contains 303 instances (patients), each with 14 attributes (13 clinical
features and the target variable). Some instances may have missing values, which are handled during the
preprocessing step.

• Description of Attributes: The dataset includes the following features, which are used as input variables for the
machine learning models:

1. Age: Age of the patient (in years).

2. Sex: Sex of the patient (Male/Female).

3. Chest pain type: Type of chest pain experienced (4 categories).

4. Resting blood pressure: Resting blood pressure (in mm Hg).

5. Serum cholesterol: Serum cholesterol (in mg/dl).

6. Fasting blood sugar: Whether the fasting blood sugar is > 120 mg/dl (binary).

7. Resting electrocardiographic results: Resting electrocardiographic results (3 categories).

8. Maximum heart rate: Maximum heart rate achieved during exercise.

9. Exercise-induced angina: Whether exercise-induced angina was experienced (binary).

10. ST depression (oldpeak): ST depression induced by exercise relative to rest.

11. Slope of peak exercise ST segment: Slope of the ST segment during peak exercise (3 categories).

12. Number of major vessels colored by fluoroscopy: Number of major vessels (0-3).

13. Thalassemia: Thalassemia status (3 categories: normal, fixed defect, or reversible defect).

14. Target variable: Whether the patient has heart disease (binary: 0 for no, 1 for yes).
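The attribute list above can be expressed as a small input schema that checks a patient record before it reaches the model. The sketch below is illustrative only: the field names are the column names used later in the application code, the allowed category codes are an assumption copied from the options offered in the Streamlit form (which offers four thalassemia codes, 0 to 3, although the attribute list describes three categories), and the helper name validate_record is hypothetical.

```python
# Hypothetical input schema for one patient record. Category codes mirror the
# options offered in the Streamlit form (an assumption, not the UCI codebook).
CATEGORICAL_CODES = {
    "sex": {0, 1},          # 1 = male, 0 = female
    "cp": {0, 1, 2, 3},     # chest pain type (4 categories)
    "fbs": {0, 1},          # fasting blood sugar > 120 mg/dl
    "restecg": {0, 1, 2},   # resting ECG result (3 categories)
    "exng": {0, 1},         # exercise-induced angina
    "slp": {0, 1, 2},       # slope of the peak exercise ST segment
    "caa": {0, 1, 2, 3},    # major vessels colored by fluoroscopy
    "thall": {0, 1, 2, 3},  # thalassemia code as offered in the form
}
# Continuous measurements that must be plain numbers.
NUMERIC_FIELDS = ("age", "trtbps", "chol", "thalachh", "oldpeak")


def validate_record(record):
    """Return a list of problems found in a patient record (empty if none)."""
    problems = []
    for name in NUMERIC_FIELDS:
        value = record.get(name)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {value!r}")
    for name, allowed in CATEGORICAL_CODES.items():
        value = record.get(name)
        if value not in allowed:
            problems.append(f"{name}: {value!r} not in {sorted(allowed)}")
    return problems
```

Returning a list of messages, rather than raising on the first error, lets the interface report every problem in one pass.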

6.3 Loading the Trained Model

The model is trained using the XGBoost algorithm and saved as a binary file (xgb_model.bin). This model is
loaded into the system using XGBoost’s Booster class to make predictions based on new data.

import streamlit as st
import pandas as pd
import xgboost
import numpy as np

# Load the saved XGBoost model from disk
loaded_model = xgboost.Booster()
loaded_model.load_model('xgb_model.bin')

These lines load the XGBoost model that was previously trained on the dataset, so that it is ready for
making predictions.

6.4 User Interface Implementation (Streamlit)

The user interface is created using Streamlit, which allows for easy creation of interactive applications with
minimal code.

Page Title and Styling:

The title of the web page is set to ‘Heart Attack Prediction using ML’, and custom styling is applied to center
the content and add images.

st.title('Heart Attack Prediction using ML')

# Inject custom CSS to center the content and limit its width.
# Note: these CSS class names are Streamlit internals and may differ between versions.
st.markdown(
    """
<style>
.reportview-container {
    display: flex;
    justify-content: center;
    align-items: center;
}
.main .block-container {
    flex: 1;
    max-width: 800px;
    padding-top: 5rem;
    padding-right: 2rem;
    padding-left: 2rem;
    padding-bottom: 5rem;
}
</style>
""",
    unsafe_allow_html=True,
)

Patient Input Fields:

The system prompts the user to input various health parameters, such as age, sex, chest pain type, cholesterol
level, and more. Streamlit provides input widgets such as st.number_input for numeric values and
st.selectbox for categorical options like sex, chest pain type, etc.

# Collect the patient's clinical parameters from the form
age = st.number_input('Enter age', step=1)
sex = st.selectbox('Enter sex', ('Male', 'Female'))
sex = 1 if sex == 'Male' else 0
cp = st.selectbox('Enter Chest Pain type', (0, 1, 2, 3))
trtbps = st.number_input('Enter resting blood pressure value', step=1)
chol = st.number_input('Enter cholesterol value (in mg/dl)', step=1)
fbs = st.selectbox('Is fasting blood sugar > 120 mg/dl?', ('Yes', 'No'))
fbs = 1 if fbs == 'Yes' else 0
restecg = st.selectbox('Enter Resting Electrocardiographic Results value', (0, 1, 2))
thalachh = st.number_input('Maximum heart rate achieved', step=1)
exng = st.selectbox('Enter exercise induced angina value', ('Yes', 'No'))
exng = 1 if exng == 'Yes' else 0
# oldpeak (ST depression) can be fractional, e.g. 1.5, so use a float step
oldpeak = st.number_input('Enter oldpeak value', step=0.1)
slp = st.selectbox('Enter slope of the peak exercise ST segment value', (0, 1, 2))
# caa is the number of major vessels colored by fluoroscopy (0-3)
caa = st.selectbox('Enter number of major vessels colored by fluoroscopy', (0, 1, 2, 3))
thall = st.selectbox('Enter thalassemia value', (0, 1, 2, 3))

Data Preprocessing:

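After capturing the input data, the application converts it into the format the model expects. The XGBoost
Booster's predict method takes its input as a DMatrix, so the values entered by the user are first stored in a
Pandas DataFrame, which is then converted into an XGBoost DMatrix. Only seven of the collected inputs (thall,
caa, cp, oldpeak, exng, chol and thalachh) are passed to the model, and their names and order must match the
features used during training.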

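# Features passed to the model; names and order must match those used in training
features = ['thall', 'caa', 'cp', 'oldpeak', 'exng', 'chol', 'thalachh']
# Numeric inputs that are checked for missing (zero) values before prediction
features_values = {'age': age, 'trtbps': trtbps, 'chol': chol, 'thalachh': thalachh, 'oldpeak': oldpeak}

# Build a single-row DataFrame with the columns in the expected order
data_1 = pd.DataFrame([[thall, caa, cp, oldpeak, exng, chol, thalachh]], columns=features)

# Convert to XGBoost's DMatrix format for prediction
dtest = xgboost.DMatrix(data_1)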
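Because the model only works correctly when its inputs arrive with the same names and in the same order as during training, it can help to build the row from a single ordered list of feature names. The standard-library sketch below is an illustration, not the project's code: build_feature_row is a hypothetical helper, and MODEL_FEATURES is copied from the features list above on the assumption that it matches the training order.

```python
# Feature order copied from the application code; assumed to match training.
MODEL_FEATURES = ["thall", "caa", "cp", "oldpeak", "exng", "chol", "thalachh"]


def build_feature_row(inputs, feature_names=MODEL_FEATURES):
    """Return the values of `inputs` in model feature order.

    Raises KeyError naming every missing feature, so a mismatch between the
    form and the model is caught before prediction instead of silently
    shifting columns.
    """
    missing = [name for name in feature_names if name not in inputs]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Extra inputs (e.g. age, which the model does not use) are ignored.
    return [inputs[name] for name in feature_names]
```

The resulting list could then be wrapped as pd.DataFrame([row], columns=MODEL_FEATURES) before creating the DMatrix, so the column names and the values always come from the same list.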

6.5 Prediction and Output

Once the data is prepared, the XGBoost model predicts the likelihood of a heart attack based on the user’s input.
The threshold is set to 0.5; if the model’s prediction is greater than or equal to 0.5, the patient is
considered to be at risk of a heart attack.

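# Predicted probability of a heart attack for the single input row
# (assuming the model was trained with a binary:logistic objective)
prediction = loaded_model.predict(dtest)
threshold = 0.5
# Convert the probability to a class label: 1 = at risk, 0 = no risk
prediction = np.where(prediction >= threshold, 1, 0)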
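For a model trained with XGBoost's binary:logistic objective (an assumption, since the training configuration is not shown in this chapter), predict returns a probability obtained by applying the logistic (sigmoid) function to the model's raw margin score. The standard-library sketch below reproduces the thresholding step; sigmoid and to_label are illustrative helpers, not part of the XGBoost API.

```python
import math


def sigmoid(margin):
    """Map a raw margin score to a probability in [0, 1].

    The two branches avoid overflow in math.exp for large |margin|.
    """
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


def to_label(probability, threshold=0.5):
    """Return 1 (at risk) if probability >= threshold, else 0 (no risk)."""
    return 1 if probability >= threshold else 0
```

Lowering the threshold flags more patients as at risk, which raises (or leaves unchanged) recall at the cost of precision; for a screening tool, a threshold below 0.5 may be worth evaluating.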

Display Results:

Depending on the prediction result, the system displays a message to the user indicating whether they are at risk
of a heart attack.

# prediction is a one-element array; compare its single value
if prediction[0] == 0:
    st.markdown("<h2 style='text-align: center; color: green;'>Patient has no risk of Heart Attack</h2>",
                unsafe_allow_html=True)
else:
    st.markdown("<h2 style='text-align: center; color: red;'>Patient has risk of Heart Attack</h2>",
                unsafe_allow_html=True)

6.6 Conclusion

This implementation provides an interactive web application that predicts heart attack risk from user-provided
clinical data. By using XGBoost for classification, the system aims to deliver accurate heart attack predictions,
and its Streamlit interface lets healthcare providers enter patient data and view the result in a few steps,
supporting data-driven decisions.


CHAPTER 7

RESULT ANALYSIS

Performance Measures

To calculate the performance metrics of a machine learning model, we typically look at the following
measurements:

1. Accuracy: Proportion of correct predictions out of all predictions.


2. Precision: Proportion of positive predictions that are actually correct.
3. Recall (Sensitivity): Proportion of actual positives that are correctly identified.
4. F1-Score: The harmonic mean of precision and recall. It provides a balance between precision and recall.
5. Confusion Matrix: A summary table showing the number of correct and incorrect predictions, broken
down by class.

Confusion Matrix Breakdown:

The confusion matrix is a table that shows:

• True Positives (TP): Positive samples correctly predicted as positive.
• True Negatives (TN): Negative samples correctly predicted as negative.
• False Positives (FP): Negative samples incorrectly predicted as positive.
• False Negatives (FN): Positive samples incorrectly predicted as negative.
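The four counts can be tallied directly from lists of actual and predicted labels. A minimal standard-library sketch, assuming binary labels with 1 as the positive class; confusion_counts is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # correctly flagged as positive
            else:
                fp += 1  # negative sample flagged as positive
        else:
            if actual == positive:
                fn += 1  # positive sample missed
            else:
                tn += 1  # correctly left as negative
    return tp, tn, fp, fn


# Same illustrative labels as the scikit-learn example in this chapter
y_test = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(confusion_counts(y_test, y_pred))  # (4, 4, 1, 1)
```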

Example of Performance Calculation

Let’s assume we have the following confusion matrix from the predictions of a model:

                       Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)    TP = 50                  FN = 10
Actual Negative (0)    FP = 5                   TN = 35

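1. Accuracy:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{50 + 35}{50 + 10 + 5 + 35} = \frac{85}{100} = 0.85 = 85\%$$

2. Precision:
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{50}{50 + 5} = \frac{50}{55} \approx 0.909 \; (90.91\%)$$

3. Recall:
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{50}{50 + 10} = \frac{50}{60} \approx 0.833 \; (83.33\%)$$

4. F1-Score:
$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{100}{100 + 5 + 10} = \frac{100}{115} \approx 0.870 \; (86.96\%)$$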
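The same arithmetic can be checked in a few lines of Python. This standard-library sketch (the function name metrics_from_counts is illustrative) reproduces the worked example above from TP = 50, FN = 10, FP = 5 and TN = 35; it returns 0.0 for a metric whose denominator is zero instead of raising.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


m = metrics_from_counts(tp=50, tn=35, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.8500
# precision: 0.9091
# recall: 0.8333
# f1: 0.8696
```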

Using Python's sklearn for Performance Evaluation

In Python, you can use sklearn.metrics to calculate these performance metrics easily.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Assuming y_test and y_pred are your actual and predicted labels:
y_test = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0] # Example actual labels
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1] # Example predicted labels

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

# Precision
precision = precision_score(y_test, y_pred)

# Recall
recall = recall_score(y_test, y_pred)

# F1-Score
f1 = f1_score(y_test, y_pred)

# Confusion Matrix: rows = actual, columns = predicted, i.e. [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy*100:.2f}%")
print(f"Precision: {precision*100:.2f}%")
print(f"Recall: {recall*100:.2f}%")
print(f"F1-Score: {f1*100:.2f}%")
print("Confusion Matrix:")
print(cm)
Example Output:
Accuracy: 80.00%
Precision: 80.00%
Recall: 80.00%
F1-Score: 80.00%
Confusion Matrix:
[[4 1]
 [1 4]]

Example Predictions and Results

Scenario 1: Patient with Low Risk

• Input Data:
o Age: 45
o Sex: Male
o Chest Pain Type: 0 (typical angina)
o Resting Blood Pressure: 120 mm Hg
o Cholesterol: 180 mg/dl
o Fasting Blood Sugar: No
o Maximum Heart Rate: 150 bpm
o Resting ECG: 0 (normal)
o Exercise-Induced Angina: No
o Oldpeak: 0.5
o Slope: 1 (flat)
o Number of Major Vessels: 0
o Thalassemia: 3 (normal)
• Prediction Output: The model predicts a low risk of heart attack for this patient. The output is displayed
as:

st.markdown("<h2 style='text-align: center; color: green;'>Patient has no risk of Heart Attack</h2>",
            unsafe_allow_html=True)

Scenario 2: Patient with High Risk

• Input Data:
o Age: 65
o Sex: Female
o Chest Pain Type: 2 (non-anginal pain)
o Resting Blood Pressure: 140 mm Hg
o Cholesterol: 250 mg/dl
o Fasting Blood Sugar: Yes
o Maximum Heart Rate: 120 bpm
o Resting ECG: 1 (abnormality)
o Exercise-Induced Angina: Yes
o Oldpeak: 1.5
o Slope: 0 (downsloping)
o Number of Major Vessels: 2
o Thalassemia: 2 (fixed defect)
• Prediction Output: The model predicts a high risk of heart attack for this patient. The output is displayed
as:

st.markdown("<h2 style='text-align: center; color: red;'>Patient has risk of Heart Attack</h2>",
            unsafe_allow_html=True)

7.5 Handling Missing or Incorrect Data

The application also guards against incomplete input. If a required numeric field (such as age, resting blood
pressure, cholesterol or maximum heart rate) is left at its default value of zero, the system displays a
warning instead of a prediction:

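# Warn and stop if a required numeric field was left at its default of 0.
# oldpeak is not checked because an ST depression of 0 is a valid reading.
required = ('age', 'trtbps', 'chol', 'thalachh')
if any(features_values[name] == 0 for name in required):
    st.warning('Please input all the details.')
    st.stop()  # halt the script so no prediction is made on incomplete data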

This check ensures that a prediction is made only after the required fields have been filled in; it does not
verify that the entered values are clinically plausible.
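The zero-value check can also be written as a helper that reports which fields are missing, so the warning can name them. This is a sketch under stated assumptions: find_missing_fields and REQUIRED_NONZERO are hypothetical names, and oldpeak is deliberately left out because an ST depression of 0 is a legitimate reading, whereas age, blood pressure, cholesterol and maximum heart rate cannot be 0 for a real patient.

```python
# Fields for which 0 cannot be a real measurement (oldpeak is excluded
# because an ST depression of 0 is a valid reading).
REQUIRED_NONZERO = ("age", "trtbps", "chol", "thalachh")


def find_missing_fields(values, required=REQUIRED_NONZERO):
    """Return the names of required fields that are absent, None or zero."""
    # None, 0 and 0.0 are all falsy, so `not` catches every missing case.
    return [name for name in required if not values.get(name)]
```

In the Streamlit app, a non-empty result could be joined into the text passed to st.warning, followed by st.stop(), so the user sees exactly which fields still need a value.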

7.6 Result Analysis and Interpretation

Model's Performance:

• The XGBoost model performs well, providing accurate and reliable predictions based on the input data.
• High-risk predictions are displayed in red to draw attention, while low-risk predictions are displayed in
green.
• The system ensures that users (healthcare professionals or patients) can easily interpret the results and
make informed decisions based on the model's output.

Model Limitations:

• Data Quality: The model's accuracy depends on the quality and completeness of the input data.
Incomplete or noisy data could lead to less accurate predictions.
• Generalization: The model is trained on a specific dataset and may not generalize well to other
populations unless retrained with more diverse data.
• Feature Sensitivity: Some features, such as cholesterol level, have a larger impact on the predictions than
others. However, the model may not always capture subtle, complex relationships between features.

7.7 Future Enhancements

• Model Optimization: Future work could involve hyperparameter tuning or the inclusion of other machine
learning algorithms to improve prediction accuracy.
• Extended Data: Adding more features, such as genetic information, lifestyle factors, and medical history,
could improve the model's robustness.
• Deployment in Real-World Healthcare Systems: Integrating the system with healthcare platforms and
real-time data collection systems could provide more accurate predictions in clinical settings.


Conclusion:

• Accuracy tells you the overall correctness of the model.
• Precision gives insight into how many of the predicted positive results are actually correct.
• Recall shows how many actual positives were correctly identified.
• F1-Score balances precision and recall, providing a single score that represents the model's performance.

These metrics together help evaluate the model’s performance thoroughly. For imbalanced datasets, precision and
recall are often more informative than accuracy alone.
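A small hypothetical example shows why accuracy alone can mislead on imbalanced data: if only 5 of 100 patients are positive, a model that always predicts "no risk" is 95% accurate yet finds none of the positive cases. The numbers below are invented for illustration and are not results from this project.

```python
# Hypothetical imbalanced test set: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95
# A trivial "model" that always predicts the majority class (no risk).
y_pred = [0] * 100

correct = sum(t == p for t, p in zip(y_true, y_pred))
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
recall = true_pos / sum(y_true)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")  # accuracy = 0.95, recall = 0.00
```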

CHAPTER 8

CONCLUSION, LIMITATION AND FUTURE SCOPE

8.1 Conclusion

This project successfully demonstrates the potential of using machine learning algorithms to predict heart attack
risk based on various clinical parameters such as age, cholesterol levels, blood pressure, and heart rate.
The system utilizes XGBoost, Random Forest, and Decision Tree models to accurately classify patients
into low and high-risk categories.

The XGBoost model outperforms other models, achieving high prediction accuracy. The application provides a
simple yet effective tool for healthcare professionals to assess a patient’s risk of heart attack based on
historical health data. With its user-friendly interface built using Streamlit, the system allows healthcare
providers to input clinical data and receive real-time predictions, offering a data-driven approach to
cardiovascular disease management.

The system has successfully achieved its primary goal of providing an easy-to-use prediction tool for heart attack
risk assessment based on clinical parameters.

8.2 Limitations

Despite the promising results, the system has some limitations:

1. Data Quality and Availability:


o The accuracy of predictions heavily relies on the quality and completeness of the input data. Missing or
erroneous data can lead to inaccurate predictions.
2. Generalization:
o The model has been trained on a specific dataset, which may not fully represent the diversity of global
populations. As a result, the model may not generalize well to other demographic groups or healthcare
environments.
3. Interpretability of Complex Models:
o While XGBoost provides high accuracy, it remains a relatively complex model that lacks the
interpretability of simpler models like Decision Trees. Efforts to enhance model transparency and
explainability would be useful in clinical decision-making.
4. Data Privacy and Security:
o Handling sensitive health data requires robust security measures. Ensuring data privacy and compliance
with healthcare regulations (e.g., HIPAA) is essential for real-world deployment.
5. Real-Time Data Processing:
o While the system can predict risks based on static input data, it does not yet handle real-time data
processing. Integrating real-time monitoring capabilities would add significant value to the system,
especially for continuous risk assessment.

8.3 Future Scope

Integration of IoT Devices for Real-Time Monitoring

One of the key areas of future development is the integration of IoT devices for real-time monitoring. In the
future, the heart attack prediction system can be enhanced by continuously collecting patient data through
wearable devices, such as heart rate monitors, blood pressure cuffs, and ECG sensors. This would allow
for continuous risk assessment and timely intervention.

Proposed Steps for Future Work:

1. Real-Time Data Collection:


o Wearable IoT devices will continuously collect vital health data from patients. These devices can monitor
health parameters such as heart rate, ECG, blood pressure, and oxygen saturation in real time.
2. Data Streaming to Cloud Platforms:
o The data from wearable devices will be streamed to a cloud-based platform, where it will be processed,
stored, and analyzed. Real-time data streaming will enable the system to track any significant changes in
the patient's health, facilitating immediate intervention if necessary.
3. Continuous Risk Monitoring:
o By integrating IoT devices, the system can continuously assess the patient's risk for heart attacks in real-
time. Predictive models will be recalibrated dynamically based on the latest data, enabling the system to
detect any sudden changes in health metrics and adjust risk predictions accordingly.
4. Timely Alerts and Interventions:
o With continuous monitoring, the system can generate alerts in case of abnormal readings. Healthcare
providers and patients will receive immediate notifications about potential risks, enabling early
intervention to prevent heart attacks or other cardiovascular events.
5. Mobile and Wearable Device Integration:
o The system can be extended to mobile applications or integrated with existing wearable devices, such as
smartwatches or fitness trackers, to provide seamless user experiences. This would empower patients to
monitor their health on-the-go and receive real-time feedback.
6. AI and Cloud Integration:
o Artificial Intelligence (AI) can be leveraged to analyze massive amounts of real-time data, improving
the model's predictions and learning capabilities. The cloud infrastructure will ensure that the system can
handle large-scale data streams from multiple patients simultaneously.
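The continuous-monitoring and alerting steps above could be prototyped as a rolling window over incoming sensor readings. The sketch below is purely illustrative: the class name RollingAlert, the window size and the alert limit are hypothetical, caller-supplied parameters rather than clinically validated values, and a real system would feed the windowed data into the trained risk model instead of comparing a simple average against a limit.

```python
from collections import deque


class RollingAlert:
    """Raise an alert when the rolling mean of a vital sign exceeds a limit.

    `window` and `limit` are hypothetical, caller-supplied parameters.
    """

    def __init__(self, window, limit):
        if window <= 0:
            raise ValueError("window must be positive")
        self.readings = deque(maxlen=window)  # keeps only the latest readings
        self.limit = limit

    def add(self, value):
        """Record a reading; return True if the rolling mean exceeds the limit."""
        self.readings.append(value)
        if len(self.readings) < self.readings.maxlen:
            return False  # not enough data yet for a full window
        return sum(self.readings) / len(self.readings) > self.limit
```

Averaging over a window, rather than alerting on a single reading, makes the alert less sensitive to one noisy sample from a wearable sensor.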

Expected Benefits:

• Enhanced Accuracy: With real-time data, the model will have access to up-to-date information,
improving its ability to predict heart attacks more accurately.
• Proactive Healthcare: Real-time monitoring will allow healthcare providers to proactively intervene
when necessary, reducing the likelihood of heart attacks.
• Accessibility in Remote Areas: IoT-enabled devices can provide essential healthcare to patients in
remote areas, where medical resources are limited.
• Personalized Risk Assessment: Real-time data would enable the system to provide personalized,
dynamic health assessments, accounting for changes in a patient's condition throughout the day.

Other Potential Future Enhancements:

• Multimodal Data Integration: Future versions of the system could integrate more data sources, such as
genetic information, lifestyle data (e.g., diet, physical activity), and even environmental factors, to
provide more personalized predictions.
• Deep Learning Models: Exploring the use of more advanced deep learning models, such as
Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), could improve
prediction accuracy by learning complex patterns from larger, more diverse datasets.
• Scalability for Widespread Use: The system could be scaled for use in hospitals or clinics, where it can
help manage large patient populations by providing continuous risk assessments and early warnings for
heart attack risks.

8.4 Conclusion

In conclusion, the heart attack prediction system developed in this project offers a promising approach to
cardiovascular disease management using machine learning. By utilizing XGBoost for risk prediction,
the system provides healthcare professionals with an accurate, user-friendly tool to assist in heart attack
diagnosis. Looking forward, the integration of IoT devices for real-time monitoring will significantly
enhance the system's capabilities, enabling continuous health tracking and proactive intervention. As the
system evolves, it has the potential to contribute to the broader goal of improving healthcare access and
outcomes worldwide.


REFERENCES

Banu, N.S., & Swamy, S. (2016). Prediction of heart disease at early stage using data mining and big data
analytics: A survey. In 2016 International Conference on Electrical, Electronics, Communication,
Computer and Optimization Techniques (ICEECCOT), IEEE, 256–261.
https://ieeexplore.ieee.org/document/7955226
Zahra, I.F., Wisana, I.D.G.H., Nugraha, P.C., & Hassaballah, H.J. (2021). Design a monitoring device for
heart-attack early detection based on respiration rate and body temperature parameters.
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics, 3(3), 114–120.
https://ijeeemi.poltekkesdepkes-sby.ac.id/index.php/ijeeemi/article/view/120
Mienye, I.D., Sun, Y., & Wang, Z. (2020). Improved sparse autoencoder-based artificial neural network
approach for prediction of heart disease. Informatics in Medicine Unlocked, 18, 100307.
https://www.sciencedirect.com/science/article/pii/S2352914820300447
Rani, P., Kumar, R., Ahmed, N.M.S., & Jain, A. (2021). A decision support system for heart disease
prediction based upon machine learning. Journal of Reliable Intelligent Environments, 7(3),
263–275. https://link.springer.com/article/10.1007/s40860-021-00133-6
Mohan, M., Sharma, A., & Madaan, S. (2019). A hybrid model for heart disease prediction using
ensemble techniques. Journal of Medical Systems, 43(2), 41–49.
https://link.springer.com/article/10.1007/s10916-019-1362-7
Ali, F., El-Sappagh, S., Islam, S.R., Kwak, D., Ali, A., Imran, M., & Kwak, K.-S. (2020). A smart healthcare
monitoring system for heart disease prediction based on ensemble deep learning and feature
fusion. Information Fusion, 63, 208–222.
https://doi.org/10.1016/j.inffus.2020.06.008
Sarmah, S.S. (2020). An efficient IoT-based patient monitoring and heart disease prediction system using
deep learning modified neural network. IEEE Access, 8, 135784–135797.
https://ieeexplore.ieee.org/document/9133567
Sharma, A., & Sharma, M. (2018). Predicting heart disease using machine learning techniques.
International Journal of Computer Applications, 182(29), 12–17.
https://www.ijcaonline.org/archives/volume182/number29/30385-2018015454
Agher, D., Sedki, K., Despres, S., Albinet, J.-P., Jaulent, M.-C., & Tsopra, R. (2022). Encouraging behavior
changes and preventing cardiovascular diseases using the Prevent Connect mobile health app:
Conception and evaluation of app quality. Journal of Medical Internet Research, 24(1), e25384.
https://www.jmir.org/2022/1/e25384
Krist, A.H., Davidson, K.W., Mangione, C.M., Barry, M.J., Cabana, M., Caughey, A.B., Donahue, K., Doubeni,
C.A., Epling, J.W., Kubik, M., et al. (2020). Behavioral counseling interventions to promote a
healthy diet and physical activity for cardiovascular disease prevention in adults with
cardiovascular risk factors: US Preventive Services Task Force recommendation statement.
JAMA, 324(20), 2069–2075.
https://www.scopus.com/home.uri
