
Heart Disease Prediction

REPORT FILE
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR
MINOR PROJECT

Duration
(January to May 2024)

SUBMITTED BY
Aarhaam Jain 2103005 (102/21)
Gaurav Bhagat 2103022 (119/21)
Mohit Kapoor 2103039 (136/21)
Niti Gupta 2103046 (143/21)

Department of Computer Science & Engineering


DAV Institute of Engineering & Technology
Jalandhar, India

CANDIDATE DECLARATION

I hereby certify that the work presented in the thesis entitled "HEART DISEASE PREDICTION SYSTEM USING MACHINE LEARNING" by NITI GUPTA, in partial fulfilment of the requirements for the award of the degree of B.Tech (Computer Science and Engineering), submitted to the Department of Computer Science and Engineering at DAV INSTITUTE OF ENGINEERING AND TECHNOLOGY under I.K. GUJRAL PUNJAB TECHNICAL UNIVERSITY, JALANDHAR, is an authentic record of work carried out under the supervision of DR. VINAY CHOPRA, Assistant Professor, Department of Computer Science and Engineering, DAV INSTITUTE OF ENGINEERING AND TECHNOLOGY. The matter presented in this thesis has not been submitted by me to any other University or Institute for the award of a B.Tech degree.

Signature of Student

This is to certify that the above statement made by the candidates is correct to the best of my/our knowledge.

Signature of SUPERVISOR (S)

The B.Tech Viva-Voce examination of NITI GUPTA, MOHIT KAPOOR, AARHAAM JAIN, GAURAV BHAGAT has been held on _________________ and accepted.

Signature of Supervisor(s) Signature of External Examiner

Signature of H.O.D
ABSTRACT

Globally, heart disease is the leading cause of death. Numerous cutting-edge technologies are employed to treat cardiac disease, yet it remains the most prevalent problem in hospitals: many medical professionals lack the knowledge and experience necessary to manage such patients and, as a result, make poor decisions that occasionally result in death. These issues are being addressed by applying machine learning algorithms and data mining approaches to forecast cardiac disease, which makes automatic diagnosis easier for hospitals to undertake. This report focuses on identifying which patients are most likely to have heart disease based on various medical attributes. We prepared a heart disease prediction system that uses a patient's medical history to predict whether the patient is likely to be diagnosed with heart disease.

Analysis of a patient's various health factors can be used to forecast the development of heart disease. A variety of algorithms, including naïve Bayes, k-Nearest Neighbors (KNN), decision trees, and artificial neural networks (ANNs), can be used to predict cardiac disease. To forecast cardiac disease, we have employed a variety of criteria, including blood pressure (bp), the fasting blood sugar test (fbs), age, gender, and chest pain type (cp). Our study uses an integrated dataset. On this same dataset, we applied five distinct approaches to predict heart disease: naïve Bayes, k-Nearest Neighbors (KNN), decision trees, artificial neural networks (ANN), and random forest. This study investigates which method, given these health characteristics, predicts heart disease most accurately. In our experiment, naïve Bayes achieved the highest accuracy, at 85%.

Thus, by using the proposed model to estimate how accurately and precisely the classifier can diagnose cardiac illness, a sizable amount of pressure on clinicians is relieved. The proposed heart disease prediction system lowers costs and improves medical care. The experiment yields important information that will aid in identifying patients at risk of heart disease. The implementation is provided in the .ipynb (Jupyter Notebook) file format.

ACKNOWLEDGEMENT

I am highly grateful to Dr. Sanjeev Naval (O), Principal, DAV Institute of Engineering & Technology, Jalandhar, for providing the opportunity to carry out this project.

The constant guidance and encouragement received from Dr. Harpreet Kaur Bajaj, HoD, Department of Computer Science & Engineering, DAVIET Jalandhar, has been of great help in carrying out the project work and is acknowledged with reverential thanks.

I would like to express my gratitude to Dr. Vinay Chopra, Assistant Professor, Department of Computer Science & Engineering, DAVIET Jalandhar, for his stimulating guidance, continuous encouragement, and supervision throughout the course of the present work. Without his wise counsel and able guidance, it would have been impossible to complete the report in this manner.

I also express my gratitude to the other faculty members of the Computer Science & Engineering department of DAVIET for their intellectual support throughout the course of this work.

Niti Gupta (2103046)

Mohit Kapoor (2103039)

Aarhaam Jain (2103005)

Gaurav Bhagat (2103022)

LIST OF FIGURES

Fig No. Figure Description

2.1 Use Case Diagram
2.2 Data Model
2.3 Spiral Model of SDLC
3.1 Data Flow Diagram of the Model
3.2 ER Model
3.3 Learning Process
5.1 Importing Libraries
5.2 Reading the Dataset
5.3 Analysing the Dataset
5.4 Analysing the Values
5.5 Checking the Null Values
5.6 Calculating the Null Values
5.7 Code for Chart Representation
5.8 Bar Chart Representing Levels of Features
5.9 Code for Scatter Chart Representation
5.10 Scatter Chart for Cholesterol vs Age
5.11 Splitting Target and Feature Columns
5.12 Logistic Regression
5.13 K-Nearest Neighbor
5.14 Gaussian NB
5.15 Predictive Model
5.16 Database Table

LIST OF TABLES

Table No. Table Description

1 Comparing the Accuracy of Preprocessing Techniques
2 Dataset Characteristics
LIST OF CONTENTS

Certificate
Abstract
Acknowledgement
List of Figures
List of Tables
Table of Contents
Chapter 1: Introduction
1.1 Introduction to Project
1.2 Project Category
1.2.1 Research-Based Category
1.2.2 Application Development Category
1.3 Objective
1.4 Problem Formulation
1.5 Identification of Need
1.6 Existing System
1.7 Proposed System
1.8 Unique Features of the System
Chapter 2: Requirement Analysis and System Specification
2.1 Feasibility Study
2.1.1 Economic Feasibility
2.1.2 Operational Feasibility
2.1.3 Technical Feasibility
2.2 Software Requirement Specification
2.3 Validation
2.4 Expected Hurdles
2.5 SDLC Model to Be Used
Chapter 3: System Design
3.1 Design Approach
3.2 Detail Design
3.3 System Design Using Various Tools
3.4 User Interface Design
3.5 Database Design
3.5.1 ER Diagram
3.5.2 Normalization
3.5.3 Database Connection Control
3.6 Methodology of the System
Chapter 4: Implementation, Testing, Maintenance
4.1 Introduction to Languages, IDEs, Tools Used
4.1.1 Programming Languages Used
4.1.2 Integrated Development Environment
4.1.3 Tools and Technologies
4.2 Coding Standards of Language Used
4.3 Techniques and Testing Plans
4.3.1 Technique Plans
4.3.2 Testing Plans
Chapter 5: Results and Discussions
5.1 User Interface Representation
5.1.1 Brief Description of Various Modules
5.2 Snapshots of the System
5.3 Backend of the System
5.3.1 Snapshot of Database Table
Chapter 6: Conclusion and Future Scope
References
CHAPTER 1

INTRODUCTION

1.1 Introduction To Project

Worldwide, heart disease is the leading cause of death. Every year, heart disease claims the lives of an estimated 12 million individuals, making cardiovascular diseases the leading cause of death in the world. In the US, heart disease claims one life every 34 seconds.

Heart attacks and strokes, which occur when the blood supply to the heart or brain is blocked, are frequently catastrophic events. Individuals at risk of heart disease may have elevated levels of stress, blood pressure, blood sugar (hyperglycemia), and lipids. Each of these factors can be measured at home with basic health equipment.

Heart disease is classified into three broad categories: cardiovascular disease, cardiomyopathy, and coronary heart disease. A wide range of illnesses that affect the heart, the blood vessels, and the way blood moves through the body are collectively referred to as "heart diseases." Cardiovascular disease (CVD) is a major cause of death, disability, and other illness. In medicine, diagnosing an illness is a crucial and intricate task.

Medical diagnosis is an important but challenging task that must be completed accurately and swiftly, so automating it is genuinely useful. Regretfully, not all medical professionals are subject-matter experts, and in many areas resources are scarce. Data mining can uncover information and hidden patterns that support wise decision-making. This is crucial for healthcare practitioners who must make sound judgments and offer the general public high-quality services, and it matters especially where healthcare organizations rely on professionals who lack advanced training and expertise. One of the primary drawbacks of current methodologies is the inability to reach precise conclusions when they are needed.

In our method, we forecast heart disease from health characteristics by employing several data mining approaches and machine learning algorithms: naïve Bayes, k-Nearest Neighbors (KNN), decision trees, artificial neural networks (ANN), and random forest.
1.2 Project Category

1.2.1 Research-based Project Category:

Heart disease remains one of the leading causes of mortality worldwide. Predictive modeling
offers a promising avenue for early detection and intervention, potentially reducing the burden
of heart-related ailments. This research-based project aims to delve into the realm of predictive
analytics to develop robust models for heart disease prediction.

Methodology:

The methodology encompasses several key steps:

a) Literature Review: A comprehensive review of existing research papers, datasets, and methodologies related to heart disease prediction.
b) Data Collection and Preprocessing: Acquisition of relevant datasets and preprocessing steps such as data cleaning, normalization, and feature engineering.
c) Model Development: Experimentation with various machine learning algorithms including logistic regression, decision trees, random forests, support vector machines, and neural networks. Additionally, exploring ensemble methods and deep learning architectures.
d) Evaluation: Performance evaluation of developed models using metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC); a brief sketch of this step follows the list.
e) Interpretation and Insights: Interpretation of model predictions, identification of significant features, and deriving insights to inform clinical practice.

Results and Discussion:

The results section will present the performance of developed models and compare them against
existing benchmarks. The discussion will delve into the implications of findings, potential
limitations of the study, and avenues for future research.

Conclusion:

The research-based project aims to contribute to the growing body of knowledge in predictive
analytics for heart disease. By exploring diverse methodologies and datasets, this endeavor seeks
to enhance our understanding of factors influencing heart disease and pave the way for more
accurate predictive models.

1.2.2 Application Development Project Category:

The application development project focuses on translating research findings into a practical,
user-friendly tool for heart disease prediction. Leveraging the insights gained from research
endeavors, this project aims to develop an application that can assist healthcare professionals and
individuals in assessing their risk of heart disease.

Methodology:

a) User Requirements Analysis: Understanding the needs and preferences of potential users
through surveys, interviews, and stakeholder consultations.
b) UI/UX Design: Designing a user-friendly interface that facilitates easy input of relevant data
and provides clear output of prediction results.
c) Backend Development: Implementing backend systems for data processing, model inference,
and result visualization. Utilizing appropriate technologies and frameworks for scalability
and reliability.
d) Integration: Integrating the application with relevant healthcare systems or platforms,
ensuring seamless data exchange and interoperability, if required.
e) Testing and Feedback: Conducting usability testing with target users to identify pain points
and gather feedback for iterative improvements.

Results and Discussion:

The results section will showcase the developed application, highlighting its features,
functionality, and usability. Discussion will revolve around user feedback, challenges
encountered during development, and future enhancements.

Conclusion:

The application development project aims to bridge the gap between research and practice by
providing a tangible tool for heart disease prediction. Through effective design and
implementation, this endeavor seeks to empower healthcare professionals and individuals in
managing and mitigating the risks associated with heart disease.

1.3 Objective

1. Development of an Effective Prediction System:

The primary objective of this project report is to detail the development of a robust heart disease
prediction system model. This system aims to accurately assess the risk of heart disease in
individuals based on various input parameters such as medical history, lifestyle factors, and
diagnostic test results.

2. Evaluation of Predictive Performance:

Another key objective is to evaluate the predictive performance of the developed model
comprehensively. This involves assessing metrics such as accuracy, sensitivity, specificity,
precision, recall, and area under the ROC curve (AUC). Through rigorous evaluation, the
effectiveness and reliability of the prediction system can be determined.

3. Feature Importance Analysis:

An important aspect of the project is to analyze the importance of different features in predicting
heart disease. This involves identifying which variables contribute most significantly to the
predictive accuracy of the model. Understanding feature importance can provide valuable
insights for healthcare professionals in identifying risk factors and designing preventive
strategies.

4. Comparison with Existing Methods:

The project aims to compare the performance of the developed prediction system with existing
methods or models. This comparative analysis helps in understanding the strengths and
limitations of the proposed approach and provides benchmarking against established standards
in the field.

5. Clinical Interpretability and Usability:

Additionally, the project seeks to ensure that the developed prediction system is clinically
interpretable and user-friendly. This involves interpreting model predictions in the context of
clinical practice and designing an intuitive user interface for healthcare professionals to easily
input patient data and interpret results.

6. Ethical Considerations and Data Privacy:

Ethical considerations and data privacy are paramount in healthcare-related projects. Thus, the
project report aims to address ethical considerations such as patient consent, data anonymization,
and adherence to regulatory guidelines (e.g., HIPAA). Ensuring the ethical and responsible use
of patient data is essential for the credibility and trustworthiness of the prediction system.

7. Potential for Real-world Implementation:

Lastly, the project report discusses the potential for real-world implementation of the heart
disease prediction system. This involves considering factors such as scalability, integration with
existing healthcare systems, deployment in clinical settings, and stakeholder engagement.
Assessing the feasibility and practicality of implementing the prediction system in real-world
healthcare settings is crucial for its adoption and impact.

1.4 Problem Formulation

Medical research has identified risk factors that may contribute to the development of heart disease; however, additional study is required to apply this information to lower the incidence of heart disease. Heart disease is primarily associated with high blood cholesterol, diabetes, and hypertension. Numerous investigations and studies have been conducted on lowering the risk of heart disease. Population study data, including smoking status, cholesterol and blood pressure readings, and diabetes, have been used to predict the risk of heart disease. Researchers have adapted these prediction algorithms into simplified score sheets that patients can use to determine their own risk of heart disease. The goal of this work is to create an intelligent system for the classification of heart disease based on risk-factor categories, using classification algorithms.

1.5 Identification of Need

Heart disease remains a leading cause of mortality globally, underscoring the critical importance
of early detection and preventive measures. However, traditional methods for assessing heart
disease risk often rely on subjective assessments or are limited in their scope. There exists a
pressing need for a more accurate, reliable, and scalable approach to predict the likelihood of
heart disease in individuals. A heart disease prediction system addresses this need by leveraging
advanced machine learning algorithms and comprehensive datasets to analyze various risk
factors and generate personalized risk assessments. Such a system not only empowers individuals
to proactively manage their health but also equips healthcare professionals with valuable insights
for targeted interventions.

1. Growing Burden of Heart Disease:

Heart disease remains a leading cause of mortality globally, imposing a significant burden on
healthcare systems and societies. There is a pressing need to develop proactive strategies for early
detection and intervention to mitigate the impact of heart disease on public health.

2. Challenges in Early Detection:

Early detection of heart disease is often challenging due to the asymptomatic nature of the
condition in its early stages. Traditional risk assessment methods based on demographic and
clinical factors may not capture subtle changes indicative of developing heart disease.

3. Potential for Preventive Interventions:

Early identification of individuals at high risk of heart disease presents an opportunity for targeted
preventive interventions. By identifying risk factors and implementing lifestyle modifications or
medical treatments, the progression of heart disease can be delayed or prevented, leading to
improved health outcomes.

4. Variability in Risk Factors:

Risk factors for heart disease can vary widely among individuals based on factors such as age,
gender, genetic predisposition, lifestyle choices, and comorbidities. A personalized approach to
risk assessment is needed to account for this variability and provide tailored recommendations
for disease prevention.

5. Advancements in Data Analytics and Machine Learning:

Advances in data analytics and machine learning offer new opportunities for predictive modeling
in healthcare. Leveraging large-scale datasets and sophisticated algorithms, it is possible to
develop accurate prediction models that integrate diverse sources of information to assess an
individual's risk of heart disease.

6. Need for Clinical Decision Support Tools:

Healthcare providers require effective clinical decision support tools to aid in risk assessment
and treatment planning for patients at risk of heart disease. A reliable prediction system can
complement clinical judgment by providing quantitative risk estimates based on objective data
analysis.

7. Empowerment of Patients:

Empowering individuals to take proactive steps towards heart health is essential for disease
prevention and management. A heart disease prediction system can serve as an educational tool,
enabling individuals to understand their risk factors and make informed decisions about lifestyle
changes and healthcare utilization.

8. Public Health Impact:

Implementing a heart disease prediction system at a population level has the potential to have a
significant public health impact by reducing the incidence of heart disease, lowering healthcare
costs, and improving overall quality of life.

1.6 Existing System

The current system components produce detailed reports using a robust prediction algorithm. The patient and the doctor provide the input details. Heart illness is then analyzed using artificial intelligence algorithms based on these inputs, and the resulting output is improved compared with the results of current models in the same domain. The primary goal of the existing approach is to assess the likelihood that a given patient will develop heart disease in the future, by comparing new patients against previous patients with known disease outcomes. Implementing this model helps achieve the objective of a system that can predict a new patient's chance of having a heart attack with a higher rate of accuracy. Using K-Neighbors, Support Vector, Decision Tree, and Random Forest classifiers, patterns are found in heart disease patient data gathered from the UCI repository, and the accuracy and performance of these AI algorithms are compared. The Heart Disease Prediction System model is designed to use various AI algorithms and methodologies; however, the accuracy achieved with all of the current technologies is relatively low.

1.7 Proposed System

The difficulty with risk factors related to heart disease is that many are involved, such as age, cigarette use, blood cholesterol, physical fitness, blood pressure, and stress, and understanding and ranking each one according to its importance is a difficult task. Moreover, heart disease is often detected only when a patient has reached an advanced stage of the disease. Hence, the risk factors were analyzed from various sources. The dataset was composed of 12 important risk factors, including sex, age, family history, blood pressure, smoking habit, alcohol consumption, physical inactivity, diabetes, blood cholesterol, poor diet, and obesity. The system indicates whether or not the patient is at risk of heart disease.

1.8 Unique Features Of The System

The heart disease prediction system project incorporates several unique features that set it apart
from existing approaches in cardiovascular risk assessment. Firstly, it integrates a wide array of
data sources, including demographic information, medical history, lifestyle factors, and
diagnostic test results, to provide a comprehensive assessment of an individual's risk profile. This
multi-modal approach ensures that the prediction model captures the complex interplay of factors
contributing to heart disease, resulting in more accurate risk estimates.

Moreover, the project emphasizes the development of interpretable models, enabling healthcare
professionals to understand the underlying factors driving the predictions. By providing insights
into the relative importance of different risk factors, the system empowers clinicians to make
informed decisions regarding patient care and intervention strategies. Additionally, the project
prioritizes scalability and usability, with the aim of deploying the prediction system in diverse
healthcare settings and facilitating seamless integration into clinical workflows.

Furthermore, the project places a strong emphasis on ethical considerations and data privacy,
ensuring that patient confidentiality and consent are upheld throughout the data collection and
analysis process. By adhering to rigorous ethical standards, the project seeks to build trust among
stakeholders and promote responsible use of predictive analytics in healthcare.

Overall, the heart disease prediction system project stands out for its holistic approach to risk
assessment, emphasis on interpretability and usability, and commitment to ethical principles. By
leveraging these unique features, the project aims to advance the field of cardiovascular risk
prediction and ultimately improve outcomes for individuals at risk of heart disease.

CHAPTER 2

REQUIREMENT ANALYSIS AND SYSTEM SPECIFICATION

2.1 Feasibility Study

A feasibility study is a preliminary study undertaken before the real work of a project starts to ascertain the likelihood of the project's success. It is an analysis of possible alternative solutions to a problem and a recommendation on the best alternative.

Feasibility Study for Heart Disease Prediction System:

2.1.1 Economic Feasibility:

a) Cost-Benefit Analysis: Conducting a cost-benefit analysis to evaluate the financial viability of developing and implementing the heart disease prediction system. This includes estimating the initial investment required for system development, data acquisition, infrastructure setup, and ongoing maintenance costs. Additionally, assessing the potential economic benefits, such as reduced healthcare expenditures due to early detection and prevention of heart disease, improved patient outcomes, and increased efficiency in healthcare delivery.
b) Return on Investment (ROI): Calculating the expected ROI of the prediction system over a specified time horizon. This involves comparing the projected benefits, such as healthcare cost savings and productivity gains, against the total investment to determine whether the project is economically feasible (a reference formula follows this list).
c) Funding Sources: Exploring various funding sources, including government grants, private investments, and healthcare organizations' budgets, to support the development and implementation of the prediction system.
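For reference, a standard ROI formulation for step (b), stated here as a general formula rather than a figure from this project, is:

ROI (%) = ((projected benefits - total investment) / total investment) × 100

where the projected benefits aggregate the healthcare cost savings and productivity gains over the chosen time horizon.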

2.1.2 Operational Feasibility:

a) User Acceptance: Assessing the willingness of healthcare providers to adopt and integrate
the heart disease prediction system into their clinical workflows. Conducting surveys,
interviews, and usability testing to understand user preferences, concerns, and requirements
regarding system functionality and usability.
b) Training Needs: Identifying the training needs of healthcare professionals to effectively
utilize the prediction system. Developing training programs and educational materials to
ensure that users are proficient in using the system and interpreting its results.
c) Workflow Integration: Evaluating the compatibility of the prediction system with existing
healthcare IT infrastructure and clinical workflows. Ensuring seamless integration with
electronic health records (EHR) systems, decision support tools, and other relevant healthcare
applications to facilitate efficient data exchange and workflow coordination.

2.1.3 Technical Feasibility:

a) Data Availability and Quality: Assessing the availability and quality of data sources
required for training and testing the prediction model. This includes electronic health records,
medical imaging data, laboratory test results, and patient-reported information. Addressing
data quality issues such as missing values, inaccuracies, and inconsistencies through data
preprocessing and cleansing techniques.
b) Model Development: Evaluating the technical feasibility of developing accurate and reliable
prediction models for heart disease using machine learning and data mining techniques.
Experimenting with various algorithms, feature selection methods, and model architectures
to optimize predictive performance while minimizing computational resources and
processing time.
c) Scalability and Performance: Considering the scalability and performance requirements of
the prediction system to accommodate large volumes of data and user requests. Designing
scalable and efficient algorithms, deploying the system on robust and scalable infrastructure,
and implementing performance monitoring and optimization mechanisms to ensure optimal
system performance under varying workloads.

2.2 Software Requirement Specification Document

1. Introduction

a) Purpose: The purpose of this document is to provide a detailed description of the software
requirements for the development of a heart disease prediction system.
b) Scope: The heart disease prediction system aims to analyze patient data and predict the risk
of heart disease using machine learning algorithms.
c) Audience: This document is intended for software developers, project managers,
stakeholders, and healthcare professionals involved in the development and deployment of
the system.

2. Functional Requirements

a) User Authentication: Users should be able to register for an account or log in with existing
credentials to access the system.
b) Data Input: Users should be able to input relevant patient data, including demographic
information, medical history, and diagnostic test results.
c) Data Preprocessing: The system should preprocess input data to handle missing values, normalize features, and encode categorical variables.
d) Predictive Modeling: The system should build machine learning models to predict the risk of heart disease based on the preprocessed data (a preprocessing-and-modeling sketch follows this list).
e) Result Interpretation: Prediction results should be presented to users along with
visualizations and insights into risk factors contributing to heart disease.
f) User Feedback: Users should be able to provide feedback and ratings on their experience
with the system for continuous improvement.
g) Security: The system should ensure data privacy, confidentiality, and compliance with
healthcare regulations such as HIPAA.
h) Scalability: The system should be scalable to handle large volumes of data and user requests
efficiently.
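As an illustrative sketch of requirements (c) and (d), not the system's actual code, the preprocessing and modeling steps can be combined in a scikit-learn pipeline; the column names below are hypothetical:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# Hypothetical feature groups
numeric_features = ['age', 'trestbps', 'chol', 'thalach']
categorical_features = ['sex', 'cp', 'fbs']

preprocess = ColumnTransformer([
    # Fill missing numeric values with the median, then standardize
    ('num', Pipeline([('impute', SimpleImputer(strategy='median')),
                      ('scale', StandardScaler())]), numeric_features),
    # Encode categorical variables as one-hot vectors
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features),
])

model = Pipeline([('preprocess', preprocess),
                  ('clf', LogisticRegression(max_iter=1000))])
# model.fit(X_train, y_train) would then train the full pipeline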

3. Non-Functional Requirements

a) Performance: The system should be able to process data and generate predictions within a
reasonable timeframe to provide timely results to users.
b) Reliability: The system should be reliable, with minimal downtime and robust error handling
mechanisms to ensure uninterrupted service.
c) Usability: The system should have an intuitive user interface and provide clear instructions
for data input and result interpretation to facilitate ease of use.
d) Compatibility: The system should be compatible with different web browsers and operating
systems to ensure accessibility across various devices.
e) Maintainability: The system should be modular and well-documented, allowing for easy
maintenance, updates, and modifications by developers.

f) Ethical Considerations: The system should uphold ethical principles such as fairness,
transparency, and accountability in predictive modeling and result interpretation.

4. System Architecture

a) The heart disease prediction system follows a client-server architecture, with a web-based
user interface for data input and result presentation, and a backend server for data processing,
predictive modeling, and database interaction.
b) The system architecture includes three tiers: presentation layer (UI), application layer
(backend server), and data layer (database).

5. Constraints

a) The system must comply with regulatory requirements such as HIPAA for handling patient
data and ensuring privacy and security.
b) The system must be developed using scalable and cost-effective technologies to
accommodate potential increases in data volume and user traffic.

6. Glossary

HIPAA: Health Insurance Portability and Accountability Act, a US legislation that protects the
privacy and security of patients' health information.

7. Appendix

Use Case Diagram:

Fig 2.1: Use Case Diagram


Data Model:

Fig 2.2 : Data Model

2.3 Validation

Validation of a heart disease prediction system is essential to ensure its accuracy, reliability, and
clinical utility in real-world healthcare settings. The validation process involves assessing the
performance of the predictive models, evaluating their generalization ability, and verifying their
clinical relevance.

One aspect of validation involves assessing the predictive performance of the models using
appropriate metrics such as accuracy, precision, recall, and area under the receiver operating
characteristic curve (AUC-ROC). This involves dividing the dataset into training and testing sets,
training the models on the training set, and evaluating their performance on the independent
testing set to assess their ability to generalize to unseen data.
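A minimal sketch of this split-and-evaluate procedure, assuming an already prepared feature matrix X and label vector y (placeholders here), might look as follows:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hold out 20% of the data; stratify preserves the class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))
print('AUC-ROC  :', roc_auc_score(y_test, y_prob))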

Furthermore, external validation on independent datasets from different populations or healthcare
settings is crucial to assess the robustness and generalizability of the models across diverse
patient populations. This helps identify potential biases or overfitting issues and ensures that the
models perform consistently across different demographics and clinical scenarios.

In addition to statistical validation, clinical validation is essential to assess the clinical relevance
and utility of the prediction system. This involves collaborating with healthcare professionals to
evaluate the predictive models in real clinical settings, assessing their impact on clinical decision-
making, patient outcomes, and healthcare resource utilization.

Furthermore, validation should consider ethical and regulatory considerations, including patient
privacy, data security, and compliance with healthcare regulations such as the Health Insurance
Portability and Accountability Act (HIPAA) in the United States. Ensuring transparency,
accountability, and fairness in model development and deployment is paramount to build trust
and acceptance among healthcare providers and patients.

Overall, a comprehensive validation process that combines statistical analysis, clinical
evaluation, and ethical considerations is essential to ensure the accuracy, reliability, and clinical
utility of heart disease prediction systems, ultimately improving patient care and health
outcomes.

2.4 Expected Hurdles

Developing a heart disease prediction system comes with various challenges and hurdles, ranging
from technical complexities to ethical considerations. Here are some expected hurdles in the
development and deployment of such a system:

1. Data Quality and Availability:

Obtaining high-quality and comprehensive datasets for training predictive models can be
challenging due to data privacy regulations, data fragmentation across healthcare systems, and
biases in collected data.

2. Feature Selection and Engineering:

Identifying relevant features and engineering informative predictors from raw data poses a
challenge, especially in complex medical datasets where relationships between variables may be
nonlinear or multifactorial.

3. Model Generalization and Interpretability:

Ensuring that predictive models generalize well to unseen data and are interpretable by healthcare
professionals is crucial for real-world adoption. Complex models may sacrifice interpretability
for predictive performance, making it challenging to understand the underlying decision-making
process.

4. Handling Imbalanced Data:

Imbalanced datasets, where one class (e.g., presence of heart disease) is significantly more
prevalent than others, can lead to biased models. Strategies such as oversampling,
undersampling, or using advanced techniques like synthetic data generation may be required to
address class imbalance.
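For example, a simple mitigation, sketched here under the assumption of a pandas DataFrame df with a binary target column (both placeholders), is to oversample the minority class before training; class weighting is an alternative:

import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes ('df' and 'target' are assumed names)
majority = df[df['target'] == 0]
minority = df[df['target'] == 1]

# Randomly duplicate minority samples until the classes are balanced
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
df_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)

# Many scikit-learn classifiers also accept class_weight='balanced' instead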

5. Ethical and Legal Concerns:

Ensuring fairness, transparency, and accountability in predictive modeling is paramount to avoid
unintended consequences such as biased predictions, discrimination, or violation of patient
privacy rights. Compliance with healthcare regulations (e.g., HIPAA) and ethical guidelines is
essential.

6. Model Validation and Evaluation:

Properly validating predictive models and evaluating their performance using appropriate metrics
is challenging due to the complexity of healthcare data and the need for rigorous statistical
methods. Cross-validation, bootstrapping, and external validation on independent datasets are
essential steps in model evaluation.

7. User Adoption and Trust:

Gaining acceptance and trust from healthcare professionals and patients in the reliability and
accuracy of the prediction system is crucial. Effective communication of model limitations,
uncertainties, and risk factors is necessary to build confidence and encourage adoption.

8. Integration with Healthcare Systems:

Integrating the prediction system with existing electronic health record (EHR) systems, clinical
workflows, and decision support tools can be complex due to interoperability issues, data
standards, and compatibility requirements.

9. Scalability and Deployment:

Scaling the prediction system to handle large volumes of data and user requests while maintaining
performance and reliability requires robust infrastructure, cloud deployment, and efficient
resource management.

10. Continuous Improvement and Updates:

Ensuring the prediction system remains up-to-date with the latest medical knowledge, research
findings, and technological advancements requires continuous monitoring, feedback
mechanisms, and iterative improvement cycles.

2.5 SDLC Model To Be Used

The methodology of software development is the method of managing project development. Many methodology models are available, such as the Waterfall model, the Incremental model, the RAD model, the Agile model, the Iterative model, and the Spiral model. The developer still needs to decide which will be used in the project. The methodology model helps manage the project efficiently and keeps the developer from running into problems during development. It also helps achieve the objectives and scope of the project. To build the project, the stakeholder requirements need to be understood.

The methodology provides a framework for undertaking the proposed data mining (DM) modeling. It is a system of steps that transform raw data into recognized data patterns, extracting knowledge for users.

Fig 2.3: Spiral Model for Software Development

Four phases are involved in the spiral model:

1) Planning phase: The phase where requirements are collected and risk is assessed. In this phase, the title of the project was discussed with the project supervisor, and from that discussion the Heart Disease Prediction System was proposed. The requirements and risks were assessed after studying the existing system and reviewing the literature on other existing research.

2) Risk analysis phase: The phase where risks and alternative solutions are identified. A prototype is created at the end of this phase. If any risk is found during this phase, alternate solutions are suggested.

3) Engineering phase: In this phase, the software is built, and testing is done at the end of the phase.

4) Evaluation phase: In this phase, the user evaluates the software. This is done after the system is presented, and the user tests whether the system meets their expectations and requirements. If there is any error, the user can report the problem with the system.

CHAPTER 3

SYSTEM DESIGN

3.1 Design Approach

We use an experimental research design, which is a quantitative research method. It is research conducted with a scientific approach, where one set of variables is kept constant while another set is measured as the subject of the experiment. This is practical for heart disease detection because it monitors the behaviours and patterns of a subject, which are used to check whether the subject matches all the details presented, cross-checked against previous data. It is an effective research method because it is time-bound and focuses on the relationships between the variables that give actual results.

3.2 Detail Design

Designing a heart disease prediction system involves defining the architecture, components, and
interactions of the system to achieve accurate risk assessment, result interpretation, and user
interaction. Here's a detailed design outline for the heart disease prediction system:

1. System Architecture:

a) Client-Server Architecture: The system follows a client-server model, where the client
interacts with a web-based user interface, and the server handles data processing, predictive
modelling, and result generation.
b) Three-Tier Architecture: The system architecture consists of three tiers: presentation layer
(UI), application layer (backend server), and data layer (database), each responsible for
specific functionalities.
2. Components:

a. Presentation Layer (UI):

1. Web Interface: A user-friendly web interface allows users to input data, view prediction
results, and access additional features.
2. Responsive Design: The UI is designed to be responsive, accessible, and compatible with
different devices and screen sizes.
3. Interactive Elements: The UI includes interactive elements such as input forms, buttons,
charts, and tooltips for enhanced user experience.

b. Application Layer (Backend Server):

1. Data Preprocessing Module: Cleans, transforms, and preprocesses input data to ensure
compatibility with predictive models.
2. Predictive Modeling Module: Utilizes machine learning algorithms to build predictive
models based on preprocessed data and generate risk assessment scores.
3. Result Interpretation Module: Interprets prediction results, generates visualizations,
and provides insights into risk factors contributing to heart disease.
4. User Management Module: Handles user authentication, registration, profile
management, and access control.

c. Data Layer (Database):

1. Users Table: Stores user account information, including usernames, passwords, and
registration details.
2. Patient Data Table: Stores demographic information, medical history, and input data for
heart disease prediction.
3. Predictions Table: Records prediction results, including risk scores, confidence
intervals, and timestamps.
4. Feedback Table: Logs user feedback, ratings, and comments for system evaluation and improvement. (A sketch of how these tables might be declared follows this list.)
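The following minimal sketch shows how such tables might be declared with SQLAlchemy's ORM; the table and column names here are illustrative assumptions, not the system's actual schema:

from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    username = Column(String, unique=True, nullable=False)
    password_hash = Column(String, nullable=False)  # store a hash, never the raw password

class Prediction(Base):
    __tablename__ = 'predictions'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    risk_score = Column(Float, nullable=False)                 # model output for the patient
    created_at = Column(DateTime, server_default=func.now())  # timestamp of the prediction

The patient-data and feedback tables would be declared analogously.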

3. Interaction Flow:

a) User Registration/Login: Users register for accounts or log in with existing credentials to
access the system.
b) Input Data Submission: Users input relevant data such as age, gender, medical history, and
diagnostic test results through the UI.
c) Data Processing: The backend server preprocesses input data, trains predictive models, and
generates risk assessment scores.
d) Result Presentation: Prediction results are displayed to users via the UI, along with
visualizations and insights for interpretation.
e) Feedback Collection: Users provide feedback and ratings on their experience with the
system, which are logged for evaluation and refinement.

4. Security and Privacy Considerations:

a) Data Encryption: Sensitive data such as passwords and medical records are encrypted
during transmission and storage to ensure confidentiality.
b) Access Control: Role-based access control mechanisms restrict access to sensitive features
and data based on user roles and permissions.
c) Compliance: The system complies with regulatory standards such as HIPAA to protect
patient privacy and ensure data security.

5. Scalability and Performance:

a) Scalable Architecture: The system architecture is designed to scale horizontally to accommodate a growing user base and data volume.
b) Performance Optimization: Optimization techniques such as caching, indexing, and
parallel processing are implemented to enhance system performance and responsiveness.

6. Deployment and Maintenance:

a) Cloud Deployment: The system is deployed on cloud infrastructure (e.g., AWS, Azure) for
scalability, reliability, and accessibility.
b) Continuous Monitoring: Monitoring tools are employed to track system health,
performance metrics, and user interactions for proactive maintenance and troubleshooting.
c) Version Control: Version control systems (e.g., Git) are used to manage codebase changes,
collaborate on development, and ensure code quality.

DATA FLOW DIAGRAM:

Fig 3.1: Data Flow Diagram of the Model

3.3 System Design Using Various Tools

3.3.1 Application Development Tools

For application development, the software requirements are as follows:

Operating System: Windows 10 or any Debian-based Linux distribution

Language: Python

Shiny tools: RStudio IDE, Microsoft Excel

Technologies used: R, Unix, Shiny

3.3.2 Hardware Requirements

Processor: Intel

RAM: 1024 MB

Disk space: minimum 100 MB

For running the application:

Device: Any device that can access the internet

Minimum space to execute: 20 MB

The effectiveness of the proposal is evaluated by conducting experiments on a cluster of 3 identically configured nodes, each with an Intel Core i7-4770 processor (3.40 GHz, 4 cores) and 8 GB RAM, running Ubuntu 18.04 LTS with a 64-bit Linux 4.31.0 kernel.

3.3.3 Software Requirements

Operating System: Any OS with a client to access the internet

Network: Wi-Fi internet or a cellular network

Visio: Create and design the data flow and context diagrams

GitHub: Version control

Google Chrome: Medium for finding references, performing system testing, and displaying and running the Shiny app

3.4 User Interface Design

After a model is developed and gives good results on the holdout or real-world dataset, we deploy it and monitor its performance. This is the main part, where the learning from the data is applied to real-world applications and use cases.

STEP 1: Choose the desired option from the drop-down menu.

STEP 2: After filling in all the particulars, click Predict to find out whether the user has heart disease or not.

STEP 3: The model predicts whether the user has heart disease or not. After filling in the particulars, we found that this user has a chance of heart disease.

3.5 Database Design

3.5.1 ER Diagram

Fig 3.2: ER Model

3.5.2 Normalization

Normalization is a process of transforming or scaling each value in the instances so that every feature contributes equally. It handles major problems in the data, such as the bias of features whose larger numerical ranges dominate in differentiating the corresponding class, and outliers, both of which impede the training process of classifier construction [29]. Consider original data with N samples and F features, represented as D = {X_i,n, Y_n : i ∈ F and n ∈ N}, where X represents the data to be trained by the classification algorithm and Y is the target class. In this work, three different normalization strategies are considered in order to analyze their effect on the overall performance of the classification algorithm. The data normalization techniques used in this work are described as follows:

MIN-MAX NORMALIZATION (MMN)

Min-Max normalization applies a linear transformation to the raw original dataset, rescaling the input data into a new fixed range of -1 to 1 or 0 to 1. This method preserves the relationships between the original input values and the scaled values [3, 27]. The normalized feature is given as follows:

X'_i,n = ((X_i,n - min X_i) / (max X_i - min X_i)) × (max X_new - min X_new) + min X_new    (1)

where X_i,n is an original value, X'_i,n is the normalized value, and max X_i and min X_i denote the maximum and minimum values of the feature, respectively. The bounds of the rescaled range are denoted min X_new and max X_new. In this work, both the [-1, 1] and [0, 1] ranges are considered to analyse the performance of the classification.

Z-SCORE NORMALIZATION (ZSN)

Z-Score normalization technique uses the standard deviation and mean measures to normalize
(transform or rescale) each input values of feature vector such that subsequent features have unit
variance and a zero mean. Each sample, Xi, n in the dataset is changed into X’i,n. It is also known
as zero-mean normalization [12]. The normalized feature is given through: X’i,n = (2) where µ,
represents the mean and standard deviation value of ith feature respectively, X’i,n is the new
value of a feature, Xi,n is the old value of the feature. where, X’i,n=
(Xi1+Xi2+Xi3+……………X in), is computed as the square root of the variance of Xi,n.
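As a small illustration of equations (1) and (2), the following NumPy sketch (toy data, not the report's code) computes both normalizations per feature (column):

import numpy as np

# Toy data: rows are samples, columns are features
X = np.array([[63.0, 145.0, 233.0],
              [37.0, 130.0, 250.0],
              [56.0, 120.0, 354.0]])

# Min-Max normalization (equation 1), rescaling each feature to [0, 1]
new_min, new_max = 0.0, 1.0
X_mmn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) * (new_max - new_min) + new_min

# Z-score normalization (equation 2): zero mean and unit variance per feature
X_zsn = (X - X.mean(axis=0)) / X.std(axis=0)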

Fig 3.3: Learning Process
3.5.3 Database Connection Control And Strings

In a heart disease prediction system, database connection control and handling database
connection strings are essential for managing interactions with the database where user data,
prediction results, and other relevant information are stored. Here's how database connection
control and connection strings were implemented:

1. Database Connection Control:

a) Connection Pooling: Implement connection pooling to efficiently manage database connections and minimize the overhead of establishing a new connection for each user request. Connection pooling improves performance and scalability by reusing existing connections from a pool of pre-established connections.
b) Connection Management: Use a connection manager or connection pool manager to handle the lifecycle of database connections, including opening, closing, and recycling connections as needed. Proper management of database connections ensures optimal resource utilization and prevents issues such as connection leaks or exhaustion.
c) Error Handling: Implement robust error handling mechanisms to gracefully handle database connection errors, retries, and failover strategies in case of connectivity issues or database server downtime. Proper error handling helps maintain system availability and reliability.

2. Database Connection Strings:

a) Connection String Configuration: Store database connection strings in a centralized configuration file or in environment variables to separate configuration from code and facilitate easy management and updates. This allows database connection parameters to be changed without modifying the application code.
b) Security Considerations: Ensure that database connection strings are stored securely and are not exposed to unauthorized access. Avoid hardcoding connection strings in source code or configuration files that may be accessible to unauthorized users. Instead, use secure methods for storing sensitive information, such as encrypted configuration files or secret management services.
c) Parameterization: Use parameterized connection strings to dynamically construct connection strings based on runtime parameters or environment variables. Parameterization allows connection parameters such as the database server address, port number, database name, and authentication credentials to be customized per deployment environment (e.g., development, staging, production).
d) Connection String Format: Follow the standard connection string formats supported by the database management system (DBMS) being used (e.g., MySQL, PostgreSQL, SQLite). Refer to the documentation of the specific DBMS for guidelines on constructing connection strings with the required parameters and options.

3. Implementation Example (Python with SQLAlchemy):

from sqlalchemy import create_engine, text

# Database connection string (placeholder credentials; in practice, load this
# from an environment variable or configuration file, as noted above)
DB_CONNECTION_STRING = 'postgresql://username:password@hostname:port/database_name'

# Create the database engine (manages a connection pool internally)
engine = create_engine(DB_CONNECTION_STRING)

# Establish a connection and perform database operations (e.g., execute
# queries, fetch data); the context manager returns the connection to the
# pool when the block exits
with engine.connect() as connection:
    result = connection.execute(text('SELECT 1'))

In this example, SQLAlchemy is used to create a database engine and establish a connection to a PostgreSQL database using a connection string (`DB_CONNECTION_STRING`). The connection string follows the format SQLAlchemy specifies for PostgreSQL connections and includes parameters such as the username, password, hostname, port, and database name. The connection is returned to the pool automatically when the `with` block exits, releasing its resources.

3.6 Methodology Of System

The methodology of the model used in a heart disease prediction system typically involves
several steps, including data collection, preprocessing, model selection, training, evaluation, and
deployment. Here's a general overview of the methodology:

1. Data Collection:

a) Collect relevant data from various sources, such as electronic health records (EHRs), medical
databases, research studies, or patient surveys.
b) Gather demographic information, medical history, lifestyle factors, and diagnostic test results
related to heart disease.

2. Data Preprocessing:

a) Clean the raw data to handle missing values, outliers, and inconsistencies.
b) Perform data transformation, normalization, and feature scaling to prepare the data for
modeling.
c) Encode categorical variables into numerical representations using techniques such as one-hot
encoding or label encoding.

3. Feature Selection and Engineering:

a) Identify relevant features (predictors) that are likely to influence the risk of heart disease.
b) Conduct feature engineering to create new informative features or transform existing features
to improve predictive performance.

4. Model Selection:

a) Choose appropriate machine learning algorithms for heart disease prediction based on the
nature of the problem (classification or regression), data characteristics, and project
requirements.
b) Consider algorithms such as logistic regression, decision trees, random forests, support vector
machines (SVM), or deep learning models (e.g., neural networks).

5. Model Training:

a) Split the dataset into training and testing sets for model training and evaluation.
b) Train the selected models on the training data using techniques such as cross-validation, hyperparameter tuning, and regularization to optimize model performance (a tuning sketch follows this step).
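For instance, cross-validated hyperparameter tuning might be sketched as follows; the grid values are illustrative assumptions:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameter values to try (illustrative)
param_grid = {'n_estimators': [100, 300], 'max_depth': [None, 5, 10]}

# 5-fold cross-validation over the grid, selecting the best setting by accuracy
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring='accuracy')
# search.fit(X_train, y_train); search.best_params_ then holds the chosen values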

6. Model Evaluation:

a) Evaluate the trained models on the testing data using performance metrics such as accuracy,
precision, recall, F1-score, area under the receiver operating characteristic curve (AUC-
ROC), or confusion matrix.
b) Perform statistical analysis and validation techniques to assess the robustness, generalization,
and reliability of the models.

7. Model Interpretation:

a) Interpret the prediction results to understand the factors contributing to the risk of heart
disease.
b) Generate visualizations, feature importance plots, or decision explanations to explain the model's predictions and provide insights to healthcare professionals (see the sketch after this step).
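As one way to obtain such insights, assuming a fitted tree-based model clf and a list feature_names of input columns (both placeholders here), the importances can be read directly from the model:

import pandas as pd

# Rank features by the importance the fitted model assigns them
importances = pd.Series(clf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
# importances.sort_values().plot.barh() would render a feature-importance plot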

8. Deployment and Monitoring:

a) Deploy the trained model into production or clinical environments for real-time prediction of heart disease risk (a model-persistence sketch follows this step).
b) Implement monitoring mechanisms to track model performance, data drift, and model drift
over time, and update the model as needed to ensure continued accuracy and reliability.
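Deployment typically starts by persisting the trained model so that the serving environment can reload it; a minimal sketch with joblib follows (the file name and the model object are assumptions):

import joblib

# Save the fitted model or pipeline to disk after training
joblib.dump(model, 'heart_disease_model.joblib')

# Later, in the serving environment, reload it for predictions
model = joblib.load('heart_disease_model.joblib')
# model.predict(new_patient_features) then yields a risk prediction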

9. Ethical Considerations:

Consider ethical and regulatory considerations related to patient privacy, data security, fairness,
transparency, and accountability throughout the model development and deployment process.

CHAPTER 4

IMPLEMENTATION, MAINTENANCE, TESTING

4.1 Introduction to Languages, IDEs, Tools and Technologies Used for Implementation

4.1.1 Programming Language Used:

Python:

Python is chosen as the primary programming language for implementing the heart disease
prediction system due to its versatility, extensive libraries, and popularity in the field of data
science and machine learning. Python offers powerful tools and frameworks for data
manipulation, statistical analysis, and machine learning model development.

4.1.2 Integrated Development Environments (IDEs):

Jupyter Notebook:

Jupyter Notebook provides an interactive and exploratory environment for developing and
testing machine learning models. Its ability to combine code, visualizations, and explanatory text
in a single document makes it ideal for prototyping and experimentation.

PyCharm:

PyCharm is a robust Python IDE that offers advanced features such as code completion,
debugging, version control integration, and project management. It provides a professional
development environment for writing, testing, and debugging code efficiently.

4.1.3 Tools and Technologies:

1. Scikit-learn:

Scikit-learn is a comprehensive machine learning library for Python that offers a wide range of
algorithms for classification, regression, clustering, and dimensionality reduction. It provides
tools for data preprocessing, model evaluation, and hyperparameter tuning, making it suitable for
building predictive models for heart disease.

2. TensorFlow and Keras:

TensorFlow and Keras are deep learning frameworks that enable the development and training
of neural network models for complex pattern recognition tasks. They offer high-level APIs and
powerful abstractions for building and optimizing deep learning architectures, which can
complement traditional machine learning approaches in predicting heart disease.

3. Pandas and NumPy:

Pandas and NumPy are fundamental libraries for data manipulation and numerical computation
in Python. Pandas offers data structures such as DataFrames for handling structured data, while
NumPy provides support for mathematical operations on arrays and matrices. These libraries are
essential for preprocessing input data and performing feature engineering tasks.

4. Matplotlib and Seaborn:

Matplotlib and Seaborn are Python libraries for data visualization. Matplotlib offers a wide range
of plotting functions for creating static and interactive visualizations, while Seaborn provides a
higher-level interface for generating statistical graphics. These libraries are valuable for
visualizing data distributions, relationships, and model performance metrics.

5. Flask:

Flask is a lightweight web framework for Python that enables the deployment of machine
learning models as web services. It allows the heart disease prediction system to be accessible
through a web-based interface, facilitating user interaction and integration with other systems or
applications.
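A minimal sketch of such a service is shown below; the model file name, route, and JSON schema are illustrative assumptions rather than the exact interface of our deployed system.

```python
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical path: a scikit-learn model saved earlier with pickle.
with open("heart_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # list of numeric inputs
    risk = int(model.predict([features])[0])   # 1 = disease likely, 0 = not
    return jsonify({"heart_disease": risk})

if __name__ == "__main__":
    app.run(debug=True)
```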

4.2 Coding standards of Language used

When developing the heart disease prediction system, adhering to coding standards is crucial for
ensuring code quality, maintainability, and readability. Since Python is the primary programming
language chosen for the implementation, we can follow the widely accepted PEP 8 guidelines,
which are the official style guide for Python code. Here are some key coding standards to be
followed:

1. Indentation:

a) Used 4 spaces per indentation level.


b) Avoided tabs for indentation.

2. Line Length:

a) Limited the lines to a maximum of 79 characters.


b) Broke lines at appropriate points using parentheses or backslashes for readability.

3. Imports:

a) Imported modules and packages on separate lines.


b) Grouped imports in the following order: standard library imports, related third-party imports,
and local application/library specific imports.
c) Used absolute imports over relative imports.

4. Whitespace:

a) Surrounded operators with a single space on both sides.


b) Used a single space after commas, colons, and semicolons within data structures (e.g., lists,
dictionaries).
c) Used blank lines to separate logical sections of code.

5. Naming Conventions:

a) Used descriptive names for variables, functions, classes, and modules.


b) Used lowercase for variable names, separated by underscores (snake_case).
c) Used CamelCase for class names.
d) Avoided single-character variable names unless used as iterators in short loops.

6. Function and Method Definitions:

a) Defined functions and methods with descriptive names that convey their purpose.
b) Used docstrings to provide documentation for functions and methods.
c) Followed a consistent style for function and method definitions (e.g., def my_function() rather than def myFunction()).

7. Comments:

a) Used comments to explain non-obvious code or provide context for complex algorithms.
b) Wrote comments in complete sentences with proper grammar and punctuation.
c) Avoided redundant comments that merely repeat what the code does.

8. Error Handling:

a) Used try-except blocks for handling exceptions gracefully.


b) Caught specific exceptions rather than using broad except blocks.
c) Included informative error messages to aid in debugging and troubleshooting.

9. Testing:

a) Wrote unit tests to verify the correctness of individual components and functions.
b) Followed a consistent naming convention for test functions and test classes (e.g.,
test_function_name, TestClassName).
c) Aimed for comprehensive test coverage to identify and prevent regressions.

4.3 Testing Techniques and Test Plans

Testing Techniques and Test Plans for Heart Disease Prediction System:

4.3.1 Testing Techniques:

a. Unit Testing:

1. Objective: Validate the functionality of individual components, functions, and methods within the heart disease prediction system.
2. Implementation: Write unit tests using Python's built-in `unittest` or third-party testing
frameworks such as `pytest`. Test cases should cover various input scenarios and edge cases
to ensure robustness.
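A small `unittest` sketch of such a test is shown below; the normalize_features helper and its module are hypothetical names used only for illustration.

```python
import unittest
from preprocessing import normalize_features  # hypothetical helper module

class TestNormalizeFeatures(unittest.TestCase):
    def test_values_scaled_to_unit_range(self):
        # After min-max normalization every value should lie in [0, 1].
        result = normalize_features([120, 140, 160])
        self.assertTrue(all(0.0 <= v <= 1.0 for v in result))

if __name__ == "__main__":
    unittest.main()
```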

b. Integration Testing:

1. Objective: Verify the interaction and interoperability of different modules and components within the heart disease prediction system.
2. Implementation: Develop integration test cases to validate data flow, communication between modules, and overall system behavior. Use mock objects or stubs to simulate dependencies and external services.

c. System Testing:

1. Objective: Evaluate the heart disease prediction system as a whole, including its user
interface, data processing, predictive modeling, and result interpretation.
2. Implementation: Perform end-to-end testing of the system using real-world scenarios and
test data. Validate system performance, scalability, and usability under varying conditions.

d. Acceptance Testing:

1. Objective: Confirm that the heart disease prediction system meets the specified requirements
and satisfies user expectations.
2. Implementation: Collaborate with stakeholders to define acceptance criteria and conduct
acceptance testing. Demonstrate the system's functionality to users and gather feedback for
iterative improvements.

4.3.2 Test Plans:

a. Unit Test Plan:

1. Scope: Focus on testing individual functions, methods, and classes within the heart disease
prediction system.
2. Test Cases: Define test cases for each function/method, covering normal behavior, boundary
conditions, and error handling.
3. Tools: Utilize Python testing frameworks (e.g., `unittest`, `pytest`) for writing and executing
unit tests.
4. Coverage: Aim for high test coverage to ensure thorough validation of code functionality.

b. Integration Test Plan:

1. Scope: Validate the integration and interaction between different modules, components, and
external dependencies of the heart disease prediction system.
2. Test Cases: Design integration test cases to verify data flow, communication protocols, and
error handling mechanisms.
3. Tools: Use tools for mocking and stubbing (e.g., `unittest.mock`) to simulate interactions
with external services and components.
4. Scenarios: Include test scenarios for typical use cases, edge cases, and failure scenarios to
assess system resilience.

c. System Test Plan:

1. Scope: Evaluate the heart disease prediction system as a whole, including its user interface,
data processing pipeline, predictive modeling algorithms, and result interpretation
capabilities.
2. Test Cases: Develop comprehensive test cases covering end-to-end system functionality,
performance benchmarks, and usability testing.
3. Tools: Utilize automation frameworks for system testing, along with manual testing
techniques for exploring user interfaces and user experience.
4. Scenarios: Include test scenarios representing real-world usage patterns and user interactions
to validate system behavior in diverse environments.

d. Acceptance Test Plan:

1. Scope: Confirm that the heart disease prediction system meets user requirements and
expectations, as specified in the project documentation.
2. Test Cases: Define acceptance criteria based on user stories, use cases, and functional
requirements. Develop test scenarios to demonstrate system capabilities to stakeholders.
3. Tools: Use demonstration tools, user acceptance testing (UAT) frameworks, and feedback
collection mechanisms to facilitate acceptance testing.
4. Feedback: Gather feedback from stakeholders during acceptance testing sessions and
incorporate necessary changes to meet user needs and preferences.

CHAPTER 5

RESULTS AND DISCUSSION

5.1 User Interface Representation

The user interface (UI) of the heart disease prediction system should be intuitive, user-friendly,
and visually appealing to ensure ease of use for both healthcare professionals and individuals
seeking to assess their risk of heart disease. Below is a conceptual representation of the UI
components and features incorporated into the system:

1. Landing Page:

a) The landing page serves as the entry point to the heart disease prediction system.
b) It provides a brief overview of the system's purpose and functionality.
c) Users can access login/registration options or proceed directly to the prediction tool.

2. User Registration/Login:

a) Users can create accounts or log in to access personalized features and save their prediction
results.
b) Registration may require basic user information such as name, email address, and password.
c) Login functionality allows returning users to access their accounts securely.

3. Input Data Form:

a) The input data form allows users to enter relevant information for heart disease risk
assessment.
b) Fields may include demographic details (age, gender), medical history (family history, past
diagnoses), lifestyle factors (smoking status, exercise habits), and diagnostic test results
(blood pressure, cholesterol levels).
c) Form validation ensures that all required fields are filled out correctly before proceeding to
prediction.

4. Predictive Model Output:

a) Upon submitting the input data, the system generates a prediction of the user's risk of heart
disease.
b) Prediction results are displayed prominently, indicating the calculated risk score along with
any associated confidence intervals or risk categories.
c) Users can view additional details about the prediction methodology and interpretation
guidelines.

5. Visualizations and Insights:

a) Visual representations such as charts, graphs, or diagrams accompany the prediction results to aid in interpretation.
b) Insights and explanations are provided alongside visualizations to help users understand the
factors influencing their risk of heart disease.

6. Personalized Recommendations:

a) Based on the prediction results, the system can offer personalized recommendations for
disease prevention and management.

7. User Profile and History:

a) Users can access their profiles to view past prediction results, update personal information,
and manage account settings.
b) A history log displays a record of previous predictions, including date/time stamps, input
data, and corresponding risk scores.

8. Additional Resources:

a) The UI provides links to educational resources, articles, and guidelines on heart disease
prevention, risk factors, and treatment options.
b) Users can access information about the scientific basis of the prediction model, data sources,
and validation studies.

9. Help and Support:

a) The UI includes options for users to seek help, report issues, or contact support staff if they
encounter difficulties or have questions.
b) Help documentation, FAQs, and user guides are available to assist users in navigating the
system and understanding its features.

5.1.1 Brief Description of Various Modules of the system

1. User Management Module:

• This module is responsible for managing user accounts, registration, authentication, and
profile management.
• Users can create accounts, log in securely, and update their personal information.
• It handles user permissions, access control, and session management.

2. Input Data Collection Module:

• The input data collection module allows users to input relevant data for heart disease risk
assessment.
• It provides a user-friendly interface for entering demographic information, medical history,
lifestyle factors, and diagnostic test results.
• Data validation ensures that all required fields are filled out correctly.

3. Data Preprocessing Module:

• The data preprocessing module is responsible for cleaning, transforming, and preparing input
data for predictive modeling.
• It handles tasks such as handling missing values, normalizing features, and encoding
categorical variables.
• Data preprocessing ensures that input data is standardized and suitable for analysis.

4. Predictive Modeling Module:

• The predictive modeling module builds machine learning models to predict the risk of heart
disease based on input data.
• It utilizes algorithms such as logistic regression, decision trees, random forests, or deep
learning models.
• Model training involves fitting the data to the chosen algorithm and optimizing model
parameters.

5. Result Interpretation Module:

• The result interpretation module interprets prediction results and provides insights into the
calculated risk scores.
• It generates visualizations, charts, or explanations to help users understand the factors
contributing to their risk of heart disease.
• Personalized recommendations for disease prevention and management may be included
based on the prediction results.

6. User Interface Module:

• The user interface module encompasses the design and implementation of the graphical user
interface (GUI) for the heart disease prediction system.
• It provides an intuitive and user-friendly interface for users to interact with the system, input
data, view results, and access additional features.
• The UI module ensures a seamless user experience across different devices and screen sizes.

7. Data Management Module:

• The data management module handles the storage, retrieval, and manipulation of data used
by the heart disease prediction system.
• It may involve interacting with databases, file systems, or external data sources to store input
data, prediction results, and user profiles.
• Data security and privacy measures are implemented to protect sensitive information.

8. Integration and Deployment Module:

• The integration and deployment module focuses on integrating various system components
and deploying the heart disease prediction system in production environments.
• It ensures compatibility with existing healthcare systems, web servers, and infrastructure
requirements.
• Continuous integration (CI) and continuous deployment (CD) practices may be employed to
automate testing, build, and deployment processes.

In a heart disease prediction system implemented in Python, several modules and libraries are
commonly used to facilitate various tasks, including data preprocessing, predictive modeling,
result interpretation, and user interface development. Here are some key Python modules and
libraries frequently utilized in such systems:

1. Pandas:

Pandas is a powerful data manipulation library that provides data structures such as DataFrame
and Series, making it easy to clean, transform, and analyze structured data. It is often used for
loading and preprocessing datasets before model training.

2. NumPy:

NumPy is a fundamental library for numerical computing in Python. It offers support for multi-
dimensional arrays, mathematical functions, and linear algebra operations, which are essential
for data manipulation and statistical analysis in predictive modeling.

3. Scikit-learn:

Scikit-learn is a comprehensive machine learning library that provides tools and algorithms for
data mining, classification, regression, clustering, and dimensionality reduction. It offers a wide
range of models and utilities for building predictive models in heart disease prediction systems.

4. TensorFlow / Keras:

TensorFlow and Keras are deep learning frameworks for building and training neural network
models. They offer high-level APIs and abstractions for constructing complex architectures,
enabling the implementation of deep learning algorithms for heart disease risk assessment.

5. Matplotlib / Seaborn:

Matplotlib and Seaborn are visualization libraries used for creating static, interactive, and
publication-quality plots and charts. They are valuable for visualizing data distributions,
relationships, and model performance metrics in the heart disease prediction system.
6. Scipy:

Scipy is a scientific computing library that provides functions for optimization, integration,
interpolation, and statistical analysis. It complements NumPy and provides additional tools for
advanced numerical computations and statistical modeling.

7. Flask / Django:

Flask and Django are web frameworks for building web applications in Python. They offer
features for routing, request handling, template rendering, and user authentication, making them
suitable for developing user interfaces and deploying heart disease prediction systems as web
services.

8. Pickle:

Pickle is a module used for serialization and deserialization of Python objects. It is often used to save trained machine learning models to disk and reload them later for prediction, allowing for easy model persistence and deployment.
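A minimal save-and-reload sketch is shown below; model stands for any trained scikit-learn estimator, and the file name is illustrative.

```python
import pickle

# `model` is assumed to be an already-trained scikit-learn estimator.
# Persist it to disk...
with open("heart_model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and reload it later for prediction without retraining.
with open("heart_model.pkl", "rb") as f:
    restored_model = pickle.load(f)
```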

5.2 Snapshots of system


The following pictures show our deployed model. The underlying predictive model was built with the scikit-learn library and is served to the user through a simple web interface.

STEP 1: Choose the desired option from each drop-down menu

Fig: Screenshot of Working application

STEP 2: After filling in all the particulars, click Predict to find out whether the user has heart disease or not

Fig: Filling the various choices

STEP 3: The model predicts whether the user has heart disease or not. For the particulars filled in here, the model indicates that the user is at risk of heart disease.

Fig: Working model

5.3 Back End Representation

STEPS FOR IMPLEMENTATION:

1. Install the required packages for building the model.

2. Load the libraries into the workspace from the packages.

3. Read the input data set.

4. Normalize the given input dataset.

5. Divide this normalized data into two parts: A. Train data, B. Test data (Note: 80% of the normalized data is used as train data and 20% as test data).

1. Importing the necessary libraries

a) All the important libraries used on the dataset are imported. These include libraries for mathematical computation, data visualization, and machine learning.

Fig 5.1 : Importing libraries
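Since the screenshot may not reproduce clearly, a representative import block is sketched below; the exact set used in the notebook is the one shown in Fig 5.1, and this list is assumed from the tools described in Section 4.1.3.

```python
# Numerical computation and data handling
import numpy as np
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning models and utilities
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
```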


b) After formulating the problem statement, the main task is to collect data that can help in our analysis and manipulation. Sometimes data is collected by performing some kind of survey, and at other times it is gathered through web scraping.

Fig 5.2 : Reading the dataset
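The loading step in Fig 5.2 reduces to a call of the following form (the file name is assumed):

```python
df = pd.read_csv("heart.csv")  # assumed file name for the UCI heart dataset
df.head()                      # quick look at the first few rows
```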

2. Analyzing the dataset

Most of the real-world data is not structured and requires cleaning and conversion into
structured data before it can be used for any analysis or modelling.

Fig 5.3 : Analyzing dataset

Fig 5.4 : Analyzing the values

3. Data Preprocessing and Cleaning

This is the step in which we try to find the hidden patterns in the data at hand. We also analyze the different factors that affect the target variable and the extent to which they do so. How the independent features relate to each other, and what can be done to achieve the desired results, can also be extracted from this process, which gives us a direction in which to work when starting the modeling process.

Fig 5.5 : Checking the null values

Fig 5.6 : Calculating the Null values
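The null-value checks shown in Figs 5.5 and 5.6 typically reduce to the following pandas calls, assuming the DataFrame df loaded earlier:

```python
df.isnull().sum()        # count of missing values per column
df.isnull().sum().sum()  # total count of missing values in the dataset
```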

4.Data Visualization

Data visualization is the graphical representation of information and data in a pictorial or graphical format (visualizations of data can be charts, graphs, and maps). Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data. Such tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions.

4.1.1 Using Bar Graph Representation

Fig 5.7 : Code for bar chart representation

Fig 5.8 : Bar chart representing levels of various features

4.1.2 Using Scatter Plot

Fig 5.9 : Code for scatter chart representation

Fig 5.10 : Scatter chart for age vs cholesterol level
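A sketch of the plot in Fig 5.10 is given below, assuming the UCI column names age, chol, and target:

```python
plt.figure(figsize=(8, 5))
plt.scatter(df["age"], df["chol"], c=df["target"], cmap="coolwarm", alpha=0.7)
plt.xlabel("Age (years)")
plt.ylabel("Serum cholesterol (mg/dl)")
plt.title("Age vs cholesterol level")
plt.colorbar(label="target (1 = heart disease)")
plt.show()
```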

5. Model Building

Different types of machine learning algorithms and techniques have been developed that can easily identify complex patterns in the data, a task that would be very tedious for a human to perform.

Fig 5.11 : Splitting the target and feature columns
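A sketch of this split, following the 80/20 division described earlier and assuming the label column is named target:

```python
X = df.drop(columns=["target"])  # feature columns
y = df["target"]                 # label: 1 = heart disease, 0 = healthy

# 80% train / 20% test, as described in the implementation steps.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```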

LOGISTIC REGRESSION MODEL

Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives the probabilistic values which lie between 0 and 1.

The Logistic regression equation can be obtained from the Linear Regression equation. The
mathematical steps to get Logistic Regression equations are given below:

1. We know the equation of the straight line can be written as:

   y = b0 + b1x1 + b2x2 + ... + bnxn

2. In logistic regression y can be between 0 and 1 only, so we divide the above equation by (1 - y):

   y / (1 - y); 0 for y = 0, and infinity for y = 1

3. But we need a range between -infinity and +infinity, so taking the logarithm of the equation gives:

   log[y / (1 - y)] = b0 + b1x1 + b2x2 + ... + bnxn

Fig 5.12 : Logistic Regression Model
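A minimal sketch of the model in Fig 5.12, reusing the train/test split above (the hyperparameters shown are scikit-learn defaults, not necessarily those in our notebook):

```python
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
print("Logistic regression test accuracy:", lr_model.score(X_test, y_test))
```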

K-NEAREST NEIGHBORS

K-NN algorithm assumes the similarity between the new case/data and available cases and puts the new case into the category that is most similar to the available categories.

K-NN algorithm stores all the available data and classifies a new data point based on that similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.

Below are some points to remember while selecting the value of K in the K-NN algorithm:

1. There is no particular way to determine the best value for "K", so we need to try some values
to find the best out of them. The most preferred value for K is 5.
2. A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers
in the model.
3. Large values for K reduce the effect of noise, but they can blur class boundaries and make the model computationally expensive.

Fig 5.13 : K-Nearest Neighbor
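A sketch of the classifier in Fig 5.13, using K = 5, the commonly preferred starting value noted above:

```python
knn_model = KNeighborsClassifier(n_neighbors=5)  # K = 5 neighbors
knn_model.fit(X_train, y_train)
print("K-NN test accuracy:", knn_model.score(X_test, y_test))
```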

GAUSSIAN NB

Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and
used for solving classification problems. It is mainly used in text classification that includes a
high-dimensional training dataset. The Naïve Bayes classifier is one of the simplest and most effective classification algorithms, helping to build fast machine learning models that can make quick predictions.

1. Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the
probability of a hypothesis with prior knowledge. It depends on the conditional probability.
2. The formula for Bayes' theorem is given as:

   P(A|B) = [P(B|A) x P(A)] / P(B)

Where,

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true. P(A) is the Prior probability of the hypothesis before observing the evidence, and P(B) is the Marginal probability of the evidence.

Fig 5.14 : Gaussian NB model
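A sketch of the model in Fig 5.14; GaussianNB assumes each feature is normally distributed within each class:

```python
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
print("Gaussian NB test accuracy:", nb_model.score(X_test, y_test))
```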

6. Building a Predictive System

After a model is developed and gives good results on the holdout or real-world dataset, we deploy it and monitor its performance. This is the main part, where we apply what the model has learned from the data to real-world applications and use cases.

Fig 5.15 : Predictive System
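A sketch of the predictive system in Fig 5.15; the sample record is a hypothetical patient given in the dataset's 13-feature order, and lr_model is the logistic regression fitted above:

```python
import numpy as np

# Hypothetical patient record in the dataset's feature order:
# age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang,
# oldpeak, slope, ca, thal
sample = np.array([[63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]])

prediction = lr_model.predict(sample)
print("Heart disease likely" if prediction[0] == 1
      else "No heart disease predicted")
```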


5.3.1 Snapshots of Database Tables with brief description

Fig 5.16 : Database Table

This dataset consists of features that can be used to predict which patients have a high risk of
heart disease. It has 303 rows and 14 columns.

Source: This data comes from the University of California Irvine's Machine Learning
Repository at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

The proposed system is GUI-based, user-friendly, scalable, reliable, and expandable. The working model can help in reducing treatment costs by providing initial diagnosis in time, and it can also serve as a training tool for medical students and as a soft diagnostic tool for physicians and cardiologists; general physicians can utilize it for the initial diagnosis of cardio-patients. There are many possible improvements that could be explored to increase the scalability and accuracy of this prediction system. As we have developed a generalized system, in the future it can be applied to the analysis of different datasets. The performance of health diagnosis can be improved significantly by handling numerous class labels in the prediction process, which is another promising direction of research. In a data mining warehouse, the dimensionality of the heart database is generally high, so the identification and selection of significant attributes for better diagnosis of heart disease remain challenging tasks for future research.
