Heart Disease Prediction Using Machine Learning Report

This document describes a student project that aims to predict heart disease using machine learning. It was submitted by three students - C.Shivaram Reddy, SK.Nagur Basha, and S.Indrasena Reddy - to fulfill their Bachelor of Technology degree in Computer Science and Engineering. The project is supervised by Mrs.B.N.Swarna Jyothi and evaluates algorithms like Naive Bayes, decision trees, support vector machines, and KNN for heart disease prediction.


HEART DISEASE PREDICTION USING

MACHINE LEARNING

Project report submitted

in partial fulfillment of the requirement for award of the degree of

Bachelor of Technology
in
Computer Science and Engineering

by

C.SHIVARAM REDDY (U18CN227)
SK.NAGUR BASHA (U18CN191)
S.INDRA SENA REDDY (U18CN206)

Under the guidance of

Mrs.B.N.Swarna Jyothi

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SCHOOL OF COMPUTING
BHARATH INSTITUTE OF HIGHER EDUCATION AND RESEARCH
(Deemed to be University Estd u/s 3 of UGC Act, 1956)

CHENNAI 600073, TAMILNADU, INDIA

April, 2022

CERTIFICATE
This is to certify that the project report entitled “Heart Disease Prediction Using Machine
Learning” submitted by “C.Shivaram Reddy (U18CN227), SK.Nagur Basha (U18CN191),
S.Indrasena Reddy (U18CN206)” to the Department of Computer Science and Engineering,
Bharath Institute of Higher Education and Research, in partial fulfillment for the award of the degree
of B. Tech in (Computer Science and Engineering) is a bonafide record of project work carried out
by them under my supervision. The contents of this report, in full or in parts, have not been
submitted to any other Institution or University for the award of any other degree.

Mrs.B.N.Swarna Jyothi
Computer Science & Engineering
School of Computing
Bharath Institute of Higher Education and Research
April, 2022

Dr.B.Persis Urbana Ivy
Professor & Head
Computer Science & Engineering
School of Computing
Bharath Institute of Higher Education and Research
April, 2022

INTERNAL EXAMINER EXTERNAL EXAMINER

DECLARATION
We declare that this project report titled “Heart Disease Prediction Using Machine Learning”
submitted in partial fulfillment of the degree of B.Tech in (Computer Science and Engineering)
is a record of original work carried out by us under the supervision of Mrs.B.N.Swarna Jyothi,
and has not formed the basis for the award of any other degree or diploma, in this or any other
Institution or University. In keeping with the ethical practice in reporting scientific information,
due acknowledgements have been made wherever the findings of others have been cited.

C.Shivaram Reddy
(U18CN227)

SK.Nagur Basha
(U18CN191)

S.Indrasena Reddy
(U18CN206)

Chennai

Date: 28/04/2022

ACKNOWLEDGMENT
First, we wish to thank the almighty who gave us good health and success throughout our
project work.

We express our deepest gratitude to our beloved President Dr. J. Sundeep Aanand, and
Managing Director Dr.E. Swetha Sundeep Aanand for providing us the necessary facilities for
the completion of our project.

We take great pleasure in expressing sincere thanks to Vice Chancellor Dr. K. Vijaya
Baskar Raju, Pro Vice Chancellor (Academic) Dr. M. Sundararajan, Registrar Dr. S.
Bhuminathan and Additional Registrar Dr. R. Hari Prakash for backing us in the project. We
thank our Dean, Engineering, Dr. J. Hameed Hussain for providing sufficient facilities for the
completion of this project.

We express our immense gratitude to our Academic Coordinator Mr. G. Krishna Chaitanya for
his eternal support in completing this project.

We thank our Dean, School of Computing Dr. S. Neduncheliyan for his encouragement and
the valuable guidance throughout the project.

We record indebtedness to our Head, Department of Computer Science and Engineering Dr.
B. Persis Urbana Ivy for immense care and encouragement towards us throughout the course of
this project.

We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor
Mrs.B.N.Swarna Jyothi for her cordial support, valuable information and guidance, which
helped us in completing this project through its various stages.

We thank our department faculty, supporting staff and friends for their help and guidance
in completing this project.

C.SHIVARAM REDDY (U18CN227)

SK.NAGUR BASHA (U18CN191)

S.INDRASENA REDDY (U18CN206)

ABSTRACT

In today's era, deaths due to heart disease have become a major issue: approximately one
person dies every minute due to heart disease. This figure considers both males and females
and people of all age groups, and the ratio may vary by region. This does not mean that people
in other age groups will not be affected by heart disease; the problem may also start at an early age.
Predicting the cause and the disease has become a major challenge nowadays. In this paper, we
discuss various algorithms, namely Naive Bayes, decision tree, support vector machine and KNN,
and the tools used for predicting heart disease.

TABLE OF CONTENTS

CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
ABBREVIATIONS
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.2 DISADVANTAGES
3.3 PROPOSED SYSTEM
3.4 ADVANTAGES
4. SYSTEM SPECIFICATION
4.1 HARDWARE REQUIREMENTS
4.2 SOFTWARE REQUIREMENTS
5. IMPLEMENTATION
5.1 MODULES
5.2 MODULES DESCRIPTION
5.2.1 User module
5.2.2 Admin module
5.2.3 Disease Analysis module
5.2.4 Disease Prediction module
5.3 Methodology
5.3.1 Naive Bayes Algorithm
5.3.2 Decision Tree Algorithm
5.3.3 K-Nearest Neighbor
5.3.4 Support Vector Machine
6. SYSTEM DESIGN
6.1 SYSTEM ARCHITECTURE
6.2 DATA FLOW DIAGRAM
6.3 UML DIAGRAMS
6.4 USE CASE DIAGRAM
6.5 CLASS DIAGRAM
6.6 ACTIVITY DIAGRAM
6.7 SEQUENCE DIAGRAM
6.8 COLLABORATION DIAGRAM
7. SCREENSHOTS OF PROJECT
8. CONCLUSION
9. FUTURE ENHANCEMENTS
10. REFERENCES
SAMPLE CODE

List of Figures
Fig No  Description
6.1  System Architecture
6.2  Data Flow Diagram
6.4  Use Case Diagram
6.5  Class Diagram
6.6  Activity Diagram
6.7  Sequence Diagram
6.8  Collaboration Diagram
7.1  User Login Page
7.2  User Home Page
7.3  User Details Page
7.4  Prediction
7.5  Detection
7.6  Danger Analysis
7.7  Danger Detection
7.8  Graph

ABBREVIATIONS

WHO - World Health Organization

KNN - K Nearest Neighbor

ADNI - Alzheimer’s Disease Neuroimaging Initiative

MCI - Mild Cognitive Impairment

SDI - Spectral Disease Index

DS - Disease Severity

DFA - Detrended Fluctuation Analysis

CHD - Coronary Heart Disease

SVM - Support Vector Machine

UML - Unified Modelling Language

DFD - Data Flow Diagram


CHAPTER 1
Introduction

The contents of this paper mainly focus on various data mining practices that are valuable in
heart disease prediction, with the assistance of the different data mining tools that are available.
If the heart does not function properly, other parts of the human body, such as the brain and
kidneys, are also affected. Heart disease is a disease that affects the functioning of the heart, and
in today's era it is the primary cause of death. The WHO (World Health Organization) has
estimated that 12 million people die every year because of heart diseases. Some forms of heart
disease are cardiovascular disease, heart attack, coronary heart disease and stroke. Stroke is a type
of cardiovascular disease that occurs due to the hardening, blocking or narrowing of the blood
vessels that lead to the brain, or it can be caused by high blood pressure.

The major challenge that the healthcare industry faces nowadays is the quality of service.
Diagnosing a disease correctly and providing effective treatment to patients define the quality of
service, and poor diagnosis leads to disastrous consequences that are unacceptable. Medical
history records are very large, but they come from many different sources, and the
interpretations made by physicians are essential components of these data. Real-world data may
be noisy, incomplete and inconsistent, so data preprocessing is required in order to fill in the
missing values in the database. Although cardiovascular diseases have been found to be a major
cause of death worldwide in recent years, they have also been declared among the most
preventable and manageable diseases. The complete and accurate management of a disease
depends on its timely diagnosis. An accurate and systematic tool for identifying high-risk patients
and mining data for the timely diagnosis of heart disease is therefore a pressing need.

Symptoms of heart disease vary from person to person, but they frequently include back pain,
jaw pain, neck pain, stomach disorders, shortness of breath, chest pain, and arm and shoulder
pain. There are a variety of heart diseases, including heart failure, stroke and coronary artery
disease. Heart specialists create and store large databases of patient records, which offer a great
opportunity for mining valuable knowledge from such datasets.

Extensive research is under way to determine heart disease risk factors in different patients, with
researchers using various statistical approaches and numerous data mining methods. Statistical
analyses have identified a number of risk factors for heart disease, including smoking, age, blood
pressure, diabetes, total cholesterol, hypertension, a family history of heart disease, obesity and
lack of exercise. For the prevention and healthcare of patients at risk of heart disease, awareness
of heart disease is very important.

CHAPTER 2
Literature survey

1) Machine learning-based method for personalized and cost-effective detection of Alzheimer's disease
AUTHORS: Escudero J, Ifeachor E, Zajicek JP, Green C, Shearer J,
Pearson S
YEAR : 2012

Diagnosis of Alzheimer's disease (AD) is often difficult, especially early in the disease process at the
stage of mild cognitive impairment (MCI). Yet, it is at this stage that treatment is most likely to be
effective, so there would be great advantages in improving the diagnosis process. We describe
and test a machine learning approach for personalized and cost-effective diagnosis of AD. It uses
locally weighted learning to tailor a classifier model to each patient and computes the sequence
of biomarkers most informative or cost-effective to diagnose patients. Using ADNI data, we
classified AD versus controls and MCI patients who progressed to AD within a year, against
those who did not. The approach performed similarly to considering all data at once, while
significantly reducing the number (and cost) of the biomarkers needed to achieve a confident
diagnosis for each patient. Thus, it may contribute to a personalized and effective detection of
AD, and may prove useful in clinical settings.

2) Effect of Meteorological Conditions on Occurrence of Hand, Foot and Mouth Disease in Wuwei City, Northwestern China
AUTHORS : Shan Zheng, Minzhen Wang, Shigong Wang, Kezheng Shang,
Lilli Hu, Jinrong Dong
YEAR : 2014

The main objective of this paper is to explore the effect of different meteorological conditions on
the occurrence of hand, foot and mouth disease in Wuwei City, northwestern China, in order to
provide a scientific basis for preventing and forecasting the prevalence of the disease. Data on the
disease and the weather were collected from 2008 to 2010, and correlation analysis, multiple
linear regression and exponential curve fitting were applied. The results showed that 2688 cases
of hand, foot and mouth disease were recorded from 2008 to 2010, and the annual average
incidence was 47.62/100,000. The average prevalence of hand, foot and mouth disease in
Liangzhou District, Minqin County, Gulang County and Tianzhu Tibetan Autonomous County
was 42.69, 38.52, 65.92 and 49.18 per 100,000 respectively. The disease occurred year-round in
Wuwei City, but with a clear seasonal peak. Generally, the incidence increased from April and
rose to the first peak in May, June and July respectively. The second peak was in September or
October every year. Different meteorological factors, such as average temperature, relative
humidity, atmospheric pressure, rainfall and evaporation capacity, had different impacts on the
epidemic in the four areas. The results of multiple linear regression indicated that relative
humidity and atmospheric pressure were the main influencing factors in Liangzhou District,
average temperature in Gulang County, and atmospheric pressure in Tianzhu County. The
incidence of the disease and average sunshine hours showed an exponential relationship in
Minqin County. In conclusion, different weather conditions have different impacts on the
prevalence of hand, foot and mouth disease. A high correlation exists between meteorological
factors and the occurrence of hand, foot and mouth disease in the four areas of Wuwei City, and
summer and autumn were the important seasons to prevent and control the disease.

3) Developing an Index for Detection and Identification of Disease Stages
AUTHORS : Zhang Y, Long JD, Mills JA, Warner JH, Lu W, Paulsen JS
YEAR : 2014

Spectral data have been widely used to estimate the disease severity levels of different plants.
However, such data have not been evaluated for estimating the disease stages of plants. This
study aimed at developing a spectral disease index that is able to identify the stages of wheat leaf
rust disease at various disease severity (DS) levels.

To meet this aim, the reflectance spectra of infected leaves with different symptom fractions
and DS levels were measured with a spectroradiometer.
Then, pure spectra of the different disease symptoms at the leaf scale were analyzed, and a new
function was developed to find the wavelengths most sensitive to the disease symptom fraction.
The reflectance spectra with the highest sensitivity were found at 675 and 775 nm. Finally, the
normalized difference of DS and the ratio ρ675/ρ775 was used as a new SDI to discriminate
three different levels of the disease stage at the canopy level. The suggested SDI showed
promising performance for improving the detection of disease stages in precision plant protection.

4) Quantized Analysis for Heart Valve Disease based on Cardiac Sound Characteristic Waveform Method

AUTHORS : Hu Yuliang, Qiao Junxuan, Wang Haibin, Wei Xiubo
YEAR : 2010

In order to analyze heart valve disease accurately and effectively, a new quantized diagnosis
method, namely the cardiac sound characteristic waveform, was proposed to analyze four clinical
heart valve sounds. A BIOPAC acquisition system was used to collect the signals. The recorded
data are transmitted to a computer over Ethernet for storage, analysis and display in real time.
An analytical model with a single degree of freedom was established to extract the characteristic
waveform. Furthermore, diagnosis parameters were calculated to discriminate between normal
heart sounds and those of heart valve disease through an easy-to-understand graphical
representation, so that even an inexperienced user is able to monitor his or her pathology
progress easily. Finally, a case study of a heart valve disease patient before and after surgery is
presented to validate the usefulness and efficiency of the proposed method.

5) Non-Linear Analysis of Heart Rate Variability in Patients with Coronary Heart Disease
AUTHORS : Chengyu Liu, David Springer, Qiao Li, Benjamin Moody,
Ricardo Abad Juan, Francisco J Chorro
YEAR : 2003

The article emphasizes the clinical and prognostic significance of non-linear measures of heart
rate variability, applied to a group of patients with coronary heart disease (CHD) and an
age-matched healthy control group. Three different methods were applied: the Hurst exponent,
Detrended Fluctuation Analysis (DFA) and approximate entropy. The Hurst exponent of the R-R
series was determined by the rescaled range analysis technique. DFA was used to quantify the
fractal long-range correlation properties of heart rate variability. Approximate entropy measures
the unpredictability of fluctuations in a time series. It was found that the short-term fractal scaling
exponent. The patients with CHD had a lower Hurst exponent in each phase of the exercise test
separately, as well as a lower approximate entropy, than the healthy control group.
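Approximate entropy, one of the three measures used in this study, can be illustrated with a short Python sketch of its standard definition: the series is cut into overlapping templates of length m, and ApEn compares how often templates that match within a tolerance r (Chebyshev distance) still match when extended to length m + 1. This is not the cited authors' code; the defaults m = 2 and r = 0.2 times the standard deviation are common choices assumed here, and the example series are synthetic.

```python
import math
import random
import statistics


def approximate_entropy(series, m=2, r=None):
    """ApEn(m, r) of a 1-D sequence; lower values mean a more regular series."""
    series = list(series)
    n = len(series)
    if r is None:
        # Assumed heuristic: tolerance = 0.2 * standard deviation of the series
        r = 0.2 * statistics.pstdev(series)

    def phi(k):
        # Overlapping templates (sub-sequences) of length k
        templates = [series[i:i + k] for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # Templates within Chebyshev distance r of `a`; the self-match
            # is included, so the count is never zero
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    regular = [0.0, 1.0] * 50                   # perfectly alternating series
    noisy = [rng.random() for _ in range(300)]  # uniform random noise
    print(approximate_entropy(regular))  # close to 0
    print(approximate_entropy(noisy))    # much larger
```

A lower value indicates a more predictable series. The pairwise template comparison is quadratic in the series length, which is acceptable for short R-R series but slow for very long recordings.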
CHAPTER 3
System Analysis

3.1 Existing system

The existing systems work with both deep learning and data mining techniques. The modules of
the existing system generate a comprehensive report by implementing a strong prediction
algorithm. The main aim of the existing system is to compare the outcomes of previous patients
who had the disease with the data of a new patient, and to determine the future likelihood of
heart disease for that patient. By implementing this model, the goal is to develop a system with
increased accuracy in estimating the percentage chance that a new patient will have a heart
attack. The model proposed for the Heart Attack Prediction System was designed using deep
learning algorithms. However, the accuracy of all the existing systems is very low.
3.2 Disadvantages

• Predictions of heart disease are not very accurate.
• Less security.
• The system is not fully automated; it needs data from the user for a full diagnosis.
• There is no feedback system.
• Data mining techniques do not help to provide effective decision making.
• Cannot handle datasets of patient records.

3.3 Proposed system

The proposed system uses data that classifies whether or not patients have heart disease according
to their features. The system uses this data to create a model that tries to predict (after reading
and exploring the data) whether a patient has the disease.

The proposed system uses the logistic regression (classification) algorithm and implements the
Naive Bayes algorithm to obtain accurate results. Finally, the results are analysed by comparing
the models and their confusion matrices. The available data should be organized into structured
data based on the features of the patient's heart, and from it we create a model that predicts the
patient's disease using the logistic regression algorithm. First, we import the dataset and read it;
the data should contain variables such as age, sex, cp (chest pain), slope and target. The data are
then explored so that the information is verified. Next, we create a temporary variable and build
the logistic regression model. Logistic regression uses the sigmoid function, which maps a
weighted sum of the features to a probability between 0 and 1 that is then used to classify the
data. By using logistic regression and Naive Bayes, the accuracy rate increases.
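The logistic regression step described above can be illustrated with a small, dependency-free Python sketch. It is not the project's code: the features are assumed to be already scaled to [0, 1], the six training rows are made-up toy values (not taken from the dataset), and the learning rate and number of epochs are arbitrary assumptions. The sketch fits the weights by batch gradient descent on the log-loss, applies the sigmoid to obtain a probability, and summarises the predictions in a confusion matrix.

```python
import math


def sigmoid(z):
    # Maps any real number to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error (probability minus true label)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= threshold else 0


def confusion_matrix(actual, predicted):
    """Counts of true/false positives and negatives."""
    cm = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            cm["TP"] += 1
        elif a == 0 and p == 1:
            cm["FP"] += 1
        elif a == 1 and p == 0:
            cm["FN"] += 1
        else:
            cm["TN"] += 1
    return cm


if __name__ == "__main__":
    # Hypothetical, pre-scaled toy rows: [age_scaled, cp_scaled]
    rows = [[0.2, 0.0], [0.3, 0.1], [0.25, 0.2], [0.7, 0.9], [0.8, 0.7], [0.9, 1.0]]
    labels = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic_regression(rows, labels)
    print(confusion_matrix(labels, [predict(w, b, x) for x in rows]))
```

In practice the confusion matrix would be computed on a held-out test split rather than on the training rows, so that the comparison between models is not optimistic.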

3.4 Advantages

• The disease can be analysed easily.
• Users can seek a doctor's help at any time.
• Reduces the time doctors spend on diagnosis.
• Cost-effective for patients.
• Very useful in emergencies.
• Users can get an instant diagnosis.

CHAPTER 4
System specification

4.1 Hardware requirements

• RAM : 2 - 4 GB
• HARD DISK : 500 GB
• SYSTEM : INTEL CORE I3, I5, I7

4.2 Software requirements

• Operating System : Windows 7, 8, 10
• Programming Language : Python
• Front End : HTML, CSS
• Back End : Python
• Data Base : MySQL

CHAPTER 5
Implementation

5.1 Modules
• User Module
• Admin Module
• Disease Analysis Module
• Disease Prediction Module

5.2 Modules description

5.2.1 User module

In this module, the patient acts as the user.

1. Login

This is the first page that opens on the website. To log in to the app, the user needs to provide
the correct contact number and the password entered during registration. If the information
provided matches the data in the database table, the user is logged in successfully; otherwise a
"login failed" message is displayed and the user needs to re-enter the correct information. A link
to the registration page is also provided for new users.

2. Registration

A new user who wants to access the website needs to register before logging in. Clicking the
register button on the login page opens the registration page. A new user registers by entering
their full name, password and contact number, and must enter the password again in the confirm
password textbox for confirmation. When the user has filled in all the textboxes and clicks the
register button, the data is transferred to the database and the user is directed to the login page
again. The registered user then needs to log in to access the app.

Validations are applied to all the textboxes for proper functioning of the app. Information is
mandatory in each textbox: none of the name, contact, password or confirm password fields may
be empty during registration. If any textbox is empty, the app displays a message stating that
information is required in each textbox. The password and confirm password fields must also
match for successful registration. Another validation is that the contact number must be a valid
10-digit number. If any of these validations is violated, registration is unsuccessful and the user
needs to register again. If all the information is correct, the user is directed to the login page to
log in to the app.
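The registration and login rules described in this module can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for the MySQL back end, and the table name `users`, its columns and all function names are hypothetical. As an added assumption that goes beyond the description above, it stores a salted PBKDF2 hash instead of the plain password.

```python
import hashlib
import hmac
import os
import re
import sqlite3


def hash_password(password, salt):
    # Salted PBKDF2-SHA256 hash; the iteration count is an illustrative choice
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def validate_registration(name, contact, password, confirm):
    """Return an error message, or None if every validation rule passes."""
    if not all(field.strip() for field in (name, contact, password, confirm)):
        return "Information is required in each field."
    if password != confirm:
        return "Password and confirm password do not match."
    if not re.fullmatch(r"[0-9]{10}", contact):
        return "Contact number must be exactly 10 digits."
    return None


def register(db, name, contact, password, confirm):
    error = validate_registration(name, contact, password, confirm)
    if error:
        return error
    salt = os.urandom(16)
    try:
        db.execute(
            "INSERT INTO users (name, contact, salt, pw_hash) VALUES (?, ?, ?, ?)",
            (name, contact, salt, hash_password(password, salt)),
        )
    except sqlite3.IntegrityError:
        return "This contact number is already registered."
    db.commit()
    return None


def login(db, contact, password):
    """True only if the contact number and password match a registered user."""
    row = db.execute(
        "SELECT salt, pw_hash FROM users WHERE contact = ?", (contact,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison of the recomputed and stored hashes
    return hmac.compare_digest(hash_password(password, salt), stored)


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users (name TEXT, contact TEXT UNIQUE, salt BLOB, pw_hash BLOB)"
)
```

For example, `register(db, "Asha", "9876543210", "secret1", "secret1")` returns `None` (success), after which `login(db, "9876543210", "secret1")` returns `True`; any failed validation returns the corresponding message instead.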

5.2.2 Admin module

In this module, the admin can add and view new doctor details, disease details and drug details.
The admin can also view the feedback provided by users.

5.2.3 Disease analysis module

In this module, we analyse the disease and calculate the probability that the patient has it.

5.2.4 Disease prediction module

The patient specifies the symptoms of his or her illness. The system asks certain questions
regarding the illness, predicts the disease based on the symptoms specified by the patient, and
suggests doctors based on the disease.

5.3 Methodology
5.3.1 Naive Bayes Algorithm

It is a classification technique based on Bayes' Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or on the existence of the other features,
all of these properties are treated as contributing independently to the probability that the fruit
is an apple, which is why the method is called 'Naive'.

The Naive Bayes model is easy to build and particularly useful for very large data sets. Along
with its simplicity, Naive Bayes can sometimes outperform even highly sophisticated
classification methods.

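Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x)
and P(x|c):

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

Under the naive independence assumption, for a predictor vector X = (x_1, ..., x_n) this becomes

$$P(c \mid X) \propto P(x_1 \mid c)\,P(x_2 \mid c) \cdots P(x_n \mid c)\,P(c)$$

Here,
• P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.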

How does the Naive Bayes algorithm work?

Let's understand it using an example. Consider a training data set of weather conditions and the
corresponding target variable 'Play' (indicating whether playing is possible). We need to classify
whether players will play or not based on the weather condition. The following steps are
performed.

Step 1: Convert the data set into a frequency table.

Step 2: Create a likelihood table by finding the probabilities, e.g. Overcast probability = 0.29
and probability of playing = 0.64.

Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class.
The class with the highest posterior probability is the outcome of prediction.

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method of posterior probability discussed above.

P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36 and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = (3/9 * 9/14) / (5/14) = 3/5 = 0.60. By the same method, P(No | Sunny) =
(2/5 * 5/14) / (5/14) = 0.40, so "Yes" has the higher posterior probability and the statement is
correct.
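The worked example above can be checked with exact fractions in Python. The counts used below (14 days, 9 "Yes" days, 5 "Sunny" days, 3 of which are "Yes") are the ones quoted in the text; the "No" counts are derived from them. The helper name `posterior` is hypothetical.

```python
from fractions import Fraction


def posterior(count_feature_and_class, count_class, count_feature, total):
    """P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    likelihood = Fraction(count_feature_and_class, count_class)  # P(x|c)
    prior = Fraction(count_class, total)                          # P(c)
    evidence = Fraction(count_feature, total)                     # P(x)
    return likelihood * prior / evidence


TOTAL_DAYS = 14
YES_DAYS = 9
SUNNY_DAYS = 5
SUNNY_AND_YES = 3

p_yes = posterior(SUNNY_AND_YES, YES_DAYS, SUNNY_DAYS, TOTAL_DAYS)
# "No" counts derived from the totals: 5 "No" days, 2 of them sunny
p_no = posterior(SUNNY_DAYS - SUNNY_AND_YES, TOTAL_DAYS - YES_DAYS,
                 SUNNY_DAYS, TOTAL_DAYS)
print(p_yes, float(p_yes))  # 3/5 0.6
print(p_no, float(p_no))    # 2/5 0.4
```

Using exact fractions avoids the small discrepancy that appears when the rounded values 0.33, 0.64 and 0.36 are multiplied directly (which gives about 0.59 rather than 0.60).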

Naive Bayes uses a similar method to predict the probability of different classes based on various
attributes. This algorithm is mostly used in text classification and in problems with multiple
classes.

What are the pros and cons of Naive Bayes?

Pros:

• It is easy and fast to predict the class of a test data set. It also performs well in multi-class
prediction.
• When the assumption of independence holds, a Naive Bayes classifier performs better
than other models such as logistic regression, and it needs less training data.
• It performs well with categorical input variables compared to numerical variables.

Cons:

• If a categorical variable has a category in the test data set that was not observed in the
training data set, the model will assign it a zero probability and will be unable to make a
prediction. This is often known as the "zero frequency" problem. To solve it, a smoothing
technique can be used; one of the simplest smoothing techniques is Laplace estimation.

• Naive Bayes is also known to be a poor probability estimator, so the probability outputs
from predict_proba should not be taken too seriously.

• Another limitation of Naive Bayes is the assumption of independent predictors. In real
life, it is almost impossible to get a set of predictors that are completely independent.
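A compact categorical Naive Bayes classifier with Laplace (add-one) smoothing shows how the "zero frequency" problem in the first point above is avoided. This is a from-scratch sketch for illustration, not the project's code; the class name and the toy symptom records in the example are hypothetical. Log-probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow when there are many features.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1 gives Laplace (add-one) smoothing

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[(feature_index, class)][value] = frequency in the training data
        self.counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen for each feature
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[(j, label)][value] += 1
                self.values[j].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log P(c | row) for every class c."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.total)  # log prior
            for j, value in enumerate(row):
                k = len(self.values[j])  # number of categories of feature j
                # Smoothed likelihood: an unseen value still gets alpha / (n_c + alpha*k)
                num = self.counts[(j, c)][value] + self.alpha
                den = n_c + self.alpha * k
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)
```

For instance, after fitting on hypothetical rows such as `("typical", "yes")` labelled 1 and `("none", "no")` labelled 0, a query containing a chest-pain value never seen in training still receives a finite score for every class instead of a zero probability.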

4 Applications of Naive Bayes Algorithms

• Real-time prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it can
be used for making predictions in real time.
• Multi-class prediction: This algorithm is also well known for multi-class prediction; it can
predict the probability of each of multiple classes of the target variable.
• Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are widely
used in text classification (due to their good results in multi-class problems and the
independence rule) and often achieve a high success rate compared to other algorithms. As a
result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis
(in social media analysis, to identify positive and negative customer sentiments).
• Recommendation system: A Naive Bayes classifier and collaborative filtering together
build a recommendation system that uses machine learning and data mining techniques
to filter unseen information and predict whether a user would like a given resource or not.

5.3.2 Decision Tree Algorithm

A decision tree is a classification algorithm that works on categorical as well as numerical data
and builds a tree-like structure. Decision trees are simple and widely used to handle medical
datasets, and they are easy to implement and to analyse as a tree-shaped graph. The decision tree
model is built from three types of node:

• Root node: the top node, from which all other nodes branch.
• Interior node: tests one of the attributes.
• Leaf node: represents the outcome (class label) reached at the end of a path of tests.

The algorithm splits the data into two or more homogeneous sets based on the most significant
predictors. The entropy of each attribute is calculated and the data are then divided using the
predictor with the maximum information gain (equivalently, the minimum weighted entropy
after the split):

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where p_i is the proportion of the examples in S that belong to class i, and S_v is the subset of S
for which attribute A has value v.

The results obtained are easy to read and interpret [3]. This algorithm can achieve higher
accuracy than other algorithms because it analyses the dataset as a tree-like graph. However, the
tree may overfit the data, and only one attribute is tested at a time for decision-making.
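The entropy and information-gain formulas above translate directly into Python. This is a minimal sketch for illustration, not the project's implementation; the rows are represented as dictionaries, and the attribute names `cp` and `fbs` and the toy records in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        # Group the class labels by the value this attribute takes (S_v)
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(
        len(subset) / total * entropy(subset) for subset in subsets.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    # The tree splits on the attribute with the maximum information gain
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

With the 9 "Yes" / 5 "No" split from the Naive Bayes example, `entropy` gives about 0.940 bits. A full tree would apply `best_attribute` recursively to each subset until the subsets are pure or no attributes remain.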

5.3.3 K-Nearest Neighbor (KNN)

K-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN algorithm is
among the simplest of all machine learning algorithms. The neighbors are taken from a set of
objects for which the class (for k-NN classification) or the object property value (for k-NN
regression) is known.

STEP 1: BEGIN

STEP 2: Input: D = {(x1, c1), ..., (xN, cN)}

STEP 3: x = (x1, ..., xn) is the new instance to be classified

STEP 4: FOR each labelled instance (xi, ci), calculate d(xi, x)

STEP 5: Order d(xi, x) from lowest to highest (i = 1, ..., N)

STEP 6: Select the K nearest instances to x: Dkx

STEP 7: Assign to x the most frequent class in Dkx

STEP 8: END
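The steps above can be sketched in Python as follows. This is an illustrative version, not the project's code: it assumes Euclidean distance for d(xi, x) (via `math.dist`, available from Python 3.8) and a simple majority vote, and the toy points in the example are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, x, k=3):
    """training is D: a list of (feature_vector, class_label) pairs."""
    # Steps 4-5: compute d(xi, x) for every labelled instance and sort ascending
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], x))
    # Step 6: keep the k nearest instances (Dkx)
    nearest = by_distance[:k]
    # Step 7: assign the most frequent class among them
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [((1.0, 1.0), "no"), ((1.5, 2.0), "no"), ((1.2, 0.8), "no"),
                ((6.0, 6.5), "yes"), ((7.0, 6.0), "yes"), ((6.5, 7.2), "yes")]
    print(knn_classify(training, (1.1, 1.3)))  # no
    print(knn_classify(training, (6.8, 6.6)))  # yes
```

Because KNN relies on raw distances, features such as age and cholesterol should be scaled to comparable ranges first; choosing an odd k also avoids ties in two-class problems.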

5.3.4 Support Vector Machine

Support Vector Machines (SVMs) are a supervised machine learning technique with strong
theoretical foundations and excellent empirical success. The SVM formulation includes a
constraint that makes the total weight (sum of the Lagrange multipliers) assigned to the positive
class equal to that assigned to the negative class. This technique has been applied to different
classification tasks, such as text classification and object recognition, as well as to prediction tasks.
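A linear SVM looks for a separating hyperplane w·x + b = 0 with a large margin by minimising the regularised hinge loss. As a minimal sketch, the code below trains one with a Pegasos-style stochastic sub-gradient method. This is a swapped-in training procedure chosen for brevity, not the constrained formulation described above and not the project's own implementation. The bias is folded into the weight vector by appending a constant 1 to each input (so it is regularised too, a simplification), labels must be +1 or -1, and `lam`, `iterations`, `seed` and the toy points are assumptions.

```python
import random


def train_linear_svm(points, labels, lam=0.01, iterations=2000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the hinge loss.

    labels must be +1 or -1. The last component of the returned
    weight vector acts as the bias.
    """
    rng = random.Random(seed)
    # Append a constant 1 so the bias is learned as an ordinary weight
    data = [(list(p) + [1.0], y) for p, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink w (sub-gradient of the regulariser) ...
        w = [(1 - eta * lam) * wi for wi in w]
        # ... and, if the hinge loss is active, step towards the example
        if margin < 1:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, point):
    score = sum(wi * xi for wi, xi in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
    labels = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(points, labels)
    print([svm_predict(w, p) for p in points])
```

For data that are not linearly separable, the same hinge-loss objective still applies, but a non-linear kernel (usually with a library solver) is typically needed for good accuracy.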


CHAPTER 6
System design
6.1 System Architecture
Fig no: 6.1 System Architecture

6.2 Data flow diagram

A Data Flow Diagram (DFD) is a two-dimensional diagram that describes how data is
processed and transmitted in a system. The graphical depiction identifies each source of data
and how it interacts with other data sources to reach a common output. To draft a data flow
diagram, one must:
• Identify external inputs and outputs
• Determine how the inputs and outputs relate to each other
• Explain with graphics how these connections relate and what they result in.

Role of DFD:
• It is a documentation aid that is understood by both programmers and non-programmers,
since a logical DFD specifies only what processes are performed, not how they are performed.
• A physical DFD specifies where the data flows and who processes the data.
• It permits the analyst to isolate areas of interest in the organization and study them by
examining the data that enter a process and observing how they are altered when they
leave.

Fig no : 6.2 Data flow diagram


6.3 Uml diagrams

UML is simply anther graphical representation of a common semanticmodel. UML provides a


comprehensive notation for the full lifecycle of object-oriented development.


Advantages
• To represent complete systems (instead of only the software portion) using object-oriented
concepts
• To establish an explicit coupling between concepts and executable code
• To take into account the scaling factors that are inherent to complex and critical systems
• To create a modeling language usable by both humans and machines

UML defines several models for representing systems:
• The class model captures the static structure.
• The state model expresses the dynamic behavior of objects.
• The use case model describes the requirements of the user.
• The interaction model represents the scenarios and message flows.
• The implementation model shows the work units.
• The deployment model provides details that pertain to process allocation.

6.4 Use case diagram

Use case diagrams give an overview of the usage requirements for a system. They are useful for
presentations to management and/or project stakeholders, but for actual development you will
find that use cases provide significantly more value because they describe “the meat” of the
actual requirements. A use case describes a sequence of actions that provides something of
measurable value to an actor and is drawn as a horizontal ellipse.

[Figure: use case diagram with the actors User and Server and the use cases User Registration, Server Deployment, Gives Symptoms, Analysis of Patient Health, Analysis of Questionnaires, Disease Analysis, Suggest Best Drug, and Doctor Appointment for Corresponding Disease]

Fig no : 6.4 Use case diagram

6.5 Class diagram

This class diagram represents how the classes, with their attributes and methods, are linked
together to perform verification securely. The diagram shows the various classes involved in
our project.

[Figure: class diagram with the classes Server (query; accept()), UserLogin (Name, Password; login()), Registration (Name, Password, Email, mobileno), Questions (Question1, Question2, Question3) and HealthAnalyzes (disease name, Username), together with the methods findmatch() and analyzes()]

Fig no : 6.5 Class diagram

6.6 Activity diagram

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. Activity diagrams can be used to describe
the business and operational step-by-step workflows of components in a system. An activity
diagram consists of an initial node, an activity final node, and the activities in between.

[Figure: activity diagram with the activities Login, Disease, Server, Analysis of Patient Health Condition, Formulation of Questions, Evidence Gathering and Resultant Output]

Fig no : 6.6 Activity diagram

6.7 Sequence diagram

Sequence diagrams model the flow of logic within your system in a visual manner, enabling you
both to document and validate your logic, and are commonly used for both analysis and design
purposes. Sequence diagrams are the most popular UML artifact for dynamic modeling, which
focuses on identifying the behavior within your system.

[Figure: sequence diagram with the lifelines User, Server Deployment, Analysis of Patient Health, Analysis of Questions and Analysis, exchanging the messages:
1: User Registration
2: Creation of Network
3: Gives symptoms
4: Check patient health
5: Analyse disease
6: Response to user
7: Best drug
8: Recommend the best doctor based on user feedback and fix appointment]

Fig no : 6.7 Sequence diagram

6.8 Collaboration diagram

A collaboration diagram, also called a communication diagram or interaction diagram, is an
illustration of the relationships and interactions among software objects in the Unified Modelling
Language (UML). The concept is more than a decade old, although it has been refined as
modelling paradigms have evolved.

[Figure: collaboration diagram linking the objects User, Server Deployment, Analysis, Analysis of Patient Health and Analysis of Question through the numbered messages 1: User Registration, 2: Creation of Network, 3: Gives symptoms, 4: Check patient health, 5: Analyse disease, 6: Response to user, 7: Best drug, 8: Recommend the best doctor based on user feedback and fix appointment]

Fig no : 6.8 Collaboration diagram

CHAPTER 7
Screenshots
User Login page

Fig no : 7.1 Login page

User home page

Fig no : 7.2 User home page

User Details

Fig no : 7.3 User Details

Prediction

Fig no : 7.4 Prediction

Detection

Fig no : 7.5 Detection

Danger Analysis

Fig no : 7.6 Danger Analysis

Danger Detection

Fig no : 7.7 Danger detection

Graph

Fig no : 7.8 Graph

CHAPTER 8

Conclusion

The main motivation of this project is to provide insight into detecting and curing heart
disease using data mining techniques. For data mining, data were collected from Jubilee
Mission Hospital, Thrissur. Data were gathered by interacting with patients one-to-one and
noting down their details; the other source of data was the discharge summaries of the
respective patients. In this way, 20 attributes were collected for approximately 2,200 patients.
The collected data were then sorted and arranged systematically in Excel format so that they
could be subjected to different data mining algorithms. From the medical profiles, twenty
attributes such as age, sex, blood pressure and blood sugar are extracted to predict the
likelihood of a patient getting heart disease. These attributes are fed into the Decision Tree,
Random Forest, KNN and Naive Bayes classification algorithms, of which Naive Bayes gave
the best result, with the highest accuracy. Naive Bayes thus achieves valid performance in
diagnosing heart disease, and this can be further improved by increasing the number of
attributes.
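The comparison described above can be outlined with scikit-learn. The Jubilee Mission Hospital data are not available here, so this sketch uses a synthetic table of 2,200 rows and 20 attributes from `make_classification`; the model settings (and the choice of `GaussianNB` as the Naive Bayes variant, suited to continuous attributes) are assumptions, and the scores it prints say nothing about which algorithm wins on the real data. What it does show is a fair protocol: every classifier is evaluated on the same stratified folds and reported as mean accuracy plus its spread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder: 2,200 patients x 20 attributes (NOT the hospital data)
X, y = make_classification(n_samples=2200, n_features=20, n_informative=8,
                           random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # KNN is distance based, so its attributes are scaled first
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

# The same stratified folds for every model keep the scores comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:13s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Replacing `scoring="accuracy"` with `scoring="roc_auc"` ranks the classifiers by the area under the ROC curve, the measure discussed by Fawcett [15], which does not depend on a single decision threshold.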

CHAPTER 9
Future Enhancement

In this report we have presented an efficient approach for fragmenting and extracting
significant patterns from heart attack data warehouses for the efficient prediction of
heart attack. In future work, we plan to conduct experiments on large real-time health
datasets to predict diseases such as heart attack and to compare the performance of our
algorithm with other related data mining algorithms.

CHAPTER 10
REFERENCES

[1] Babu, Sarath. "Heart disease diagnosis using data mining technique." Electronics,
Communication and Aerospace Technology (ICECA), 2017 International Conference of. Vol. 1.
IEEE, 2017.
[2] Banu, M. A. Nishara, and B. Gomathy. "Disease forecasting system using data mining
methods." Intelligent Computing Applications (ICICA), 2014 International Conference on.
IEEE, 2014.
[3] Krishnaiah, V. "Diagnosis of heart disease patients using fuzzy classification technique."
Computer and Communications Technologies (ICCCT), 2014 International Conference on.
IEEE, 2014.
[4] Gandhi, Monika, and Shailendra Narayan Singh. "Predictions in heart disease using
techniques of data mining." Futuristic Trends on Computational Analysis and Knowledge
Management (ABLAZE), 2015 International Conference on. IEEE, 2015.
[5] Purusothaman, G., and P. Krishnakumari. "A survey of data mining techniques on risk
prediction: Heart disease." Indian Journal of Science and Technology 8.12 (2015).
[6] Thomas, J., and R. Theresa Princy. "Human heart disease prediction system using data
mining techniques." Circuit, Power and Computing Technologies (ICCPCT), 2016 International
Conference on. IEEE, 2016.
[7] Banu, N. K. Salma, and Suma Swamy. "Prediction of heart disease at early stage using data
mining and big data analytics: A survey." Electrical, Electronics, Communication, Computer and
Optimization Techniques (ICEECCOT), 2016 International Conference on. IEEE, 2016.
[8] Thanigaivel, R., and K. Ramesh Kumar. "Boosted Apriori: an Effective Data Mining
Association Rules for Heart Disease Prediction System." Middle-East Journal of Scientific
Research 24.1 (2016): 192–200.
[9] Saboji, Rashmi G. "A scalable solution for heart disease prediction using classification
mining technique." 2017 International Conference on Energy, Communication, Data Analytics
and Soft Computing (ICECDS). IEEE, 2017.
[10] Sowmiya, C., and P. Sumitra. "Analytical study of heart disease diagnosis using
classification techniques." Intelligent Techniques in Control, Optimization and Signal Processing
(INCOS), 2017 IEEE International Conference. IEEE, 2017.
[11] Khemmarat, S., and L. Gao. "Supporting drug prescription via predictive and
personalized query system." PervasiveHealth. IEEE, 2015.
[12] Knox, C., et al. "DrugBank 3.0: a comprehensive resource for omics research on drugs."
Nucleic Acids Research 39.suppl 1 (2011): D1035–D1041.
[13] Kuhn, M., et al. "A side effect resource to capture phenotypic effects of drugs."
Molecular Systems Biology 6.1 (2010): 343.
[14] Kanehisa, M., and S. Goto. "KEGG: Kyoto Encyclopedia of Genes and Genomes." Nucleic
Acids Research 28.1 (2000): 27–30.
[15] Fawcett, T. "An introduction to ROC analysis." Pattern Recognition Letters 27.8 (2006):
861–874.

Sample Code:

import gc
from django.shortcuts import render
from django.contrib import messages
# Create your views here.
from users.forms import UserRegistrationForm, HeartDataForm
from users.models import UserRegistrationModel, HeartDataModel
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from django_pandas.io import read_frame
#%matplotlib inline
from sklearn.model_selection import train_test_split
import os
#print(os.listdir())
import warnings
from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage


def UserLogin(request):
    return render(request, 'UserLogin.html', {})


def UserRegisterAction(request):
    if request.method == 'POST':
        form = UserRegistrationForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            messages.success(request, 'You have been successfully registered')
            # return HttpResponseRedirect('./CustLogin')
            form = UserRegistrationForm()  # show an empty form after saving
            return render(request, 'Register.html', {'form': form})
        else:
            # Fall through and re-render the bound form with its errors
            print("Invalid form")
    else:
        form = UserRegistrationForm()
    return render(request, 'Register.html', {'form': form})


def UserLoginCheck(request):
    if request.method == "POST":
        loginid = request.POST.get('loginname')
        pswd = request.POST.get('pswd')
        # Do not echo the password to the console or logs
        print("Login ID = ", loginid)
        try:
            check = UserRegistrationModel.objects.get(loginid=loginid, password=pswd)
            status = check.status
            print('Status is = ', status)
            if status == "activated":
                # Keep the logged-in user's details in the session
                request.session['id'] = check.id
                request.session['loggeduser'] = check.name
                request.session['loginid'] = loginid
                request.session['email'] = check.email
                print("User id At", check.id, status)
                return render(request, 'users/UserHomePage.html', {})
            else:
                messages.error(request, 'Your account is not activated')
                return render(request, 'UserLogin.html')
        except UserRegistrationModel.DoesNotExist:
            # No user matches this login ID and password
            messages.error(request, 'Invalid login ID or password')
    return render(request, 'UserLogin.html', {})


def UserAddData(request):
    if request.method == 'POST':
        form = HeartDataForm(request.POST)
        if form.is_valid():
            print('Data is Valid')
            form.save()
            messages.success(request, 'Data added successfully')
            # return HttpResponseRedirect('./CustLogin')
            form = HeartDataForm()  # show an empty form after saving
            return render(request, 'users/UserAddData.html', {'form': form})
        else:
            # Fall through and re-render the bound form with its errors
            print("Invalid form")
    else:
        form = HeartDataForm()
    return render(request, 'users/UserAddData.html', {'form': form})


def UserDataView(request):
    # Order the queryset so that pagination is deterministic
    data_list = HeartDataModel.objects.all().order_by('pk')
    page = request.GET.get('page', 1)
    paginator = Paginator(data_list, 10)  # 10 records per page
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)  # non-numeric page: show the first page
    except EmptyPage:
        users = paginator.page(paginator.num_pages)  # out of range: show the last page
    return render(request, 'users/DataView_list.html', {'users': users})


def UserMachineLearning(request):
    #gc.collect()
