
Internship Report DiabetesPrediction

This document summarizes an internship project report on diabetes prediction. It was submitted by two interns, Dhanya G and Jennifer Serrao, at Zephyr Technologies. The report describes building models using K-Nearest Neighbors, logistic regression, and neural networks to predict diabetes using patient parameters. It finds that neural networks produced the most accurate predictions with minimum error compared to actual data.

Uploaded by

Dhanya

INTERNSHIP PROJECT REPORT

ON

“DIABETES PREDICTION”

SUBMITTED BY

DHANYA G
(4SO18CS034)
JENNIFER SERRAO
(4SO18CS053)

AT

ZEPHYR TECHNOLOGIES

2nd FLOOR, OBERLE TOWERS
BALMATTA, MANGALURU
KARNATAKA-575002, INDIA
ACKNOWLEDGEMENT

This dissertation would not have been possible without the guidance and help of
several individuals and organizations who, in one way or another, contributed and
extended their valuable assistance during this internship project.

We would like to express our gratitude to the employees of Zephyr Technologies &
Solutions Pvt. Ltd, under whom we executed this project, for providing this
internship opportunity. Their constant guidance and willingness to share their vast
knowledge helped us understand this project and its manifestations in great depth
and complete the assigned tasks.

We would like to extend our special thanks to Mr. Vedanth Shenoy, Faculty at Zephyr
Technologies, for his constant guidance throughout this internship.

Finally, we would like to thank our families and friends for their blessings, for
helping us in all aspects, for encouraging us to devote our time to the work during
the internship period, and for lending a helping hand in successfully completing the
project.

DHANYA G
(4SO18CS034)
JENNIFER SERRAO
(4SO18CS053)
ABOUT THE COMPANY

ZEPHYR TECHNOLOGIES & SOLUTIONS PVT. LTD is a software company
delivering high-quality, cost-effective, reliable, result-oriented web and e-commerce
solutions on time for a global clientele. Professionalism, skill and expertise are the tools
they use to make the web work for a client's business, bringing in maximum return on
investment in the shortest possible time. They have delivered IT projects of varying
complexity for very demanding internet clients spread across the globe.
They develop unique web solutions which ensure increased efficiency and
competitive advantage for the client's business and thus for its end users.

Their professionalism, skills and expertise translate into quality work at every
step of any project they undertake. They work towards getting
better than the best out of every team member at ZEPHYR TECHNOLOGIES,
which means that when a client hires them, all-round quality is assured. Their
quality advantage includes protection of intellectual property for the source code developed
specifically for the client's business. They do not sell source code to third parties, and
all elements that they create for a client's web solution belong to the client. ZEPHYR
TECHNOLOGIES project managers and business analysts place great value on
building a clean communication link with the client, as they consider it the key ingredient for
the success of any project at hand.
ABSTRACT

Diabetes mellitus is a chronic disease characterized by hyperglycemia. It may cause
many complications. Given the growing morbidity in recent years, the number of
diabetic patients worldwide is projected to reach 642 million by 2040, which means that
one in ten adults will be suffering from diabetes.
There is no doubt that this alarming figure needs great attention. With its rapid
development, machine learning has been applied to many aspects of
medical health.

In this study, we used K-Nearest Neighbour (KNN), Logistic Regression (LR) and a
neural network to predict diabetes mellitus. The trained models are tested and then validated
by comparing actual and predicted data. Neural networks with
different algorithms and functions were trained on diabetic parameters, and the
outcome was predicted. After training and testing, the results were compared
to check the efficiency of the system. The predictions obtained after training and
testing are quite accurate, and the comparison shows that the predicted data closely
match the actual data across the different diabetic parameters, with minimum error
observed.
DIABETES PREDICTION

TABLE OF CONTENTS

1. Introduction
2. System Design
    2.1 Functional Requirements
    2.2 Non-Functional Requirements
    2.3 System Requirements
3. Implementation Details
4. Methodology
5. Results
6. Conclusion


CHAPTER 1
INTRODUCTION

Diabetes is a common chronic disease and poses a great threat to human health. The
characteristic of diabetes is that the blood glucose is higher than the normal level, which
is caused by defective insulin secretion or its impaired biological effects, or both.
Diabetes can lead to chronic damage and dysfunction of various tissues, especially eyes,
kidneys, heart, blood vessels and nerves. Diabetes can be divided into two categories,
type 1 diabetes (T1D) and type 2 diabetes (T2D). Patients with type 1 diabetes are
normally younger, mostly less than 30 years old. The typical clinical symptoms are
increased thirst, frequent urination and high blood glucose levels. This type of diabetes
cannot be treated effectively with oral medications alone, and patients require
insulin therapy. Type 2 diabetes occurs more commonly in middle-aged and elderly
people, which is often associated with the occurrence of obesity, hypertension,
dyslipidemia, arteriosclerosis, and other diseases.

Recently, numerous algorithms have been used to predict diabetes, including traditional
machine learning methods such as support vector machine (SVM), decision tree (DT),
logistic regression and so on. Machine learning methods are widely used in predicting
diabetes, and they achieve good results. Decision tree is one of the popular machine
learning methods in the medical field and has good classification power. Random
forest generates many decision trees. Neural network is a recently popular machine
learning method, which performs well in many respects. So in this study, we
used K-Nearest Neighbour (KNN), logistic regression (LR) and a neural network to
predict diabetes.


CHAPTER 2
SYSTEM DESIGN

2.1 Functional Requirements


• The model should be able to give accurate and trustworthy predictions.
• The application must show graphical visualization of the predicted results and
data in general.
• The user should be able to enter the input values for prediction.

2.2 Non-Functional Requirements


Non-functional requirements describe how a system should behave and what
constraints are placed on its functionality. They generally specify the system’s quality
attributes or characteristics.
• Availability: The system should be available whenever a prediction is
needed.
• Correctness: The accuracy of the system should be as high as possible for
better prediction.
• Maintainability: The system should maintain a correct history of records.
• Usability: The system should satisfy as many user needs as
possible.


2.3 System Requirements

• Software Requirements:
    • Language: Python
    • Operating Systems: Windows, Linux, etc.
    • Back-End Software: Anaconda, Jupyter Notebook

• Hardware Requirements:
    • CPU: Intel Pentium IV 600MHz
    • Hard disk space: 20 GB or more
    • Memory: 4 GB RAM


CHAPTER 3
IMPLEMENTATION DETAILS

Algorithms
Machine learning is the ability of a system to automatically learn and improve from
experience without being explicitly programmed; it focuses on the development
of computer programs that can access data and use it to learn by themselves. A
classifier is an algorithm that implements classification,
especially in a concrete implementation; the term also refers to the mathematical function,
implemented by the algorithm, that maps input data to a category. Classification is an
instance of supervised learning, i.e., learning where a training set of correctly identified
observations is available.

A. K-Nearest Neighbor Classifier:

KNN is a supervised machine learning algorithm that can be used for both
classification and regression problems. KNN is a lazy learning technique: it
stores all the training records and defers computation until a prediction is
requested. KNN assumes that similar things are near to each other, since data
points which are similar usually lie close together. KNN classifies a new data
point based on a similarity measure. To find the nearest points efficiently, an
implementation may use a tree-like structure. To make a prediction for a new data
point, the algorithm finds the closest data points in the training data set, its
nearest neighbors.

Here K is the number of nearby neighbors; it is always a positive integer. The
predicted value is chosen from the set of classes of the neighbors. Closeness is
mainly defined in terms of Euclidean distance. The Euclidean distance between two
points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) is defined by the following
equation:

d(P, Q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
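
As a minimal sketch of the Euclidean distance between two points P and Q, the calculation can be written in plain Python as below. The function name euclidean_distance and the sample points are assumptions made for illustration; they do not come from the project code.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two equal-length points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Example: the distance between (0, 0) and (3, 4) is 5.0 (a 3-4-5 triangle).
print(euclidean_distance((0, 0), (3, 4)))
```

A smaller distance means the two points are more similar, which is what KNN relies on when it looks for neighbors.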


Algorithm:

a. Take a sample dataset of columns and rows (here, the diabetes data
set).

b. Take a test dataset of attributes and rows.

c. Find the Euclidean distance between each test row and every sample row
using the formula.

d. Then choose a value of K, the number of nearest neighbors.

e. Then, using these distances, find the K sample rows with the minimum
Euclidean distance and read off the class label of each.

f. Assign the test row the output value that occurs most often among them.
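
The steps above can be sketched in plain Python as follows. This is a minimal illustration and not the code used in the project: the function name knn_predict, the toy points and the labels "A" and "B" are assumptions made for the example, and a majority vote is assumed as the way of turning the K nearest labels into a prediction.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k training points with the smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per sample, two categories.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))  # A
print(knn_predict(points, labels, (6.0, 6.5), k=3))  # B
```

Because every prediction scans all training points, the cost grows with the size of the training set, which is the computation cost noted among the disadvantages of KNN.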

For example: Suppose there are two categories, Category A and
Category B, and we have a new data point x1. In which of these categories
does this data point lie? To solve this type of problem, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class
of a particular data point. Consider the diagram below:

Advantages Of KNN Algorithm

• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.

Disadvantages Of KNN Algorithm

• The value of K always needs to be determined, which may be complex at
times.
• The computation cost is high because the distance between the new data
point and all the training samples must be calculated.

B. Logistic Regression

Logistic Regression is a supervised classification method that returns the
probability of a binary dependent variable predicted from the
independent variables of the dataset, i.e., logistic regression predicts the
probability of an outcome which has two values: zero or one, no or yes,
false or true. Logistic regression has similarities to linear regression, but
whereas linear regression produces a straight line, logistic regression produces a
curve. The prediction is based on one or several predictors (independent
variables), and logistic regression produces logistic curves, which plot
values between zero and one. Logistic Regression is a regression model
in which the dependent variable is categorical, and it analyses the relationship
between that variable and multiple independent variables. There are several types
of logistic regression model, such as binary logistic models, multiple logistic
models and binomial logistic models. The binary logistic regression model is used to
estimate the probability of a binary response based on one or more predictors.

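p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))

The above equation represents logistic regression in mathematical form, where p is
the predicted probability, x1, ..., xn are the independent variables and b0, ..., bn
are the model coefficients.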
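
As a minimal sketch of how logistic regression turns a weighted sum of predictors into a probability between zero and one, the standard logistic (sigmoid) function can be coded as below. The function names, the coefficient values and the 0.5 decision threshold are assumptions chosen for illustration; the coefficients are not fitted to any real data.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class (1) for one sample."""
    # Weighted sum of the predictors plus the intercept term.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two predictors (not fitted to real data).
weights = [0.8, -0.4]
bias = -0.2

p = predict_probability([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # assumed threshold of 0.5
print(round(p, 4), label)
```

A weighted sum of exactly zero gives a probability of 0.5, so the sign of the weighted sum decides which side of the threshold a sample falls on.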


Advantages:
• Logistic regression is easy to implement and interpret, and very efficient
to train.
• It is very fast at classifying unknown records.

Disadvantages:
• If the number of observations is less than the number of features,
Logistic Regression should not be used; otherwise, it may lead to
overfitting.
• Logistic Regression requires little or no multicollinearity
between independent variables.


CHAPTER 4
METHODOLOGY

Step 1: Importing Libraries.

Step 2: Data Collection.

Step 3: Data Exploration (Analysis).

Step 4: Data Preparation (Cleaning).

Step 5: Experimenting and trying to get the best accuracy with three different
methods. (In our project, the methods we experimented with include K-NN and
Logistic Regression.) Select the method which gives the best accuracy.

Step 6: Training and Evaluating the Machine Learning Model.

Step 7: Interpreting the ML Model.

Step 8: Building Predictive System.
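
The core of Steps 5 and 6 (hold out test data, evaluate each candidate, keep the most accurate) can be sketched in plain Python as below. Everything here is a hypothetical illustration and not the project's code: the helper names, the two deliberately simple stand-in models, the 25% hold-out and the single-feature toy data are all assumptions.

```python
import random
from collections import Counter

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle sample indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Stand-in model 1: always predict the most common training label.
def fit_majority(x, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(state, row):
    return state

# Stand-in model 2: threshold at the midpoint between the two class means.
def fit_threshold(x, y):
    def class_mean(cls):
        return sum(r[0] for r, l in zip(x, y) if l == cls) / y.count(cls)
    return (class_mean(0) + class_mean(1)) / 2

def predict_threshold(state, row):
    return 1 if row[0] >= state else 0

models = {
    "majority": (fit_majority, predict_majority),
    "threshold": (fit_threshold, predict_threshold),
}

# Hypothetical single-feature data: low values labelled 0, high values labelled 1.
rows = [[v] for v in list(range(80, 125, 5)) + list(range(160, 205, 5))]
labels = [0] * 9 + [1] * 9

x_train, x_test, y_train, y_test = train_test_split(rows, labels)

# Fit each candidate on the training part and score it on the held-out part.
scores = {}
for name, (fit, predict) in models.items():
    state = fit(x_train, y_train)
    scores[name] = accuracy([predict(state, row) for row in x_test], y_test)

best = max(scores, key=scores.get)
print(best, scores[best])
```

Scoring on held-out rows rather than on the training rows is what makes the comparison between methods meaningful, since a model can fit its own training data well and still predict new patients poorly.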


CHAPTER 5

RESULTS

1. KNN
• Accuracy – 80.5194 %

2. Logistic Regression
• Accuracy : log_model - 81.1688 %
svc_model - 81.1688 %
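
Accuracy here is the share of test samples that were classified correctly. The report does not state the size of the test set, so the sketch below rests on an assumption: both reported figures are consistent with a hold-out of 154 samples (such as a 20% hold-out of a 768-row dataset), with 124 and 125 correct predictions respectively. These counts are inferred from the percentages and are not values taken from the project.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified samples."""
    return 100.0 * correct / total

# Assumed counts, inferred from the reported percentages (not stated in the report).
TEST_SAMPLES = 154

knn_accuracy = accuracy_percent(124, TEST_SAMPLES)  # roughly 80.519
lr_accuracy = accuracy_percent(125, TEST_SAMPLES)   # roughly 81.169

print(knn_accuracy, lr_accuracy)
```

Under this assumption, the gap between the two methods amounts to a single test sample, so the difference in accuracy should be read with caution.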


CHAPTER 6
CONCLUSION

Diabetes mellitus is growing to epidemic proportions, leading to devastating
complications if not treated well. There are many challenges in the successful
treatment of diabetes mellitus because of the personal and economic costs incurred
in diabetes therapy. Its long-term consequences translate into enormous human
suffering and economic costs. However, comprehensive diabetes care can delay
the progression of complications, maximize the quality of life, and minimize
healthcare expenditure.

In this study, systematic efforts were made to design a system that
predicts a disease such as diabetes. During this work, three machine learning
classification algorithms were studied and evaluated on various measures.
Experiments were performed on the Pima Indians Diabetes Database. Experimental
results show the adequacy of the designed system, with an achieved
accuracy of 80.5% using KNN and 81.2% using LR classification
algorithms. In future, the designed system with the machine learning
classification algorithms used can be applied to predict or diagnose other diseases. The
work can be extended and improved for the automation of diabetes analysis
by including some other machine learning algorithms.
