Internship Report DiabetesPrediction
ON
“DIABETES PREDICTION”
SUBMITTED BY
DHANYA G
(4SO18CS034)
JENNIFER SERRAO
(4SO18CS053)
AT
ZEPHYR TECHNOLOGIES
ACKNOWLEDGEMENT
This dissertation would not have been possible without the guidance and help of
several individuals and organizations who, in one way or another, contributed and
extended their valuable assistance during this internship project.
We would like to extend our special thanks to Mr. Vedanth Shenoy, Faculty at Zephyr
Technologies, for his constant guidance throughout this internship.
Finally, we would like to thank our families and friends for their blessings, for
helping us in every way, and for encouraging us to devote our time to the work during
the internship period, which helped us complete the project successfully.
DHANYA G
(4SO18CS034)
JENNIFER SERRAO
(4SO18CS053)
ABOUT THE COMPANY
Their tools are professionalism, skills and expertise, which translate into delivering
quality work at every step of any project they undertake. They work towards getting
better than the best out of every team member at ZEPHYR TECHNOLOGIES,
which means that when you hire them, all-round quality is assured, just as you want it.
Their Advantage Quality includes protection of intellectual property for the source code
developed specifically for your business. They do not sell the source code to third
parties, and all elements that they create for your web solution belong to you. ZEPHYR
TECHNOLOGIES project managers and business analysts place great value on
building a clear communication link with you, as they consider it the key ingredient for
the success of any project at hand.
ABSTRACT
In this study, we used K-Nearest Neighbour (KNN), Logistic Regression (LR) and a
neural network to predict diabetes mellitus. The trained models are tested and then
validated by comparing the actual and predicted data. Neural networks with
different algorithms and functions were trained on diabetic parameters, and the
outcome was predicted. After training and testing, the results were compared
to check the efficiency of the system. The predictions obtained after training and
testing are quite accurate: the comparison shows that the actual and predicted data
agree closely across the different diabetic parameters, with minimum error observed.
DIABETES PREDICTION
TABLE OF CONTENTS
1. Introduction
2. System Design
3. Implementation Details
4. Methodology
5. Results
6. Conclusion
CHAPTER 1
INTRODUCTION
Diabetes is a common chronic disease and poses a great threat to human health. The
characteristic of diabetes is a blood glucose level higher than normal, which
is caused by defective insulin secretion, impaired biological effects of insulin, or both.
Diabetes can lead to chronic damage and dysfunction of various tissues, especially the
eyes, kidneys, heart, blood vessels and nerves. Diabetes can be divided into two
categories: type 1 diabetes (T1D) and type 2 diabetes (T2D). Patients with type 1
diabetes are normally younger, mostly less than 30 years old. The typical clinical
symptoms are increased thirst, frequent urination and high blood glucose levels. This
type of diabetes cannot be treated effectively with oral medications alone, and the
patients require insulin therapy. Type 2 diabetes occurs more commonly in middle-aged
and elderly people and is often associated with obesity, hypertension,
dyslipidemia, arteriosclerosis and other diseases.
Recently, numerous algorithms have been used to predict diabetes, including traditional
machine learning methods such as support vector machine (SVM), decision tree (DT) and
logistic regression. Machine learning methods are widely used in predicting
diabetes, and they give good results. Decision tree is one of the popular machine
learning methods in the medical field and has good classification power. Random
forest generates many decision trees. Neural network is a recently popular machine
learning method that performs well in many respects. So in this study, we
used decision tree, random forest (RF) and neural network to predict diabetes.
6
DIABETES PREDICTION
CHAPTER 2
SYSTEM DESIGN
• Software Requirements:
• Language: Python
• Operating Systems: Windows, Linux, etc.
• Back End Software: Anaconda, Jupyter Notebook
• Hardware Requirements:
• CPU: Intel Pentium IV 600MHz
• Hard disk space: 20 GB or more
• Memory: 4 GB RAM
CHAPTER 3
IMPLEMENTATION DETAILS
Algorithms
Machine learning is the ability of a system to learn and improve automatically from
experience without being explicitly programmed. It focuses on the development
of computer programs that can access data and use it to learn by themselves. A
classifier is an algorithm that implements classification,
especially in a concrete implementation; the term also refers to the mathematical
function, implemented by the algorithm, that maps input data to a category.
Classification is an instance of supervised learning, i.e., learning where a training
set of correctly identified observations is available.
A. K-Nearest Neighbour (KNN)
KNN is a supervised machine learning algorithm that can solve both
classification and regression problems. KNN is a lazy learning technique: it stores all
the training records and defers computation until a prediction is requested. KNN
assumes that similar things are near to each other, and in practice data points that
are similar are often close together. KNN classifies new data based on a
similarity measure. To find the distances between points efficiently, an implementation
may use a tree-like structure. To make a prediction for a new data point, the algorithm
finds the closest data points in the training data set, its nearest neighbours.
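The nearest-neighbour prediction described above can be sketched in a few lines of
Python. This is a minimal illustration, not the code used in the project: the function
names (euclidean_distance, knn_predict), the choice of k = 3, the two features and the
toy values are all assumptions made for the example, and a brute-force search is used
instead of a tree structure.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair the distance from every training point to the query with that point's label.
    distances = [
        (euclidean_distance(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep only the k closest training points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among those neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical values, not taken from the report's data set):
# each row is (glucose, BMI); label 1 = diabetic, 0 = non-diabetic.
train_X = [(85, 22.0), (90, 24.5), (95, 23.1), (160, 33.0), (170, 35.2), (155, 31.8)]
train_y = [0, 0, 0, 1, 1, 1]

prediction_low = knn_predict(train_X, train_y, (88, 23.0), k=3)
prediction_high = knn_predict(train_X, train_y, (165, 34.0), k=3)
```

In practice the features would normally be scaled first, since a feature with a large
numeric range (such as glucose here) otherwise dominates the Euclidean distance.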
Algorithm:
a. Take a sample data set of rows and columns, here the diabetes data
set.
e. Then, with the help of the minimum distance and the Euclidean distance,
find out the nth column of each.
For example, suppose there are two categories, Category A and
Category B, and we have a new data point x1. In which of these categories does
this data point lie? To solve this type of problem, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class
of a particular data point.
[Diagram: a new data point x1 to be assigned to Category A or Category B]
Advantages:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
B. Logistic Regression
Advantages:
• Logistic regression is easy to implement and interpret, and very efficient
to train.
• It is very fast at classifying unknown records.
Disadvantages:
• If the number of observations is less than the number of features,
logistic regression should not be used; otherwise, it may lead to
overfitting.
• Logistic regression requires little or no multicollinearity
between independent variables.
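To make the points above concrete, the following is a minimal sketch of logistic
regression in plain Python. It is an illustration under stated assumptions, not the
project's log_model: the helper names (sigmoid, predict_proba, train), the learning
rate, the number of epochs and the single standardised toy feature are all hypothetical.
It shows why classifying an unknown record is fast: a prediction is one weighted sum
followed by the sigmoid function.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in zip(X, y):
            # Gradient of the log loss with respect to z is (prediction - target).
            error = predict_proba(weights, bias, features) - target
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Toy data (hypothetical): one standardised feature, label 1 = diabetic.
X = [(-2.0,), (-1.5,), (-1.0,), (1.0,), (1.5,), (2.0,)]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
p_positive = predict_proba(weights, bias, (1.8,))
p_negative = predict_proba(weights, bias, (-1.8,))
```

A record is assigned to the positive class when its predicted probability exceeds a
threshold, commonly 0.5.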
CHAPTER 4
METHODOLOGY
Step 5: Experiment to get the best accuracy with three different
methods. (In our project, the three methods we experimented with are K-NN,
CHAPTER 5
RESULTS
1. KNN
• Accuracy: 80.5194 %
2. Logistic Regression
• Accuracy (log_model): 81.1688 %
• Accuracy (svc_model): 81.1688 %
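Accuracy here is the fraction of test records whose predicted label matches the actual
label. The sketch below shows the calculation on toy labels. The function name accuracy
and the toy labels are hypothetical. The counts in the last two lines are an assumption,
not figures stated in this report: 124 and 125 correct predictions out of 154 test
records happen to reproduce the reported percentages, and 154 would correspond to a 20 %
hold-out of the 768 records in the Pima Indians Diabetes Database.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Toy labels (hypothetical): 4 of the 5 predictions are correct.
toy_actual = [1, 0, 0, 1, 0]
toy_predicted = [1, 0, 1, 1, 0]
toy_accuracy = accuracy(toy_actual, toy_predicted)

# Assumed counts, not stated in the report: correct predictions out of 154 records.
knn_percent = 100 * 124 / 154  # close to the reported KNN accuracy
lr_percent = 100 * 125 / 154   # close to the reported logistic regression accuracy
```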
CHAPTER 6
CONCLUSION
In this study, systematic efforts were made to design a system that
predicts diseases such as diabetes. During this work, three machine learning
classification algorithms were studied and evaluated on various measures.
Experiments were performed on the Pima Indians Diabetes Database. The experimental
results show the adequacy of the designed system, with an achieved
accuracy of 80.5 % using KNN and 81.2 % using the LR classification
algorithm. In future, the designed system, with the machine learning
classification algorithms used, can be applied to predict or diagnose other diseases.
The work can be extended and improved for the automation of diabetes analysis
by including other machine learning algorithms.