Machine Learning Project
SYSTEM
17/U/9932/ITD/PD
December, 2020
Declaration
I, Taibo Nicholas Ijjo, declare that this research, carried out towards
the development of a diabetes testing system, is my original work and has
never been submitted to any university or institution of higher learning
for any academic award or related awards. All work from other authors
has been acknowledged and cited.
(The Researcher)
APPROVAL
This is to certify that this research proposal titled: “A Machine Learning
Driven Diabetes Prediction System” has been carried out under my/our
supervision and is now ready for submission to the Examinations Board
and Senate of Kyambogo University.
Signature………………………………
Date……………………………………...
Ahishakiye Emmanuel
(Supervisor)
Signature………………………………
Date……………………………………...
Taremwa Danison
(Supervisor)
Definition of Terms Used
Table of Contents
APPROVAL
2.1 Introduction to Artificial intelligence
2.5 How the proposed system will address these gaps?
2.6 Chapter Summary
CHAPTER FOUR: SYSTEMS ANALYSIS AND REQUIREMENTS COLLECTION
5.2 System design using Entity relationship diagrams
Bibliography
Appendices
CHAPTER ONE: INTRODUCTION
1.0 Introduction:
This chapter presents the background of the study, problem statement,
objectives of the study, research questions, scope of the study,
significance of the study and the chapter summary.
Several machine learning models have been used to address the challenge
of diabetes, including "Predicting diabetes mellitus using SMOTE and
ensemble machine learning approach" (Alghamdi et al., 2017) and
"Predicting Diabetes Mellitus With Machine Learning Techniques" (Zou et
al., 2018), among many others. However, there is no implementation of
these models in an application or device that patients can use. This
research therefore implements a mobile application that shall be used to
provide an early warning to patients.
This poses the need for a machine learning system that provides instant
results to patients with easy access.
1.3 Objectives of study:
1.3.1 General Objective
The general objective of the study is to develop and implement an
automated system for the prediction of diabetes using the XGBoost
algorithm.
1.6 Significance of the research
To the government, the study helps to minimize the death rate of
diabetes patients and the cost of producing drugs/medicine for diabetes
patients.
CHAPTER TWO: LITERATURE REVIEW
2.0 Introduction.
understanding of the human genome (Andrew Ng, n.d.). Machine learning is
so pervasive today that you probably use it dozens of times a day without
knowing it (Andrew Ng, n.d.). Machine learning mainly falls under three
categories, namely supervised, semi-supervised and unsupervised machine
learning (IBM Cloud Education, 2020a).
Figure 2.3: linear regression model; Source: (JavaTpoint, n.d.)
In real life, few problems exhibit a clear linear relationship between
the independent and dependent variables, which limits the applicability
of linear regression (AKM Adib, 2018).
• This algorithm allows models to be updated easily to reflect new
data, unlike decision trees or support vector machines. The update
can be done using stochastic gradient descent.
• Logistic regression is less prone to over-fitting in a
low-dimensional dataset with sufficient training examples.
• Logistic regression is very efficient with linearly separable data.
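The update with stochastic gradient descent mentioned above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study; the function names (`sigmoid`, `predict`, `sgdUpdate`), the learning rate and the example values are assumptions made for the sketch.

```javascript
// Sigmoid maps any real number to a probability in (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Predicted probability of the positive class for one feature vector.
function predict(model, x) {
  let z = model.bias;
  for (let i = 0; i < x.length; i++) {
    z += model.weights[i] * x[i];
  }
  return sigmoid(z);
}

// One stochastic gradient descent step on the log loss for a single
// example (x, y), with y equal to 0 or 1. Returns a new model object.
function sgdUpdate(model, x, y, learningRate) {
  const error = predict(model, x) - y; // gradient of the log loss w.r.t. z
  return {
    weights: model.weights.map((w, i) => w - learningRate * error * x[i]),
    bias: model.bias - learningRate * error,
  };
}

// Usage: an existing model is nudged by one newly observed example.
let model = { weights: [0, 0], bias: 0 };
const before = predict(model, [1, 2]);
model = sgdUpdate(model, [1, 2], 1, 0.1);
const after = predict(model, [1, 2]);
console.log(before, after); // the probability for this example moves toward 1
```

Because each step touches only one example, new data can be folded into the model without retraining from scratch.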
Disadvantages of logistic regression
• Logistic regression works with linearly separable data, which is
rarely found in real-world scenarios. Non-linearly separable data
therefore has to be transformed, which is done by increasing the
number of features so that the data becomes linearly separable in a
higher dimension.
• Logistic regression requires a large dataset with sufficient
training examples for all the categories it needs to identify.
• It is difficult to capture complex relationships using logistic
regression. More powerful and complex algorithms such as neural
networks can easily outperform this algorithm.
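The idea of adding features to make data linearly separable can be shown on a toy example. The data and the helper `separableByThreshold` below are invented for illustration; the sketch only checks whether a single threshold separates the two classes before and after adding a squared feature.

```javascript
// Toy 1-D data: the positive class lies on both sides of the negative
// class, so no single threshold on x separates the two classes.
const xs = [-3, -2, -0.5, 0, 0.5, 2, 3];
const labels = [1, 1, 0, 0, 0, 1, 1];

// Returns true if some threshold on the given feature values puts every
// positive example on one side and every negative example on the other.
function separableByThreshold(values, ys) {
  const pos = values.filter((_, i) => ys[i] === 1);
  const neg = values.filter((_, i) => ys[i] === 0);
  return Math.min(...pos) > Math.max(...neg) || Math.max(...pos) < Math.min(...neg);
}

// Adding the squared feature x^2 lifts the data into a space where a
// single threshold (a linear boundary on the new feature) is enough.
const squared = xs.map((x) => x * x);

console.log(separableByThreshold(xs, labels));      // false
console.log(separableByThreshold(squared, labels)); // true
```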
2.2.2.1 XGBoost
XGBoost is short for "Extreme Gradient Boosting" and was introduced by
Chen in 2014. Since its introduction, XGBoost has become one of the most
popular machine learning algorithms (SauceCat, 2017). According to
(Vishal Morde, 2019), XGBoost is a decision-tree-based ensemble machine
learning algorithm that uses a gradient boosting framework. In
prediction problems involving unstructured data (images, text, etc.),
artificial neural networks tend to outperform all other algorithms or
frameworks. However, when it comes to small-to-medium structured/tabular
data, decision-tree-based algorithms are considered best-in-class right
now (Vishal Morde, 2019).
Advantages of XGBoost
• XGBoost has an in-built capability to handle missing values. When
XGBoost encounters a missing value at a node, it tries both the left
and the right split and learns, for each node, the direction that
leads to the lower loss. It then sends missing values in that same
direction when working on the testing data.
• XGBoost allows the user to run cross-validation at each iteration
of the boosting process, which makes it easy to find the optimum
number of boosting iterations in a single run.
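The handling of missing values described above can be illustrated with a simplified sketch. It uses a plain squared-error loss on a fixed split rather than the gradient-based gain that XGBoost itself optimizes, and the function names and data rows are hypothetical.

```javascript
// Sum of squared errors of a group of targets around their mean.
function sse(targets) {
  if (targets.length === 0) return 0;
  const mean = targets.reduce((a, b) => a + b, 0) / targets.length;
  return targets.reduce((acc, t) => acc + (t - mean) ** 2, 0);
}

// For a fixed split "feature < threshold", try sending the rows whose
// feature is missing (null) to the left and then to the right, and keep
// the direction with the lower total loss.
function chooseDefaultDirection(rows, threshold) {
  const present = rows.filter((r) => r.feature !== null);
  const missing = rows.filter((r) => r.feature === null).map((r) => r.target);
  const left = present.filter((r) => r.feature < threshold).map((r) => r.target);
  const right = present.filter((r) => r.feature >= threshold).map((r) => r.target);

  const lossIfLeft = sse(left.concat(missing)) + sse(right);
  const lossIfRight = sse(left) + sse(right.concat(missing));
  return lossIfLeft <= lossIfRight ? 'left' : 'right';
}

// Hypothetical rows: the feature is a glucose reading, target is 1 for diabetic.
const rows = [
  { feature: 90, target: 0 },
  { feature: 100, target: 0 },
  { feature: 150, target: 1 },
  { feature: 160, target: 1 },
  { feature: null, target: 1 },
  { feature: null, target: 1 },
];
const direction = chooseDefaultDirection(rows, 120);
console.log(direction); // 'right'
```

The learned direction is stored with the node, so a missing value seen at prediction time follows the same branch.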
Disadvantages of XGBoost
the model are not changed). The contribution of the weak learner to the
ensemble is based on the gradient descent optimization process and the
calculated contribution of each tree is based on minimizing the overall
error of the strong learner, (Hussain Mujtaba, 2020).
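A minimal sketch of this boosting process for a squared-error loss is shown below, assuming one-split regression stumps as the weak learners. The helper names (`fitStump`, `boost`, `mse`), the learning rate and the toy data are assumptions for illustration; production libraries such as XGBoost add regularization and many optimizations on top of this idea.

```javascript
// Fit a one-split regression stump to (x, residual) pairs by trying every
// midpoint between sorted x values and keeping the split with the lowest
// squared error.
function fitStump(xs, residuals) {
  const mean = (arr) => arr.reduce((a, b) => a + b, 0) / arr.length;
  const sorted = [...new Set(xs)].sort((a, b) => a - b);
  let best = null;
  for (let i = 0; i + 1 < sorted.length; i++) {
    const threshold = (sorted[i] + sorted[i + 1]) / 2;
    const leftValue = mean(residuals.filter((_, j) => xs[j] < threshold));
    const rightValue = mean(residuals.filter((_, j) => xs[j] >= threshold));
    const error = residuals.reduce((acc, r, j) => {
      const pred = xs[j] < threshold ? leftValue : rightValue;
      return acc + (r - pred) ** 2;
    }, 0);
    if (best === null || error < best.error) {
      best = { threshold, leftValue, rightValue, error };
    }
  }
  return best;
}

// Gradient boosting for squared loss: each new stump is fitted to the
// residuals (the negative gradient) of the current ensemble, and the
// stumps added earlier are left unchanged.
function boost(xs, ys, rounds, learningRate) {
  const base = ys.reduce((a, b) => a + b, 0) / ys.length;
  const stumps = [];
  let preds = xs.map(() => base);
  for (let m = 0; m < rounds; m++) {
    const residuals = ys.map((y, i) => y - preds[i]);
    const stump = fitStump(xs, residuals);
    stumps.push(stump);
    preds = preds.map((p, i) =>
      p + learningRate * (xs[i] < stump.threshold ? stump.leftValue : stump.rightValue));
  }
  return { base, stumps, trainingPredictions: preds };
}

// Mean squared error between targets and predictions.
const mse = (ys, preds) =>
  ys.reduce((acc, y, i) => acc + (y - preds[i]) ** 2, 0) / ys.length;

const xs = [1, 2, 3, 4, 5, 6];
const ys = [0, 0, 0, 1, 1, 1];
const oneRound = boost(xs, ys, 1, 0.5);
const tenRounds = boost(xs, ys, 10, 0.5);
console.log(mse(ys, oneRound.trainingPredictions), mse(ys, tenRounds.trainingPredictions));
```

Each added stump reduces the training error of the ensemble a little further, which is the sense in which weak learners combine into a strong learner.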
2.2.3 Deep learning algorithms
According to (Margaret Rouse, 2018), a convolutional neural network
(CNN) is a type of artificial neural network used in image recognition and
processing that is specifically designed to process pixel data. CNNs are
powerful image processing algorithms that use deep learning to perform
both generative and descriptive tasks, often using machine vision that
includes image and video recognition, along with recommender systems
and natural language processing (NLP).
• Down-sampling: the pooling layer uses the principle of local image
correlation to subsample the image, which reduces the amount of
data to process while retaining useful information, and further
reduces the number of parameters by removing samples that are not
important in the feature map.
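As an illustration of down-sampling, the sketch below applies 2x2 max pooling with stride 2, one common pooling choice that is assumed here since the text does not name a specific variant. The function name and the feature map values are made up for the example.

```javascript
// 2x2 max pooling with stride 2 over a 2-D feature map whose height and
// width are both even. Each output cell keeps the largest value of its
// 2x2 window, so the map shrinks to a quarter of its original size.
function maxPool2x2(featureMap) {
  const pooled = [];
  for (let r = 0; r < featureMap.length; r += 2) {
    const row = [];
    for (let c = 0; c < featureMap[r].length; c += 2) {
      row.push(Math.max(
        featureMap[r][c], featureMap[r][c + 1],
        featureMap[r + 1][c], featureMap[r + 1][c + 1]
      ));
    }
    pooled.push(row);
  }
  return pooled;
}

const featureMap = [
  [1, 3, 2, 0],
  [4, 6, 1, 1],
  [0, 2, 9, 7],
  [1, 1, 3, 5],
];
const pooled = maxPool2x2(featureMap);
console.log(pooled); // [[6, 2], [2, 9]]
```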
• Recurrent neural network is even used with convolutional layers to
extend the effective pixel neighborhood.
Disadvantages of Recurrent Neural Network
1. Training an RNN is a very difficult task.
2. It cannot process very long sequences if using tanh or relu as an
activation function.
2.3 Some of the applications of Machine learning algorithms in
disease diagnosis
Several studies have been carried out on the prediction of diabetes using
machine learning, some of which include:
The study (Islam Ayon & Milon Islam, 2019) proposed a strategy for the
diagnosis of diabetes using a deep neural network, training on its
attributes with five-fold and ten-fold cross-validation. It used the Pima
Indian Diabetes (PID) dataset, retrieved from the UCI machine learning
repository. The study obtained an accuracy of 98.35%, an F1 score of 98
and an MCC of 97 for five-fold cross-validation, and an accuracy of
97.11%, a sensitivity of 96.25% and a specificity of 98.80% for ten-fold
cross-validation. However, ….
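The evaluation measures quoted in these studies (accuracy, sensitivity, specificity, F1 score and MCC) are all derived from the counts of a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and are not taken from any of the cited studies.

```javascript
// Standard binary classification metrics from confusion matrix counts:
// tp/tn = true positives/negatives, fp/fn = false positives/negatives.
function classificationMetrics({ tp, tn, fp, fn }) {
  const accuracy = (tp + tn) / (tp + tn + fp + fn);
  const sensitivity = tp / (tp + fn); // also called recall
  const specificity = tn / (tn + fp);
  const precision = tp / (tp + fp);
  const f1 = (2 * precision * sensitivity) / (precision + sensitivity);
  const mcc = (tp * tn - fp * fn) /
    Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
  return { accuracy, sensitivity, specificity, f1, mcc };
}

// Hypothetical counts for 100 test patients.
const metrics = classificationMetrics({ tp: 40, tn: 45, fp: 5, fn: 10 });
console.log(metrics);
```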
The study (Kaur & Kumari, 2019) applied machine learning techniques to
the Pima Indian diabetes dataset, using the R data manipulation tool, to
develop trends and detect patterns with risk factors. The models used
were linear kernel support vector machine (SVM-linear), radial basis
function (RBF) kernel support vector machine, k-nearest neighbor (k-NN),
artificial neural network (ANN) and multifactor dimensionality reduction
(MDR). The study obtained an accuracy of 0.89 using the linear kernel
SVM model, 0.84 using the RBF kernel SVM, 0.88 using the k-NN model, 0.86
using the ANN and 0.83 using the MDR-based model. However, …
The study (Mujumdar & Vaidehi, 2019) stated that the classification and
prediction accuracy of existing methods is not high enough. It therefore
proposed a diabetes prediction model for better classification of
diabetes, with classification accuracy boosted by a new dataset
compared to the existing dataset, and a pipeline model for diabetes
prediction intended to improve the accuracy of classification.
The study (Lai et al., 2019) proposed a predictive model using the most
recent records of 13,309 Canadian patients aged between 18 and 90 years,
along with their laboratory information (age, sex, fasting blood glucose,
body mass index, high-density lipoprotein, triglycerides, blood pressure,
and low-density lipoprotein). The models were built with Logistic
Regression and Gradient Boosting Machine (GBM) techniques, and the area
under the receiver operating characteristic curve (AROC) was used to
evaluate their discriminatory capability.
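The area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name and the scores are illustrative and do not come from the cited study.

```javascript
// AUROC as the fraction of (positive, negative) pairs in which the
// positive example gets the higher score; ties count as one half.
function auroc(scores, labels) {
  let wins = 0;
  let pairs = 0;
  for (let i = 0; i < scores.length; i++) {
    if (labels[i] !== 1) continue;
    for (let j = 0; j < scores.length; j++) {
      if (labels[j] !== 0) continue;
      pairs += 1;
      if (scores[i] > scores[j]) wins += 1;
      else if (scores[i] === scores[j]) wins += 0.5;
    }
  }
  return wins / pairs;
}

// Hypothetical model scores and true labels for five patients.
const scores = [0.9, 0.8, 0.4, 0.35, 0.1];
const labels = [1, 0, 1, 0, 0];
const area = auroc(scores, labels);
console.log(area); // 5 of the 6 positive-negative pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.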
The study (Sumitra Menaria, 2018) used the back propagation algorithm,
the J48 algorithm, the Naïve Bayes classifier and the support vector
machine, and achieved an accuracy of 83.11% using the back propagation
algorithm, 78.26% using the J48 algorithm, 78.97% using the Naïve Bayes
classifier and 81.69% using the support vector machine.
The study (Rahul Joshi, 2017) used KNN, Naïve Bayes, Random forest,
and J48, to make an ensemble hybrid model by combining individual
techniques into one in order to increase the performance and accuracy.
2.5 How the proposed system will address these gaps?
This study concentrated on the development of a mobile application that
uses the XGBoost algorithm. XGBoost was chosen because it is the leading
model for working with standard tabular data (the type of data stored in
Pandas DataFrames, as opposed to more exotic types of data like images
and videos), and XGBoost models also dominate many Kaggle competitions.
The proposed system shall be used to create an early warning about
diabetes. This will encourage people to take early precautions against
diabetes, thus reducing the workload of doctors and government
expenditure on diabetes diagnosis.
CHAPTER THREE: RESEARCH METHODOLOGY
3.0 Introduction
This chapter describes the methods used for data collection and analysis,
systems study and analysis, system requirements and specification, design
and modeling, and the implementation, testing and validation of the
system. It also explains how the information used for the design and
implementation of the system was obtained.
The data used in this study was obtained from the UCI machine learning
repository (Michael Kahn, n.d.). The dataset was recorded in the period
2012 to 2016 for use in the AAAI Spring Symposium on Artificial
Intelligence in Medicine. It is made up of 8 predictor variables, namely
pregnancies, glucose, diastolic, triceps, insulin, bmi, dpf and age, and
one dependent variable called diabetes. The model uses 8 features for
prediction: pregnancies, glucose, diastolic, triceps, insulin, age,
weight and height.
The model was trained on the dataset using 5-fold cross-validation,
and it achieved an average accuracy of 78.8%.
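The k-fold procedure can be sketched as follows. The model in this study was trained in Python; the JavaScript below only illustrates how the folds are formed and how the per-fold accuracies are averaged. The helper names, the toy labels and the majority-class scorer standing in for a real model are all assumptions.

```javascript
// Split sample indices 0..n-1 into k folds of near-equal size.
function kFoldIndices(n, k) {
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) {
    folds[i % k].push(i);
  }
  return folds;
}

// Run k-fold cross-validation. trainAndScore(trainIdx, testIdx) is a
// caller-supplied function that trains on trainIdx and returns the
// accuracy on testIdx. Returns the per-fold scores and their average.
function crossValidate(n, k, trainAndScore) {
  const folds = kFoldIndices(n, k);
  const scores = folds.map((testIdx, f) => {
    const trainIdx = folds.filter((_, g) => g !== f).flat();
    return trainAndScore(trainIdx, testIdx);
  });
  const average = scores.reduce((a, b) => a + b, 0) / k;
  return { scores, average };
}

// Stand-in for a real model: always predict the majority training class.
const labels = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0];
function majorityBaseline(trainIdx, testIdx) {
  const ones = trainIdx.filter((i) => labels[i] === 1).length;
  const majority = ones * 2 > trainIdx.length ? 1 : 0;
  const correct = testIdx.filter((i) => labels[i] === majority).length;
  return correct / testIdx.length;
}

const result = crossValidate(labels.length, 5, majorityBaseline);
console.log(result.scores, result.average);
```

Every sample is used for testing exactly once, so the averaged accuracy is less dependent on one particular train/test split.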
3.2 System study and analysis methods
3.2.1 System Study methods
The researcher studied and analyzed the current process and systems for
testing diabetes in order to identify the loopholes in the existing
systems. The information used was acquired by reviewing existing
documents; document review was chosen because it is an easy and cheap way
to acquire information that already exists. Currently, people test for
diabetes manually by visiting their doctors at a health center, who take
sample tests of the required parameters and assess their diabetic status
(Kareem et al., 2020). There are models developed for diabetes
prediction, but most are not put into production (Sisodia & Sisodia,
2018) and therefore remain of no use to the people.
This section defines the core operations expected of the proposed system,
which are termed as the user requirements, system requirements,
functional and non-functional requirements.
These are user expectations of the system. They give the researcher the
functionalities that need to be implemented so that users can use the
system. These include:
• Capture user data: the system should capture the user data that is
required for the prediction process.
• Produce user results instantly: the system should produce the
user's result within a short period of time.
• Store user details: the system should store user details securely
for future reference.
• Authenticate users: the system should enable user login, so that
users can access their personalized details.
3.3.2 System Requirements.
These are requirements that define the functionalities of the system and
its components. They help the researcher identify the features that have
to be included in the system, and they also define how the system will
operate. These include:
• Ability to authenticate the system users: this is done through user
login accounts, which give each user access to their individual
data.
• The system should be able to detect wrong value entries and
suggest possible remedies to the errors in the input.
• The system should be able to predict the chances of a person having
diabetes from the collected user data.
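The requirement to detect wrong entries and suggest remedies could be met with a validation step such as the one sketched below. The field names follow the inputs described in this report, but the ranges, units and message wording are hypothetical placeholders; real limits would need clinical guidance.

```javascript
// Hypothetical plausible ranges for some inputs (illustrative values only).
const FIELD_RANGES = {
  glucose: { min: 0, max: 500, unit: 'mg/dL' },
  age: { min: 1, max: 120, unit: 'years' },
  height: { min: 50, max: 250, unit: 'cm' },
  weight: { min: 2, max: 400, unit: 'kg' },
};

// Checks every known field and returns a list of messages that name the
// problem and suggest how to correct it. An empty list means the form is valid.
function validateInput(form) {
  const errors = [];
  for (const [field, range] of Object.entries(FIELD_RANGES)) {
    const raw = form[field];
    const value = Number(raw);
    if (raw === undefined || raw === null || raw === '' || Number.isNaN(value)) {
      errors.push(`${field}: enter a number in ${range.unit}.`);
    } else if (value < range.min || value > range.max) {
      errors.push(
        `${field}: ${value} is outside ${range.min}-${range.max} ${range.unit}; check the value and unit.`
      );
    }
  }
  return errors;
}

const errors = validateInput({ glucose: '120', age: '-4', height: 'abc', weight: '' });
console.log(errors);
```

Running the same checks on the API side as well would protect the model from malformed requests that bypass the app.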
3.3.4 Non-Functional requirements
design and development of a mobile app that will be used to create early
warning for diabetes. The objectives of the system design include
accuracy, usability, reliability, security and confidentiality.
[Figure: system architecture. The mobile app sends a request to the API
server, the API server calls the trained model, and the model returns the
prediction.]
The diabetes prediction app is modelled using data flow diagrams, which
are graphical representations of the flow of data within the system. A
data flow diagram is made up of entities and process flows. The flow is
as follows.
• The API then calls the trained model,
• The model runs prediction of the data, and sends the result to the
API,
• The API then sends the result back to the mobile app.
• The API also saves a copy of the user data plus the predictions to a
storage database.
• The mobile app delivers the results to the patients.
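The order of the steps listed above can be sketched in code. The API in this project is implemented in Flask; the JavaScript below only mirrors the sequence of steps, and `handlePredictionRequest`, the stand-in model with its arbitrary threshold, and the in-memory database are all hypothetical.

```javascript
// Handles one prediction request on the API side. The model and database
// are passed in so the flow can be shown without real services.
function handlePredictionRequest(userData, model, database) {
  const prediction = model.predict(userData);   // the API calls the trained model
  database.save({ ...userData, prediction });   // the API stores the data plus the prediction
  return { prediction };                        // the API responds to the mobile app
}

// Stand-ins for illustration only: an arbitrary threshold rule instead of
// the trained model, and an array instead of a storage database.
const fakeModel = {
  predict: (data) => (data.glucose >= 126 ? 'at risk' : 'not at risk'),
};
const records = [];
const fakeDatabase = { save: (record) => records.push(record) };

const response = handlePredictionRequest({ glucose: 140, age: 45 }, fakeModel, fakeDatabase);
console.log(response, records.length);
```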
3.5 System implementation, testing and validation methods.
3.5.1 System implementation methods
The API server is hosted on a Heroku cloud server and the mobile
application will be available for download from the Expo website.
The system was tested at various levels of development before
deployment to production. The main tests carried out were unit testing,
integration testing, system testing and user acceptance testing. Unit
testing was carried out during development to ensure that each unit of
the program meets its requirements. Integration testing was carried out
to ensure that all program units integrate with each other without
errors. System testing was done by the researcher after integration to
find out whether the developed system fulfills the system requirements.
User acceptance testing was done after completing development to ensure
that users are able to use the system: a selected group of users was
asked to use the system and give feedback about the system and its
usability.
This chapter covered the data collection method used in the research,
giving its advantages and disadvantages; the system study and analysis
methods, explaining the current systems and processes with their
strengths and weaknesses; the system requirements and specifications,
giving a detailed explanation of the system requirements, user
requirements, functional requirements and non-functional requirements;
the system design and modeling methods, where data flow diagrams are used
for modeling the system; and the system implementation, testing and
validation methods, explaining in detail the system implementation, where
the app is hosted on the Expo website and the API server is hosted on
Heroku cloud servers, the testing methods used and the stages where each
is used, and why each validation method used is important.
CHAPTER FOUR: SYSTEMS ANALYSIS AND REQUIREMENTS
COLLECTION
4.0 Introduction
This chapter discusses the current procedure of testing for diabetes and
the existing systems for prediction of diabetes, with a comparative
analysis of the strengths and weaknesses of the current system. It also
presents the requirements for the new system, which include system
requirements, functional requirements, non-functional requirements and
user requirements. The chapter further explains that the new system is
not a replacement for the existing procedure of testing for diabetes, but
a system that shall run alongside the existing manual means of testing,
since it is only intended to give people an early warning about their
diabetes status or risk factors, not to perform an actual diabetes test.
From the research carried out, it was noted that the current process of
testing for diabetes is a manual process in which a patient has to visit
a health center, make an appointment with a doctor and have measurements
taken. The patient then has to wait for the results to be processed,
which most times takes a day or two, before they can be released. This is
tiresome and tedious. It was also noted that, due to the long waiting
time and the long procedures to follow, some people are reluctant to go
for diabetes testing. Alongside the manual process, there exist models
developed for the prediction of diabetes, but the research reveals that
most of these models end at the stage of training the machine learning
model and are never put into production. Such models are only helpful to
the few people who are literate in machine learning, not to the general
public, which leaves both the tedious work of the doctors and the
reluctance of the public to test for diabetes unsolved.
Analysis of the data collected reveals some of the strengths of the current
system.
• Most people feel comfortable with the current system, since they
have direct contact with a medical doctor.
• The system might be tedious, but the advice and recommendations of
doctors are hard for computer systems to provide, and people still
trust words directly from a doctor much more than what they read on
a computer system.
• If diagnosed by the right doctors, there is always high accuracy of
results.
4.1.2 Weaknesses of the Current system
From the analysis of the collected data, it was found that the current
system has many weaknesses, some of which are:
The above weaknesses render the system inefficient to curb the challenges
faced in the prediction of diabetes, thus the need for a readily
available, easy-to-use system that can help to improve the efficiency of
the existing system.
This defines the core operations expected of the proposed system, which
are termed as the user requirements, system requirements, functional and
non-functional requirements.
• Capture user data: the system should capture the user data that is
required for the prediction process.
• Produce user results: the system should produce the user's result
within a short period of time.
• Store user details: the system should store user details securely
for future reference.
• Authenticate users: the system should enable user login, so that
users can access their personalized details.
4.2.2 System Requirements.
These are requirements that define the functionalities of the system and
its components. They help the researcher identify the features that have
to be included in the system, and they also define how the system will
operate. These include:
CHAPTER FIVE: SYSTEM DESIGN, IMPLEMENTATION,
TESTING AND VALIDATION
5.0 Introduction
This chapter covers the system design, database design, model design,
system implementation, user interface, system program code, and system
testing and validation.
5.1 System design using data flow diagrams.
Systems design is the process of defining elements of a system like the
modules, architecture, components, interfaces and data for a system based
on the system requirements. System design is important as it visualizes
the application before it is developed.
A data flow diagram is a way of representing the flow of data through a
system. It provides information about the inputs and outputs of each
entity and the process the input goes through to give the outputs. A data
flow diagram is made up of several components, which include:
Data Flow.
Data flow represents the transfer of data from one part of the system to
another. It uses an arrow as its symbol and is identified by a name that
describes what information is being moved.
Processes.
These are the transformations that the data passes through from input to
output. Processes are represented by rounded rectangles or ovals in a
DFD, and each process is identified by a unique name.
Data Store
External entities.
External entities are factors outside the system that have an influence
on the system, such as the patients who shall be using the system.
External entities may be data sources, sinks, actors or terminators. They
are represented by a closed rectangle in the data flow diagram and are
drawn at the edges of the diagram.
[Figure: data flow diagram of the system. Labels: Patient; Register;
User guide; Login; Database; System home page; About Us; About Diabetes;
Diabetes prediction; Trained model.]
Attribute   Type     Description
Id          Integer  Primary key
Username    Integer  Foreign key
Glucose     Integer  Blood sugar level of the patient
Age         Integer  Age of the patient
Height      Integer  Patient’s height
Weight      Integer  Patient’s weight
Insulin     Integer  Level of insulin produced in the patient’s body
Diastolic   Integer  Diastolic blood pressure of the patient
Triceps     Integer  Triceps skin fold thickness of the patient
Pregnancy   Integer  Number of pregnancies, applicable to female patients only
Gender      String   Gender of the patient
5.2.2 Entity relationship diagram.
[Figure: entity relationship diagram. Recoverable labels: Accounts table;
Id (PK); Id (PK); Password; Username; Glucose; Age; Height; Weight;
Insulin; Triceps; Diastolic; Pregnancies; Gender.]
5.3.2 Sample code
This section presents sample code from the system implementation. The
code snippet below is for the home page and prediction form of the
mobile application.
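import React from 'react';
import {
  Dimensions,
  ScrollView,
  StyleSheet,
  Text,
  TouchableOpacity,
  View,
} from 'react-native';

// Home page of the mobile application. The class name is assumed for
// illustration; the listing shows only render() and the styles.
export default class Home extends React.Component {
  render() {
    return (
      <ScrollView contentContainerStyle={styles.container}>
        {/* Opens the prediction form */}
        <TouchableOpacity onPress={() => this.props.navigation.navigate("Form")} style={styles.buttonsBorder}>
          <View style={styles.button}>
            <Text style={styles.buttonText}>Diagnosis</Text>
          </View>
        </TouchableOpacity>
        {/* Navigates to the "AboutDiabetes" screen */}
        <TouchableOpacity onPress={() => this.props.navigation.navigate("AboutDiabetes")} style={styles.buttonsBorder}>
          <View style={styles.button}>
          </View>
        </TouchableOpacity>
        {/* Navigates to the "AboutUs" screen */}
        <TouchableOpacity onPress={() => this.props.navigation.navigate("AboutUs")} style={styles.buttonsBorder}>
          <View style={styles.button}>
          </View>
        </TouchableOpacity>
        {/* Navigates to the "UserGuide" screen */}
        <TouchableOpacity onPress={() => this.props.navigation.navigate("UserGuide")} style={styles.buttonsBorder}>
          <View style={styles.button}>
          </View>
        </TouchableOpacity>
      </ScrollView>
    );
  }
}

// Styles for the home page container and its navigation buttons.
const styles = StyleSheet.create({
  container: {
    // Fill the full height of the device window.
    height: Dimensions.get('window').height,
    alignItems: 'center',
    justifyContent: 'center',
    backgroundColor: '#cdc',
    flexGrow: 1,
  },
  title: {
    fontSize: 20,
    fontWeight: 'bold',
  },
  buttonsBorder: {
    borderRadius: 50,
    height: "10%",
    backgroundColor: '#898',
    width: "80%",
    margin: "5%",
  },
  button: {
    borderRadius: 50,
    height: "60%",
    width: "90%",
    alignSelf: 'center',
    alignContent: 'center',
    backgroundColor: 'green',
    margin: '5%',
  },
  buttonText: {
    textAlign: 'center',
    textAlignVertical: 'center',
    paddingVertical: 15,
  },
});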
a 5-fold cross-validation and it achieved an accuracy of 78.8%. This
was done to ensure that the model produces results of high accuracy.
6.0 Introduction
This chapter concludes the research process by summarizing the findings.
In this chapter the researcher also makes recommendations for the
developed system, discusses the challenges faced during the study,
suggests areas that need improvement, and finally gives concluding
remarks about the system and the study.
6.1 Discussion
The researcher managed to achieve the objectives of the study as stated
in chapter one of this document. This led to the implementation of a
user-friendly mobile application for the prediction of diabetes, built
using React Native, a JavaScript framework. The mobile application
consumes an API developed in Flask, a Python framework, which runs a
machine learning model built using Python and the XGBoost machine
learning algorithm. The database of the system runs on PostgreSQL and is
hosted on Heroku cloud servers. The system has been tested by both the
researcher and some selected end users, who appreciated the system for
its user friendliness and ease of access and use.
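On the client side, consuming the API amounts to posting the form values as JSON. The sketch below builds the arguments for such a request; the base URL, the `/predict` path and the field names are hypothetical, since the report does not list the actual endpoint.

```javascript
// Builds the arguments for a fetch() call that posts the form values to a
// prediction endpoint as JSON. Endpoint path and field names are assumed.
function buildPredictionRequest(baseUrl, form) {
  return {
    url: `${baseUrl}/predict`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        glucose: Number(form.glucose),
        age: Number(form.age),
        height: Number(form.height),
        weight: Number(form.weight),
      }),
    },
  };
}

const request = buildPredictionRequest('https://api.example.com', {
  glucose: '120', age: '35', height: '170', weight: '70',
});
console.log(request.url, request.options.body);
// In the app this could be sent with:
// fetch(request.url, request.options).then((res) => res.json())
```

Keeping the request construction separate from the network call makes it possible to unit test the payload without a running server.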
6.2 Recommendation
The general public is encouraged to use the application as an early
warning application for diabetes. This will help people to be aware of
their health status, especially regarding diabetes, and will also reduce
the strain on health specialists and government expenditure on diabetes
diagnosis. The system is still open to improvement, so relevant changes
to improve its usability and the accuracy of its results can be
suggested and implemented. Access to the system shall be unrestricted
for now, and all user feedback is welcome. Users are also advised to
keep up to date with the most current version of the system, since
changes are still being made to it.
6.3 Challenges faced in the study
Several challenges were faced by the researcher during the study,
especially during data collection, system design and implementation
of the system, some of which include:
• There is need to implement a hardware system for obtaining the
required parameters that is compatible with mobile devices to
directly feed the results of each test to the mobile application, such
as a glucose meter that can feed results to a mobile device.
• There is need to fine tune the algorithm to improve on the accuracy
of the results.
6.5 Conclusion
The main aim of this study was to develop a mobile application to
create an early warning for diabetes, which has been achieved. The
previous system for testing diabetes was hectic and time consuming.
However, this system is not a replacement for the existing manual
testing procedure; rather, it is to work in parallel with the existing
process so as to ease the work of doctors and also increase access to a
testing kit for those who might not be able to access health centers.
The processes and procedures used in the study have been documented for
those who might be interested in taking part in improving the system or
who might be researching in a related field.
Bibliography
AKM Adib. (2018, November 6). Basics of linear regression. Data Driven
Investor, Medium.
https://medium.com/datadriveninvestor/basics-of-linear-regression-9b529aeaa0a5
Alghamdi, M., Al-Mallah, M., Keteyian, S., Brawner, C., Ehrman, J., &
Sakr, S. (2017). Predicting diabetes mellitus using SMOTE and
ensemble machine learning approach: The Henry Ford ExercIse
Testing (FIT) project. PLoS ONE, 12(7).
https://doi.org/10.1371/journal.pone.0179805
Bahendeka, S., Wesonga, R., Mutungi, G., Muwonge, J., Neema, S., &
Guwatudde, D. (2016). Prevalence and correlates of diabetes mellitus
in Uganda: A population-based national survey. Tropical Medicine
and International Health, 21(3), 405–416.
https://doi.org/10.1111/tmi.12663
Frankenfield, J. (2020). Artificial Intelligence (AI) Definition.
https://www.investopedia.com/terms/a/artificial-intelligence-ai.asp
Hussain Mujtaba. (2020, June 6). What is Gradient Boosting and How is
it different from AdaBoost?
https://www.mygreatlearning.com/blog/gradient-boosting/
Islam Ayon, S., & Milon Islam, M. (2019). Diabetes Prediction: A Deep
Learning Approach. Information Engineering and Electronic
Business, 2, 21–27. https://doi.org/10.5815/ijieeb.2019.02.03
Kareem, A., Shi, L., Wei, L., & Tao, Y. (2020). A Comparative Analysis
and Risk Prediction of Diabetes at Early Stage using Machine
Learning Approach. International Journal of Future Generation
Communication and Networking, 13(3), 4151–4163.
Kaur, H., & Kumari, V. (2019). Predictive modelling and analytics for
diabetes using a machine learning approach.
https://doi.org/10.1016/j.aci.2018.12.004
Lai, H., Huang, H., Keshavjee, K., Guergachi, A., & Gao, X. (2019).
Predictive models for diabetes mellitus using machine learning
techniques. BMC Endocrine Disorders, 19(1), 101.
https://doi.org/10.1186/s12902-019-0436-6
Meenal Dhande. (2020, July 3). What is the difference between AI,
machine learning, and deep learning.
https://www.geospatialworld.net/blogs/difference-between-ai-machine-learning-and-deep-learning/
Rahul Joshi, M. A. (2017, October 10). Analysis and prediction of
diabetes diseases using machine learning algorithm: Ensemble
approach.
https://d1wqtxts1xzle7.cloudfront.net/54976577/IRJET-V4I1077.pdf
Sisodia, D., & Sisodia, D. S. (2018). Prediction of Diabetes using
Classification Algorithms. Procedia Computer Science, 132, 1578–
1585. https://doi.org/10.1016/j.procs.2018.05.122
Zimmermann, M., Bunn, C., Namadingo, H., Gray, C. M., & Lwanda, J.
(2018). Experiences of type 2 diabetes in sub-Saharan Africa: a
scoping review. Global Health Research and Policy, 3(1), 1–13.
https://doi.org/10.1186/s41256-018-0082-y
Zou, Q., Qu, K., Luo, Y., Yin, D., Ju, Y., & Tang, H. (2018). Predicting
Diabetes Mellitus With Machine Learning Techniques. Frontiers in
Genetics, 9. https://doi.org/10.3389/fgene.2018.00515
Appendices