Diabetes
INSTITUTE OF TECHNOLOGY
DEPARTMENT OF SOFTWARE ENGINEERING
Submitted By:
No. Name ID
Advisor: Milion F
Signature: _____________________
Date: 12/Feb/2025
A PROJECT REPORT SUBMITTED TO THE DEPARTMENT OF
SOFTWARE ENGINEERING
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
AWARD OF THE DEGREE
OF BACHELOR OF SCIENCE IN SOFTWARE ENGINEERING
JIGJIGA UNIVERSITY
INSTITUTE OF TECHNOLOGY
DEPARTMENT OF SOFTWARE ENGINEERING
ACADEMIC YEAR: 2025
APPROVED BY:
Examiner 1: ________________________
Head of Department: ________________________
Date: 12/Feb/2025
ACKNOWLEDGEMENT
First and foremost, we extend our heartfelt gratitude to Almighty God, whose endless
blessings, guidance, and wisdom have been our source of strength throughout this
journey. Without His divine grace, the successful completion of this project would not
have been possible.
We are deeply indebted to our advisor, Milion F, for their invaluable guidance,
continuous support, and insightful feedback. Their expertise and encouragement have
played a crucial role in shaping this work, and we are truly grateful for their patience
and dedication.
Our sincere appreciation also goes to Jigjiga University, the Institute of Technology,
and the Department of Software Engineering for providing us with the necessary
resources, knowledge, and an enabling environment to undertake this project. The
faculty members, administrative staff, and fellow students have all contributed to our
academic and professional growth.
We would like to express our gratitude to our family and friends, whose unwavering
support, encouragement, and understanding have been a constant source of motivation.
Their belief in us has strengthened our resolve to persevere through challenges and
complete this project successfully.
Finally, we acknowledge all individuals, researchers, and developers whose work has
inspired and contributed to the development of this project. Their contributions in the
field of machine learning and healthcare technology have provided us with a solid
foundation to build upon.
To all who have played a part in this journey, we extend our heartfelt thanks. Your
support and encouragement have been instrumental in bringing this project to fruition.
Table of Contents
ACKNOWLEDGEMENT
1.6.1.1 Data Sourcing and Collection from Hospitals and Healthcare Facilities
1.6.2.5 Data Augmentation (if required)
Figure 1.1 Flowchart to visualize the data collection and model training process
Key Functionalities:
Benefits:
Use Cases for the Diabetes Prediction System:
Steps:
Transitions:
Explanation:
Summary:
Appendices
Appendix F: Ethical Considerations
References
List of Figures
Figure No.    Title/Description    Section/Page
List of Tables
Table 1.1 Device Compatibility for Web Platform Chapter 1.5.2 (Specific Objectives)
Table 2.2 Risk Factors & Mitigation Chapter 2.4.1 (Technical Feasibility)
Table 2.4 Data Privacy & Security Chapter 2.4.3 (Operational Feasibility)
List of Abbreviations
AI Artificial Intelligence
ML Machine Learning
Chapter One: Introduction
1.1 Overview
However, many existing systems lack integration with real-time web platforms,
limiting accessibility and usability. This project aims to bridge that gap by developing
a web-based machine learning model for diabetes prediction, facilitating early
diagnosis and intervention.
This study aims to develop a machine learning-based web application that predicts
diabetes risk with high accuracy using historical medical data. The system will
provide an accessible and efficient solution for individuals and healthcare
professionals to assess diabetes risk early and take preventive measures. By
leveraging machine learning algorithms, the proposed system can improve diagnostic
accuracy and assist in timely medical interventions, reducing the long-term impact of
diabetes on individuals and healthcare systems.
1.2.1 Existing System
By providing an accessible web-based platform, users will be able to check their
diabetes risk from anywhere without the need for extensive medical visits. The system
will incorporate real-time analysis, processing user inputs and generating instant
results to facilitate early diagnosis. Additionally, it will offer data-driven insights,
helping individuals take preventive measures based on their risk levels.
The proposed system offers several advantages over traditional diagnostic methods. It
enables faster diagnosis by providing instant risk assessments without requiring
laboratory test results. The approach is cost-effective as it reduces the need for
frequent and expensive medical tests. Increased accessibility is another key benefit,
as the web-based platform allows remote users to assess their diabetes risk
conveniently. Machine learning models improve accuracy by analyzing multiple risk
factors simultaneously, leading to more reliable predictions. Furthermore, the system
is automated and scalable, making it suitable for both individuals and healthcare
professionals for large-scale screening. By integrating modern technology with
diabetes prediction, this system aims to bridge the gap between traditional diagnosis
methods and technological advancements, making early detection more efficient and
widely accessible.
1.3 Motivation
1.4 Scope and Limitations of the Project
This project focuses on developing a web-based machine learning system for diabetes
prediction, with an emphasis on Type 2 Diabetes Mellitus (T2DM). The system will
use the K-Nearest Neighbors (KNN) classification algorithm to predict diabetes risk. A user-friendly
web application will be developed, allowing users to input their health parameters and
receive an instant risk assessment. Machine learning algorithms will analyze key risk
factors such as age, BMI, glucose levels, blood pressure, and family history to provide
accurate predictions. The system will support data-driven decision-making, enabling
individuals to take preventive measures based on their assessed diabetes risk.
Additionally, the project is designed to be scalable, allowing for future integration of
additional health parameters and more advanced prediction techniques.
Despite its advantages, the project has several limitations. The system is limited to
Type 2 Diabetes prediction and does not diagnose Type 1 Diabetes or Gestational
Diabetes. Its accuracy depends on the quality and completeness of input data,
meaning inaccurate or incomplete data may affect the results. The system does not
support real-time blood sugar monitoring, as it does not connect to medical devices
for continuous glucose level tracking. Additionally, the predictions are based on
machine learning models and should not be considered a replacement for professional
medical advice. The system requires an internet connection, making it inaccessible in
areas with poor connectivity. Privacy and security concerns must also be addressed to
ensure that patient data is securely stored and protected from unauthorized access.
1.6 Methodology
collection, system design, and the development tools used to ensure the creation of an
accurate, secure, and efficient prediction model. This section also outlines the process
of gathering data from healthcare facilities to train the machine learning models for
diabetes prediction.
1.5 Objective
Design and implement an intuitive and interactive web application where users can
input health data and receive instant feedback on diabetes risk.
Develop a web-based platform for diabetes prediction by following a systematic
lifecycle from data collection to deployment and testing:
Data Collection and Preparation:
Source and curate validated medical datasets (e.g., PIMA Indian Diabetes Dataset,
NHANES) containing features like age, BMI, glucose levels, and family history.
Preprocess data by addressing missing values (imputation), normalizing features (e.g.,
Min-Max scaling), and balancing class distributions (e.g., SMOTE for oversampling).
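The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: records are assumed to be dictionaries keyed by hypothetical column names, and, as is common practice with the PIMA dataset, a zero in the Glucose, BloodPressure or BMI columns is treated as a missing measurement. Median imputation and Min-Max scaling are fitted on the training rows only, so that information from test data does not leak into training. In scikit-learn the same steps correspond to SimpleImputer(strategy="median") and MinMaxScaler.

```python
from statistics import median

# Assumption: as is common with the PIMA dataset, a 0 in these columns means "not recorded".
ZERO_MEANS_MISSING = {"Glucose", "BloodPressure", "BMI"}


def is_missing(column, value):
    """True for None, or for a 0 in a column where 0 is not a plausible reading."""
    return value is None or (column in ZERO_MEANS_MISSING and value == 0)


def impute_median(rows, columns):
    """Replace missing values with the median of the observed values in that column.

    Returns the imputed rows and the medians, so the same medians can be reused on test data.
    Raises statistics.StatisticsError if a column has no observed values at all.
    """
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if not is_missing(col, r[col])]
        medians[col] = median(observed)
    imputed = [
        {col: medians[col] if is_missing(col, r[col]) else r[col] for col in columns}
        for r in rows
    ]
    return imputed, medians


def fit_min_max(rows, columns):
    """Learn each column's (min, max) from the training rows only."""
    return {col: (min(r[col] for r in rows), max(r[col] for r in rows)) for col in columns}


def apply_min_max(rows, bounds):
    """Scale values to [0, 1] using the training bounds; a constant column maps to 0.0.

    Test rows outside the training range can fall slightly below 0 or above 1.
    """
    scaled = []
    for r in rows:
        scaled.append({
            col: 0.0 if hi == lo else (r[col] - lo) / (hi - lo)
            for col, (lo, hi) in bounds.items()
        })
    return scaled
```

The medians and bounds returned from the training data would be stored and reapplied unchanged to each new user input before prediction.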
Model Development and Validation:
Train machine learning models (KNN, SVM, Random Forest, Logistic Regression)
using scikit-learn or TensorFlow, prioritizing interpretability and performance.
Optimize models through feature selection (e.g., Recursive Feature Elimination, PCA)
and hyperparameter tuning (e.g., GridSearchCV).
Validate models using stratified k-fold cross-validation and benchmark performance
against metrics like accuracy (>90%), recall (>85%), and AUC-ROC.
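Stratified k-fold cross-validation keeps the ratio of diabetic to non-diabetic records roughly the same in every fold, which matters because diabetes datasets are usually imbalanced. The sketch below, a hypothetical helper written with the Python standard library only, shows the idea; scikit-learn's StratifiedKFold provides an equivalent, tested implementation.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Split row indices into k folds that each keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin, continuing where the previous class
        # stopped, so both class proportions and fold sizes stay balanced.
        for position, idx in enumerate(indices):
            folds[(offset + position) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(labels, k=5, seed=42):
    """Yield (train_indices, test_indices), using each fold exactly once as the test set."""
    folds = stratified_k_fold(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each model configuration would be trained k times on the train indices and scored on the held-out fold, and the k scores averaged.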
Web Application Design and Development:
Build an intuitive frontend using React.js for dynamic user interactions (e.g., input
forms, risk feedback).
Develop a backend API with Node.js and Express to process user inputs, run model
predictions, and return results in real time.
Integrate explainable AI (XAI) components (e.g., SHAP values, LIME) to visualize
risk factor contributions.
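SHAP and LIME are separate libraries whose APIs are not reproduced here. For a linear model such as logistic regression, however, the SHAP value of each feature (assuming independent features) reduces to w_i * (x_i - E[x_i]): the model weight times the feature's deviation from its average, measured in log-odds. The sketch below computes these contributions for hypothetical weights. It illustrates what a dashboard could display and is not a substitute for the SHAP library on non-linear models such as KNN or Random Forest.

```python
import math


def linear_contributions(weights, bias, x, feature_means):
    """Explain one logistic-regression prediction as per-feature contributions.

    contribution_i = w_i * (x_i - mean_i), in log-odds units. The baseline is the
    model's log-odds at the average input; baseline + sum(contributions) equals the
    log-odds for x (up to floating-point rounding), which makes them easy to chart.
    """
    baseline = bias + sum(w * feature_means[f] for f, w in weights.items())
    contributions = {f: w * (x[f] - feature_means[f]) for f, w in weights.items()}
    log_odds = baseline + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return baseline, contributions, probability
```

Sorting the contributions by absolute value gives the "top risk factors" list for a given patient.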
Cross-Device Compatibility and Responsiveness:
Use responsive design tools (e.g., the Bootstrap framework, CSS Grid layouts) to ensure
seamless functionality across devices (see Table 1.1).
Conduct compatibility testing on desktops, tablets, and mobile devices to verify UI
adaptability and touch-friendly interfaces.
Deployment and Scalability:
Deploy the platform on cloud services (e.g., AWS, Heroku) with containerization
(Docker) for scalability.
Set up continuous integration/continuous deployment (CI/CD) pipelines using GitHub
Actions for automated updates.
System Testing and Usability Evaluation:
Perform functional testing (unit, integration, end-to-end) to validate prediction
accuracy and system robustness.
Conduct usability testing with stakeholders (patients, doctors) to refine UI/UX based
on feedback (e.g., simplifying input forms).
Security and Compliance:
Encrypt user data at rest (AES-256) and in transit (TLS 1.3).
Ensure compliance with healthcare regulations (HIPAA, GDPR) and implement
OAuth 2.0 for secure authentication.
address emerging user needs.
Ensure cross-device compatibility and responsiveness
Table 1.1: Device Compatibility
Provide interactive dashboards with explainable AI (XAI) outputs to clarify
predictions.
Ensure security and privacy of user data:
Encrypt data at rest (AES-256) and in transit (TLS 1.3), and comply with
HIPAA/GDPR regulations.
Implement OAuth 2.0 for secure authentication and role-based access control.
Validate and compare system performance:
Benchmark models using metrics (accuracy, F1-score, AUC-ROC) and compare
against existing tools (e.g., ADA Risk Test).
Target >90% accuracy and >85% recall to minimize false negatives.
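The metrics named above all derive from the confusion matrix. The following sketch (hypothetical helper names, standard library only) shows how accuracy, precision, recall and F1 are computed and how the >90% accuracy and >85% recall targets could be checked. Recall is tp / (tp + fn), so raising it means reducing false negatives, that is, diabetic patients the model would miss. AUC-ROC needs predicted probabilities rather than labels and is available as sklearn.metrics.roc_auc_score.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed diabetic cases
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def meets_targets(metrics, min_accuracy=0.90, min_recall=0.85):
    """Check the project's stated targets: accuracy above 90% and recall above 85%."""
    return metrics["accuracy"] > min_accuracy and metrics["recall"] > min_recall
```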
Provide actionable recommendations based on predictions:
Deliver personalized advice (e.g., diet plans, exercise routines) via the platform.
Enable API integration with healthcare providers for seamless referral systems.
Accurate diabetes predictions rely heavily on high-quality data. The data collection
process follows a systematic approach to ensure reliability and compliance with
ethical and legal standards.
1.6.1.1 Data Sourcing and Collection from Hospitals and Healthcare Facilities
The primary dataset for this project is obtained from hospitals, clinics, and healthcare
institutions to enhance real-world applicability. The process involves:
Health Records
Collect electronic health records (EHRs) containing key diabetes risk factors:
Age, gender, weight, BMI, glucose levels, blood pressure, family history, and
physical activity.
After data collection, preprocessing ensures the dataset is suitable for machine
learning training.
Identify and remove outliers that could negatively impact model performance.
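The report does not specify an outlier rule; one common choice, assumed here, is Tukey's interquartile-range (IQR) fence. The sketch below flags values lying more than 1.5 × IQR outside the first and third quartiles of one column. Because an extreme medical reading can be genuine, it returns the flagged rows separately so they can be reviewed before being discarded.

```python
from statistics import quantiles


def iqr_bounds(values, factor=1.5):
    """Tukey fences: [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def split_outliers(rows, column, factor=1.5):
    """Return (kept, flagged) so flagged rows can be reviewed instead of silently dropped."""
    low, high = iqr_bounds([r[column] for r in rows], factor)
    kept = [r for r in rows if low <= r[column] <= high]
    flagged = [r for r in rows if not low <= r[column] <= high]
    return kept, flagged
```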
sampling Technique) or other data augmentation methods to balance it.
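The core of SMOTE can be sketched in a few lines: each synthetic record is placed at a random point on the line segment between a minority-class record and one of its k nearest minority-class neighbours. This simplified version (standard library only; the function name is hypothetical) assumes already-scaled numeric features. The imbalanced-learn package provides a complete implementation as imblearn.over_sampling.SMOTE. Oversampling should be applied only to the training portion of each cross-validation fold, never to the test data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic new minority-class samples by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric feature lists (already scaled).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```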
1.6.4 System Development Tools
Node.js – Handles API requests, data processing, and machine learning model
integration.
Express.js – A web framework for structuring the backend server.
Python – Used for machine learning model training, data processing, and
evaluation.
scikit-learn: Implements classification models (KNN, SVM, Random Forest,
Logistic Regression).
datasets.
AWS, Google Cloud, or Heroku – Cloud platforms for deploying the web
application.
Git/GitHub – Version control and collaborative development.
1.6.4.6 System Flowchart
Figure 1.1: Flowchart visualizing the data collection and model training process.
Chapter Two: System Requirement Specification
2.1 Background Overview
Diabetes is one of the leading global public health challenges of the 21st century. This
chronic metabolic disorder is characterized by elevated levels of blood glucose due to
issues related to insulin production or its effectiveness. According to the International
Diabetes Federation (IDF), an estimated 463 million adults were living with
diabetes in 2019, and this number is expected to rise significantly in the coming
decades. The disease is not only a personal health issue but also a public health
burden, straining healthcare systems worldwide and contributing to a variety of long-
term complications, such as heart disease, stroke, kidney failure, blindness, and
amputation.
Type 2 Diabetes Mellitus (T2DM), which accounts for approximately 90% of all
diabetes cases, is a chronic condition where the body either becomes resistant to
insulin or doesn’t produce enough insulin to maintain normal glucose levels.
Although it is often preventable, many individuals develop T2DM without early
symptoms. This makes early detection and intervention vital to prevent the onset of
more severe health complications. Early-stage diabetes or pre-diabetes can often be
managed through lifestyle changes such as improved diet and increased physical
activity, making the role of early detection even more critical.
Despite the availability of traditional diagnostic methods such as blood glucose tests
and oral glucose tolerance tests, late diagnosis continues to be a significant challenge.
Many individuals with diabetes remain undiagnosed, and by the time they are
diagnosed, they often have already developed severe complications. The delay in
diagnosis is often linked to the absence of early symptoms in the case of Type 2
diabetes and the lack of regular screening for individuals at risk.
In response to this challenge, the integration of machine learning (ML) and predictive
analytics in healthcare offers an innovative solution. Machine learning algorithms,
with their ability to analyze large datasets and recognize patterns in complex data,
have proven effective in identifying early warning signs of various diseases, including
diabetes. These algorithms can process diverse health data, including age, gender,
BMI (Body Mass Index), family history, physical activity, and diet, to predict the
likelihood of diabetes onset before the disease progresses to more serious stages.
One of the major advantages of using machine learning for diabetes prediction is that
it can provide real-time analysis of patient data, making it a useful tool for healthcare
providers. A web-based platform can also be designed to integrate with
existing healthcare infrastructure, offering doctors and healthcare professionals a tool
that provides fast and accurate predictions. This system would use historical patient
data, processed through machine learning algorithms, to make predictions about an
individual’s risk of developing diabetes.
Moreover, predictive diabetes systems can be tailored to meet the needs of specific
populations, such as high-risk individuals with a family history of diabetes, elderly
patients, or individuals with obesity. This customization of predictions can make the
system more relevant and accurate, ensuring that at-risk individuals are identified
early and provided with targeted interventions.
As such, the goal of this project is to create a web-based diabetes prediction system
that utilizes machine learning models, specifically classification algorithms, to predict
the likelihood of an individual developing diabetes based on a variety of health-
related factors. The system will be designed with a user-friendly interface that can be
easily integrated into existing healthcare workflows, allowing healthcare professionals
to use it as a tool for early diagnosis and preventive healthcare.
In addition to its diagnostic potential, this system aims to contribute to the growing
field of digital health technologies, which are revolutionizing the way healthcare is
delivered. By improving predictive accuracy, reducing diagnostic errors, and offering
real-time analysis, the proposed system has the potential to reduce the burden of
diabetes on individuals and healthcare systems globally.
The functional requirements cover all the core features and operations that the system
must support for it to be functional and useful to the healthcare professionals using it.
The Diabetes Prediction System must support a secure and efficient user
registration and authentication process for authorized individuals, such as
healthcare professionals. Users should be able to create new accounts by
providing personal details like username, email, password, and role (e.g.,
doctor, nurse). The system will validate the user’s credentials during login and
provide feedback in case of incorrect information. Additionally, password
recovery functionality will be available for users who forget their credentials.
User access will be role-based to ensure that only authorized personnel can
access sensitive data and predictive tools.
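Registration and login imply that passwords are stored securely. A minimal sketch, assuming salted PBKDF2-HMAC-SHA256 hashing from the Python standard library, is shown below; the iteration count is an assumption to be tuned to the server's hardware. The Node.js backend could apply the same scheme with its built-in crypto.pbkdf2 function. Only the salt and the derived key are stored, and the comparison uses a constant-time check.

```python
import hashlib
import hmac
import os

# Assumption: iteration count to be tuned so hashing takes a fraction of a second on the server.
ITERATIONS = 600_000


def hash_password(password):
    """Return (salt, derived_key) for storage; the plain password is never stored."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, stored_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)
```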
User Access Levels:
Patients: Patients can input their personal health data and view general
recommendations, but they do not have access to raw prediction results or
sensitive healthcare information.
Healthcare Providers: Healthcare professionals, such as doctors and nurses,
can enter and review patient data, interpret predictions, and generate reports to
support decision-making.
Administrators: Administrators are responsible for managing user accounts,
monitoring system performance, and ensuring the overall security of the
system, including access control and data protection measures.
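The three access levels can be expressed as a role-to-permission table that the backend checks before serving each request. The permission names below are hypothetical, chosen to mirror the descriptions above. In this sketch administrators manage accounts and security but are not granted access to patient records, following the least-privilege principle; any request for a permission not listed is denied.

```python
from enum import Enum


class Role(Enum):
    PATIENT = "patient"
    PROVIDER = "provider"
    ADMIN = "admin"


# Hypothetical permission names mirroring the three access levels.
PERMISSIONS = {
    Role.PATIENT: {"submit_own_data", "view_recommendations"},
    Role.PROVIDER: {"enter_patient_data", "view_patient_data", "view_prediction", "generate_report"},
    # In this sketch administrators manage accounts and security but do not read patient records.
    Role.ADMIN: {"manage_users", "view_system_metrics", "manage_access_control"},
}


def is_allowed(role, permission):
    """Return True only if the role explicitly grants the permission (deny by default)."""
    return permission in PERMISSIONS.get(role, set())
```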
ethnicity. It should also gather important health information, such as Body Mass
Index (BMI), blood pressure, cholesterol levels, glucose levels, physical activity
levels, and lifestyle factors like diet and smoking habits. Additionally, the system
should account for any existing symptoms related to diabetes, such as excessive thirst,
frequent urination, or blurred vision.
Data preprocessing should be carried out on the input data, such as normalizing
numerical values like BMI and glucose levels, ensuring they are in a consistent format
for accurate predictions. Input validation is crucial to maintain the accuracy of the
data, and the system should ensure that all inputs, such as BMI, blood pressure, and
glucose levels, are correctly formatted (e.g., numeric values) and handle missing data
appropriately by either providing default values or notifying the user to input the
missing information. This ensures reliable input and enhances the system's predictive
capability.
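The validation rules above can be sketched as a function that returns cleaned values together with per-field error messages. The field names and plausibility ranges are hypothetical placeholders that would need clinical review. Of the two options described, this sketch takes the "notify the user" path for missing data rather than substituting default values.

```python
# Hypothetical field names and plausibility ranges; real limits need clinical review.
FIELD_RANGES = {
    "age": (1, 120),                  # years
    "bmi": (10.0, 80.0),              # kg/m^2
    "glucose": (40.0, 600.0),         # mg/dL
    "blood_pressure": (30.0, 250.0),  # mmHg
}


def validate_input(form):
    """Return (clean_values, errors); missing or invalid fields are reported, not guessed."""
    clean, errors = {}, {}
    for field, (low, high) in FIELD_RANGES.items():
        raw = form.get(field)
        if raw is None or str(raw).strip() == "":
            errors[field] = "This field is required."
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors[field] = "Please enter a number."
            continue
        # NaN and infinity also fail this range check.
        if not low <= value <= high:
            errors[field] = f"Expected a value between {low} and {high}."
            continue
        clean[field] = value
    return clean, errors
```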
The core functionality of the Diabetes Prediction System is to analyze the input data
and predict the likelihood of diabetes development using machine learning models. To
achieve this, the system should implement one or more classification algorithms, such
as Logistic Regression, Decision Trees, Random Forest, or K-Nearest Neighbors
(KNN), that are capable of analyzing the various risk factors input by users.
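Since KNN is the model named in the project scope, a from-scratch sketch helps show how a prediction is made: the new record is compared with every training record, the k closest ones vote, and the share of diabetic neighbours can serve as a rough risk score. This illustration assumes features that are already Min-Max scaled. In practice scikit-learn's KNeighborsClassifier would be used; with uniform weights its predict_proba returns the same neighbour-vote fractions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote of its k nearest training points (Euclidean distance).

    Returns (label, risk) where risk is the fraction of neighbours labelled 1 (diabetic).
    Features are assumed to be scaled so that no single feature dominates the distance.
    """
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    risk = votes[1] / len(nearest)
    label = 1 if risk > 0.5 else 0  # an odd k avoids ties between the two classes
    return label, risk
```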
Based on the collected data, the system should calculate and provide a diabetes risk
score or a probability (e.g., 80% chance of developing Type 2 Diabetes). This
prediction is crucial for determining the user's level of risk for diabetes. Additionally,
the system should apply pre-defined risk thresholds, for example 70% or above for
high risk and a lower cut-off separating medium from low risk, which categorize the
user into one of three risk levels: high risk, medium risk, or low risk.
For individuals identified as high-risk, the system should offer decision support by
suggesting preventive actions, such as recommending lifestyle changes, more frequent
testing, or referring the patient to a specialist for further evaluation. For those
identified as low-risk, the system can recommend ongoing monitoring or regular
check-ups to ensure early detection and continued health management. This
functionality not only helps users assess their risk but also provides actionable
insights for improving or maintaining their health.
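A single 70% cut-off yields only two groups, so three risk levels need a second threshold. The sketch below keeps 70% for high risk and assumes a hypothetical 30% boundary between medium and low risk. Both values, and the medium-risk recommendation text (which the requirements do not specify), are placeholders to be set with clinical input.

```python
# 0.70 follows the example threshold in the text; 0.30 is a hypothetical placeholder.
HIGH_RISK = 0.70
MEDIUM_RISK = 0.30

RECOMMENDATIONS = {
    "high": "Recommend lifestyle changes, more frequent testing and referral to a specialist.",
    "medium": "Recommend lifestyle changes and a follow-up assessment.",  # hypothetical wording
    "low": "Recommend ongoing monitoring and regular check-ups.",
}


def categorize(probability):
    """Map a predicted probability of diabetes to a risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= HIGH_RISK:
        return "high"
    if probability >= MEDIUM_RISK:
        return "medium"
    return "low"
```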
Once the prediction is made, the Diabetes Prediction System should display the
results in a clear and understandable format, enabling healthcare professionals to
interpret the findings effectively. The system will feature a Results Dashboard that
presents the diabetes risk assessment report. This report should include:
The risk score or probability of developing diabetes, providing a clear
numerical representation of the individual's risk.
A summary of input data, such as BMI, age, cholesterol levels, and other
health factors, displayed with visual indicators like charts or graphs for
easier interpretation.
Recommendations based on the results, including suggestions for dietary
adjustments, exercise plans, or referrals to specialists, helping healthcare
professionals make informed decisions about the patient's care.
The User-Friendly Interface ensures that the results are easily understood by
medical professionals, using visual aids like bar charts, pie charts, and risk
category indicators (e.g., high, medium, low) to highlight important information.
The Diabetes Prediction System should have the capability to store and track
historical data for each user, enabling healthcare professionals to review past risk
assessments and monitor the patient's progress over time. This functionality ensures
that medical professionals can make data-driven decisions based on the patient’s
historical health trends.
Key Functionalities:
Patient History: The system should securely store each patient’s data in a
database, which includes past risk predictions, health data (e.g., glucose levels,
BMI), and any treatment or preventive recommendations that were made. This
historical data will provide a comprehensive view of the patient's journey and
help in assessing changes in their health over time.
Update Data: The system should allow users (with appropriate permissions)
to update their personal health data, such as weight, blood pressure, or
cholesterol levels. These updates will be factored into future risk predictions,
ensuring that the system remains accurate and up-to-date based on the
patient’s current health status.
Data Security: All stored data must be encrypted and protected from
unauthorized access, ensuring that sensitive health information is kept secure.
The system should adhere to health data security standards, such as HIPAA
(Health Insurance Portability and Accountability Act) in the U.S. or applicable
regulations in other regions, to ensure compliance with privacy laws.
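Patient history can be kept in a relational table that stores one row per assessment. The sketch below uses Python's built-in sqlite3 module with a hypothetical schema and helper names. A production deployment would use the project's chosen database and add encryption at rest, which this sketch does not cover.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) the database and ensure the hypothetical assessments table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assessments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               patient_id TEXT NOT NULL,
               assessed_at TEXT NOT NULL,
               glucose REAL,
               bmi REAL,
               risk REAL NOT NULL,
               recommendation TEXT
           )"""
    )
    return conn


def record_assessment(conn, patient_id, glucose, bmi, risk, recommendation, when=None):
    """Append one assessment; earlier rows are never overwritten, preserving the history."""
    when = when or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO assessments (patient_id, assessed_at, glucose, bmi, risk, recommendation) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (patient_id, when.isoformat(), glucose, bmi, risk, recommendation),
        )


def patient_history(conn, patient_id):
    """Return (assessed_at, glucose, bmi, risk) rows for one patient, oldest first."""
    return conn.execute(
        "SELECT assessed_at, glucose, bmi, risk FROM assessments "
        "WHERE patient_id = ? ORDER BY assessed_at",
        (patient_id,),
    ).fetchall()
```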
2.2.6 System Maintenance and Updates
The system must be designed to support ongoing maintenance and periodic updates to
ensure it remains functional, secure, and capable of adapting to evolving healthcare
needs. This will allow the system to continue providing accurate and reliable
predictions over time while maintaining its performance and compliance with industry
standards.
Key Functionalities:
Types of Testing:
User Acceptance Testing (UAT): Involves gathering feedback from
healthcare providers to validate the system’s usability and ensure that it
meets the needs of medical professionals. In this project, UAT will involve 10
healthcare professionals from Jigjiga General Hospital to validate system usability.
Performance Requirements
Performance requirements are essential to ensure that the Diabetes Prediction System
operates efficiently, especially when faced with high volumes of user interactions and
large datasets. The system should be designed to provide timely, accurate predictions
without compromising on responsiveness or usability.
Key Considerations:
frustration.
Description: Security requirements are crucial to protect sensitive patient data and
ensure that the system is resilient to malicious attacks. Healthcare data is highly
sensitive, so security must be a top priority in the system design.
Key Considerations:
Data Encryption: All sensitive data, including patient personal information and
medical data, must be encrypted both in transit and at rest. TLS encryption (with the
deprecated SSL protocols disabled) should be used for all communication between the
user and the server.
Data Integrity: The system must ensure the integrity of data using mechanisms like
checksums and hashing. This ensures that data cannot be altered or tampered with by
unauthorized parties.
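A plain hash such as SHA-256 detects accidental corruption, but anyone who alters a record can also recompute its hash. A keyed hash (HMAC) closes that gap because the tag cannot be recomputed without a secret key held by the server. The sketch below assumes records are JSON-serializable dictionaries; the helper names are hypothetical and key management is outside its scope.

```python
import hashlib
import hmac
import json


def sign_record(record, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    # sort_keys and fixed separators make the encoding identical for equal records.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record, key, tag):
    """True if the record still matches the tag it was stored with (constant-time compare)."""
    return hmac.compare_digest(sign_record(record, key), tag)
```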
Audit Logging: The system should keep an audit trail of all user actions, especially
for access to patient data, predictions, and system modifications. These logs should be
secure and accessible only to authorized personnel for compliance and monitoring.
Compliance with Regulations: The system must comply with relevant data
protection and healthcare regulations such as HIPAA (Health Insurance Portability
and Accountability Act) or GDPR (General Data Protection Regulation) to ensure
patient privacy and data protection.
Description: Usability requirements focus on how easy and intuitive the system is to
use for healthcare professionals and patients. The goal is to ensure that users can
interact with the system effectively without needing extensive training.
Key Considerations:
User Interface (UI): The user interface should be simple, intuitive, and responsive.
It must present relevant information clearly, using visual aids such as charts, graphs,
and risk indicators to aid in decision-making.
User Experience (UX): The system should follow best practices for UX design to
ensure that tasks such as data input, risk prediction, and report generation can be
performed with minimal effort and maximum efficiency.
Error Handling: The system should provide clear error messages and instructions
to help users resolve issues when input data is incorrect or missing.
Accessibility: The system should be designed to meet accessibility standards,
ensuring that it is usable by individuals with disabilities (e.g., support for screen
readers, color contrast for the visually impaired).
2.3.5 Reliability Requirements
Key Considerations:
System Availability: The system should be available 99.9% of the time, ensuring
minimal downtime for both users and administrators. This should be backed by a
service level agreement (SLA).
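It helps to translate the 99.9% availability target into an actual downtime budget: the allowed downtime is (1 - availability) × period. The small helper below (a hypothetical name) performs the arithmetic; 99.9% corresponds to about 525.6 minutes (roughly 8.8 hours) per year, or about 43.2 minutes in a 30-day month.

```python
def allowed_downtime_minutes(availability, period_days):
    """Maximum downtime (in minutes) permitted by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 24 * 60


per_year = allowed_downtime_minutes(0.999, 365)  # about 525.6 minutes
per_month = allowed_downtime_minutes(0.999, 30)  # about 43.2 minutes
```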
Backup and Recovery: The system should include regular backups of patient data
and models. In case of failure, the system should be able to recover quickly with
minimal data loss (e.g., restore from the most recent backup within 15 minutes).
Error Detection and Reporting: The system should automatically detect errors and
log them for troubleshooting. Automatic alerts should notify system administrators if
critical failures occur.
Description: Maintainability refers to how easy it is to update, improve, and fix the
system over time. As technology and healthcare needs evolve, the system must be
adaptable and maintainable.
Key Considerations:
Modular Design: The system should follow a modular architecture to make it
easier to update individual components (e.g., machine learning models, user interface)
without affecting the entire system.
Code Quality: The system’s code should be well-documented, clean, and follow
coding best practices. This ensures that future developers can understand and maintain
the system efficiently.
Automated Testing: The system should include automated testing tools that allow
developers to quickly verify the functionality of the system after updates or bug fixes.
Version Control: The system should use version control systems (e.g., Git) to track
changes in code, ensuring that updates and modifications can be managed effectively.
Key Considerations:
able to integrate with other healthcare systems, particularly Electronic Health
Records (EHR), to ensure that patient data can be accessed and updated in
real-time. This integration will reduce the need for redundant data entry and
allow for a streamlined workflow across different healthcare platforms.
Benefits:
The Diabetes Prediction System aims to enhance early detection of diabetes using
machine learning technologies. A thorough feasibility study is conducted to evaluate
the practicality and sustainability of the system in terms of technical, economic,
operational, legal, and social factors. This comprehensive analysis ensures that the
system is viable for development, deployment, and long-term use.
Long-Term Sustainability
Technical feasibility assesses whether the available technology and resources can
support the development and deployment of the Diabetes Prediction System.
Data Collection
The system will initially use the PIMA Indian Diabetes Dataset for training.
Healthcare partnerships will help collect real-world data to enhance model
accuracy.
Model Training
Infrastructure Requirements (Table 2.1):
Component Requirement
Risk Impact Mitigation Strategy
Cost-Benefit Analysis
increases financial sustainability.
Operational feasibility evaluates how practical the system will be for daily use in
healthcare settings.
System Integration
The system will integrate with Electronic Health Records (EHR) systems
using FHIR (Fast Healthcare Interoperability Resources) for compatibility.
Web-based access enables healthcare providers to access the system remotely.
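For illustration, a glucose measurement exchanged with an EHR through FHIR would typically be represented as an Observation resource. The sketch below builds a minimal FHIR R4 Observation as a Python dictionary. The function name and patient identifier are hypothetical, and the LOINC code 2339-0 (Glucose [Mass/volume] in Blood) should be confirmed against the partner hospital's terminology. Sending it would normally be an HTTP POST of the JSON body to the FHIR server's Observation endpoint.

```python
import json


def glucose_observation(patient_id, value_mg_dl, effective):
    """Build a minimal FHIR R4 Observation for a blood glucose result."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "2339-0",  # verify against the partner's terminology service
                "display": "Glucose [Mass/volume] in Blood",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": effective,  # ISO 8601 timestamp
        "valueQuantity": {
            "value": value_mg_dl,
            "unit": "mg/dL",
            "system": "http://unitsofmeasure.org",
            "code": "mg/dL",
        },
    }


body = json.dumps(glucose_observation("123", 148.0, "2025-02-12T09:30:00Z"))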
Security Measure Implementation
Legal feasibility ensures that the system complies with healthcare and data protection
laws.
Regulation Requirement
Intellectual Property & Data Rights
Social feasibility assesses how well the system will be accepted by patients,
healthcare professionals, and the public.
Public Health Impact
Chapter Three: System Analysis and Modeling
3.1 Overview
The System Analysis and Modeling chapter bridges the gap between the conceptual
design and the practical implementation of the Diabetes Prediction System. It begins
with Problem Analysis, which identifies the core issue of delayed diabetes detection and
highlights the impact of late diagnosis on healthcare costs, patient health, and overall
quality of life. The system's goal is to provide early prediction and diagnosis of
diabetes, enabling timely intervention.
The Requirements Revisited section then restates the system's functional and
non-functional requirements. Functional requirements cover the system's core tasks:
collecting patient data, processing it, and generating predictions. Non-functional
requirements address performance, scalability, security, and usability.
The System Design section outlines the system's architecture, which consists of the
frontend (user interface), the backend (data processing and the machine learning
model), and cloud storage. The user interface design prioritizes ease of use for both
healthcare professionals and patients, with wireframes and mockups illustrating how
data and results flow through the interface. The backend design details data handling,
the prediction logic, and communication with the machine learning model. The
database design describes how patient data, predictions, and historical reports are
structured.
In the Modeling section, use case diagrams, data flow diagrams, entity-relationship
diagrams, and class diagrams represent the system's functionality and data flow
visually. Finally, the System Architecture and Interaction section explains how the
components interact, focusing on frontend-backend communication, machine learning
integration, and cloud infrastructure. Together, these sections form a blueprint for
building a system that delivers accurate and timely diabetes predictions, improving
patient care and reducing healthcare costs.
3.2 Scenario-Based Modeling
Predict Diabetes Risk:
Generate Reports:
Figure 3: Use Cases for the Diabetes Prediction System
3.2.2 Actor Identification
An actor in a system refers to any entity (person, external system, or device) that
interacts with the system. For the Diabetes Prediction System, the primary actors are:
Patient:
Healthcare Provider:
Admin:
System: This is a system actor that processes the patient's data and provides
predictions based on the trained model.
For each identified use case, a detailed use case description can be written that lists
the steps involved, the preconditions, and the postconditions. An example use case
description follows:
Description: The healthcare provider enters the patient’s basic details into the
system for diabetes prediction.
Preconditions: The healthcare provider has access to the system. Patient
details are available.
Basic Flow:
Exceptions: If the patient’s details cannot be saved due to a system failure, the
healthcare provider is notified, and the process is aborted.
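The exception path in this use case (the save fails, the provider is notified, and the process is aborted) can be sketched as follows. The `PatientStore` interface, the `notify` callback, and the `StorageError` type are hypothetical names introduced for illustration. The report does not define how persistence failures are signaled.

```python
from typing import Callable, Dict, Protocol


class StorageError(Exception):
    """Raised by a store when patient details cannot be saved (hypothetical)."""


class PatientStore(Protocol):
    """Any object with a save() method that returns the new patient's ID."""

    def save(self, details: Dict[str, object]) -> str: ...


def enter_patient_details(
    store: PatientStore,
    details: Dict[str, object],
    notify: Callable[[str], None],
) -> bool:
    """Save patient details. On failure, notify the provider and abort (return False)."""
    try:
        patient_id = store.save(details)
    except StorageError as exc:
        # Exception flow: inform the healthcare provider and stop the use case.
        notify(f"Patient details could not be saved: {exc}")
        return False
    notify(f"Patient {patient_id} saved.")
    return True
```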
1. User enters health data → The user inputs their health information into the
system.
2. Store health data in database → The system saves the health data for further
processing.
3. Decision: Is the user a healthcare provider?
   Yes:
   1. The provider accesses patient data.
   2. The system retrieves patient data from the database.
   3. The provider views the patient data.
   No:
   1. The system sends the health data to a Machine Learning (ML) model for
   prediction.
   2. The ML model processes the data.
   3. The system returns the prediction result.
   4. The prediction result is displayed to the user.
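The branching in this activity flow can be sketched in Python as below. The in-memory dictionary that stands in for the database, the `Role` enumeration, and the `predict` callable that stands in for the ML model are assumptions for illustration only.

```python
from enum import Enum
from typing import Callable, Dict, List, Union

HealthData = Dict[str, float]


class Role(Enum):
    PATIENT = "patient"
    HEALTHCARE_PROVIDER = "healthcare_provider"


def handle_submission(
    role: Role,
    patient_id: str,
    health_data: HealthData,
    database: Dict[str, List[HealthData]],
    predict: Callable[[HealthData], str],
) -> Union[List[HealthData], str]:
    """Follow the activity flow: store the data, then branch on the user's role."""
    # Steps 1-2: store the submitted health data for further processing.
    database.setdefault(patient_id, []).append(health_data)

    # Step 3: decide whether the user is a healthcare provider.
    if role is Role.HEALTHCARE_PROVIDER:
        # Yes branch: retrieve the patient's stored records for the provider to view.
        return list(database[patient_id])

    # No branch: send the data to the ML model and return its result for display.
    return predict(health_data)
```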
3.3 Behavioral/Dynamic Modeling
For the Diabetes Prediction System, a sequence diagram can be used to model the
"Predict Diabetes Risk" use case. Below is a high-level description:
Actors: Healthcare Provider, Patient, System (Web Application), Machine
Learning Model
Objects: User Interface, Data Processor, ML Model, Database
1. User enters health data → The input is sent to the Web Application.
3. Web Application sends health data to the ML Model for prediction (if the
user is not a healthcare provider).
This sequence diagram illustrates the flow of messages between the healthcare
provider, user interface, data processor, machine learning model, and the database to
process and display the diabetes risk prediction.
A State Diagram (or Statechart Diagram) is used to model the states of a system or an
object over time, as well as the transitions between those states triggered by events. It
is particularly useful for modeling the lifecycle of an object or entity, showing how it
reacts to various inputs and changes.
For the Diabetes Prediction System, we can model the "Patient Data Processing"
lifecycle, showing the states that patient data passes through until a prediction is
made.
Steps:
1. Start: The process begins when the user initiates the system.
2. InputData: The user submits health data into the system.
3. PreprocessData: The system cleans and normalizes the data for further processing.
4. ModelPrediction (nested within PreprocessData): The system runs the prediction
model on the preprocessed data.
5. DisplayResult (nested within ModelPrediction): The prediction result is shown to
the user.
6. End: The process completes after displaying the prediction result.
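The PreprocessData state is described only as cleaning and normalizing the data. One common approach, shown here as an assumption rather than the project's documented method, is to treat zero readings as missing, fill them with the column median, and then min-max scale each column to [0, 1]. Zero readings are physiologically impossible in columns such as Glucose, BloodPressure, and BMI, and the Pima dataset uses 0 to encode missing values there. The function name and the column list are hypothetical.

```python
from statistics import median
from typing import Dict, List

# Columns where a value of 0 is treated as "missing" (assumption based on the
# Pima dataset's encoding; adjust for other data sources).
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "BMI")


def preprocess(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Impute zero readings with the column median, then min-max scale each column."""
    columns = list(rows[0].keys())
    cleaned = [dict(r) for r in rows]  # copy so the caller's rows are not mutated

    # Cleaning: replace zeros in the listed columns with the median of non-zero values.
    for col in ZERO_MEANS_MISSING:
        if col not in columns:
            continue
        observed = [r[col] for r in cleaned if r[col] != 0]
        if not observed:
            continue
        fill = median(observed)
        for r in cleaned:
            if r[col] == 0:
                r[col] = fill

    # Normalization: min-max scale every column to the range [0, 1].
    for col in columns:
        lo = min(r[col] for r in cleaned)
        hi = max(r[col] for r in cleaned)
        span = hi - lo
        for r in cleaned:
            # A constant column is mapped to 0.0 to avoid division by zero.
            r[col] = (r[col] - lo) / span if span else 0.0
    return cleaned
```

In a deployed system, the medians and the minimum and maximum values would normally be computed once on the training data and reused for each new patient. Recomputing them on a single submission would map every value to 0.0.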
Transitions:
Waiting for Data → Data Validated: Occurs when the system receives data
from the user and checks it for completeness and correctness.
Data Validated → Data Processed: Happens after the data is confirmed to be
valid and ready for analysis.
Data Processed → Prediction Generated: The machine learning model
processes the validated data to make predictions.
Prediction Generated → Prediction Displayed: The system outputs the
predictions generated by the model.
Prediction Displayed → Completed State: The final stage, in which the results are
stored and the process is marked complete.
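The transitions listed above can be encoded as a small state machine that rejects out-of-order moves. The enumeration reuses the state names from the transition list. The class and method names are hypothetical.

```python
from enum import Enum, auto


class DataState(Enum):
    WAITING_FOR_DATA = auto()
    DATA_VALIDATED = auto()
    DATA_PROCESSED = auto()
    PREDICTION_GENERATED = auto()
    PREDICTION_DISPLAYED = auto()
    COMPLETED = auto()


# Each state may only advance to the next state in the lifecycle.
ALLOWED = {
    DataState.WAITING_FOR_DATA: DataState.DATA_VALIDATED,
    DataState.DATA_VALIDATED: DataState.DATA_PROCESSED,
    DataState.DATA_PROCESSED: DataState.PREDICTION_GENERATED,
    DataState.PREDICTION_GENERATED: DataState.PREDICTION_DISPLAYED,
    DataState.PREDICTION_DISPLAYED: DataState.COMPLETED,
}


class PatientDataLifecycle:
    """Tracks one submission through the states of the state diagram."""

    def __init__(self) -> None:
        self.state = DataState.WAITING_FOR_DATA

    def advance(self, target: DataState) -> None:
        # COMPLETED has no entry in ALLOWED, so any move out of it is rejected.
        if ALLOWED.get(self.state) is not target:
            raise ValueError(f"Illegal transition {self.state.name} -> {target.name}")
        self.state = target
```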
3.4 Class-Based Modeling
In the Diabetes Prediction System, several key entities or objects need to be modeled
as classes. These classes will correspond to the core components of the system and
their attributes. Here are some of the potential classes that can be identified:
Patient: This class represents the patient whose data is used for prediction. It stores
attributes such as age, gender, BMI, blood pressure, glucose level, and family history
of diabetes.
Prediction: This class holds the prediction results generated by the machine learning
model. It represents the outcome of the diabetes risk assessment for a patient.
MachineLearningModel: This class handles the machine learning model's data
processing and prediction functionality.
Database: This class manages the storage and retrieval of patient and prediction data.
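As a sketch of how the Database class could persist patients and their predictions, the code below uses Python's built-in sqlite3 module with two tables linked by a foreign key, so one patient row can have many prediction rows. The table and column names and the choice of SQLite itself are assumptions. The report mentions cloud storage but does not name a database engine.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS patient (
    patient_id     INTEGER PRIMARY KEY,
    age            INTEGER NOT NULL,
    gender         TEXT,
    bmi            REAL,
    blood_pressure REAL,
    glucose        REAL,
    family_history INTEGER  -- 0 = no, 1 = yes
);
CREATE TABLE IF NOT EXISTS prediction (
    prediction_id INTEGER PRIMARY KEY,
    patient_id    INTEGER NOT NULL REFERENCES patient(patient_id),
    risk_score    REAL NOT NULL,  -- probability in [0, 1]
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


class Database:
    """Hypothetical storage layer for patients and their predictions."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # SQLite enforces foreign keys only when this pragma is enabled.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(SCHEMA)

    def add_patient(self, age: int, gender: str, bmi: float,
                    blood_pressure: float, glucose: float, family_history: bool) -> int:
        cur = self.conn.execute(
            "INSERT INTO patient (age, gender, bmi, blood_pressure, glucose, family_history)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (age, gender, bmi, blood_pressure, glucose, int(family_history)),
        )
        self.conn.commit()
        return cur.lastrowid

    def add_prediction(self, patient_id: int, risk_score: float) -> int:
        cur = self.conn.execute(
            "INSERT INTO prediction (patient_id, risk_score) VALUES (?, ?)",
            (patient_id, risk_score),
        )
        self.conn.commit()
        return cur.lastrowid

    def predictions_for(self, patient_id: int) -> List[float]:
        """Return a patient's risk scores, oldest first."""
        rows = self.conn.execute(
            "SELECT risk_score FROM prediction WHERE patient_id = ? ORDER BY prediction_id",
            (patient_id,),
        )
        return [r[0] for r in rows]
```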
UserInterface: This class represents the user interface components that allow
healthcare providers and patients to interact with the system.
Validation: This class ensures the validity of input data, such as checking for missing
or incorrect values.
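A minimal sketch of the checks the Validation class could perform follows. The required fields and the plausible ranges are illustrative assumptions, not clinically validated limits. The error text for a missing glucose value matches the expected output in Table B1 of Appendix B.

```python
from typing import Dict, List, Optional, Tuple

# Assumed plausible ranges for each required field (illustrative only).
RANGES: Dict[str, Tuple[float, float]] = {
    "Age": (1, 120),
    "BMI": (10, 80),
    "Glucose": (20, 600),
}


class Validation:
    """Checks patient input for missing or out-of-range values."""

    @staticmethod
    def validate(data: Dict[str, Optional[float]]) -> List[str]:
        """Return a list of error messages. An empty list means the input is valid."""
        errors: List[str] = []
        for field, (low, high) in RANGES.items():
            value = data.get(field)
            if value is None:
                errors.append(f"{field} value required")
            elif not low <= value <= high:
                errors.append(f"{field} value {value} outside expected range {low}-{high}")
        return errors
```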
3.4.2 Class Diagram
A Class Diagram visually represents the structure of the system by showing the
classes, their attributes, methods, and the relationships between them. Below is a basic
class diagram for the Diabetes Prediction System.
Explanation:
Relationships:
The Patient class has a one-to-many relationship with the Prediction
class, meaning a patient can have multiple predictions.
The Healthcare Provider interacts with the Patient and Prediction
classes to enter data and generate predictions.
The MachineLearningModel class interacts with the Prediction class
to generate a diabetes risk prediction based on patient data.
The Database class is used by all other classes to store and retrieve
data.
The UserInterface is the entry point for interaction and is connected to
both the Patient and Prediction classes.
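The Patient and Prediction classes and their one-to-many relationship can be sketched with Python dataclasses. The Patient attributes follow the class description given earlier in this section. The `risk_score` field and the `add_prediction` and `latest_prediction` methods are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Prediction:
    """One diabetes risk assessment produced by the model."""
    risk_score: float  # model output in [0, 1] (assumed representation)
    created_at: datetime = field(default_factory=datetime.now)


@dataclass
class Patient:
    age: int
    gender: str
    bmi: float
    blood_pressure: float
    glucose_level: float
    family_history: bool
    # One patient can have many predictions (one-to-many). default_factory gives
    # each Patient its own list instead of a shared one.
    predictions: List[Prediction] = field(default_factory=list)

    def add_prediction(self, prediction: Prediction) -> None:
        self.predictions.append(prediction)

    def latest_prediction(self) -> Prediction:
        """Return the most recent prediction. Raises IndexError if none exists yet."""
        return self.predictions[-1]
```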
Summary:
Identifying Classes: The core entities in the Diabetes Prediction System are
Patient, Healthcare Provider, Prediction, MachineLearningModel,
Database, UserInterface, and Validation.
Class Diagram: This diagram visually represents the structure of the system,
detailing the attributes and methods of each class and their relationships.
Class-based modeling is crucial for understanding the structure of the system and the
interactions between various components. It also helps guide the design and
implementation of the system by clearly defining how data is stored and processed.
Chapter 4: Conclusion and Future Work
4.1 Conclusion
Despite its achievements, the pre-system is not without limitations. The accuracy of
predictions depends on the quality and diversity of training data, and the pre-system
currently focuses primarily on Type 2 Diabetes Mellitus (T2DM). Additionally, real-
time blood sugar monitoring and integration with medical devices are not yet
implemented.
To further enhance the pre-system's capabilities and impact, the following areas of
future work are proposed:
1. Integration with Real-Time Medical Devices
2. Enhancement of Machine Learning Models
Appendices
The following table presents a dataset collected from a healthcare survey, containing
anonymized patient health records used for diabetes prediction. The dataset includes
essential attributes such as Patient ID, Age, Gender, BMI, Blood Pressure, Glucose
Level, Cholesterol Level, Physical Activity, Family History, and Diabetes Risk
classification.
The data was gathered through a structured survey and serves as the basis for training
and evaluating the machine learning model for diabetes prediction. Below is a
screenshot of the dataset used in this project:
Appendix B: Additional Test Cases
Table B1: Test Cases for Diabetes Prediction System
Test Case | Input | Expected Output
Valid Data Submission | Age=45, BMI=28, Glucose=140, BP=85 | Risk Score: High (85%)
Missing Glucose Value | Age=30, BMI=25, Glucose=Null | Error: "Glucose value required"
Extreme BMI | Age=50, BMI=45, Glucose=160 | Risk Score: Very High (95%)
Appendix C: Diagrams
Appendix D: Dataset Description
1. Registration:
3. Generate Prediction:
4. View Results:
References
2. Smith, J., Patel, R., & Lee, K. (2020). Machine Learning for Diabetes
Prediction: A Comparative Study. Journal of Medical Informatics, 15(3), 45–60.
https://doi.org/10.1016/j.jmi.2020.03.005