
BT40816_Project_Report


Advanced Predictive System for Diagnosing Patient Illness

through Machine Learning Techniques

A PROJECT REPORT

Submitted by

MOHAMMED SAMEER KHAN (21SCSE1011124)


VIVEK RAJ SINGH (21SCSE1011130)
PROJECT ID: BT40816
Under the guidance of
Mr. Satyam Singh

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
Computer Science

SCHOOL OF COMPUTING SCIENCE AND ENGINEERING


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GALGOTIAS UNIVERSITY, GREATER NOIDA
April, 2025

BONAFIDE CERTIFICATE

This is to certify that the Project Report entitled “Advanced Predictive System for
Diagnosing Patient Illness through Machine Learning Techniques”, which is
submitted by Mohammed Sameer Khan and Vivek Raj Singh in partial fulfillment
of the requirement for the award of the degree of B. Tech. in the Department of Computer
Science and Engineering, School of Computing Science and Engineering,
Galgotias University, Greater Noida, India, is a record of the
candidates' own work carried out by them under my supervision. The matter embodied
in this thesis is original and has not been submitted for the award of any other degree.

Signature of Examiner(s) Signature of Supervisor(s)

External Examiner Signature of Program Chair

Date: April, 2025


Place: Greater Noida

TABLE OF CONTENTS

List of Figures

List of Tables

CHAPTER 1. INTRODUCTION

1.1. Identification of Client/Need/Relevant Contemporary issue

1.2. Identification of Problem

1.3. Identification of Tasks

1.4. Timeline

1.5. Organization of the Report

CHAPTER 2. LITERATURE REVIEW/BACKGROUND STUDY

2.1. Timeline of the reported problem

2.2. Existing solutions

2.3. Bibliometric analysis

2.4. Review Summary

2.5. Problem Definition

2.6. Goals/Objectives

CHAPTER 3. DESIGN FLOW/PROCESS

3.1. Decision Tree Algorithm

3.2. Random Forest Algorithm

3.3. CNN Algorithm

CHAPTER 4. RESULTS ANALYSIS AND VALIDATION

4.1. Implementation of solution

CHAPTER 5. CONCLUSION AND FUTURE WORK

5.1. Conclusion

5.2. Future work

REFERENCES
USER MANUAL

List of Figures

Figure 3.1 Multiple Disease Prediction - Architecture Diagram
Figure 3.1.1 Decision Tree problem
Figure 3.1.2 Decision Tree Example
Figure 3.2 Multiple Disease Prediction - Methodology
Figure 3.2.1 Random Forest Example
Figure 3.3.1 CNN making prediction on a bird Image
Figure 3.3.2 CNN making prediction on a bird Image

List of Tables

Table 3.1 Comparison of Accuracy of all 7 models

ABSTRACT

Disease Prediction using Machine Learning, Deep Learning and Streamlit is a multiple-disease
prediction project covering conditions such as heart disease, brain tumors, diabetes, pneumonia
and breast cancer. The implementation combines deep learning models built with TensorFlow and
Keras with classical machine learning models such as Support Vector Machine (SVM) and
Logistic Regression. The models are deployed on Streamlit Cloud, and the Streamlit library
provides a user-friendly interface for disease prediction. The application interface covers
7 diseases: heart disease, brain tumor, diabetes, pneumonia, Alzheimer's disease, Covid-19 and
breast cancer. The user picks a disease and supplies all the parameters the corresponding model
needs for its prediction. The application then processes the input and reports whether the
person is likely to be suffering from the disease or not. The project aims to predict disease
with high accuracy before it becomes complicated. With the interface hosted on Streamlit Cloud
and the functionality provided by the Streamlit library, it becomes simple for anyone to
evaluate their disease risk profile. The accuracies achieved across the different models reflect
the capability of the machine learning algorithms used to predict diseases.

CHAPTER 1.
INTRODUCTION

1.1. Identification of Client /Need / Relevant Contemporary issue

The HealthCure platform addresses the growing demand for accessible diagnostic tools for
diseases like Covid-19, cancer, and diabetes, which affect millions globally. The World Health
Organization reports over 422 million people living with diabetes and 2.3 million new cases of
breast cancer each year. During the Covid-19 pandemic, overwhelmed healthcare systems highlighted the need
for rapid, reliable home-based diagnostics. Surveys show that patients, particularly in remote
areas, face delays in receiving timely diagnoses. This urgent healthcare need has been
documented by global agencies such as WHO and CDC.

1.2. Identification of Problem

The primary problem that HealthCure aims to address is the lack of timely, accurate, and
accessible diagnostic tools for detecting critical diseases. This issue is especially prevalent in
underserved areas, where healthcare facilities are either limited or overwhelmed. Delayed or
missed diagnoses can lead to severe complications and increased mortality rates, particularly for
diseases that require early detection, such as cancer, cardiovascular conditions, and infections
like Covid-19. In regions with inadequate healthcare infrastructure, patients often face significant
delays in receiving appropriate care, creating an urgent need for a scalable, technology-driven
solution.

1.3. Identification of Tasks

To address the problem of delayed diagnostics, the following tasks are essential in developing
the HealthCure platform:
1. Data Collection: Gather medical datasets for each disease, including X-rays, MRI scans, and
patient records.
2. Model Development: Train deep learning models, especially CNNs, to recognize disease
patterns in medical images.
3. System Integration: Design and develop a platform that integrates the diagnostic models and
allows users to upload data for analysis.
4. Testing and Validation: Conduct testing on unseen datasets and compare model results with
clinical data to ensure accuracy.
5. User Interface Design: Develop an intuitive and easy-to-use interface for patients and
healthcare providers.
6. Deployment and Feedback: Roll out the platform and gather feedback from real-world users
for continuous improvements.

1.4. Timeline

The HealthCure project follows a structured timeline divided into five phases:

Phase 1 (Month 1-2): Data Collection and Preprocessing.

Phase 2 (Month 3-5): Model Development and Training.

Phase 3 (Month 6-7): System Integration and Prototype Development.

Phase 4 (Month 8-9): Testing, Validation, and Interface Design.

Phase 5 (Month 10-12): Deployment and Feedback Collection.

A Gantt chart could be used to represent these phases, tracking the project's progress and
key milestones.

1.5. Organization of the Report

This report is organized into the following chapters:

Chapter 1: Introduction: Provides an overview of the HealthCure project and the need for AI-
driven diagnostics.
Chapter 2: Literature Review: Examines previous research in AI and healthcare diagnostics,
setting the context for HealthCure’s innovation.
Chapter 3: Methodology: Outlines the technical processes involved in data collection, model
development, and system design.
Chapter 4: Results and Analysis: Discusses the outcomes of the testing phase, including the
accuracy and reliability of the diagnostic models.
Chapter 5: Conclusion and Future Work: Summarizes the project’s achievements and
outlines potential improvements and expansions.

CHAPTER 2.
LITERATURE REVIEW/BACKGROUND STUDY

2.1. Timeline of the reported problem

The need for accessible and accurate disease diagnostics was highlighted globally, particularly
during the Covid-19 pandemic when health systems faced unprecedented challenges. Key
incidents include:
Before 2020: Existing gaps in diagnostics for critical illnesses like diabetes and cancer were
documented, with WHO reporting 422 million diabetes cases globally and breast cancer
accounting for 2.3 million new cases annually.
2020: The Covid-19 pandemic further exposed the inadequacies in healthcare access,
especially in rural areas, leading to significant delays in diagnosis and treatment.

Documentary Proof: Reports from WHO, CDC, and various health surveys provide
evidence of these challenges and the urgency for improved diagnostic solutions.

2.2. Existing solutions

Earlier proposed solutions to improve diagnostic accessibility included:


Telemedicine: Providing remote consultations to increase access to healthcare
professionals.

Mobile Health Applications: Utilizing smartphone apps to facilitate self-diagnosis and
health monitoring.

AI Diagnostics: Initial development of AI tools aimed at analyzing symptoms and offering
preliminary diagnostic suggestions.

2.3. Bibliometric analysis

Key features, effectiveness, and drawbacks of existing solutions:


Telemedicine: Effective in connecting patients to doctors but often limited by technology
access and internet connectivity, especially in rural areas.

Mobile Health Applications: Convenient and user-friendly; however, they may lack
comprehensive diagnostic accuracy and require user literacy.

AI Diagnostics: Emerging as a promising solution, effective in rapid assessment, but still
faces challenges in accuracy, data privacy, and integration into existing healthcare systems.

2.4. Review Summary

The literature review indicates a critical gap in accessible diagnostics for diseases such as
Covid-19, diabetes, and cancers, particularly in underserved areas. HealthCure aims to bridge
this gap by leveraging AI to provide fast, reliable diagnostics, addressing the current
inadequacies in healthcare accessibility.

2.5. Problem Definition

The problem at hand involves:


What is to be done: Develop the HealthCure platform to offer AI-driven diagnostic
solutions for critical illnesses.

How it is to be done: By utilizing advanced algorithms and user-friendly interfaces, and by
integrating telehealth features to ensure accessibility.

What is not to be done: Avoid creating overly complex systems that require extensive
medical knowledge to operate or that depend on high-tech infrastructure not available in rural
settings.

2.6. Goals/Objectives

The following milestones will guide the project:


1. Development of AI Algorithms: Create and validate algorithms capable of accurately
diagnosing the specified diseases within 6 months.
2. User Testing and Feedback: Conduct usability testing with target demographics,
particularly in rural areas, within the next 3 months post-launch.
3. Partnerships with Healthcare Providers: Establish collaborations with local healthcare
facilities to facilitate real-world application and validation within 9 months.
4. Implementation of a User Education Program: Develop materials and workshops to
educate users on utilizing the HealthCure platform effectively within 4 months of launch.
5. Data Privacy Assurance: Ensure compliance with data protection regulations by
implementing robust security measures within 2 months before the platform launch.

Table 3.1 Comparison of Accuracy of all 7 models

CHAPTER 3.
DESIGN FLOW/PROCESS
The HealthCure platform is structured to facilitate the accurate and efficient diagnosis of
critical illnesses through a multi-step process, utilizing advanced technologies like deep
learning and user-friendly interfaces.
Step 1: Data Collection and Preprocessing
Gathering Relevant Medical Datasets
The platform collects diverse medical datasets tailored to the diseases targeted. For example:
Pneumonia: X-ray images from hospitals and medical repositories that include both
normal and pneumonia-infected lungs.
Brain Tumors: MRI scans, which encompass various types and stages of tumors, sourced from
medical databases and research institutions.
Diabetes: Blood sugar level records, along with demographic data and medical history,
compiled from healthcare providers.
Data Preprocessing
Preprocessing is critical to ensure high-quality data input for model training. This step
includes:
Cleaning: Removing duplicates, outliers, and irrelevant data points to enhance dataset
quality.
Normalization: Adjusting the range of data values to a common scale, which is essential for
effective training of machine learning models.
Augmentation: Applying techniques such as rotation, scaling, and flipping to increase the
diversity of the dataset and improve model robustness.
Splitting the Data: Dividing the data into training, validation, and test sets to evaluate the
model's performance accurately.
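The preprocessing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual pipeline: the function names, the 70/15/15 split, the fixed seed, and the sample readings are all assumptions chosen for the example.

```python
import random


def min_max_normalize(values):
    """Normalization: rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def horizontal_flip(image):
    """Augmentation: mirror a 2D image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Splitting: shuffle, then cut into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical blood sugar readings, used only to exercise the functions.
glucose = [85, 110, 140, 200]
normalized = min_max_normalize(glucose)
flipped = horizontal_flip([[1, 2, 3], [4, 5, 6]])
train, val, test = split_dataset(range(100))
```

Fixing the seed keeps the split reproducible, so the same records land in the test set on every run and reported accuracies stay comparable.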

Fig 3.1 Multiple Disease Prediction -Architecture Diagram
Step 2: Model Selection and Training
Deep Learning Model Selection
The HealthCure platform primarily utilizes Convolutional Neural Networks (CNNs) due to
their proven efficacy in image recognition and classification tasks.
Different architectures of CNNs may be explored, such as ResNet or Inception, depending on
the complexity of the dataset and specific diagnostic requirements.
Training Process
The training phase involves feeding the model with the preprocessed datasets that include
labeled images and data points. Key aspects include:
Backpropagation: Using the loss function to minimize prediction errors by updating the
model's parameters iteratively.
Epochs and Batch Size: Setting the number of epochs (full passes through the training
data) and the batch size (number of samples processed before the model’s parameters are
updated) to optimize training time and performance.
Performance Metrics: Utilizing metrics such as accuracy, precision, recall, and F1 score to
evaluate the model’s performance during training and validation phases.
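The performance metrics named above can be computed directly from the true and predicted labels. The sketch below assumes a binary task in which label 1 means "disease present"; the function name and the example labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

For a diagnostic tool, recall deserves particular attention, because a false negative corresponds to a missed case.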

Step 3: Detection and Classification
Real-Time Detection
After training, the CNNs are implemented into the HealthCure platform for real-time disease
detection and classification.
Users can interact with the platform by uploading medical images (e.g., X-rays, MRIs) or
inputting relevant health data (e.g., blood sugar levels).
AI Analysis and Diagnosis:

The AI system processes the input data to identify patterns indicative of specific diseases.

The system is engineered for rapid analysis, providing users with timely insights regarding their
health status, while also ensuring that users understand that these results are preliminary and may
require further clinical testing for confirmation.
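One way to turn a model's output into the preliminary result described above is to threshold the predicted probability and attach the caveat to every message. This is a hedged sketch: the function name, the 0.5 threshold, and the wording of the message are assumptions, not the platform's actual behaviour.

```python
def preliminary_result(disease, probability, threshold=0.5):
    """Convert a model probability into a preliminary, clearly caveated result."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    at_risk = probability >= threshold
    status = "possible signs detected" if at_risk else "no signs detected"
    return {
        "disease": disease,
        "at_risk": at_risk,
        "message": (
            f"{disease}: {status} (model score {probability:.2f}). "
            "This result is preliminary and may require further clinical testing."
        ),
    }


# Hypothetical model score for an uploaded chest X-ray.
result = preliminary_result("Pneumonia", 0.87)
```

In practice the threshold would be tuned per disease on the validation set, since the cost of a missed case differs from the cost of a false alarm.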

Step 4: User Interface and Output


User-Friendly Interface
The platform is designed with an intuitive interface, allowing users to easily navigate through
the features and understand the diagnostic results.
Clear instructions guide users through the process of uploading data and interpreting results.

Diagnostic Output
Upon analysis, the system presents results in an easily interpretable format, highlighting:

Risk Assessment: Indicating whether the user is at risk of specific diseases based on the
input data.
Precautionary Measures and Treatments: For conditions like diabetes or Covid-19, the
platform suggests appropriate actions, such as lifestyle changes, medication reminders, or when
to seek further medical advice.
This structured design flow ensures that the HealthCure platform not only provides accurate
diagnostics but also empowers users with the knowledge and resources to manage their health
proactively.

Fig 3.2 Multiple Disease Prediction -Methodology

3.1 DECISION TREE ALGORITHM


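Decision tree induction is the learning of decision trees from class-labelled training tuples. A
decision tree is a flowchart-like tree structure in which each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node holds a class label.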

Fig 3.1.1 Decision Tree problem


Decision tree induction is a non-parametric approach for building classification models.

Finding an optimal decision tree is an NP-complete problem, so practical algorithms rely on
greedy heuristics.

Techniques developed for constructing decision trees are computationally inexpensive,
making it possible to construct models even when the training set size is very large.

Decision trees, especially smaller-sized trees, are relatively easy to interpret.

Decision trees provide an expressive representation for learning discrete-valued functions.

Decision tree algorithms are quite robust to the presence of noise, especially when methods for
avoiding overfitting are employed.

Fig 3.1.2 Decision Tree Example

The presence of redundant attributes does not adversely affect the accuracy of decision trees.
The construction of decision tree classifiers does not require any domain knowledge or
parameter setting, and is therefore appropriate for exploratory knowledge discovery. Decision
trees can handle high-dimensional data.
Their representation of acquired knowledge in tree form is intuitive and generally easy for
humans to assimilate.
The learning and classification steps of decision tree induction are simple and fast.

In general, decision tree classifiers have good accuracy.

Decision tree induction algorithms have been used for classification in many application areas,
such as medicine, manufacturing and production, financial analysis, astronomy, and molecular
biology.
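To make the greedy construction concrete, the sketch below picks the best threshold for a single numeric attribute using information gain. The report does not state which splitting criterion was used, so entropy-based information gain is an assumption here, and the function names and glucose readings are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split 'value <= threshold'."""
    base = entropy(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical training tuples: glucose reading and class label (1 = diabetic).
glucose = [90, 95, 100, 150, 160, 170]
diabetic = [0, 0, 0, 1, 1, 1]
threshold, gain = best_threshold(glucose, diabetic)
```

A full tree repeats this search over every attribute at every node and recurses on the two resulting subsets until the leaves are pure or a stopping rule applies.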

3.2 RANDOM FOREST ALGORITHM

1. It is an ensemble classifier that uses many decision tree models; it can be used for
regression as well as classification.
2. Accuracy and variable importance information can be provided with the results.
3. A random forest is a classifier consisting of a collection of tree-structured classifiers,
indexed by k, whose random parameters are independent and identically distributed; each tree
casts a unit vote for the classification of the input.

4. Random forest uses the Gini index for classification and for determining the final
class in each tree.

5. The final classes of the individual trees are aggregated and voted on, using weighted
values, to construct the final classifier.
6. Random forest works as follows: a random seed is chosen, which draws a random
collection of samples from the training dataset while maintaining the class distribution.
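Three of the ingredients listed above (the Gini index, seeded sampling that preserves the class distribution, and weighted voting) can be sketched independently of any tree implementation. The function names, the per-class bootstrap strategy, and the example weights are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def gini_index(labels):
    """Gini impurity: 0.0 for a pure node, 0.5 for a balanced two-class node."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def stratified_bootstrap(samples, labels, seed):
    """Draw samples with replacement within each class, keeping the class counts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    drawn = []
    for label, members in by_class.items():
        drawn.extend((rng.choice(members), label) for _ in members)
    return drawn


def weighted_vote(predictions, weights):
    """Aggregate per-tree predictions; the class with the largest total weight wins."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical data: ten sample ids with a 7:3 class imbalance.
sample_ids = list(range(10))
labels = [0] * 7 + [1] * 3
bootstrap = stratified_bootstrap(sample_ids, labels, seed=0)
final_class = weighted_vote(["sick", "healthy", "sick"], [0.2, 0.9, 0.3])
```

Each tree in the forest would be trained on its own bootstrap sample drawn with a different seed, which is what makes the trees differ from one another.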

Fig 3.2.1 Random Forest Example

3.3 CNN (Convolutional Neural Network)

Artificial Intelligence has come a long way in bridging the gap between the capabilities of
humans and machines. Researchers and practitioners around the globe work on numerous aspects
of AI, and one prominent area is Computer Vision. This field aims to enable machines to view
the world as humans do and to use that knowledge for tasks such as image recognition, image
analysis, and classification. Advances in Computer Vision with Deep Learning have been
considerably successful, particularly with the Convolutional Neural Network algorithm.

Imagine there’s an image of a bird, and you want to identify whether it’s really a bird or some
other object. The first thing you do is feed the pixels of the image in the form of arrays to the
input layer of the neural network (multi-layer networks used to classify things). The hidden layers
carry out feature extraction by performing different calculations and manipulations. There are
multiple hidden layers like the convolution layer, the ReLU layer, and pooling layer, that perform
feature extraction from the image. Finally, there’s a fully connected layer that identifies the object
in the image.
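The three feature-extraction layers mentioned above can be illustrated on a tiny image without any deep learning library. This is a teaching sketch under stated assumptions: the 5x5 image and the hand-written edge kernel are invented, whereas in a trained CNN the kernel values are learned, and the fully connected layer is omitted.

```python
def convolve2d(image, kernel):
    """Convolution layer: slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """ReLU layer: replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-written kernel that responds where brightness increases from left to right.
kernel = [[-1, 1], [-1, 1]]
features = max_pool(relu(convolve2d(image, kernel)))
```

The pooled feature map is non-zero only in the column where the dark-to-bright edge sits, which is the kind of localized pattern later layers combine to recognize an object.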

Fig 3.3.1 CNN making prediction on a bird Image


Fig 3.3.2 CNN making prediction on a bird Image

CHAPTER 4.
RESULTS ANALYSIS AND VALIDATION

4.1. Implementation of solution

The HealthCure platform underwent thorough testing during its development, and initial results
suggest it performs exceptionally well in detecting certain diseases. Among the illnesses it was
tested on, the platform showed particularly strong performance in identifying breast cancer,
brain tumors, and Covid-19. For example, in the case of Covid-19, the AI models used by
HealthCure achieved over 90% accuracy in detecting the virus based on chest X-ray images.
This high level of accuracy was determined through validation processes that compared the
platform’s predictions with clinically verified datasets. These results indicate that HealthCure
has strong potential for application in real-world healthcare settings, especially for conditions
where imaging data plays a key role in diagnosis.

Despite these successes, the platform also showed some limitations. One notable area of
weakness is in diagnosing diseases like Alzheimer’s, where the symptoms can vary greatly
from one individual to another. This variability makes it harder for the AI to provide
consistently accurate diagnoses. Alzheimer’s disease is complex, and its progression is often
unique to each patient, which adds to the challenge of developing an AI model that can
accurately predict it across different populations.

Another area where HealthCure needs further improvement is in diagnosing chronic conditions
such as heart disease. The platform’s current accuracy in detecting these kinds of illnesses is
lower than its performance with diseases like Covid-19 or breast cancer. This is largely
because chronic conditions often require a more nuanced approach to diagnosis, involving a
combination of imaging data, patient history, and other health markers. To improve its
effectiveness in this area, the platform would need access to more comprehensive datasets that
take into account these various factors. By incorporating more diverse and detailed patient
information, the AI’s ability to detect and diagnose heart disease and other chronic conditions
could be significantly enhanced.

In summary, while HealthCure shows great promise in diagnosing certain diseases with high
accuracy, there are still areas where the platform needs refinement. Its strong performance in
detecting diseases such as breast cancer and Covid-19 highlights its potential as a valuable tool
in healthcare. However, further development is needed to address its limitations in diagnosing
conditions like Alzheimer’s and chronic illnesses such as heart disease. With more extensive
datasets and ongoing improvements, HealthCure could become a more reliable solution for a
broader range of medical conditions, ultimately improving its overall diagnostic capabilities.

CHAPTER 5.
CONCLUSION AND FUTURE WORK
5.1. Conclusion
Multiple Disease Prediction using machine learning is useful in day-to-day life and is especially
important for the healthcare sector, which can use such systems to predict patients' diseases
from their general information and the symptoms they are experiencing. The system can help
users who do not want to visit a hospital or clinic: by entering their symptoms and other
relevant information, they can learn which disease they may be suffering from. The health
industry can also benefit by asking a patient for symptoms, entering them into the system,
and obtaining a likely prediction within a few seconds. If the health industry adopts this
project, the workload of doctors can be reduced and they can predict a patient's disease more
easily. Multiple Disease Prediction aims to provide predictions for commonly occurring
diseases that, when left unchecked or ignored, can turn fatal and cause serious problems
for patients and their family members.
The HealthCure project showcases the vast potential of AI in transforming healthcare
diagnostics. By integrating AI models with user-friendly technology, the platform enables
early detection of critical diseases, reducing diagnostic delays and improving patient
outcomes.

5.2. Future Work
The project also highlights areas for future improvement, including expanding the
range of diseases covered, increasing model accuracy through larger datasets, and
incorporating features like personalized treatment recommendations. As AI technology
evolves and healthcare systems embrace digital solutions, HealthCure has the potential to
make a significant impact on global health by providing accessible, accurate, and efficient
diagnostics for all.

1. Facility for modifying user details.

2. A more interactive user interface.

3. Facilities for backup creation.

4. Deployment as a web page.

5. Deployment as a mobile application.

6. More details and coverage of the latest diseases.

REFERENCES
[1] TensorFlow: Martín Abadi, Ashish Agarwal, et al. (2015). TensorFlow: Large-scale machine
learning on heterogeneous systems. arXiv preprint arXiv:1603.04467.
[2] Keras: François Chollet et al. (2015). Keras. GitHub repository.

[3] Support Vector Machine (SVM): Corinna Cortes and Vladimir Vapnik (1995). Support-
vector networks. Machine Learning, 20(3), 273-297.
[4] Logistic Regression: Hosmer Jr, D. W., Lemeshow, S., and Sturdivant, R. X. (2013). Applied
Logistic Regression (3rd ed.). John Wiley & Sons.
[5] Streamlit: Streamlit Documentation. https://docs.streamlit.io/

[6] Kaggle: Kaggle website. https://www.kaggle.com/

[7] Data sources: You can provide the specific datasets you used from Kaggle.com, mentioning
the authors or contributors of the datasets.
[8] Zhang, Y., & Ghorbani, A. (2019). A review on machine learning algorithms for diagnosis of
heart disease. IEEE Access, 7, 112751-112760.
[9] Arora, P., Chaudhary, S., & Rana, M. (2020). Prediction of diabetes using machine learning
algorithms: A review. Journal of Ambient Intelligence and Humanized Computing, 11(6), 2575-
2589.
[10] Kaur, H., Batra, N., & Rani, R. (2020). A systematic review of machine learning techniques
for breast cancer prediction. Journal of Medical Systems, 44(11), 1-15.
[11] Gupta, D., & Rathore, S. (2021). A comprehensive review on machine learning algorithms
for kidney disease diagnosis. Journal of Medical Systems, 45(1), 1-17.
[12] Saeed, A., & Al-Jumaily, A. (2020). Machine learning techniques for Parkinson's disease
diagnosis using handwriting: A review. Computers in Biology and Medicine, 122, 103804

USER MANUAL
(Complete step by step instructions along with pictures necessary to run the project)
