PlantMD - Final Year Project Report
Amrit Campus
Submitted by
Anurag Limbu Kandangwa (20124/075)
Ashok Tripathi (20128/075)
Rabin Babu Tiwari (20173/075)
Submitted to
Department of Computer Science and Information Technology
Amrit Campus
Institute of Science and Technology
Tribhuvan University
May 16, 2023
Tribhuvan University
Institute of Science and Technology
Amrit Campus
Department of Computer Science and Information Technology
Email: [email protected]
Supervisor's Recommendation
I hereby recommend that the project prepared under my supervision by Anurag Limbu Kandangwa, Ashok Tripathi, and Rabin Babu Tiwari, entitled “PlantMD: A Plant Disease Prediction System”, be accepted as partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Information Technology. To the best of my knowledge, this is an original work in Computer Science and Information Technology.
..............................
Mr. Yuba Raj Devkota
Supervisor
Lecturer
Amrit Campus
Certificate of Approval
This is to certify that this project prepared by Anurag Limbu Kandangwa, Ashok
Tripathi and Rabin Tiwari, entitled “PlantMD: A Plant Disease Prediction System”
in partial fulfillment of the requirements for the degree of Bachelor of Science in Com-
puter Science and Information Technology has been well studied. In our opinion, it
is satisfactory in the scope and quality as a project for the required degree.
..............................
Mr. Yuba Raj Devkota
Supervisor
Lecturer
Amrit Campus
..............................
External Examiner
..............................
Dr. Binod Kumar Adhikari
Head of Department
Lecturer
Amrit Campus
Acknowledgement
It gives us immense pleasure to express our deepest sense of gratitude and sincere thanks to our highly respected and esteemed guide, Mr. Yuba Raj Devkota, Department of Computer Science and Information Technology, Amrit Campus, for his valuable guidance, encouragement, and help in completing this work. His useful suggestions for this work and cooperative behavior are sincerely acknowledged.
We would like to express our sincere thanks to Mr. Yuba Raj Devkota, Department of Computer Science and Information Technology, Amrit Campus, for giving us this opportunity to undertake this project. We would also like to thank Dr. Binod Kumar Adhikari, Asst. Prof. and Head of Department of B.Sc. CSIT, Amrit Campus, for his wholehearted support.
We are also grateful to our teachers for their constant support and guidance.
Finally, we would like to express our sincere thanks to all our friends and others who helped us directly or indirectly during this project work.
i
Abstract
Plant diseases pose a significant threat to global agriculture, affecting crop yields and
food security. In this final-year project, we address the challenges of limited data
availability and real-world applicability in the domain of machine learning and deep
learning for plant disease classification, detection, and generation. To tackle these
issues, we fine-tuned a deep Convolutional Neural Network (CNN) model on a real-
world dataset and employed various object detection models to accurately identify
and classify plant diseases. Furthermore, we explored the potential of image genera-
tion using different variants of Generative Adversarial Networks (GANs) to overcome
the constraints of limited data. Our work demonstrates the feasibility of applying
advanced machine-learning techniques to enhance plant disease detection and classifi-
cation, ultimately contributing to more sustainable and efficient agricultural practices.
ii
Contents
Acknowledgement i
Abstract ii
List of Figures vi
List of Tables ix
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope and Limitation . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.2 Limitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Development Methodology . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Report Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Literature Review 6
2.1 Study of Research Papers . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Comparing accuracy of reviewed papers . . . . . . . . . . . . . . . . . 7
3 System Analysis 8
3.1 Proposed System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Requirement Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.1 Requirement Collection . . . . . . . . . . . . . . . . . . . . . . 9
3.2.2 Functional Requirement . . . . . . . . . . . . . . . . . . . . . 9
3.2.3 Non-Functional Requirements . . . . . . . . . . . . . . . . . . 10
3.3 Feasibility Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.1 Technical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.2 Operational . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3.3 Economic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3.4 Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 System Design 14
4.1 Design of the System . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.1 Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.2 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.3 ER Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.4 Data Flow Diagram: Level 0 . . . . . . . . . . . . . . . . . . 17
4.1.5 Data Flow Diagram: Level 1 . . . . . . . . . . . . . . . . . . 17
4.1.6 Data Flow Diagram: Level 2 . . . . . . . . . . . . . . . . . . 18
4.1.7 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . 19
iii
5 Algorithms Details 20
5.1 Classification Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.1.1 Inception Resnet v2 . . . . . . . . . . . . . . . . . . . . . . . . 21
5.1.2 MobileNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.2 Grad-CAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.3 Detection Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.1 Two-stage detector . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.1.1 R-CNN . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.1.2 Fast R-CNN . . . . . . . . . . . . . . . . . . . . . . 26
5.3.1.3 Faster R-CNN . . . . . . . . . . . . . . . . . . . . . 26
5.3.2 One-stage detector . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3.2.1 Single Shot MultiBox Detector (SSD) . . . . . . . . . 27
5.3.2.2 You Only Look Once (YOLO) . . . . . . . . . . . . . 27
5.4 Generation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.4.1 DCGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.4.2 ProGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6 Implementation and Testing 31
7 Result Analysis 39
7.1 Analyzing Heatmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.2 Interpreting the model prediction with Grad-CAM . . . . . . . . . . . 42
7.3 Plant Disease Detection model results . . . . . . . . . . . . . . . . . . 44
7.4 Image generation using GANs . . . . . . . . . . . . . . . . . . . . . . 45
7.4.1 Images generated by DCGAN trained on PlantDoc dataset . . 45
iv
7.4.2 Images generated by DCGAN trained on PlantVillage dataset 46
7.4.3 Images generated by ProGANs . . . . . . . . . . . . . . . . . 47
8 Conclusion 49
References 50
Appendix A 52
Appendix B 53
Appendix C 59
Appendix D 61
v
List of Figures
1.1 Agile Software Development Methodology . . . . . . . . . . . . . . . 4
vi
List of Abbreviations
API Application Programming Interface . . . . . . . . . . . . . . . . . . . . . 1
ML Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
DL Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
vii
C-DCGAN Conditional Deep Convolutional Generative Adversarial Network 7
VM Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
viii
List of Tables
2.1 CNN model accuracy table. . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Object detection model accuracy table. . . . . . . . . . . . . . . . . . 7
ix
1. Introduction
1.1 Introduction
Approximately 40 percent of global crop production is lost to pests. Each year, plant
diseases cost the global economy over $220 billion, and invasive insects at least $70
billion [1]. In Nepal, 25-35 percent of yield loss is caused by insect pests and diseases
[2]. Plant diseases are one of the major factors affecting food production, hurting the country's economy and farmers' livelihoods, as 65% of Nepal's total population is involved in agriculture. Furthermore, they may lead to food shortages and become a threat to the growing population. It is important for farmers to be aware of the risks posed by plant diseases and to take steps to protect their crops. Early disease detection is essential to take preventive measures and increase the quality and quantity of crop production.
Hence, our project introduces PlantMD, short for Plant Medic, a plant disease prediction system. The project exposes a Web Application Programming Interface (API) that takes an image of a plant as input and returns the detected disease along with preventive measures. Furthermore, we have built a mobile application that uses this API to provide an easy-to-use interface for the user.
Essentially, we aim to develop a Machine Learning (ML) model that uses computer vision techniques for plant disease prediction and is applicable in real-world scenarios through mobile devices. Many computer vision algorithms [3]–[9] have been developed for classifying and recognizing objects with high accuracy. Despite the success of these algorithms, many challenges [10] arise when applying them to plant disease detection. To overcome these challenges, we used various classification and detection algorithms and compared their performance and inference time. We evaluated our experiments on the PlantDoc [11] dataset and also used the PlantVillage [12] dataset to further improve the accuracy of our models.
1
1.2 Problem Statement
Plant disease detection has always been a challenging task because of its limited applicability in real-world scenarios. Most work on systems that detect diseases under real-world conditions is limited to research prototypes. While some such systems do exist, they are either not deployed or not easily accessible, and they are often not easy to use. Also, the datasets used by most of them do not reflect real-world data. Furthermore, the high cost of data collection and labeling limits experimentation with large datasets.
1.3 Objectives
• To build an ML model that detects plant disease using a small dataset while
achieving acceptable accuracy in real-world scenarios.
2
1.4 Scope and Limitation
1.4.1 Scope
A plant disease detection system is a tool or system designed to identify and diagnose
plant diseases. The scope of such a system depends on the specific goals and objectives
of the system, as well as the resources available for its development and operation.
The scope of our plant disease detection project includes:
• Integration with other tools and systems: The system should be able to
integrate with other tools and systems, such as weather forecasting systems or
irrigation systems, to help optimize disease management strategies.
1.4.2 Limitation
• Larger models lead to an increase in inference time which might not be desirable
for some end users.
• The accuracy of the model suffers due to the diversity within the dataset.
3
1.5 Development Methodology
Our system consists of many parts working in tandem to function as a whole. For example, the Web API needs the ML prediction model to produce results, while the Mobile/Web app requires the Web API to be in place to send input and show results. Although these parts depend on each other, they do not require the others to be completely functional and could be built in parallel. Hence, we used the Agile Development Methodology for developing our system. The Gantt chart can be found in Appendix A, Figure A.1.
• Design: We planned how the upcoming features would be implemented and how problems would be solved.
• Develop: One team member developed the ML model, another developed the Web API, and another developed the Web/Mobile app.
• Test: At the end of the sprint, all the parts were tested to ensure the features
were implemented according to design.
• Deploy: The code of all the members at the end of the sprint was pushed into
production after all tests had passed.
• Review: Finally at the end of the sprint, the system was reviewed to check if
the parts were working together properly and if the planning and design goals
were met.
4
1.6 Report Structure
• Introduction (Introduction, Problem Statement, Objectives, Scope and Limitation, Development Methodology, Report Structure)
• Literature Review
• System Analysis
• System Design
• Algorithms Details
• Implementation and Testing
• Result Analysis
• Conclusion
5
2. Literature Review
2.1 Study of Research Papers
Plant disease prediction can be categorized into two subproblems namely Plant Dis-
ease Classification and Plant disease detection. Plant disease classification refers to
the categorization of plant leaves into one of the predefined classes whereas Plant
disease detection is the process of identifying the presence of plant disease in a plant
or group of plants. Plant disease prediction can be solved using either ML-based
techniques or Deep Learning (DL)-based techniques. Plant disease generation is the
technique of generating new synthetic images that can be used for training purposes
to make the model more robust and accurate.
Previously, image processing techniques were widely used to detect plant diseases. These techniques use hand-crafted features such as color and texture features to train machine learning models. A Fermi energy-based segmentation method was used to isolate the infected region of the image from its background [13]. Based on the segmented image, features like color, shape, and position were extracted and passed through rule-based classification for disease identification in rice plants. Sandika [14] proposed a feature-based approach for the classification of grape diseases. They separated the complex background from disease patches and classified the different diseases based on texture features such as local texture filters, Local Binary Patterns, Gray-Level Co-Occurrence Matrix (GLCM) features, and some statistical features in the Red Green Blue (RGB) plane. They compared the performance of four machine learning algorithms, Probabilistic Neural Network (PNN), Back-propagation Neural Network (BPNN), Support Vector Machine (SVM), and Random Forest, and used Random Forest with GLCM features.
Mohanty [15] trained a deep convolutional neural network to identify 14 crop species
and 26 diseases. The trained model achieved an accuracy of 99.35% on a held-out
test set, demonstrating the feasibility of this approach. Ramcharan [16] proposed
a deep Convolutional Neural Network (CNN) to identify three diseases in Cassava.
The research showed that the transfer learning approach for image recognition of
field images offers a fast, affordable, and easily deployable strategy for digital plant
disease detection. Singh [11] used deep learning methods for classification and de-
tection. They also released a real-world data set which was a major challenge for
enabling vision-based plant disease detection. In 2017, Fuentes [17] used the object
detection approach for plant disease detection. Faster Region-based Convolutional
Neural Network (R-CNN), Region-based Fully Convolutional Network (R-FCN), and
Single Shot Multibox Detector (SSD) were used to locate tomato diseases and pests
directly, combined with deep feature extractors such as Visual Geometry Group Net-
work (VGG-Net) and Residual Network (ResNet). The authors proposed a system
that can effectively recognize nine different types of diseases and pests, with the abil-
ity to deal with complex scenarios from a plant’s surrounding area. Sun [18] proposed
a technique that incorporates three major steps: the data set preprocessing part,
fine-tuning the network, and the detection module. Their proposed system achieved a Mean Average Precision (mAP) of 91.83%. Prakruti [19] proposed a method to locate the health condition of tea leaves in a complex background and with occlusion. They achieved an mAP score of 86% at 50% Intersection Over Union (IOU) using the You Only Look Once version 3 (YOLOv3) model.
Model                      mAP
Faster R-CNN ResNet101     45.43
SSD MobileNetV2            41.54
YOLOv5                     55.67
Table 2.2: Object detection model accuracy table.
7
3. System Analysis
3.1 Proposed System
The system will be a fully-fledged application that takes photos of plant leaves and sends them to our Web API, which runs the prediction model behind the scenes and sends the prediction results back to the user. The user is then shown the prediction results in the application itself.
Figure 3.1 shows us a diagram that describes the working mechanism of our proposed
system. The user will be using the Mobile app or Web app to send images to our
Web API which will then be passed to the ML model. The model will return the
prediction results which will be stored in the database and then our API will return
the result to the app. Finally, the app will display the results to the user.
8
3.2 Requirement Analysis
We used the observation methodology for requirement collection: we observed existing systems and studied related research work to derive requirements for our project that overcome the limitations of existing systems and models. Hence, based on these findings, the following are the requirements of our project.
• A real-world dataset to train the model for robustness and high accuracy in
real-world data.
• Virtual Machine (VM) for the training of compute-intensive deep learning mod-
els.
• The system should contain an AI model that is trained to predict plant diseases
based on the images uploaded by the user. The AI model should analyze the
uploaded image and provide the user with relevant information about the plant
disease.
• The app should allow the user to take a photo of a plant or a group of plants
and upload it to the app.
• The app should provide recommendations to the user on how to treat the de-
tected disease.
• The Web API should have an authentication system to ensure that only regis-
tered devices can hit the Web API.
• The system should integrate with external systems like weather forecasting ser-
vices, pest control agencies, and other relevant systems to provide users with
up-to-date information about plant diseases and prevention measures.
• The PlantMD app should support multiple platforms like Android, iOS, and
the web.
9
3.2.3 Non-Functional Requirements
• Performance: The PlantMD app should provide fast and responsive perfor-
mance, with quick image upload and prediction results within a few seconds.
• Availability: The app should be available 24/7, with minimal downtime for
maintenance or updates.
• Reliability: The app should be reliable and accurate in predicting diseases and
providing recommendations to ensure user satisfaction.
• Usability: The PlantMD app should be easy to use, with a simple and intuitive
interface that users can navigate easily.
• Portability: The app should be portable, allowing users to access their profiles
and disease prediction results across different devices and platforms.
10
3.3 Feasibility Analysis
3.3.1 Technical
Our system is going to be used mainly on mobile devices while the backend is web-
based due to which it has the following technical needs:
• Mobile device
In recent times, technology to support such needs has progressed a lot and is easily ac-
cessible. The available technologies that align with our technical factors are described
below:
• Mobile network coverage and high-speed internet connection are widely available
which ensures a reliable internet connection.
• VMs are available on demand with high customization which provides good
computing power for training our model.
• Lots of VPS service providers exist in the market that allow us to easily configure our servers according to our needs for deploying our model.
• Progressive Web App (PWA) allows us to deliver mobile apps as web apps and
can support low-end devices through the browser.
• Web API allows the backend to be accessed and used by any kind of tech stack.
Since the technical needs of our project can be fulfilled by the above technologies on
both the software and hardware side, our project is technically feasible.
11
3.3.2 Operational
Unlike tailored systems operated by trained staff, our system is going to be used by a wide array of people, from general users to users with low technical literacy.
Keeping our audience/users in mind, our system has the following features:
• Fast inference and response times for the users to get results.
• Fallback mode for offline usage in which the detection is done by the local model
inside the app instead of the server.
The system is easy to operate and provides accurate results quickly while being pub-
licly and freely accessible. Hence, the project is operationally feasible.
3.3.3 Economic
Training our deep learning model on a VM and deploying it on a VPS for fast inference times make up the majority of our project cost. The cost of VMs is high but affordable, and the cost of deploying a model on a VPS has become reasonable nowadays. We can get our model up and running at a reasonable cost while gaining good performance and acceptable inference time for our deployed system.
The long-term gains from these services outweigh the cost in the following ways:
• We will get to collect more plant images and data to enhance our model which
is absolutely necessary.
Since the cost of our system allows us to scale our service as needed and improve our
model in the long run, our project is economically feasible.
12
3.3.4 Schedule
The estimated timespan for completing our project is 4 months. Hence, we will be using cross-platform app development frameworks like Flutter and React for the front end and the FastAPI Python library for building our Web API. Since all these frameworks and libraries are widely used and popular, with good documentation and a big community to help resolve errors, using them will keep our development time within the estimated time. Therefore, our system's schedule is feasible.
13
4. System Design
4.1 Design of the System
4.1.1 Flowchart
14
4.1.2 Use Case Diagram
15
4.1.3 ER Diagram
16
4.1.4 Data Flow Diagram: Level 0
17
4.1.6 Data Flow Diagram: Level 2
18
4.1.7 Sequence Diagram
19
5. Algorithms Details
Four different types of algorithms are used in this project, which are:
1. Classification Algorithm
2. Grad-CAM
3. Detection Algorithm
4. Generation Algorithm
5.1 Classification Algorithm
In a CNN, the input image is passed through multiple layers of interconnected processing nodes or neurons, which form the network. These layers typically include convolutional layers, pooling layers, and fully connected layers.
20
In this project, we have used two CNN models for fine-tuning.
5.1.1 Inception ResNet v2
iii. Stem Layers: The model starts with a “stem” block, which comprises a series
of convolutional and pooling layers. The stem block is responsible for reducing
the input image’s size and increasing the number of feature maps before feeding
them into the Inception-ResNet modules.
iv. Global Average Pooling and Softmax: After passing through the Inception-
ResNet modules, a global average pooling layer is used to reduce each feature
map’s spatial dimensions to a single value. This results in a feature vector,
which is then passed through a fully connected layer and softmax activation
function to generate the final class probabilities.
21
5.1.2 MobileNet
The key innovations in the MobileNet architecture are the use of depthwise separable convolutions and the introduction of width and resolution multipliers to create a family of models with different trade-offs between accuracy and computational complexity.
ii. Width Multiplier: The width multiplier is a hyperparameter that allows the
creation of a thinner MobileNet model by reducing the number of channels in
each layer. By choosing a smaller width multiplier, the number of parameters
and computations in the network decreases, resulting in a smaller and faster
model at the cost of reduced accuracy.
MobileNet has been widely used in various applications, including real-time object
detection, facial recognition, image segmentation, and transfer learning for domain-
specific tasks. Due to its efficiency and lightweight design, MobileNet is particularly
well-suited for deployment on mobile devices, IoT devices, and edge computing sys-
tems, where computational resources and power consumption are critical factors.
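To make the effect of these multipliers concrete, the short sketch below (our own illustration using the Keras MobileNet implementation, with arbitrarily chosen settings, not the project's training code) builds two variants and compares their parameter counts.

# Minimal sketch: MobileNet width and resolution multipliers (TensorFlow/Keras).
from tensorflow.keras.applications import MobileNet

# Full-width backbone at 224x224 input resolution.
full = MobileNet(input_shape=(224, 224, 3), alpha=1.0,
                 include_top=False, weights=None)

# Thinner backbone (width multiplier 0.5) at a reduced 160x160 resolution.
small = MobileNet(input_shape=(160, 160, 3), alpha=0.5,
                  include_top=False, weights=None)

# The width multiplier shrinks the parameter count; the smaller input
# resolution mainly reduces the amount of computation per image.
print("alpha=1.0, 224x224:", full.count_params(), "parameters")
print("alpha=0.5, 160x160:", small.count_params(), "parameters")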
22
5.2 Grad-CAM
Interpretability is a relation between formal theories that expresses the possibility of interpreting or translating one into another (from Wikipedia). Interpretability in machine learning and deep learning models has been a major concern for a long time. Traditional machine learning algorithms are more interpretable, but their accuracy and robustness are low; deep learning, on the other hand, achieves high accuracy and robustness but lacks interpretability. So there is a trade-off between accuracy and interpretability. Deep neural networks are considered black boxes because it is hard to figure out how they produce their output, so interpreting these models can help build more trust in their predictions.
The neuron importance weight $\alpha^{c}_{k}$ captures the importance of feature map $k$ for a target class $c$; it is obtained by global-average-pooling the gradients of the class score with respect to the activations of that feature map. A linear combination of the forward activation maps, weighted by these importances, is then computed and passed through a ReLU to obtain the result.
23
\[
L^{c}_{\text{Grad-CAM}} \;=\; \mathrm{ReLU}\Bigg(\underbrace{\sum_{k} \alpha^{c}_{k}\, A^{k}}_{\text{linear combination}}\Bigg)
\]
The result is a heatmap of the same size as the feature map (e.g., 14 x 14). The ReLU activation is used because we are only interested in the features that have a positive influence on the class of interest. The produced heatmap is rescaled and superimposed on the original image to visualize the regions that are important for the prediction.
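A minimal Grad-CAM sketch in TensorFlow/Keras is given below. It follows the procedure described above; the layer name and usage line are illustrative assumptions rather than our exact code.

# Minimal Grad-CAM sketch (TensorFlow/Keras). Model, layer name, and image
# are illustrative placeholders, not the project's exact code.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    # Model that maps the input to (last conv feature maps, predictions).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]

    # alpha_k^c: global-average-pooled gradients of the class score w.r.t. A^k.
    grads = tape.gradient(class_score, conv_out)
    alpha = tf.reduce_mean(grads, axis=(0, 1, 2))

    # ReLU over the weighted combination of the forward activation maps.
    heatmap = tf.nn.relu(tf.reduce_sum(conv_out[0] * alpha, axis=-1))
    heatmap /= (tf.reduce_max(heatmap) + 1e-8)  # normalize to [0, 1]
    return heatmap.numpy()

# Usage sketch: resize the heatmap to the image size and overlay it.
# heatmap = grad_cam(model, img_array, "conv_7b_ac")  # hypothetical layer name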
24
5.3 Detection Algorithm
Deep learning-based object detection can be classified into two types: two-stage detectors and one-stage detectors. We will briefly look into both types of algorithms and compare their results on plant disease detection tasks.
5.3.1 Two-stage detector
The R-CNN family is a series of two-stage object detectors that have significantly influenced the field of object detection in computer vision. The two-stage approach involves first generating candidate object regions (region proposals) and then classifying and refining each proposed region.
There are several object detectors in the R-CNN family, including R-CNN, Fast
R-CNN, and Faster R-CNN. Each iteration improves upon the previous model’s
efficiency and accuracy.
5.3.1.1 R-CNN
R-CNN is the initial model that introduced the concept of region-based convolutional
neural networks for object detection [22]. It uses selective search, a bottom-up region
proposal method, to generate around 2000 RoIs for each input image. Then, a pre-
trained CNN (e.g., AlexNet or VGG) extracts features from these regions. Finally, an SVM classifies the regions into object classes, and bounding-box regression refines the
box coordinates. However, R-CNN is computationally expensive due to the separate
CNN feature extraction for each RoI and reliance on external region proposal methods.
25
5.3.1.2 Fast R-CNN
Fast R-CNN improves upon R-CNN by introducing RoI pooling, which allows the
network to extract features from all RoIs in a single forward pass through the CNN
[23]. The input image is first passed through a pre-trained CNN, resulting in a feature
map. The RoIs generated by selective search are then mapped onto the feature map,
and RoI pooling is applied to extract fixed-size features. These features are processed
through fully connected layers, followed by two parallel output layers for classification
(using softmax) and bounding box regression. Fast R-CNN is significantly faster than
R-CNN but still relies on an external region proposal method, which limits its overall
speed.
5.3.1.3 Faster R-CNN
Faster R-CNN further accelerates object detection by introducing the Region Proposal
Network (RPN), which generates RoIs directly from the feature map produced by the
CNN, eliminating the need for an external region proposal method [24]. The RPN
is trained to predict both objectness scores (how likely a region contains an object)
and bounding box coordinates for anchor boxes. The RoIs generated by the RPN are
then fed into the Fast R-CNN pipeline for classification and bounding box regression.
Faster R-CNN achieves near real-time performance while maintaining state-of-the-art
detection accuracy.
The R-CNN family of detectors has had a significant impact on the field of object
detection, providing a foundation for many subsequent architectures and approaches,
such as Mask R-CNN for instance segmentation and Cascade R-CNN for higher-
quality bounding boxes.
5.3.2 One-stage detector
One-stage detectors are object detection models that perform object localization and
classification in a single, unified framework, offering faster detection speeds compared
to two-stage detectors like R-CNN family models. Two popular one-stage detectors
are SSD and YOLO.
26
5.3.2.1 Single Shot MultiBox Detector (SSD)
SSD is a fast and accurate one-stage object detector that combines the ideas of anchor
boxes and multi-scale feature maps to detect objects of various sizes and aspect ratios
[7]. The architecture consists of a base CNN (e.g., VGG or MobileNet) followed by a
series of convolutional layers with decreasing spatial dimensions. The model predicts
object classes and bounding box offsets for a fixed set of anchor boxes at different
feature map levels, capturing objects at different scales and aspect ratios.
During training, SSD uses a matching strategy to assign ground truth boxes to anchor
boxes based on IOU scores. The model is then trained using a combined loss function
that includes both classification loss (e.g., cross-entropy) and localization loss (e.g.,
Smooth L1 loss). SSD achieves near real-time detection speeds with competitive
accuracy compared to two-stage detectors.
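As a rough illustration of this combined objective (a simplified sketch that omits details such as hard negative mining, with assumed tensor shapes; not the loss implementation we trained with), the two terms can be combined as follows.

# Sketch of an SSD-style combined loss: cross-entropy classification +
# Smooth L1 localization. Shapes and matching logic are simplified assumptions.
import tensorflow as tf

def smooth_l1(y_true, y_pred, delta=1.0):
    # Smooth L1 (Huber) loss, summed over the 4 box coordinates.
    diff = tf.abs(y_true - y_pred)
    per_coord = tf.where(diff < delta, 0.5 * diff ** 2, delta * (diff - 0.5 * delta))
    return tf.reduce_sum(per_coord, axis=-1)                  # (N, A)

def ssd_loss(cls_logits, cls_targets, box_preds, box_targets, positive_mask):
    # cls_logits: (N, A, C) anchor class scores; cls_targets: (N, A) int labels
    # (background = class 0); box_*: (N, A, 4); positive_mask: (N, A) floats
    # set to 1.0 for anchors matched to a ground-truth box.
    cls_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=cls_targets, logits=cls_logits)                # (N, A)
    loc_loss = smooth_l1(box_targets, box_preds) * positive_mask
    num_pos = tf.maximum(tf.reduce_sum(positive_mask), 1.0)
    return (tf.reduce_sum(cls_loss) + tf.reduce_sum(loc_loss)) / num_pos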
5.3.2.2 You Only Look Once (YOLO)
YOLO is another one-stage object detector that is designed for real-time detection
with high accuracy [25]. YOLO divides the input image into a fixed-size grid (e.g.,
13x13 or 19x19) and predicts object classes, bounding box coordinates, and objectness
scores for each grid cell.
YOLO is known for its speed and ability to detect objects in real-time, making it
suitable for applications like video analysis and autonomous driving. However, it may
have lower detection accuracy for small objects or objects with unusual aspect ratios
compared to some two-stage detectors.
In summary, one-stage detectors like SSD and YOLO offer fast object detection by
unifying the tasks of object localization and classification into a single framework.
While they may not always achieve the same accuracy as some two-stage detectors,
they are well-suited for real-time applications and scenarios where speed is a critical
factor.
27
5.4 Generation Algorithm
GANs are a class of deep learning models introduced by Ian Goodfellow et al. in 2014.
GANs have gained significant attention for their ability to generate realistic and high-
quality images, among other applications. GANs consist of two neural networks, a
generator and a discriminator, which are trained together in a competitive setting.
1. Generator: The generator network takes a random noise vector as input and
produces a generated image. The purpose of the generator is to learn the un-
derlying data distribution of the training images so that it can generate new,
realistic images that resemble the training data.
2. Discriminator: The discriminator network takes an image, either real (from the training set) or generated, as input and outputs the probability that the image is real. Its purpose is to distinguish generated images from real ones.
The training process of a GAN involves alternating between training the discriminator to separate real from generated images and training the generator to produce images that fool the discriminator.
The training process continues until an equilibrium is reached, where the generator
produces realistic images, and the discriminator can no longer accurately distinguish
between real and generated images. This adversarial training process allows GANs to
generate high-quality and diverse images that mimic the training data distribution.
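A minimal sketch of one such alternating training step is shown below; generator and discriminator are assumed to be Keras models defined elsewhere, and the hyperparameters are illustrative, not those used in our experiments.

# One alternating GAN training step (illustrative sketch, TensorFlow/Keras).
# `generator` and `discriminator` are assumed Keras models defined elsewhere.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(real_images, generator, discriminator, latent_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)

        # Discriminator: push real -> 1 and fake -> 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: make the discriminator label fakes as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)

    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss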
GANs have been applied to a wide range of image generation tasks, including image
synthesis, style transfer, image-to-image translation, inpainting, and super-resolution.
Variations and improvements to the original GAN architecture have been proposed,
such as DCGANs, CGANs, ProgressiveGANs, and Wasserstein GANs, to address
issues like training stability, mode collapse, and image quality.
28
5.4.1 DCGAN
DCGANs are a variant of GANs that specifically use deep convolutional layers in both
the generator and discriminator networks. DCGANs were introduced by Alec Rad-
ford, Luke Metz, and Soumith Chintala [26]. DCGANs have demonstrated significant
improvements in the stability of GAN training and the quality of generated images
compared to the original GAN architecture.
3. ReLU and Leaky ReLU Activations: DCGANs use the ReLU activation
function in the generator’s hidden layers to promote faster learning, while the
Leaky Rectified Linear Unit (Leaky ReLU) activation function is used in the
discriminator’s hidden layers to prevent sparse gradients and dead neurons.
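Following these guidelines, a DCGAN-style generator for 64 x 64 images might look like the sketch below; this is an illustrative architecture under assumed layer sizes, not our exact model.

# Minimal DCGAN-style generator for 64x64 RGB images (illustrative sketch).
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    return tf.keras.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(4 * 4 * 512, use_bias=False),
        layers.Reshape((4, 4, 512)),
        layers.BatchNormalization(), layers.ReLU(),
        # Transposed convolutions progressively upsample 4x4 -> 64x64.
        layers.Conv2DTranspose(256, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(), layers.ReLU(),
        # Tanh output maps pixel values to [-1, 1].
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])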
DCGANs have been used for various image generation tasks, including image syn-
thesis, style transfer, and image-to-image translation. They have also been employed
as a basis for more advanced GAN architectures and applications, such as text-to-
image synthesis, semi-supervised learning, and unsupervised representation learning.
DCGANs have contributed significantly to the development and understanding of
stable and efficient GAN training.
29
5.4.2 ProGAN
ProGANs improve the training process of GANs by progressively increasing the resolu-
tion and complexity of the generated images in a step-by-step manner. The training
begins with low-resolution images, and as the training progresses, the resolution is
gradually increased by adding new layers to both the generator and discriminator
networks. This allows the networks to learn features at different scales, starting from
coarse features and moving towards finer details.
30
6. Implementation and Testing
6.1 Tools Used
6.1.2.1 Backend
Python
Python is widely used for training deep learning models because of its vast number of
libraries and frameworks. It also has huge community support, guides, and tutorials, which make it an ideal choice for any machine learning project that needs to quickly train and deploy its models.
FastAPI
FastAPI is a Python framework that enables developers to quickly and easily make web APIs using Python along with other frameworks and libraries. Hence, we will be using it to create our Web API.
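As a rough sketch of how such an endpoint can look (the /inferences route matches the endpoint shown in Appendix C, while the predict() helper and response format are illustrative assumptions, not our exact implementation):

# Minimal FastAPI sketch for an image-inference endpoint.
# The predict() helper and response fields are illustrative assumptions.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def predict(image_bytes: bytes) -> dict:
    # Placeholder for the actual model call (classification/detection).
    return {"disease": "unknown", "confidence": 0.0}

@app.post("/inferences")
async def create_inference(file: UploadFile = File(...)):
    image_bytes = await file.read()
    result = predict(image_bytes)
    return {"filename": file.filename, "prediction": result}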
PostgreSQL
PostgreSQL is a structured relational database that is easy to use, fast, supports many languages, and has good community support. We will be using it to store user data and the images uploaded by users, both for their history and for training our model. It is an open-source database, perfectly suitable for our free-to-use model.
31
6.1.2.2 Frontend
Flutter
Flutter is an open-source framework by Google that allows faster app development and native compilation, and supports multiple platforms such as Android and iOS from a single codebase. We will use Flutter to build our user-side Android application and a PWA.
Flutter Web
Flutter Web allows us to build good-looking progressive web apps from the same Flutter codebase in a short period of time and in a managed way. We will be using Flutter Web to build the web app for our system.
Android Studio
Android Studio is an Integrated Development Environment (IDE) for developing mo-
bile applications with features essential for app development like code completion,
debugging tools, etc. We will be using it to develop our flutter application.
Postman
Postman is a tool for analyzing and testing web APIs and has features such as edit-
ing the Hypertext Transfer Protocol (HTTP) request, viewing the HTTP response,
choosing the request method, changing HTTP header properties, etc. We will be
using it to test our Web API.
ONNX
ONNX (Open Neural Network Exchange) allows ML models to be converted from one format to another so that they can be run on any platform with a variety of frameworks, tools, runtimes, and compilers without losing accuracy.
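For example, a trained PyTorch model can be exported to ONNX roughly as follows; this is a sketch with an assumed stand-in model, file name, and input size, not our actual export script.

# Sketch: export a trained PyTorch classifier to ONNX (assumed 224x224 input).
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights=None)  # stand-in for the trained model
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy_input, "plantmd_classifier.onnx",
    input_names=["image"], output_names=["logits"],
    opset_version=13,
)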
32
6.2 Implementation Details of Modules
6.2.1 Hardware
The training and deployment of the models for the PlantMD project required powerful
hardware resources. For this purpose, high-performance GPUs such as the NVIDIA Tesla T4 were used to accelerate the training process. Additionally, sufficient RAM and
storage were required to handle the large dataset and model files.
The implementation of the PlantMD project relied on various software tools and
frameworks. The primary programming language used for the project was Python,
due to its extensive support for machine learning and image processing libraries. The
deep learning models were implemented using popular frameworks like TensorFlow
and PyTorch, which provided pre-trained models, GPU support, and various opti-
mization techniques. Image preprocessing and data augmentation were performed
using libraries like OpenCV, NumPy, and PIL.
The first step in the implementation of the PlantMD project was to collect a large
dataset of plant images, including healthy plants and plants with various diseases.
This dataset was obtained from various sources, such as online plant databases. The
images were then preprocessed by resizing, normalization, and data augmentation to
create a diverse and balanced dataset.
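A sketch of this kind of preprocessing and augmentation pipeline is shown below (illustrative Keras code with an assumed image size and augmentation choices, not our exact pipeline).

# Illustrative preprocessing + augmentation pipeline (assumed 224x224 input).
import tensorflow as tf

IMG_SIZE = (224, 224)

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

def preprocess(image, label, training=False):
    image = tf.image.resize(image, IMG_SIZE)
    image = tf.cast(image, tf.float32) / 255.0   # normalize to [0, 1]
    if training:
        # Apply random augmentation only to training images.
        image = augment(image[None, ...], training=True)[0]
    return image, label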
Two popular image classification models, Inception ResNetV2 and MobileNet, were
used to classify the plant images into different categories based on their health and
disease conditions. These models were trained on the preprocessed dataset using
transfer learning, where the pre-trained weights of the models were fine-tuned on the
plant dataset. This allowed the models to achieve high accuracy in classifying the
plant images.
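A minimal transfer-learning sketch of this setup is shown below; the two-stage freeze/unfreeze schedule and hyperparameters are illustrative assumptions, while the 27-class output matches the classifier described in Chapter 7.

# Transfer-learning sketch: fine-tune InceptionResNetV2 on plant disease classes.
# Hyperparameters and the training schedule are illustrative assumptions.
import tensorflow as tf

NUM_CLASSES = 27  # the classification model distinguishes 27 diseases

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # first stage: train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Second stage (optional): unfreeze the backbone and fine-tune at a lower LR.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])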
To provide better insights into the decision-making process of the image classification
models, the Grad-CAM algorithm was implemented. Grad-CAM is a technique that
visualizes the regions in the input image that contribute most significantly to the
model’s classification decision.
33
6.2.6 Object Detection Models
In addition to image classification, object detection models like Faster R-CNN, SSD
MobileNet, and YOLOv7 were employed to detect and localize the diseased regions
in the plant images. These models were trained on a subset of the dataset with
annotated bounding boxes around the diseased regions. The object detection models
were able to accurately identify and highlight the affected areas in the plant images.
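As an illustration of how one of these detectors can be adapted to our classes, the torchvision-based sketch below replaces the classification head of a pretrained Faster R-CNN and runs a single inference; the class count and the placeholder image are assumptions, not the exact training code we used.

# Sketch: adapt a pretrained Faster R-CNN to plant-disease classes (torchvision).
# NUM_CLASSES is illustrative (assumed disease classes + 1 for background).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 28  # assumed: 27 disease classes + background

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Inference on a single image tensor (C, H, W) scaled to [0, 1].
model.eval()
with torch.no_grad():
    image = torch.rand(3, 480, 640)           # placeholder for a real leaf photo
    predictions = model([image])[0]            # dict with boxes, labels, scores
    keep = predictions["scores"] > 0.5
    print(predictions["boxes"][keep], predictions["labels"][keep])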
To generate realistic images of healthy and diseased plants, various GAN models like
DCGAN and ProGAN were implemented. These models were trained on the plant
dataset to generate new images, which could be used for further analysis and research.
GANs were particularly useful in generating images with simple backgrounds but they
weren’t effective for images with complex backgrounds.
The performance of the different image classification and object detection models was
evaluated using various metrics like accuracy, precision, recall, and F1 score. Based
on these metrics, the best-performing models were selected for deployment in the
PlantMD application.
Once the models were trained and evaluated, they were deployed on a cloud-based
server using a cloud platform called Linode.
34
6.2.12 Application Development and Deployment
The selected models were integrated into a Web API that can be accessed through a user-friendly application by farmers, researchers, and other stakeholders. This application allows users to upload plant images and receive information about the health and disease conditions of the plants. The application also provides suggestions on how to treat the identified diseases and maintain the overall health of the plants.
35
6.3 Testing
36
6.3.2 Test 2: Disease prediction
37
6.3.3 Test 3: Suggest Remedies
38
7. Result Analysis
7.1 Analyzing Heatmap
A CNN was used to classify the different diseases present in the plant leaves. The CNN model can classify 27 different diseases. A heatmap of the confusion matrix is used to evaluate the performance of the system.
Analyzing the heatmap in Figure 7.1, we can see that most wrong predictions stay within the same crop category, i.e., the Apple, Corn, Potato, and Tomato categories. The images from these classes are very hard to distinguish because they look visually similar. Some of the images are shown in Figures 7.2, 7.3, 7.4 and 7.5.
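The confusion-matrix heatmap itself can be produced along the lines of the sketch below (illustrative code using scikit-learn and seaborn; the variable names are assumptions, not our evaluation script).

# Sketch: confusion-matrix heatmap for the 27-class classifier.
# y_true, y_pred, and class_names are assumed to come from the evaluation loop.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def plot_confusion_heatmap(y_true, y_pred, class_names):
    cm = confusion_matrix(y_true, y_pred, labels=range(len(class_names)))
    plt.figure(figsize=(12, 10))
    sns.heatmap(cm, annot=False, cmap="Blues",
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel("Predicted class")
    plt.ylabel("True class")
    plt.tight_layout()
    plt.show()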
39
(a) Tomato Early Blight (b) Tomato Late Blight
Figure 7.2: Featured Discriminator for Tomato Leaves
40
(a) Apple Scab Leaf (b) Apple Rust Leaf
Figure 7.4: Featured Discriminator for Apple leaves
41
7.2 Interpreting the model prediction with Grad-
CAM
Figures 7.6, 7.7, 7.8 and 7.9 are some of the results obtained by Grad-CAM.
42
(a) Tomato Early Blight (b) Prediction: Tomato Early Blight
Figure 7.8: Grad-CAM Visualization for Tomato Early Blight
From Figures 7.6, 7.7, 7.8 and 7.9, we can see that the model is learning to focus on the infected parts of the leaves to make its predictions. The inputs in Figures 7.8 and 7.9 are from the same class, i.e., Tomato Early Blight. In Figure 7.8b and Figure 7.9b, we made our model predict them as Tomato Early Blight and Tomato Septoria Gray Spot respectively. The visualizations are quite interesting: the model focuses on the infected upper part of the leaf when predicting Tomato Early Blight and on the infected lower part when predicting Tomato Septoria Gray Spot.
43
7.3 Plant Disease Detection model results
(b) Tomato Mold Leaf (64%) (c) Tomato Early Blight Leaf (72%)
44
7.4 Image generation using GANs
Based on Figure 7.11, it can be observed that generating images with complex backgrounds fails because the generated images are heavily affected by background features.
45
7.4.2 Images generated by DCGAN trained on PlantVillage
dataset
Based on Figure 7.12, it can be observed that the generated images are much more appealing than those generated from the PlantDoc dataset. The images attempt to capture the shape of the leaf. The DCGAN model was trained for only 2500 epochs because training the DCGAN (WGAN-GP) model takes a lot of resources and time. Training on the PlantVillage dataset for a longer duration could produce images that look similar to the original images. Also, these images were generated at a resolution of 64 x 64 pixels, and training a DCGAN to produce higher-resolution images can be hard because of training instability.
46
7.4.3 Images generated by ProGANs
47
Figure 7.15: Synthetic image generated by ProGAN of resolution 128x128
The generated images shown in Figures 7.13, 7.14 and 7.15 are of resolution 32 x 32, 64 x 64 and 128 x 128 respectively. Based on these images, it can be observed that the generated images possess qualities such as shape, color, and texture similar to real images. However, training the same ProGAN architecture on the PlantDoc dataset produces images that are largely unrecognizable because of the complex background features. So it is difficult to generate images that have complex backgrounds or that are taken in real-life conditions. The images generated by ProGAN on PlantVillage look similar to the original images and can potentially be used as training samples. Due to resource constraints, we could not generate images of size 256 x 256, so we only generated images up to size 128 x 128.
48
8. Conclusion
In conclusion, the PlantMD project aimed to create an effective plant disease pre-
diction system by utilizing various machine learning techniques. Although our initial
attempt with image classification did not yield the desired performance, the subse-
quent implementation of an object detection model proved to be highly successful in
predicting diseases in leaves accurately.
The final phase of our project involved deploying the trained model via a web API,
allowing seamless interaction with a mobile application to serve users efficiently. Over-
all, the PlantMD project represents a significant step forward in plant disease predic-
tion and management, with the potential for further improvements and applications
in the field of agriculture and plant health.
49
References
[1] FAO - News Article: Climate change fans spread of pests and threatens plants
and crops, new FAO study, [Online; accessed 18. Dec. 2022], Dec. 2022. [Online].
Available: https://www.fao.org/news/story/en/item/1402920/icode.
[2] Plant Protection Directorate 2013, [Online; accessed 18. Dec. 2022], Dec. 2022.
[Online]. Available: http://www.npponepal.gov.np/.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-
scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recog-
nition,” in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2016, pp. 770–778.
[5] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-
resnet and the impact of residual connections on learning,” in Thirty-first AAAI
conference on artificial intelligence, 2017.
[6] A. G. Howard, M. Zhu, B. Chen, et al., “Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications,” arXiv preprint arXiv:1704.04861,
2017.
[7] W. Liu, D. Anguelov, D. Erhan, et al., “Ssd: Single shot multibox detector,”
in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The
Netherlands, October 11–14, 2016, Proceedings, Part I 14, Springer, 2016, pp. 21–
37.
[8] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “Yolov7: Trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint
arXiv:2207.02696, 2022.
[9] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time ob-
ject detection with region proposal networks,” Advances in neural information
processing systems, vol. 28, 2015.
[10] R. I. Hasan, S. M. Yusuf, and L. Alzubaidi, “Review of the state of the art
of deep learning for plant diseases: A broad analysis and discussion,” Plants,
vol. 9, no. 10, p. 1302, 2020.
[11] D. Singh, N. Jain, P. Jain, P. Kayal, S. Kumawat, and N. Batra, “Plantdoc:
A dataset for visual plant disease detection,” in Proceedings of the 7th ACM
IKDD CoDS and 25th COMAD, ser. CoDS COMAD 2020, Hyderabad, India:
Association for Computing Machinery, 2020, pp. 249–253, isbn: 9781450377386.
doi: 10.1145/3371158.3371196. [Online]. Available: https://doi.org/10.
1145/3371158.3371196.
[12] PlantVillage Dataset, [Online; accessed 19. Dec. 2022], Dec. 2022. [Online].
Available: https://www.kaggle.com/datasets/emmarex/plantdisease.
[13] S. Phadikar, J. Sil, and A. K. Das, “Rice diseases classification using feature
selection and rule generation techniques,” Computers and electronics in agri-
culture, vol. 90, pp. 76–85, 2013.
50
[14] B. Sandika, S. Avil, S. Sanat, and P. Srinivasu, “Random forest based classifica-
tion of diseases in grapes from images captured in uncontrolled environments,”
in 2016 IEEE 13th international conference on signal processing (ICSP), IEEE,
2016, pp. 1775–1780.
[15] S. P. Mohanty, D. P. Hughes, and M. Salathé, “Using deep learning for image-
based plant disease detection,” Frontiers in plant science, vol. 7, p. 1419, 2016.
[16] A. Ramcharan, K. Baranowski, P. McCloskey, B. Ahmed, J. Legg, and D. P.
Hughes, “Deep learning for image-based cassava disease detection,” Frontiers
in plant science, vol. 8, p. 1852, 2017.
[17] A. Fuentes, S. Yoon, S. C. Kim, and D. S. Park, “A robust deep-learning-based
detector for real-time tomato plant diseases and pests recognition,” Sensors,
vol. 17, no. 9, p. 2022, 2017.
[18] J. Sun, Y. Yang, X. He, and X. Wu, “Northern maize leaf blight detection
under complex field environment based on deep learning,” IEEE Access, vol. 8,
pp. 33 679–33 688, 2020.
[19] P. V. Bhatt, S. Sarangi, and S. Pappula, “Detection of diseases and pests on im-
ages captured in uncontrolled conditions from tea plantations,” in Autonomous
Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping
IV, SPIE, vol. 11008, 2019, pp. 73–82.
[20] Z. Chen, Y. Qian, Y. Wang, and Y. Fang, “Deep convolutional generative ad-
versarial network-based emg data enhancement for hand motion classification,”
Frontiers in Bioengineering and Biotechnology, vol. 10, 2022.
[21] N. Alajlan, M. Alyahya, N. Alghasham, and D. M. Ibrahim, “Data enhancement
for date fruit classification using dcgan,” 2021.
[22] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies
for accurate object detection and semantic segmentation,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587.
[23] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference
on computer vision, 2015, pp. 1440–1448.
[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time ob-
ject detection with region proposal networks,” Advances in neural information
processing systems, vol. 28, 2015.
[25] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Uni-
fied, real-time object detection,” in Proceedings of the IEEE conference on com-
puter vision and pattern recognition, 2016, pp. 779–788.
[26] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learn-
ing with deep convolutional generative adversarial networks,” arXiv preprint
arXiv:1511.06434, 2015.
51
Appendix A
Gantt Chart
52
Appendix B
1. Home Screen
53
2. Select Image From Storage
54
3. Scanning Plant UI
55
4. Result Popup UI
Figure B.4: Small fragment of results that pops up after scanning is complete.
56
5. Result Screen UI
57
6. Result Screen of Diseases Plant
58
Appendix C
1. Hitting the /ping endpoint of the Web API without auth token
Figure C.2: Posting to the /token endpoint with credentials to obtain an auth token.
3. Hitting the /ping endpoint of the Web API with auth token
59
4. Posting an image to the /inferences endpoint of the Web API
5. Hitting the /image endpoint of the Web API with an image name
60
Appendix D
The soft copy of this report and the source code of all the artifacts of this project can be found in the anuraglimbu/PlantMD GitHub repository.
61