
REPORT TITLED

DETECTION AND CLASSIFICATION OF ABNORMALITIES IN LEAF


IMAGE USING CNN
SUBMITTED
B.Tech (ELECTRONICS AND TELECOMMUNICATIONS)

BY

MIHIR DESHPANDE 151090025


OMKAR MORE 151090026
KESHAV JHA 151090023
VENKTESH RATHI 151090008
JENIL GALA 151090014

GUIDED BY
Dr. A. N. Cheeran
DEPARTMENT OF ELECTRICAL ENGINEERING

VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE


MUMBAI 400019
2018-2019
DECLARATION

We declare that this written submission represents our ideas in our own words,
and where others' ideas or words have been included, we have adequately cited and
referenced the original sources.
We also declare that we have adhered to all principles of academic honesty and
integrity and have not misrepresented or fabricated or falsified any idea / data / fact /
source in our submission.
We understand that any violation of the above will be cause for disciplinary
action by the Institute and can also evoke penal action from the sources which have thus
not been properly cited or from whom proper permission has not been taken when
needed.

Signature of Candidate Signature of Candidate

MIHIR DESHPANDE OMKAR MORE

Signature of Candidate Signature of Candidate

KESHAV JHA VENKTESH RATHI

Signature of Candidate

JENIL GALA
CERTIFICATE

This is to certify that,

MIHIR DESHPANDE 151090025


OMKAR MORE 151090026
KESHAV JHA 151090023
VENKTESH RATHI 151090008
JENIL GALA 151090014

Students of B.Tech E.X.T.C have completed the Project Report entitled,


“DETECTION AND CLASSIFICATION OF ABNORMALITIES IN LEAF IMAGE USING
CNN” to our satisfaction.

Dr. A.N. Cheeran

(Department of Electrical Engineering, VJTI)


VEERMATA JIJABAI TECHNOLOGICAL INSTITUTE
CENTRAL TECHNOLOGICAL INSTITUTE, MAHARASHTRA STATE
MATUNGA, MUMBAI – 400019

CERTIFICATE OF APPROVAL

THE REPORT “DETECTION AND CLASSIFICATION OF ABNORMALITIES IN LEAF


IMAGE USING CNN” SUBMITTED BY MIHIR DESHPANDE, OMKAR MORE, KESHAV
JHA, VENKTESH RATHI AND JENIL GALA, IS FOUND TO BE SATISFACTORY AND IS
APPROVED FOR THE DEGREE OF BACHELOR OF TECHNOLOGY IN ELECTRONICS
AND TELECOMMUNICATION ENGINEERING

Guide and Internal Examiner External Examiner

Date: Place:

ACKNOWLEDGEMENT

The vital aspect of any project is to complete the project in the given time
frame with the available resources. The success would have been miles away if we
hadn’t received support from various people who directly or indirectly
contributed to the success of the project. We would like to take this opportunity
to thank them wholeheartedly.

We are grateful to our project guide, Dr. A. N. Cheeran, for channelizing our
skills and energy and for encouraging us to work together with cooperation and
coordination.

We would definitely like to thank the entire department staff for giving us
full support and all the resources required for completing this project.

We express our sincere thanks and deepest gratitude to our family and
friends for their support and encouragement. We would like to thank every
person who might have helped in making this project. If, in the course of
acknowledging the contributors to the project, we have forgotten any names,
we request them to generously pardon us.

ABSTRACT

Using a Convolutional Neural Network (CNN), the project classifies a given
image of a plant leaf into a diseased or healthy class and, further, into nine different
diseases of the tomato plant. It uses a multi-layer perceptron, built using the Keras API
with a TensorFlow backend, to achieve this goal. A model is created using two hidden layers
and one output layer and is trained prior to deploying it on a single-board computer. A
Raspberry Pi 3B+ is used as the single-board computer, on which the pre-trained model
is deployed and used for further predictions. Predictions are made at regular intervals,
using a timer on the Raspberry Pi 3B+, and provide periodic updates about the condition of
the plants and trees on the farm.

TABLE OF CONTENTS

CHAPTER 1 : INTRODUCTION 4
1.1 Introduction 4
1.2 Flowchart 5
1.3 Motivation 6
1.4 Area of utility 6
CHAPTER 2: LITERATURE SURVEY 7
2.1 Convolutional Neural Networks 7
2.2 Leaf Disease Detection 7
CHAPTER 3: PROPOSED METHODOLOGY 9
3.1 Data Acquisition 9
3.2 Dataset design 10
3.3 Pre-processing of the Images 12
3.4 Neural Networks 14
3.4.1 Artificial neural networks 14
3.4.2 Types of ANNs 15
3.4.3 Forward Propagation 18
3.4.4 Back-propagation 19
3.5 Convolutional Neural Networks (CNNs) 20
3.5.1 Convolutional Layer 23
3.5.2 Pooling Layer 27
3.5.3 Fully-connected layer 29
CHAPTER 4: REQUIREMENTS AND IMPLEMENTATION 30
4.1 Hardware Requirements 30
4.1.1 Raspberry Pi 3B+ 30
4.1.2 Pi Camera 33
4.2 Software Requirements 34
4.2.1 Google Colab 34
4.2.2 TensorFlow 35
4.2.3 Keras 37
4.2.4 Thonny: Python IDE for Raspberry Pi 37

4.2.5 RealVNC: Remote connection for Raspberry Pi 38
4.3 Project Design 38
4.3.1 Hardware implementation 38
4.3.2 Software Implementation 46
4.3.2.1 General Model 47
4.3.2.2 Specific Model 48
4.3.2.3 Remote Deployment 49
CHAPTER 5: RESULTS 51
5.1 General model 51
5.1.1 Summary 51
5.1.2 Accuracy 52
5.2 Specific model 53
5.2.1 Summary 53
5.2.2 Accuracy 54
5.3 Leaf Classification testing 54
CHAPTER 6: CONCLUSION AND FUTURE SCOPE 60
6.1 Conclusion 60
6.2 Future Scope 60
CHAPTER 7: BIBLIOGRAPHY 62

Chapter 1
Introduction
This chapter provides a brief overview of the project and introduces the
motivation and need for it.

1.1 INTRODUCTION

The rural population of India was reported at 66.46% of the total population in 2017, with
agriculture being the main source of employment, employing about 80% of the
rural population. There exists a huge disparity between the agricultural productivity of
India and that of most of the developed and developing world. An economy's growth
depends strongly on agricultural productivity, which therefore plays an important role
in the rise of the economy. Undetected and untreated diseases hamper both the quality
and the quantity of the produce. Plant diseases are very common, yet they often require
strenuous effort to detect because their symptoms manifest late; this is one of the
reasons why detection of diseases in plants is so crucial.

Indiscriminate and prolonged usage of broad-spectrum pesticides in fields has a detrimental
effect on soil health and fertility, the extreme case being total loss of fertility.
If, instead, a mechanism to identify and classify plant leaf diseases were available,
pinpointed and localized methods of disease prevention and elimination could be employed.

The existing method of detection is simple naked-eye observation, either by a
layperson, which becomes unreliable beyond a certain extent, or by experts, which
increases the cost and the time consumed in monitoring as the size of the field increases.
People in the agricultural sector are, by and large, unaware of the resources available to them.
Therefore, building a methodology that identifies a disease on the basis of its
symptoms and classifies it according to type gives us a cheaper and more efficient
alternative to the traditional way of tackling these grave problems.

In this project, we have proposed a way to detect and classify the disease
automatically, removing the laborious task of manual disease detection. The suggested
technique requires less human intervention, consumes less time and is more
accurate than the traditional method. Convolutional neural networks form the
backbone of the detection, classification and prediction process, which is applied to
visual imagery for analysis and further processing.

1.2 FLOWCHART

The general flowchart followed by the project can be shown as:

Fig 1.1 General flowchart for the process

1.3 MOTIVATION

After the Green Revolution of 1965, the use of chemical pesticides in Indian farms
increased, boosting our agricultural efficiency and overall output. But today,
more than half a century after the Green Revolution, the efficiency of our agricultural
sector is far below that of the western countries. Even though it employs nearly 50%
of the workforce, its contribution to the GDP is less than 20%. This massive rift can
be bridged using state-of-the-art technology, modern equipment and targeted use
of pesticides against only the type of threat actually present. This project is a minor
contribution in that direction: it would warn the farmer about imminent threats such as
lack of nutrients, toxicity and loss of soil fertility that manifest themselves in the
form of plant diseases.

1.4 AREA OF UTILITY

• Provides a targeted approach to tackle plant diseases.
• Recognizes the disease present in a plant from its unhealthy leaves.
• Can be implemented in farms and plantations of all sizes and varieties.
• Useful for plants whose diseases occur with very low probability and can therefore go unnoticed.

Chapter 2
Literature Survey
This chapter informs us about some of the previous research carried out
in the fields explored in this project.

2.1 Convolutional Neural Networks

Convolutional Neural Networks, like ordinary neural networks, are made up of neurons with
learnable weights and biases. Each neuron receives several inputs, takes a weighted
sum over them, passes it through an activation function and responds with an
output. The convolution layer is the main building block of a convolutional neural
network. It comprises a set of independent filters, each of which is independently
convolved with the image to obtain an output feature map.
Many classification tasks are handled using CNNs due to their ability to adapt to
extreme noise in their inputs and still give acceptable results.
• The leaf classification challenge issued by PlantVillage was won using
convolutional neural networks, with a high mean F1 score.
• The ImageNet Large Scale Visual Recognition Challenge, which tackles
classification problems at extremely large scales, is mostly won by CNN
implementations such as GoogLeNet and AlexNet, with some winning networks
having hidden layers on the order of 152.

2.2 Leaf Disease Detection

• ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky,
Ilya Sutskever and Geoffrey E. Hinton: They trained a deep convolutional neural
network to classify a huge sample of 1.2 million high-resolution images from
the ImageNet LSVRC-2010 contest into 1000 different classes. The neural network has
60 million parameters and around 650,000 neurons. To avoid
overfitting, they employed the then recently introduced regularization
method called dropout.
• Fast and Accurate Detection and Classification of Plant Diseases by H. Al-Hiary, S.
Bani-Ahmad, M. Reyalat, M. Braik and Z. ALRahamneh: Before passing a leaf
image to the classifier, they employed certain pre-processing techniques to greatly
enhance the accuracy and the training speed. As an additional step, the pixels with
zero red, green and blue values and the pixels on the boundaries of the infected
cluster (object) were completely removed.

• Color Transform Based Approach for Disease Spot Detection on Plant Leaf by
Piyush Chaudhary, Anand K. Chaudhari, Dr. A. N. Cheeran and Sharda Godara:
The first step in classification of leaf diseases is image segmentation. This
paper compares the accuracy of segmentation and thresholding via Otsu's
method after first transforming the RGB colour space into three different colour
spaces, viz. CIELAB, HSI and YCbCr.

• A framework for detection and classification of plant leaf and stem diseases by
Dheeb Al Bashish, Malik Braik and Sulieman Bani-Ahmad: The objective of this
paper is to provide a fast, automatic, cheap and accurate image-processing-
based solution for classification of plant diseases. The first step involves
image segmentation using the K-means technique and, in the second step, the
segmented images are passed through a trained neural network. The developed
neural-network classifier, which is based on statistical classification, performs well
and could successfully detect and classify the tested diseases with a precision of
around 93%.

• Very Deep Convolutional Networks for Large-Scale Image Recognition by
Karen Simonyan and Andrew Zisserman: They investigated the effect of the
convolutional network depth on its accuracy in the large-scale image
recognition setting, through a thorough evaluation of networks of increasing depth
using an architecture with very small (3 × 3) convolution filters.

• CNN-RNN: A Unified Framework for Multi-label Image Classification by Jiang Wang,
Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang and Wei Xu:
Utilizing recurrent neural networks (RNNs) combined with CNNs, the
proposed CNN-RNN framework learns a joint image-label embedding to
characterize the semantic label dependency as well as the image-label
relevance, and it can be trained end-to-end from scratch to integrate both
kinds of information in a unified framework.

Chapter 3
Proposed Methodology
This chapter informs us about the concepts used in executing the project
and gives an insight into these methodologies.

3.1 Data Acquisition:

Data acquisition is the first and a crucial step in achieving the objective. The
process involves gathering data suitable for our application. A good dataset must
possess at least three properties. They are as follows:

• Quality of dataset –
The dataset must be diverse, with images varying in degree of quality.
When the quality of images is too high for our requirement, we can
compress them during pre-processing of the dataset. But if the quality of images
is too low for the required application, loss of data, and hence loss of
features, cripples the training and validation accuracy of the neural network,
rendering the entire model unreliable, with large loss values.

• Quantity of dataset –
There must be a sufficiently large number of images in the dataset for
training a model on it. When the dataset has very few images, the impact of
a single image on the entire neural network and on the weights of the feature
matrix is comparatively larger, which can lead to overfitting of the model and
below-satisfactory validation results. Thus, by increasing the number of images
in the dataset, while maintaining the required diversity, we make the model more
generalized, so that it is suitable for all kinds of test images.

• Diverse dataset –
The dataset must be acquired under a diverse set of conditions. The images
must be taken from multiple angles, under multiple illumination conditions, and
of multiple plants and multiple diseases. Lack of diversity in the dataset leads
to monotonicity in the training algorithm, giving rise to overfitting and hence
limiting the effectiveness and accuracy of the network's predictions. When the
dataset is diverse, the neural network comes across this diversity during training,
the predictions are much more accurate, and overfitting of the model is limited.

Images used in the training and validation process of the general model and the specific
model were acquired through an external source, crowdAI. The dataset maintained
by this source had an ample number of images belonging to multiple classes,
which were used as per our needs. The images were in .JPG format.

3.2 Dataset design:

The general model and the specific model we built need the dataset to be segregated in a
certain way. Hence, the dataset was split into three different subsets.
For the general model, the diversity of the original dataset was maintained in all
three subsets. Additionally, all the subsets of the original dataset were kept
mutually exclusive but not exhaustive. The original dataset was divided in an 80:20
proportion, with 80% of the original data in the training set and the rest in the validation set.

• Training set –
Training set is further divided into two subsets that are:
a. diseased – images of leaves of diseased plants
b. healthy - images of leaves of healthy plants
This dataset is used to train the CNN model created and is the first dataset
that is used in the model. It contains about 6214 healthy leaf images and 5985
diseased leaf images. Training our neural network on this specific dataset
makes the network learn how to weigh the different features in the feature matrix.

• Validation set –
Validation set is similarly divided into two different datasets:
a. diseased – images of leaves of diseased plants
b. healthy - images of leaves of healthy plants
Validation dataset is used to verify the accuracy and reliability of the model
trained using the training set. It contains about 1500 healthy leaf images and
1500 diseased leaf images. This dataset can also be used for regularization
purposes, where we can judge whether the model is overfitting by monitoring the
validation loss.

• Testing set –
A certain number of images was kept aside from the original dataset, with the
same probability distribution as that of the training dataset. We have also used
the Pi Camera hardware to capture images for testing.
For the specific model, which contains only diseased tomato plant leaf images since it
operates only on diseased plant leaves, the diversity of the original dataset was again
maintained in all three subsets. Additionally, all the subsets of the original
dataset were kept mutually exclusive but not exhaustive. The original dataset
was divided in an 80:20 proportion, with 80% of the original data in the training set
and the rest in the validation set.
We have,

• Training set –
Training set is further divided into 9 subsets that are:
a. xanthomonas_campestris
b. alternaria_solani
c. phytophthora_infestans
d. fulvia_fulva
e. septoria_lycopersici
f. tetranychus_urticae
g. corynespora_cassiicola
h. yellow_leaf_curl_virus
i. mosaic_virus
This dataset is used to train the CNN model created and is the first dataset
that is used in the model. It contains about 5209 leaf images.

• Validation set –
Validation set is similarly divided into 9 different datasets:
a. xanthomonas_campestris
b. alternaria_solani
c. phytophthora_infestans
d. fulvia_fulva
e. septoria_lycopersici
f. tetranychus_urticae
g. corynespora_cassiicola
h. yellow_leaf_curl_virus
i. mosaic_virus

Validation dataset is used to verify the accuracy and reliability of the model
trained using the training set. It contains about 1305 leaf images.
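
As an illustration of how such a segregated dataset can be consumed, the following minimal Python sketch uses Keras' flow_from_directory, where each class is a sub-folder of the train and validation directories. The directory paths, image size and batch size here are assumptions made for illustration, not values taken from the project.

from keras.preprocessing.image import ImageDataGenerator

# General model: train/healthy, train/diseased, validation/healthy, validation/diseased
train_gen = ImageDataGenerator(rescale=1.0 / 255)
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory(
    "dataset/general/train",       # assumed path
    target_size=(64, 64),          # images resized to 64x64, as used later in the report
    batch_size=32,
    class_mode="binary")           # two classes: healthy vs diseased

val_data = val_gen.flow_from_directory(
    "dataset/general/validation",  # assumed path
    target_size=(64, 64),
    batch_size=32,
    class_mode="binary")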

3.3 Pre-processing of the Images:

In this project we use the Keras ImageDataGenerator class to pre-process the rather
small set of images, to obtain more variations and to fit the model to a more
generalized set of images.
Data preparation is required when working with neural networks and deep learning
models. Increasingly, data augmentation is also required for more complex object
recognition tasks.
Keras provides the ImageDataGenerator class that defines the configuration for
image data preparation and augmentation. This includes capabilities such as:
• Sample-wise standardization.
• Feature-wise standardization.
• ZCA whitening.
• Random rotation, shifts, shear and flips.
• Dimension reordering.
• Save augmented images to disk.
After you have created and configured your ImageDataGenerator, you must fit it on
your data. This will calculate any statistics required to actually perform the
transforms to your image data.

Feature Standardization:
It is also possible to standardize pixel values across the entire dataset. This is called
feature standardization and mirrors the type of standardization often performed for
each column in a tabular dataset.

Random Rotations
Sometimes images in your sample data may have varying and different rotations in
the scene.
You can train your model to better handle rotations of images by artificially and
randomly rotating images from your dataset during training.

Random Shifts
Objects in your images may not be centered in the frame. They may be off-center in
a variety of different ways.

You can train your deep learning network to expect and correctly handle off-center
objects by artificially creating shifted versions of your training data. Keras supports
separate horizontal and vertical random shifting of training data via the
width_shift_range and height_shift_range arguments.

Random Flips
Another augmentation to your image data that can improve performance on large
and complex problems is to create random flips of images in your training data.
Keras supports random flipping along both the vertical and horizontal axes using the
vertical_flip and horizontal_flip arguments.

Advantages of using ImageDataGenerator:


• Easy to write — We just have to call
keras.preprocessing.image.ImageDataGenerator() and set values for different
parameters like horizontal_flip, vertical_flip, rescale, brightness_range,
zoom_range, rotation_range, etc.
• Less to remember — We need not manually code cv2 image-processing
techniques for flipping, varying brightness, zoom, etc.
• Easy to combine — As seen in the code example below, we can easily
combine ImageDataGenerator with our custom image generator.
• Fast — If you want to use multiple threads to load training data, the generator
returned by ImageDataGenerator.flow() can be consumed with the workers
argument of fit_generator(), which can be tuned to reduce training time considerably.
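
As a concrete illustration of the augmentation options listed above, the following minimal sketch configures an ImageDataGenerator; the parameter values are assumptions chosen for illustration rather than the exact settings used in the project.

from keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,         # scale pixel values to [0, 1]
    rotation_range=40,         # random rotations up to 40 degrees
    width_shift_range=0.2,     # random horizontal shifts
    height_shift_range=0.2,    # random vertical shifts
    shear_range=0.2,           # random shearing
    zoom_range=0.2,            # random zoom
    horizontal_flip=True,      # random horizontal flips
    vertical_flip=True)        # random vertical flips

# Feature-wise standardization or ZCA whitening needs dataset statistics first:
# augmenter.fit(training_images)   # training_images: a NumPy array of sample images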

Consider some of these transforms performed on the following image –

Fig 3.3.1 Original image

Fig 3.3.2 Various augmentation methods applied (flipping, rotation, shear,
brightness, zoom, shifting)

3.4 Neural Networks

3.4.1 Artificial Neural Networks

Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely
inspired by the biological neural networks that constitute animal brains. Such systems “learn” to
perform tasks by considering examples, generally without being programmed with any task-specific
rules.

The Architecture of an Artificial Neural Network:

ANN is a set of connected neurons organized in layers:

• input layer: brings the initial data into the system for further processing by subsequent layers of
artificial neurons.

• hidden layer: a layer in between input layers and output layers, where artificial neurons take in
a set of weighted inputs and produce an output through an activation function.

• output layer: the last layer of neurons that produces given outputs for the program.

3.4.2 Types of ANNs:

Perceptron:

The simplest and oldest model of an ANN, the Perceptron is a linear classifier used for binary
predictions. This means that in order for it to work, the data must be linearly separable.

Fig 3.4.1 Different types of separations

Its Architecture:

Fig 3.4.2 Structure of Perceptron

Multi-layer ANN:
More sophisticated than the perceptron, a Multi-layer ANN (e.g.: Convolutional Neural Network,
Recurrent Neural Network etc …) is capable of solving more complex classification and regression tasks
thanks to its hidden layer(s).

Its Architecture:

Fig 3.4.3 Structure of Multilayer ANN

Activation Functions:

Definition: In artificial neural networks, the activation function of a node defines the output of that
node given an input or set of inputs.

Sigmoid:
A sigmoid function is a mathematical function having a characteristic “S”-shaped curve or sigmoid
curve. Often, the sigmoid function refers to the special case of the logistic function, which generates
probability outputs between 0 and 1 when fed with a set of inputs. The sigmoid activation function is
widely used in binary classification.
Equation: σ(x) = 1 / (1 + e^(−x))

Graphical Representation:

Fig 3.4.4 Sigmoid activation function

ReLU:
Instead of the sigmoid activation function, most recent artificial neural networks use rectified linear
units (ReLUs) for the hidden layers. A rectified linear unit has output 0 if the input is less than 0, and raw
output otherwise. That is, if the input is greater than 0, the output is equal to the input.
Equation: f(x) = max(0, x)

Graphical Representation:

Fig 3.4.4 ReLu activation function

Softmax:
Unlike the sigmoid activation function, the softmax activation function is used for multi-class
classification. The softmax function calculates a probability distribution over ’n’ different
events; in other words, it calculates the probability of each target class over all possible target
classes. The calculated probabilities then help determine the target class for the given inputs.

Equation: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)

Why do we use activation functions ?


Without an activation function we will fail to introduce non-linearity into the network. An activation
function will allow us to model a response variable (target variable, class label, or score) that varies non-
linearly with its explanatory variables. Non-linear means that the output cannot be reproduced from a
linear combination of the inputs. Another way to think of it: without a non-linear activation function in
the network, an artificial neural network, no matter how many layers it has, will behave just like a single-
layer perceptron, because summing these layers would give you just another linear function.
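
The three activation functions above can be written in a few lines of NumPy; this is a small illustrative sketch, not code taken from the project.

import numpy as np

def sigmoid(x):
    # logistic function: squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: 0 for negative inputs, identity otherwise
    return np.maximum(0, x)

def softmax(x):
    # probability distribution over classes (inputs shifted for numerical stability)
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # approx. [0.12, 0.5, 0.88]
print(relu(np.array([-1.0, 3.0])))           # [0., 3.]
print(softmax(np.array([1.0, 2.0, 3.0])))    # three probabilities summing to 1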

3.4.3 Forward Propagation:

Definition: Forward propagation is the process of feeding the neural network with a set of inputs,
taking their dot product with the weights, feeding the result to an activation function and comparing
its numerical value with the actual output, called “the ground truth”.

Cross entropy error:


Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a
probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges
from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be
bad and result in a high loss value. A perfect model would have a log loss of 0.

Equation: L = −Σ_i y′i · log(yi)

where ‘yi’ is the predicted probability for class ‘i’ and ‘y′i’ is the true probability for that class.

3.4.4 Back-propagation:

Definition: Backpropagation is a method used in artificial neural networks to calculate a gradient that is
needed in the calculation of the weights to be used in the network.

Calculations:
First, let us lay out some important derivatives:

Cross entropy error derivative: ∂L/∂yi = −y′i / yi

Sigmoid derivative: σ′(x) = σ(x) · (1 − σ(x))

Graphical Representation:

Fig 3.4.5 Sigmoid and its derivative

An illustration of how a neural network backpropagates its error:

Fig 3.4.6 Backpropagation calculations
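
To make the chain of derivatives concrete, the following toy sketch performs one gradient-descent step for a single sigmoid neuron trained with cross-entropy loss. The input values, weights and learning rate are assumed for illustration; this is not the project's training code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.2, 0.3])   # assumed input features
w = np.array([0.1, 0.4, -0.2])   # assumed initial weights
b = 0.0
y_true = 1.0                     # ground-truth label
lr = 0.1                         # learning rate

# forward propagation
z = np.dot(w, x) + b
y_pred = sigmoid(z)
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# back-propagation: for sigmoid + cross-entropy, dL/dz simplifies to (y_pred - y_true)
dz = y_pred - y_true
dw = dz * x
db = dz

# gradient-descent update
w -= lr * dw
b -= lr * db
print(loss, w, b)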

3.5 Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are very similar to ordinary Neural Networks from
the previous section: they are made up of neurons that have learnable weights and
biases. Each neuron receives some inputs, performs a dot product and optionally
follows it with a non-linearity. The whole network still expresses a single
differentiable score function: from the raw image pixels on one end to class scores at
the other. And they still have a loss function on the last (fully-connected) layer and all
the tips/tricks we developed for learning regular Neural Networks still apply.

ConvNet architectures make the explicit assumption that the inputs are images,
which allows us to encode certain properties into the architecture. These then make
the forward function more efficient to implement and vastly reduce the amount of
parameters in the network.

Architecture Overview
Regular Neural Nets. As we saw in the previous chapter, Neural Networks receive an
input (a single vector), and transform it through a series of hidden layers. Each
hidden layer is made up of a set of neurons, where each neuron is fully connected to
all neurons in the previous layer, and where neurons in a single layer function
completely independently and do not share any connections. The last fully-
connected layer is called the “output layer” and in classification settings it represents
the class scores.

Regular Neural Nets don’t scale well to full images. In our implementation, images
are only of size 64x64x3 (64 wide, 64 high, 3 colour channels), so a single fully-
connected neuron in a first hidden layer of a regular Neural Network would have
64*64*3 = 12288 weights. This amount still seems manageable, but clearly this fully-
connected structure does not scale to larger images. This full connectivity is wasteful
and the huge number of parameters would quickly lead to overfitting.

Convolutional Neural Networks take advantage of the fact that the input consists of
images and they constrain the architecture in a more sensible way. In particular,
unlike a regular Neural Network, the layers of a ConvNet have neurons arranged in 3
dimensions: width, height, depth. The neurons in a layer will only be connected to a
small region of the layer before it, instead of all of the neurons in a fully-connected
manner. Moreover, the final output layer would have only one output, giving us the
class that the image belongs to. Here is a visualization:

A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to
an output 3D volume with some differentiable function that may or may not have parameters.

As we described above, a simple ConvNet is a sequence of layers, and every layer of a ConvNet
transforms one volume of activations to another through a differentiable function.

Fig. 3.5.1 A normal neural network

Fig. 3.5.2 A convolutional neural network

We use three main types of layers to build ConvNet architectures: the Convolutional Layer, the
Pooling Layer, and the Fully-Connected Layer (exactly as seen in regular Neural Networks). We
will stack these layers to form a full ConvNet architecture.
Example Architecture: Overview. We will go into more detail below, but a simple ConvNet for
our binary leaf classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more
detail:
• INPUT: the images fed to the network are resized to [64x64x3], i.e. an image of
width 64, height 64, and three colour channels R, G, B.
• CONV layer will compute the output of neurons that are connected to local regions in the
input, each computing a dot product between their weights and the small region they are
connected to in the input volume. If 32 filters of size 3x3 are used, the volume size
changes to [64x64x32].
• RELU layer will apply an elementwise activation function, such as
max(0, x) thresholding at zero. This leaves the size of the volume unchanged.
• POOLING layer will perform a downsampling operation along the spatial dimensions
(width, height), resulting in a smaller volume such as [32x32x32].
• FULLY-CONNECTED layer will compute the class scores, resulting in a
volume of size [1x1x2], where each of the 2 numbers corresponds to a class score,
one for each of the two categories, diseased and healthy. As with ordinary Neural Networks and as
the name implies, each neuron in this layer will be connected to all the numbers in the
previous volume.

In this way, ConvNets transform the original image layer by layer from the original pixel values to
the final class scores. Note that some layers contain parameters and others don’t. In particular, the
CONV/FC layers perform transformations that are a function not only of the activations in the input
volume, but also of the parameters (the weights and biases of the neurons). On the other hand,
the RELU/POOL layers implement a fixed function. The parameters in the CONV/FC layers will
be trained with the Adam optimizer so that the class scores that the ConvNet computes are consistent with the
labels in the training set for each image.
In summary:
• A ConvNet architecture is in the simplest case a list of Layers that transform the image
volume into an output volume.
• There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are used).
• Each Layer accepts an input 3D volume and transforms it to an output 3D volume through
a differentiable function.
• Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t).
• Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do,
RELU doesn’t).
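
A minimal Keras sketch of the [INPUT - CONV - RELU - POOL - FC] pipeline described above, for 64x64x3 input images and the two categories diseased/healthy; the filter counts and layer widths are illustrative assumptions, not the report's exact model.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=(64, 64, 3)))   # CONV + RELU: [64x64x3] -> [64x64x32]
model.add(MaxPooling2D(pool_size=(2, 2)))    # POOL: [64x64x32] -> [32x32x32]
model.add(Flatten())                         # flatten the 3D volume into a vector
model.add(Dense(128, activation='relu'))     # fully-connected hidden layer
model.add(Dense(1, activation='sigmoid'))    # FC output: diseased vs healthy

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()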

Fig 3.5.3 Visualizing convolution, pooling and ReLu in successive layers

We now describe the individual layers and the details of their hyperparameters and
their connectivities.

3.5.1 Convolutional Layer
The Conv layer is the core building block of a Convolutional Network that does most
of the computational heavy lifting.
Overview and intuition: Let's first discuss what the CONV layer computes without
brain/neuron analogies. The CONV layer’s parameters consist of a set of learnable
filters. Every filter is small spatially (along width and height), but extends through the
full depth of the input volume. For example, a typical filter on a first layer of a
ConvNet might have size 5x5x3 (i.e. 5 pixels width and height, and 3 because images
have depth 3, the color channels). During the forward pass, we slide (more precisely,
convolve) each filter across the width and height of the input volume and compute
dot products between the entries of the filter and the input at any position. As we
slide the filter over the width and height of the input volume we will produce a 2-
dimensional activation map that gives the responses of that filter at every spatial
position. Intuitively, the network will learn filters that activate when they see some
type of visual feature such as an edge of some orientation or a blotch of some color
on the first layer, or eventually entire honeycomb or wheel-like patterns on higher
layers of the network. Now, we will have an entire set of filters in each CONV layer
(e.g. 12 filters), and each of them will produce a separate 2-dimensional activation
map. We will stack these activation maps along the depth dimension and produce
the output volume.
The brain view: If you’re a fan of the brain/neuron analogies, every entry in the 3D
output volume can also be interpreted as an output of a neuron that looks at only a
small region in the input and shares parameters with all neurons to the left and right
spatially (since these numbers all result from applying the same filter). We now
discuss the details of the neuron connectivities, their arrangement in space, and their
parameter sharing scheme.
Local Connectivity: When dealing with high-dimensional inputs such as images, as
we saw above it is impractical to connect neurons to all neurons in the previous
volume. Instead, we will connect each neuron to only a local region of the input
volume. The spatial extent of this connectivity is a hyperparameter called
the receptive field of the neuron. The extent of the connectivity along the depth axis
is always equal to the depth of the input volume. It is important to emphasize again
this asymmetry in how we treat the spatial dimensions (width and height) and the
depth dimension: The connections are local in space (along width and height), but
always full along the entire depth of the input volume.
Example 1. For example, suppose that the input volume has size [32x32x3], (e.g. an
RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron
in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a
total of 5*5*3 = 75 weights (and +1 bias parameter). Notice that the extent of the
connectivity along the depth axis must be 3, since this is the depth of the input
volume.
Example 2. Suppose an input volume had size [16x16x20]. Then using an example
receptive field size of 3x3, every neuron in the Conv Layer would now have a total of
3*3*20 = 180 connections to the input volume. Notice that, again, the connectivity is
local in space (e.g. 3x3), but full along the input depth (20).

Fig 3.5.4 Left: An example input volume in red (e.g. a 32x32x3 CIFAR-10 image) and
an example volume of neurons in the first Convolutional layer

Each neuron in the convolutional layer is connected only to a local region in the
input volume spatially, but to the full depth (i.e. all color channels). Note, there are
multiple neurons (5 in this example) along the depth, all looking at the same region
in the input - see discussion of depth columns in text below.
Fig 3.5.5 Right: The neurons from the Neural Network chapter remain unchanged:
They still compute a dot product of their weights with the input followed by a non-
linearity, but their connectivity is now restricted to be local spatially.
Spatial arrangement. We have explained the connectivity of each neuron in the Conv
Layer to the input volume, but we haven’t yet discussed how many neurons there
are in the output volume or how they are arranged.

Three hyperparameters control the size of the output volume: the depth,
stride and zero-padding. We discuss these next:
1. First, the depth of the output volume is a hyperparameter: it corresponds to
the number of filters we would like to use, each learning to look for something
different in the input. For example, if the first Convolutional Layer takes as
input the raw image, then different neurons along the depth dimension may
activate in the presence of various oriented edges, or blobs of color. We will
refer to a set of neurons that are all looking at the same region of the input as
a depth column (some people also prefer the term fibre).
2. Second, we must specify the stride with which we slide the filter. When the
stride is 1 then we move the filters one pixel at a time. When the stride is 2 (or
uncommonly 3 or more, though this is rare in practice) then the filters jump 2
pixels at a time as we slide them around. This will produce smaller output
volumes spatially.
3. As we will soon see, sometimes it will be convenient to pad the input volume
with zeros around the border. The size of this zero-padding is a
hyperparameter. The nice feature of zero padding is that it will allow us to
control the spatial size of the output volumes (most commonly as we’ll see
soon we will use it to exactly preserve the spatial size of the input volume so
the input and output width and height are the same).

Use of zero-padding: In the example above on the left, note that the input dimension
was 5 and the output dimension was equal: also 5. This worked out so because
the receptive field was 3 and we used zero padding of 1. If no zero-padding
were used, then the output volume would have had a spatial dimension of only
3, because that is how many neurons would have “fit” across the original input.
In general, setting zero padding to be \(P = (F - 1)/2\) when the stride is \(S = 1\)
ensures that the input volume and output volume will have the same size
spatially. It is very common to use zero-padding in this way and we will discuss
the full reasons when we talk more about ConvNet architectures.

Constraints on strides: Note again that the spatial arrangement hyperparameters


have mutual constraints. For example, when the input has size \(W = 10\), no
zero-padding is used \(P = 0\), and the filter size is \(F = 3\), then it would be
impossible to use stride \(S = 2\), since \((W - F + 2P)/S + 1 = (10 - 3 + 0) / 2 + 1 =
4.5\), i.e. not an integer, indicating that the neurons don’t “fit” neatly and
symmetrically across the input. Therefore, this setting of the hyperparameters is
considered to be invalid, and a ConvNet library could throw an exception or zero
pad the rest to make it fit, or crop the input to make it fit, or something. As we
will see in the ConvNet architectures section, sizing the ConvNets appropriately
so that all the dimensions “work out” can be a real headache, which the use of
zero-padding and some design guidelines will significantly alleviate.
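
The sizing constraint above can be checked with a couple of lines of Python; this is a small illustrative helper, not part of the project code.

def conv_output_size(W, F, S, P):
    # spatial output size of a conv layer: (W - F + 2P)/S + 1
    size = (W - F + 2 * P) / S + 1
    if size != int(size):
        raise ValueError("hyperparameters do not fit: output size %s is not an integer" % size)
    return int(size)

print(conv_output_size(W=227, F=11, S=4, P=0))  # 55, as in the Krizhevsky et al. example below
print(conv_output_size(W=10, F=3, S=1, P=1))    # 10: zero padding P = (F - 1)/2 preserves the size
# conv_output_size(W=10, F=3, S=2, P=0)         # would raise an error: 4.5 neurons do not fit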

Each of the 96 filters shown here is of size [11x11x3], and each one is shared by
the 55*55 neurons in one depth slice.

Fig 3.5.6 Example filters learned by Krizhevsky et al.

Notice that the parameter sharing assumption is relatively reasonable: If detecting a


horizontal edge is important at some location in the image, it should intuitively be
useful at some other location as well due to the translationally-invariant structure of
images. There is therefore no need to relearn to detect a horizontal edge at every
one of the 55*55 distinct locations in the Conv layer output volume.

Real-world example: The Krizhevsky et al. architecture that won the ImageNet
challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional
Layer, it used neurons with receptive field size \(F = 11\), stride \(S = 4\) and no zero
padding \(P = 0\). Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of
\(K = 96\), the Conv layer output volume had size [55x55x96]. Each of the 55*55*96
neurons in this volume was connected to a region of size [11x11x3] in the input
volume. Moreover, all 96 neurons in each depth column are connected to the same
[11x11x3] region of the input, but of course with different weights. As a fun aside, if
you read the actual paper it claims that the input images were 224x224, which is
surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has
confused many people in the history of ConvNets and little is known about what
happened.

3.5.2 Pooling Layer

It is common to periodically insert a Pooling layer in-between successive Conv layers


in a ConvNet architecture. Its function is to progressively reduce the spatial size of
the representation to reduce the amount of parameters and computation in the
network, and hence to also control overfitting. The Pooling Layer operates
independently on every depth slice of the input and resizes it spatially, using the
MAX operation. The most common form is a pooling layer with filters of size 2x2
applied with a stride of 2, which downsamples every depth slice in the input by 2 along both
width and height, discarding 75% of the activations. Every MAX operation would in
this case be taking a max over 4 numbers (a little 2x2 region in some depth slice). The
depth dimension remains unchanged.

It is worth noting that there are only two commonly seen variations of the max
pooling layer found in practice: A pooling layer with \(F = 3, S = 2\) (also called
overlapping pooling), and more commonly \(F = 2, S = 2\). Pooling sizes with larger
receptive fields are too destructive.
General pooling. In addition to max pooling, the pooling units can also perform other
functions, such as average pooling or even L2-norm pooling. Average pooling was
often used historically but has recently fallen out of favor compared to the max
pooling operation, which has been shown to work better in practice.

Fig 3.5.7 Working of Pooling layer

Pooling layer downsamples the volume spatially, independently in each depth slice of
the input volume. Left: In this example, the input volume of size [224x224x64] is
pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice
that the volume depth is preserved. Right: The most common downsampling
operation is max, giving rise to max pooling, here shown with a stride of 2. That is,
each max is taken over 4 numbers (little 2x2 square).
Backpropagation. Recall from the backpropagation chapter that the backward pass
for a max(x, y) operation has a simple interpretation as only routing the gradient to
the input that had the highest value in the forward pass. Hence, during the forward
pass of a pooling layer it is common to keep track of the index of the max activation
(sometimes also called the switches) so that gradient routing is efficient during
backpropagation.
Getting rid of pooling. Many people dislike the pooling operation and think that we
can get away without it. For example, Striving for Simplicity: The All Convolutional
Net proposes to discard the pooling layer in favor of architecture that only consists of
repeated CONV layers. To reduce the size of the representation they suggest using
larger stride in CONV layer once in a while. Discarding pooling layers has also been
found to be important in training good generative models, such as variational
autoencoders (VAEs) or generative adversarial networks (GANs). It seems likely that
future architectures will feature very few to no pooling layers.
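
As a small numeric illustration of 2x2, stride-2 max pooling on a single depth slice (a toy sketch, not project code):

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)   # one 4x4 depth slice

# 2x2 max pooling with stride 2: each output value is the max over one 2x2 block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 8.]
#  [3. 4.]]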

3.5.3 Fully-connected layer


Neurons in a fully connected layer have full connections to all activations in the
previous layer, as seen in regular Neural Networks. Their activations can hence be
computed with a matrix multiplication followed by a bias offset. See the Neural
Network section of the report for more information.
Converting FC layers to CONV layers
It is worth noting that the only difference between FC and CONV layers is that the
neurons in the CONV layer are connected only to a local region in the input, and that
many of the neurons in a CONV volume share parameters. However, the neurons in
both layers still compute dot products, so their functional form is identical.

Chapter 4
Requirements and Implementation
This chapter provides the various hardware and software requirements
of the project and how they’ve been implemented.

Fig 4.1 General implementation

4.1 Hardware Requirements


The convolutional network was trained on Google Colab and pre-trained model was
deployed on a remote device using Raspberry Pi 3B+ and Pi cam.
4.1.1 Raspberry Pi 3B+

Fig 4.1.1 Raspberry Pi 3 B+

The Raspberry Pi 3B+ is a marked improvement over the 3B model. It improves in the
following ways –
• Improved compatibility for network booting
• New support for Power over Ethernet
• Processor speed has increased from 1.2GHz on the Pi 3 to 1.4GHz
• New dual-band wireless LAN chip, 2.4GHz and 5GHz, with embedded antenna
• Bluetooth 4.2 Low Energy
• Faster onboard Ethernet, up to 300 Mbps
Its specifications overall can be summed up as –
• SOC: Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC
• CPU: 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU
• RAM: 1GB LPDDR2 SDRAM
• WIFI: Dual-band 802.11ac wireless LAN (2.4GHz and 5GHz ) and Bluetooth 4.2
• Ethernet: Gigabit Ethernet over USB 2.0 (max 300 Mbps). Power-over-
Ethernet support (with separate PoE HAT). Improved PXE network and USB
mass-storage booting.
• Thermal management: Yes
• Video: Yes – VideoCore IV 3D. Full-size HDMI
• Audio: Yes
• USB 2.0: 4 ports
• GPIO: 40-pin
• Power: 5V/2.5A DC power input
• Operating system support: Linux and Unix
The Raspberry Pi is a credit-card-sized single-board computer that can be used
for many tasks that your desktop computer does, like games, word processing, spreadsheets
and playing HD video. It was established by the Raspberry Pi Foundation in
the UK and has been available to the public since 2012, with the idea of making
a low-cost educational microcomputer.

Memory

The original Raspberry Pi Model A board was designed with 256MB of SDRAM and the
Model B with 512MB. The Raspberry Pi is a small PC compared with other PCs: a
normal PC's RAM is measured in gigabytes, while a Raspberry Pi board offers
256MB, 512MB or, in the case of the 3B+, 1GB of RAM.

CPU (Central Processing Unit)

The central processing unit is the brain of the Raspberry Pi board and is
responsible for carrying out the instructions of the computer through logical and
mathematical operations. Earlier Raspberry Pi models used an ARM11-series processor
of the kind found in early smartphones; the Raspberry Pi 3B+ uses a 64-bit quad-core
ARM Cortex-A53.

GPU (Graphics Processing Unit)

The GPU is a specialized chip on the Raspberry Pi board designed to speed
up image calculations. The board is designed with a Broadcom VideoCore IV,
which supports OpenGL.

Ethernet Port

The Ethernet port of the Raspberry Pi is the main gateway for communicating with
additional devices. The Raspberry Pi Ethernet port is used to connect to your home
router to access the internet.

GPIO Pins

The general-purpose input & output (GPIO) pins are used on the Raspberry Pi to interface
with other electronic boards. These pins can accept input & output commands
based on the programming of the Raspberry Pi. The Raspberry Pi provides digital GPIO
pins, which are used to connect other electronic components; for example, you can connect
a temperature sensor to transmit digital data.

Power Source Connector

The power source connector is a small socket placed on the side of the board. The
main purpose of the power source connector is to enable connection of an external power source.

UART

The Universal Asynchronous Receiver/Transmitter is a serial input & output port.
It can be used to transfer serial data in the form of text and is useful for
debugging code.

Display

The Raspberry Pi board offers two types of display connection: HDMI and
composite. Many LCD and HD TV monitors can be attached using an HDMI male cable,
along with a low-cost adaptor if needed. HDMI versions 1.3 and 1.4 are supported, and
a version 1.4 cable is recommended. The Raspberry Pi outputs audio and video
through HDMI, but does not support HDMI input. Older TVs can be connected using
composite video. When using a composite video connection, audio is available from
the 3.5mm jack socket and can be sent to your TV; to do so, you need a cable which
adapts from 3.5mm to double RCA connectors.

4.1.2 Pi Camera

Fig 4.1.2 Pi Camera


This Raspberry Pi Camera Module is a custom designed add-on for Raspberry Pi. It
attaches to Raspberry Pi by way of one of the two small sockets on the board upper
surface. This interface uses the dedicated CSI interface, which was designed
especially for interfacing to cameras.
The board itself is tiny, at around 25mm x 23mm x 8mm. It also weighs just over 3g,
making it perfect for mobile or other applications where size and weight are
important. It connects to Raspberry Pi by way of a short flexible ribbon cable. The
camera connects to the BCM2835 processor on the Pi via the CSI bus, a higher
bandwidth link which carries pixel data from the camera back to the processor.
Features :
• Supported Video Formats: 1080p @ 30fps, 720p @ 60fps and 640x480p 60/90
video
• Fully Compatible with Raspberry Pi 3 Model B.

• Small and lightweight camera module.
• Plug-n-Play camera for Raspberry Pi 3 Model B.
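
To connect the camera to the prediction workflow described in the abstract (periodic predictions driven by a timer on the Pi), the following is a minimal sketch using the standard picamera library and a pre-trained Keras model. The file paths, model file name and prediction interval are assumptions made for illustration.

import time
import numpy as np
from picamera import PiCamera
from keras.models import load_model
from keras.preprocessing import image

camera = PiCamera()
model = load_model('leaf_model.h5')      # assumed file name of the pre-trained model

while True:
    camera.capture('/home/pi/leaf.jpg')  # capture a still image from the Pi Camera
    img = image.load_img('/home/pi/leaf.jpg', target_size=(64, 64))
    x = image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)        # shape (1, 64, 64, 3)
    prob = model.predict(x)[0][0]        # probability of the positive class
    print('diseased' if prob > 0.5 else 'healthy', prob)
    time.sleep(3600)                     # assumed interval: one prediction per hour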

4.2 Software Requirements


The convolutional neural network was trained using the Keras library, written in
the Python programming language, with Google's TensorFlow API as its backend, on
Google Colab.
4.2.1 Google Colab
Google Colab is a free cloud service and now it supports free GPU. You can:
• Improve your Python programming language coding skills.
• Develop deep learning applications using popular libraries such as Keras,
TensorFlow, PyTorch, and OpenCV.
The most important feature that distinguishes Colab from other free cloud services
is: Colab provides GPU and is totally free.
The main existing deep learning frameworks like TensorFlow, Keras and PyTorch are
maturing and offer a lot of functionality to streamline the deep learning process.
There are also other great tool sets emerging for the deep learning practitioner. One
of these is the Google Colaboratory environment. This environment, based on
Python Jupyter notebooks, gives the user free access to Tesla K80 GPUs. If your local
machine lacks a GPU, there is now no need to hire out GPU time on Amazon AWS, at
least for prototyping smaller learning tasks. This opens up the ability of anybody to
experiment with deep learning beyond simple datasets like MNIST. Google has also
just recently opened up the free use of TPUs (Tensor Processing Units) within the
environment.
This service can be effectively exploited to accelerate not only deep learning but also
other classes of GPU-centric applications. For instance, it is faster to train a CNN on
Colaboratory's accelerated runtime than using 20 physical cores of a Linux server.
The performance of the GPU made available by Colaboratory may be enough for
several profiles of researchers and students. However, these free-of-charge
hardware resources are far from enough to solve demanding real-world problems
and are not scalable.
The specifications of the GPU computing power offered by Google Colab are –
• GPU: 1x Tesla K80, with 2496 CUDA cores, compute capability 3.7,
12GB (11.439GB usable) GDDR5 VRAM

• CPU: 1x single-core hyper-threaded (1 core, 2 threads) Xeon processor
@ 2.3GHz (no Turbo Boost), 45MB cache

• RAM: ~12.6 GB available

• Disk: ~320 GB available

• Every 12 hours or so, the disk, RAM, VRAM, CPU cache and other data on our
allotted virtual machine get erased
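
A quick way to confirm that such a GPU-backed runtime is actually attached in Colab (a small sketch, not part of the report's code):

import tensorflow as tf

device_name = tf.test.gpu_device_name()   # returns '' if no GPU is attached
if device_name:
    print('GPU available at:', device_name)
else:
    print('No GPU found - enable it via Runtime > Change runtime type')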

4.2.2 TensorFlow

Machine learning is a complex discipline. But implementing machine learning


models is far less daunting and difficult than it used to be, thanks to machine
learning frameworks—such as Google’s TensorFlow—that ease the process of
acquiring data, training models, serving predictions, and refining future results.

Created by the Google Brain team, TensorFlow is an open source library for
numerical computation and large-scale machine learning. TensorFlow bundles
together a slew of machine learning and deep learning (aka neural networking)
models and algorithms and makes them useful by way of a common metaphor.
It uses Python to provide a convenient front-end API for building applications
with the framework, while executing those applications in high-performance
C++.

TensorFlow can train and run deep neural networks for handwritten digit
classification, image recognition, word embeddings, recurrent neural networks,
sequence-to-sequence models for machine translation, natural language
processing, and PDE (partial differential equation) based simulations. Best of all,
TensorFlow supports production prediction at scale, with the same models used
for training.

Fig 4.2.1 TensorFlow Architecture

• Client:
• Defines the computation as a dataflow graph.
• Initiates graph execution using a session.

• Distributed Master
• Prunes a specific subgraph from the graph, as defined by the
arguments to Session.run().
• Partitions the subgraph into multiple pieces that run in different
processes and devices.
• Distributes the graph pieces to worker services.
• Initiates graph piece execution by worker services.

• Worker Services (one for each task)


• Schedule the execution of graph operations using kernel
implementations appropriate to the available hardware (CPUs,
GPUs, etc).
• Send and receive operation results to and from other worker
services.

• Kernel Implementations
• Perform the computation for individual graph operations.

4.2.3 Keras
Keras is a high-level library that’s built on top of Theano or TensorFlow. It
provides a scikit-learn type API (written in Python) for building Neural Networks.
Developers can use Keras to quickly build neural networks without worrying
about the mathematical aspects of tensor algebra, numerical techniques, and
optimisation methods.
The key idea behind the development of Keras is to facilitate experimentations
by fast prototyping. The ability to go from an idea to result with the least
possible delay is a key to good research.
This offers a huge advantage for scientists and beginner developers alike
because they can dive right into Deep Learning without getting their hands dirty
with low-level computations. The rise in the demand for Deep Learning has
resulted in the rise in demand of people skilled in Deep Learning.
Every organisation is trying to incorporate Deep Learning in one way or another,
and Keras offers an easy-to-use and intuitive API which helps you build and test
Deep Learning applications with the least possible effort.

Features of Keras:
• Keras is a high-level interface and uses Theano or Tensorflow for its
backend.
• It runs smoothly on both CPU and GPU.
• Keras supports almost all the models of a neural network – fully
connected, convolutional, pooling, recurrent, embedding, etc.
Furthermore, these models can be combined to build more complex
models.
• Keras, being modular in nature, is incredibly expressive, flexible, and apt
for innovative research.
• Keras is a completely Python-based framework, which makes it easy to
debug and explore.
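As a hedged illustration of how little code this API requires, the following toy sketch trains a tiny network on hypothetical random data; it is not part of the project code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy problem: classify 2-D points by whether x + y > 1
X = np.random.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

model = Sequential()
model.add(Dense(8, activation='relu', input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.predict(np.array([[0.9, 0.8]])))

The same Sequential / compile / fit pattern is used for the project's convolutional models described in Section 4.3.2.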

4.2.4 Thonny: Python IDE for Raspberry Pi

Thonny is a new IDE (integrated development environment) bundled with
the latest version of the Raspbian with PIXEL operating system. Using Thonny,
it’s now much easier to learn to code. Thonny comes with Python 3.6 built in, so
you don’t need to install anything.

When you start Thonny, you’ll see a new script editor and a shell. As with Python
2/3 IDLE, you enter a program in the script editor and run it in the shell. You can
then use the shell to interact directly with the program; accessing variables,
objects, and other program features.

Thonny has a range of additional features that are perfect for learning
programming. One of the best features is a powerful, but easy-to-use, debug
mode. Instead of running your program, it steps through the code line by line.
You can see the variables and objects being created, and values being passed
into functions or assessed by comparators.

4.2.5 RealVNC: Remote connection for Raspberry Pi


For a desktop-to-desktop connection RealVNC runs on Windows, on Mac OS X,
and on many Unix-like operating systems. A list of supported platforms can be
found on the website. A RealVNC client also runs on the Java platform and on
the Apple iPhone, iPod touch and iPad and Google Android devices. A Windows-
only client, VNC Viewer Plus is available, designed to interface to the embedded
server on Intel AMT chipsets found on Intel vPro motherboards.

The VNC software is already being used by many students and makers around
the world since it provides a simple and efficient way of controlling their
Raspberry Pi from an existing computer, tablet or mobile phone. There is a
very wide range of Raspberry Pi educational projects that will benefit from the
preinstalled VNC software, allowing students to get up and running within
minutes.

4.3 Project Design

4.3.1 Hardware implementation

The code was first implemented on local machines. The model was trained on an
online server runtime hosted by Google Colab to take advantage of the GPU
acceleration provided by it. See Software Implementation for more details.

The operating System installed on Raspberry Pi 3B+ was Raspbian.

Raspbian is an operating system based on Debian, optimized for the Raspberry Pi
hardware. An operating system is the set of basic programs and utilities that
make your Raspberry Pi run. However, Raspbian provides more than a pure OS:
it comes with over 35,000 packages, pre-compiled software bundled in a nice
format for easy installation on your Raspberry Pi.

The initial build of over 35,000 Raspbian packages, optimized for best
performance on the Raspberry Pi, was completed in June of 2012. However,
Raspbian is still under active development with an emphasis on improving the
stability and performance of as many Debian packages as possible.

Steps to download and install Raspbian:

Step 1: Download the Required Software and Files

Step 2: Get the SD Card and the Card Reader

Step 3: Check the Drive in Which the SD Card Is Mounted

Go to My Computer (or This PC) and find the drive letter where the SD card
is mounted.

Step 4: Format the SD Card

Open SD Card Formatter and select the drive you noticed in the previous step.

Click on format and don't alter any other options.

When formatting is completed, click on OK.

Step 5: Write the OS on the SD Card

Step 6: Eject the SD Card

Now your OS is installed on your Raspberry Pi.

Following is the Raspberry Pi 3 B+ setup used in conjunction with PiCamera –

Fig 3.4 Practical implementation of Pi Cam on Raspberry Pi 3B+

Steps to install Raspberry Pi cam module:

1. Open up your Raspberry Pi Camera module. Be aware that the camera can be
damaged by static electricity. Before removing the camera from its grey anti-static
bag, make sure you have discharged yourself by touching an earthed object (e.g. a
radiator or PC Chassis).

2. Install the Raspberry Pi Camera module by inserting the cable into the Raspberry
Pi. The cable slots into the connector situated between the Ethernet and HDMI ports,
with the silver connectors facing the HDMI port.

3. Boot up your Raspberry Pi.

4. From the prompt, run "sudo raspi-config". If the "camera" option is not listed, you
will need to run a few commands to update your Raspberry Pi. Run "sudo apt-get
update" and "sudo apt-get upgrade".

5. Run "sudoraspi-config" again - you should now see the "camera" option.

6. Navigate to the "camera" option, and enable it. Select “Finish” and reboot your
Raspberry Pi.
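Once the camera is enabled, a short test capture can be performed from Python using the picamera library. The sketch below is illustrative only; the output path and resolution are assumptions:

from time import sleep
from picamera import PiCamera

# Hypothetical test capture to confirm the camera module is working
camera = PiCamera()
camera.resolution = (1024, 768)
camera.start_preview()
sleep(2)                                    # let the sensor adjust exposure
camera.capture('/home/pi/leaf_test.jpg')    # save a still image
camera.stop_preview()
camera.close()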

The following are the steps to connect Raspberry Pi 3B+ remotely to a laptop or
PC –

We can install the VNC server software using the SSH connection that we
established earlier.

Enter the following command into your SSH terminal:

1. sudo apt-get update

2. sudo apt-get install tightvncserver

You will be prompted to confirm the installation by typing "Y", and when the
installation is complete, you should see the following:

We now need to run the VNC Server, so enter the following command into your
SSH window:

1. vncserver :1

You will be prompted to enter and confirm a password. It would make sense to
use “raspberry” for this, but passwords are limited to 8 characters, so we use
“raspberr”. Note that this is the password that you will need to use to connect
to the Raspberry Pi remotely.

You will also be asked if you want to create a separate “read-only” password –
say no.

From now on, the only command that you need to type within your SSH to start
the VNC server will be:

1. vncserver :1

The VNC server is now running, so we can attempt to connect to it. But first
we must switch to the computer from which we want to control the Pi and set up
a VNC client to connect to the Pi.

When you first run VNCViewer, you will see the following:

Enter the IP address of your Raspberry Pi, append :1 (to indicate the display
number) and click on “Connect”. You will then get a warning message. Just click 'Continue'.

The following window will then popup for you to enter your password
(“raspberr”).

Finally, the VNC window itself should appear. You will be able to use the mouse
and do everything as if you were using the Pi's keyboard, mouse and monitor,
except through your other computer.

As with SSH, since this is working over your network, your Pi could be situated
anywhere, as long as it is connected to your network.

4.3.2 Software Implementation

The general flowchart followed by the convolutional neural net development
process is:

Fig 4.3.1 Flowchart for software implementation

The software implementation was done in three major parts:

1) Implementing and training a general case model for detecting a diseased or
healthy plant leaf using convolutional neural networks with the keras API for
Python, using TensorFlow as backend, on Google Colab.

2) Implementing and training a specific case model for detecting and classifying type
of disease (one of possible 8) in a tomato leaf using convolutional neural networks
with the keras API for Python using TensorFlow as backend on Google Colab.
3) Deploying both trained models on a portable hardware device capable of basic
processing, such as the Raspberry Pi 3, so that the system can be used remotely
for automated, wide-scale classification.

4.3.2.1 General Model


The general model implemented in Python using keras API on Google Colab is as
follows –

Fig 4.4.1 Skeletal structure of general model implemented in keras

The general model was trained on a dataset of 15000 images with an 80-20 train
test split.

Hyperparameters and number of epochs were tuned to give best possible result
in minimum training time.
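Since Fig 4.4.1 cannot be reproduced in text here, the following is a hedged reconstruction of the general model, consistent with the layer-by-layer summary given in Section 5.1.1; the 64x64 RGB input size and the 32 filters per convolution are assumptions inferred from the reported parameter counts, not a verbatim copy of the project code:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))  # 64x64 -> 62x62, 896 params
model.add(MaxPooling2D(pool_size=(2, 2)))                                  # 62x62 -> 31x31
model.add(Conv2D(32, (3, 3), activation='relu'))                           # 31x31 -> 29x29
model.add(MaxPooling2D(pool_size=(2, 2)))                                  # 29x29 -> 14x14
model.add(Flatten())                                                       # 14 * 14 * 32 = 6272
model.add(Dense(128, activation='relu'))                                   # 802,944 params
model.add(Dense(1, activation='sigmoid'))                                  # diseased vs healthy
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])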

The model weights were saved and selected for the epoch with minimum
validation loss (val_loss) to avoid overfitting due to too many iterations.
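One common way to achieve this in keras is the ModelCheckpoint callback. The sketch below assumes the model defined above and NumPy arrays (train_images, train_labels, val_images, val_labels) holding the dataset; the file name and epoch count are hypothetical:

from keras.callbacks import ModelCheckpoint

# Keep only the weights from the epoch with the lowest validation loss
checkpoint = ModelCheckpoint('general_model_best.h5', monitor='val_loss',
                             save_best_only=True, mode='min', verbose=1)
model.fit(train_images, train_labels,
          validation_data=(val_images, val_labels),
          epochs=5, batch_size=32, callbacks=[checkpoint])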

Fig 4.4.2 Loss and accuracy per epoch for general model

4.3.2.2 Specific Model

The specific model implemented in Python using keras API on Google Colab is as
follows –

Fig 4.4.3 Skeletal structure of specific model implemented in keras

The specific model was trained on a dataset of 5000 images with an 80-20 train
test split for every class.

Hyperparameters and number of epochs were tuned to give best possible result
in minimum training time.

Fig 4.4.4 Loss and accuracy per epoch for specific model

The model weights were saved and selected for the epoch with minimum
validation loss (val_loss) to avoid overfitting due to too many iterations.

4.3.2.3 Remote Deployment


Both the pre-trained models on Google Colab were saved and deployed remotely on
Raspberry Pi 3 B+ module.

Fig 4.4.5 Loading both pre-trained models on Raspberry Pi 3 B+


Input is taken either live from the Pi Camera module attached to the Raspberry Pi 3 B+,
or the user can place his/her own image in the provided directory, to assess whether
the leaf is diseased or not.
Further, if the leaf is diseased and the image is of a tomato leaf, the user is given the
option to have the model classify the disease. A sketch of this two-stage inference
flow is given below.
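The following hedged sketch illustrates the two-stage inference flow; the model file names, the 64x64 input size, the 0.5 threshold and the sample image path are assumptions, not the project's exact values:

import numpy as np
from keras.models import load_model
from keras.preprocessing import image

# Load both pre-trained models saved from Google Colab
general_model = load_model('general_model_best.h5')
specific_model = load_model('specific_model_best.h5')

# Load and normalise an input image (from PiCamera or the provided directory)
img = image.load_img('input_leaf.jpg', target_size=(64, 64))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)

if general_model.predict(x)[0][0] > 0.5:
    print('Leaf appears diseased')
    # If the user confirms the leaf is a tomato leaf, classify the disease
    probabilities = specific_model.predict(x)[0]
    print('Most probable disease (class index):', np.argmax(probabilities))
else:
    print('Leaf appears healthy')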

Fig 4.4.6 Code snippet of exploring the user options
If the leaf is diseased and is a tomato leaf, the model reports the most probable
disease based on the class probabilities output by the specific model.

Fig 4.4.7 Different possible tomato leaf diseases that can be classified

Chapter 5
Results
This chapter presents the analysis and results obtained during the
project.

5.1 General model


5.1.1 Summary

Fig. 5.1.1: The General model summary

Layer 1: After the first layer, two rows and two columns of pixels are lost from
the border of the image, because the convolution is applied without zero
padding. As these border regions carry no significant information, this has no
adverse effect on the accuracy of the network. The number of parameters in
this layer is 896.

Layer 2: Due to the max-pooling mask of size 2x2, the image size is reduced to
half, i.e. 31x31.

Layers 3 and 4: The process followed in layers 1 and 2 is repeated, and a feature
map of size 14x14 is obtained as the input to the fully connected part of the network.

Layer 5: The 2D feature maps of size 14x14x32 are flattened into a 1D vector of
dimension 6272.

Layer 6: This is the first hidden layer of the Artificial Neural Network (ANN), and
it has 802,944 parameters for the mapping from the input to the hidden layer.

Layer 7: This is the output layer, with just one output and 128 parameters for the
mapping from the hidden layer to the output layer.
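As a check on these figures (assuming 32 filters of size 3x3 applied to a 3-channel input), the first convolutional layer has 32 x (3 x 3 x 3 + 1) = 896 parameters, and the hidden layer that maps the 6272-element flattened vector to 128 units has 6272 x 128 + 128 = 802,944 parameters, matching the counts quoted above.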

5.1.2 Accuracy

Fig. 5.1.2: Accuracy of the General model.

The weights generated after the fifth and final epoch are saved for further
usage and deployment on the Raspberry Pi. The accuracy after this epoch is
94.2%, with a validation accuracy of 93.46%. This epoch was selected because
adding more epochs does not improve accuracy, but instead makes the model
less general, i.e. it begins to overfit the training data.

Fig. 5.1.3: Accuracy vs epoch of general model
Fig. 5.1.4: Loss vs epoch of general model

5.2 Specific model

5.2.1 Summary

Fig. 5.2.1: The Specific model summary

The general model and the specific model have the same structure up to layer 6,
but after that, dropout layers are used in the specific model to avoid
overfitting. The risk of overfitting is higher in this model because the dataset
available for training is far smaller than that of the general model.

Layer 7 and 9: These are the dropout layers, used to avoid overfitting, as
mentioned before.

Layer 8: It is another hidden layer, used in the specific model between the two
dropout layers.

Layer 10: It is the output layer, with only one node.
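Expressed in keras, the head of the specific model described above could look like the following hedged sketch; the number of units in layer 8 and the dropout rates are assumptions, and model denotes the convolutional base shared with the general model:

from keras.layers import Dense, Dropout

model.add(Dense(128, activation='relu'))   # layer 6, as in the general model
model.add(Dropout(0.5))                    # layer 7: dropout
model.add(Dense(64, activation='relu'))    # layer 8: hidden layer
model.add(Dropout(0.5))                    # layer 9: dropout
model.add(Dense(1, activation='sigmoid'))  # layer 10: single output node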

5.2.2 Accuracy

Fig. 5.2.2: Accuracy of the Specific model.

The weights generated after the 17th epoch are saved for further usage and
deployment on the Raspberry Pi. The accuracy after this epoch is 89.45%, with a
validation accuracy of 90.26%. This epoch was selected because adding more
epochs does not improve accuracy, but instead makes the model less general,
i.e. it begins to overfit the training data.

Fig. 5.2.3: Accuracy vs epoch of specific model
Fig. 5.2.4: Loss vs epoch of specific model

5.3 Leaf Classification testing


The output of the complete model is the probability of the plant leaf being
diseased or healthy. This is demonstrated in the following tests.

Case 1: Healthy plant leaf

Fig. 5.3.1: Output of the model on the I/O device

Fig. 5.3.2: Image captured by Picam

Case 2: Diseased plant leaf

Fig. 5.3.3: Image captured by Picam

Fig. 5.3.4: Output of the model on the I/O device

Case 3: Tomato leaf


a) Healthy leaf

Fig. 5.3.5: Image captured by Picam

Fig. 5.3.6: Output of the model on the I/O device

b) Alternaria Solani disease

Fig. 5.3.7: Image uploaded in directory

Fig. 5.3.8: Output of the model on the I/O device

c) Yellow curl Disease

Fig. 5.3.9: Image uploaded in directory

Fig. 5.3.10: Output of the model on the I/O device

Type of leaf          | Chances of it   | Chances of it     | If tomato leaf,        | If tomato leaf,   | Final
(Tomato or Non-Tomato)| being diseased  | being non-        | chances of it being    | type of disease   | classification
                      | (%)             | diseased (%)      | diseased (%)           |                   |
----------------------+-----------------+-------------------+------------------------+-------------------+---------------
Non-Tomato            | -               | 87.033            | -                      | -                 | Non-Diseased
Non-Tomato            | 92.613          | -                 | -                      | -                 | Diseased
Tomato                | -               | 97.101            | Unlikely               | -                 | Non-Diseased
Tomato                | 99.999          | -                 | 99.563                 | Alternaria Solani | Diseased
Tomato                | 98.795          | -                 | 99.988                 | Yellow Curl       | Diseased

Table 5.1: Prediction and classification of different types of leaves.

Chapter 6
Conclusion and Future Scope
This chapter presents the conclusions drawn so far in the project and the
further work expected from the team.

6.1 CONCLUSION
This model correctly classifies the state of health of a plant by using images of its
leaves and predicting their health. Further, it also classifies which disease is present
and with what probability. The progress made thus far would enable automatic
detection of plant health. The development of an automatic detection system using
advanced computing techniques such as convolutional neural networks helps farmers
identify diseases at an early stage and provides useful information for their control.
In order to employ the proposed system, we first have to train it with a set of
images of the disorders of interest. Applying this model to any other crop disorder
requires only special care to be taken to acquire a sufficient set of training images
representative of those disorders. With the integration of the proposed system,
diagnosis accuracy will increase. The proposed system focuses on identifying specific
diseases of the tomato plant, but it can be extended to include more diseases. The
system can also be extended so that it is capable of detecting and identifying
abnormalities on other parts of the plant as well, e.g. the fruit, stem and root.
Our model is capable of identifying the disease based on the features we have
recognised, such as yellow spots, wilting and distortion. These symptoms can also
help us classify the cause of the disease, which takes the farmer one step closer to
finding a cure and hence fixing it. It also indicates the severity of the disease and
whether the issue affects all the crops in the field or only one particular plant. For
example, figure 6.1 shows the classification of diseases into bacterial, fungal or viral.

6.2 FUTURE SCOPE


We can include a few features like temperature recording, moisture sensing
and checking the soil fertility that will help the farmer figure out the optimum
conditions for the plant to grow. Once the farmer is able to see these parameters for
both the diseased and the non-diseased plants, he can understand which conditions
are best for the plant to thrive and which conditions are deleterious to the plant’s
health.

Fig 6.1: Classifying the features of diseases based on the cause of the disease

Chapter 7
Bibliography

• Soybean Plant Disease Identification Using Convolutional Neural Network by
Serawork Wallelign, Jimma Institute of Technology, Ethiopia; Mihai Polceanu,
LAB-STICC, ENIB, France; Cedric Buche, LAB-STICC, ENIB, France

• Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image
Classification by Srdjan Sladojevic, Marko Arsenovic, Andras Anderla,
Dubravko Culibrk and Darko Stefanovic

• Deep learning models for plant disease detection and diagnosis by
Konstantinos P. Ferentinos

• Basic Study of Automated Diagnosis of Viral Plant Diseases Using
Convolutional Neural Networks by Yusuke Kawasaki, Hiroyuki Uga,
Satoshi Kagiwada and Hitoshi Iyatomi

• Plant Leaf Disease Detection using Deep Learning and Convolutional Neural
Network by Anandhakrishnan MG, Joel Hanson, Annette Joy and Jerin Francis

• A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases
and Pests Recognition by Alvaro Fuentes, Sook Yoon, Sang Cheol Kim and
Dong Sun Park

• The use of plant models in deep learning: an application to leaf counting in
rosette plants by Jordan Ubbens, Mikolaj Cieslak,
Przemyslaw Prusinkiewicz and Ian Stavness

• Basic Investigation on a Robust and Practical Plant Diagnostic System by Erika
Fujita, Yusuke Kawasaki, Hiroyuki Uga

• https://anaconda.org/

• http://nodemcu.com/index_en.html

• https://www.raspberrypi.org/documentation/setup/
• http://neuralnetworksanddeeplearning.com

• https://deeplearning4j.org/

• https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.ht
ml#Contents

• https://www.crowdai.org/

• https://www.cyberciti.biz/hardware/raspberry-pi-3-model-b-released-
specs-pricing/

• https://towardsdatascience.com/getting-started-with-google-colab-
f2fff97f594c

• https://www.tensorflow.org/guide/extend/architecture

• https://www.raspberrypi.org/documentation/hardware/camera/

• https://keras.io/

• http://picamera.readthedocs.io/en/release-1.9/index.html

• http://scikit-learn.org/

• https://docs.python.org/2/howto/sockets.html

• https://www.raspberrypi.org/magpi/thonny/

