Image Recognition Using CIFAR 10
Mr. A. VENUGOPAL
(Assistant Professor)
CERTIFICATE
This is to certify that the project work entitled “IMAGE RECOGNITION USING CIFAR 10”
is submitted by Mr. Diddiga Harivamshi (16RA1A0512), Mr. Voladri Raju (15RA1A0537), and
Mr. Uppal Laxman (16RA1A0516), bonafide students of Kommuri Pratap Reddy Institute of
Technology, in partial fulfillment of the requirements for the award of Bachelor of Technology
in Computer Science and Engineering of the Jawaharlal Nehru Technological University,
Hyderabad, during the year 2016-20.
External Examiner
DECLARATION
We hereby declare that this project work entitled “IMAGE
RECOGNITION USING CIFAR-10”, submitted in partial fulfilment of the requirements for the
award of the degree of Bachelor of Technology in Computer Science and Engineering, is a
bonafide work carried out by us during the academic year 2019-20.
We further declare that this project is a result of our own effort and has not
been submitted by us to any institute for the award of any degree.
By
DIDDIGA HARIVAMSHI (16RA1A0512)
VOLADRI RAJU (15RA1A0537)
UPPAL LAXMAN (16RA1A0516)
ACKNOWLEDGEMENT
We are deeply grateful to the Almighty and to our parents, who have made us
capable of carrying out this work.
We express our deep sense of gratitude and thanks to our internal guide, Mr. A.
Venugopal, Assistant Professor, for his support throughout the project.
We are also very thankful to our management, staff members and all our
friends for their valuable suggestions and timely guidance, without which we could
not have completed it.
By
Diddiga Harivamshi(16RA1A0512)
Voladri Raju(15RA1A0537)
Uppal Laxman(16RA1A0516)
KOMMURI PRATAP REDDY INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
PEO’s Statement
PEO1: The graduates of Computer Science and Engineering will have a successful career in technology.
PEO2: The graduates of the program will have a solid technical and professional foundation to continue higher studies.
PEO3: The graduates of the program will have skills to develop products, offer services and innovate.
PEO4: The graduates of the program will have fundamental awareness of industry processes, tools and technologies.
KOMMURI PRATAP REDDY INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Program Outcomes
Table of Contents
Screenshots
Conclusion
1. Introduction
Image Processing and Machine Learning are two of the trending technologies in the world. Did you
know that we are the most documented generation in the history of humanity? Every minute,
a whopping 1.78 million GB of data gets produced online. That’s a lot of data, and a big
chunk of that data is images and videos. This is where automated image processing and
machine learning come in.
Innovative companies are looking for people with skills in machine learning in general and
image processing in particular. For example, you would be able to pass an input image to a program,
and the program should be able to count the number of people appearing in that image.
Additionally, it would also create a bounding box around each detected
person.
An image recognition algorithm (a.k.a an image classifier) takes an image (or a patch of
an image) as input and outputs what the image contains. In other words, the output is a
class label (e.g. “cat”, “dog”, “table” etc.). How does an image recognition algorithm
know the contents of an image? Well, you have to train the algorithm to learn the
differences between different classes. If you want to find cats in images, you need to train
an image recognition algorithm with thousands of images of cats and thousands of images
of backgrounds that do not contain cats. Needless to say, this algorithm can only
understand objects / classes it has learned.
Technologies
All we need is a working knowledge of Python, the PyCharm IDE, and the CIFAR-10
dataset.
Python: Although there are multiple tutorials available online, personally, I found
dataquest.io to be a wonderful python learning platform, for beginners and experienced
alike.
PyCharm: PyCharm is an integrated development environment used in computer
programming, specifically for the Python language. It is developed by the Czech company
JetBrains.
CIFAR-10 Dataset: The CIFAR-10 dataset is a collection of images that are commonly
used to train machine learning and computer vision algorithms. It is one of the most widely
used datasets for machine learning research. The CIFAR-10 dataset contains 60,000
32x32 color images in 10 different classes.
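As a quick illustration, the dataset can be loaded directly through Keras; a minimal sketch, assuming the keras package described later in this report is installed:

from keras.datasets import cifar10

# Download (on first use) and load the 50,000 training and 10,000 test images
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print(X_train.shape)   # (50000, 32, 32, 3): 32x32 color images with 3 channels
print(y_train.shape)   # (50000, 1): integer class labels from 0 to 9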
Image Processing
Image processing is a method to perform some operations on an image, in order to get an
enhanced image or to extract some useful information from it. It is a type of signal
processing in which input is an image and output may be image or characteristics/features
associated with that image. Nowadays, image processing is among the most rapidly growing
technologies. It also forms a core research area within engineering and computer science
disciplines.
Image processing basically includes the following three steps:
Importing the image via image acquisition tools;
Analyzing and manipulating the image;
Producing the output, in which the result can be an altered image or a report based on the image
analysis (illustrated in the sketch below).
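A small sketch of these three steps using the Pillow (PIL) library, which our tester script also relies on; the file names input.jpg and output.jpg are placeholders:

from PIL import Image, ImageFilter

img = Image.open('input.jpg')                 # step 1: import the image
print(img.size, img.mode)                     # step 2: analyze it (dimensions, color mode)
sharpened = img.filter(ImageFilter.SHARPEN)   # step 2: manipulate it
sharpened.save('output.jpg')                  # step 3: output the altered image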
There are two types of methods used for image processing namely, analogue and digital
image processing. Analogue image processing can be used for the hard copies like
printouts and photographs. Image analysts use various fundamentals of interpretation
while using these visual techniques. Digital image processing techniques help in
manipulation of digital images by using computers. The three general phases that all
types of data have to undergo while using the digital technique are pre-processing,
enhancement and display, and information extraction.
2. Literature Survey
The survey of literature on image detection can also focus on the field of
doctoring manipulations. Doctoring typically involves multiple steps, usually
a sequence of elementary image-processing operations such as scaling, rotation,
contrast shift, smoothing, etc. The methodology used is based on three categories of
statistical features: binary similarity, image quality and wavelet statistics. The
three categories of forensic features are as follows:
1. Image Quality Measures: These focus on the difference between a doctored image
and its original version. Since the original is not available, it is emulated via a blurred
version of the test image.
2. Higher Order Wavelet Statistics: These are extracted from the multiscale decomposition
of the image.
3. Binary Similarity Measure: These measures capture the correlation and texture
properties between and within the low significance bit planes, which are more likely to be
affected by manipulations.
To deal with the detection of doctoring effects, firstly, individual tools to detect the basic
image-processing operations are developed. Then these individual “weak” detectors are
assembled to determine the presence of doctoring in an expert fusion scheme.
M. C. Stamm and K. J. R. Liu, [5] proposed different methods not only for the detection
of global and local contrast enhancement but also for identifying the use of histogram
equalization and for the detection of the global addition of noise to a previously JPEG-
compressed image. The methodologies used are as follows:
i). Detecting globally applied contrast enhancement in an image
Contrast enhancement operations are viewed as nonlinear pixel mappings which introduce
artifacts into an image histogram. Nonlinear mappings are separated into regions where
the mapping is locally contractive. A contractive mapping maps multiple unique input pixel
values to the same output pixel value, resulting in the addition of sudden peaks to the image
histogram.
ii). Detecting locally applied contrast enhancement in an image
Contrast enhancement operations may be locally applied to disguise visual clues of image
tampering. Localized detection of these operations can be used as evidence of cut-and-paste
type forgery. The forensic technique is extended into a method to detect this type of
cut-and-paste forgery.
iii). Detecting histogram equalization in an image
Just like any other contrast enhancement operation, the histogram equalization operation
introduces sudden peaks and gaps into an image histogram. The techniques are extended
into a method for detecting histogram equalization in an image.
iv). Detecting noise in an image
Additive noise may be globally applied to an image not only to cover visual evidence of
forgery, but also in an attempt to destroy forensically significant indicators of other
tampering operations. Though the detection of these types of operations may not
necessarily pertain to malicious tampering, it certainly throws the authenticity
of the image and its content into doubt.
Two further methods are considered: the first detects contrast enhancement based
manipulation in JPEG-compressed images, and the second detects composite images. The
methodologies are:
i. The Global Contrast Enhancement Detection Algorithm proposed in this paper detects
contrast enhancement not only in uncompressed or high-quality JPEG-compressed images
but also in middle/low-quality ones. The main identifying feature of the gray-level histogram
used is the zero-height gap bin. Fig. 2 shows the definition of the zero-height gap bin.
ii. Identifying Source-Enhanced Composite Images: a novel algorithm is proposed to identify
a source-enhanced composite image created by enforcing contrast adjustment on either
one or both source regions. Fig. 3 shows a both-source-enhanced composite forged
image. The two source images used for creating cut-and-paste forged images may
have different color temperature or luminance contrast, so in order to make the forged
image look more real, contrast enhancement is performed on either one or both regions. In
this paper, a new method was proposed to identify not only single-source-enhanced but
also both-source-enhanced cut-and-paste forged images.
This architecture offered an alternative through a graphical user interface that combines
MATLAB, Simulink and XSG, and explored important aspects concerning hardware
implementation. The performance of this architecture, implemented on a SPARTAN-3E Starter
Kit (XC3S500E-FG320), exceeds those of architectures with similar or greater resources. The
proposed architecture reduced the resources used on the target device by 50%.
Ching Yee Yong made a survey of image processing algorithms that were developed for the
detection of masses and for segmentation techniques. The result of this study showed that
MATLAB is among the most popular software packages: more than 60% of the respondents
preferred to use MATLAB for their image processing work, and Microsoft Photo Editor was
the second most popular software for image editing. More than 30% of respondents were
very likely to use a ready-to-use package for processing images rather than given source
code. The result is expected to be beneficial and able to assist users in effective image
processing and analysis in a newly developed software package. A preliminary image
processing tool prototype that was developed is also presented in the paper.
Deepak Kumar Garget discussed a method that processes ECG paper records
with an efficient and iterative set of digital image processing techniques to convert
ECG paper image data into time-series digitized signal form, resulting in convenient
storage and retrieval of ECG information. The method involves the calculation of heart
rate, QRS width and stability (variation in R-R peaks) from the extracted signal.
Comparison of the above calculated parameters with the manually calculated parameters
showed an accuracy of 96.4%, thus proving the effectiveness of the process. The author
also proposed the development of fuzzy based ECG diagnosis system that assists the
doctors in diagnosis.
Shirui Gao emphasized MATLAB-based medical image processing tools, including
the theoretical background and examples. Through MATLAB, this paper introduced
post-imaging quality in medical technology and medical imaging. It
also introduces medical image processing technology and describes the image
processing methods involved.
3. SYSTEM ANALYSIS
This part of our project documentation focuses on existing system, proposed system,
advantages of proposed system, modules in the project along with their description and
the system requirements.
Existing System:
The image recognition applications in industry include fingerprint or retina recognition,
processing records of security or traffic cameras. The applications in medicine include
ultrasound imaging, magnetic resonance. Stereography is the art of using two almost
identical photographs to create a three-dimensional (3D) image. The viewer requires
special glasses or a stereoscope to see the 3D image. With modern technology, it has
applications in motion picture and television industry. Stereography is a complicated
process. Modern stereography uses specialized computer software and camera hardware.
Volumetric displays do not require special goggles. The three-dimensional graphics
created by this type of display can be viewed from any angle. Each viewer can observe
the picture from a different perspective. To create volumetric graphics, a technique called
swept-surface volumetric display, which is based on persistence of vision, is adopted.
Here the use of fast-moving lit surfaces creates the illusion of a solid shape. Another option to
display volumetric 3D images is called static volume. No moving
parts are used in the visible area of the display; instead, mirrors and lenses are used to
direct a beam of laser light. Very fast pulses of laser light are directed at different points
in the air. Persistence of vision gives the illusion of a single solid object. This method is
useful for medical diagnosis: a 3D display can show a realistic image of a heart.
Face detection is a computer technology that determines the locations and sizes of human
faces in arbitrary (digital) images. It detects facial features and ignores anything else, such
as buildings, trees and bodies. Early face-detection algorithms focused on the detection of
frontal human faces, whereas newer algorithms attempt to solve the more general and
difficult problem of multi-view face detection. It is also used in video surveillance. Some
recent digital cameras use face detection for autofocus.
Proposed System:
Recently, self-driving vehicles have been gaining traction, even though their development
dates back to the 1920s. These automated vehicles are able to navigate using a form of visual
image recognition developed with deep neural networks. Image recognition is important
so that these systems can understand what to avoid, such as trees, animals, and other
vehicles, and where the road is. With the continuous development of these systems, we
expect image classification to be better, faster, and more accurate.
To further understand and explore the concept of image recognition, our team has taken a
basic approach to classifying a set of images. In a task much simpler than real-world visual
image recognition, our team selected the CIFAR-10 dataset, whose pictures each contain only one class; that
is, each image belongs to exactly one category. There are a total of 10 categories. Further
description of the dataset will be in the “Data Collection & Description” section below.
We have found other approaches to image classification of this particular dataset, but we
wanted to explore what we learned from the course. Specifically, we used logistic
regression, random forest, and convolutional neural network models. Having only been
briefly introduced to convolutional neural networks, we found that this was a great chance
to delve into this model further through application. For analysis of the models, we
observed various outputs, such as the area under receiver operating characteristic curve
(AUROC) and the confusion matrix. From selection of the dataset, our team understood
the implications that came with it, primarily, the large size of the dataset. This was
something our team had to consider when training the models. This will be in the “Data
Collection & Description” section below.
Modules Description
A module is a Python object with arbitrarily named attributes that you can bind
and reference. Simply, a module is a file consisting of Python code. A module can define
functions, classes and variables. A module can also include runnable code.
TensorFlow/Keras:
TensorFlow is an open source library created for Python by the Google
Brain team. TensorFlow compiles many different algorithms and models
together, enabling the user to implement deep neural networks for use in
tasks like image recognition/classification and Natural language
processing. TensorFlow is a powerful framework that functions by
implementing a series of processing nodes, each node representing a
mathematical operation, with the entire series of nodes being called a "graph".
In terms of Keras, it is a high-level API (application programming interface) that can use
TensorFlow's functions underneath (as well as other ML libraries like Theano). Keras was
designed with user-friendliness and modularity as its guiding principles. In practical
terms, Keras makes implementing the many powerful but often complex functions of
TensorFlow as simple as possible, and it's configured to work with Python without any
major modifications or configuration. The Packages preferred under TensorFlow module
are:
Sequential
The sequential API allows you to create models layer-by-layer for most problems. It is
limited in that it does not allow you to create models that share layers or have multiple
inputs or outputs.
Dropout
The term "dropout" is used for a technique which drops out some nodes of the network.
Dropping out can be seen as temporarily deactivating or ignoring neurons of the network.
This technique is applied in the training phase to reduce overfitting effects.
Dense
A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron
providing one output to the next layer. It is the most basic layer in neural networks. A
Dense(10) layer has ten neurons.
Flatten
In Keras, the Flatten layer converts the multi-dimensional output of the previous layer (for
example, the stacked feature maps produced by the convolutional layers) into a one-dimensional
vector, so that it can be fed into the Dense layers that follow.
Conv2D
Keras Conv2D is a 2D convolution layer; this layer creates a convolution kernel that is
convolved with the layer's input to produce a tensor of outputs.
MaxPooling2D
Max pooling is a sample-based discretization process. The objective is to down-sample
an input representation (image, hidden-layer output matrix, etc.), reducing its
dimensionality and allowing for assumptions to be made about features contained in the
sub-regions binned.
SGD
Stochastic Gradient Descent (SGD) and its multiple variants (such as RMSProp or Adam)
are the most popular training algorithms for Deep Learning. These algorithms are inherently
serial due to their iterative nature.
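To show how these packages fit together, the following minimal sketch (an illustration only, not our final model, which appears in the Implementation chapter; the layer sizes and learning rate are chosen arbitrarily here) wires them into one small network:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import SGD

model = Sequential()                          # build the model layer-by-layer
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))     # down-sample the feature maps
model.add(Flatten())                          # 2D feature maps -> 1D vector
model.add(Dense(128, activation='relu'))      # fully-connected layer
model.add(Dropout(0.5))                       # ignore half the nodes during training
model.add(Dense(10, activation='softmax'))    # one output per CIFAR-10 class
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01), metrics=['accuracy'])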
The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000
images per class. There are 50,000 training images and 10,000 test images. We also
considered the CIFAR-100 dataset, which has an identical total of 60,000 images
but spread over 100 classes, reducing the number of images per class from 6,000 down to 600.
We felt that this fractured the dataset too much to create an effective model.
The chosen CIFAR-10 dataset is divided into five training batches and one test batch, each
with 10,000 images. The test batch contains exactly 1,000 randomly-selected images from
each class. The training batches contain the remaining images in random order, but some
training batches may contain more images from one class than another. Between them,
the training batches contain exactly 5,000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles
and trucks. “Automobile” includes sedans, SUVs, and things of that sort. “Truck” includes
only big trucks. Neither includes pickup trucks.
4. SYSTEM REQUIREMENTS
Hardware Requirements
1. Processor: Pentium class or above
2. Operating system: Windows 7 or higher
3. RAM: more than 2 GB
4. GPU: minimum 512 MB
Software Requirements:
1. Python 3.7 or above.
2. Windows 7, 8 or 10.
3. PyCharm IDE.
4. An i3 or later generation processor.
5. A minimum of 2 GB RAM to avoid interruptions.
6. A GPU is recommended.
Install pip for Python on your system: download the get-pip.py installer script, open the
folder into which it was downloaded, and run the following command in the command
prompt to install pip: python get-pip.py. The Python libraries below are then installed
from the system terminal.
Also install TensorFlow, which is a very important package for building machine learning
models. Install it using the command pip install tensorflow in cmd.
You also need the keras package for Python, which is installed using the command pip install
keras.
5. SYSTEM STUDY
This system study covers the feasibility study, where we discuss
economical feasibility, technical feasibility and social feasibility.
Feasibility Study:
The feasibility of the project is analyzed in this phase, and a business proposal
is put forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out, to ensure
that the proposed system is not a burden to the company.
For feasibility analysis, some understanding of the major requirements for the system is
essential. Three key considerations involved in the feasibility analysis are:
Economical Feasibility
Technical Feasibility
Social Feasibility
1. Economical Feasibility
This study is carried out to check the economic impact that the
system will have on the organization. The amount of fund that the company can pour into
the research and development of the system is limited. The expenditures must be
justified. The developed system was well within the budget, and this was achieved
because most of the technologies used are freely available. Only the customized products
had to be purchased.
2. Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would lead to high demands
being placed on the client. The developed
system must have modest requirements; only minimal or no changes should be required for
implementing this system.
3. Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of acceptance
by the users solely depends on the methods that are employed to educate users about
the system and to make them familiar with it. Their level of confidence must be raised so that
they are also able to offer constructive criticism, which is welcomed, as they are the final
users of the system.
6. SYSTEM DESIGN
Feature Extraction
In order to carry out image recognition/classification, the neural network must carry out
feature extraction. Features are the elements of the data that you care about which will
be fed through the network. In the specific case of image recognition, the features are the
groups of pixels, like edges and points, of an object that the network will analyze for
patterns.
Feature recognition (or feature extraction) is the process of pulling the relevant features
out from an input image so that these features can be analyzed. Many images contain
annotations or metadata about the image that helps the network find the relevant
features.
Classification
Image Recognition refers to the task of inputting an image into a neural network and
having it output some kind of label for that image. The label that the network outputs will
correspond to a pre-defined class. There can be multiple classes that the image can be
labeled as, or just one. If there is a single class, the term "recognition" is often applied,
whereas a multi-class recognition task is often called "classification".
A subset of image classification is object detection, where specific instances of objects
are identified as belonging to a certain class like animals, cars, or people.
The first layer of a neural network takes in all the pixels within an image. After all the
data has been fed into the network, different filters are applied to the image, which forms
representations of different parts of the image. This is feature extraction and it creates
"feature maps".
This process of extracting features from an image is accomplished with a "convolutional
layer", and convolution is simply forming a representation of part of an image. It is from
this convolution concept that we get the term Convolutional Neural Network (CNN), the
type of neural network most commonly used in image classification/recognition.
If you want to visualize how creating feature maps works, think about shining a flashlight
over a picture in a dark room. As you slide the beam over the picture you are learning
about features of the image. A filter is what the network uses to form a representation of
the image, and in this metaphor, the light from the flashlight is the filter.
The width of your flashlight's beam controls how much of the image you examine at one
time, and neural networks have a similar parameter, the filter size. Filter size affects how
much of the image, how many pixels, are being examined at one time. A common filter
size used in CNNs is 3, and this covers both height and width, so the filter examines a 3 x
3 area of pixels.
While the filter size covers the height and width of the filter, the filter's depth must also
be specified.
How does a 2D image have depth?
Digital images are rendered as height, width, and some RGB value that defines the pixel's
colors, so the "depth" that is being tracked is the number of color channels the image has.
Grayscale (non-color) images only have 1 color channel while color images have 3 depth
channels.
All of this means that for a filter of size 3 applied to a full-color image, the dimensions of
that filter will be 3 x 3 x 3. For every pixel covered by that filter, the network multiplies
the filter values with the values in the pixels themselves to get a numerical representation
of that pixel. This process is then done for the entire image to achieve a complete
representation. The filter is moved across the rest of the image according to a parameter
called "stride", which defines how many pixels the filter is to be moved by after it
calculates the value in its current position. A conventional stride size for a CNN is 2.
The end result of all this calculation is a feature map. This process is typically done with
more than one filter, which helps preserve the complexity of the image.
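To make this arithmetic concrete, the spatial size of a feature map follows the standard formula output = (input - filter + 2 x padding) / stride + 1. The small helper below (our own illustration, not part of the project code) evaluates it for a 32x32 CIFAR-10 image:

# Spatial size of a feature map: (input - filter + 2*padding) // stride + 1
def feature_map_size(input_size, filter_size, padding=0, stride=1):
    return (input_size - filter_size + 2 * padding) // stride + 1

print(feature_map_size(32, 3, padding=1, stride=1))   # 32: 'same' padding preserves size
print(feature_map_size(32, 3, padding=0, stride=2))   # 15: a stride of 2 roughly halves it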
Block Diagram
CIFAR 10 DATASET
UML DIAGRAMS
UML stands for Unified Modelling Language. UML is a standardized general-purpose
modelling language in the field of object-oriented software engineering. The standard is
managed, and was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or process
may also be added to, or associated with, UML.
The Unified Modelling Language is a standard language for specifying,
visualizing, constructing and documenting the artifacts of software systems, as well as
for business modelling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modelling of large and complex systems. The UML is a very important
part of developing object-oriented software and the software development process. The
UML uses mostly graphical notations to express the design of software projects.
Goals
The Primary goals in the design of the UML are as follows:
Provide users a ready-to-use, expressive visual modelling Language so that they can
develop and exchange meaningful models.
Provide extensibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development processes.
Provide a formal basis for understanding the modelling language.
Encourage the growth of OO tools market.
Support higher level development concepts such as collaborations, frameworks, patterns
and components.
Integrate best practices.
[Fig 8: Use Case Diagram, with USER and DATASET as the actors]
SEQUENCE DIAGRAM
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams, event scenarios, and timing diagrams.
[Sequence diagram: the interaction between the USER and the DATASET]
7. IMPLEMENTATION
Pillow (PIL): the Python Imaging Library, which our tester script uses to open images, supports:
per-pixel manipulations,
masking and transparency handling,
image filtering, such as blurring, contouring, smoothing, or edge finding,
image enhancing, such as sharpening, adjusting brightness, contrast or color,
adding text to images and much more.
Keras
Keras is an Open Source Neural Network library written in Python that runs on top of
Theano or TensorFlow. It is designed to be modular, fast and easy to use. It was developed
by François Chollet, a Google engineer.
Keras doesn't handle low-level computation. Instead, it uses another library to do it, called
the "backend". So Keras is a high-level API wrapper for the low-level API, capable of
running on top of TensorFlow, CNTK, or Theano.
The Keras high-level API handles the way we make models: defining layers, setting up
multiple input-output models, compiling the model with loss and
optimizer functions, and running the training process with the fit function. Keras doesn't handle the low-level
API, such as making the computational graph or creating tensors and other variables, because
that is handled by the "backend" engine.
CIFAR 10
Note: As CIFAR-10 consists of 32x32 images, we need to feed images of size 32x32
only to the model to detect the name of the image.
NumPy
NumPy is a Python package whose name stands for ‘Numerical Python’. It is the core library
for scientific computing; it contains a powerful n-dimensional array object and provides
tools for integrating C, C++, etc. It is also useful for linear algebra, random number
generation, and so on. A NumPy array can also be used as an efficient multi-dimensional container
for generic data. Now, let us look at what exactly a NumPy array is.
NumPy Array: A NumPy array is a powerful N-dimensional array object which is laid out in the
form of rows and columns. We can initialize NumPy arrays from nested Python lists and
access their elements.
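A minimal sketch of initializing and indexing a NumPy array:

import numpy as np

# Initialize a 2x3 NumPy array from nested Python lists and access its elements
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)    # (2, 3): two rows and three columns
print(a[1, 2])    # 6: the element in row 1, column 2 (zero-indexed)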
H5Py
The h5py package is a Pythonic interface to the HDF5 binary data format.
It lets you store huge amounts of numerical data, and easily manipulate that data from
NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they
were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized
and tagged however you want.
H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy
array syntax. For example, you can iterate over datasets in a file, or check out the .shape
or .dtype attributes of datasets.
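A minimal sketch of this usage (the file name example.h5 and the dataset name pixels are placeholders):

import h5py
import numpy as np

# Store a small array in an HDF5 file, then read it back like a NumPy array
with h5py.File('example.h5', 'w') as f:
    f.create_dataset('pixels', data=np.zeros((32, 32, 3)))
with h5py.File('example.h5', 'r') as f:
    print(f['pixels'].shape, f['pixels'].dtype)   # (32, 32, 3) float64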
After further research, we found that a Convolutional Neural Network can
automatically extract features from the images, using adjacent-pixel information within a
window matrix and creating convolution layers that learn patterns in the images to better
classify the dataset.
4-Layer CNN
Our images were 32x32x3: 32 pixels long, 32 pixels wide, and 3 channels for the red, green, and
blue values. This model used the training set as the input layer and 4 convolutional layers:
two with 32 output channels and two with 64 output channels. For each
convolutional layer, we decided on a kernel size of 3x3, a common option for image
classification; it scans a 3x3 window of the image at a time.
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
Next, we selected the ReLU function for activation. The ReLU function takes max(0,
x), adding non-linearity to the network. Another activation function that could be deployed
is the sigmoid function, but ReLU was chosen due to its efficiency.
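A two-line illustration of ReLU's max(0, x) behaviour using NumPy:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))   # [0.  0.  0.  1.5 3. ] -- negatives are zeroed out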
Next, we selected max pooling layers that are 2x2. Each pooling layer takes the maximum
value out of each 2x2 window of the convolutional layer's output, downsampling it. This also helps
to prevent overfitting.
model.add(MaxPooling2D(pool_size= (2,2)))
Lastly, the fully-connected output layer uses a softmax activation to produce the class
probabilities from which the loss is calculated. To test how the model trains with respect to
epochs, we trained the model for 20, 50, and 100 epochs.
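A sketch of how one such training run can be driven, assuming the model and the data variables from the trainer script below (older Keras releases report the metric as 'val_acc', newer ones as 'val_accuracy'):

# Compile the model, train it for the chosen number of epochs, and read back
# the validation accuracy from the returned history object
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=20, batch_size=32)
print('Final validation accuracy:', history.history['val_acc'][-1])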
6-Layer CNN
In the 6-layer CNN, the only difference is the addition of two layers with 128 output
channels each.
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
Coding:
Image_Recognition_Trainer.py:
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.constraints import maxnorm
from keras.utils import to_categorical

# Load CIFAR-10, scale pixels to [0, 1], and one-hot encode the labels
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same', kernel_constraint=maxnorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Compile and train the model, then save it to an HDF5 file (backed by h5py)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=32)
model.save('trained_model.h5')
Image_Recognition_Tester.py:
from keras.models import load_model

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
model = load_model('trained_model.h5')   # the file name must match the one saved by the trainer
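The remaining tester logic is sketched below. This is our reconstruction of the flow described under End-User Instructions, not the verbatim listing, and it assumes a 32x32 RGB input image:

import numpy as np
from PIL import Image

# Prompt for the path of a 32x32 image, as described in the instructions below
path = input('Enter the path of a 32x32 image: ')
img = Image.open(path)
img.show()                                     # open the image with the default viewer
x = np.array(img).astype('float32') / 255.0    # scale pixels the same way as in training
x = x.reshape(1, 32, 32, 3)                    # a batch containing one image
prediction = model.predict(x)
print('Recognized as:', labels[int(np.argmax(prediction))])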
Note: Increasing the number of “epochs” in the program generally improves the performance of the image
recognizer, because the number of epochs refers to the number of times the model passes over the training
data. Better training therefore tends to produce more accurate output.
End-User Instructions
1. First, the user needs to run the image_recognition_trainer.py file to train the model
to recognize the images we upload.
2. After successfully training the model, the user needs to run the
image_recognition_tester.py file to upload images and have the model
recognize them.
3. When we run the model, the program prompts the user to specify the path of an
image on your PC. After specifying the path of the image you want to recognize, hit
enter.
4. After hitting enter, the image is processed by our model, the image is
opened with the default image viewer of your computer, and the
recognized image name is displayed on the run console of the program.
Note: The images which we upload to the model must be of size 32x32,
because the CIFAR-10 dataset consists of 32x32 images only (i.e. collect
32x32 images from any source, provide the path of the image
when prompted, and then test the model to recognize your image).
8. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable manner.
There are various types of test. Each test type addresses a specific testing requirement.
Types of Testing:
Unit testing:
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly, and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software
units of the application; it is done after the completion of an individual unit, before
integration. This is structural testing that relies on knowledge of the unit's construction and is
invasive. Unit tests perform basic tests at component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path
of a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
Integration testing:
Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components
were individually satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically aimed at exposing
the problems that arise from the combination of components.
Functional Test:
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals. Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or special
test cases. In addition, systematic coverage pertaining to identifying business process flows,
data fields, predefined processes, and successive processes must be considered for testing.
Before functional testing is complete, additional tests are identified and the effective value
of current tests is determined.
System Testing:
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system
testing is the configuration oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration
points.
Results obtained after comparing all models with logistic regression model:
Taking all constructed models into account, we can see that the Convolutional
Neural Network models perform the best. Much of the weight is concentrated on the
diagonal of the confusion matrices (shown below), representing correct predictions. The
other two models, Logistic Regression and Random Forest, have confusion matrices
with darker-shaded elements off the diagonal compared to the
Convolutional Neural Network models; these two models have higher misclassification
rates. Specifically, both models classified ‘airplanes’ and ‘ships’ relatively well.
Similarly, they also both heavily misclassified ‘cats’.
One major drawback that we had was the size of the dataset. CIFAR-10 had 50,000
training images and 10,000 testing images for each model to go through. As our team was
training the models, we realized that we lacked computational power, despite using a
GPU. As a result, we had to use a fraction of the training images in order to get results
from our first model, Logistic Regression. When comparing the results of Logistic
Regression to the other models, it did the worst. Possibly due to the shortage of data,
Logistic Regression did not perform as well as it could have.
In our models, we chose a small set of parameters to perform grid search on to obtain the
best parameters to train a given model. To get better results, our team could have
considered using more parameters.
Figure III. Confusion Matrix for 4-Layered Convolutional Neural Network, 20 epochs
Figure IV. Confusion Matrix for 4-Layered Convolutional Neural Network, 50 epochs
Figure V. Confusion Matrix for 4-Layered Convolutional Neural Network, 100 epochs
Figure VI. Confusion Matrix for 6-Layered Convolutional Neural Network, 20 epochs
Figure VII. Confusion Matrix for 6-Layered Convolutional Neural Network, 50 epochs
Figure VIII. Confusion Matrix for 6-Layered Convolutional Neural Network, 100 epochs
9. SCREENSHOTS
10. CONCLUSION
Through these four models, we can see how each image classification model fares against
each other. Simpler models like Logistic Regression and Random Forest showed that they
were able to classify some classes relatively well, such as ‘airplanes’ and ‘ships’, but
strongly misclassified ‘cats’. Our best model was the Convolutional Neural Network,
which had a highly distinguishable diagonal on the confusion matrix, indicating high
classification rates.
Given higher computational power, our team believes that we can achieve better results,
especially for the models that did poorly. We would be able to add more parameters to our
grid search to find the optimal model for Logistic Regression and Random Forest. It would
also make it possible to utilize 100% of the dataset for all models.
In the future, we would like to explore image classification further. We would like to
improve our models through the methods stated above, and try to create models for
the CIFAR-100 dataset, with 100 classes instead of 10.
Overall, we’ve come to a better understanding of how image classification works and what
parameters and elements are crucial to each.
We hope it was easy to go through this tutorial, as we have tried to keep it short and simple.
Beginners who are interested in Convolutional Neural Networks can start with this
application. In short, you have learnt how to implement the following concepts with Python
and Keras:
Plotting images with matplotlib.
Z-score (mean-std normalization) of images.
Building a deep Convolutional Neural Network.
Applying batch normalization.
Regularization: Dropout & Kernel regularizers.
Data Augmentation: ImageDataGenerator in Keras (sketched below).
Saving & Loading CNN models.
A full Python implementation of the image recognition task, reaching roughly 90% accuracy
on the CIFAR-10 dataset, can be built with the help of the deep learning
concepts covered above.
REFERENCES
5) Github.com