CSE 634: Data Mining

Professor: Anita Wasilewska

IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORKS
REFERENCES
• http://www3.cs.stonybrook.edu/~cse634/L7ch6NN.pdf
• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf

• https://deeplearning.web.unc.edu/files/2016/12/An-overview-of-gradient-descent-optimization-algorithm.pdf

• https://www.slideshare.net/infobuzz/back-propagation

• http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt

• https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59

• https://cs231n.github.io

• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

• https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177

• https://www.kdnuggets.com/2016/06/visual-explanation-backpropagation-algorithm-neural-networks.html

• https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

• http://www.emergentmind.com/neural-network

• https://en.wikipedia.org/wiki/Convolutional_neural_network
Paper
• Name: "Imagenet classification with deep convolutional neural networks."
• Authors: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton
• Published in: Advances in neural information processing systems. 2012
• Competition: ILSVRC (ImageNet Large Scale Visual Recognition Competition) 2012
OVERVIEW

• Introduction to Image Classification
• Loss functions, Optimization and Gradient descent
• Neural Networks and Backpropagation Algorithm
• Convolutional Neural Networks
• Paper : Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification
with deep convolutional neural networks." Advances in neural information processing
systems. 2012.
IMAGE CLASSIFICATION

Input: An image (a matrix of pixel values).

Categories/Labels: A set of predetermined values, one of which describes each image.

Output: The label corresponding to the input image.

www.tenserflow.com
CHALLENGES

• Illumination:

• Deformation:

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
CHALLENGES
• Occlusion:

• Background Clutter:

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
INITIAL ATTEMPTS

Detect edges

• Compute explicit “rules” based on corners and boundaries, and identify labels based on these rules.
Example: two lines meeting at a corner are a cat’s ear.
Pitfalls
• Time consuming, since we have to start all over again for every other object label.

John Canny, “A computational approach to edge detection,” IEEE TPAMI, 1986


A DATA DRIVEN APPROACH

Figure: training data and test data are fed to a classifier, which produces the output label.

https://www.cs.toronto.edu/~kriz/cifar.html
K-NEAREST NEIGHBORS
• Use a distance metric (e.g. L1 or L2 distance) to find the K nearest neighbors of the test image, i.e. the K training images with the smallest distance to it.
• A majority vote is taken among the labels of the K neighbors, and the winning label is assigned to the test image.

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
K-NEAREST NEIGHBORS

• Simply memorize all training data and labels

• Choose K on held-out validation data, then evaluate once on the test data

Pitfalls
• Distance metrics on raw pixels are not very informative.

• Curse of dimensionality.
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
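The two K-nearest-neighbors slides above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the implementation used in the cited course; the function names and the four-pixel toy "images" are hypothetical.

```python
from collections import Counter

def l1_distance(a, b):
    """Sum of absolute pixel differences (L1 distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def knn_predict(train_images, train_labels, test_image, k=3):
    """Label test_image by majority vote among its k nearest training images."""
    # "Training" is just memorizing the data; all work happens at prediction time.
    order = sorted(range(len(train_images)),
                   key=lambda i: l1_distance(train_images[i], test_image))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four flattened "images" of four pixels each.
train_images = [[0, 0, 0, 0], [1, 1, 0, 0], [9, 9, 9, 9], [8, 9, 9, 8]]
train_labels = ["dark", "dark", "bright", "bright"]
print(knn_predict(train_images, train_labels, [1, 0, 1, 0], k=3))
```

Every prediction scans the whole training set, which is one reason the approach scales poorly to large image collections.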
LINEAR CLASSIFICATION
A linear classifier is of the form
f(x,W) = Wx + b
x – input vector {x1, x2, ..., xn}, where xi is the value of one pixel dimension
W – matrix of weights, one row per label and one weight per pixel dimension, determined by the training data
b – vector of biases, one per label
f(x,W) – vector of scores, one per label

https://www.pyimagesearch.com/2016/08/22/an-intro-to-linear-classification-with-python/
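As a rough sketch of f(x,W) = Wx + b, the snippet below computes one score per label for a flattened image. The weights, pixel values, biases and label names are made-up numbers for illustration only.

```python
def linear_scores(W, x, b):
    """Return f(x, W) = W x + b: one score per label (one row of W per label)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Hypothetical numbers: 3 labels, an "image" flattened to 4 pixel values.
labels = ["cat", "dog", "ship"]
W = [[0.2, -0.5, 0.1, 2.0],
     [1.5, 1.3, 2.1, 0.0],
     [0.0, 0.25, 0.2, -0.3]]
x = [56, 231, 24, 2]
b = [1.1, 3.2, -1.2]

scores = linear_scores(W, x, b)
predicted = labels[scores.index(max(scores))]  # label with the highest score
print(scores, predicted)
```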
INTERPRETING A LINEAR CLASSIFIER

• Each image is a point in a high-dimensional space

• The linear classifier draws linear decision boundaries, each separating one category from the rest of the categories.

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
AN EXAMPLE
Figure: the image is flattened into a column vector x; computing Wx + b gives one score each for dog, cat, and ship.

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
LOSS FUNCTIONS

• Loss functions for classification are computationally feasible loss functions representing
the price paid for inaccuracy of predictions in classification problems (problems of
identifying which category a particular observation belongs to).

• It describes how far off the result your network produced is from the expected result - it
indicates the magnitude of error your model made on its prediction.

Source: https://en.wikipedia.org/wiki/Loss_functions_for_classification
https://stackoverflow.com/questions/42877989/what-is-a-loss-function-in-simple-words
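A loss function tells how good our current classifier is.

Suppose there are 3 training examples and 3 classes, and that for some W the scores f(x,W) are:

        image 1 (cat)   image 2 (car)   image 3 (frog)
cat          3.2             1.3             2.2
car          5.1             4.9             2.5
frog        -1.7             2.0            -3.1

Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the image and $y_i$ is the (integer) label, the loss over the dataset is a sum of the losses over the examples, normalized by N:

$L = \frac{1}{N} \sum_{i} L_i(f(x_i, W), y_i)$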

Source: https://www.pinterest.com/pin/34973272730414755;https://www.freepik.com/free-photo/car-in-
glossy-red_758995.htm#term=convert&page=1&position=4;https://study.com/academy/lesson/what-is-a-
natural-habitat-definition-habitat-destruction-quiz.html
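Multiclass SVM loss (“hinge loss”):

Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the scores vector, the SVM loss has the form:

$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

        image 1 (cat)   image 2 (car)   image 3 (frog)
cat          3.2             1.3             2.2
car          5.1             4.9             2.5
frog        -1.7             2.0            -3.1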

Source: https://www.pinterest.com/pin/349732727304147554
https://www.freepik.com/free-photo/car-in-glossy-
red_758995.htm#term=convert&page=1&position=42
First example (true label: cat; scores: cat 3.2, car 5.1, frog -1.7), using $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$:

$L_1 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1)$
$= \max(0, 2.9) + \max(0, -3.9)$
$= 2.9 + 0$
$= 2.9$

Loss: 2.9
Source: https://www.pinterest.com/pin/349732727304147554
https://www.freepik.com/free-photo/car-in-glossy-
red_758995.htm#term=convert&page=1&position=42
Second example (true label: car; scores: cat 1.3, car 4.9, frog 2.0), using $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$:

$L_2 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1)$
$= \max(0, -2.6) + \max(0, -1.9)$
$= 0 + 0$
$= 0$

Loss: 0
Source: https://www.pinterest.com/pin/349732727304147554
https://www.freepik.com/free-photo/car-in-glossy-
red_758995.htm#term=convert&page=1&position=42
Third example (true label: frog; scores: cat 2.2, car 2.5, frog -3.1), using $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$:

$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1)$
$= \max(0, 6.3) + \max(0, 6.6)$
$= 6.3 + 6.6$
$= 12.9$

Loss: 12.9
Source: https://www.pinterest.com/pin/349732727304147554
https://www.freepik.com/free-photo/car-in-glossy-
red_758995.htm#term=convert&page=1&position=42
Loss over the full dataset is the average of the per-example losses (2.9 for the cat image, 0 for the car image, 12.9 for the frog image):

$L = (2.9 + 0 + 12.9)/3 = 5.27$
Source: https://www.pinterest.com/pin/349732727304147554;https://www.freepik.com/free-photo/car-in-
glossy-red_758995.htm#term=convert&page=1&position=4; https://study.com/academy/lesson/what-is-
a-natural-habitat-definition-habitat-destruction-quiz.html
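The worked examples above can be checked with a short script. This is a sketch of the multiclass SVM loss with margin 1, using the scores from the slides; the function and variable names are illustrative.

```python
def svm_loss(scores, correct_index, margin=1.0):
    """Multiclass SVM (hinge) loss for one example."""
    s_correct = scores[correct_index]
    return sum(max(0.0, s_j - s_correct + margin)
               for j, s_j in enumerate(scores) if j != correct_index)

# One (scores, true label index) pair per image; scores are ordered (cat, car, frog).
examples = [
    ([3.2, 5.1, -1.7], 0),   # image of a cat
    ([1.3, 4.9, 2.0], 1),    # image of a car
    ([2.2, 2.5, -3.1], 2),   # image of a frog
]
losses = [svm_loss(s, y) for s, y in examples]
mean_loss = sum(losses) / len(losses)   # loss over the full dataset
print([round(l, 1) for l in losses], round(mean_loss, 2))
```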
OverFitting

Data loss: Model predictions should match training data

Regularization: Model should be “simple”, so it works on test data

Source: http://cs231n.stanford.edu/2017/
OPTIMIZATION

• Optimization Algorithms are used to update weights and biases i.e. the internal
parameters of a model to reduce the error.

Source: https://medium.com/data-science-group-iitr/loss-functions-and-optimization-algorithms-demystified-
bb92daff331cs
Gradient Descent
• Gradient descent is a way to minimize an objective function J(w) parameterized by a model's parameters w

• It updates the parameters in the opposite direction of the gradient of the objective function with respect to the parameters (∇wJ(w))

• The learning rate η determines the size of the steps we take to reach a (local) minimum

Source: https://deeplearning.web.unc.edu/files/2016/12/An-overview-of-gradient-descent-optimization-
algorithm.pdf
https://giphy.com/gifs/gradient-O9rcZVmRcEGqI
Gradient Descent
Vanilla Gradient Descent Algorithm:

• Start with an initial set of coefficients for the function. These could be 0.0 or a small random value.
coefficient = 0.0
• Calculate the derivative of the cost. The derivative is a concept from calculus and refers to the slope of the
function at a given point. We need to know the slope so that we know the direction(sign) to move the
coefficient values in order to get a lower cost on the next iteration.
delta = derivative(cost)
• Now that we know from the derivative which direction is downhill, we can now update the coefficient
values.
• A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on
each update.
coefficient = coefficient – (alpha * delta)
• This process is repeated until the cost of the coefficients (cost) is 0.0 or close enough to zero to be good
enough.

Source: https://machinelearningmastery.com/gradient-descent-for-machine-learning/
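The steps above can be sketched as a short loop. The cost function here, J(w) = (w - 3)^2, is a hypothetical example chosen because its minimum (w = 3) is known; the fixed step count stands in for the "close enough to zero" stopping test.

```python
def gradient_descent(derivative, start=0.0, alpha=0.1, steps=100):
    """Repeat coefficient = coefficient - alpha * delta for a fixed number of steps."""
    coefficient = start
    for _ in range(steps):
        delta = derivative(coefficient)              # slope of the cost at this point
        coefficient = coefficient - alpha * delta    # move downhill
    return coefficient

# Hypothetical cost J(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w = gradient_descent(lambda w: 2.0 * (w - 3.0))
print(w)
```

A learning rate that is too large (for this cost, alpha above 1.0) makes the updates overshoot and diverge, which is why alpha has to be chosen with care.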
NEURAL NETWORKS AND
BACKPROPAGATION ALGORITHM
INTRODUCTION

• Backpropagation is a method used in artificial neural networks to calculate the gradient that is needed to update the weights of the network. It is commonly used to train deep neural networks, a term referring to neural networks with more than one hidden layer.
• The term is an abbreviation for “backward propagation of errors”.

Source: https://www.slideshare.net/infobuzz/back-propagation
https://en.wikipedia.org/wiki/Backpropagation
INTUITION
• As the algorithm's name implies, the errors (and
therefore the learning) propagate backwards from
the output nodes to the inner nodes.
• So technically speaking, backpropagation is used
to calculate the gradient of the error of the network
with respect to the network's modifiable weights.
• This gradient is almost always then used in a simple stochastic gradient descent algorithm to find weights that minimize the error.

Source: https://www.slideshare.net/infobuzz/back-propagation
https://en.wikipedia.org/wiki/Backpropagation
BASIC NEURON MODEL - FEEDFORWARD
NETWORK

• Inputs xi are fed through input connections
• Specific functions are modeled using real weights wi
• The output of the neuron is a nonlinear function f of its weighted inputs

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
INPUTS TO NEURONS
• Arise from other neurons or from outside the network

• Nodes whose inputs arise outside the network are called input nodes and simply
copy values

• An input may excite or inhibit the response of the neuron to which it is applied,
depending upon the weight of the connection

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
Weights
• Normally, positive weights are considered as excitatory while negative weights are
thought of as inhibitory
• Learning is the process of modifying the weights in order to produce a network that
performs some function

Output
The response function is normally nonlinear
Examples include
• Sigmoid

• Piecewise linear

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
BACKPROPAGATION PREPARATION

• Training Set
A collection of input-output patterns that are used to train the network

• Testing Set
A collection of input-output patterns that are used to assess network performance

• Learning Rate-α
A scalar parameter, analogous to step size in numerical integration, used to set
the rate of adjustments

Source:http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
NETWORK ERROR
• Total-Sum-Squared-Error (TSSE)

• Root-Mean-Squared-Error (RMSE)

Source:http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
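One common way to write these two error measures is sketched below. The factor 1/2 in TSSE and the normalization in RMSE are assumptions of this sketch rather than something the slide confirms; P (number of training patterns) and N_out (number of output neurons) are names introduced here, while T_pj and O_pj are the target and actual outputs of output neuron j for pattern p.

```latex
\mathrm{TSSE} = \frac{1}{2} \sum_{p} \sum_{j} \left( T_{pj} - O_{pj} \right)^{2}
\qquad
\mathrm{RMSE} = \sqrt{ \frac{2 \cdot \mathrm{TSSE}}{P \cdot N_{\mathrm{out}}} }
```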
A PSEUDO-CODE ALGORITHM
• Randomly choose the initial weights
• While error is too large
    • For each training pattern (presented in random order)
        • Apply the inputs to the network
        • Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer
        • Calculate the error at the outputs
        • Use the output error to compute error signals for pre-output layers
        • Use the error signals to compute weight adjustments
        • Apply the weight adjustments
    • Periodically evaluate the network performance

Source:http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
APPLY INPUTS FROM A PATTERN

• Apply the value of each input parameter to each input node
• Input nodes compute only the identity function

Figure: feedforward network, with values flowing from the inputs to the outputs.

Source http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
CALCULATE OUTPUTS FOR EACH NEURON BASED
ON THE PATTERN

• The output from neuron j for pattern p is $O_{pj}$, where

$O_{pj} = \frac{1}{1 + e^{-net_j}}$ and $net_j = \sum_k W_{jk} O_{pk}$ (plus a bias term, if the network uses one)

• k ranges over the input indices and $W_{jk}$ is the weight on the connection from input k to neuron j
Source:http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
CALCULATE THE ERROR SIGNAL FOR EACH
OUTPUT NEURON
• The output neuron error signal $\delta_{pj}$ is given by $\delta_{pj} = (T_{pj} - O_{pj})\, O_{pj}\, (1 - O_{pj})$

• $T_{pj}$ is the target value of output neuron j for pattern p

• $O_{pj}$ is the actual output value of output neuron j for pattern p

Source:
http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
CALCULATE THE ERROR SIGNAL FOR EACH
HIDDEN NEURON
• The hidden neuron error signal $\delta_{pj}$ is given by

$\delta_{pj} = O_{pj}\, (1 - O_{pj}) \sum_k \delta_{pk} W_{kj}$

where $\delta_{pk}$ is the error signal of a post-synaptic neuron k and $W_{kj}$ is the weight of the connection from hidden neuron j to the post-synaptic neuron k

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
CALCULATE AND APPLY WEIGHT
ADJUSTMENTS

• Compute weight adjustments $\Delta W_{ji}$ at time t by

$\Delta W_{ji}(t) = \alpha\, \delta_{pj}\, O_{pi}$

• Apply weight adjustments according to

$W_{ji}(t+1) = W_{ji}(t) + \Delta W_{ji}(t)$

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/
An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
https://giphy.com/gifs/neural-networks-4LiMmbAcvgTQs
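Putting the preceding slides together, a minimal sketch of backpropagation for a network with one hidden layer might look as follows. It assumes sigmoid neurons (consistent with the O(1 - O) factor in the error signals), a hypothetical logical-OR training set with a constant third input acting as a bias, and arbitrary choices for the layer sizes, learning rate and number of epochs.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, W_hidden, W_out):
    """Feedforward pass: return the hidden and output activations."""
    hidden = [sigmoid(sum(w * o for w, o in zip(row, x))) for row in W_hidden]
    output = [sigmoid(sum(w * o for w, o in zip(row, hidden))) for row in W_out]
    return hidden, output

def train_pattern(x, target, W_hidden, W_out, alpha=0.5):
    """One backpropagation step for one pattern; weights are updated in place."""
    hidden, output = forward(x, W_hidden, W_out)
    # Output error signals: delta = (T - O) * O * (1 - O)
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden error signals: delta_j = O_j * (1 - O_j) * sum_k delta_k * W_kj
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * W_out[k][j] for k in range(len(W_out)))
        for j, h in enumerate(hidden)
    ]
    # Weight adjustments: W_ji += alpha * delta_j * O_i
    for j, row in enumerate(W_out):
        for i in range(len(row)):
            row[i] += alpha * delta_out[j] * hidden[i]
    for j, row in enumerate(W_hidden):
        for i in range(len(row)):
            row[i] += alpha * delta_hidden[j] * x[i]

def sum_squared_error(patterns, W_hidden, W_out):
    return sum((t - o) ** 2
               for x, target in patterns
               for t, o in zip(target, forward(x, W_hidden, W_out)[1]))

# Hypothetical task: logical OR; the third input is a constant 1 acting as a bias.
patterns = [([0, 0, 1], [0]), ([0, 1, 1], [1]), ([1, 0, 1], [1]), ([1, 1, 1], [1])]
rng = random.Random(0)
W_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(2)]]

error_before = sum_squared_error(patterns, W_hidden, W_out)
for epoch in range(1000):
    rng.shuffle(patterns)                 # present patterns in random order
    for x, target in patterns:
        train_pattern(x, target, W_hidden, W_out)
error_after = sum_squared_error(patterns, W_hidden, W_out)
print(error_before, error_after)
```

The hidden error signals are computed before any weight is changed, so they use the same output-layer weights that produced the forward pass.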
MERITS AND DEMERITS OF
BACKPROPAGATION
MERITS

• Relatively simple implementation.
• The mathematical formula used in the algorithm can be applied to any network. It does not require any special mention of the features of the function to be learnt.
• Batch update of weights exists, which provides a smoothing effect on the weight correction terms.

DEMERITS

• Slow and inefficient. Can get stuck in local minima, resulting in sub-optimal solutions.
• A large amount of input/output data is available, but you're not sure how to relate it to the output.
• Outputs can be “fuzzy” or non-numeric.

Source: https://www.slideshare.net/infobuzz/back-propagation
https://en.wikipedia.org/wiki/Backpropagation
CONVOLUTIONAL NEURAL NETWORK
WHY CNN?
• ConvNets are powerful due to their ability to extract the core features of an image
and use these features to identify images that contain features like them.

• Even in a two layer CNN we can start to see the network paying a lot of attention
to regions like the whiskers, nose, and eyes of the cat.

• These are the types of features that would allow the CNN to differentiate a cat
from a bird for example.

Source:
https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
NEURAL NETWORK

Source:
https://cs231n.github.io
CNN ARCHITECTURE

Source: https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177
CNN LAYERS
• Convolutional layers
• Activation layers
• Pooling layers
• Fully Connected Layer
CONVOLUTIONAL LAYER
• The convolutional layer is the core building block of a CNN.
• The CONV layer’s parameters consist of a set of learnable filters (kernels).
• The conv layer preserves the spatial structure of the image.
• As a filter moves over an image, it effectively checks for a pattern in that section of the image.
• During training the filter weights change; when it is time to evaluate an image, a filter returns a high value wherever it sees a pattern it has learned before.
• The combination of high responses from various filters lets the network predict the content of an image.
Source:https://en.wikipedia.org/wiki/Convolutional_neural_network
https://cs231n.github.io
https://medium.com/@Aj.Cheng/convolutional-neural-network-d9f69e473feb
CONVOLUTED IMAGE

Source: https://cs231n.github.io
CONVOLUTION

Source: https://cs231n.github.io
CONVOLUTION EXAMPLE

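Image (5×5):
1 1 1 0 1
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Kernel (3×3):
1 0 1
0 1 0
1 0 1

Convolved Feature (3×3): each entry is the sum of the elementwise products of the kernel and the image window beneath it.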
CONVOLUTION EXAMPLE

Source: https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177
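As a sketch of the convolution example, the function below slides a 3×3 kernel over a 5×5 image with stride 1 and no padding. The image and kernel digits are taken from the slide text as it was extracted; the function name is illustrative, and square inputs are assumed.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the elementwise
    products for each window. Assumes a square image and a square kernel."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]

image = [[1, 1, 1, 0, 1],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature = convolve2d_valid(image, kernel)
for row in feature:
    print(row)
```

Like most CNN libraries, this sketch does not flip the kernel (strictly speaking it computes a cross-correlation); for the symmetric kernel used here the result is the same either way.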
CONVOLUTION(IMPORTANT TERMINOLOGY)
• Stride: The distance the window moves each time.
• Kernel: The “window” that moves over the image.
• Depth: Depth of the output volume is a hyperparameter. It corresponds to the
number of filters we would like to use, each learning to look for something
different in the input.
• Zero-padding: A hyperparameter giving the number of zeros added around the border of the input. It is often chosen to exactly preserve the spatial size of the input volume, so that the input and output width and height are the same.
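These hyperparameters determine the spatial size of the output. A small helper, following the usual relation (input - kernel + 2 * padding) / stride + 1, might look like this; the function name and the error handling are illustrative choices.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of the output volume for one spatial dimension."""
    span = input_size - kernel_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel, stride and padding do not tile the input evenly")
    return span // stride + 1

print(conv_output_size(5, 3))             # 5x5 input, 3x3 kernel, no padding
print(conv_output_size(5, 3, padding=1))  # zero-padding of 1 preserves the size
print(conv_output_size(7, 3, stride=2))
```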
MULTIPLE FILTERS

Source: https://cs231n.github.io
SIMPLE FULLY CONNECTED NN VS CNN

CNN retains the structure of the image

Source: https://cs231n.github.io
ACTIVATION LAYER
• The purpose of the Activation Layer is to pass the output of the Convolution Layer through a nonlinear function. Sigmoid squashes values into (0,1) and tanh into (-1,1), while ReLU sets negative values to zero and leaves positive values unchanged.
• This layer increases the nonlinear properties of the model and the overall network without affecting the receptive fields of the convolution layer.
• Examples: tanh, sigmoid, ReLU

Source: https://cs231n.github.io
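The three example activation functions can be written in a few lines. This is a plain-Python sketch applied to single values; in practice they are applied elementwise to a whole feature map.

```python
import math

def sigmoid(x):
    """Squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes x into the range (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Sets negative values to zero and leaves positive values unchanged."""
    return max(0.0, x)

for value in (-2.0, 0.0, 3.0):
    print(value, sigmoid(value), tanh(value), relu(value))
```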
DIFFERENT ACTIVATION FUNCTIONS

Source: https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
MULTIPLE LAYERS OF CNN AND RELU

Source: https://cs231n.github.io
POOLING LAYER
• The Pooling Layer’s function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and hence also helps control overfitting.
• Max pooling and average pooling are the most common pooling functions. Max pooling takes the largest value from the window of the image currently covered by the kernel, while average pooling takes the average of all values in the window.
POOLING LAYER(MAX POOL)

Source: https://sefiks.com/2017/11/03/a-gentle-introduction-to-convolutional-neural-networks/
POOLING LAYER(GRAPHICAL
REPRESENTATION)

Source:https://ithelp.ithome.com.tw/articles/10187424
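Max and average pooling can be sketched with one helper. The 4×4 feature map below is made up for illustration, and the 2×2 window with stride 2 is a typical but assumed choice.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarize each size x size window by its max or its average."""
    reduce_fn = max if mode == "max" else (lambda w: sum(w) / len(w))
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [[reduce_fn([feature_map[r * stride + i][c * stride + j]
                        for i in range(size) for j in range(size)])
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical 4x4 feature map.
feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

print(pool2d(feature_map, mode="max"))
print(pool2d(feature_map, mode="average"))
```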
SUMMARY OF CNN LAYERS
• Convolutional layers multiply the kernel values by the image window beneath them, sum the results, and optimize the kernel weights over time using gradient descent
• Pooling layers describe a window of an image using a single value, which is the max or the average of that window (Max Pool vs. Average Pool)
• Activation layers apply a nonlinear function to the values; sigmoid and tanh squash them into a range, (0,1) or (-1,1) respectively.
• Fully Connected Layer Neurons have full connections to all activations in the
previous layer, as seen in regular Neural Networks. Their activations can hence
be computed with a matrix multiplication followed by a bias offset.

Source: https://cs231n.github.io
DEMO
• https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
IMAGENET CLASSIFICATION WITH DEEP CONVOLUTIONAL
NEURAL NETWORKS

By Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Published in: Advances in neural information processing systems (2012)

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks."
Advances in neural information processing systems. 2012.
Outline

● Goal
● Dataset
● Architecture
● Overfitting
● Reducing Overfitting
● Results
ILSVRC: ImageNet Large Scale Visual Recognition Competition

● Annual competition of image classification at large scale
● 1.2M images in 1K categories
● Classification: make 5 guesses about the image label

Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf
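Making 5 guesses per image leads to the top-5 error rate: an image counts as an error only if the true label is not among the 5 highest-scoring labels. A rough sketch, with made-up scores over 6 classes:

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)

# Hypothetical scores for 3 images over 6 classes, with the true class indices.
score_rows = [
    [0.90, 0.02, 0.03, 0.01, 0.03, 0.01],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
]
true_labels = [0, 4, 5]

print(top_k_error(score_rows, true_labels, k=1))  # top-1 error
print(top_k_error(score_rows, true_labels, k=5))  # top-5 error
```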
Goal

Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf


DATASET

● The dataset used was a subset of the ImageNet dataset with roughly 1000 images of each of the 1000 categories.
● In all, there were roughly
○ 1.2 million training images
○ 50,000 validation images
○ 150,000 test images

PREPROCESSING OF DATA

● ImageNet consisted of variable-resolution images, thus each image was downsampled to a fixed resolution of 256 x 256.
● Given a rectangular image, the image was rescaled such that the shorter side was of length 256, and then the central 256×256 patch was cropped out from the resulting image.
● So the network was trained on (centered) raw RGB values of the pixels.
Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in
neural information processing systems. 2012.
THE ARCHITECTURE

Image Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks."
Advances in neural information processing systems. 2012.
THE ARCHITECTURE

● The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected layers.

● The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.

Layer sequence:
● CONV1
● MAX POOL1
● NORM1
● CONV2
● MAX POOL2
● NORM2
● CONV3
● CONV4
● CONV5
● MAX POOL3
● FC6
● FC7
● FC8

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks."
Advances in neural information processing systems. 2012.
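The softmax that turns the last layer's scores into a distribution over class labels can be sketched as follows. The three-score input is a toy stand-in for the 1000-way output, and subtracting the maximum score is a standard numerical-stability trick rather than something stated in the slides.

```python
import math

def softmax(scores):
    """Map a list of scores to probabilities that are positive and sum to 1."""
    shift = max(scores)                      # avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for 3 classes.
probs = softmax([3.2, 5.1, -1.7])
print(probs, sum(probs))
```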
THE ARCHITECTURE

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks."
Advances in neural information processing systems. 2012.
Training on Multiple GPUs

Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf


Overfitting

● 60 million parameters, 650,000 neurons
○ Overfits a lot
REDUCING OVERFITTING
● The focus of this paper was to reduce overfitting whilst outperforming state-of-the-art
models.
● The two ways implemented to reduce overfitting were:
○ Data Augmentation
○ Dropout
● Data Augmentation: It is the process of artificially enlarging the dataset using label-
preserving transformations. It was done in two ways:
○ Generated image translations and horizontal reflections. This is done by extracting
random 224 × 224 patches (and their horizontal reflections) from the 256×256 images
and training the network on these extracted patches. This increases the size of the
training set by a factor of 2048.
○ Altered the intensities of the RGB channels in training images
● Dropout: It is a method of setting the output of each hidden neuron to zero with probability of 0.5. Use of dropout forces the network to learn more robust features while avoiding overfitting.
Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
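Both techniques can be sketched in plain Python. This is an illustration under simplifying assumptions, not the paper's GPU implementation: images are nested lists, the helper names are hypothetical, and only the first augmentation (random crops and reflections) is shown. The factor of 2048 quoted above corresponds to 32 × 32 crop offsets times 2 reflections.

```python
import random

def random_crop_and_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch and mirror it horizontally half the time.
    image is a list of rows, each row a list of pixels."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop + 1)
    left = rng.randrange(width - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]   # horizontal reflection
    return patch

def dropout(activations, p=0.5, rng=random):
    """Training-time dropout: zero each activation with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

# Hypothetical 256x256 "image" whose pixels record their own coordinates.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image, rng=random.Random(0))
print(len(patch), len(patch[0]))

hidden = dropout([1.0] * 10, rng=random.Random(1))
print(hidden)
```

Because a different random subset of neurons is dropped for every training example, no neuron can rely on the presence of particular other neurons.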
RESULTS

● The network won the ILSVRC-2012 contest with a top-5 test set error rate of 15.3%. On the ILSVRC-2010 test set it achieved top-1 and top-5 error rates of 37.5% and 17.0%.

Image Source: http://teleported.in/posts/decoding-resnet-architecture/
