CSE 634: Data Mining

Professor: Anita Wasilewska

IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORKS
REFERENCES
• http://www3.cs.stonybrook.edu/~cse634/L7ch6NN.pdf
• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
• https://deeplearning.web.unc.edu/files/2016/12/An-overview-of-gradient-descent-optimization-algorithm.pdf
• https://www.slideshare.net/infobuzz/back-propagation
• http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
• https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
• https://cs231n.github.io
• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
• https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177
• https://www.kdnuggets.com/2016/06/visual-explanation-backpropagation-algorithm-neural-networks.html
• https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
• http://www.emergentmind.com/neural-network
• https://en.wikipedia.org/wiki/Convolutional_neural_network
Paper
• Name: "Imagenet classification with deep convolutional neural networks."
• Authors: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton
• Published in: Advances in neural information processing systems. 2012
• Competition: ILSVRC (ImageNet Large Scale Visual Recognition Competition) 2012
OVERVIEW

• Introduction to Image Classification

• Loss functions, Optimization and Gradient descent
• Neural Networks and Backpropagation Algorithm
• Convolutional Neural Networks
• Paper : Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification
with deep convolutional neural networks." Advances in neural information processing
systems. 2012.
IMAGE CLASSIFICATION

Input: An image (a matrix of pixel values).

Categories/Labels: A set of predetermined values, one of which describes each image.

Output: The label corresponding to the input image.

Image source: www.tensorflow.org
CHALLENGES
• Illumination
• Deformation
• Occlusion
• Background clutter

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
INITIAL ATTEMPTS

Detect edges

• Compute explicit "rules" based on corners and boundaries, and identify labels from these rules.
Example: two lines meeting at a corner are a cat's ears.
Pitfalls
• Time-consuming, since we have to start all over again for every other object label.

John Canny, "A Computational Approach to Edge Detection," IEEE TPAMI, 1986

A DATA DRIVEN APPROACH

[Figure: a classifier is built from the training data and then applied to the test data to produce an output label.]

Image source: https://www.cs.toronto.edu/~kriz/cifar.html
K-NEAREST NEIGHBORS
• Simply memorize all training data and labels.
• Use a distance metric, e.g. L1 or L2 distance, to find the K nearest neighbors of the test image, i.e. the K training images with the smallest distance to it.
• Take a majority vote among the K neighbors; the winning label is assigned to the test image.
• Choose K using the training data (ideally a held-out validation split) and evaluate it on the test data.

Pitfalls
• Distance metrics on raw pixels are not very effective.
• Curse of dimensionality.

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
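The nearest-neighbor procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the cited lecture: the function names and the tiny 4-pixel "images" are made up, and L1 distance is chosen as one of the two metrics the slide mentions.

```python
from collections import Counter

def l1_distance(a, b):
    """Sum of absolute pixel differences between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b))

def knn_predict(train_images, train_labels, test_image, k=3):
    """Label test_image by majority vote among its k nearest training images."""
    # "Training" is just memorising the data; all work happens at prediction time.
    by_distance = sorted(
        zip(train_images, train_labels),
        key=lambda pair: l1_distance(pair[0], test_image),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 4-pixel "images" (made-up values for illustration only).
train_images = [[10, 10, 10, 10], [12, 9, 11, 10],
                [200, 210, 190, 205], [198, 202, 207, 199]]
train_labels = ["cat", "cat", "ship", "ship"]

prediction = knn_predict(train_images, train_labels, [11, 10, 10, 12], k=3)
print(prediction)
```

Every prediction scans the whole training set, which is one practical reason the method scales poorly to large image collections.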
LINEAR CLASSIFICATION
A linear classifier is of the form
f(x,W) = Wx + b
x – input vector {x1, x2, ..., xn}, where xi is the value of one pixel dimension
W – weights assigned to each pixel dimension for each label, determined from the training data
b – bias for each label
f(x,W) – vector of scores, one per label

Source: https://www.pyimagesearch.com/2016/08/22/an-intro-to-linear-classification-with-python/
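As a rough sketch of the score function f(x,W) = Wx + b, the snippet below computes one score per label for a 4-pixel input. All numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not the values from the lecture figure.

```python
def linear_scores(W, x, b):
    """Return f(x, W) = Wx + b as a list with one score per label."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]

# Hypothetical values: 3 labels, an "image" flattened to 4 pixels.
W = [[1.0, 0.0, 2.0, 0.0],   # weights for label 0
     [0.0, 1.0, 0.0, 1.0],   # weights for label 1
     [1.0, 1.0, 1.0, 1.0]]   # weights for label 2
x = [1.0, 2.0, 3.0, 4.0]
b = [0.5, -1.0, 0.0]

scores = linear_scores(W, x, b)
# The predicted label is the one with the highest score.
predicted_label = max(range(len(scores)), key=lambda j: scores[j])
print(scores, predicted_label)
```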
INTERPRETING A LINEAR CLASSIFIER

• Each image is a point in a high-dimensional space.

• The linear classifier draws linear decision boundaries, each separating one category from the rest of the categories.

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
AN EXAMPLE
[Figure: W x + b = scores, where x is the image stretched into a column vector and the result holds the dog score, cat score and ship score.]

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture2.pdf
LOSS FUNCTIONS

• Loss functions for classification are computationally feasible functions representing the price paid for inaccurate predictions in classification problems (problems of identifying which category a particular observation belongs to).

• A loss function describes how far the result your network produced is from the expected result; it indicates the magnitude of the error your model made on its prediction.

Source: https://en.wikipedia.org/wiki/Loss_functions_for_classification
https://stackoverflow.com/questions/42877989/what-is-a-loss-function-in-simple-words
A loss function tells how good our current classifier is.

For three training images, the scores f(x, W) are:

            image 1    image 2    image 3
cat           3.2        1.3        2.2
car           5.1        4.9        2.5
frog         -1.7        2.0       -3.1

(The correct labels are cat for image 1, car for image 2 and frog for image 3.)

Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the image and $y_i$ is the (integer) label, the loss over the dataset is the average of the loss over the examples:

$L = \frac{1}{N} \sum_{i} L_i(f(x_i, W), y_i)$

Multiclass SVM loss ("hinge loss"):
Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the scores vector, the SVM loss has the form:

$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$

Image 1 (cat):
L_1 = max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
    = max(0, 2.9) + max(0, -3.9)
    = 2.9 + 0
    = 2.9

Image 2 (car):
L_2 = max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
    = max(0, -2.6) + max(0, -1.9)
    = 0 + 0
    = 0

Image 3 (frog):
L_3 = max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
    = max(0, 6.3) + max(0, 6.6)
    = 6.3 + 6.6
    = 12.9

Loss over full dataset is average:
L = (2.9 + 0 + 12.9)/3 = 5.27

Image sources: https://www.pinterest.com/pin/349732727304147554
https://www.freepik.com/free-photo/car-in-glossy-red_758995.htm#term=convert&page=1&position=42
https://study.com/academy/lesson/what-is-a-natural-habitat-definition-habitat-destruction-quiz.html
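The worked hinge-loss arithmetic above can be checked with a short script. This is a sketch under the assumption that the three score columns belong to a cat, a car and a frog image respectively (which is what the per-image calculations imply); the function name `svm_loss` is illustrative.

```python
def svm_loss(scores, correct_index, margin=1.0):
    """Multiclass SVM (hinge) loss for a single example."""
    correct = scores[correct_index]
    return sum(
        max(0.0, s - correct + margin)
        for j, s in enumerate(scores)
        if j != correct_index          # the correct class itself is skipped
    )

# Scores from the table above, one list per image, ordered (cat, car, frog).
examples = [
    ([3.2, 5.1, -1.7], 0),   # image 1, true label cat
    ([1.3, 4.9, 2.0], 1),    # image 2, true label car
    ([2.2, 2.5, -3.1], 2),   # image 3, true label frog
]

losses = [svm_loss(s, y) for s, y in examples]
mean_loss = sum(losses) / len(losses)   # loss over the full dataset
print([round(l, 2) for l in losses], round(mean_loss, 2))
```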
OVERFITTING

Data loss: Model predictions should match training data.

Regularization: Model should be "simple", so it works on test data.

Source: http://cs231n.stanford.edu/2017/
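One common way to combine the two terms is a weighted sum. The form below follows the notation of the cited CS231n course rather than anything recoverable from this slide; the symbols $\lambda$ (regularization strength) and $R(W)$ (a penalty on the weights, here the squared L2 norm as one possible choice) are assumptions introduced for illustration.

```latex
L(W) = \underbrace{\frac{1}{N} \sum_{i=1}^{N} L_i\big(f(x_i, W),\, y_i\big)}_{\text{data loss}}
     + \underbrace{\lambda\, R(W)}_{\text{regularization}},
\qquad
R(W) = \sum_{k} \sum_{l} W_{k,l}^{2}
```

A larger $\lambda$ pushes the model toward smaller weights at the cost of a worse fit to the training data.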
OPTIMIZATION

• Optimization Algorithms are used to update weights and biases i.e. the internal
parameters of a model to reduce the error.

Source: https://medium.com/data-science-group-iitr/loss-functions-and-optimization-algorithms-demystified-bb92daff331cs
Gradient Descent
• Gradient descent is a way to minimize an objective function J(w) parameterized by a model's parameters w.

• It updates the parameters in the opposite direction of the gradient of the objective function with respect to the parameters, ∇wJ(w).

• The learning rate η determines the size of the steps we take to reach a (local) minimum.

Source: https://deeplearning.web.unc.edu/files/2016/12/An-overview-of-gradient-descent-optimization-algorithm.pdf
https://giphy.com/gifs/gradient-O9rcZVmRcEGqI
Gradient Descent
Vanilla Gradient Descent Algorithm:

• Start with an initial set of coefficients for the function

These could be 0.0 or a small random value.
coefficient = 0.0
• Calculate the derivative of the cost. The derivative is a concept from calculus and refers to the slope of the
function at a given point. We need to know the slope so that we know the direction(sign) to move the
coefficient values in order to get a lower cost on the next iteration.
delta = derivative(cost)
• Now that we know from the derivative which direction is downhill, we can now update the coefficient
values.
• A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on
each update.
coefficient = coefficient – (alpha * delta)
• This process is repeated until the cost of the coefficients (cost) is 0.0 or close enough to zero to be good
enough.

Source: https://machinelearningmastery.com/gradient-descent-for-machine-learning/
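The update rule above (coefficient = coefficient – alpha * delta) can be tried on a toy problem. The cost function J(w) = (w - 3)^2 below is a made-up example, and a fixed number of steps stands in for the "close enough to zero" stopping test.

```python
def gradient_descent(derivative, start=0.0, alpha=0.1, steps=100):
    """Repeat coefficient = coefficient - alpha * delta for a fixed number of steps."""
    coefficient = start
    for _ in range(steps):
        delta = derivative(coefficient)             # slope of the cost at this point
        coefficient = coefficient - alpha * delta   # step in the downhill direction
    return coefficient

# Hypothetical cost J(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def cost(w):
    return (w - 3.0) ** 2

def derivative(w):
    return 2.0 * (w - 3.0)

w_final = gradient_descent(derivative, start=0.0, alpha=0.1, steps=100)
print(w_final, cost(w_final))
```

With alpha = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; a much larger alpha (above 1.0 for this cost) would make the iterates diverge instead.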
NEURAL NETWORKS AND
BACKPROPAGATION ALGORITHM
INTRODUCTION

• Backpropagation is a method used in artificial neural networks to calculate the gradient that is needed to update the weights of the network. It is commonly used to train deep neural networks, a term referring to neural networks with more than one hidden layer.
• The term is an abbreviation for "backward propagation of errors".

Source: https://www.slideshare.net/infobuzz/back-propagation
https://en.wikipedia.org/wiki/Backpropagation
INTUITION
• As the algorithm's name implies, the errors (and therefore the learning) propagate backwards from the output nodes to the inner nodes.
• So technically speaking, backpropagation is used to calculate the gradient of the error of the network with respect to the network's modifiable weights.
• This gradient is almost always then used in a simple stochastic gradient descent algorithm to find weights that minimize the error.

• Inputs xi are fed through input connections
• Specific functions are modeled using real weights wi
• The output of the neuron is a nonlinear function f of its weighted inputs
• Nodes whose inputs arise outside the network are called input nodes and simply copy values
• An input may excite or inhibit the response of the neuron to which it is applied, depending upon the weight of the connection

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
Weights
• Normally, positive weights are considered as excitatory while negative weights are
thought of as inhibitory
• Learning is the process of modifying the weights in order to produce a network that
performs some function

Output
The response function is normally nonlinear
Examples include
• Sigmoid
• Piecewise linear

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
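A single neuron with a sigmoid response can be sketched as below. The logistic form 1 / (1 + e^-z) is the standard sigmoid; the input and weight values are hypothetical and only meant to show how the sign of the weights excites or inhibits the output.

```python
import math

def sigmoid(z):
    """Logistic function, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights):
    """Nonlinear function f applied to the weighted sum of the inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)

# Hypothetical inputs and weights: positive weights excite, negative ones inhibit.
inputs = [1.0, 1.0]
excited = neuron_output(inputs, [2.0, 1.0])      # weighted sum = 3
inhibited = neuron_output(inputs, [-2.0, -1.0])  # weighted sum = -3
balanced = neuron_output(inputs, [1.0, -1.0])    # weighted sum = 0
print(excited, inhibited, balanced)
```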
BACKPROPAGATION PREPARATION

• Training Set
A collection of input-output patterns that are used to train the network

• Testing Set
A collection of input-output patterns that are used to assess network performance

• Learning Rate (α)
A scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments

• Root-Mean-Squared Error (RMSE)

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
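The slide names RMSE without a surviving formula, so the sketch below uses the usual definition: the square root of the mean of the squared differences between targets and outputs. How the original slides averaged over patterns and output neurons is not recoverable, so treat the exact normalisation as an assumption.

```python
import math

def rmse(targets, outputs):
    """Root-mean-squared error over paired target and output values."""
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical targets and network outputs.
error = rmse([1.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5])
perfect = rmse([1.0, 0.0], [1.0, 0.0])
print(error, perfect)
```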
A PSEUDO-CODE ALGORITHM
• Randomly choose the initial weights
• While error is too large
    • For each training pattern (presented in random order)
        • Apply the inputs to the network
        • Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer
        • Calculate the error at the outputs
        • Use the output error to compute error signals for pre-output layers
        • Use the error signals to compute weight adjustments
        • Apply the weight adjustments
    • Periodically evaluate the network performance

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
APPLY INPUTS FROM A PATTERN

• Apply the value of each input parameter to each input node
• Input nodes compute only the identity function

[Figure: feedforward network, with signals flowing from inputs to outputs.]

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
CALCULATE OUTPUTS FOR EACH NEURON BASED ON THE PATTERN

• The output from neuron j for pattern p is $O_{pj}$, where

$O_{pj} = f(net_{pj}) = \frac{1}{1 + e^{-net_{pj}}}$ and $net_{pj} = \sum_{k} W_{jk}\, O_{pk}$

(f is the sigmoid response function, in keeping with the error signals that follow)

• k ranges over the input indices and $W_{jk}$ is the weight on the connection from input k to neuron j

[Figure: feedforward network, with signals flowing from inputs to outputs.]

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
CALCULATE THE ERROR SIGNAL FOR EACH OUTPUT NEURON
• The output neuron error signal $\delta_{pj}$ is given by $\delta_{pj} = (T_{pj} - O_{pj})\, O_{pj}\, (1 - O_{pj})$

• $T_{pj}$ is the target value of output neuron j for pattern p

• $O_{pj}$ is the actual output value of output neuron j for pattern p

Source:
http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
CALCULATE THE ERROR SIGNAL FOR EACH HIDDEN NEURON
• The hidden neuron error signal $\delta_{pj}$ is given by

$\delta_{pj} = O_{pj}\,(1 - O_{pj}) \sum_{k} \delta_{pk}\, W_{kj}$

where $\delta_{pk}$ is the error signal of a post-synaptic neuron k and $W_{kj}$ is the weight of the connection from hidden neuron j to the post-synaptic neuron k

• Compute weight adjustments $\Delta W_{ji}$ at time t by

$\Delta W_{ji}(t) = \alpha\, \delta_{pj}\, O_{pi}$

• Apply weight adjustments according to

$W_{ji}(t+1) = W_{ji}(t) + \Delta W_{ji}(t)$

Source: http://people.uncw.edu/tagliarinig/Courses/415/Lectures/An%20Introduction%20To%20The%20Backpropagation%20Algorithm.ppt
https://giphy.com/gifs/neural-networks-4LiMmbAcvgTQs
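The feedforward pass, the two error signals and the weight update can be put together in one small training step. This is a sketch under several assumptions that are not in the slides: a 2-input, 2-hidden-neuron, 1-output network, sigmoid activations, no bias terms, and made-up starting weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, W_hidden, W_out, alpha=0.5):
    """One backpropagation update for a tiny sigmoid network (no biases)."""
    # Feedforward: each neuron outputs f(sum_k W_jk * O_pk).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    output = sigmoid(sum(w * h for w, h in zip(W_out, hidden)))
    # Output error signal: (T - O) * O * (1 - O).
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden error signals: O_j * (1 - O_j) * sum_k delta_k * W_kj
    # (the sum has a single term because there is one output neuron).
    delta_hidden = [h * (1.0 - h) * delta_out * w for h, w in zip(hidden, W_out)]
    # Weight adjustments: W_ji(t+1) = W_ji(t) + alpha * delta_j * O_i.
    new_W_out = [w + alpha * delta_out * h for w, h in zip(W_out, hidden)]
    new_W_hidden = [[w + alpha * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(W_hidden, delta_hidden)]
    return new_W_hidden, new_W_out, output

# Hypothetical starting weights, input pattern and target.
W_hidden, W_out = [[0.1, -0.2], [0.3, 0.4]], [0.5, -0.5]
x, target = [1.0, 0.0], 1.0

outputs = []
for _ in range(200):
    W_hidden, W_out, out = train_step(x, target, W_hidden, W_out)
    outputs.append(out)
print(outputs[0], outputs[-1])
```

Repeating the step on the same pattern moves the output toward the target, which is the behaviour the error signals are designed to produce.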
MERITS AND DEMERITS OF BACKPROPAGATION

MERITS
• Relatively simple implementation.
• The mathematical formula used in the algorithm can be applied to any network. It does not require any special mention of the features of the function to be learnt.
• Batch update of weights exists, which provides a smoothing effect on the weight correction terms.

DEMERITS
• Slow and inefficient. Can get stuck in local minima, resulting in sub-optimal solutions.
• A large amount of input/output data is available, but you're not sure how to relate it to the output.
• Outputs can be "fuzzy" or non-numeric.

Source: https://www.slideshare.net/infobuzz/back-propagation
https://en.wikipedia.org/wiki/Backpropagation
CONVOLUTIONAL NEURAL NETWORK
WHY CNN?
• ConvNets are powerful due to their ability to extract the core features of an image
and use these features to identify images that contain features like them.

• Even in a two-layer CNN we can start to see the network paying a lot of attention to regions like the whiskers, nose, and eyes of the cat.

• These are the types of features that would allow the CNN to differentiate a cat from a bird, for example.

Source: https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
NEURAL NETWORK
[Figure: neural network diagram.]
Source: https://cs231n.github.io
CNN ARCHITECTURE
[Figure: CNN architecture diagram.]
Source: https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177
CNN LAYERS
• Convolutional layers
• Activation layers
• Pooling layers
• Fully Connected Layer
CONVOLUTIONAL LAYER
• The convolutional layer is the core building block of a CNN.
• The CONV layer's parameters consist of a set of learnable filters (kernels).
• The CONV layer preserves the spatial structure of the image.
• As a filter moves over the image, it effectively checks for a pattern in that section of the image.
• During training, the filter weights change; when it is time to evaluate an image, a filter returns high values wherever it sees a pattern it has learned.
• The combination of high values from various filters lets the network predict the content of an image.
Source: https://en.wikipedia.org/wiki/Convolutional_neural_network
https://cs231n.github.io
https://medium.com/@Aj.Cheng/convolutional-neural-network-d9f69e473feb
CONVOLUTED IMAGE
[Figure: convolved image.]
Source: https://cs231n.github.io
CONVOLUTION
[Figure: convolution operation.]
Source: https://cs231n.github.io
CONVOLUTION EXAMPLE

Image (5×5):
1 1 1 0 1
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Kernel (3×3):
1 0 1
0 1 0
1 0 1

[Figure: convolved feature obtained by sliding the kernel over the image.]
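The sliding-window computation in the example above can be reproduced with a few loops. The image and kernel values are copied as they appear in this transcript, so any extraction error in the slide carries over; the function name and the assumption of a square image and kernel are illustrative.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum elementwise products."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature.append(row)
    return feature

# Values as transcribed in the example above.
image = [[1, 1, 1, 0, 1],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

convolved = convolve2d_valid(image, kernel)
for row in convolved:
    print(row)
```

As in most CNN libraries, the kernel is applied without flipping (strictly a cross-correlation); for this symmetric kernel the distinction makes no difference.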
CONVOLUTION EXAMPLE
[Figure: convolution example.]
Source: https://medium.com/dbrs-innovation-labs/visualizing-neural-networks-in-virtual-space-7e3f62f7177
CONVOLUTION(IMPORTANT TERMINOLOGY)
• Stride: The distance the window moves each time.
• Kernel: The “window” that moves over the image.
• Depth: Depth of the output volume is a hyperparameter. It corresponds to the
number of filters we would like to use, each learning to look for something
different in the input.
• Zero-padding: Hyperparameter. We will use it to exactly preserve the spatial size
of the input volume so the input and output width and height are the same
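How stride, kernel size and zero-padding determine the output size can be sketched with the formula (W - F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding and S the stride. The formula comes from the cited CS231n notes rather than from this slide, and the function name is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = input_size - kernel_size + 2 * padding
    if span % stride != 0:
        raise ValueError("kernel does not tile the padded input evenly with this stride")
    return span // stride + 1

# Zero-padding that preserves the input size for a 3x3 kernel at stride 1.
same_padding = (3 - 1) // 2

sizes = {
    "5x5 input, 3x3 kernel, no padding": conv_output_size(5, 3),
    "5x5 input, 3x3 kernel, padding 1": conv_output_size(5, 3, padding=same_padding),
    "7x7 input, 3x3 kernel, stride 2": conv_output_size(7, 3, stride=2),
}
for description, size in sizes.items():
    print(description, "->", size)
```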
MULTIPLE FILTERS
[Figure: multiple filters applied to the same input.]
Source: https://cs231n.github.io
SIMPLE FULLY CONNECTED NN VS CNN

CNN retains the structure of the image

Source: https://cs231n.github.io
ACTIVATION LAYER
• The purpose of the activation layer is to apply a nonlinear function to the output of the convolution layer. Sigmoid squashes values into [0,1] and tanh into [-1,1], while ReLU only clips negative values to zero.
• This layer increases the nonlinear properties of the model and the overall network without affecting the receptive fields of the convolution layer.
• Examples: tanh, sigmoid, ReLU

Source: https://cs231n.github.io
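The three activation functions named above can be compared on a handful of inputs. This is a plain-Python sketch with made-up sample values; it only illustrates the output range of each function.

```python
import math

def sigmoid(z):
    """Output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Output in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Output in [0, infinity): negative inputs become zero."""
    return max(0.0, z)

samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
activated = {
    name: [fn(z) for z in samples]
    for name, fn in (("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu))
}
for name, values in activated.items():
    print(name, [round(v, 3) for v in values])
```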
DIFFERENT ACTIVATION FUNCTIONS
[Figure: plots of different activation functions.]
Source: https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59
MULTIPLE LAYERS OF CNN AND RELU
[Figure: multiple convolutional layers with ReLU.]
Source: https://cs231n.github.io
POOLING LAYER
• The pooling layer's function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and hence also helps control overfitting.
• Max pooling and average pooling are the most common pooling functions. Max pooling takes the largest value from the window of the image currently covered by the kernel, while average pooling takes the average of all values in the window.
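Both pooling functions can be sketched with one helper. The 4×4 feature map is hypothetical, and the 2×2 window with stride 2 is a common choice assumed here rather than stated on the slide.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarise each size x size window by its max or its average."""
    combine = max if mode == "max" else (lambda window: sum(window) / len(window))
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [
        [combine([feature_map[r + dr][c + dc]
                  for dr in range(size) for dc in range(size)])
         for c in cols]
        for r in rows
    ]

# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [4, 2, 0, 1],
               [3, 1, 2, 5]]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Either way the 4×4 map shrinks to 2×2, which is the reduction in spatial size the pooling layer is meant to provide.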
POOLING LAYER (MAX POOL)
[Figure: max pooling example.]
Source: https://sefiks.com/2017/11/03/a-gentle-introduction-to-convolutional-neural-networks/
POOLING LAYER (GRAPHICAL REPRESENTATION)
[Figure: graphical representation of pooling.]
Source: https://ithelp.ithome.com.tw/articles/10187424
SUMMARY OF CNN LAYERS
• Convolutional layers multiply the kernel values element-wise with each image window and optimize the kernel weights over time using gradient descent.
• Pooling layers describe a window of an image using a single value, which is the max or the average of that window (max pool vs. average pool).
• Activation layers squash the values into a range, typically [0,1] or [-1,1].
• Fully connected layer neurons have full connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset.

Source: https://cs231n.github.io
DEMO
• https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
IMAGENET CLASSIFICATION WITH DEEP CONVOLUTIONAL NEURAL NETWORKS

By Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Journal: Advances in neural information processing systems (2012)

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Outline

● Goal
● Dataset
● Architecture
● Overfitting
● Reducing Overfitting
● Results
ILSVRC: ImageNet Large Scale Visual Recognition Competition

● Annual competition of image classification at large scale

● 1.2M images in 1K categories
● Classification: make 5 guesses about the image label

Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf
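The "5 guesses" rule corresponds to a top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. The sketch below uses made-up scores over 6 classes instead of 1000, and the function name is hypothetical.

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)

# Hypothetical scores over 6 classes for 2 images.
score_rows = [[0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
              [0.30, 0.10, 0.25, 0.20, 0.10, 0.05]]
true_labels = [2, 5]

top1 = top_k_error(score_rows, true_labels, k=1)
top5 = top_k_error(score_rows, true_labels, k=5)
print(top1, top5)
```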
Goal
[Figure: illustration of the classification goal.]
Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

DATASET

● The dataset used was a subset of the ImageNet dataset with roughly 1000 images in each of the 1000 categories.
● In all, there were roughly:
○ 1.2 million training images
○ 50,000 validation images
○ 150,000 test images

PREPROCESSING OF DATA

● ImageNet consists of variable-resolution images, so each image was downsampled to a fixed resolution of 256 x 256.
● Given a rectangular image, the image was rescaled such that the shorter side was of length 256, and then the central 256×256 patch was cropped out from the resulting image.
● The network was then trained on the (centered) raw RGB values of the pixels.

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
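The geometry of the rescale-and-crop step can be sketched without an image library. The helper below only computes the rescaled size and the central crop box; the function names, the rounding choice and the 500×375 example are assumptions, and the actual pixel resampling is left to whatever imaging tool is used.

```python
def resize_and_center_crop_box(width, height, side=256):
    """Return the rescaled size and the central side x side crop box
    as (left, top, right, bottom)."""
    scale = side / min(width, height)            # shorter side becomes `side`
    new_w = max(side, round(width * scale))
    new_h = max(side, round(height * scale))
    left, top = (new_w - side) // 2, (new_h - side) // 2
    return (new_w, new_h), (left, top, left + side, top + side)

def center_pixels(pixels, mean):
    """Subtract a mean value from every pixel value (the "centered" input)."""
    return [p - mean for p in pixels]

# Hypothetical 500x375 landscape image.
new_size, crop_box = resize_and_center_crop_box(500, 375)
centered = center_pixels([10, 20, 30], 20)
print(new_size, crop_box, centered)
```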
THE ARCHITECTURE
[Figure: network architecture diagram from the paper.]
Image Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
THE ARCHITECTURE

● The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected layers.
● The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.

Layer sequence:
● CONV1
● MAX POOL1
● NORM1
● CONV2
● MAX POOL2
● NORM2
● CONV3
● CONV4
● CONV5
● MAX POOL3
● FC6
● FC7
● FC8

Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
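How a softmax turns the scores of the last fully connected layer into a distribution over class labels can be sketched as follows. Three made-up scores stand in for the 1000 outputs of the real network.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for 3 classes (the network in the paper has 1000).
probs = softmax([2.0, 1.0, 0.1])
uniform = softmax([0.0, 0.0])
print([round(p, 3) for p in probs], uniform)
```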
THE ARCHITECTURE
[Figure: layer-by-layer view of the architecture.]
Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

Overfitting

● 60 million parameters, 650,000 neurons
○ Overfits a lot
REDUCING OVERFITTING
● The focus of this paper was to reduce overfitting whilst outperforming state-of-the-art models.
● The two ways implemented to reduce overfitting were:
○ Data Augmentation
○ Dropout
● Data Augmentation: the process of artificially enlarging the dataset using label-preserving transformations. It was done in two ways:
○ Generating image translations and horizontal reflections. This is done by extracting random 224 × 224 patches (and their horizontal reflections) from the 256×256 images and training the network on these extracted patches. This increases the size of the training set by a factor of 2048 (32 × 32 translations × 2 reflections).
○ Altering the intensities of the RGB channels in training images.
● Dropout: a method of setting the output of each hidden neuron to zero with probability 0.5. Use of dropout forces the network to learn more robust features while avoiding overfitting.
Source: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
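Both techniques can be sketched in plain Python. The helpers below are illustrative rather than the paper's code: the image is a nested list of placeholder "pixels", the function names are made up, and the test-time scaling by 0.5 follows the paper's description of using all neurons with halved outputs.

```python
import random

def random_crop_and_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch from a square image; flip it half the time."""
    max_offset = len(image) - crop               # 32 offsets per axis for 256 -> 224
    top = rng.randrange(max(1, max_offset))
    left = rng.randrange(max(1, max_offset))
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]     # horizontal reflection
    return patch

def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero each activation with probability p_drop during training."""
    if not training:
        # Assumption following the paper: keep every neuron and scale its output.
        return [a * (1.0 - p_drop) for a in activations]
    return [0.0 if rng.random() < p_drop else a for a in activations]

# Placeholder 256x256 "image" whose pixels record their own coordinates.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image, rng=random.Random(0))
dropped = dropout([1.0] * 1000, rng=random.Random(0))
print(len(patch), len(patch[0]), dropped.count(0.0))
```

Because a fresh patch and a fresh dropout mask are drawn on every call, the network rarely sees exactly the same input or the same set of active neurons twice.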
RESULTS

● The network won the contest, achieving top-1 and top-5 test set error rates of 37.5% and 16.4%.

Image Source: http://teleported.in/posts/decoding-resnet-architecture/

No ratings yet
Navarro-Cerrillo Et Al - 2010 - Evaluating Models To Assess The Distribution of Buxus Balearica in Southern Spain
12 pages
Virtusa Som ML Resume
No ratings yet
Virtusa Som ML Resume
3 pages
Prediction of Automobile Insurance Fraud Claims Us
No ratings yet
Prediction of Automobile Insurance Fraud Claims Us
7 pages
Python Drone Tracker Project Report
No ratings yet
Python Drone Tracker Project Report
18 pages
LN and ML-based Model Architecture For Recruiting IT Professionals
No ratings yet
LN and ML-based Model Architecture For Recruiting IT Professionals
18 pages
A I IN FINANCE UT COURSE SYLLABUS & BIOS
No ratings yet
A I IN FINANCE UT COURSE SYLLABUS & BIOS
10 pages
ML Unit 1 MCQ
100% (1)
ML Unit 1 MCQ
9 pages
Clustering Algorithms
No ratings yet
Clustering Algorithms
61 pages
BAI601-NLP
No ratings yet
BAI601-NLP
5 pages
RED 4335 Field Experience Information Lesson Plan Template and Rubric
No ratings yet
RED 4335 Field Experience Information Lesson Plan Template and Rubric
11 pages
Download Methods of Multivariate Analysis (Wiley Series in Probability and Statistics Book 709) 3rd Edition – Ebook PDF Version ebook All Chapters PDF
100% (1)
Download Methods of Multivariate Analysis (Wiley Series in Probability and Statistics Book 709) 3rd Edition – Ebook PDF Version ebook All Chapters PDF
51 pages
Forecasting Cryptocurrency Returns From Sentiment Signals: An Analysis of BERT Classifiers and Weak Supervision
No ratings yet
Forecasting Cryptocurrency Returns From Sentiment Signals: An Analysis of BERT Classifiers and Weak Supervision
29 pages
University of Mumbai Sample MCQ Question Bank Course Code and Name: BDA ITC801 /R16 Class: BE Semester:8 Options A B C D
No ratings yet
University of Mumbai Sample MCQ Question Bank Course Code and Name: BDA ITC801 /R16 Class: BE Semester:8 Options A B C D
6 pages
Prediction of Kidney Failure Disease by Using Machine Learning
No ratings yet
Prediction of Kidney Failure Disease by Using Machine Learning
45 pages
Intro To ML PDF
No ratings yet
Intro To ML PDF
66 pages
ML Unit 1
No ratings yet
ML Unit 1
19 pages
Face Recognition Using LBPH
No ratings yet
Face Recognition Using LBPH
16 pages
E-Mail Spam Classification Via Machine Learning and Natural Language Processing
No ratings yet
E-Mail Spam Classification Via Machine Learning and Natural Language Processing
7 pages
Machine Learning Natural Language 2023
No ratings yet
Machine Learning Natural Language 2023
28 pages

19ImageClassification

Uploaded by

19ImageClassification

Uploaded by


• Introduction to Image Classification

Input: An image (matrix of pixel dimensions)

Categories/Labels: A set of pre-determined values which

Output: A label corresponding to the input image.

John Canny, “A Computational Approach to Edge Detection,” IEEE TPAMI, 1986

• Simply memorize all training data and labels

• Choose a K on the training data and evaluate it on the testing data

• Each image is a point in a high-dimensional space
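
The nearest-neighbor idea in the bullets above can be sketched in a few lines of plain Python. This is a minimal illustration, not the slides' own code: the class name `KNearestNeighbor`, the L1 distance, and the toy 4-pixel "images" are all assumptions chosen for the example.

```python
from collections import Counter


def l1_distance(a, b):
    """Sum of absolute pixel differences between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b))


class KNearestNeighbor:
    def __init__(self, k=1):
        self.k = k

    def train(self, images, labels):
        # "Training" only memorizes the data and labels.
        self.images = list(images)
        self.labels = list(labels)

    def predict(self, image):
        # Each image is a point in pixel space: rank the stored points by
        # distance to the query, then take a majority vote among the k closest.
        ranked = sorted(
            zip(self.images, self.labels),
            key=lambda pair: l1_distance(pair[0], image),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy 4-pixel "images" (hypothetical data, for illustration only).
train_images = [[0, 0, 0, 0], [1, 0, 1, 0], [9, 9, 8, 9], [8, 9, 9, 9]]
train_labels = ["dark", "dark", "bright", "bright"]

knn = KNearestNeighbor(k=3)
knn.train(train_images, train_labels)
prediction = knn.predict([7, 8, 9, 9])
print(prediction)
```

Prediction cost grows with the size of the training set, because every query is compared against every stored image.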

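Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$,

where $x_i$ is an image and $y_i$ is its (integer) label,

the loss over the dataset is the sum of the per-example losses, averaged over the $N$ examples:

$L = \frac{1}{N} \sum_{i} L_i(f(x_i, W), y_i)$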

Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label,

and using the shorthand $s = f(x_i, W)$ for the scores vector,

the SVM loss (“hinge loss”) has the form:

$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Class scores for three example images (one column per image):

cat 3.2 1.3 2.2

car 5.1 4.9 2.5

frog -1.7 2.0 -3.1
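
As a worked example, the multiclass SVM (hinge) loss can be evaluated on the cat/car/frog scores listed above. The sketch assumes that each column of the score table is one image, and that the three images are, in order, a cat, a car, and a frog. The extracted text does not state this, so `true_labels` is an assumption, as is the margin of 1.

```python
def svm_loss(scores, correct_index, margin=1.0):
    """Multiclass SVM loss for one example: sum over the wrong classes of
    max(0, s_j - s_correct + margin)."""
    correct_score = scores[correct_index]
    return sum(
        max(0.0, s - correct_score + margin)
        for j, s in enumerate(scores)
        if j != correct_index
    )


classes = ["cat", "car", "frog"]

# One score vector per image, read column-wise from the score table.
image_scores = [
    [3.2, 5.1, -1.7],  # first column
    [1.3, 4.9, 2.0],   # second column
    [2.2, 2.5, -3.1],  # third column
]
true_labels = [0, 1, 2]  # assumption: the images show a cat, a car, a frog

losses = [svm_loss(s, y) for s, y in zip(image_scores, true_labels)]
mean_loss = sum(losses) / len(losses)
print([round(loss, 2) for loss in losses], round(mean_loss, 2))
```

Under these assumptions the per-image losses come out to roughly 2.9, 0, and 12.9: the second image is already classified with a margin of at least 1, so it contributes nothing.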

Data loss: Model predictions should match the training data.

Regularization: Model should be simple, so that it also works on test data.
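
A small sketch of how the two terms combine into one objective, assuming an L2 penalty and a hypothetical regularization strength `reg_strength` (neither is specified in the text). The two weight rows below give the same score on an all-ones input, so the data loss cannot tell them apart, but the penalty prefers the spread-out one.

```python
def l2_penalty(weights):
    """L2 regularization term: sum of squared weights."""
    return sum(w * w for row in weights for w in row)


def full_loss(data_loss, weights, reg_strength):
    """Total objective = data loss + reg_strength * regularization penalty."""
    return data_loss + reg_strength * l2_penalty(weights)


x = [1.0, 1.0, 1.0, 1.0]
W_spread = [[0.25, 0.25, 0.25, 0.25]]
W_peaky = [[1.0, 0.0, 0.0, 0.0]]

# Both weight settings produce the same score for x ...
score_spread = sum(w * xi for w, xi in zip(W_spread[0], x))
score_peaky = sum(w * xi for w, xi in zip(W_peaky[0], x))

# ... so with an identical (hypothetical) data loss, only the penalty differs.
loss_spread = full_loss(0.5, W_spread, reg_strength=0.1)
loss_peaky = full_loss(0.5, W_peaky, reg_strength=0.1)
print(score_spread, score_peaky, loss_spread, loss_peaky)
```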

• Gradient descent updates the parameters in the opposite direction of the gradient of the objective function

• The learning rate η determines the size of the steps taken toward a (local) minimum

• Start with an initial set of coefficients for the function
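
The update rule described above can be sketched as a short loop. The quadratic objective, its gradient `grad_J`, and the step count are illustrative assumptions; only the rule "step against the gradient, scaled by the learning rate η" comes from the text.

```python
def gradient_descent(grad_fn, initial_params, learning_rate=0.1, steps=100):
    """Repeatedly move the parameters against the gradient."""
    params = list(initial_params)  # start from an initial set of coefficients
    for _ in range(steps):
        grad = grad_fn(params)
        # theta <- theta - eta * gradient
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Hypothetical objective: J(theta) = (theta0 - 3)^2 + (theta1 + 1)^2,
# whose minimum is at theta = (3, -1).
def grad_J(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]


theta = gradient_descent(grad_J, [0.0, 0.0], learning_rate=0.1, steps=200)
print(theta)
```

With a learning rate that is too large the iterates can overshoot and diverge; with one that is too small, convergence becomes slow.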

• Backpropagation is a method used in artificial neural networks to calculate a gradient that is needed to update the weights of the network

• Inputs xi are fed through input

• Apply the value of each input parameter to

• The output from neuron j for pattern p is Opj

• k ranges over the input indices and Wjk is the weight on the connection from input k to neuron j

• Tpj is the target value of output neuron j for pattern p

• Opj is the actual output value of output neuron j for pattern p

• Compute weight adjustments

ΔWji(t) = α δpj Opi

• Apply weight adjustments

Wji(t+1) = Wji(t) + ΔWji(t)
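
The two update steps above can be sketched for a single output neuron. The extracted text does not give the activation function or the error signal, so this example assumes a sigmoid activation and a squared-error objective, which yields δpj = (Tpj − Opj) Opj (1 − Opj). The function names and the OR training patterns are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_output_neuron(patterns, alpha=0.5, epochs=5000, seed=0):
    """Train one sigmoid output neuron j with the delta rule."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    # One weight per input i, plus a final bias weight.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for inputs, t_pj in patterns:
            o_pi = list(inputs) + [1.0]  # constant 1.0 acts as the bias input
            net = sum(w * o for w, o in zip(weights, o_pi))
            o_pj = sigmoid(net)
            # Error signal (assumes sigmoid activation and squared error).
            delta_pj = (t_pj - o_pj) * o_pj * (1.0 - o_pj)
            # Compute and apply: W_ji(t+1) = W_ji(t) + alpha * delta_pj * O_pi
            weights = [w + alpha * delta_pj * o for w, o in zip(weights, o_pi)]
    return weights


def predict(weights, inputs):
    return sigmoid(sum(w * o for w, o in zip(weights, list(inputs) + [1.0])))


# Logical OR as a toy training set: (inputs, target).
or_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_output_neuron(or_patterns)
print([round(predict(w, x), 2) for x, _ in or_patterns])
```

For hidden layers the same update applies, but δpj is obtained by propagating the output-layer error signals backward through the weights.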

• Relatively simple implementation.

• Slow and inefficient. Can get stuck in local minima.

CNN retains the structure of the image
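
To illustrate what retaining the image structure means, the sketch below slides a small filter over a 2-D image without flattening it, so the output is again a 2-D map whose positions correspond to image locations. The function name `conv2d_valid`, the toy image, and the edge filter are assumptions made for the example. As in most deep learning libraries, the operation is implemented as cross-correlation, with no padding and stride 1.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    2-D map of dot products between the kernel and each image patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Filter that responds to a dark-to-bright transition from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, which is exactly where the vertical edge sits in the input.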

By Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Journal: Advances in Neural Information Processing Systems, 2012

● Annual competition of image classification at large scale

Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

● The dataset used was a subset of ImageNet

● ImageNet consists of variable-resolution images

● The net contains eight layers with weights

● CONV1

Image Source: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

● 60 million parameters, 650,000 neurons
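
To see where parameter counts of this kind come from, the sketch below computes the output size and the number of learnable parameters of a single convolutional layer. The CONV1 figures (a 227x227x3 input, 96 filters of size 11x11, stride 4, no padding) are the commonly cited values for this network and are not stated in the extracted slides, so treat them as assumptions. Both function names are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = input_size - filter_size + 2 * padding
    if span % stride != 0:
        raise ValueError("filter does not tile the input evenly")
    return span // stride + 1


def conv_param_count(filter_size, in_channels, num_filters):
    """Each filter has F * F * C weights plus one bias."""
    return (filter_size * filter_size * in_channels + 1) * num_filters


# Assumed CONV1 configuration (not given in the slides' extracted text).
out_size = conv_output_size(input_size=227, filter_size=11, stride=4)
params = conv_param_count(filter_size=11, in_channels=3, num_filters=96)
print(out_size, params)
```

Under these assumptions CONV1 produces a 55x55x96 volume with about 35 thousand parameters, a small share of the 60 million total.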

● The network won the

Image Source: http://teleported.in/posts/