
IT5409 - Ch7 - Part2 - Object Recognition - v2 - 4pages

This document provides an overview of object recognition and image classification techniques in computer vision. It discusses challenges such as variability in viewpoint, illumination, scale, deformation, occlusion, background clutter, and intra-class variations that make object recognition difficult. The document then describes common approaches to image classification, including collecting labeled image data to train classifiers using machine learning algorithms. A "bag-of-words" model represents images as collections of local features without spatial or geometric information.



Computer Vision
Chapter 7 (part 2): Object recognition

Contents
• Overview of 'semantic vision'
• Image classification / recognition
• Bag-of-words
  ‒ Recall
  ‒ Vocabulary tree
• Classification
  ‒ K nearest neighbors
  ‒ Naïve Bayes
  ‒ Support vector machine

Overview of 'semantic vision'

Is this a street light?
(Recognition / classification)

Where are the people?
(Detection)

Is that Potala palace?
(Identification)

What's in the scene?
(Semantic segmentation)
(Figure: image regions labeled Sky, Mountain, Trees, Building, Vendors, People, Ground)

What type of scene is it?
(Scene categorization)
(Figure: scene labels Outdoor, Marketplace, City)

What are these people doing?
(Activity / Event Recognition)

Object recognition
Is it really so hard?
Find the chair in this image → output of normalized correlation → "This is a chair"

Object recognition
Is it really so hard?
Find the chair in this image → pretty much garbage.
Simple template matching is not going to make it.

"A popular method is that of template matching, by point to point correlation of a model
pattern with the image pattern. These techniques are inadequate for three-dimensional
scene analysis for many reasons, such as occlusion, changes in viewing angle, and
articulation of parts." Nevatia & Binford, 1977.

And it can get a lot harder
Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422.

Why is this hard?
Variability: camera position, illumination, shape parameters

Challenge: variable viewpoint
(Michelangelo, 1475-1564)

Challenge: variable illumination
(image credit: J. Koenderink)

Challenge: scale

Challenge: deformation

Challenge: occlusion
(Magritte, 1957)

Challenge: background clutter
(Kilmeny Niland, 1995)

Challenge: intra-class variations
(slide credit: Svetlana Lazebnik)

Challenge: background clutter

Image Classification / Recognition

Image Classification: Problem

Data-driven approach
• Collect a database of images with labels
• Use ML to train an image classifier
• Evaluate the classifier on test images

A simple pipeline

Training: Training Images + Training Labels → Image Features → Training → Learned Classifier
Testing:  Test Image → Image Features → Learned Classifier → Prediction
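A minimal sketch of this pipeline in Python, assuming scikit-learn for the classifier and a hypothetical extract_features() that maps an image to a fixed-length vector (e.g., the bag-of-words histogram described next); this is an illustration, not the course's reference code.

import numpy as np
from sklearn.svm import LinearSVC

def extract_features(image):
    # hypothetical placeholder, e.g., a bag-of-words histogram over visual words
    raise NotImplementedError

def train_classifier(train_images, train_labels):
    X = np.array([extract_features(img) for img in train_images])  # image features
    classifier = LinearSVC()                                        # learned classifier
    classifier.fit(X, train_labels)                                 # training
    return classifier

def predict(classifier, test_image):
    x = extract_features(test_image).reshape(1, -1)                 # test image features
    return classifier.predict(x)[0]                                 # prediction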

Bag of words
‒ Basic model: recall
‒ Vocabulary tree

What object do these parts belong to?
Some local features are very informative.

An object as a collection of local features (bag-of-features)
• deals well with occlusion
• scale invariant
• rotation invariant

(not so) crazy assumption:
spatial information of local features can be ignored for object recognition (i.e., verification)

Works pretty well for image-level classification
(CalTech6 dataset; Csurka et al. 2004, Willamowski et al. 2005, Grauman & Darrell 2005, Sivic et al. 2003, 2005)

Bag-of-features
Represent a data item (document, texture, image) as a histogram over features.
An old idea (e.g., texture recognition and information retrieval).

Texture recognition
Histogram over a universal texton dictionary.

Vector Space Model
A document (datapoint) is a vector of counts over each word (feature):
it counts the number of occurrences, i.e., it is just a histogram over words.

word     Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
doc 1:        1      6      2    1     0     0      0       1
doc 2:        0      4      0    1     4     5      3       2

What is the similarity between two documents?
Use any distance you want, but the cosine distance is fast.

G. Salton, 'Mathematics and Information Retrieval', Journal of Documentation, 1979.
(newspaper snippets: http://www.fodey.com/generators/newspaper/snippet.asp)
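As a small illustration, a sketch in Python/numpy using the two count vectors from the table above:

import numpy as np

# word counts over: Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor
d1 = np.array([1, 6, 2, 1, 0, 0, 0, 1], dtype=float)
d2 = np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float)

# cosine similarity between the two documents (about 0.49 here)
cos_sim = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
# the corresponding cosine distance is 1 - cos_sim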

TF-IDF (Term Frequency - Inverse Document Frequency)

But not all words are created equal: weigh each word by a heuristic,
the term frequency times the inverse document frequency
(the inverse document frequency down-weights common terms).
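A minimal sketch of the standard tf-idf weighting (hedged: the slides do not specify which exact variant they use):

import numpy as np

def tfidf(counts):
    """counts: (num_documents, num_words) array of raw word counts."""
    tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency
    df = (counts > 0).sum(axis=0)                      # document frequency of each word
    idf = np.log(counts.shape[0] / (df + 1e-9))        # inverse document frequency
    return tf * idf                                    # down-weights common terms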

Standard BOW pipeline (for image classification)
‒ Dictionary Learning: learn visual words using clustering
‒ Encode: build Bags-of-Words (BOW) vectors for each image
‒ Classify: train and test data using BOWs

Dictionary Learning: learn visual words using clustering
1. Extract features (e.g., SIFT) from images
2. Learn a visual dictionary (e.g., K-means clustering)
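A minimal sketch of these two steps, assuming OpenCV for SIFT and scikit-learn for K-means; the dictionary size k = 1000 and BGR input images are illustrative assumptions.

import cv2
import numpy as np
from sklearn.cluster import KMeans

def learn_dictionary(images, k=1000):
    sift = cv2.SIFT_create()
    descriptors = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, desc = sift.detectAndCompute(gray, None)   # 1. extract SIFT features
        if desc is not None:
            descriptors.append(desc)
    descriptors = np.vstack(descriptors)
    return KMeans(n_clusters=k).fit(descriptors)      # 2. cluster centers = visual words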

What kinds of features can we extract?
• Regular grid
  ‒ Vogel & Schiele, 2003
  ‒ Fei-Fei & Perona, 2005
• Interest point detector
  ‒ Csurka et al. 2004
  ‒ Fei-Fei & Perona, 2005
  ‒ Sivic et al. 2005
• Other methods
  ‒ Random sampling (Vidal-Naquet & Ullman, 2002)
  ‒ Segmentation-based patches (Barnard et al. 2003)

Feature extraction: detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03]
→ normalize patch → compute SIFT descriptor [Lowe '99]

How do we learn the dictionary?
(Clustering of the extracted local features)

Visual vocabulary
(K-means clustering; the cluster centers form the visual vocabulary)

From what data should I learn the dictionary?
‒ The dictionary can be learned on a separate training set
‒ Provided the training set is sufficiently representative, the dictionary will be "universal"

Example visual dictionary

Example dictionary: appearance codebook
(Source: B. Leibe)

Another dictionary: appearance codebook
(Source: B. Leibe)

Standard BOW pipeline (recap)
‒ Dictionary Learning: learn visual words using clustering
‒ Encode: build Bags-of-Words (BOW) vectors for each image
‒ Classify: train and test data using BOWs

Encode: build Bags-of-Words (BOW) vectors for each image
1. Quantization: each image feature gets associated to a visual word (nearest cluster center)
2. Histogram: count the number of visual word occurrences
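A minimal sketch of these two steps, reusing the fitted KMeans dictionary from the earlier dictionary-learning sketch (again an assumption, not the course's reference implementation):

import cv2
import numpy as np

def encode_bow(image, kmeans):
    sift = cv2.SIFT_create()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, desc = sift.detectAndCompute(gray, None)
    k = kmeans.n_clusters
    if desc is None:
        return np.zeros(k)
    words = kmeans.predict(desc)                           # 1. quantization: nearest cluster center
    hist = np.bincount(words, minlength=k).astype(float)   # 2. histogram of visual word counts
    return hist / hist.sum()                               # normalized BOW vector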

Scalability: alignment to large databases
• What if we need to align a test image with thousands or millions of images in a model database?
  ‒ Efficient putative match generation
  ‒ Fast nearest neighbor search, inverted indexes
(Figure: vocabulary tree with inverted index; test image and database histograms over codewords)
D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, CVPR 2006 (slide: S. Lazebnik)

What is a Vocabulary Tree?
• Multiple rounds of K-means to compute a decision tree (offline)
• Fill and query the tree online
(Nistér and Stewénius, CVPR 2006)
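A rough sketch of the offline step (hierarchical K-means, assuming scikit-learn; the branching factor and depth here are illustrative, not the values used by Nistér & Stewénius):

import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, branch=10, depth=3):
    """Recursively split the descriptors with K-means to build the tree (offline)."""
    node = {"center": descriptors.mean(axis=0), "children": []}
    if depth == 0 or len(descriptors) < branch:
        return node                                   # leaf node acts as a visual word
    km = KMeans(n_clusters=branch).fit(descriptors)
    for c in range(branch):
        subset = descriptors[km.labels_ == c]
        node["children"].append(build_vocab_tree(subset, branch, depth - 1))
    return node

Online, a feature is looked up by descending the tree, comparing it to the children's centers at each level; an inverted index kept at the leaves then maps visual words back to the database images that contain them.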

Populating the vocabulary tree / inverted index
(Figure sequence: model images are added to the tree; slide credit: D. Nistér)

Looking up a test image
(Figure: model images and a test image; slide credit: D. Nistér)

Standard BOW pipeline (recap)
‒ Dictionary Learning: learn visual words using clustering
‒ Encode: build Bags-of-Words (BOW) vectors for each image
‒ Classify: train and test data using BOWs

Classification
• K nearest neighbors
• Naïve Bayes
• Support vector machine

K nearest neighbors

Distribution of data from two classes
Which class does q belong to?

Distribution of data from two classes
Look at the neighbors.

K-Nearest Neighbor (KNN) Classifier
• Non-parametric pattern classification approach
• Consider a two-class problem where each sample consists of two measurements (x, y)
• k = 1: for a given query point q, assign the class of the nearest neighbor
• k = 3: compute the k nearest neighbors and assign the class by majority vote
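A minimal numpy sketch of the rule described above:

import numpy as np
from collections import Counter

def knn_classify(query, train_X, train_y, k=3):
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]                 # class decided by majority vote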

Nearest Neighbor is competitive
MNIST Digit Recognition
– Handwritten digits
– 28x28 pixel images: d = 784
– 60,000 training samples
– 10,000 test samples
(Yann LeCun)

Method                                      Test Error Rate (%)
Linear classifier (1-layer NN)              12.0
K-nearest-neighbors, Euclidean              5.0
K-nearest-neighbors, Euclidean, deskewed    2.4
K-NN, Tangent Distance, 16x16               1.1
K-NN, shape context matching                0.67
1000 RBF + linear classifier                3.6
SVM deg 4 polynomial                        1.1
2-layer NN, 300 hidden units                4.7
2-layer NN, 300 HU, [deskewing]             1.6
LeNet-5, [distortions]                      0.8
Boosted LeNet-4, [distortions]              0.7

What is the best distance metric between data points?
‒ Typically Euclidean distance
‒ Locality sensitive distance metrics
‒ Important to normalize: dimensions have different scales

How many K?
‒ Typically k=1 is good
‒ Cross-validation (try different k!)

Distance metrics
• Euclidean
• Cosine
• Chi-squared
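Minimal sketches of the three metrics (the chi-squared form shown is the common variant for histograms; the small epsilon avoiding division by zero is an implementation assumption):

import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def chi_squared(a, b, eps=1e-10):
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))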

CIFAR-10 and NN results

k-nearest neighbor
• Find the k closest points from training data
• Labels of the k points "vote" to classify

Hyperparameters
• What is the best distance to use?
• What is the best value of k to use?

• i.e., how do we set the hyperparameters?

• Very problem-dependent
• Must try them all and see what works best


Validation


Cross-validation

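A minimal sketch of picking k by cross-validation, assuming scikit-learn and BOW feature vectors X with labels y (the candidate values of k are illustrative):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X, y, candidates=(1, 3, 5, 7, 9), folds=5):
    # mean validation accuracy for each candidate k
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds).mean()
              for k in candidates}
    return max(scores, key=scores.get)   # k with the best cross-validated accuracy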

How to pick hyperparameters?
• Methodology
  ‒ Train and test
  ‒ Train, validate, test
• Train for the original model
• Validate to find hyperparameters
• Test to understand generalizability

kNN
Pros
‒ simple yet effective
Cons
‒ search is expensive (can be sped up)
‒ storage requirements
‒ difficulties with high-dimensional data

kNN -- Complexity and Storage


• N training images, M test images

• Training: O(1)
• Testing: O(MN)

• Hmm…
‒ Normally need the opposite
‒ Slow training (ok), fast testing (necessary)


Naïve Bayes

Distribution of data from two classes
Which class does q belong to?

Distribution of data from two classes
• Learn a parametric model for each class
• Compute the probability of the query

This is called the posterior: the probability of a class z given the observed features X, i.e., p(z | X).
‒ For classification, z is a discrete random variable (e.g., car, person, building)
‒ X is a set of observed features (e.g., features from a single image)
‒ (it's a function that returns a single probability value)

Recall: the posterior can be decomposed according to Bayes' Rule:

    p(z | X) = p(X | z) p(z) / p(X),   where p(X | z) is the likelihood and p(z) the prior.

In our context, each x is an observed feature (e.g., visual words).

A naive Bayes classifier assumes all features are conditionally independent:

    p(x_1, ..., x_n | z) = ∏_i p(x_i | z)

The naive Bayes classifier is solving this optimization, the MAP (maximum a posteriori) estimate:

    z* = argmax_z p(z | X)
       = argmax_z p(X | z) p(z) / p(X)    (Bayes' Rule)
       = argmax_z p(X | z) p(z)           (remove constants)

To optimize this, we need to compute the likelihood p(X | z).

To compute the MAP estimate
• Given (1) a set of known parameters and (2) observations
• Compute which z has the largest probability
• Numbers get really small, so use log probabilities

word     Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count         1      6      2    1     0     0      0       1
p(x|z)     0.09   0.55   0.18  0.09  0.0   0.0    0.0    0.09

* typically add pseudo-counts (0.001)
** this is an example of computing the likelihood; multiply by the prior to get the posterior

Document 1:
word     Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count         1      6      2    1     0     0      0       1
p(x|z)     0.09   0.55   0.18  0.09  0.0   0.0    0.0    0.09

log p(X|z=grand challenge) = -14.58
log p(X|z=bio inspired)    = -37.48

Document 2:
word     Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count         0      4      0    1     4     5      3       2
p(x|z)      0.0   0.21   0.0   0.05  0.21  0.26   0.16   0.11

log p(X|z=grand challenge) = -94.06
log p(X|z=bio inspired)    = -32.41

* typically add pseudo-counts (0.001)
** this is an example of computing the likelihood; multiply by the prior to get the posterior
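A minimal sketch of the log-likelihood computation (the per-class word probabilities passed in are hypothetical placeholders, not the actual class models behind the numbers above):

import numpy as np

def log_likelihood(counts, word_probs, pseudo=1e-3):
    """log p(X|z) = sum_i counts_i * log p(x_i|z), with pseudo-counts for zero probabilities."""
    p = np.asarray(word_probs, dtype=float) + pseudo
    p = p / p.sum()
    return float(np.sum(np.asarray(counts) * np.log(p)))

# counts over: Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor (document 1 above)
doc = [1, 6, 2, 1, 0, 0, 0, 1]
p_class = [0.09, 0.55, 0.18, 0.09, 0.0, 0.0, 0.0, 0.09]   # hypothetical p(x|z) for one class
score = log_likelihood(doc, p_class)   # compare across classes; add the log prior for MAP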

Support Vector Machine

Image Classification

Score function

Linear Classifier
Data: histograms. Convert each image to its histogram representation.

Distribution of data from two classes
Which class does q belong to?
Learn the decision boundary.

First we need to understand hyperplanes…

Hyperplanes (lines) in 2D
‒ A line can be written as a dot product plus a bias: w^T x + b = 0 (offset/bias outside)
‒ Another version: append a constant 1 to x and push the bias inside w: [w b]^T [x 1] = 0 (offset/bias inside)

Hyperplanes (lines) in 2D
(offset/bias outside vs. offset/bias inside)

Important property: we are free to choose any normalization of w.
The line w^T x + b = 0 and the line λ(w^T x + b) = 0 define the same line.

What is the distance to the origin?
(hint: use the normal form)

Distance to the origin: scale the line equation by 1/||w|| and you get the normal form.

What is the distance between two parallel lines?
(hint: use the distance to the origin)

Distance between two parallel lines: the difference of their distances to the origin.
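A hedged reconstruction of the geometry on these slides (the equations themselves live in the original figures, so standard notation is assumed):

\[
w^\top x + b = 0, \qquad d(\text{origin}) = \frac{|b|}{\|w\|}
\quad\text{(divide by } \|w\| \text{ to obtain the normal form)}
\]
\[
w^\top x + b_1 = 0,\;\; w^\top x + b_2 = 0
\;\;\Longrightarrow\;\;
d = \frac{|b_1 - b_2|}{\|w\|}
\]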

Now we can go to 3D…

Hyperplanes (planes) in 3D
‒ What are the dimensions of this vector (w)?
‒ What happens if you change b?
‒ What's the distance between these parallel planes?

What’s the best w? What’s the best w?

135 136

What’s the best w? What’s the best w?

137 138
22/05/2021

What’s the best w? What’s the best w?

Intuitively, the line that is the


139
farthest from all interior points 140

What’s the best w? What’s the best w?

support vectors

Maximum Margin solution:


most stable to perturbations of data Want a hyperplane that is far away from ‘inner points’
141 142
22/05/2021

Find the hyperplane w such that the gap between the parallel hyperplanes (the margin) is maximized.

Can be formulated as a maximization problem.
‒ What does this constraint mean? (y_i is the label of the data point)
‒ Why is it +1 and -1?

Can be formulated as a maximization problem; equivalently, as a minimization problem.
‒ Where did the 2 go?
‒ What happened to the labels?

'Primal formulation' of a linear SVM
‒ Objective function + constraints
‒ This is a convex quadratic programming (QP) problem (a unique solution exists)
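A hedged reconstruction of the primal formulation described above (standard linear SVM notation; the original equations are in the slide figures):

\[
\max_{w,\,b} \ \frac{2}{\|w\|}
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \ \ \forall i
\qquad\Longleftrightarrow\qquad
\min_{w,\,b} \ \frac{1}{2}\,\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \ \ \forall i
\]

The 2 disappears because maximizing 2/||w|| is equivalent to minimizing ||w||, and squaring it (with the 1/2 factor) keeps the problem a convex QP; the labels y_i ∈ {+1, -1} survive inside the constraints, flipping the inequality to the correct side for each class.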

What’s the best w?

‘soft’ margin

147 148

What’s the best w? Separating cats and dogs

Very narrow margin Very narrow margin

149 150
22/05/2021

What’s the best w? What’s the best w?

Very narrow margin

Intuitively, we should allow for some misclassification if we Trade-off between the MARGIN and the MISTAKES
can get more robust classification 151
(might be a better solution)
152

Adding slack variables
(figure: the slack for a misclassified point)

'Soft' margin: objective, subject to constraints for each data point (see the formulation sketched below).

‘soft’ margin ‘soft’ margin

objective subject to objective subject to

for for

• Every constraint can be satisfied if slack is large


The slack variable allows for mistakes, • C is a regularization parameter
as long as the inverse margin is minimized. • Small C: ignore constraints (larger margin)
• Big C: constraints (small margin)
• Still QP problem (unique solution)
155 156
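A hedged reconstruction of the soft-margin formulation these slides describe, followed by a usage sketch (scikit-learn's LinearSVC exposes the same C trade-off; X_train and y_train are assumed to be the BOW histograms and labels):

\[
\min_{w,\,b,\,\xi} \ \frac{1}{2}\,\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \ \ \forall i
\]

# Usage sketch: train a soft-margin linear SVM on BOW histograms.
from sklearn.svm import LinearSVC
clf = LinearSVC(C=1.0).fit(X_train, y_train)   # small C: wider margin; big C: fewer violations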


References
Most of these slides were adapted from:
1. Ioannis (Yannis) Gkioulekas (16-385 Computer Vision, Spring 2020, CMU)
2. Kristen Grauman (CS 376: Computer Vision, Spring 2018, The University of Texas at Austin)
3. Noah Snavely (Cornell University)
4. Fei-Fei Li (Stanford University)

Thank you!
