IT5409 - Ch7 - Part2 - Object Recognition - v2 - 4pages
Chapter 7 (part 2): Object recognition

Contents
• Overview of 'semantic vision'
• Image classification / recognition
‒ Bag-of-words
‒ Recall
‒ Vocabulary tree
• Classification
‒ K nearest neighbors
‒ Naïve Bayes
‒ Support vector machine
What's in the scene? (semantic segmentation): region labels such as sky, mountains, trees, buildings, vendors, people, ground.
What type of scene is it? (scene categorization): outdoor, marketplace, city.
Object recognition: is it really so hard?
This is a chair.
Find the chair in this image.
And it can get a lot harder.
Michelangelo 1475-1564
image credit: J. Koenderink
Image Classification / Recognition
Basic image classification pipeline:
• Training: training images → image features → train the classifier → learned classifier
• Testing: test image → image features → apply the learned classifier → prediction

Bag of words
• Basic model: recall
• Vocabulary tree
An object as a histogram of its local features: an old idea (e.g., texture recognition and information retrieval).
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Vector Space Model
A document (datapoint) is a vector of counts over each word (feature):

word    Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
doc 1   1       6      2      1     0     0     0      1
doc 2   0       4      0      1     4     5     3      2

Use any distance you want, but the cosine distance is fast.
http://www.fodey.com/generators/newspaper/snippet.asp
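To make the vector space model concrete, here is a minimal sketch (NumPy only) that compares the two example count vectors above using cosine distance; the array names and the cosine_distance helper are illustrative, not from the slides.

```python
import numpy as np

# Word counts over the vocabulary:
# [Tartan, robot, CHIMP, CMU, bio, soft, ankle, sensor]
doc1 = np.array([1, 6, 2, 1, 0, 0, 0, 1], dtype=float)
doc2 = np.array([0, 4, 0, 1, 4, 5, 3, 2], dtype=float)

def cosine_distance(a, b):
    """1 - cosine similarity between two count vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_distance(doc1, doc2))  # smaller value = more similar documents
```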
TF-IDF (Term Frequency - Inverse Document Frequency)
Reweight each word count by how distinctive the word is across the document collection:
tf-idf(t, d) = tf(t, d) × idf(t), where tf(t, d) is the term frequency of word t in document d, and idf(t) = log(N / n_t) is the inverse document frequency (N documents in total, n_t of them containing t).
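A minimal sketch of TF-IDF reweighting for a small document-term count matrix, assuming NumPy; it uses the idf variant log(N / n_t) given above, and the variable names are illustrative.

```python
import numpy as np

# Rows = documents, columns = terms (raw counts).
counts = np.array([[1, 6, 2, 1, 0, 0, 0, 1],
                   [0, 4, 0, 1, 4, 5, 3, 2]], dtype=float)

tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency per document
df = (counts > 0).sum(axis=0)                     # number of documents containing each term
idf = np.log(counts.shape[0] / df)                # inverse document frequency
# Note: every term appears in at least one document here; in general,
# add smoothing (e.g., 1 + df) to avoid division by zero.
tfidf = tf * idf
print(tfidf)
```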
Standard BOW pipeline (for image classification):
1. Dictionary Learning: learn visual words using clustering
2. Encode: build Bags-of-Words (BOW) vectors for each image
3. Classify: train and test data using BOWs
Dictionary Learning:
1. Extract features (e.g., SIFT) from images
2. Learn visual dictionary (e.g., K-means clustering)
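A sketch of this dictionary-learning step, assuming OpenCV (with SIFT available via cv2.SIFT_create) and scikit-learn's KMeans; the image paths and the vocabulary size K = 200 are illustrative choices, not values from the slides.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift(image_path):
    """Detect keypoints and compute SIFT descriptors for one image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(img, None)
    return desc  # (num_keypoints, 128) array, or None if no keypoints found

# 1. Extract features from all training images (paths are illustrative).
train_paths = ["img_000.jpg", "img_001.jpg"]
all_desc = np.vstack([d for p in train_paths if (d := extract_sift(p)) is not None])

# 2. Learn the visual dictionary by clustering descriptors into K visual words.
K = 200
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_desc)
codebook = kmeans.cluster_centers_   # (K, 128) matrix of visual words
```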
Feature detection:
• Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005
• Interest point detector: Csurka et al., 2004; Fei-Fei & Perona, 2005; Sivic et al., 2005
• Other methods: random sampling (Vidal-Naquet & Ullman, 2002); segmentation-based patches (Barnard et al., 2003)

For each patch: detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03] → normalize patch → compute SIFT descriptor [Lowe '99]
Clustering: cluster the feature descriptors (e.g., K-means clustering) to obtain the appearance codebook of visual words.
Source: B. Leibe
Another dictionary (appearance codebook). Source: B. Leibe

Standard BOW pipeline (recap):
1. Dictionary Learning: learn visual words using clustering
2. Encode: build Bags-of-Words (BOW) vectors for each image
3. Classify: train and test data using BOWs
Encode: build Bags-of-Words (BOW) vectors for each image
1. Quantization: assign each feature descriptor to its nearest visual word (codeword)
2. Histogram: count the number of visual word occurrences
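A sketch of this encoding step under the same assumptions, reusing the hypothetical codebook and extract_sift helper from the dictionary-learning sketch above: quantize each descriptor to its nearest codeword, then histogram the assignments.

```python
import numpy as np

def encode_bow(desc, codebook):
    """Quantize descriptors to nearest codewords and build a normalized histogram."""
    # Squared distances from every descriptor to every codeword.
    d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                                          # 1. quantization
    hist = np.bincount(words, minlength=len(codebook)).astype(float)   # 2. histogram
    return hist / hist.sum()                                           # normalize by word count

# bow = encode_bow(extract_sift("test.jpg"), codebook)
```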
Vocabulary tree with inverted index: the test image's features are quantized into codewords, and the resulting codeword frequencies are matched against the database images through the inverted index.
D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, CVPR 2006 (slide: S. Lazebnik)
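A rough sketch of the hierarchical quantization idea behind a vocabulary tree, not the Nistér and Stewénius implementation: K-means is applied recursively with a small branching factor, and a descriptor is quantized by descending the tree; the branching factor, depth, and function names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descs, branch=10, depth=3):
    """Recursively cluster descriptors into a vocabulary tree node."""
    if depth == 0 or len(descs) < branch:
        return None  # leaf
    km = KMeans(n_clusters=branch, n_init=4, random_state=0).fit(descs)
    children = [build_tree(descs[km.labels_ == i], branch, depth - 1)
                for i in range(branch)]
    return {"centers": km.cluster_centers_, "children": children}

def quantize(desc, node, path=()):
    """Descend the tree; the path of chosen children identifies the visual word (leaf)."""
    if node is None:
        return path
    i = ((node["centers"] - desc) ** 2).sum(axis=1).argmin()
    return quantize(desc, node["children"][i], path + (i,))
```

An inverted index would then map each leaf word to the list of database images containing it, so only images sharing words with the query need to be scored.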
Model images (from Nistér and Stewénius, CVPR 2006)

Dictionary Learning: learn visual words using clustering (recap)
Classification: K nearest neighbors
Distribution of data from two classes
Distance metrics:
• Euclidean
• Cosine
• Chi-squared
Hyperparameters
• What is the best distance to use?
• What is the best value of k to use?
• Very problem-dependent
• Must try them all and see what works best
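A minimal k-nearest-neighbor sketch over BoW histograms (NumPy only). The three distance functions match the metrics listed above; the small epsilon in the chi-squared distance is an added implementation detail to avoid division by zero, and all names are illustrative.

```python
import numpy as np

def euclidean(a, b):   return np.linalg.norm(a - b)
def cosine(a, b):      return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
def chi_squared(a, b): return 0.5 * np.sum((a - b) ** 2 / (a + b + 1e-10))

def knn_predict(x, X_train, y_train, k=5, dist=chi_squared):
    """Label x by majority vote among its k nearest training histograms."""
    d = np.array([dist(x, xi) for xi in X_train])
    nearest = y_train[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]
```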
Validation: hold out part of the training data to evaluate each hyperparameter choice.
Cross-validation: split the training data into folds; each fold in turn serves as the validation set, and the results are averaged.
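A sketch of k-fold cross-validation for choosing the number of neighbors, reusing the hypothetical knn_predict helper from the previous sketch; the fold count and candidate values are illustrative.

```python
import numpy as np

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 9), n_folds=5):
    """X, y: numpy arrays. Return the k with the best mean validation accuracy."""
    idx = np.random.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = {}
    for k in candidate_ks:
        accs = []
        for f in range(n_folds):
            val = folds[f]
            trn = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            preds = np.array([knn_predict(X[i], X[trn], y[trn], k=k) for i in val])
            accs.append(np.mean(preds == y[val]))
        scores[k] = np.mean(accs)
    return max(scores, key=scores.get)
```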
K nearest neighbors: complexity
• Training: O(1)
• Testing: O(MN)
• Hmm…
‒ Normally we need the opposite
‒ Slow training (ok), fast testing (necessary)
Naïve Bayes

For classification, z is a discrete random variable (e.g., car, person, building), and X is a set of observed features (e.g., features from a single image).

This is called the posterior: the probability of a class z given the observed features X, p(z | X).

Recall: the posterior can be decomposed according to Bayes' Rule:
p(z | X) = p(X | z) p(z) / p(X)
(posterior = likelihood × prior / evidence)

In our context: z is a discrete class label (e.g., car, person, building), and each x is an observed feature (e.g., visual words).

A naïve Bayes classifier assumes all features are conditionally independent given the class:
p(X | z) = p(x_1, …, x_m | z) = ∏_i p(x_i | z)

The naïve Bayes classifier solves this optimization; apply Bayes' Rule and remove constants that do not depend on z:
ẑ = argmax_z p(z | X) = argmax_z p(z) ∏_i p(x_i | z)
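A minimal multinomial naïve Bayes sketch over BoW count vectors (NumPy only); Laplace smoothing with alpha = 1 is an added implementation choice and the class name is illustrative. It implements ẑ = argmax_z [log p(z) + Σ_i x_i log p(word_i | z)].

```python
import numpy as np

class NaiveBayesBoW:
    def fit(self, X, y, alpha=1.0):
        """X: (num_docs, vocab_size) word counts; y: integer class labels (numpy arrays)."""
        self.classes = np.unique(y)
        self.log_prior = np.log(np.array([(y == c).mean() for c in self.classes]))
        # Per-class word counts with Laplace smoothing, normalized to p(word | class).
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes]) + alpha
        self.log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, x):
        """Return the class maximizing log p(z) + sum_i x_i * log p(word_i | z)."""
        scores = self.log_prior + self.log_lik @ x
        return self.classes[scores.argmax()]
```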
Example: estimating the likelihoods p(x|z) by normalizing the word counts within each class:

word    Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count   1       6      2      1     0     0     0      1
p(x|z)  0.09    0.55   0.18   0.09  0.0   0.0   0.0    0.09

word    Tartan  robot  CHIMP  CMU   bio   soft  ankle  sensor
count   0       4      0      1     4     5     3      2
p(x|z)  0.0     0.21   0.0    0.05  0.21  0.26  0.16   0.11

http://www.fodey.com/generators/newspaper/snippet.asp

Support Vector Machine
Linear Classifier
Distribution of data from two classes
Hyperplanes (lines) in 2D
The line: w·x + b = 0
Important property: we are free to choose any normalization of w (and b); scaling both by the same factor describes the same line.
For example, scale w and b so that the points closest to the line satisfy |w·x + b| = 1 (canonical form).
Now we can go to 3D …
Support vectors: the training points closest to the decision boundary.
Margin: the distance from the boundary to the closest points; we want the hyperplane with the largest margin.

Objective function: maximize the margin 2/||w||, i.e., minimize (1/2)||w||²
Constraints: y_i (w·x_i + b) ≥ 1 for every training point (x_i, y_i)
'Soft' margin: allow some points to violate the constraints (see below)
Intuitively, we should allow for some misclassification if we can get more robust classification (might be a better solution).
Trade-off between the MARGIN and the MISTAKES.
Soft-margin objective: minimize (1/2)||w||² + C Σ_i ξ_i
subject to: y_i (w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for every training point i
(ξ_i > 1 for a misclassified point; 0 < ξ_i ≤ 1 for a point inside the margin)
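A sketch of training a soft-margin linear SVM on BoW vectors, assuming scikit-learn's LinearSVC; C is the margin-versus-mistakes trade-off from the objective above (smaller C tolerates more violations), and the placeholder data is only for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

# X_train: (num_images, vocab_size) BoW vectors; y_train: class labels.
X_train = np.random.rand(20, 200)              # placeholder data for illustration
y_train = np.random.randint(0, 2, size=20)

# C controls the trade-off between a wide margin and few training mistakes.
svm = LinearSVC(C=1.0)
svm.fit(X_train, y_train)

# w and b define the learned hyperplane w·x + b = 0.
w, b = svm.coef_, svm.intercept_
print(svm.predict(X_train[:3]))
```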
References
Most of these slides were adapted from: