MODELING FASHION

Qi Chen†, Gang Wang‡, Chew Lim Tan†

† School of Computing, National University of Singapore
‡ School of EEE, Nanyang Technological University
{chenqi, tancl}@[Link], wanggang@[Link]

ABSTRACT

We propose a method to model fashionable dresses in this paper. We first discover common visual patterns that appear in dress images using a human-in-the-loop, active clustering approach. A fashionable dress is expected to contain certain visual patterns which make it fashionable. An approach is proposed to jointly identify fashionable visual patterns and learn a discriminative fashion classifier. The results show that interesting fashionable patterns can be discovered on a newly collected dress dataset. Furthermore, our model also achieves high accuracy in distinguishing fashionable and unfashionable dresses.

Index Terms— Fashion Classification, Visual Pattern Discovery, Active Clustering

Fig. 1. We aim to discover common visual patterns which make dresses fashionable. The first row shows 5 fashionable dresses with a shared fashionable visual pattern (labeled in magenta), and the second row shows non-fashionable ones.

1. INTRODUCTION

Recently, computer vision techniques have been developed to tackle many important problems such as image organization [1], controller-free gaming [2], and surveillance [3]. But not much work has been done to study fashion, which is one of the largest industrial sectors in the world, with a market size of hundreds of billions of dollars each year. Besides, studying fashion may help reveal interesting human psychological mechanisms and social meanings.

In this paper, we consider modeling fashion using computer vision techniques. Specifically, we target dress fashion. The goal of "fashion modeling" is two-fold. First, we aim to discover visual patterns which make a dress fashionable. Second, we aim to train a discriminative classifier to identify fashionable dresses from unfashionable ones. These two tasks can help each other: identifying fashionable visual patterns can help train an improved discriminative classifier, and an improved discriminative classifier can help better identify fashionable visual patterns. Hence we learn visual patterns and discriminative classifiers simultaneously in a coherent framework. The discovered visual patterns and learnt discriminative classifiers can be applied in two scenarios. The first is to help designers with fashionable dress design. Currently, designers usually get inspiration from examples. However, it is not easy to summarize fashionable visual patterns from a big collection of dress images. Given discovered visual patterns, it will be much easier for designers to know the current trend and invent their own designs. The second application is fashion search: a sophisticated fashion classifier can help find fashionable products on the Internet.

We develop a discriminative model to train fashion classifiers and discover fashionable visual patterns simultaneously, based on the assumption that these two tasks are complementary to each other. Our approach has two stages. In the first stage, we discover basic common visual patterns in dress images. We first partition each dress into five parts. For each part, we produce a set of clusters corresponding to different visual patterns. Automatic, unsupervised clustering methods usually cannot produce good clustering results due to the ambiguity of visual features. We introduce a discriminative, human-in-the-loop approach to build high-quality part clusters. In the second stage, we learn a latent structural SVM classifier using both fashionable and unfashionable dress images. The occurrence of each visual pattern is treated as a latent variable. Our model jointly infers image class labels and identifies visual patterns which make dresses fashionable.

We conduct experiments on a dataset with around 2,600 realistic dress images. The results show that our method can reasonably find fashionable visual patterns. We also perform experiments on fashion classification, which is shown to be promising compared to the state-of-the-art algorithm.
1.1. Related Work

In terms of fashion modeling, there are a few previous works related to ours. One close line of work is probably fashion/clothes search. Tsay and Lin [4] search clothes according to the region of interest of an exemplar. [5] applies content-based retrieval techniques to find clothing. Liu et al. [6] address the cross-scenario clothing retrieval problem through parts alignment and an auxiliary set. [7] demonstrates a model to recognize and parse pictures of people into their constituent garments and develops a pose-invariant visual garment retrieval application. In [8], a practical system named "magic closet" is developed for automatic clothing recommendation based on a user-input occasion. On the previous [Link] (now acquired by Google), users can find clothes, shoes, or bags according to their favorite color, texture, or shape, or find similar items to an exemplar. None of these works tries to discover fashionable visual patterns from a collection of labeled images. Compared to them, our method enables more flexible search. For example, we can rank clothes according to fashionable visual patterns. Besides enabling flexible visual search, our method can also discover fashionable visual patterns to help inspire designers.

Visual pattern discovery is a popularly studied topic in computer vision. Most work concentrates on discovering common object categories in an unsupervised way [9, 10, 11, 12]. In this paper, we aim to discover visual patterns for each dress part. Automatic visual pattern discovery is hard. We propose a human-in-the-loop approach to combine humans and computers to reliably discover visual clusters. From this perspective, Parikh and Grauman [13] is most relevant to our work; it employs humans to judge whether an attribute is nameable or not. In our work, however, humans are asked to label whether a data sample belongs to a particular cluster or not, and the visual knowledge is propagated to other unlabeled samples, which is more similar to active learning.

We develop a latent structural SVM as the classifier. The occurrence of visual patterns is treated as latent variables. This is intuitive: a dress is classified as "fashionable" because it has certain fashionable visual patterns. Latent structural SVMs are popularly used in computer vision [14, 15, 16] and natural language processing [17] due to their superior performance. We develop a variant of the structural SVM model to model fashion in this paper.

2. DATASET

Judging the fashion of a dress is a subjective task. As a start, we focus on the top brands, which are more likely to lead the trend of fashion. Eight of the most famous fashion brands are firstly identified: Chanel, Dior, Donna Karan, Giorgio Armani, Gucci, Prada, Valentino and Versace. Then we search for dresses published by these eight brands on a professional fashion web site. More than 1,000 dress images are downloaded and assumed to be our positive data. The negative data are collected from Google image search and the Amazon dress shop. First, we crawl Google images using some subjective queries, such as "out of style dresses". Second, we search for the cheapest dresses in the Amazon online store; e.g., dresses cheaper than 20 USD are crawled. More than 1,500 dress images are downloaded as our negative data. After collecting both fashionable and non-fashionable dress images, we further annotate them manually for the final class labels. We get clean dress images without background by manually segmenting them, since the focus of this paper is to understand fashionable patterns. In a real application, a dress/person detector could be run to localize them.

3. APPROACH

3.1. Framework

Our approach consists of two phases: a visual pattern discovery phase and a latent structural SVM learning phase. In the first phase, basic common visual patterns that appear in dress images are discovered. In the second phase, a discriminative latent structural SVM model is learnt to differentiate fashionable and non-fashionable dresses and meanwhile identify visual patterns which make a dress fashionable.

3.2. Basic Visual Pattern Discovery by Active Clustering

Dresses vary greatly in appearance. However, there are still many common patterns such as "V-neck", especially when only considering a dress part such as "neck". Most visual patterns of dresses are not easy to name, so we cannot directly use semantic attributes as in [18]. Instead, we treat it as a clustering problem: image regions in the same cluster have the same visual pattern, characterized by shape, color, and texture. Only local visual patterns instead of global ones are considered, as global visual patterns have much larger variation. Each dress is partitioned into five parts (a rough partitioning is shown in Fig. 2). For each part, we cluster the corresponding image regions to discover common patterns. Note that in this phase, no class label information is used. Unsupervised, automatic clustering is very hard due to the ambiguity of visual features. We develop a human-in-the-loop, discriminative method to efficiently combine humans and computers to produce high-quality visual clusters.

At the beginning, initial clusters are created via Affinity Propagation [19], which can automatically determine the number of clusters. After that, each cluster is refined by training a discriminative SVM classifier iteratively, with humans in the loop to actively label examples. Specifically, refinement of the ith cluster Ci is performed in the following steps:
(1) A canonical image region Ri is identified as the one closest to the cluster center of Ci. This Ri is labeled as the first target data in this cluster.
(2) A binary SVM classifier is trained to differentiate target
data and outliers in this cluster. Besides the labeled data, we also use noisy data in the training set. For the noisy target data, we select a number of image regions in Ci which are most similar to Ri. The noisy negative data are sampled from other clusters. With the SVM classifier, image regions in cluster Ci are classified and ranked according to the classification scores.
(3) The canonical image Ri is updated as the one with the largest classification score among all the labeled target data.
(4) As there is a limited number of labeled training data (e.g. only one in the first step), we select a number of informative samples from Ci for labeling. Following [20], a set of unlabeled samples which are closest to the SVM decision boundary are selected.
(5) For each selected image region x in cluster Ci, we label whether it is similar to Ri. If the feedback is positive, x is labeled as target data; otherwise, x is labeled as an outlier. The ground truth training set is updated accordingly.
This refinement process iterates from step (2) to step (5) until the percentage of labeled image regions reaches a predefined threshold. In our experiments, we set this threshold to 40%.

3.2.1. Window Search in the Clustering Process

There is no exact ground truth dress part annotation, though we roughly know the positions. To reduce noise, in each iteration, sliding window search is performed for each dress to more accurately localize part regions. As shown in Fig. 2, a dress is roughly partitioned into 5 parts, and a central point is fixed for each part. Then a set of windows is generated by sliding the window at different scales through the whole image. Each generated window is assigned to the nearest part. In the initial clustering, a random window is selected for each part. During the refinement of each cluster, once a new binary classifier is trained, we search for the window with the highest classification score for each part.

In our experiments, 273 common visual patterns are discovered for all the 5 parts (55, 54, 55, 57 and 52, respectively). Some examples of the discovered common visual patterns are illustrated in Fig. 2.

3.3. Latent Structural SVM Models

After the first phase, we have discovered a number of common visual patterns and trained a binary SVM classifier for each of them. Now we aim to discriminatively identify visual patterns which make a dress fashionable, and simultaneously learn an SVM classifier to differentiate fashionable and non-fashionable dresses, in a coherent framework. The intuition is that a fashionable dress is likely to have some visual patterns which make it fashionable.

We do not have ground truth visual pattern labels for either training data or test data. We can only generate confidence scores by applying the visual pattern classifiers. Hence the occurrence of a specific visual pattern on a dress image is not deterministic and is treated as a latent variable. The visual pattern latent variable h is defined as a K-dimensional vector h = (h1, h2, ..., hK), where hk indicates the presence of the k-th visual pattern. hk is binary: 1/0 means the presence/absence of the k-th visual pattern. A training example is represented as (x, y), where x is the dress image itself, and y is the class label (+1 for fashionable, and -1 for unfashionable).

3.3.1. Model 1: A Basic Latent Structural SVM Model

Given model parameters ω, the overall score of a hypothesis h can be expressed as the dot product between the model parameters and the joint feature vector Φ(x, y, h):

    fω(x, y, h) = ω · Φ(x, y, h)    (1)

Eq. 1 measures the compatibility among the input x and output y when the latent variable h is specified. We define ω · Φ(x, y, h) as:

    ω · Φ(x, y, h) = Σk ωk ψ(x, hk, y) + Σ(j,k)∈E ωj,k ϕ(hj, hk, y)    (2)

The model parameter vector ω is a concatenation of all the ωk and ωj,k. The details of this potential function are introduced below.

(1) The first term characterizes the compatibility among the image x, the occurrence of the kth visual pattern, and the image class label y. We further define it as:

    ψ(x, hk, y) = [υ(x, hk), y]  if hk = 1;   0  if hk = 0    (3)

Intuitively, if the kth visual pattern appears (hk = 1), it should be compatible with the appearance feature υ(x, hk) and the class label y. Similar to [16], rather than keeping υ(x, hk) as a high-dimensional image feature vector, we represent it as a classification score output by the pre-trained visual pattern SVM classifier generated in Section 3.2. We calibrate the output of each visual pattern classifier using a sigmoid function, so the score ranges from 0 to 1. Writing fk(x) as the "probabilistic" output of the k-th visual pattern classifier on x, we set υ(x, hk) = fk(x) − 0.5.

Note that we also apply the sliding window search method to find the image region in x which produces the highest fk(x). If υ(x, hk) < 0, then extra cost is incurred if hk = 1. This significantly speeds up the learning algorithm compared to using the original feature vectors. The occurrence of hk must also be compatible with the class label y. If the kth visual pattern appears much more frequently in fashionable dresses, then the model parameter corresponding to the y dimension is likely to be positive.

Fig. 2. Illustration of the part partition and examples of discovered visual patterns. In the left figure, we roughly partition the whole image into 5 parts. A center point is located for each part, and the search space of each part is labeled using the red lines. We show two discovered visual patterns for each part in the right figure. Regions labeled in magenta represent the same fashionable visual patterns, while regions labeled in blue show the same non-fashionable visual patterns.
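The iterative cluster-refinement loop of Section 3.2 can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: a tiny logistic-regression trainer stands in for the paper's binary SVM, the human annotator is simulated by a hypothetical `oracle` callback, and noisy negatives are taken as the regions farthest from the canonical one (the paper samples them from other clusters).

```python
import numpy as np

def _fit_logreg(X, y, lr=0.5, steps=300):
    """Tiny logistic-regression trainer (stand-in for the paper's binary SVM)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30, 30)
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def refine_cluster(X, oracle, n_rounds=3, n_queries=2):
    """Refine one cluster of region features X, following steps (1)-(5) of Sec. 3.2.
    `oracle(i) -> bool` simulates the human judging whether region i matches R_i."""
    n = len(X)
    # (1) canonical region R_i: the region closest to the cluster centre
    canon = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    target, outlier = {canon}, set()
    for _ in range(n_rounds):
        # (2) train a classifier on labelled data plus noisy data:
        # noisy positives = regions most similar to R_i;
        # noisy negatives = most dissimilar regions (a simplification here)
        dist = np.linalg.norm(X - X[canon], axis=1)
        noisy_pos = list(np.argsort(dist)[:3])
        noisy_neg = list(np.argsort(dist)[-2:])
        idx = sorted(target) + noisy_pos + list(outlier) + noisy_neg
        lab = [1.0] * (len(target) + len(noisy_pos)) + \
              [0.0] * (len(outlier) + len(noisy_neg))
        w, b = _fit_logreg(X[np.array(idx)], np.array(lab))
        scores = X @ w + b
        # (3) re-pick the canonical region: highest-scoring labelled target
        canon = max(target, key=lambda i: scores[i])
        # (4) query the unlabelled regions closest to the decision boundary
        pool = [i for i in range(n) if i not in target and i not in outlier]
        queries = sorted(pool, key=lambda i: abs(scores[i]))[:n_queries]
        # (5) the "human" labels each query; the ground-truth set grows
        for q in queries:
            (target if oracle(q) else outlier).add(q)
    return target, outlier
```

Stopping at a fixed number of rounds replaces the paper's 40%-labelled threshold; both are just termination criteria for the same (2)-(5) loop.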

(2) There are dependencies between some pairs of visual patterns. A dress is more likely to be fashionable or unfashionable when a certain pair of visual patterns co-occurs. Based on this intuition, we add the second term to characterize the compatibility between the class label y and visual pattern pairs. We build an undirected graph G = (V, E) with a tree structure, where each node v ∈ V represents a visual pattern, and an edge (j, k) ∈ E means the jth and the kth visual patterns have dependencies. For each visual pattern pair, we count the co-occurrence frequencies in the training data. We then run the maximum spanning tree algorithm to generate the edges E of the graph. For each pair (j, k) ∈ E, we define:

    ϕ(hj, hk, y) = y  if hj = 1 and hk = 1;   0  otherwise    (4)

Intuitively, if a visual pattern pair (j, k) co-occurs more often in the fashionable dress images, then it is more compatible with y = 1, and the corresponding model parameter ωj,k is more likely to be positive.

This model learns the discriminative classifier by finding hidden rationales: a dress is fashionable because it has certain fashionable visual patterns and visual pattern pairs. Hence, it can help identify which visual patterns and visual pattern pairs make a dress fashionable.

3.3.2. Model Inference and Training

The above section shows how we compute the compatibility score when h is specified. However, in the inference procedure, h is a latent variable. For an input x, we should find the class label y* and the h* which produce the largest confidence score:

    ⟨y*, h*⟩ = arg max(y,h) fω(x, y, h)    (5)

We use dynamic programming to solve this inference problem.

Similar to [17, 14], we adopt an EM-like algorithm to learn the model parameters, as there are latent variables in the model. We first initialize the model parameters; then the learning algorithm alternates between inferring latent variables and updating model parameters. Assume we have T training examples {(xi, yi) | i = 1, 2, ..., T}. With the inferred latent variables hi*, it becomes a standard latent structural SVM learning problem. The optimization problem is formulated as follows:

    min_ω  (1/2)‖ω‖² + C Σi=1..T ξi
    s.t.   ω · Φ(xi, yi, hi*) − max(ŷi, ĥi) ω · Φ(xi, ŷi, ĥi) ≥ 1 − ξi,
           ξi ≥ 0,  ∀ŷi ≠ yi    (6)
Following [21], we adopt the stochastic gradient descent method to learn the model parameter ω. At each iteration, it picks an xi, infers its (ŷi, ĥi), and updates the model parameters in a gradient descent fashion. Interested readers may refer to [21] for a more detailed introduction.

3.3.3. Model 2: A Model with Conventional Image Classifiers

Our current model is built on the discovered common visual patterns. Some discriminative information which is suitable for classification might be lost when creating this intermediate representation. We therefore propose a new model to fully exploit the discriminative power of the original image features. We define the potential function as:

    ω · Φ(x, y, h) = ωr φ(x, y) + Σk ωk ψ(x, hk, y) + Σ(j,k)∈E ωj,k ϕ(hj, hk, y)    (7)

The first term represents a conventional linear image classifier that does not consider the visual pattern information. Again, we do not keep φ(x, y) as a high-dimensional feature vector. Instead, we define it as φ(x, y) = y · (f(x) − 0.5), where f(x) is the "probabilistic" output of a conventional image classifier trained offline using the positive and negative training data. Intuitively, the class label y must be compatible with the classification score f(x). This model can effectively combine a conventional image classifier and our visual pattern based classifier for more effective classification.

4. EXPERIMENTS

We perform experiments on our collected dress dataset. The dataset contains 1,011 positive and 1,637 negative images. We discover the common visual patterns and learn the latent structural SVM models on 1,785 training images (676 positive and 1,109 negative); the remaining 863 images are used for testing. We combine three features to train a binary SVM classifier for each visual pattern in the common visual pattern discovery process. The three features are color words, bag of SIFT words, and bag of texton words. For color features, we encode the RGB value of each pixel as an integer between 0 and 511, and represent each image as a 512-dimensional histogram. SIFT is extracted densely and quantized into 1,000 visual words. To extract texture information, each image is convolved with the Leung-Malik filter bank [22], and the filter responses are quantized into 1,000 textons. Images are then represented as "bag-of-words" histograms for the SIFT and texton features, respectively. For each type of feature, we construct a histogram intersection kernel. We then combine these three kernels with equal weights to train an SVM classifier. With the visual pattern classifiers available, it usually takes several hours to train our latent structural SVM model, and less than one second to do inference on a test image.

Table 1. Comparison of classification accuracy. "C", "S" and "T" represent color, SIFT and texton features. "All" is the combination of "C", "S" and "T". "SP" means a 3-layer spatial pyramid [23] is used. "Linear" means a linear SVM. "χ2" and "HIK" represent the χ2 kernel and the histogram intersection kernel. "Model 1" and "Model 2" are the two models proposed in this paper.

             C       S       T       All
    Linear   0.73    0.78    0.74    0.77
    HIK      0.74    0.86    0.79    0.88
    χ2       0.82    0.85    0.83    0.88

             SP(C)   SP(S)   SP(T)   SP(All)
    Linear   0.64    0.83    0.75    0.71
    HIK      0.77    0.89    0.87    0.90
    χ2       0.79    0.89    0.88    0.90

    Model 1: 0.85    Model 2: 0.93

Table 2. Comparison of precision, recall and F1 measure. "SP(All)+χ2" means the spatial pyramid SVM trained on all the features with the χ2 kernel.

    Method      Model 1   Model 2   SP(All)+χ2
    Precision   0.80      0.88      0.89
    Recall      0.83      0.94      0.90
    F1          0.81      0.91      0.90

4.1. Classification Performance

We first evaluate fashion classification. Besides our method, we also evaluate a set of binary SVM classifiers trained on multiple features and their combinations, with the histogram intersection kernel (HIK) and χ2 kernel, respectively. The classification results are presented in Table 1. Among all the baselines, the spatial pyramid SVM trained on all the features with the χ2 kernel (SP(All)+χ2) produces the best score. It is very interesting that computer vision algorithms can achieve very high accuracy on this classification task, which may suggest that a fashion classifier can be applied in real visual shopping applications. Our Model 2 is trained based on SP(All)+χ2, and both Model 1 and Model 2 achieve quite promising accuracy. Table 2 shows the precision, recall and F1 scores for Model 1, Model 2 and the baseline SP(All)+χ2. Model 2 achieves the highest recall and F1 scores among these three methods.
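The colour-word feature and histogram intersection kernel of Section 4 can be sketched as follows. One natural reading of the 0-511 encoding is 3 bits per RGB channel (8 × 8 × 8 = 512 colour words); that quantisation choice is our assumption, as the paper does not spell it out.

```python
import numpy as np

def color_words(img):
    """512-bin colour-word histogram for an RGB image (values 0-255).
    Assumes 3 bits per channel: word = 64*(R>>5) + 8*(G>>5) + (B>>5)."""
    img = np.asarray(img, dtype=np.uint32)
    words = (img[..., 0] >> 5) * 64 + (img[..., 1] >> 5) * 8 + (img[..., 2] >> 5)
    hist = np.bincount(words.ravel(), minlength=512).astype(float)
    return hist / hist.sum()  # L1-normalise so kernel values are comparable

def hik(H1, H2):
    """Histogram-intersection kernel matrix between two lists of histograms:
    K(a, b) = sum_i min(a_i, b_i)."""
    return np.array([[float(np.minimum(a, b).sum()) for b in H2] for a in H1])
```

The three per-feature kernels would then be averaged with equal weights before SVM training, e.g. K = (hik(C, C) + hik(S, S) + hik(T, T)) / 3, mirroring the equal-weight combination described above.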
4.2. Qualitative Results of Discovered Fashionable Visual Patterns

Our models can identify fashionable visual patterns by jointly learning the classifier. As formulated in Eq. 3, a learned model parameter ωk contains two dimensions. The first dimension measures how well the k-th visual pattern is compatible with its appearance feature, and the second dimension measures how well it is compatible with its class label. Hence we can use ωk to identify which patterns are more likely to be fashionable. For the k-th visual pattern, we measure its fashionability Fk as:

    Fk = ωk(1) · ωk(2)  if ωk(1) > 0;   NULL  if ωk(1) ≤ 0    (8)

where "NULL" means we neglect those patterns which are not compatible with their appearance features. A large, positive Fk indicates that the pattern is fashionable and stable. For our Model 1, 86 fashionable patterns are discovered across the 5 parts. In Fig. 2, we show the top-ranked fashionable and unfashionable visual patterns discovered by Model 1. For each part, the first 4 images share the same fashionable visual pattern, which is labeled in magenta. We find the discovered visual patterns are reasonable and make sense. Such discovered visual patterns can help people understand what makes a dress fashionable or unfashionable (at this specific time). They can also help designers capture the trend of fashion when creating their own designs.

5. CONCLUSIONS

In this paper, we present a method to discover fashionable visual patterns and learn fashion classifiers. The generated results can be useful for design and visual shopping. In the future, we are interested in investigating how to perform relevance feedback for fashionable visual pattern based retrieval. We are also interested in applying this method to other products such as handbags and shoes.

6. ACKNOWLEDGMENT

This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

7. REFERENCES

[1] N. Snavely, S.M. Seitz, and R. Szeliski, "Photo tourism: exploring photo collections in 3D," in TOG, 2006.
[2] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, "Real-time human pose recognition in parts from single depth images," in CVPR, 2011.
[3] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2004.
[4] J.J. Tsay, C.H. Lin, C.H. Tseng, and K.C. Chang, "On visual clothing search," in TAAI, 2011.
[5] H. Wang, J. Du, and Q. Guo, "The application of content based image retrieval technology in clothing retrieval system," CTA, 2009.
[6] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan, "Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set," in CVPR, 2012.
[7] K. Yamaguchi, M.H. Kiapour, L.E. Ortiz, and T.L. Berg, "Parsing clothing in fashion photographs," in CVPR, 2012.
[8] S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yan, "Hi, magic closet, tell me what to wear!," in ACM Multimedia, 2012.
[9] Y.J. Lee and K. Grauman, "Object-graphs for context-aware category discovery," in CVPR, 2010.
[10] B.C. Russell, W.T. Freeman, A.A. Efros, J. Sivic, and A. Zisserman, "Using multiple segmentations to discover objects and their extent in image collections," in CVPR, 2006.
[11] G. Kim, C. Faloutsos, and M. Hebert, "Unsupervised modeling of object categories using link analysis techniques," in CVPR, 2008.
[12] J. Yuan and Y. Wu, "Spatial random partition for common visual pattern discovery," in ICCV, 2007.
[13] D. Parikh and K. Grauman, "Interactively building a discriminative vocabulary of nameable attributes," in CVPR, 2011.
[14] L.L. Zhu, Y. Chen, A. Yuille, and W. Freeman, "Latent hierarchical structural learning for object detection," in CVPR, 2010.
[15] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," TPAMI, 2009.
[16] Y. Wang and G. Mori, "A discriminative latent model of object classes and attributes," in ECCV, 2010.
[17] C.N.J. Yu and T. Joachims, "Learning structural SVMs with latent variables," in ICML, 2009.
[18] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their attributes," in CVPR, 2009.
[19] B.J. Frey and D. Dueck, "Clustering by passing messages between data points," Science, 2007.
[20] S. Tong and D. Koller, "Support vector machine active learning with applications to text classification," Journal of Machine Learning Research, 2002.
[21] S. Branson, P. Perona, and S. Belongie, "Strong supervision from weak annotation: Interactive training of deformable part models," in ICCV, 2011.
[22] T. Leung and J. Malik, "Representing and recognizing the visual appearance of materials using three-dimensional textons," IJCV, 2001.
[23] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in CVPR, 2006.