Domain Generalization with UNVP Method
Dat T. Truong1,3,4 , Chi Nhan Duong2 , Khoa Luu1 , Minh-Triet Tran3,4 , Ngan Le1
1 University of Arkansas, USA
2 Concordia University, Canada
3 University of Science, Ho Chi Minh city, Vietnam
4 Vietnam National University, Ho Chi Minh city, Vietnam
{tt032, khoaluu, thile}@[Link], dcnhan@[Link], tmtriet@[Link]
Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on July 27,2020 at [Link] UTC from IEEE Xplore. Restrictions apply.
Figure 2. The idea of domain generalization. The deep model is trained only in a single domain (A), i.e. RGB images. It is deployed in other unseen domains, i.e. thermal images (B) and infrared images (C).

Figure 3. Illustration of the proposed UNVP method. The traditional classifier fails to model new samples in unseen domains (top). Meanwhile, UNVP consistently maintains the feature distribution in each class while searching for a new shifting domain (bottom).
Table I: Comparison of the properties between our proposed approaches (UNVP and E-UNVP) and other recent methods, where ✗ represents a not-applicable property and ✓ an applicable one. Gaussian Mixture Model (GMM), Probabilistic Graphical Model (PGM), Convolutional Neural Network (CNN), Adversarial Loss (ℓ_adv), Log-Likelihood Loss (ℓ_LL), Cycle-Consistency Loss (ℓ_cyc), Discrepancy Loss (ℓ_dis), and Cross-Entropy Loss (ℓ_CE).

Method         | Modality          | Architecture | Loss Function  | End-to-End | Target-domain sample-free | Target-domain label-free | Deployable Domains
FT [10]        | Transfer Learning | CNN          | ℓ2             | ✓          | ✗                         | ✗                        | Two
UBM [11]       | Adaptation        | GMM          | ℓ_LL           | ✗          | ✗                         | ✓                        | Any
DANN [1]       | Adaptation        | CNN          | ℓ_adv          | ✓          | ✗                         | ✓                        | Two
CoGAN [9]      | Adaptation        | CNN+GAN      | ℓ_adv          | ✓          | ✗                         | ✓                        | Two
I2IAdapt [12]  | Adaptation        | CNN+GAN      | ℓ_adv + ℓ_cyc  | ✓          | ✗                         | ✓                        | Two
ADDA [13]      | Adaptation        | CNN+GAN      | ℓ_adv          | ✓          | ✗                         | ✓                        | Two
MCD [14]       | Adaptation        | CNN+GAN      | ℓ_adv + ℓ_dis  | ✓          | ✗                         | ✓                        | Two
CrossGrad [15] | Generalization    | Bayesian Net | ℓ_CE           | ✓          | ✓                         | ✓                        | Any
ADA [16]       | Generalization    | CNN          | ℓ_CE           | ✓          | ✓                         | ✓                        | Any
Our UNVP       | Generalization    | PGM+CNN      | ℓ_LL + ℓ_CE    | ✓          | ✓                         | ✓                        | Any
Our E-UNVP     | Generalization    | PGM+CNN      | ℓ_LL + ℓ_CE    | ✓          | ✓                         | ✓                        | Any
synthesizing new useful samples to generalize to new unseen domains. Notice that our proposed framework does not require the presence of samples in the target domains during the training process.

A. Domain Variation Modeling as Distributions

This section aims at learning a Deep Generative Flow model, i.e. a function F, that maps an image x in image space I to its latent representation z in latent domain Z such that the density function pX(x) can be estimated via the probability density function pZ(z). Then, via F, rather than representing the environment variation, i.e. pX(x), directly in the image space, it can be easily modeled via variables in latent space, i.e. pZ(z), in a more semantic manner. When pZ(z) follows prior distributions, all samples in the given domain can be effectively modeled in the form of latent distributions.

Structure and Variable Relationship. Let x ∈ I be a data sample in image domain I, y be its corresponding class label, and z = F(x, y; θ), where θ denotes the parameters of F. The probability density function of x can be formulated via the change-of-variable formula as follows:

    pX(x, y; θ) = pZ(z, y; θ) |det(∂F(x, y; θ)/∂x)|    (1)

where pX(x, y; θ) and pZ(z, y; θ) define the distributions of samples of class y in the image and latent domains, respectively. In this way, F maps samples from a real data distribution to a Gaussian distribution in latent space. Then, we can model the environment variation via deviations from the Gaussian distributions of all classes in a latent domain. When F is well-defined with tractable computation of its Jacobian determinant, the two-way connection, i.e., inference and generation, exists for x and z.

The prior class distributions. Motivated by these properties, given C classes, we choose C Gaussian distributions with different means {μ1, μ2, .., μC} and covariances {Σ1, Σ2, ..., ΣC} as the prior distributions for these classes, i.e. zc ∼ N(μc, Σc). It is worth noting that although we choose Gaussian distributions, our framework is not limited to them; other distribution types can also be adopted.

Mapping function structure. To enforce the information flow from the image domain to a latent space with different abstraction levels, we formulate the mapping function F as a composition of several sub-functions fi as follows:

    F = f1 ∘ f2 ∘ ... ∘ fN    (3)

where N is the number of sub-functions. The Jacobian ∂F/∂x can be derived by ∂F/∂x = ∂f1/∂x · ∂f2/∂f1 ⋯ ∂fN/∂fN−1. With this structure, the properties of each fi define the properties of the whole mapping function F. For example, if the Jacobian ∂fi/∂x is tractable, then F is also tractable. Furthermore, if each fi is invertible, then F is also invertible.
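To make this concrete, the sketch below (our own illustration, not the authors' released code) implements the change-of-variable computation of Eqn. (1) together with the composition of Eqn. (3), using simple element-wise affine layers as stand-ins for the paper's coupling layers; the log-determinants of the sub-functions simply accumulate along the composition.

```python
import numpy as np

class AffineFlowLayer:
    """One invertible sub-function f_i(x) = s * x + t (element-wise, s != 0)."""
    def __init__(self, s, t):
        self.s, self.t = np.asarray(s, float), np.asarray(t, float)

    def forward(self, x):
        # Returns f_i(x) and log|det(Jacobian)| = sum(log|s|) for this layer.
        return self.s * x + self.t, np.sum(np.log(np.abs(self.s)))

    def inverse(self, z):
        return (z - self.t) / self.s

def log_density(x, layers, mu, var):
    """log p_X(x) = log p_Z(z) + sum_i log|det df_i/dx|,
    with a diagonal Gaussian class prior N(mu, diag(var)) as in Sec. III-A."""
    z, log_det = np.asarray(x, float), 0.0
    for layer in layers:                     # F = f_N ∘ ... ∘ f_1, applied in order
        z, ld = layer.forward(z)
        log_det += ld
    log_pz = -0.5 * np.sum((z - mu) ** 2 / var + np.log(2 * np.pi * var))
    return log_pz + log_det, z
```

Applying the layer inverses in reverse order recovers the input exactly, which is the two-way inference/generation connection noted above.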
Figure 4. The distributions: (A) MNIST. (B) MNIST-M using a Pure-CNN trained on MNIST. (C) MNIST-M using our UNVP trained on MNIST. (D) MNIST-M using our E-UNVP trained on MNIST.

To learn the parameters θ, the log-likelihood in Eqn. (2) is maximized as follows:

    θ* = arg max_θ Σ_c Σ_i log pX(x_i, c; θ)    (5)

Notice that after learning the mapping function, all images of all classes are mapped into the corresponding distributions of their classes. Then the environment density can be considered as the composition of these distributions. Figure 4(A) illustrates an example of the learned environment distributions of MNIST with 10 digit classes. When only samples in MNIST are used for training, the density distributions of MNIST-M, i.e., unseen during training, using Pure-CNN, our UNVP, and our E-UNVP are shown in Fig. 4 (B, C, D), respectively. In the next section, a generalization approach is proposed so that, using only samples in a source environment, the learned model can expand the density distributions of the source environment so that they cover as much as possible the distributions of unseen environments.

B. Unseen Domain Generalization

After modeling the source environment variation as the composition of its class distributions, this section introduces the generalization process of these distributions with respect to a classification model M such that the expansion of these distributions can help M generalize to unseen environments with high accuracy. Notice that M can be any type of deep CNN such as LeNet [21], AlexNet [22], VGG [23], ResNet [5], or DenseNet [24].

Let ℓ(X, Y; M, F, θ, θ1) be the training loss function of M, and θ1 be the parameters of M. The generalization process of M can be formulated as updating the parameters θ1 such that it can robustly classify samples having latent distributions that are a distance ρ away from the samples in the source environment. Then, the objective function of M is formulated as:

    arg min_{θ1} sup_{P : d(PX, PX^src) ≤ ρ} E[ℓ(X, Y; M, F, θ, θ1)]    (6)

where {X, Y} denotes the images and their labels; d(·, ·) is the distance between probability distributions; and PX^src(X, Y) and PX(X, Y) are the density distributions of the source and current expanded environments, respectively.

Since both PX^src and PX are density distributions, the Wasserstein distance with respect to PX^src and PX can be adopted. Notice that, from the previous section, we have learned a mapping function F that maps the density functions from image space, i.e. PX, to prior distributions in latent space, i.e. PZ. Moreover, since F is invertible with the specific formula of its sub-functions, computing d(PX, PX^src) is equivalent to computing d(PZ, PZ^src). From this, we can efficiently estimate the cost as the transformation cost between Gaussian distributions. Then d(PX, PX^src) is reformulated as:

    d(PX, PX^src) = d(PZ, PZ^src)
                  = inf E_{xc, xc^src}[cost(F(xc), F(xc^src))]
                  = inf E_{zc, zc^src}[cost(zc, zc^src)]    (7)

where cost(·, ·) denotes the transformation cost between Gaussian distributions:

    cost(zc, zc^src) = Σ_c ||μc^src − μc||²₂ + Tr(Σc^src + Σc − 2((Σc^src)^{1/2} Σc (Σc^src)^{1/2})^{1/2})    (8)

{μc^src, Σc^src} and {μc, Σc} are the means and covariances of the distributions of class c in the source and the expanded environment, respectively. Plugging this distance in and applying the Lagrangian relaxation to Eqn. (6), we have

    arg min_{θ1} sup_P E[ℓ(X, Y; M, F, θ, θ1)] − α · d(PX, PX^src)
    = arg min_{θ1} sup_x Σ_c {ℓ(x, c; M, F, θ, θ1) − α · cost(F(x), F(xc^src))}

To solve this objective function, the optimization process can be divided into two alternating steps: (1) generate the sample x for each class such that

    x = arg max_x {ℓ(x, c; M, F, θ, θ1) − α · cost(F(x), F(xc^src))}    (9)

and consider x as a new "hard" example for class c; and (2) add x to the training data and optimize the model M. In other words, this two-step optimization process aims at finding new samples belonging to distributions that are a distance ρ away from the distributions of the source environment, and making M more robust when classifying these examples. In this way, after a certain number of iterations, the distributions learned from M can be generalized so that they cover as much as possible the distributions of new unseen environments.

C. Universal Non-volume Preserving (UNVP) Models

The proposed UNVP consists of two main branches: (1) Discriminative Feature Modeling and (2) Generative Distribution Modeling.
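As an illustrative sketch (ours, not the paper's code), the transformation cost of Eqn. (8) between the source and expanded Gaussians can be computed as follows, assuming diagonal covariances so that the matrix square roots reduce to element-wise operations.

```python
import numpy as np

def gaussian_w2_cost(mu_src, var_src, mu, var):
    """Squared 2-Wasserstein distance between N(mu_src, diag(var_src)) and
    N(mu, diag(var)) -- the per-class term of Eqn. (8) for diagonal covariances:
    ||mu_src - mu||^2 + Tr(S_src + S - 2(S_src^{1/2} S S_src^{1/2})^{1/2})
    reduces to the two element-wise terms below."""
    mu_src, mu = np.asarray(mu_src, float), np.asarray(mu, float)
    std_src = np.sqrt(np.asarray(var_src, float))
    std = np.sqrt(np.asarray(var, float))
    return np.sum((mu_src - mu) ** 2) + np.sum((std_src - std) ** 2)

def environment_cost(params_src, params):
    """Sum the per-class costs over all C classes, as in Eqn. (8).
    Each argument is a list of (mean, variance) pairs, one per class."""
    return sum(gaussian_w2_cost(ms, vs, m, v)
               for (ms, vs), (m, v) in zip(params_src, params))
```

For full covariance matrices, the middle trace term would instead require a matrix square root (e.g., `scipy.linalg.sqrtm`).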
Figure 5. The training process of our proposed UNVP. It consists of one pre-training step and a two-stage optimization that alternately minimizes the within-class distributions and synthesizes new samples for generalization.

While the discriminative part focuses on constructing a classifier that minimizes within-class distributions, the generative one aims at embedding samples of all classes into their corresponding latent distributions and then expanding these distributions for generalization. Fig. 5 illustrates the whole end-to-end joint training process of UNVP, where the generative part, i.e., the Deep Generative Flow F, is first employed to learn the mapping from image space to Gaussian distributions in latent space. Then a two-stage training process is adopted to learn the Deep Classifier M and adjust the Deep Generative Flow F for generalization.

In the first stage of this process, given a training dataset, both the parameters {θ, θ1} of the mapping function F and the classifier M are updated according to the loss function:

    ℓ(X, Y; M, F, θ, θ1) = ℓ_CE(M(X; θ1), Y) − log pX(X, Y; θ)

where the first term is the cross-entropy loss for M and the second term is the log-likelihood of F.

In the second stage, we adapt the generalization process presented in Sec. III-B and Eqn. (9) to synthesize new "hard" samples. Notice that, to further constrain the perturbation in latent space, we incorporate another regularization term into Eqn. (7):

    cost(zc, zc^src) = Σ_c ||μc^src − μc||²₂ + Tr(Σc^src + Σc − 2((Σc^src)^{1/2} Σc (Σc^src)^{1/2})^{1/2}) + ||M(Xc) − M(Xc^src)||²₂

Newly generated samples are then added to the training set and used to update both branches of UNVP.

Notice that in the structure of F, the choice of Gaussian distributions for all classes plays an important role and directly affects the performance of the generative model. By varying the choices for these distributions, different variants of UNVP can be introduced.

Universal Non-volume Preserving Models (UNVP): The means and covariances of the Gaussian distributions are pre-defined for all C classes, where μc = 1c; Σc = I; zc ∼ N(μc, I), and 1 is the all-one vector.

Extended Universal Non-volume Preserving Models (E-UNVP): Rather than fixing the means and covariances of the Gaussian distributions of the C classes, we consider them as variables that are flexibly learned during the environment modeling as well as adjusted during domain generalization. Particularly, given the class label c, F maps each sample xc to a Gaussian distribution with the mean and covariance:

    μc = γ Gm(c) + λ Hm(n)
    Σc = Gstd(c)    (10)

where Gm(c) and Gstd(c) denote the learnable functions that map the label c to the mean and covariance values of its Gaussian distribution; n is a noise signal generated following the normal distribution; and Hm(n) defines the allowable shifting range of the Gaussian given the noise signal n. γ and λ are user-defined parameters that control the separation of the Gaussian distributions between different classes and the contribution of Hm(n) to μc. We choose a fully connected structure for Gm(c) and Gstd(c), which take the input c in the form of a one-hot vector, while a convolutional layer is adopted for Hm(n).

IV. DISCUSSION

As shown in Fig. 3, by exploiting Generative Flows that model the samples of each class as a Gaussian in the semantic feature space, the proposed UNVP can robustly maintain the feature structure of each class while expanding and shifting the domain distributions. In this way, we can generate more useful "hard" samples for the generalization process.

By introducing the noise signal n, we allow the Gaussian distribution of each class to shift around within a limited range, i.e., Hm(n). This further enhances the robustness of E-UNVP against noise during the environment modeling.

To further enhance the capability of modeling input signals at high resolution, we incorporate the activation normalization (actnorm) and invertible 1 × 1 convolution operators [25] into the structure of each sub-function fi in Eqn. (3). Particularly, the input to each fi is passed through an actnorm layer followed by an invertible 1 × 1 convolution before being transformed by S and T as in Eqn. (4). The two transformations S and T are defined by two Residual Networks with rectifier non-linearities and skip connections. Each of them contains three residual blocks. For input images with resolution higher than 128 × 128, six residual blocks are used for S and T.
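The second-stage synthesis step can be sketched as follows (our illustration; `fit_model` and `grad_objective_for` are hypothetical placeholders for the classifier update and for the gradient of the objective in Eqn. (9), which in practice would come from backpropagation through M and F).

```python
import numpy as np

def synthesize_hard_sample(x_src, grad_objective, step=0.1, n_steps=20):
    """Gradient-ascent sketch of Eqn. (9): start from a source sample and
    maximize  loss(x, c) - alpha * cost(F(x), F(x_src))  with respect to x.
    `grad_objective(x)` must return the gradient of that objective at x."""
    x = np.array(x_src, dtype=float)
    for _ in range(n_steps):
        x += step * grad_objective(x)   # ascend: x becomes a "hard" example
    return x

def domain_generalization_round(train_x, train_y, fit_model, grad_objective_for):
    """One alternation of the two-step process: (1) synthesize one hard sample
    per class, (2) retrain the classifier on the augmented set."""
    new_x, new_y = [], []
    for c in np.unique(train_y):
        x_src = train_x[train_y == c][0]   # a source sample of class c
        new_x.append(synthesize_hard_sample(x_src, grad_objective_for(c)))
        new_y.append(c)
    aug_x = np.vstack([train_x, np.array(new_x)])
    aug_y = np.concatenate([train_y, np.array(new_y)])
    return fit_model(aug_x, aug_y), aug_x, aug_y
```

Iterating this round gradually pushes the training distribution outward, which is the expansion behavior described above.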
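The E-UNVP prior parametrization of Eqn. (10), whose shift weight λ appears in the ablation of Table II, can be sketched as follows (ours; plain linear maps stand in for the fully connected and convolutional heads Gm, Gstd, and Hm).

```python
import numpy as np

rng = np.random.default_rng(0)

class EUNVPPrior:
    """Sketch of the E-UNVP class priors (Eqn. (10)):
    mu_c = gamma * G_m(c) + lambda * H_m(n),  Sigma_c = G_std(c),
    with learnable heads over one-hot labels and a noise-driven shift."""
    def __init__(self, n_classes, dim, gamma=1.0, lam=0.1):
        self.gamma, self.lam = gamma, lam
        self.W_m = rng.standard_normal((n_classes, dim))                  # G_m weights
        self.W_std = np.abs(rng.standard_normal((n_classes, dim))) + 0.5  # G_std > 0
        self.W_h = rng.standard_normal((dim, dim))                        # H_m weights

    def params_for(self, c, n=None):
        """Return (mu_c, sigma_c) for class c; n is the optional noise signal."""
        if n is None:
            n = rng.standard_normal(self.W_h.shape[1])
        mu = self.gamma * self.W_m[c] + self.lam * (self.W_h @ n)
        sigma = self.W_std[c]   # diagonal standard deviations
        return mu, sigma

    def sample(self, c, size=1):
        mu, sigma = self.params_for(c)
        return mu + sigma * rng.standard_normal((size, mu.shape[0]))
```

With `lam = 0` this degenerates to fixed per-class means, i.e., toward the plain UNVP choice of priors.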
Table II: Ablative experiment results (%) on the effectiveness of the parameters λ, α and β, which control the distribution separation and shifting range. MNIST is used as the only training set; MNIST-M is used as the unseen testing set.

Dataset | Method   | λ=0.01 | λ=0.1 | λ=1.0 | α=0.01 | α=0.1 | α=1.0 | β=0%  | β=1%  | β=10% | β=20% | β=30%
MNIST   | Pure-CNN | 99.28 (baseline)
MNIST   | UNVP     | −      | −     | −     | 99.33  | 99.18 | 99.30 | 99.28 | 99.28 | 99.35 | 99.30 | 99.36
MNIST   | E-UNVP   | 99.22  | 99.42 | 99.40 | 99.13  | 99.31 | 99.42 | 99.28 | 99.36 | 99.34 | 99.42 | 99.43
MNIST-M | Pure-CNN | 55.90 (baseline)
MNIST-M | UNVP     | −      | −     | −     | 58.18  | 60.76 | 59.44 | 55.90 | 59.99 | 57.24 | 59.44 | 55.11
MNIST-M | E-UNVP   | 59.83  | 60.49 | 59.47 | 56.92  | 61.70 | 60.49 | 55.90 | 57.10 | 60.49 | 61.70 | 60.49
Table IV: Results (%) on three digit datasets. ADA and ours do not require target data in training; ADDA and DANN do.

Methods  | MNIST | SVHN  | MNIST-M
ADDA     | 99.29 | 32.20 | 63.39
DANN     | −     | −     | 76.66
Pure-CNN | 99.06 | 31.96 | 55.90
ADA      | 99.17 | 37.87 | 60.02
UNVP     | 99.30 | 41.23 | 59.45
E-UNVP   | 99.42 | 42.87 | 61.70

Table V: Results (%) on the Extended Yale-B [27], CMU-PIE [28] and CMU-MPIE [29] databases under Normal (N) and Dark (D) illumination. ADA and ours do not require target-domain data during training, while ADDA does.

Method   | E-Yale-B (N / D) | CMU-PIE (N / D) | CMU-MPIE (N / D)
ADDA     | 99.17 / 75.28    | 96.09 / 70.33   | 99.93 / 97.71
Pure-CNN | 98.50 / 51.39    | 95.59 / 62.18   | 99.93 / 94.74
ADA      | 99.00 / 53.08    | 96.49 / 62.69   | 99.92 / 96.08
UNVP     | 99.17 / 58.24    | 96.32 / 64.88   | 99.83 / 98.25
E-UNVP   | 99.54 / 62.95    | 97.55 / 66.89   | 99.93 / 98.03

Our proposed methods consistently outperform the stand-alone classifier (Pure-CNN) using the same network configuration in all experiments. Particularly, they help to improve performance by 6%, 0.5%, 4%, 5%, and 2% on MNIST-M using LeNet, AlexNet, VGG, ResNet and DenseNet, respectively. The proposed methods can be easily integrated with standard deep CNN networks. Therefore, they can potentially be applied to improve the performance of many existing CNN-based applications, e.g., detection and recognition, as experimented in the next sections.

B. Digit Recognition on Unseen Domains

The proposed approaches are evaluated on digit recognition in new unseen domains with two other digit databases, i.e., MNIST-M and SVHN (Fig. 6). In this experiment, MNIST is the only database used to train the classifier. Then, the two other datasets, i.e., MNIST-M and SVHN, are used as the new unseen domains to benchmark the performance. The classifier is trained using 50,000 images of MNIST. In the image generalization phase, we use 10,000 images from this set to perturb and generate new samples. All digit images are resized to 32 × 32. We benchmark the learned classifiers on MNIST and the two other unseen digit datasets, i.e., SVHN and MNIST-M. The results using our approach are compared against the LeNet classifier (Pure-CNN) and Adversarial Data Augmentation (ADA). We also show the recognition results on these datasets using Domain Adaptation methods, including Adversarial Discriminative Domain Adaptation (ADDA) and Domain-Adversarial Training of Neural Networks (DANN) [1]. Notice that Pure-CNN, ADA, and our approaches do not require the target domain data during training, while ADDA and DANN require the target domain data in the training steps.

Our generalization phase synthesizes images in the semantic space via the estimation of environment density. This helps our generated images to be more diverse than the images synthesized by the ADA method. The experimental results are shown in Table IV. The proposed methods consistently achieve state-of-the-art performance on these datasets. Notably, they help to improve performance by approximately 11% and 6% on SVHN and MNIST-M, respectively.

C. Face Recognition on Unseen Domains

In this experiment, the proposed approaches are applied in unseen-environment face recognition and compared against the other baseline methods, i.e., Pure-CNN, ADA, and ADDA, on three face recognition databases, including Extended Yale-B, CMU-PIE, and CMU-MPIE. In each database, we select the face images with normal lighting as the source domain, i.e., Normal illumination (N), and the face images with dark lighting as the target domain, i.e., Dark illumination (D). Each database is randomly split into two sets: a training set (80%) and a testing set (20%). The experimental framework structures are similar to the one in digit recognition. All cropped face images are resized to 64 × 64 pixels. The experimental results in Table V show that our proposed methods help to improve the recognition performance on new unseen domains where the lighting conditions are unknown. Particularly, they help to improve performance by approximately 11%, 4% and 3% in dark lighting conditions on the Extended Yale-B, CMU-PIE and CMU-MPIE databases, respectively.

D. Pedestrian Recognition on Unseen Domains

This experiment aims to improve RGB-based pedestrian recognition on thermal images on the Thermal Dataset2. There are two datasets organized in this experiment: (1) RGB pedestrian and (2) Thermal pedestrian. The methods are trained only on the RGB pedestrian dataset and tested on the Thermal pedestrian dataset. In the training phase, we use 2,000 images to generalize new images, and all images of the two datasets are resized to 128 × 128 pixels. The experimental results in Table VI show that our proposed methods consistently help to improve the performance of the Pure-CNN in various common deep network structures, including LeNet, AlexNet, VGG, ResNet, and DenseNet.

VI. CONCLUSIONS

This paper has introduced a novel deep-learning-based domain generalization approach that generalizes well to different unseen domains. Using only training data from a source domain, we propose an iterative procedure that augments the dataset with samples from a fictitious target domain that is "hard" under the current model. It can be easily integrated with any other CNN-based framework within an end-to-end network to improve the performance. On digit recognition, the proposed method has been benchmarked

2 [Link]
Table VI: Results (%) on the RGB and Thermal pedestrian databases with various common deep network structures.

Networks | Methods  | RGB   | Thermal
LeNet    | Pure-CNN | 95.45 | 79.72
         | E-UNVP   | 97.25 | 90.29
AlexNet  | Pure-CNN | 96.64 | 81.38
         | E-UNVP   | 97.04 | 82.98
VGG      | Pure-CNN | 97.54 | 95.60
         | E-UNVP   | 98.64 | 98.38
ResNet   | Pure-CNN | 98.52 | 96.07
         | E-UNVP   | 98.56 | 98.35
DenseNet | Pure-CNN | 98.39 | 95.87
         | E-UNVP   | 98.60 | 96.14

on three popular digit recognition datasets and consistently showed improvement. The method is also experimented in face recognition on three standard databases and outperforms the other state-of-the-art methods. In the problem of pedestrian recognition, we empirically observe that the proposed method learns models that improve performance across a priori unknown data distributions.

VII. ACKNOWLEDGEMENT

In this project, Dat T. Truong and Minh-Triet Tran are partially supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.

REFERENCES

[1] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in ICML, 2015.
[2] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in CVPR, July 2017.
[11] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," in Digital Signal Processing, 2000.
[12] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim, "Image to image translation for domain adaptation," in CVPR, June 2018.
[13] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in CVPR, 2017.
[14] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, "Maximum classifier discrepancy for unsupervised domain adaptation," in CVPR, 2018.
[15] S. Shankar, V. Piratla, S. Chakrabarti, S. Chaudhuri, P. Jyothi, and S. Sarawagi, "Generalizing across domains via cross-gradient training," 2018.
[16] R. Volpi, H. Namkoong, O. Sener, J. C. Duchi, V. Murino, and S. Savarese, "Generalizing to unseen domains via adversarial data augmentation," in NIPS, 2018.
[17] M. Ghifary, W. Bastiaan Kleijn, M. Zhang, and D. Balduzzi, "Domain generalization for object recognition with multi-task autoencoders," in ICCV, 2015.
[18] H. Li, S. Jialin Pan, S. Wang, and A. C. Kot, "Domain generalization with adversarial feature learning," in CVPR, 2018.
[19] K. Muandet, D. Balduzzi, and B. Schölkopf, "Domain generalization via invariant feature representation," in ICML, 2013.
[20] Y. Li, X. Tian, M. Gong, Y. Liu, T. Liu, K. Zhang, and D. Tao, "Deep domain generalization via conditional invariant adversarial networks," in ECCV, 2018.