Reaching Nirvana: Maximizing The Margin in Both Euclidean and Angular Spaces For Deep Neural Network Classification
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 36, NO. 5, MAY 2025
Abstract— The classification loss functions used in deep neural network classifiers can be split into two categories based on maximizing the margin in either Euclidean or angular spaces. Euclidean distances between sample vectors are used during classification for the methods maximizing the margin in Euclidean spaces, whereas the cosine similarity distance is used during the testing stage for the methods maximizing the margin in angular spaces. This article introduces a novel classification loss that maximizes the margin in both the Euclidean and angular spaces at the same time. This way, the Euclidean and cosine distances produce similar and consistent results and complement each other, which in turn improves the accuracies. The proposed loss function enforces the samples of classes to cluster around the centers that represent them. The centers approximating classes are chosen from the boundary of a hypersphere, and the pairwise distances between class centers are always equivalent. This restriction corresponds to choosing centers from the vertices of a regular simplex inscribed in a hypersphere. The proposed loss function can be effortlessly applied to classical classification problems as there is a single hyperparameter that must be set by the user, and setting this parameter is straightforward. Additionally, the proposed method can effectively reject test samples from unfamiliar classes by measuring their distances from the known class centers, since the known class samples are compactly clustered around their corresponding centers. Therefore, the proposed technique is especially suitable for open set recognition problems. Despite its simplicity, experimental studies have demonstrated that the proposed method outperforms other techniques in both open set recognition and classical classification problems. Interested individuals can access the source code for the proposed approach at [Link]

Index Terms— Classification, computer vision, deep learning, neural collapse, open set recognition, simplex classifier.

Manuscript received 27 February 2023; revised 17 October 2023 and 4 April 2024; accepted 29 July 2024. Date of publication 12 August 2024; date of current version 5 May 2025. This work was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) under Grant EEEAG-121E390. (Corresponding author: Hakan Cevikalp.)

Hakan Cevikalp and Bedirhan Uzun are with the Department of Electrical and Electronics Engineering, Eskişehir Osmangazi University, 26040 Eskişehir, Türkiye (e-mail: [Link]@[Link]).

Hasan Saribas is with the AIE Department, Huawei Türkiye Research and Development Center, 34768 Istanbul, Türkiye.

Digital Object Identifier 10.1109/TNNLS.2024.3437641

2162-237X © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See [Link] for more information.

I. INTRODUCTION

DEEP neural network classifiers have been dominating many fields, including computer vision, by achieving state-of-the-art accuracies in many tasks such as visual object, activity, face, and scene classification. Therefore, new deep neural network architectures and different classification losses are constantly being developed. The softmax loss function is the most common function used for classification in deep neural network classifiers. Although the softmax loss yields satisfactory accuracies for general object classification problems, its performance for discriminating instances coming from the same class categories (e.g., face recognition) or for open set recognition (a classification scenario that allows the test samples to come from novel classes) is not satisfactory. The performance decrease is typically attributed to two factors: there is no mechanism for enforcing a large margin between classes, and the softmax loss does not attempt to minimize the within-class scatter, which is critical for obtaining good accuracies in open set recognition problems.

To improve the classification accuracies of deep neural network classifiers, many researchers have focused on maximizing the margin between classes. The recent methods can be roughly grouped into two categories based on maximizing the margin in either Euclidean or angular spaces. The methods targeting margin maximization in Euclidean spaces attempt to minimize the Euclidean distances among the samples coming from the same classes and maximize the distances among the samples coming from different classes. Euclidean distances are used during the testing stage after the network is trained. In contrast, the methods maximizing the margin in angular spaces use the cosine distances for classification.

In this article, we propose a novel method that maximizes the margin in both the Euclidean and angular spaces at the same time. The proposed methodology first selects class centers from the vertices of a regular simplex inscribed in a hypersphere and utilizes a loss function that minimizes the distances between the samples and their corresponding class centers.

A. Related Work

Wen et al. [1], [2] introduced the center loss for face recognition to maximize the margin in the Euclidean space, and they reported significant improvements over the method using the softmax loss function in the context of face recognition. The range loss is combined with the softmax loss function in [3] to maximize the margin in Euclidean spaces. Wei et al. [4] proposed a classifier that combines the softmax loss and center loss functions with the minimum margin loss. A method combining the softmax loss function with the marginal loss is proposed by Deng et al. [5]. Cevikalp et al. [6] proposed a deep neural network based open set
recognition method that returns compact class acceptance regions for each known class. In this framework, hinge loss and polyhedral conic functions are used for the between-class separation. The methods using contrastive loss [7] also return compact class acceptance regions. To this end, they minimize the Euclidean distances of the positive sample pairs and penalize the negative pairs that have a distance smaller than a given margin threshold. In a similar manner, Schroff et al. [8], Hoffer and Ailon [9], Sohn [10], and Roy et al. [11] employ a triplet loss function that uses triplets including a positive sample, a negative sample, and an anchor. An anchor is also a positive sample; thus, the within-class compactness is achieved by minimizing the Euclidean distances between the anchor and positive samples, whereas the distances between the anchor and negative samples are maximized for the between-class separation. The employment of contrastive or triplet loss functions has a significant drawback: the number of sample pairs or triplets grows quadratically or cubically with the total number of samples. This leads to slow convergence and instability in the training process, necessitating cautious data sampling/mining to mitigate these issues. Overall, the majority of the methods maximizing the margin in Euclidean spaces have the shortcoming that they are too complex, since the user has to set many weighting and margin parameters. Furthermore, many of these methods are not suitable for open set recognition problems since they do not return compact acceptance regions for classes.

The methods that enlarge the margin in angular spaces typically revise the classical softmax loss function to maximize the angular margins between rival classes. These methods use either multiplicative or additive margins for the interclass separation in angular spaces. Among these, the SphereFace [12], [13] and RegularFace [14] methods employ multiplicative margins, whereas the CosFace [15] and ArcFace [16] methods use additive margins. The majority of these methods normalize the feature vectors, the classifier weights, or both, since the similarities are computed by using the angles. We would like to point out that almost all methods that maximize the margin in the angular space are proposed for face recognition. As indicated in [6], these methods use subspace approximations for the classes, and the similarities are measured by using the angles between sample vectors. However, subspace approximations work well for classification settings where the number of features is much larger than the number of class-specific samples. This is typically satisfied for face recognition problems, but there are many classification tasks that do not satisfy this criterion. In addition to this problem, these methods are also complex since they have many parameters that must be set by the user, as in the methods that maximize the margin in Euclidean spaces.

The methods that are closest to the proposed methodology are proposed in [17], [18], and [19]. These methods introduce loss functions for learning uniformly distributed representations on the hypersphere manifold through potential energy minimization. However, these studies consider the layer regularization problem rather than the direct classification problem and apply hyperspherical uniformity to the learned weights. The main idea is to learn diverse deep neural network weights that are uniformly distributed on a hypersphere in order to reduce redundancy. Therefore, these methods are more complex (in some sense they are also more sophisticated, since they apply hyperspherical uniformity to all neural network layers). Consequently, there are many hyperparameters that must be fixed in the resulting method. Also, when this idea is used in the classification layer, the distances between the resulting class representative weights are not equivalent as in our proposed method. A related study called UniformFace [20] used the same idea in the classification layer only and introduced a uniform loss function to learn equidistributed representations for face recognition. Another similar method using class centroids is introduced in [21] for distance metric learning. Although this study focuses on distance metric learning, it uses class centers chosen as the basis vectors of C-dimensional space as anchors. Then, as in the triplet loss, it attempts to minimize the distances between the data samples and the corresponding class centers and to maximize the distances between the samples and rival class centers. The selected class centers are fixed as in our proposed method, and it has the restriction that the feature dimension must be larger than or equal to the number of classes, similar to our case. Compared to this method, our proposed method is much simpler and its run-time complexity is significantly lower. Additionally, there are two significant oversights made by the authors in their proposed methodology. The first oversight concerns their choice of centers, which are selected from the surface of a unit hypersphere (a hypersphere with a radius of 1). As expounded upon below, data samples tend to cluster near the surface of an expanding hypersphere as the dimensionality increases. Consequently, setting the hypersphere radius to 1 is not well-suited for high-dimensional feature spaces, a viewpoint that is supported by findings reported in studies such as ArcFace [16] and CosFace [15]. The second concern revolves around the exclusive use of a fully connected layer to increase dimensionality, particularly when the feature dimension is smaller than the number of classes. A fully connected layer just uses a linear combination of the existing features, and the resulting space has the same dimensionality as the original feature space in the best case (this issue is explained in more detail below). As a result, the dimensionality is not increased, and this method will not work for large-scale problems where the number of classes is very large.

There are studies using or mentioning simplex centers as in our proposed method. Among these methods, Papyan et al. [22] show that the samples of different classes cluster around the class centers forming the vertices of a regular simplex (as we proposed in this study) at the last stages of the learning process when linear classifiers are used with the softmax loss function and the feature dimension is higher than the number of classes. They show that the lengths of the vectors of the class means (after centering by their global mean) converge to the same length and the angles between pairwise center vectors become equal during the last training stages (called the terminal phase of training in the study) of the deep neural
networks using linear classifiers. This method is different from our proposed method in the sense that they do not use fixed class centers chosen from the vertices of a simplex. Instead, they directly use the softmax loss function and learn class weights. In general, they simply provide theoretical arguments showing that using the softmax loss function with linear classifiers yields embeddings where the class samples cluster around the vertices of a regular simplex after some kind of normalization. Pernici et al. [23], [24], Kasarla et al. [25], and Bytyqi et al. [26] use fixed centers chosen from the vertices of a regular simplex as in our proposed method. However, all of them utilize variants of the softmax loss function, including hyperparameters that must be fixed by the user. None of them proposes a simple loss function as in our proposed method. Using the softmax loss function yields radial distributions, as illustrated in these studies. Therefore, their success is not satisfactory, especially in open set recognition problems, since the resulting embeddings are not as compact as in our proposed method; please see the related discussion given in Section II-C below. Also, none of these studies considered the case when the dimension is smaller than the number of classes or conducted experiments on this setting. For such cases, we need to increase the dimension of the feature space, and we propose solutions to handle this case. In contrast, none of these methods proposes an effective solution for this case. Yang et al. [27] introduced an alternative loss function named the dot regression loss, which, like our proposed method, utilizes centers selected from the vertices of a regular simplex. However, their approach requires the selection of two parameters, making our method comparatively simpler. Additionally, the loss function described in [27] mandates that feature samples conform to the surface of a hypersphere with a predefined radius, akin to the spherical embeddings used in the ArcFace method [16]. In contrast, our method does not impose such constraints, allowing the samples to occupy the full feature space for embedding.

B. Contributions

The methods that maximize the margin in Euclidean or angular spaces mentioned above have shortcomings: their objective loss functions include many terms that need to be weighted, their class acceptance regions are not compact, or they need additional hard-mining algorithms. In this study, we propose a simple yet effective method that does not have these limitations. Our proposed method maximizes the margin in both the Euclidean and angular spaces. To the best of our knowledge, our proposed method is the first method that maximizes the margin in both spaces. To accomplish this goal, we train a deep neural network that enforces the samples to gather in the vicinity of the class-specific centers that lie on the boundary of a hypersphere whose center is set to the origin. Each class is represented with a single center, and the distances between the class centers are equivalent. This corresponds to the selection of class centers from the vertices of a regular simplex inscribed in a hypersphere. Both the Euclidean distances and angular distances between class centers are equivalent to each other.

Our proposed method has many advantages over other margin-maximizing deep neural network classifiers. These advantages can be summarized as follows.

1) The proposed method is very straightforward in the sense that one needs to fix only one parameter, the hypersphere radius. Prior research on classification methods employing hyperspherical embeddings has already investigated the selection of this parameter, with [15] offering lower bounds for its determination. Therefore, setting this parameter is extremely easy for the users. For open set recognition, the user has to set two parameters if background class samples are used for learning.

2) The proposed method returns compact and interpretable acceptance regions for each class; thus, it is very suitable for open set recognition problems. Other methods utilizing simplex vertices for classification purposes use variants of the softmax loss function and return radial distributions, which are not compact. Therefore, their accuracies are not satisfactory for open set recognition.

3) The distances between the samples and their corresponding centers are minimized independently of each other; thus, the proposed method also works well for imbalanced datasets.

4) We investigate scenarios where the utilization of centers from a regular simplex is unfeasible due to the dimensionality of the feature space being less than the number of classes minus one (d < C − 1). In such instances, neural collapse does not occur, and the case where d < C − 1 remains largely unexplored with no proposed efficient solutions. Here, we address this issue by introducing a new module that augments the dimensionality of the feature space, as elaborated upon below.

Against all these advantages, there is only one limitation of the proposed method: the dimension of the CNN features must be larger than or equal to the total number of classes minus 1. To overcome this limitation, we introduce two solutions: the first solution uses a dimension augmentation module (DAM), whereas the second solution revises the existing deep neural network architectures.

II. METHOD

A. Motivation

In this study, we introduce a simple yet effective deep neural network classifier that maximizes the margin in both Euclidean and angular spaces. To this end, we propose a novel classification loss function that enforces the samples to compactly cluster around class-specific centers that are selected from the outer boundary of a hypersphere. The Euclidean distances and angles between the centers are equivalent. Please note that, in terms of margin maximization, the distances between the class centers are the maximum values for angular distances. In a similar manner, for Euclidean distances, if the class centers are enforced to lie on the boundary of a hypersphere, the distances among the classes again become the best optimal solution we can get. Theoretical proofs of this fact can be
Fig. 1. In the proposed method, class samples are enforced to lie close to the class-specific centers representing them, and the class centers are located on the boundary of a hypersphere. All the distances between the class centers are equivalent; thus, there is no need to tune any margin term. The class centers form the vertices of a regular simplex inscribed in a hypersphere. Therefore, to separate C different classes, the dimensionality of the feature space must be at least C − 1. The figure on the left shows the separation of two classes in 1-D space, the middle figure depicts the separation of three classes in 2-D space, and the figure on the right illustrates the separation of four classes in 3-D space. For all cases, the centers are chosen from a regular C-simplex.
found in both [19] and [26]. Using simplex vertices as class centers is illustrated in Fig. 1. In this figure, the centers representing the classes are denoted by star symbols, whereas the class samples are represented with circles having different colors based on their class memberships. As seen in the figure, all pairwise distances between the class centers are equivalent, and the class centers are located on the boundary of a hypersphere. Moreover, if the hypersphere center is set to the origin, then the angles between the class centers are also the same, and the lengths of the centers are equivalent, i.e., $\|\mathbf{s}_i\| = u$ (u is the length of the center vectors). After the learning stage, if the class samples are compactly clustered around the centers representing them, we can classify the data samples based on the Euclidean or angular distances from the class centers. Both distances yield the same results if the hypersphere center is set to the origin.

At this point, the question of whether enforcing data samples to lie around the simplex vertices is appropriate or not comes to mind. In fact, high-dimensional spaces are quite different from low-dimensional spaces, and there are many studies showing that the data samples lie on the boundary of a hypersphere when the feature dimensionality, d, is high and the number of samples, n, is small. For example, Jimenez and Landgrebe [28] theoretically show that high-dimensional spaces are mostly empty and data concentrate on the outside of a shell (on the outer boundary of a hypersphere). The authors also show that as the number of dimensions increases, the shell increases its distance from the origin. More precisely, the data samples lie near the outer surface of a growing hypersphere in high-dimensional spaces (therefore, setting the hypersphere radius to 1 as in [21] is not suitable for high-dimensional spaces). A more recent study [29] explicitly shows that the data samples lie at the vertices of a regular simplex in high-dimensional spaces. These two studies are not contradictory, and they support each other, since we can always inscribe a regular simplex in a hypersphere, as seen in Fig. 1. In addition to these studies, Kumar et al. [30] and Weber [31] show that the eigenvectors of Laplacian matrices (the matrices computed by operating on similarity matrices in spectral clustering analysis) form a simplex structure, and they use the vertices of the resulting simplex for clustering of the data samples. In other words, they prove that when the data samples are mapped to the Laplacian eigenspace, they concentrate on the vertices of a simplex structure. These studies are also complementary to the studies showing that high-dimensional data samples lie on the boundary of a growing hypersphere: as proved in [32], the normalized cuts (NCuts) [33] clustering algorithm, which is presented as a spectral relaxation of a graph cut problem, maps the data samples onto an infinite-dimensional feature space. Therefore, these data samples naturally concentrate on the vertices of a regular simplex due to the high dimensionality of the feature space.

There are strong arguments that verify that high-dimensional data samples concentrate on the vertices of a regular simplex, as discussed above. Do the same arguments hold for the high-dimensional features produced by deep neural network classifiers? A recent study [22] answers this question and reveals that the samples of different classes cluster around the class centers forming the vertices of a regular simplex (as we proposed in this study) at the last stages of the learning process when the feature dimension is higher than the number of classes. They show that the lengths of the vectors of the class means (after centering by their global mean) converge to the same length and the angles between pairwise center vectors become equal during the last training stages (called the terminal phase of training in the study) of deep neural networks using linear classifiers. They also demonstrate that the within-class scatter converges to zero, indicating that the class-specific samples gather around their corresponding class centers. A geometrical analysis of this study is given in [34]. However, both studies are not complete in the sense that they do not consider the cases when the dimension of the feature space is smaller than the number of classes, so that it is impossible to fit the class centers to the vertices of a regular simplex. Also, the authors do not propose an efficient method as in our proposed method; instead, they use the classical softmax loss function with linear classifiers and learn class weights for classification. In contrast, in this article we propose an efficient method that directly enforces the samples to lie close to the vertices of a regular simplex. We do not learn class weights; instead, we use fixed class centers chosen from the vertices of a regular simplex. In addition, we consider the dimension restriction (when the number of classes is larger
than the feature dimension) and introduce solutions to handle between the classes are again the maximum optimal value one
this problem as explained below. can get. Therefore, there is no need of using a loss term for the
interclass separation. Now, let us assume that the deep neural
network features of training samples are given in the form
B. Maximizing Margin in Euclidean and Angular Spaces
(fi , yi ), i = 1, . . . , n, fi ∈ IRd , yi ∈ { j} where j = 1, . . . , C.
Here, we propose a novel and simple method that enforces Here, C is the total number of known classes, and we assume
the samples of classes to cluster around the centers chosen that the feature dimension d is larger than or equal to C − 1,
from the vertices of a regular simplex. As shown in [22], i.e., d ≥ C − 1. Under these assumptions, the loss function of
all class samples cluster around the class centers forming the the proposed method can be written as
vertices of a regular simplex when the dimension of the feature n
space is larger than the number of classes. Therefore, there is 1X 2
L= fi − s yi . (4)
no need to use complicated classifier layers, and the same n i=1
effect can be accomplished by using much simpler classi-
fication layers as in our proposed method. In the proposed The loss function includes a single term that targets to mini-
method, instead of using more complicated linear classifiers mize the within-class variations by minimizing the distances
and learning class weights for each class, we directly enforce between the samples and their corresponding class centers,
the class samples to compactly cluster around the fixed class which are set to the vertices of a regular simplex. There is
centers chosen from the vertices of a regular simplex. All no need for another loss term for the between-class separa-
the pair-wise distances between the selected class centers are tion since the selected centers have the maximum possible
equivalent. Euclidean and angular distances among them. As a result, there
Let us assume that there are C classes in our dataset. In this is no hyperparameter that must be fixed, and the proposed
case, we first need to create a C-simplex (some researchers call method is extremely easy for the users. Moreover, the data
it C −1 simplex considering the feature dimension, but we will samples compactly cluster around their class centers, therefore
prefer C-simplex definition). The vertices of a regular simplex the proposed method results in compact acceptance regions
inscribed in a hypersphere with radius 1 can be defined as for classes, which is crucial for the success in the context
follows: of the open set recognition. It should be noted that our
( proposed method is quite different than the methods using
(C − 1)−1/2 1, j = 1 vertices of a regular simplex as in our proposed method. It is
vj = (1)
κ1 + ηe j−1 , 2≤ j ≤C because, all these methods use variants of the softmax loss
function that typically require setting margin parameters for
where the interclass separation. Furthermore, these methods return
√ r
1+ C C noncompact radial distributions (see [24, Figs. 2, 4, 5, and
κ=− , η= . (2) 8] and [26, Fig. 2]). Therefore, their performance will not be
(C − 1)3/2 C −1
satisfactory for open-set recognition problems. We call our
Here, 1 is an appropriately sized vector whose elements are proposed method as deep simplex classifier (DSC).
all 1, e j is the natural basis vector in which the j−th entry is The running time of the proposed method will be more
1 and all other entries are 0. Such a C−simplex is in fact a efficient compared to the methods using the softmax loss func-
C−dimensional polyhedron where the distances between the tion and its variants, Arcface [16], Cosface [15], and regular
vertices are equivalent. It must be noted that the distances polytope networks [24]. Because, these methods require to
between the vertices do not change even if the simplex apply exponential function to each logit,(w⊤c fi + bc ), followed
is rotated or translated. But, the dimension of the feature by a normalization by dividing with the sum of all these
space must be at least C − 1 in order to define such a exponentials as seen in the softmax loss function given below
regular C−simplex. Next, we must define the radius, u, of the ⊤
n
hypersphere. This term is similar to the scaling parameter 1X ew yi fi +b yi
L=− log PC . (5)
used in methods such as ArcFace [16] and CosFace [15], n i=1 w⊤j fi +b j
j=1 e
that maximize the margin in angular spaces. As the dimension
increases, it must be also increased since the studies [28] show On the other hand, we just need to extract the CNN features of
that the hypersphere whose outer shells include the data also the test samples during training and testing stages. Then, these
grows as the dimension is increased. Wang et al. [15] provided features are compared to precomputed centers by using the
a lower bound for the determination of this parameter. Then, Euclidean distances. Therefore, the proposed method is more
we set the class centers that will represent the classes as efficient in terms of computational complexity. However, this
does not affect testing times much since the most of the time
s j = uv j , j = 1, . . . , C. (3) is spent on convolutional layers of the deep neural network
classifier during the testing stage.
The order of selection of centers does not matter since the
distances among all centers are equivalent. These distances
are the best optimal values that we can get when the cosine C. Including Background Class for Open Set Recognition
distances are used as theoretically proved in [19] and [26]. In open-set recognition scenarios, the training of classifiers
In a similar manner, when the class centers are restricted to commences by exclusively utilizing samples of known classes.
lie on the boundary of a hypersphere, the Euclidean distances Subsequently, both known and unknown class samples are
Authorized licensed use limited to: SHIV NADAR UNIVERSITY. Downloaded on May 21,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.
CEVIKALP et al.: REACHING NIRVANA: MAXIMIZING THE MARGIN IN BOTH EUCLIDEAN AND ANGULAR SPACES 8183
8184 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 36, NO. 5, MAY 2025
layer. The second fully connected layer increases the dimension to the desired feature space size, (C − 1). Then, we apply another PReLU function followed by the last fully connected layer. It should be noted that, following the ReLU (or PReLU) operation, the majority of values may become positive, despite their corresponding centers having negative values. Therefore, the last layer in the module includes a fully connected layer that maps the (C − 1)-dimensional feature space back to a (C − 1)-dimensional feature space so that the sample features may have negative values. The proposed module increases the dimension in two steps as explained above. The dimension can also be directly increased from d to C − 1 in the first fully connected layer. In a similar manner, we can increase the dimension in more than two steps if desired.¹ The main idea of the proposed DAM is similar in spirit to the kernel mapping idea used in kernel methods [42], [43], with the exception that we explicitly map the data to a higher dimensional feature space as in [44] and [45]. It should be noted that Do et al. [21] proposed to use a fully connected layer alone for increasing the dimensionality of the feature space. However, a fully connected layer uses only linear combinations of the existing features, and the resulting space has a dimensionality that is lower than or equal to the original feature space dimension. Therefore, one has to use activation functions to introduce nonlinearity and increase the dimension, as in our proposed module.

2) Revising Network Architecture: We can also solve the dimension problem by slightly changing the existing CNN architectures instead of using our proposed plug-and-play DAM. To this end, we can avoid the fully connected layers that are used for dimension reduction in the last layers of deep CNNs. For example, in the ResNet architectures we used for face recognition in our experiments, the dimension of the feature space is 25 088 just before the fully connected layers, and it is reduced to 512 after the fully connected layers. Instead of reducing the dimension to 512, we can reduce it to values that solve the current problem. If the number of classes is much larger than 25 088, we can use more filters at the last layers to increase this number. In this study, we used the 25 088-dimensional feature space and reduced the feature size to 12 500 by using a fully connected linear layer (without PReLU) for training the large-scale dataset sampled from the MS1MV3 dataset [46] without any need for dimension augmentation.

¹Our shared software allows selecting any desired number of steps for increasing the dimensionality.

III. EXPERIMENTS

A. Illustrations and Ablation Studies

Fig. 3. Illustration of the proposed method: we use well-known architectures (such as ResNet-18 and ResNet-101) as backbones and we only change the classification loss layer. If the dimension of the CNN feature space is smaller than C − 1, we increase the dimension to the desired size by using the DAM module or by revising the network architecture, and then apply the proposed loss function.

Here, we first conducted some experiments to visualize the embedding spaces returned by various loss functions using the vertices of the regular simplex. To this end, we utilized a small deep neural network that yields 2-D CNN features. As training data, we selected three classes from the Cifar-10 dataset, since the maximum number of classes is bounded by 3 in 2-D spaces in the proposed method. We would like to point out that we can use different loss functions in addition to our default loss function given in (4) once we determine the vertices of the simplex that will represent the classes. For this experiment, we used two other loss functions. The first one is the hinge loss that minimizes the distance between a sample and its corresponding class center if the distance is larger than a selected threshold:

    L_hinge = (1/n) ∑_{i=1}^{n} max(0, ‖f_i − s_{y_i}‖ − m)².  (7)

This loss function does not minimize the distances between the samples and their corresponding centers if the distances are already smaller than the selected threshold, m. This way, class-specific samples are compactly clustered in a hypersphere with radius m. For the second loss function, we used the variant of the softmax loss function where the weights are fixed to the simplex vertices, as in

    L_softmax = −(1/n) ∑_{i=1}^{n} log ( e^{s_{y_i}^⊤ f_i + b_{y_i}} / ∑_{j=1}^{C} e^{s_j^⊤ f_i + b_j} ).  (8)

For the softmax loss, we fix the classifier weights to the predefined class centers and only update the features of the samples by using backpropagation. We set the hypersphere radius to u = 5 since this is a simple dataset.

The embeddings returned by the deep neural networks using different loss functions are plotted in Fig. 5. The first figure on the left is obtained by our default loss function that does not need any parameter selection. All data samples are compactly clustered around their class means, as expected. The second loss function using the hinge loss returns spherical distributions based on the selected margin, m, and the classes are still separable by a margin. In contrast, when the softmax is used with the simplex vertices, the data samples are very close and they overlap since there is no margin among the classes. Therefore, our default loss function seems to be the best choice among all tested variants since it does not need fixing any parameter and returns compact class regions.

We also conducted tests on imbalanced datasets. In our proposed method, the distances between the samples and their corresponding class centers are minimized independently of each other. Therefore, we expect the proposed method to be more robust against imbalanced datasets. To verify this, we conducted experiments on the same three classes used before. We used the same deep neural network classifier yielding 2-D feature spaces for this experiment. The number
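The two ablation losses in (7) and (8) can be sketched in NumPy as follows. This is illustrative only: the function names are ours, and in the actual training the features f_i are updated by backpropagation rather than evaluated on fixed arrays.

```python
import numpy as np

def hinge_center_loss(F, S, y, m):
    """Eq. (7): squared hinge on the Euclidean distance to the assigned center;
    samples already within radius m of their center contribute zero loss."""
    d = np.linalg.norm(F - S[y], axis=1)            # ||f_i - s_{y_i}||
    return np.mean(np.maximum(0.0, d - m) ** 2)

def fixed_softmax_loss(F, S, y, b=None):
    """Eq. (8): softmax cross-entropy whose classifier weights are frozen to
    the simplex vertices s_j; only the features F are learned."""
    b = np.zeros(S.shape[0]) if b is None else b
    logits = F @ S.T + b                            # s_j^T f_i + b_j
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(y)), y])

S = np.array([[1.0, 0.0], [-0.5, 0.9], [-0.5, -0.9]])  # three 2-D class centers
y = np.array([0, 1, 2])
F = S[y].copy()                                        # features exactly on their centers
loss_h = hinge_center_loss(F, S, y, m=0.5)             # 0.0: all distances below m
loss_s = fixed_softmax_loss(F, S, y)                   # > 0: softmax loss never reaches zero
```

The contrast visible here matches the figure: the hinge loss goes exactly to zero once every sample sits within radius m of its center, whereas the fixed-weight softmax keeps pulling samples even when they overlap, since it never enforces a margin.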
Fig. 4. Plug-and-play module that will be used for increasing the feature dimension. It maps d-dimensional feature vectors onto a much higher (C − 1)-dimensional space. The DAM module was specifically designed to allow users to choose any desired number of steps for increasing dimensionality. It is possible to increase the dimension in a single step or gradually increase it using multiple steps. This figure depicts the case when two steps are used for increasing the dimension.
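The two-step DAM mapping shown in Fig. 4 can be sketched as a toy forward pass. This is an illustrative NumPy sketch with tiny layer sizes in place of the paper's d = 512 and 12 500; the real module learns its weights and PReLU slopes, and `dam_forward` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a=0.25):
    """PReLU activation; the slope a is a learned parameter in the real network."""
    return np.where(x > 0.0, x, a * x)

def dam_forward(f, d=8, hidden=16, out=9):
    """Toy DAM forward pass: d -> hidden -> out (= C - 1), each linear layer
    followed by PReLU, then a final out -> out linear layer WITHOUT activation
    so the output features can take negative values like the simplex centers."""
    W1 = rng.standard_normal((d, hidden)) * 0.1
    W2 = rng.standard_normal((hidden, out)) * 0.1
    W3 = rng.standard_normal((out, out)) * 0.1
    h = prelu(f @ W1)       # step 1: increase dimension, then PReLU
    h = prelu(h @ W2)       # step 2: reach C - 1 dimensions, then PReLU
    return h @ W3           # final linear layer restores mixed-sign features

features = rng.standard_normal((2, 8))   # a batch of two d-dimensional CNN features
z = dam_forward(features)                # shape (2, 9)
```

The final activation-free layer is the detail the text emphasizes: after PReLU most coordinates are positive, so a last linear map is needed before the features can match centers with negative coordinates.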
TABLE I
AUC Scores (%) of Open Set Recognition Methods on Tested Datasets (n.r. stands for not reported). The best accuracies are shown with red fonts, whereas statistically similar performances are shown with blue fonts. The methods that statistically perform poorly are shown with standard black font. The standard deviation of the Objectosphere method is assumed as 1 for the Cifar-10 dataset.

TABLE II
Closed-Set Accuracies (%) of Open-Set Recognition Methods on Tested Datasets
2) Results: The main goal of open set recognition is to detect and reject the samples that come from the novel classes. The performance of open set recognition is often measured using area under the ROC curve (AUC) scores. Additionally, the closed set accuracy is also reported to evaluate classification performance on known data by disregarding unknown samples, as demonstrated in previous works such as [48] and [52]. We trained our proposed method using the loss function given in (6), which is especially designed for the open-set recognition setting. Our proposed method, DSC, is compared against other state-of-the-art open set recognition methods including the maximally separating matrix method of [25] using simplex vertices, C2AE [53], Softmax, OpenMax [35], OSRCI [52], CAC [37], RPL [50], CROSR [49], ROSR [49], generative-discriminative feature representations (GDFRs) [54], and Objectosphere [55]. Except for the TinyImageNet dataset, we employed the identical network backbone as in [52] for all datasets. To achieve higher accuracies on the TinyImageNet dataset, we utilized a deeper ResNet-50 architecture. The hypersphere radius is set to u = 64 as in the ArcFace method. The proposed methods demonstrated accuracies that are directly comparable to those reported in [52] for most of the tested datasets, as the network weights were randomly initialized during the training stage.

AUC scores are summarized in Table I, which shows that the proposed method achieved the best accuracies across all datasets except for Cifar-10 and SVHN. We also conducted statistical significance tests to assess the variances in accuracy between the proposed method and its competitors listed in Table I. This examination employs a null hypothesis statistical test utilizing the t-distribution. If the obtained significance falls below the predefined significance threshold (set at 0.05), we reject the null hypothesis, indicating that there is a statistically significant difference in performance between the two methods. The highest accuracy scores are highlighted in bold red text, while methods exhibiting statistically similar performance are indicated in bold blue. Results for methods that perform poorly from a statistical perspective are presented in standard black font. Notably, there were significant performance differences observed for the Mnist, Cifar+10, Cifar+50, and TinyImageNet datasets. Our proposed method achieves significantly better accuracies compared to the other tested methods. For the Cifar-10 dataset, our proposed method performs statistically similarly to the best performing method, whereas all tested methods perform worse than the best performing method on the SVHN dataset. Closed-set accuracies for the open-set recognition methods are reported in Table II, where the proposed method achieved the best accuracies among the tested methods, with the exception of the SVHN dataset. Obtaining the best accuracies in terms of both AUC scores and closed-set accuracies indicates that our proposed method can easily identify and reject the novel class samples and correctly classify the known class samples, as expected.

C. Closed-Set Recognition Experiments

1) Experiments on Moderate Sized Datasets: Here, we conducted closed-set recognition experiments on moderate sized datasets. Our proposed method did not need the DAM since the feature dimension is much larger than the number of classes
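The significance test described above can be sketched as follows: a NumPy computation of the Welch two-sample t statistic and its Welch-Satterthwaite degrees of freedom. The run-level AUC scores below are made-up illustrative numbers, and in practice the p-value would be obtained from the t-distribution (e.g., via `scipy.stats.ttest_ind`) and compared against the 0.05 threshold.

```python
import numpy as np

def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch's test), plus the
    Welch-Satterthwaite degrees of freedom used to look up the p-value."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Hypothetical AUC scores (%) from five runs of two methods:
ours = [96.1, 95.8, 96.4, 96.0, 95.9]
baseline = [94.2, 94.7, 94.1, 94.5, 94.3]
t, dof = welch_t(ours, baseline)   # a large |t| leads to rejecting the null hypothesis
```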
TABLE IV
Classification Accuracies (%) on ImageNet Dataset
in the training set for these experiments. We compared our results to the methods that maximize the margin in Euclidean or angular spaces. We implemented the compared methods by using the source codes provided by their authors, and we used the ResNet-18 architecture [56] as backbone for all tested methods. Therefore, our results are directly comparable. We set the hypersphere radius to u = 64 as before.

Classification accuracies are given in Table III. For the Mnist dataset, the majority of the tested methods yield the same accuracy, but our proposed DSC method outperforms all tested methods on the Cifar-10 and Cifar-100 datasets. The performance difference is significant, especially on the Cifar-100 dataset. These results verify the superiority of margin maximization in both Euclidean and angular spaces. Achieving the best accuracies is encouraging, because our proposed method is very simple and does not need any parameter tuning, yet it outperforms more complex methods.

We also conducted tests on the ImageNet dataset [57]. We used a deeper architecture, ResNet-101, since this is a large-scale dataset including 1000 classes. The results are given in Table IV. We compared our results to the method using the softmax loss function, the large-margin softmax loss function [13], and a closely related method, the maximally separating matrix method of [25], which uses simplex vertices as fixed class centers as in our proposed method. As seen in the table, our proposed method outperforms all methods and achieves the best results in terms of both top-1 and top-5 accuracies.

2) Experiments on Large-Scale Datasets: We also tested the proposed method in the classification setting where the number of classes is much larger than the feature dimensionality. As stated earlier, the dimension restriction occurs in such settings. To overcome this, we utilized the DAM and the revised network architecture as explained in Section II-D. DSC_DAM represents the classifier using the DAM, and DSC_RNA represents the classifier using the revised network architecture. We tested the proposed methods on face verification and recognition problems.

To conduct every face verification test, the standard procedure is followed by employing the same network that has been trained on a large-scale face dataset. The network that is utilized for this purpose has been trained on the MS1MV3 dataset [46], which is a refined variant of the MS-Celeb-1M dataset [58], and incorporates the proposed loss function. The MS1MV3 dataset includes approximately 91K individuals. We used the first 12K individuals having the most samples per class in our experiments (using more classes yielded memory problems with the GPUs we used for the experiments). The ResNet-101 architecture is used as backbone, and this backbone yields CNN features whose dimension is d = 512. Therefore, the number of classes is much larger than the feature dimension, d = 512. For both proposed classifiers, we mapped the feature dimension to 12 500 rather than C − 1 = 11 999. For DSC_DAM, we used only one layer with PReLU activation functions, which required estimating an additional 512 × 12 500 + (12 500)² − 512 × 12 000 weight parameters for the utilized network. We also applied batch normalization after the PReLU layer. For DSC_RNA, we first removed the original fully connected layer that maps the 25 088-dimensional CNN features to the 512-dimensional space. Then, we added a fully connected layer (without PReLU) that maps the 25 088-dimensional CNN features to a 12 500-dimensional feature space. Therefore, this revision requires the estimation of an additional 25 088 × 12 500 − [25 088 × 512 + 512 × 12 000] weights. The hypersphere radius is set to 2000. Training the network DSC_RNA using the revised network architecture took 11 444 s (3.178 h) to finish one epoch, whereas the network using the DAM, DSC_DAM, completed an epoch in 11 137 s (3.093 h). In contrast, a network that uses the 512-D CNN feature space with the classical softmax loss function finishes an epoch in 8962 s (2.489 h). Therefore, DSC_RNA is approximately 1.28 times slower and DSC_DAM is 1.24 times slower compared to a classical network that uses the softmax loss function. Once the networks are trained, we used the resulting architectures to extract deep CNN features of the face images coming from the test datasets.

As test datasets, we used labeled faces in the wild (LFW) [59], the celebrities in frontal-profile dataset (CFP-FP) [60], cross-age LFW (CALFW) [61], AgeDB [60], and cross-pose LFW (CPLFW) [62]. For evaluation, the standard protocol of unrestricted with labeled outside data [59] is used, and the accuracies are obtained by using 6000 testing image pairs on LFW, CALFW, AgeDB, and CPLFW. For the CFP-FP dataset, the accuracies are obtained by using 7000 pairs of testing images following the standard testing setting. Table V reports the accuracies. As seen in the results, the proposed method using the DAM achieves the best accuracies on four of the five tested datasets. The DSC_RNA method also obtains competitive
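The extra-parameter counts and the relative epoch times quoted above can be checked with plain arithmetic, using the numbers taken directly from the text:

```python
# Extra weights for DSC_DAM relative to the 512-D softmax baseline:
# one 512 -> 12 500 PReLU layer plus a 12 500 -> 12 500 layer, minus the
# removed 512 x 12 000 classifier matrix.
dam_extra = 512 * 12500 + 12500 ** 2 - 512 * 12000
assert dam_extra == 156_506_000

# Extra weights for DSC_RNA: a 25 088 -> 12 500 linear layer replacing the
# original 25 088 -> 512 layer and the 512 x 12 000 classifier.
rna_extra = 25088 * 12500 - (25088 * 512 + 512 * 12000)
assert rna_extra == 294_610_944

# Relative epoch times versus the classical softmax network (8962 s):
slowdown_rna = round(11444 / 8962, 2)   # 1.28
slowdown_dam = round(11137 / 8962, 2)   # 1.24
```

These counts make the trade-off concrete: both variants solve the dimension restriction, but each adds well over a hundred million parameters on top of the baseline network.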
TABLE VI
Identification Accuracies (%) on the IJB-B and IJB-C Benchmarks
accuracies, but its accuracies are lower than those of DSC_DAM. These results verify that the proposed techniques successfully resolve the dimension problem. However, the weight parameters of the networks are greatly increased.

We also conducted identification (recognition) tests on the challenging IJB-B and IJB-C datasets [63]. These datasets present considerable difficulties due to their inclusion of full pose variations and wide-ranging imaging conditions. The IJB-B dataset is characterized by its template-based approach, encompassing 1845 subjects with 11 754 images and 55 025 frames from 7011 videos. Images and videos were sourced from the web, showcasing significant variations in pose, illumination, and image quality, among other factors. The IJB-C dataset serves as an extension of the IJB-B dataset, featuring 3531 unique subjects in unconstrained environments. This mixed media set-based dataset comprises 31 334 still images, averaging approximately six images per subject, and 117 542 video frames, averaging about 33 frames per video. Each subject is represented by a template consisting of multiple images, rendering the set-based face recognition approach ideal for subject identification. These datasets are widely recognized as benchmark datasets for evaluating state-of-the-art face recognition methodologies.

For reporting accuracies, we follow the standard benchmark procedure for IJB-B and IJB-C to evaluate the proposed methods on the "search" protocol for 1:N face identification. Here, the Rank-N classification accuracies are reported for identification, and the classification rate is the percentage of probe searches that correctly find the probe's gallery mate within the top N rank-ordered results of the gallery set. In addition, we also report the true positive identification rate (TPIR) accuracies obtained for different false positive identification rate (FPIR) values. The results are given in Table VI, where the red and blue fonts denote the best and the second best accuracies, respectively. As seen in the table, the proposed method using the DAM module achieves the best accuracies in all metrics on the IJB-C dataset, whereas it obtains the second best accuracies in terms of TPIR on the IJB-B dataset.

IV. SUMMARY AND CONCLUSION

This article proposed a neural network classifier that aims to maximize the margin in both the Euclidean and angular spaces. Specifically, the method generates embeddings such that class-specific samples cluster around the class centers chosen from the vertices of a regular simplex. The technique is particularly straightforward, as it requires fixing only a single parameter for classical closed set recognition settings. Despite its simplicity, the proposed method achieves state-of-the-art accuracies on open-set recognition problems by rejecting samples of unknown classes based on their distances from the class-specific centers. Additionally, the proposed method outperforms other current classification methods in closed set recognition settings, particularly with moderate-sized datasets. Nonetheless, the method exhibits a limitation in learning large-scale datasets, which can be addressed by introducing a DAM or by revising existing deep neural network architectures. The proposed classifier using the DAM achieves state-of-the-art accuracies on face verification problems, but the weight parameters of the deep neural network classifier greatly increase. In summary, the proposed method is an ideal choice for open set recognition and classical classification problems, particularly when the feature dimension is larger than the number of classes, and the proposed classifier is straightforward to use, with a single hyperparameter that requires setting. For large-scale datasets with many classes, the proposed method using the DAM still yields good accuracies, but it increases the complexity of the deep neural network architectures.

APPENDIX
IMPLEMENTATION DETAILS

The learning rate is set to 0.1 for the proposed DSC in open-set recognition experiments. We set
TABLE VII
Classification Accuracies (%) for Different u Values on the Cifar-100 Dataset
[6] H. Cevikalp, B. Uzun, O. Köpüklü, and G. Ozturk, "Deep compact polyhedral conic classifier for open and closed set recognition," Pattern Recognit., vol. 119, Nov. 2021, Art. no. 108080.
[7] C. Qi and F. Su, "Contrastive-center loss for deep neural networks," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 2851–2855.
[8] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 815–823.
[9] E. Hoffer and N. Ailon, "Deep metric learning using triplet network," in Proc. Int. Conf. Learn. Represent. (ICLR) Workshops, 2015, pp. 84–92.
[10] K. Sohn, "Improved deep metric learning with multi-class N-pair loss objective," in Proc. Neural Inf. Process. Syst. (NIPS), 2016, pp. 1–9.
[11] S. Roy, M. Harandi, R. Nock, and R. Hartley, "Siamese networks: The tale of two manifolds," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3046–3055.
[12] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song, "SphereFace: Deep hypersphere embedding for face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6738–6746.
[13] W. Liu, Y. Wen, Z. Yu, and M. Yang, "Large-margin softmax loss for convolutional neural networks," in Proc. Int. Conf. Mach. Learn. (ICML), 2016, pp. 1–10.
[14] K. Zhao, J. Xu, and M.-M. Cheng, "RegularFace: Deep face recognition via exclusive regularization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 1136–1144.
[15] H. Wang et al., "CosFace: Large margin cosine loss for deep face recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 5265–5274.
[16] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for deep face recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4685–4694.
[17] W. Liu et al., "Learning towards minimum hyperspherical energy," in Proc. Neural Inf. Process. Syst. (NeurIPS), 2018, pp. 1–12.
[18] R. Lin et al., "Regularizing neural networks via minimizing hyperspherical energy," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 6916–6925.
[19] W. Liu, R. Lin, Z. Liu, L. Xiong, B. Scholkopf, and A. Weller, "Learning with hyperspherical uniformity," in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), 2021, pp. 1–13.
[20] Y. Duan, J. Lu, and J. Zhou, "UniformFace: Learning deep equidistributed representation for face recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3415–3424.
[21] T.-T. Do, T. Tran, I. Reid, V. Kumar, T. Hoang, and G. Carneiro, "A theoretically sound upper bound on the triplet loss for improving the efficiency of deep distance metric learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10396–10405.
[22] V. Papyan, X. Y. Han, and D. L. Donoho, "Prevalence of neural collapse during the terminal phase of deep learning training," Proc. Nat. Acad. Sci. USA, vol. 117, no. 40, pp. 24652–24663, Oct. 2020.
[23] F. Pernici, M. Bruni, C. Baecchi, and A. D. Bimbo, "Maximally compact and separated features with regular polytope networks," in Proc. CVPR Workshops, 2019, pp. 46–53.
[24] F. Pernici, M. Bruni, C. Baecchi, and A. D. Bimbo, "Regular polytope networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 9, pp. 4373–4387, Sep. 2022.
[25] T. Kasarla, G. J. Burghouts, M. van Spengler, E. van der Pol, R. Cucchiara, and P. Mettes, "Maximum class separation as inductive bias in one matrix," in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 1–14.
[26] Q. Bytyqi, N. Wolpert, E. Schomer, and U. Schwanecke, "Prototype softmax cross entropy: A new perspective on softmax cross entropy," in Proc. Scandin. Conf. Image Anal., 2023, pp. 16–31.
[27] Y. Yang, S. Chen, X. Li, L. Xie, Z. Lin, and D. Tao, "Inducing neural collapse in imbalanced learning: Do we really need a learnable classifier at the end of deep neural network?" in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 37991–38002.
[28] L. O. Jimenez and D. A. Landgrebe, "Supervised classification in high-dimensional space: Geometrical, statistical, and asymptotical properties of multivariate data," IEEE Trans. Syst., Man Cybern., C, vol. 28, no. 1, pp. 39–54, Feb. 1998.
[29] P. Hall, J. S. Marron, and A. Neeman, "Geometric representation of high dimension, low sample size data," J. Roy. Stat. Soc. Ser. B, Stat. Methodol., vol. 67, no. 3, pp. 427–444, Jun. 2005.
[30] P. Kumar, L. Niveditha, and B. Ravindran, "Spectral clustering as mapping to a simplex," in Proc. ICML Workshops, 2013, pp. 1–9.
[31] M. Weber, "Clustering by using a simplex structure," ZIB, Berlin, Germany, ZIB-Rep. 04-03, 2003.
[32] A. Rahimi and B. Recht, "Clustering with normalized cuts is clustering with a hyperplane," in Proc. Stat. Learn. Comput. Vis., 2004, pp. 56–69.
[33] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.
[34] Z. Zhu et al., "A geometric analysis of neural collapse with unconstrained features," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 1–15.
[35] W. J. Scheirer, A. Rocha, A. Sapkota, and T. E. Boult, "Towards open set recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, pp. 1757–1772, 2013.
[36] A. R. Dhamija, M. Gunther, and T. E. Boult, "Reducing network agnostophobia," in Proc. Neural Inf. Process. Syst. (NeurIPS), 2018, pp. 1–12.
[37] D. Miller, N. Sünderhauf, M. Milford, and F. Dayoub, "Class anchor clustering: A loss for distance-based open set recognition," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 3569–3577.
[38] H. Cevikalp, B. Uzun, Y. Salk, H. Saribas, and O. Köpüklü, "From anomaly detection to open set recognition: Bridging the gap," Pattern Recognit., vol. 138, Jun. 2023, Art. no. 109385.
[39] C. Geng, S.-J. Huang, and S. Chen, "Recent advances in open set recognition: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 10, pp. 3614–3631, Oct. 2021.
[40] M. Balko, A. Pór, M. Scheucher, K. Swanepoel, and P. Valtr, "Almost-equidistant sets," Graphs Combinatorics, vol. 36, no. 3, pp. 729–754, May 2020.
[41] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[42] C. Cortes and V. Vapnik, "Support vector networks," Mach. Learn., vol. 20, pp. 273–297, Sep. 1995.
[43] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K. R. Mullers, "Fisher discriminant analysis with kernels," in Proc. Neural Netw. Signal Process., IEEE Signal Process. Soc. Workshop, Aug. 1999, pp. 41–48.
[44] A. Vedaldi and A. Zisserman, "Efficient additive kernels via explicit feature maps," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 480–492, Mar. 2012.
[45] A. Rahimi and B. Recht, "Random features for large-scale kernel machines," in Proc. NIPS, 2007, pp. 1–8.
[46] J. Deng, J. Guo, T. Liu, M. Gong, and S. Zafeiriou, "Sub-center ArcFace: Boosting face recognition by large-scale noisy web faces," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 741–757.
[47] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958–1970, Nov. 2008.
[48] H.-M. Yang, X.-Y. Zhang, F. Yin, Q. Yang, and C.-L. Liu, "Convolutional prototype network for open set recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2358–2370, May 2022.
[49] R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura, "Classification-reconstruction learning for open-set recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4011–4020.
[50] G. Chen et al., "Learning open set network with discriminative reciprocal points," in Proc. ECCV, 2020, pp. 507–522.
[51] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[52] L. Neal, M. Olson, X. Fern, W.-K. Wong, and F. Li, "Open set learning with counterfactual images," in Proc. ECCV, 2018, pp. 1–16.
[53] P. Oza and V. M. Patel, "C2AE: Class conditioned auto-encoder for open-set recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2302–2311.
[54] P. Perera et al., "Generative-discriminative feature representations for open-set recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11811–11820.
[55] A. Bendale and T. E. Boult, "Towards open set deep networks," in Proc. CVPR, 2016, pp. 1563–1572.
[56] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[57] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[58] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, "MS-Celeb-1M: A dataset and benchmark for large-scale face recognition," in Computer Vision—ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Cham, Switzerland: Springer, 2016, pp. 87–102.
[59] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," in Proc. Workshop Faces 'Real-Life' Images, Detection, Alignment, Recognit., 2008, pp. 1–15.
[60] S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, "AgeDB: The first manually collected, in-the-wild age database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 1997–2005.
[61] T. Zheng, W. Deng, and J. Hu, "Cross-age LFW: A database for studying cross-age face recognition in unconstrained environments," 2017, arXiv:1708.08197.
[62] T. Zheng and W. Deng, "Cross-pose LFW: A database for studying cross-pose face recognition in unconstrained environments," Dept. Posts Telecommun., Beijing Univ., Beijing, China, Tech. Rep. 18.01, 2018.
[63] B. Maze et al., "IARPA Janus benchmark-C: Face dataset and protocol," in Proc. Int. Conf. Biometrics (ICB), Feb. 2018, pp. 158–165.
[64] F.-J. Chang, A. T. Tran, T. Hassner, I. Masi, R. Nevatia, and G. Medioni, "FacePoseNet: Making a case for landmark-free face alignment," in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2017, pp. 1599–1608.
[65] B.-N. Kang, Y. Kim, and D. Kim, "Pairwise relational networks for face recognition," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 628–645.
[66] B.-N. Kang, Y. Kim, B. Jun, and D. Kim, "Attentional feature-pair relation networks for accurate face recognition," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 5471–5480.
[67] H. Cevikalp and G. G. Dordinejad, "Video based face recognition by using discriminatively learned convex models," Int. J. Comput. Vis., vol. 128, no. 12, pp. 3000–3014, Dec. 2020.

Hasan Saribas received the bachelor's degree in electrical and electronics engineering from Ataturk University, Erzurum, Türkiye, in 2011, the master's degree from the Department of Avionics, Anadolu University, Eskişehir, Türkiye, in 2015, and the Ph.D. degree from the Department of Avionics, Eskişehir Technical University, Eskişehir, in 2020. He is currently employed as a Senior AI Research Engineer with the Huawei Türkiye Research and Development Center, Istanbul, Türkiye. His research interests include recommendation systems, image processing, machine learning, deep learning, and the control of unmanned aerial vehicles.