CosineEmbeddingLoss
Cosine similarity loss, used to judge whether two input vectors are similar. It is commonly used for learning nonlinear embeddings and for semi-supervised learning.
For a batch of $N$ samples $D(a, b, y)$, $a$ and $b$ are the two input vectors and $y$ is the ground-truth label, taking values in $\{1, -1\}$ for similar and dissimilar, respectively. The loss for the $i$-th sample is:
$$l_i = \begin{cases} 1 - \cos(a_i, b_i), & \text{if } y_i = 1 \\ \max\big(0, \cos(a_i, b_i) - \text{margin}\big), & \text{if } y_i = -1 \end{cases}$$
$\cos(a_i, b_i)$ is the cosine of the angle between the vectors $a_i$ and $b_i$.
When the label is $y_i = -1$ and $\cos(a_i, b_i) < \text{margin}$, then $l_i = 0$: the two inputs are dissimilar and their cosine similarity is already small, so the sample is easy to classify and contributes nothing to the loss.
When $y_i = -1$ and $\cos(a_i, b_i) > \text{margin}$, then $l_i = \cos(a_i, b_i) - \text{margin}$.
When $y_i = 1$, $l_i = 1 - \cos(a_i, b_i)$. In particular, when the angle between $a_i$ and $b_i$ is 0, $l_i = 0$.
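To make the formula concrete, here is a minimal sketch (not from the original post) that computes the per-sample loss by hand and checks it against PyTorch's implementation; the tensor sizes, labels, and margin value are chosen arbitrarily for illustration:

import torch
import torch.nn as nn

torch.manual_seed(0)
a = torch.randn(4, 8)
b = torch.randn(4, 8)
y = torch.tensor([1., -1., 1., -1.])   # 1 = similar, -1 = dissimilar
margin = 0.2

# Manual per-sample loss following the piecewise definition above
cos = torch.cosine_similarity(a, b, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))

# PyTorch's implementation with reduction='none' gives the same per-sample values
builtin = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(a, b, y)
print(torch.allclose(manual, builtin))  # True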
class CosineEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']

    def __init__(self, margin=0., size_average=None, reduce=None, reduction='mean'):
        super(CosineEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin

    def forward(self, input1, input2, target):
        return F.cosine_embedding_loss(input1, input2, target, margin=self.margin, reduction=self.reduction)
In PyTorch this is implemented by the torch.nn.CosineEmbeddingLoss class; the functional form F.cosine_embedding_loss can also be called directly. The size_average and reduce arguments in the code above are deprecated. reduction takes one of three values, mean, sum, or none, which determine the returned value $\ell(x, y)$. The default is mean, corresponding to the loss computation described above.

$$L = \{l_1, \ldots, l_N\}$$

$$\ell(x, y) = \begin{cases} L, & \text{if reduction} = \text{'none'} \\ \frac{1}{N} \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'mean'} \\ \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'sum'} \end{cases}$$

margin takes values in $[-1, 1]$; values in $[0, 0.5]$ are suggested.
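As a quick check that the class and the functional form are equivalent (a minimal sketch, not from the original post; the tensor shapes and margin value here are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.randn(5, 16)
b = torch.randn(5, 16)
y = torch.tensor([1., -1., 1., -1., 1.])

# Module and functional form compute the same value
module_out = nn.CosineEmbeddingLoss(margin=0.3)(a, b, y)
functional_out = F.cosine_embedding_loss(a, b, y, margin=0.3, reduction='mean')
print(torch.allclose(module_out, functional_out))  # True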
Example:
import torch
import torch.nn as nn
torch.manual_seed(20)
cosine_loss = nn.CosineEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
print(a.size())
print(b.size())
y = 2 * torch.empty(100).random_(2) - 1
output = cosine_loss(a, b, y)
print(output.item())
# Does reduction="none" not take effect?
cosine_loss_none = nn.CosineEmbeddingLoss(margin=0.2, reduction="none")
output = cosine_loss(a, b, y)  # (note: this calls cosine_loss, not cosine_loss_none)
print(output.item())
Output:
torch.Size([100, 128])
torch.Size([100, 128])
0.49418219923973083
0.49418219923973083
The second printed value looks as if reduction='none' also returned a scalar, but the cause is simply the bug flagged in the comment above: the second call still uses cosine_loss, the instance created with the default reduction='mean', rather than the newly created cosine_loss_none. reduction='none' is implemented; when the correct instance is called it returns a tensor of per-sample losses (shape (100,) here), just as HingeEmbeddingLoss does in the example below.
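For reference, here is what the corrected call would look like, reusing the tensors a, b, y from the example above (a sketch; the individual per-sample values depend on the random seed and are not reproduced here):

cosine_loss_none = nn.CosineEmbeddingLoss(margin=0.2, reduction="none")
output = cosine_loss_none(a, b, y)   # call the reduction='none' instance this time
print(output.shape)                  # torch.Size([100]) -- one loss value per sample
print(output.mean().item())          # matches the reduction='mean' result printed above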
HingeEmbeddingLoss
Used to judge whether two vectors are similar; the input is the distance between the two vectors. Commonly used for learning nonlinear embeddings and for semi-supervised learning.
For a batch of $N$ samples $D(x, y)$, $x$ is the distance between the two vectors and $y$ is the ground-truth label, with elements taking values in $\{1, -1\}$ for similar and dissimilar, respectively. The loss for the $i$-th sample is:
$$l_i = \begin{cases} x_i, & \text{if } y_i = 1 \\ \max\big(0, \text{margin} - x_i\big), & \text{if } y_i = -1 \end{cases}$$
When $y_i = -1$, i.e. the two vectors are dissimilar, and the distance $x_i$ is greater than margin, the sample is easy to classify and contributes nothing to the loss, so $l_i = 0$.
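As a sanity check of this piecewise definition (a minimal sketch, not part of the original post; the distances, labels, and margin are made up for illustration):

import torch
import torch.nn as nn

x = torch.tensor([0.1, 0.3, 0.8, 1.5])          # distances between pairs of vectors
y = torch.tensor([1., -1., 1., -1.])            # 1 = similar, -1 = dissimilar
margin = 1.0

# Manual per-sample loss following the formula above
manual = torch.where(y == 1, x, torch.clamp(margin - x, min=0))
builtin = nn.HingeEmbeddingLoss(margin=margin, reduction='none')(x, y)
print(manual)                                   # tensor([0.1000, 0.7000, 0.8000, 0.0000])
print(torch.allclose(manual, builtin))          # True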
class HingeEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']

    def __init__(self, margin=1.0, size_average=None, reduce=None, reduction='mean'):
        super(HingeEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin

    def forward(self, input, target):
        return F.hinge_embedding_loss(input, target, margin=self.margin, reduction=self.reduction)
In PyTorch this is implemented by the torch.nn.HingeEmbeddingLoss class; the functional form F.hinge_embedding_loss can also be called directly. The size_average and reduce arguments in the code above are deprecated. reduction takes one of three values, mean, sum, or none, which determine the returned value $\ell(x, y)$. The default is mean, corresponding to the loss computation described above. margin defaults to 1.

$$L = \{l_1, \ldots, l_N\}$$

$$\ell(x, y) = \begin{cases} L, & \text{if reduction} = \text{'none'} \\ \frac{1}{N} \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'mean'} \\ \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'sum'} \end{cases}$$
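A brief sketch of how the three reduction modes relate for this loss (illustrative inputs, not from the original post):

import torch
import torch.nn.functional as F

x = torch.tensor([0.1, 0.3, 0.8, 1.5])
y = torch.tensor([1., -1., 1., -1.])

per_sample = F.hinge_embedding_loss(x, y, margin=1.0, reduction='none')  # shape (4,)
mean_loss = F.hinge_embedding_loss(x, y, margin=1.0, reduction='mean')   # = per_sample.mean()
sum_loss = F.hinge_embedding_loss(x, y, margin=1.0, reduction='sum')     # = per_sample.sum()
print(per_sample, mean_loss.item(), sum_loss.item())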
Example:
import torch
import torch.nn as nn
torch.manual_seed(20)
hinge_loss = nn.HingeEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
x = 1 - torch.cosine_similarity(a, b)
# define the distance between a and b as x
print(x.size())
y = 2 * torch.empty(100).random_(2) - 1
output = hinge_loss(x, y)
print(output.item())
hinge_loss = nn.HingeEmbeddingLoss(margin=0.2, reduction="none")
output = hinge_loss(x, y)
print(output)
Output:
torch.Size([100])
0.4938560426235199
tensor([0.0000, 1.0821, 0.0000, 1.0337, 1.0798, 0.0000, 1.0582, 0.0000, 0.8795,
0.0000, 1.1377, 0.0000, 0.9727, 1.0088, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.9941, 1.0539, 0.0000, 0.0000, 0.0000, 1.1907, 0.9647, 0.8875,
0.8585, 0.9471, 0.0000, 0.0000, 0.9677, 0.0000, 0.0000, 0.0000, 0.8393,
0.0000, 0.9900, 1.1510, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.9491, 0.9202, 0.0000, 0.9338, 1.0044, 0.0000, 1.1716, 1.0480, 0.8654,
0.8302, 0.0000, 0.8969, 0.0000, 0.0000, 1.0293, 0.0000, 1.1107, 0.8257,
0.9162, 1.0796, 1.0330, 0.0000, 0.9933, 0.0000, 0.0000, 1.0066, 0.0000,
0.0000, 0.0000, 0.0000, 0.9410, 0.8609, 1.0060, 0.0000, 0.8454, 0.0000,
1.0362, 0.0000, 1.0253, 1.0560, 1.0759, 0.9888, 0.0000, 1.0147, 0.8566,
0.9453, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.9874, 0.0000, 0.0000,
1.0352], grad_fn=<AddBackward0>)