CosineEmbeddingLoss
Cosine similarity loss, used to judge whether two input vectors are similar. It is commonly used for learning nonlinear embeddings and for semi-supervised learning.
For a batch of $N$ samples $D(a, b, y)$, $a$ and $b$ are the two input vectors and $y$ is the ground-truth label, taking values in $\{1, -1\}$ for similar and dissimilar, respectively. The loss for the $i$-th sample is:
$$l_i = \begin{cases} 1 - \cos(a_i, b_i), & \text{if } y_i = 1 \\ \max\big(0, \cos(a_i, b_i) - \text{margin}\big), & \text{if } y_i = -1 \end{cases}$$
$\cos(a_i, b_i)$ is the cosine of the angle between the vectors $a_i$ and $b_i$.
When the label is $y_i = -1$ and $\cos(a_i, b_i) < \text{margin}$, then $l_i = 0$: the two inputs are dissimilar and their cosine similarity is already small, so the sample is easy to classify and contributes nothing to the loss.
When $y_i = -1$ and $\cos(a_i, b_i) > \text{margin}$, then $l_i = \cos(a_i, b_i) - \text{margin}$.
When $y_i = 1$, $l_i = 1 - \cos(a_i, b_i)$. In particular, when the angle between $a_i$ and $b_i$ is 0, $l_i = 0$.
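To make the formula concrete, here is a minimal sketch (not from the original post) that computes the per-sample loss by hand and checks it against PyTorch's implementation; the tensor sizes, labels, and margin value are chosen arbitrarily for illustration:

import torch
import torch.nn as nn

torch.manual_seed(0)
a = torch.randn(4, 8)
b = torch.randn(4, 8)
y = torch.tensor([1., -1., 1., -1.])   # 1 = similar, -1 = dissimilar
margin = 0.2

# Manual per-sample loss following the piecewise definition above
cos = torch.cosine_similarity(a, b, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))

# PyTorch's implementation with reduction='none' gives the same per-sample values
builtin = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(a, b, y)
print(torch.allclose(manual, builtin))  # True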
class CosineEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']

    def __init__(self, margin=0., size_average=None, reduce=None, reduction='mean'):
        super(CosineEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin

    def forward(self, input1, input2, target):
        return F.cosine_embedding_loss(input1, input2, target, margin=self.margin, reduction=self.reduction)
In PyTorch this is implemented by the torch.nn.CosineEmbeddingLoss class; the functional form F.cosine_embedding_loss can also be called directly. The size_average and reduce arguments in the code above are deprecated. reduction takes one of three values, mean, sum, or none, which determine the returned value $\ell(x, y)$. The default is mean, corresponding to the loss computation described above.

$$L = \{l_1, \ldots, l_N\}$$

$$\ell(x, y) = \begin{cases} L, & \text{if reduction} = \text{'none'} \\ \frac{1}{N} \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'mean'} \\ \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'sum'} \end{cases}$$

margin takes values in $[-1, 1]$; values in $[0, 0.5]$ are suggested.
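As a quick check that the class and the functional form are equivalent (a minimal sketch, not from the original post; the tensor shapes and margin value here are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
a = torch.randn(5, 16)
b = torch.randn(5, 16)
y = torch.tensor([1., -1., 1., -1., 1.])

# Module and functional form compute the same value
module_out = nn.CosineEmbeddingLoss(margin=0.3)(a, b, y)
functional_out = F.cosine_embedding_loss(a, b, y, margin=0.3, reduction='mean')
print(torch.allclose(module_out, functional_out))  # True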
Example:
import torch
import torch.nn as nn
torch.manual_seed(20)
cosine_loss = nn.CosineEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
print(a.size())
print(b.size())
y = 2 * torch.empty(100).random_(2) - 1
output = cosine_loss(a, b, y)
print(output.item())
# Does reduction="none" not take effect?
cosine_loss_none = nn.CosineEmbeddingLoss(margin=0.2, reduction="none")
output = cosine_loss(a, b, y)  # (note: this calls cosine_loss, not cosine_loss_none)
print(output.item())
Output:
torch.Size([100, 128])
torch.Size([100, 128])
0.49418219923973083
0.49418219923973083
The second printed value looks as if reduction='none' also returned a scalar, but the cause is simply the bug flagged in the comment above: the second call still uses cosine_loss, the instance created with the default reduction='mean', rather than the newly created cosine_loss_none. reduction='none' is implemented; when the correct instance is called it returns a tensor of per-sample losses (shape (100,) here), just as HingeEmbeddingLoss does in the example below.
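For reference, here is what the corrected call would look like, reusing the tensors a, b, y from the example above (a sketch; the individual per-sample values depend on the random seed and are not reproduced here):

cosine_loss_none = nn.CosineEmbeddingLoss(margin=0.2, reduction="none")
output = cosine_loss_none(a, b, y)   # call the reduction='none' instance this time
print(output.shape)                  # torch.Size([100]) -- one loss value per sample
print(output.mean().item())          # matches the reduction='mean' result printed above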
HingeEmbeddingLoss
Used to judge whether two vectors are similar; the input is the distance between the two vectors. Commonly used for learning nonlinear embeddings and for semi-supervised learning.
For a batch of $N$ samples $D(x, y)$, $x$ is the distance between the two vectors and $y$ is the ground-truth label, with elements taking values in $\{1, -1\}$ for similar and dissimilar, respectively. The loss for the $i$-th sample is:
$$l_i = \begin{cases} x_i, & \text{if } y_i = 1 \\ \max\big(0, \text{margin} - x_i\big), & \text{if } y_i = -1 \end{cases}$$
When $y_i = -1$, i.e. the two vectors are dissimilar, and the distance $x_i$ is greater than margin, the sample is easy to classify and contributes nothing to the loss, so $l_i = 0$.
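As a sanity check of this piecewise definition (a minimal sketch, not part of the original post; the distances, labels, and margin are made up for illustration):

import torch
import torch.nn as nn

x = torch.tensor([0.1, 0.3, 0.8, 1.5])          # distances between pairs of vectors
y = torch.tensor([1., -1., 1., -1.])            # 1 = similar, -1 = dissimilar
margin = 1.0

# Manual per-sample loss following the formula above
manual = torch.where(y == 1, x, torch.clamp(margin - x, min=0))
builtin = nn.HingeEmbeddingLoss(margin=margin, reduction='none')(x, y)
print(manual)                                   # tensor([0.1000, 0.7000, 0.8000, 0.0000])
print(torch.allclose(manual, builtin))          # True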
class HingeEmbeddingLoss(_Loss):
    __constants__ = ['margin', 'reduction']

    def __init__(self, margin=1.0, size_average=None, reduce=None, reduction='mean'):
        super(HingeEmbeddingLoss, self).__init__(size_average, reduce, reduction)
        self.margin = margin

    def forward(self, input, target):
        return F.hinge_embedding_loss(input, target, margin=self.margin, reduction=self.reduction)
In PyTorch this is implemented by the torch.nn.HingeEmbeddingLoss class; the functional form F.hinge_embedding_loss can also be called directly. The size_average and reduce arguments in the code above are deprecated. reduction takes one of three values, mean, sum, or none, which determine the returned value $\ell(x, y)$. The default is mean, corresponding to the loss computation described above. margin defaults to 1.

$$L = \{l_1, \ldots, l_N\}$$

$$\ell(x, y) = \begin{cases} L, & \text{if reduction} = \text{'none'} \\ \frac{1}{N} \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'mean'} \\ \sum_{i=1}^{N} l_i, & \text{if reduction} = \text{'sum'} \end{cases}$$
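A brief sketch of how the three reduction modes relate for this loss (illustrative inputs, not from the original post):

import torch
import torch.nn.functional as F

x = torch.tensor([0.1, 0.3, 0.8, 1.5])
y = torch.tensor([1., -1., 1., -1.])

per_sample = F.hinge_embedding_loss(x, y, margin=1.0, reduction='none')  # shape (4,)
mean_loss = F.hinge_embedding_loss(x, y, margin=1.0, reduction='mean')   # = per_sample.mean()
sum_loss = F.hinge_embedding_loss(x, y, margin=1.0, reduction='sum')     # = per_sample.sum()
print(per_sample, mean_loss.item(), sum_loss.item())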
Example:
import torch
import torch.nn as nn
torch.manual_seed(20)
hinge_loss = nn.HingeEmbeddingLoss(margin=0.2)
a = torch.randn(100, 128, requires_grad=True)
b = torch.randn(100, 128, requires_grad=True)
x = 1 - torch.cosine_similarity(a, b)
# define the distance between a and b as x
print(x.size())
y = 2 * torch.empty(100).random_(2) - 1
output = hinge_loss(x, y)
print(output.item())
hinge_loss = nn.HingeEmbeddingLoss(margin=0.2, reduction="none")
output = hinge_loss(x, y)
print(output)
Output:
torch.Size([100])
0.4938560426235199
tensor([0.0000, 1.0821, 0.0000, 1.0337, 1.0798, 0.0000, 1.0582, 0.0000, 0.8795,
0.0000, 1.1377, 0.0000, 0.9727, 1.0088, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.9941, 1.0539, 0.0000, 0.0000, 0.0000, 1.1907, 0.9647, 0.8875,
0.8585, 0.9471, 0.0000, 0.0000, 0.9677, 0.0000, 0.0000, 0.0000, 0.8393,
0.0000, 0.9900, 1.1510, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.9491, 0.9202, 0.0000, 0.9338, 1.0044, 0.0000, 1.1716, 1.0480, 0.8654,
0.8302, 0.0000, 0.8969, 0.0000, 0.0000, 1.0293, 0.0000, 1.1107, 0.8257,
0.9162, 1.0796, 1.0330, 0.0000, 0.9933, 0.0000, 0.0000, 1.0066, 0.0000,
0.0000, 0.0000, 0.0000, 0.9410, 0.8609, 1.0060, 0.0000, 0.8454, 0.0000,
1.0362, 0.0000, 1.0253, 1.0560, 1.0759, 0.9888, 0.0000, 1.0147, 0.8566,
0.9453, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.9874, 0.0000, 0.0000,
1.0352], grad_fn=<AddBackward0>)