[Paper Reading] Learning Semantically Enhanced Feature for Fine-Grained Image Classification

This paper proposes a new approach to fine-grained image classification that does not rely on complex part localization. Instead, it enhances the semantics of the sub-features of a global feature: the channels of a CNN feature map are arranged into groups by channel permutation, and a regularizer guides each group to activate on key object parts. The method consists of a semantic grouping module and a feature enhancement module, which together improve performance. Experiments show accuracy comparable to state-of-the-art methods, and the code is open-sourced.


Abstract

We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this letter. Unlike previous methods that rely on complex part localization modules, our approach learns fine-grained features by enhancing the semantics of sub-features of a global feature. Specifically, we first achieve the sub-feature semantic by arranging feature channels of a CNN into different groups through channel permutation. Meanwhile, to enhance the discriminability of sub-features, the groups are guided to be activated on object parts with strong discriminability by a weighted combination regularization. Our approach is parameter parsimonious and can be easily integrated into the backbone model as a plug-and-play module for end-to-end training with only image-level supervision. Experiments verified the effectiveness of our approach and validated its comparable performance to the state-of-the-art methods. Code is available at https://github.com/cswluo/SEF


Implementation Details

Figure 1. Overall framework.
The semantic grouping module divides the last convolutional feature channels of the CNN (shown as mixed-color blocks) into different groups (shown in different colors). The global feature and its sub-features (group features) are obtained from the permuted feature channels by average pooling. The light-yellow blocks inside the gray blocks denote the predicted class distributions of the corresponding sub-features, which are regularized by the output of the global feature through knowledge distillation. All gray blocks are used only during training and are removed at test time. Details of the CNN are omitted for clarity.

Semantic Grouping Module

In the higher layers of a CNN, a semantic concept is usually represented by multiple filters. The authors therefore develop a regularization method that divides filters with different properties into different groups so that each group captures a semantic concept.
$$\mathbf{X}^{L'} = \mathbf{A}\mathbf{X}^{L} = \mathbf{A}\mathbf{B}\mathbf{X}^{L-1} = \mathbf{W}\mathbf{X}^{L-1}$$
where
$\mathbf{X}^{L'}$ denotes the feature map with its feature channels arranged by a permutation operation,
$\mathbf{A} \in \mathbb{R}^{C \times C}$ is a permutation matrix,
$\mathbf{B} \in \mathbb{R}^{C \times \Omega}$ denotes the reshaped filters of layer $L$,
$\mathbf{X}^{L-1} \in \mathbb{R}^{\Omega \times \Psi}$ denotes the reshaped feature of layer $L-1$, and
$\mathbf{W}$ is a permutation of $\mathbf{B}$.
To obtain semantically meaningful groups, $\mathbf{A}$ should learn to discover similarities between the filters (rows) of $\mathbf{B}$. However, learning a permutation matrix directly is not easy. The authors therefore bypass learning $\mathbf{A}$ and learn $\mathbf{W}$ directly by constraining the relations between the feature channels of $\mathbf{X}^{L'}$: the correlation between feature channels within the same group is maximized while the correlation between feature channels of different groups is suppressed, using the LocalMaxGlobalMin loss:
$$\mathcal{L}_{\text{group}} = \frac{1}{2}\left(\|\mathbf{D}\|_{F}^{2} - 2\,\|\operatorname{diag}(\mathbf{D})\|_{2}^{2}\right)$$
where
$\tilde{\mathbf{X}}_{i}^{L'} \leftarrow \mathbf{X}_{i}^{L'} / \|\mathbf{X}_{i}^{L'}\|_{2}$ is a normalized feature channel,
$d_{ij} = \tilde{\mathbf{X}}_{i}^{L'\,T} \tilde{\mathbf{X}}_{j}^{L'}$ is the correlation between two normalized channels, and
$\mathbf{D} \in \mathbb{R}^{G \times G}$ is the group-correlation matrix with entries $\mathbf{D}_{mn} = \frac{1}{C_{m} C_{n}} \sum_{i \in m,\, j \in n} d_{ij}$, where $C_m$ and $C_n$ are the numbers of channels in groups $m$ and $n$.
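To make the formula concrete, below is a minimal sketch (written for this note, not taken from the repo) that builds the group-correlation matrix $\mathbf{D}$ directly from a channel cosine-similarity matrix and evaluates $\mathcal{L}_{\text{group}}$. The group-boundary list seps follows the same convention as the LocalMaxGlobalMin module shown next, which pursues the same objective in a slightly different form (squared correlations and block means).

import torch

def lmgm_from_formula(xcos, seps):
    # xcos: (B, C, C) matrix of normalized-channel inner products d_ij
    # seps: cumulative group boundaries, e.g. [512, 1024, 1536, 2048] for G = 4
    bounds = [0] + list(seps)
    G = len(seps)
    losses = []
    for b in range(xcos.size(0)):
        D = torch.zeros(G, G)
        for m in range(G):
            for n in range(G):
                block = xcos[b, bounds[m]:bounds[m + 1], bounds[n]:bounds[n + 1]]
                D[m, n] = block.mean()  # D_mn = (1 / (C_m * C_n)) * sum_{i in m, j in n} d_ij
        # L_group = 0.5 * (||D||_F^2 - 2 * ||diag(D)||_2^2)
        losses.append(0.5 * (D.pow(2).sum() - 2 * D.diag().pow(2).sum()))
    return torch.stack(losses).mean()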

LocalMaxGlobalMin loss implementation (from the repo)

import torch
import torch.nn as nn


class LocalMaxGlobalMin(nn.Module):
    """LocalMaxGlobalMin loss: maximizes channel correlations within a group
    and suppresses correlations across groups (assumes nparts >= 2)."""

    def __init__(self, rho, nchannels, nparts=1, device='cpu'):
        super(LocalMaxGlobalMin, self).__init__()
        self.nparts = nparts
        self.device = device
        self.nchannels = nchannels
        self.rho = rho  # weight of the regularizer

        # Split channels evenly; the last group absorbs the remainder.
        nlocal_channels_norm = nchannels // self.nparts
        remainder = nchannels % self.nparts
        nlocal_channels_last = nlocal_channels_norm
        if remainder != 0:
            nlocal_channels_last = nlocal_channels_norm + remainder

        # seps records the indices partitioning feature channels into separate parts
        seps = []
        sep_node = 0
        for i in range(self.nparts):
            if i != self.nparts - 1:
                sep_node += nlocal_channels_norm
            else:
                sep_node += nlocal_channels_last
            seps.append(sep_node)
        self.seps = seps

    def forward(self, x):
        # x: (batch, C, C) matrix of pairwise channel cosine similarities
        x = x.pow(2)
        intra_x = []  # 1 - squared correlation within each group (to be minimized)
        inter_x = []  # squared correlation between different groups (to be minimized)
        for i in range(self.nparts):
            if i == 0:
                intra_x.append((1 - x[:, :self.seps[i], :self.seps[i]]).mean())
            else:
                intra_x.append((1 - x[:, self.seps[i-1]:self.seps[i], self.seps[i-1]:self.seps[i]]).mean())
                inter_x.append(x[:, self.seps[i-1]:self.seps[i], :self.seps[i-1]].mean())

        loss = self.rho * 0.5 * (sum(intra_x) / self.nparts + sum(inter_x) / (self.nparts * (self.nparts - 1) / 2))

        return loss
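A minimal usage sketch of the module above (the value of rho and the random input are illustrative only); the (B, C, C) similarity matrix it consumes is exactly the xcosin tensor produced in the ResNet forward pass shown later:

import torch
import torch.nn.functional as F

lmgm = LocalMaxGlobalMin(rho=1e-3, nchannels=2048, nparts=4)  # rho is an illustrative value

feat = torch.randn(4, 2048, 196)                  # stand-in for flattened layer4 features
feat = F.normalize(feat, dim=-1)                  # L2-normalize each channel over spatial locations
xcosin = torch.bmm(feat, feat.transpose(-1, -2))  # (4, 2048, 2048) channel similarities
loss = lmgm(xcosin)                               # scalar regularization term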

Feature Enhancement Module

Semantic grouping drives the features of different groups to be activated on different semantic (object) parts, but the discriminability of those parts is not guaranteed. The groups therefore need to be guided to activate on object parts with strong discriminability. A simple way to achieve this is to match the prediction distributions of the object and its parts (i.e., knowledge distillation between the global feature and the local group features), which can be done with the KL divergence.
$$\mathcal{L}_{\mathrm{KL}(\mathbf{P}_{w} \| \mathbf{P}_{a})} = -\mathrm{H}(\mathbf{P}_{w}) + \mathrm{H}(\mathbf{P}_{w}, \mathbf{P}_{a})$$
where $\mathbf{P}_{w}$ and $\mathbf{P}_{a}$ are the prediction distributions of an object and one of its parts (i.e., of the global feature and a local group feature),
$\mathrm{H}(\mathbf{P}_{w}) = -\sum \mathbf{P}_{w} \log \mathbf{P}_{w}$ is the entropy of the global prediction, and
$\mathrm{H}(\mathbf{P}_{w}, \mathbf{P}_{a}) = -\sum \mathbf{P}_{w} \log \mathbf{P}_{a}$ is the cross entropy between the global and part predictions.

The loss of this module is therefore:
$$\mathcal{L} = \mathcal{L}_{cr} - \lambda\, \mathrm{H}(\mathbf{P}_{w}) + \lambda\, \mathrm{H}(\mathbf{P}_{w}, \mathbf{P}_{a})$$
where $\mathcal{L}_{cr}$ is the cross-entropy loss of the global-feature prediction.
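As a quick sanity check on the decomposition $\mathrm{KL}(\mathbf{P}_{w} \| \mathbf{P}_{a}) = -\mathrm{H}(\mathbf{P}_{w}) + \mathrm{H}(\mathbf{P}_{w}, \mathbf{P}_{a})$, here is a minimal PyTorch sketch with hypothetical logits for one global head and one part head:

import torch
import torch.nn.functional as F

global_logits = torch.tensor([[2.0, 0.5, -1.0]])  # hypothetical global prediction
part_logits = torch.tensor([[1.0, 1.0, -0.5]])    # hypothetical part prediction

p_w = F.softmax(global_logits, dim=1)             # P_w
log_p_w = F.log_softmax(global_logits, dim=1)     # log P_w
log_p_a = F.log_softmax(part_logits, dim=1)       # log P_a

entropy = -(p_w * log_p_w).sum()                  # H(P_w)
cross_entropy = -(p_w * log_p_a).sum()            # H(P_w, P_a)
kl = -entropy + cross_entropy                     # KL(P_w || P_a)

# Agrees with PyTorch's built-in KL divergence (which expects log-probs as input)
assert torch.allclose(kl, F.kl_div(log_p_a, p_w, reduction='sum'))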

The final objective is a weighted combination of the losses of the two modules:
$$\mathcal{L} = \mathbb{E}_{\mathbf{x}}\left(\mathcal{L}_{cr} - \lambda\, \mathrm{H}(\mathbf{P}_{w}) + \frac{\gamma}{G} \sum_{g} \mathrm{H}\left(\mathbf{P}_{w}, \mathbf{P}_{a}^{g}\right) + \phi\, \mathcal{L}_{\text{group}}\right)$$

Code Walkthrough

The number of groups nparts is user-defined. Taking a ResNet-50 backbone as an example, the features output by layer4 are split evenly along the channel dimension into nparts groups; with nparts=4, each group has 512 channels (2048/4). Each group feature is average-pooled and fed into its own fc layer, giving the local predictions xlocal of size torch.Size([nparts, batchsize, num_classes]). In addition, a channel cosine-similarity matrix xcosin is computed from the layer4 features and later used to compute the LocalMaxGlobalMin loss.
The unsplit layer4 features are also processed as usual to obtain the global prediction xglobal of size torch.Size([batchsize, num_classes]).

# Added inside the __init__ method of the ResNet class
        if self.attention:
            nfeatures = 512 * block.expansion  # 2048 for ResNet-50
            # Split channels evenly; the last group absorbs the remainder.
            nlocal_channels_norm = nfeatures // self.nparts
            remainder = nfeatures % self.nparts
            nlocal_channels_last = nlocal_channels_norm
            if remainder != 0:
                nlocal_channels_last = nlocal_channels_norm + remainder
            # One fc classifier per group, plus the group boundary indices.
            fc_list = []
            separations = []
            sep_node = 0
            for i in range(self.nparts):
                if i != self.nparts-1:
                    sep_node += nlocal_channels_norm
                    fc_list.append(nn.Linear(nlocal_channels_norm, num_classes))
                else:
                    sep_node += nlocal_channels_last
                    fc_list.append(nn.Linear(nlocal_channels_last, num_classes))
                separations.append(sep_node)
            self.fclocal = nn.Sequential(*fc_list)
            self.separations = separations
            self.fc = nn.Linear(512*block.expansion, num_classes)  # global classifier
    # forward method of the ResNet class
    # (shape comments assume a batch of 4 images at 448x448 resolution)
    def forward(self, x):
        x = self.conv1(x)    # [4, 64, 224, 224]
        x = self.bn1(x)      # [4, 64, 224, 224]
        x = self.relu(x)
        x = self.maxpool(x)  # [4, 64, 112, 112]

        x = self.layer1(x)   # [4, 256, 112, 112]
        x = self.layer2(x)   # [4, 512, 56, 56]
        x = self.layer3(x)   # [4, 1024, 28, 28]
        x = self.layer4(x)   # [4, 2048, 14, 14]

        if self.attention:
            nsamples, nchannels, height, width = x.shape

            # Channel cosine-similarity matrix, later fed to the LocalMaxGlobalMin loss
            xview = x.view(nsamples, nchannels, -1)                     # torch.Size([4, 2048, 196])
            xnorm = xview.div(xview.norm(dim=-1, keepdim=True) + 1e-8)  # torch.Size([4, 2048, 196])
            xcosin = torch.bmm(xnorm, xnorm.transpose(-1, -2))          # torch.Size([4, 2048, 2048])

            # Local (per-group) predictions
            attention_scores = []
            for i in range(self.nparts):
                if i == 0:
                    xx = x[:, :self.separations[i]]  # torch.Size([4, 512, 14, 14])
                else:
                    xx = x[:, self.separations[i-1]:self.separations[i]]
                xx_pool = self.avgpool(xx).flatten(1)  # torch.Size([4, 512])
                attention_scores.append(self.fclocal[i](xx_pool))
            xlocal = torch.stack(attention_scores, dim=0)  # torch.Size([4, 4, num_classes])

            # Detached copy of the feature maps (not used by the losses)
            xmaps = x.clone().detach()

            # Global prediction
            xpool = self.avgpool(x)
            xpool = torch.flatten(xpool, 1)
            xglobal = self.fc(xpool)  # torch.Size([4, num_classes])

            return xglobal, xlocal, xcosin, xmaps

Training and validation

# Inside the train/val loop.
# softmax / logsoftmax are assumed to be nn.Softmax(dim=1) / nn.LogSoftmax(dim=1) instances;
# criterion[0] is the cross-entropy loss, criterion[1] is the LocalMaxGlobalMin module;
# penalty holds the regularization weights (lambda and gamma in the paper).
xglobal, xlocal, xcosin, _ = model(inputs)
probs = softmax(xglobal)
cls_loss = criterion[0](xglobal, labels)  # L_cr: cross entropy on the global prediction

############################################################## prediction

# prediction of every branch (part)
probl, predl, logprobl = [], [], []
for i in range(nparts):
    probl.append(softmax(torch.squeeze(xlocal[i])))
    predl.append(torch.max(probl[i], 1)[-1])
    logprobl.append(logsoftmax(torch.squeeze(xlocal[i])))

############################################################### regularization

# -lambda * H(P_w): (negative) entropy of the global prediction
logprobs = logsoftmax(xglobal)
entropy_loss = penalty['entropy_weights'] * torch.mul(probs, logprobs).sum().div(inputs.size(0))
# (gamma / G) * sum_g H(P_w, P_a^g): cross entropy between global and part predictions
soft_loss_list = []
for i in range(nparts):
    soft_loss_list.append(torch.mul(torch.neg(probs), logprobl[i]).sum().div(inputs.size(0)))
soft_loss = penalty['soft_weights'] * sum(soft_loss_list).div(nparts)
# phi * L_group: LocalMaxGlobalMin regularization on the channel-similarity matrix
lmgm_reg_loss = criterion[1](xcosin)
reg_loss = lmgm_reg_loss + entropy_loss + soft_loss
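The snippet above only builds the loss terms. A minimal sketch of how they would typically be combined and back-propagated (the optimizer and this exact combination are assumptions, not part of the excerpt above):

# Hypothetical continuation of the loop above: combine the classification loss with
# the regularization terms and take an optimization step (optimizer is assumed to exist).
total_loss = cls_loss + reg_loss
optimizer.zero_grad()
total_loss.backward()
optimizer.step()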

Experimental Results

The reported accuracy is comparable to that of state-of-the-art methods.

Ablation Study


Personal website: https://aydenfan.github.io/
