[Paper Reading] Learning Semantically Enhanced Feature for Fine-Grained Image Classification
Abstract
We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this letter. Unlike previous methods that rely on complex part localization modules, our approach learns fine-grained features by enhancing the semantics of sub-features of a global feature. Specifically, we first achieve the sub-feature semantic by arranging feature channels of a CNN into different groups through channel permutation. Meanwhile, to enhance the discriminability of sub-features, the groups are guided to be activated on object parts with strong discriminability by a weighted combination regularization. Our approach is parameter parsimonious and can be easily integrated into the backbone model as a plug-and-play module for end-to-end training with only image-level supervision. Experiments verified the effectiveness of our approach and validated its comparable performance to the state-of-the-art methods. Code is available at https://github.com/cswluo/SEF
Method Details
Figure 1. Overall framework.
The semantic grouping module divides the last-layer convolutional feature channels of the CNN (mixed-color blocks) into different groups (different colors). The global feature and its sub-features (grouped features) are obtained from the permuted feature channels by average pooling. The light-yellow blocks inside the gray blocks are the predicted class distributions of the corresponding sub-features, which are regularized by the output of the global feature through knowledge distillation. All gray blocks are used only in the training phase and are removed at test time. Details of the CNN are omitted for clarity.
Semantic Grouping Module
In the higher layers of a CNN, multiple filters are needed to represent a semantic concept. The authors therefore develop a regularization that divides filters with different properties into different groups, so that each group captures a semantic concept.
$$\mathbf{X}^{L^{\prime}}=\mathbf{A} \mathbf{X}^{L}=\mathbf{A} \mathbf{B} \mathbf{X}^{L-1}=\mathbf{W} \mathbf{X}^{L-1}$$
Here $\mathbf{X}^{L^{\prime}}$ denotes the feature map with its feature channels arranged by a permutation operation, $\mathbf{A} \in \mathbb{R}^{C \times C}$ is a permutation matrix, $\mathbf{B} \in \mathbb{R}^{C \times \Omega}$ denotes the reshaped filters of layer $L$, $\mathbf{X}^{L-1} \in \mathbb{R}^{\Omega \times \Psi}$ denotes the reshaped feature of layer $L-1$, and $\mathbf{W}$ is a permutation of $\mathbf{B}$.
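As a side note, the identity above is easy to check numerically. Below is a minimal sketch (not from the paper; the sizes C, Ω, Ψ and the random permutation are chosen only for illustration): permuting the rows of the filter bank $\mathbf{B}$ is equivalent to permuting the channels of the output feature map.

import torch

# Hypothetical sizes for illustration: C output channels,
# Omega = in_channels x kernel area, Psi = number of spatial positions.
C, Omega, Psi = 8, 18, 49

B = torch.randn(C, Omega)             # reshaped filters of layer L
X_prev = torch.randn(Omega, Psi)      # reshaped feature of layer L-1
A = torch.eye(C)[torch.randperm(C)]   # a random permutation matrix

X_L = B @ X_prev      # X^L
X_perm = A @ X_L      # A X^L: permute the feature channels
W = A @ B             # W = A B: permute the filters instead

# Both routes give the same permuted feature map.
assert torch.allclose(X_perm, W @ X_prev, atol=1e-6)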
To obtain semantically meaningful groups, $\mathbf{A}$ should learn to discover the similarity between the filters (rows) of $\mathbf{B}$. However, learning a permutation matrix directly is not easy. The authors therefore bypass the difficulty of learning $\mathbf{A}$ and learn $\mathbf{W}$ directly by constraining the relations between the feature channels of $\mathbf{X}^{L^{\prime}}$. To this end, the correlation between feature channels within the same group is maximized while the correlation between feature channels of different groups is suppressed, using the LocalMaxGlobalMin loss:
$$\mathcal{L}_{\text{group}}=\frac{1}{2}\left(\|\mathbf{D}\|_{F}^{2}-2\|\operatorname{diag}(\mathbf{D})\|_{2}^{2}\right)$$
Here $\tilde{\mathbf{X}}_{i}^{L^{\prime}} \leftarrow \mathbf{X}_{i}^{L^{\prime}} /\left\|\mathbf{X}_{i}^{L^{\prime}}\right\|_{2}$ is a normalized channel, $d_{ij}=\tilde{\mathbf{X}}_{i}^{L^{\prime} T} \tilde{\mathbf{X}}_{j}^{L^{\prime}}$ is the cosine similarity between channels $i$ and $j$, and $\mathbf{D} \in \mathbb{R}^{G \times G}$ has entries $\mathbf{D}_{mn}=\frac{1}{C_{m} C_{n}} \sum_{i \in m, j \in n} d_{ij}$, i.e., the average similarity between the channels of group $m$ and those of group $n$.
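For reference, here is a direct, unvectorized sketch of $\mathbf{D}$ and $\mathcal{L}_{\text{group}}$ exactly as written above (the feature shape and the equal-sized contiguous groups are assumptions of mine); the author's actual implementation, shown next, instead works on squared cosine-similarity blocks with a within-group term and a between-group term.

import torch

def group_loss(feat, nparts):
    # feat: (C, Psi) flattened feature channels of one sample (assumed shape)
    C = feat.size(0)
    x = feat / (feat.norm(dim=-1, keepdim=True) + 1e-8)   # normalized channels
    d = x @ x.t()                                          # d_ij, shape (C, C)
    groups = torch.arange(C).chunk(nparts)                 # contiguous channel groups
    # D_mn: average similarity between the channels of group m and group n
    D = torch.stack([
        torch.stack([d[m][:, n].mean() for n in groups]) for m in groups
    ])                                                     # shape (G, G)
    # L_group = 0.5 * (||D||_F^2 - 2 * ||diag(D)||_2^2)
    return 0.5 * (D.pow(2).sum() - 2 * D.diag().pow(2).sum())

loss = group_loss(torch.randn(2048, 196), nparts=4)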
LocalMaxGlobalMin loss implementation:
import torch
import torch.nn as nn

class LocalMaxGlobalMin(nn.Module):
    def __init__(self, rho, nchannels, nparts=1, device='cpu'):
        super(LocalMaxGlobalMin, self).__init__()
        self.nparts = nparts
        self.device = device
        self.nchannels = nchannels
        self.rho = rho

        # split channels evenly; the last group absorbs the remainder
        nlocal_channels_norm = nchannels // self.nparts
        reminder = nchannels % self.nparts
        nlocal_channels_last = nlocal_channels_norm
        if reminder != 0:
            nlocal_channels_last = nlocal_channels_norm + reminder

        # seps records the indices partitioning feature channels into separate parts
        seps = []
        sep_node = 0
        for i in range(self.nparts):
            if i != self.nparts - 1:
                sep_node += nlocal_channels_norm
            else:
                sep_node += nlocal_channels_last
            seps.append(sep_node)
        self.seps = seps

    def forward(self, x):
        # x: (batch, C, C) channel-wise cosine similarity matrix
        x = x.pow(2)
        intra_x = []
        inter_x = []
        for i in range(self.nparts):
            if i == 0:
                # within-group similarity of the first group
                intra_x.append((1 - x[:, :self.seps[i], :self.seps[i]]).mean())
            else:
                # within-group similarity of group i
                intra_x.append((1 - x[:, self.seps[i-1]:self.seps[i], self.seps[i-1]:self.seps[i]]).mean())
                # similarity between group i and all previous groups
                inter_x.append(x[:, self.seps[i-1]:self.seps[i], :self.seps[i-1]].mean())
        # maximize within-group similarity, minimize between-group similarity
        loss = self.rho * 0.5 * (sum(intra_x) / self.nparts + sum(inter_x) / (self.nparts * (self.nparts - 1) / 2))
        return loss
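A possible way to call it (the rho value and the random stand-in for the cosine matrix are made up here; in the real pipeline xcosin comes from the backbone's forward pass shown later):

# Hypothetical usage; nchannels and nparts follow the ResNet-50 walkthrough below.
lmgm = LocalMaxGlobalMin(rho=0.01, nchannels=2048, nparts=4)
xcosin = torch.randn(4, 2048, 2048).clamp(-1, 1)  # stand-in for the real cosine matrix
print(lmgm(xcosin))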
Feature Enhancement Module
Semantic grouping can drive the features of different groups to be activated on different semantic (object) parts. However, the discriminability of these parts is not guaranteed, so the semantic groups need to be guided to activate on object parts with strong discriminability. A simple way to achieve this is to match the prediction distributions of an object and its parts (i.e., knowledge distillation; I read this as distribution matching between the global and local branches), which can be done with the KL divergence.
$$\mathcal{L}_{\mathrm{KL}\left(\mathbf{P}_{w} \| \mathbf{P}_{a}\right)}=-\mathrm{H}\left(\mathbf{P}_{w}\right)+\mathrm{H}\left(\mathbf{P}_{w}, \mathbf{P}_{a}\right)$$
Here $\mathbf{P}_{w}$ and $\mathbf{P}_{a}$ are the prediction distributions of an object and one of its parts (i.e., of the global feature and a local feature), $\mathrm{H}\left(\mathbf{P}_{w}\right)=-\sum \mathbf{P}_{w} \log \mathbf{P}_{w}$ is the entropy, and $\mathrm{H}\left(\mathbf{P}_{w}, \mathbf{P}_{a}\right)=-\sum \mathbf{P}_{w} \log \mathbf{P}_{a}$ is the cross entropy.
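A quick numerical check (not from the paper) of the decomposition $\mathrm{KL}(\mathbf{P}_{w} \| \mathbf{P}_{a}) = -\mathrm{H}(\mathbf{P}_{w}) + \mathrm{H}(\mathbf{P}_{w}, \mathbf{P}_{a})$ on two arbitrary class distributions:

import torch
import torch.nn.functional as F

pw = F.softmax(torch.randn(5), dim=0)   # "global" prediction distribution
pa = F.softmax(torch.randn(5), dim=0)   # "part" prediction distribution

neg_entropy = (pw * pw.log()).sum()        # -H(Pw)
cross_entropy = -(pw * pa.log()).sum()     #  H(Pw, Pa)
kl = (pw * (pw.log() - pa.log())).sum()    #  KL(Pw || Pa)

assert torch.allclose(kl, neg_entropy + cross_entropy, atol=1e-6)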
This gives the loss function of this module:
$$\mathcal{L}=\mathcal{L}_{cr}-\lambda \mathrm{H}\left(\mathbf{P}_{w}\right)+\lambda \mathrm{H}\left(\mathbf{P}_{w}, \mathbf{P}_{a}\right)$$
where $\mathcal{L}_{cr}$ is the cross-entropy loss on the prediction of the global feature.
Summing the weighted losses of the two modules gives the final objective:
$$\mathcal{L}=\mathbb{E}_{\mathbf{x}}\left(\mathcal{L}_{cr}-\lambda \mathrm{H}\left(\mathbf{P}_{w}\right)+\frac{\gamma}{G} \sum \mathrm{H}\left(\mathbf{P}_{w}, \mathbf{P}_{a}^{g}\right)+\phi \mathcal{L}_{\text{group}}\right)$$
Code Walkthrough
nparts, the number of groups, is user-defined. Taking a ResNet-50 backbone as an example, the features output by layer4 are split evenly into nparts groups along the channel dimension; with nparts=4, each group has 512 channels. The nparts grouped feature maps are fed into separate fc layers to produce the local predictions xlocal of size torch.Size([nparts, batchsize, num_classes]). A channel-wise cosine similarity matrix xcosin is also returned, on which the LocalMaxGlobalMin loss is computed afterwards.
In addition, the layer4 features go through the usual global path (average pooling plus fc) to produce the global prediction xglobal of size torch.Size([batchsize, num_classes]).
# Added inside the __init__ method of the ResNet class
if self.attention:
    nfeatures = 512 * block.expansion
    # split channels evenly; the last group absorbs the remainder
    nlocal_channels_norm = nfeatures // self.nparts
    reminder = nfeatures % self.nparts
    nlocal_channels_last = nlocal_channels_norm
    if reminder != 0:
        nlocal_channels_last = nlocal_channels_norm + reminder

    # one fc classifier per channel group, plus the group boundary indices
    fc_list = []
    separations = []
    sep_node = 0
    for i in range(self.nparts):
        if i != self.nparts-1:
            sep_node += nlocal_channels_norm
            fc_list.append(nn.Linear(nlocal_channels_norm, num_classes))
        else:
            sep_node += nlocal_channels_last
            fc_list.append(nn.Linear(nlocal_channels_last, num_classes))
        separations.append(sep_node)
    self.fclocal = nn.Sequential(*fc_list)
    self.separations = separations
self.fc = nn.Linear(512*block.expansion, num_classes)
—————————————————————————————————————————————————————————————————————————————————
# forward of the ResNet class
def forward(self, x):
    eps = 1e-8  # assumed small constant to avoid division by zero when normalizing channels
    x = self.conv1(x)    # [4, 64, 224, 224] (448x448 input, batch size 4)
    x = self.bn1(x)      # [4, 64, 224, 224]
    x = self.relu(x)
    x = self.maxpool(x)  # [4, 64, 112, 112]
    x = self.layer1(x)   # [4, 256, 112, 112]
    x = self.layer2(x)   # [4, 512, 56, 56]
    x = self.layer3(x)   # [4, 1024, 28, 28]
    x = self.layer4(x)   # [4, 2048, 14, 14]

    if self.attention:
        nsamples, nchannels, height, width = x.shape
        xview = x.view(nsamples, nchannels, -1)                    # torch.Size([4, 2048, 196])
        xnorm = xview.div(xview.norm(dim=-1, keepdim=True) + eps)  # torch.Size([4, 2048, 196])
        xcosin = torch.bmm(xnorm, xnorm.transpose(-1, -2))         # torch.Size([4, 2048, 2048])

        # local (part) predictions, one per channel group
        attention_scores = []
        for i in range(self.nparts):
            if i == 0:
                xx = x[:, :self.separations[i]]                    # torch.Size([4, 512, 14, 14])
            else:
                xx = x[:, self.separations[i-1]:self.separations[i]]
            xx_pool = self.avgpool(xx).flatten(1)                  # torch.Size([4, 512])
            attention_scores.append(self.fclocal[i](xx_pool))
        xlocal = torch.stack(attention_scores, dim=0)              # torch.Size([4, 4, num_classes])
        xmaps = x.clone().detach()

    # for global
    xpool = self.avgpool(x)
    xpool = torch.flatten(xpool, 1)
    xglobal = self.fc(xpool)                                       # torch.Size([4, num_classes])
    return xglobal, xlocal, xcosin, xmaps
Training and validation
# inside the train/val loop
xglobal, xlocal, xcosin, _ = model(inputs)
probs = softmax(xglobal)
cls_loss = criterion[0](xglobal, labels)   # L_cr: cross entropy on the global prediction
############################################################## prediction
# prediction of every branch (part)
probl, predl, logprobl = [], [], []
for i in range(nparts):
    probl.append(softmax(torch.squeeze(xlocal[i])))
    predl.append(torch.max(probl[i], 1)[-1])
    logprobl.append(logsoftmax(torch.squeeze(xlocal[i])))
############################################################### regularization
logprobs = logsoftmax(xglobal)
# -lambda * H(Pw): negative entropy of the global prediction
entropy_loss = penalty['entropy_weights'] * torch.mul(probs, logprobs).sum().div(inputs.size(0))
# (gamma / G) * sum_g H(Pw, Pa^g): cross entropy between global and part predictions
soft_loss_list = []
for i in range(nparts):
    soft_loss_list.append(torch.mul(torch.neg(probs), logprobl[i]).sum().div(inputs.size(0)))
soft_loss = penalty['soft_weights'] * sum(soft_loss_list).div(nparts)
# phi * L_group: LocalMaxGlobalMin regularization on the cosine matrix
lmgm_reg_loss = criterion[1](xcosin)
reg_loss = lmgm_reg_loss + entropy_loss + soft_loss
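The snippet stops at reg_loss; presumably the classification term and the regularizers are summed before backpropagation, matching the final objective above. A minimal sketch, assuming a standard training loop with an optimizer already constructed (the optimizer is not part of the excerpt):

# Assumed continuation of the loop above; optimizer is hypothetical here.
total_loss = cls_loss + reg_loss   # E_x(L_cr - lambda*H + (gamma/G)*sum H + phi*L_group)
optimizer.zero_grad()
total_loss.backward()
optimizer.step()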
Experiments
Ablation Study