Model Pruning in 2D/3D Convolutional Networks: Research and Applications – Jinyang Guo, PhD student at the University of Sydney (Zhidx Open Class)
Research interests:
Education:
https://jinyangguo.github.io/
Contents
• Existing Methods
• Introduction of Pruning
• Conclusion
Background and challenges
[Figure: convolutional neural networks + mobile devices = applications such as smart retail, self-driving, and personalized healthcare.]
Out of memory
Existing methods
Model compression and model acceleration are closely connected. The main families of methods:
• Compact network design
• Knowledge distillation
• Pruning
• Quantization
Introduction of Pruning
• Unstructured pruning: removes individual weights; inefficient for practical acceleration on dense hardware
• Structured pruning: removes whole channels/filters; easy to use with standard inference libraries
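The contrast can be illustrated with a toy sketch (pure Python, all names and values illustrative, not the talk's code): unstructured pruning zeroes individual weights but keeps the dense shape, while structured pruning removes whole filters so the layer genuinely shrinks.

```python
def unstructured_prune(weights, threshold):
    """Zero out individual weights below the threshold.
    The shape is unchanged, so dense hardware sees no speedup."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

def structured_prune(weights, keep):
    """Remove whole filters (rows). The layer becomes genuinely
    smaller, so standard dense kernels run faster."""
    return [row for i, row in enumerate(weights) if i in keep]

# Toy weight matrix: rows = filters.
W = [[0.9, -0.1, 0.4],
     [0.05, 0.02, -0.03],   # a weak filter
     [-0.7, 0.6, 0.2]]

sparse = unstructured_prune(W, threshold=0.1)  # same 3x3 shape, zeros inside
small = structured_prune(W, keep={0, 2})       # dense 2x3 matrix
```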
Timeline: since 2016, the first works to compress deep models have drawn active research interest. Representative follow-up works:
• He et al.: AMC
• Luo et al.: ThiNet
• Liu et al.: Slimming
• Guo et al.: MDP
Revisiting the convolutional layer and channel pruning
[Figure: filters are convolved (*) with the input tensor to produce the output tensor Y; channel pruning removes some filters, and with them the corresponding output channels.]
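A shape-level sketch of what channel pruning does (names illustrative): removing filters from one layer also removes the corresponding input channels of the next layer, since one layer's output channels are the next layer's input channels.

```python
def prune_filter_shapes(shape_l, shape_next, keep):
    """shape_* are (out_channels, in_channels); `keep` lists the
    surviving filter indices of layer l."""
    out_ch, in_ch = shape_l
    next_out, next_in = shape_next
    assert next_in == out_ch, "layers must be connected"
    pruned_l = (len(keep), in_ch)        # fewer filters in layer l
    pruned_next = (next_out, len(keep))  # fewer input channels in layer l+1
    return pruned_l, pruned_next

# Keep 48 of 64 filters in a layer feeding a 128-filter layer.
l_shape, next_shape = prune_filter_shapes((64, 32), (128, 64),
                                          keep=list(range(48)))
```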
Channel Pruning Guided by Classification Loss and Feature Importance
Reconstruction-based channel pruning supervises each layer by its output tensor Y (reconstructing it with the pruned output Y′), but this supervision has two problems:
• The supervision from the output tensor may have limited influence on the final loss.
• The supervision from the output tensor may be removed when pruning the next layer.
[Figure: the current layer with pruned filters; the output tensor is the input tensor of the next layer.]
Guidance from the classification loss: weight the reconstruction error by the gradients ∂C/∂Y of the final loss C with respect to the output tensor:
argmin_W ‖Y′ − Y‖_F²  →  argmin_W ‖(∂C/∂Y) ⊙ (Y′ − Y)‖_F²
The two weighting schemes side by side: weights from the final loss (∂C/∂Y) and weights from feature importance (Y*). With a binary feature-importance mask Y* (1 for important features, 0 for unimportant ones), important features are reconstructed while unimportant ones are suppressed toward 0:
argmin_W ‖(∂C/∂Y) ⊙ (Y′ − Y)‖_F²        argmin_W ‖Y′ − Y* ⊙ Y‖_F²
= argmin_W ‖Y* ⊙ (Y′ − Y) + (1 − Y*) ⊙ (Y′ − 0)‖_F²
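The reweighted objective above can be sketched numerically. Assuming 1-D toy tensors (all values illustrative), each element of the reconstruction error is scaled by a weight, which plays the role of the gradient ∂C/∂Y or of the binary importance mask Y*:

```python
def weighted_recon_error(y_pruned, y, weight):
    """Squared reconstruction error where each element is scaled by a
    weight, e.g. the gradient dC/dY of the final loss or a binary
    feature-importance mask Y*."""
    return sum((w * (yp - yt)) ** 2
               for w, yp, yt in zip(weight, y_pruned, y))

y       = [1.0, 2.0, 3.0]   # original output tensor Y
y_prime = [0.5, 2.5, 3.0]   # output tensor Y' after pruning
grad    = [2.0, 0.1, 1.0]   # elements with large gradients matter more

plain    = weighted_recon_error(y_prime, y, [1.0] * 3)  # unweighted: 0.5
weighted = weighted_recon_error(y_prime, y, grad)       # the large-gradient
                                                        # error dominates
```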
[Figure: accuracy vs. #FLOPs (%) for ResNet-56 (CIFAR-10) and ResNet-50, compared with DCP, ThiNet, CP, WM, and GAL.]
[Figure: accuracy vs. #FLOPs (%) for MobileNet-V2 on two datasets, compared with DCP, ThiNet, and WM.]
[Figure: clip accuracy (%) vs. #FLOPs (%) for C3D, comparing the proposed method ("Ours") with FP, TP, and RBP.]
[Figure: #FLOPs (%) relative to the pre-trained model. Halving #channels gives a ~2x reduction, halving #frames a 2x reduction, and halving resolution a 4x reduction.]
Multi-Dimensional Pruning (MDP)
➢ How to find the optimal combination of the features along different dimensions?
• MDP: a unified framework that can prune both 2D CNNs and 3D CNNs along multiple dimensions
[Figure: a 3D convolution (Conv) maps an input tensor with three channels and depth × height × width dimensions to an output tensor with two channels.]
➢ The searching stage -- construct an over-parameterized network
[Figure: the corresponding layer in the over-parameterized network. The input tensor (three channels) is fed into multiple branches branch_1, branch_2, …, branch_B. Each branch first applies average pooling with its own downsampling ratios (e.g., branch_1: spatial ratio 2, temporal ratio 1; branch_B: spatial ratio 4, temporal ratio 2), then a Conv layer, then per-channel gates g_{b,1}, g_{b,2}, …; the branch output is upsampled back to the original resolution and scaled by a branch-level weight S(λ_b) before the branches are combined.]
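A 1-D toy sketch of such an over-parameterized layer (the identity "conv" and all names are illustrative stand-ins, not the talk's implementation): each branch pools the input at its own ratio, is upsampled back, is scaled by its branch weight, and the branch outputs are summed.

```python
def branch(x, ratio):
    """Average-pool by `ratio`, apply a (here: identity) conv,
    then nearest-neighbor upsample back to the input length."""
    pooled = [sum(x[i:i + ratio]) / ratio for i in range(0, len(x), ratio)]
    return [v for v in pooled for _ in range(ratio)]

def over_param_layer(x, ratios, branch_weights):
    """Sum of branch outputs, each scaled by its branch weight S(λ_b)."""
    out = [0.0] * len(x)
    for ratio, s in zip(ratios, branch_weights):
        out = [o + s * v for o, v in zip(out, branch(x, ratio))]
    return out

# Two branches: full resolution (ratio 1) and 2x downsampled (ratio 2).
y = over_param_layer([1.0, 1.0, 2.0, 2.0],
                     ratios=[1, 2], branch_weights=[1.0, 0.5])
```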
➢ The searching stage -- objective function
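The slide's exact objective is not recoverable from this extraction. Gate-based search objectives of this general kind typically combine the task loss with a resource penalty on the gates; the following is a hypothetical sketch under that assumption (the penalty form and all names are illustrative, not the talk's formula):

```python
def search_objective(task_loss, gates, flops_per_unit, lam):
    """Task loss plus a FLOPs-weighted penalty on the gates,
    pushing gates of expensive, unneeded units toward zero."""
    resource = sum(g * f for g, f in zip(gates, flops_per_unit))
    return task_loss + lam * resource

# Three gated units with different FLOPs costs; lam trades off
# accuracy against compute.
obj = search_objective(0.8, gates=[1.0, 0.2, 0.0],
                       flops_per_unit=[10.0, 10.0, 40.0], lam=0.01)
```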
➢ The pruning stage
[Figure: the same over-parameterized layer as in the searching stage; branches and channels are removed according to the learned gate values S(λ_b) and g_{b,i}, yielding the pruned network.]
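The pruning step can be sketched as simple gate thresholding (a hypothetical sketch; the talk's exact selection rule is not recoverable from this extraction):

```python
def prune_by_gates(gates, threshold=0.01):
    """Indices of units (branches or channels) whose learned gate
    survives the threshold; the rest are removed from the network."""
    return [i for i, g in enumerate(gates) if g > threshold]

kept = prune_by_gates([0.9, 0.004, 0.3, 0.0])  # drops the near-zero gates
```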
➢ Results when compressing VGGNet (pre-trained: 93.99%)
➢ Results when compressing ResNet (pre-trained: 93.47% on CIFAR-10; 92.94% on ImageNet)
➢ Results when compressing MobileNet-V2
[Figure: accuracy vs. FLOPs (%) for MobileNet-V2 on two datasets, compared with DCP, ThiNet, and WM.]
➢ Results when compressing C3D (pre-trained: 82.10% on UCF-101; 47.39% on HMDB-51)
[Figure: accuracy vs. FLOPs (%) on UCF-101 and HMDB-51, compared with FP, TP, and DCP.]
➢ Results when compressing I3D (pre-trained: 93.47% on UCF-101; 69.41% on HMDB-51)
[Figure: accuracy vs. FLOPs (%) on UCF-101 and HMDB-51, compared with TP and DCP.]
➢ Ablation study on 2D CNN – ResNet-56 on CIFAR-10
Conclusion
Model compression and acceleration are important for CNN deployment:
• How to select informative channels? What is a good channel selection criterion? – Channel Pruning Guided by Classification Loss and Feature Importance (AAAI 2020)