Model Pruning in 2D/3D Convolutional Networks: Research and Applications – Jinyang Guo, PhD student at the University of Sydney (Zhidx Open Class)
Research interests:
Education:
https://jinyangguo.github.io/
Contents
• Existing Methods
• Introduction of Pruning
• Conclusion
Background and challenges
[Figure: convolutional neural networks + mobile devices = applications such as smart retail, self-driving, and personalized healthcare.]
Out of memory
Existing methods
Model compression and model acceleration are closely connected. The main families of methods:
• Compact network design
• Knowledge distillation
• Pruning
• Quantization
Introduction of Pruning
• Unstructured pruning: removes individual weights; inefficient for practical acceleration on dense hardware
• Structured pruning: removes whole channels/filters; easy to use with standard inference libraries
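The contrast can be illustrated with a toy sketch (pure Python, all names and values illustrative, not the talk's code): unstructured pruning zeroes individual weights but keeps the dense shape, while structured pruning removes whole filters so the layer genuinely shrinks.

```python
def unstructured_prune(weights, threshold):
    """Zero out individual weights below the threshold.
    The shape is unchanged, so dense hardware sees no speedup."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

def structured_prune(weights, keep):
    """Remove whole filters (rows). The layer becomes genuinely
    smaller, so standard dense kernels run faster."""
    return [row for i, row in enumerate(weights) if i in keep]

# Toy weight matrix: rows = filters.
W = [[0.9, -0.1, 0.4],
     [0.05, 0.02, -0.03],   # a weak filter
     [-0.7, 0.6, 0.2]]

sparse = unstructured_prune(W, threshold=0.1)  # same 3x3 shape, zeros inside
small = structured_prune(W, keep={0, 2})       # dense 2x3 matrix
```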
Timeline: since 2016, the first works to compress deep models have drawn active research interest. Representative follow-up works:
• He et al.: AMC
• Luo et al.: ThiNet
• Liu et al.: Slimming
• Guo et al.: MDP
Revisiting the convolutional layer and channel pruning
[Figure: filters are convolved (*) with the input tensor to produce the output tensor Y; channel pruning removes some filters, and with them the corresponding output channels.]
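A shape-level sketch of what channel pruning does (names illustrative): removing filters from one layer also removes the corresponding input channels of the next layer, since one layer's output channels are the next layer's input channels.

```python
def prune_filter_shapes(shape_l, shape_next, keep):
    """shape_* are (out_channels, in_channels); `keep` lists the
    surviving filter indices of layer l."""
    out_ch, in_ch = shape_l
    next_out, next_in = shape_next
    assert next_in == out_ch, "layers must be connected"
    pruned_l = (len(keep), in_ch)        # fewer filters in layer l
    pruned_next = (next_out, len(keep))  # fewer input channels in layer l+1
    return pruned_l, pruned_next

# Keep 48 of 64 filters in a layer feeding a 128-filter layer.
l_shape, next_shape = prune_filter_shapes((64, 32), (128, 64),
                                          keep=list(range(48)))
```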
Channel Pruning Guided by Classification Loss and Feature Importance
Reconstruction-based channel pruning supervises each layer by its output tensor Y (reconstructing it with the pruned output Y′), but this supervision has two problems:
• The supervision from the output tensor may have limited influence on the final loss.
• The supervision from the output tensor may be removed when pruning the next layer.
[Figure: the current layer with pruned filters; the output tensor is the input tensor of the next layer.]
Guidance from the classification loss: weight the reconstruction error by the gradients ∂C/∂Y of the final loss C with respect to the output tensor:
argmin_W ‖Y′ − Y‖_F²  →  argmin_W ‖(∂C/∂Y) ⊙ (Y′ − Y)‖_F²
The two weighting schemes side by side: weights from the final loss (∂C/∂Y) and weights from feature importance (Y*). With a binary feature-importance mask Y* (1 for important features, 0 for unimportant ones), important features are reconstructed while unimportant ones are suppressed toward 0:
argmin_W ‖(∂C/∂Y) ⊙ (Y′ − Y)‖_F²        argmin_W ‖Y′ − Y* ⊙ Y‖_F²
= argmin_W ‖Y* ⊙ (Y′ − Y) + (1 − Y*) ⊙ (Y′ − 0)‖_F²
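The reweighted objective above can be sketched numerically. Assuming 1-D toy tensors (all values illustrative), each element of the reconstruction error is scaled by a weight, which plays the role of the gradient ∂C/∂Y or of the binary importance mask Y*:

```python
def weighted_recon_error(y_pruned, y, weight):
    """Squared reconstruction error where each element is scaled by a
    weight, e.g. the gradient dC/dY of the final loss or a binary
    feature-importance mask Y*."""
    return sum((w * (yp - yt)) ** 2
               for w, yp, yt in zip(weight, y_pruned, y))

y       = [1.0, 2.0, 3.0]   # original output tensor Y
y_prime = [0.5, 2.5, 3.0]   # output tensor Y' after pruning
grad    = [2.0, 0.1, 1.0]   # elements with large gradients matter more

plain    = weighted_recon_error(y_prime, y, [1.0] * 3)  # unweighted: 0.5
weighted = weighted_recon_error(y_prime, y, grad)       # the large-gradient
                                                        # error dominates
```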
[Figure: accuracy vs. #FLOPs (%) for ResNet-56 (CIFAR-10) and ResNet-50, compared with DCP, ThiNet, CP, WM, and GAL.]
[Figure: accuracy vs. #FLOPs (%) for MobileNet-V2 on two datasets, compared with DCP, ThiNet, and WM.]
[Figure: clip accuracy (%) vs. #FLOPs (%) for C3D, comparing the proposed method ("Ours") with FP, TP, and RBP.]
[Figure: #FLOPs (%) relative to the pre-trained model. Halving #channels gives a ~2x reduction, halving #frames a 2x reduction, and halving resolution a 4x reduction.]
Multi-Dimensional Pruning (MDP)
➢ How to find the optimal combination of the features along different dimensions?
• MDP: a unified framework that can prune both 2D CNNs and 3D CNNs along multiple dimensions
[Figure: a 3D convolution (Conv) maps an input tensor with three channels and depth × height × width dimensions to an output tensor with two channels.]
➢ The searching stage -- construct an over-parameterized network
[Figure: the corresponding layer in the over-parameterized network. The input tensor (three channels) is fed into multiple branches branch_1, branch_2, …, branch_B. Each branch first applies average pooling with its own downsampling ratios (e.g., branch_1: spatial ratio 2, temporal ratio 1; branch_B: spatial ratio 4, temporal ratio 2), then a Conv layer, then per-channel gates g_{b,1}, g_{b,2}, …; the branch output is upsampled back to the original resolution and scaled by a branch-level weight S(λ_b) before the branches are combined.]
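A 1-D toy sketch of such an over-parameterized layer (the identity "conv" and all names are illustrative stand-ins, not the talk's implementation): each branch pools the input at its own ratio, is upsampled back, is scaled by its branch weight, and the branch outputs are summed.

```python
def branch(x, ratio):
    """Average-pool by `ratio`, apply a (here: identity) conv,
    then nearest-neighbor upsample back to the input length."""
    pooled = [sum(x[i:i + ratio]) / ratio for i in range(0, len(x), ratio)]
    return [v for v in pooled for _ in range(ratio)]

def over_param_layer(x, ratios, branch_weights):
    """Sum of branch outputs, each scaled by its branch weight S(λ_b)."""
    out = [0.0] * len(x)
    for ratio, s in zip(ratios, branch_weights):
        out = [o + s * v for o, v in zip(out, branch(x, ratio))]
    return out

# Two branches: full resolution (ratio 1) and 2x downsampled (ratio 2).
y = over_param_layer([1.0, 1.0, 2.0, 2.0],
                     ratios=[1, 2], branch_weights=[1.0, 0.5])
```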
➢ The searching stage -- objective function
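The slide's exact objective is not recoverable from this extraction. Gate-based search objectives of this general kind typically combine the task loss with a resource penalty on the gates; the following is a hypothetical sketch under that assumption (the penalty form and all names are illustrative, not the talk's formula):

```python
def search_objective(task_loss, gates, flops_per_unit, lam):
    """Task loss plus a FLOPs-weighted penalty on the gates,
    pushing gates of expensive, unneeded units toward zero."""
    resource = sum(g * f for g, f in zip(gates, flops_per_unit))
    return task_loss + lam * resource

# Three gated units with different FLOPs costs; lam trades off
# accuracy against compute.
obj = search_objective(0.8, gates=[1.0, 0.2, 0.0],
                       flops_per_unit=[10.0, 10.0, 40.0], lam=0.01)
```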
➢ The pruning stage
[Figure: the same over-parameterized layer as in the searching stage; branches and channels are removed according to the learned gate values S(λ_b) and g_{b,i}, yielding the pruned network.]
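The pruning step can be sketched as simple gate thresholding (a hypothetical sketch; the talk's exact selection rule is not recoverable from this extraction):

```python
def prune_by_gates(gates, threshold=0.01):
    """Indices of units (branches or channels) whose learned gate
    survives the threshold; the rest are removed from the network."""
    return [i for i, g in enumerate(gates) if g > threshold]

kept = prune_by_gates([0.9, 0.004, 0.3, 0.0])  # drops the near-zero gates
```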
➢ Results when compressing VGGNet (pre-trained: 93.99%)
➢ Results when compressing ResNet (pre-trained: 93.47% on CIFAR-10; 92.94% on ImageNet)
➢ Results when compressing MobileNet-V2
[Figure: accuracy vs. FLOPs (%) for MobileNet-V2 on two datasets, compared with DCP, ThiNet, and WM.]
➢ Results when compressing C3D (pre-trained: 82.10% on UCF-101; 47.39% on HMDB-51)
[Figure: accuracy vs. FLOPs (%) on UCF-101 and HMDB-51, compared with FP, TP, and DCP.]
➢ Results when compressing I3D (pre-trained: 93.47% on UCF-101; 69.41% on HMDB-51)
[Figure: accuracy vs. FLOPs (%) on UCF-101 and HMDB-51, compared with TP and DCP.]
➢ Ablation study on 2D CNN – ResNet-56 on CIFAR-10
Conclusion
Model compression and acceleration are important for CNN deployment:
• How to select informative channels? What is a good channel selection criterion? – Channel Pruning Guided by Classification Loss and Feature Importance (AAAI 2020)