4 CNN
Samuel Cheng
School of ECE
University of Oklahoma
Spring, 2017
Outline
- Review
- CNN basics
- Case studies
- Conclusions
Review
Today: debugging the optimizer.
[Slides: loss and train/validation accuracy curves during training. If the gap between training and validation accuracy is large, the model is overfitting: crank up regularization.]
Hyperparameter optimization
Cross-validation strategy: coarse-to-fine cross-validation in stages.
- First stage: only a few epochs, to get a rough idea of which hyperparameters work.
- Second stage: longer running time, finer search.
- (Repeat as necessary.)
Sample hyperparameters such as learning rate and regularization strength on a log scale, and prefer random search over grid search. A minimal sketch follows.
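Here is a minimal sketch of the coarse stage, assuming a hypothetical train_and_evaluate(lr, reg, epochs) helper that returns validation accuracy; note the learning rate and regularization strength are sampled in log space:

```python
import numpy as np

def random_search(train_and_evaluate, num_trials=100, epochs=5):
    """Coarse random hyperparameter search: sample lr and reg in log space."""
    results = []
    for _ in range(num_trials):
        lr = 10 ** np.random.uniform(-5, -1)   # learning rate: 1e-5 .. 1e-1
        reg = 10 ** np.random.uniform(-5, 5)   # regularization strength
        val_acc = train_and_evaluate(lr=lr, reg=reg, epochs=epochs)
        results.append((val_acc, lr, reg))
    results.sort(reverse=True)                 # best validation accuracy first
    return results[:10]                        # inspect winners, then narrow the ranges
```

For the finer stages, shrink the sampling ranges around the winners and train for longer.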
CNN history
A bit of history: Hubel & Wiesel, 1962, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex" (with follow-up work in 1968).
Hubel & Wiesel proposed a hierarchical organization of the visual cortex: simple cells feeding into complex cells.
The Neocognitron [Fukushima 1980]: an early hierarchical, convolution-like architecture inspired by this work.
"Gradient-based learning applied to document recognition" [LeCun, Bottou, Bengio, Haffner 1998]: LeNet-5, an early convolutional network deployed for handwritten digit recognition.
CNN today
- Image classification: "ImageNet Classification with Deep Convolutional Neural Networks" [Krizhevsky, Sutskever, Hinton, 2012], a.k.a. "AlexNet".
- Detection and segmentation: [Faster R-CNN: Ren, He, Girshick, Sun 2015], [Farabet et al., 2012].
- Self-driving cars (e.g. on the NVIDIA Tegra X1).
- [Goodfellow 2014], [Simonyan et al. 2014].
- Playing Atari games with reinforcement learning [Mnih 2013].
- Image captioning.
- Image synthesis: reddit.com/r/deepdream.
- "Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition" [Cadieu et al., 2014].
Motivation of CNN

Convolution Layer
Consider a 32x32x3 image (32 height, 32 width, 3 depth) and a 5x5x3 filter. Convolve the filter with the image: slide it over the image spatially, computing dot products. Filters always extend through the full depth of the input volume.
Each placement yields 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product, plus a bias).
Sliding over all spatial locations produces a 28x28x1 activation map.
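As a concrete (if slow) sketch of what one filter computes, here is a naive NumPy loop producing the 28x28 activation map just described; a real implementation would vectorize this:

```python
import numpy as np

def conv_single_filter(image, filt, bias=0.0):
    """Slide one filter over an image; no padding, stride 1.

    image: (H, W, C) array, e.g. 32x32x3.
    filt:  (F, F, C) array, e.g. 5x5x3.
    Returns an (H-F+1, W-F+1) activation map, e.g. 28x28.
    """
    H, W, C = image.shape
    F = filt.shape[0]
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            chunk = image[i:i+F, j:j+F, :]           # 5x5x3 chunk of the image
            out[i, j] = np.sum(chunk * filt) + bias  # 75-dim dot product + bias
    return out

amap = conv_single_filter(np.random.randn(32, 32, 3), np.random.randn(5, 5, 3))
assert amap.shape == (28, 28)
```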
For example, if we had six 5x5 filters, we'd get 6 separate activation maps: a new "image" of size 28x28x6.
A ConvNet is a sequence of convolution layers interspersed with activation functions:
32x32x3 input -> CONV + ReLU (six 5x5x3 filters) -> 28x28x6 -> CONV + ReLU (ten 5x5x6 filters) -> 24x24x10 -> ...
[Slides: preview of a full ConvNet and visualizations of learned filters.]

A closer look at spatial dimensions:
7x7 input (spatially), assume a 3x3 filter.
- Applied with stride 1 => 5x5 output.
- Applied with stride 2 => 3x3 output.
- Applied with stride 3? Doesn't fit! You cannot apply a 3x3 filter to a 7x7 input with stride 3.
Output size: (N - F) / stride + 1
e.g. N = 7, F = 3:
- stride 1 => (7 - 3)/1 + 1 = 5
- stride 2 => (7 - 3)/2 + 1 = 3
- stride 3 => (7 - 3)/3 + 1 = 2.33 (not an integer: doesn't fit)
In practice it is common to zero-pad the border. With P rows/columns of zeros on each side, the output size becomes (N - F + 2P) / stride + 1; padding with P = (F - 1)/2 at stride 1 preserves the input's spatial size.
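A one-line helper for the formula above (a sketch; the assert just checks that the filter tiles the padded input evenly):

```python
def conv_output_size(N, F, stride, pad=0):
    """Spatial output size of a conv layer: (N - F + 2P) / S + 1."""
    assert (N - F + 2 * pad) % stride == 0, "filter doesn't fit cleanly"
    return (N - F + 2 * pad) // stride + 1

print(conv_output_size(7, 3, 1))          # 5
print(conv_output_size(7, 3, 2))          # 3
print(conv_output_size(32, 5, 1, pad=2))  # 32 (padding preserves size)
```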
Examples time: input volume 32x32x3, ten 5x5 filters with stride 1, pad 2.
- Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10.
- Number of parameters: each filter has 5*5*3 + 1 = 76 parameters (+1 for the bias), so 76 * 10 = 760 in total.
Common settings: number of filters K a power of 2 (e.g. 32, 64, 128, 512); F = 3, S = 1, P = 1; F = 5, S = 1, P = 2; F = 1, S = 1, P = 0.
1x1 CONV: e.g. a 56x56x64 input convolved with 32 filters of size 1x1x64. Each filter performs a 64-dimensional dot product at every spatial position, giving a 56x56x32 output.
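A 1x1 convolution is just a shared fully-connected layer applied at every pixel, so it reduces to a single matrix multiply after reshaping. A minimal NumPy sketch:

```python
import numpy as np

def conv_1x1(x, w, b):
    """x: (H, W, Cin), w: (Cin, Cout), b: (Cout,). Returns (H, W, Cout)."""
    H, W, Cin = x.shape
    out = x.reshape(H * W, Cin) @ w + b  # one Cin-dim dot product per pixel
    return out.reshape(H, W, -1)

y = conv_1x1(np.random.randn(56, 56, 64), np.random.randn(64, 32), np.zeros(32))
assert y.shape == (56, 56, 32)
```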
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently

MAX POOLING: on a single depth slice, a 2x2 filter applied with stride 2, e.g.

  1 1 2 4
  5 6 7 8      max pool      6 8
  3 2 1 0     ---------->    3 4
  1 2 3 4

Common settings: F = 2, S = 2; F = 3, S = 2.
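A sketch of max pooling on one depth slice, reproducing the small example above:

```python
import numpy as np

def max_pool(x, F=2, S=2):
    """Max pool one depth slice. x: (H, W); returns ((H-F)//S+1, (W-F)//S+1)."""
    H, W = x.shape
    out = np.zeros(((H - F) // S + 1, (W - F) // S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*S:i*S+F, j*S:j*S+F].max()  # max over each window
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(x))  # [[6. 8.] [3. 4.]]
```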
Demo: ConvNetJS CIFAR-10 demo (training a small ConvNet live in the browser).

Case study: LeNet-5 [LeCun et al., 1998] and AlexNet [Krizhevsky et al., 2012].
[Slides: layer-by-layer architecture walkthroughs.]
Case study: ZFNet [Zeiler and Fergus, 2013]
AlexNet but:
- CONV1: change from (11x11 stride 4) to (7x7 stride 2)
- CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
ImageNet top-5 error: 15.4% -> 14.8%
Case study: VGGNet [Simonyan and Zisserman, 2014]
TOTAL memory: 24M activations * 4 bytes ~= 93MB / image (forward only; roughly *2 for the backward pass)
TOTAL params: 138M parameters
Case study: GoogLeNet [Szegedy et al., 2014]
Inception module:
- Naive idea: apply several filter sizes (1x1, 3x3, 5x5) plus pooling in parallel to the previous layer, and concatenate all the resulting filter maps.
- Problem: the 3x3 and 5x5 convolutions are expensive, and concatenation keeps growing the depth.
- Inception module: insert 1x1 "bottleneck" convolutions before the expensive convolutions (and after pooling) to reduce the number of filters, then concatenate.
Compared to AlexNet:
- 12x fewer params
- 2x more compute
- 6.67% top-5 error (vs. 16.4%)
Case study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.57% top-5 error) with 152 layers; at runtime it is faster than a VGGNet, even though it has roughly 8x more layers.
Input: 224x224x3.
[Slides: residual block structure and full architecture.]
Case study bonus: DeepMind's AlphaGo [Silver et al., 2016]
Policy network:
- [19x19x48] input (the board position, encoded as 48 feature planes)
- CONV1: 192 5x5 filters, stride 1, pad 2 => [19x19x192]
- CONV2..12: 192 3x3 filters, stride 1, pad 1 => [19x19x192]
- CONV: one 1x1 filter, stride 1, pad 0 => [19x19] (probability map of promising moves)
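As a sanity check on this spec, a short parameter-count sketch (our own arithmetic, not DeepMind's code):

```python
def conv_params(F, C_in, C_out):
    """Parameters of a conv layer: F*F*C_in weights per filter, plus a bias."""
    return (F * F * C_in + 1) * C_out

total = conv_params(5, 48, 192)         # CONV1: 5x5x48, 192 filters
total += 11 * conv_params(3, 192, 192)  # CONV2..12: eleven 3x3x192 layers
total += conv_params(1, 192, 1)         # final 1x1 filter -> 19x19 map
print(f"{total:,} parameters")          # ~3.9M parameters
```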
Practical matters:
- Data augmentation
- Transfer learning
- Use of small filters
- Implementing CNNs efficiently
- Use of GPUs
- Floating point precision
Data Augmentation
Ordinary training pipeline: load image and label ("cat") -> CNN -> compute loss.
With augmentation: load image and label -> transform the image -> CNN -> compute loss. Change the pixels without changing the label, and train on the transformed data. Very widely used.
1. Horizontal flips
2. Random crops/scales
Training: sample random crops / scales. E.g. ResNet:
1. Pick a random L in the range [256, 480]
2. Resize the training image so that its short side = L
3. Sample a random 224 x 224 patch
Testing: average predictions over a fixed set of crops. A training-time sketch follows.
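A minimal NumPy sketch of the ResNet-style training-time crop, assuming a resize(img, short_side=...) helper (e.g. from PIL or OpenCV) is available:

```python
import numpy as np

def random_crop_scale(img, resize, min_L=256, max_L=480, crop=224):
    """ResNet-style augmentation: random scale, then a random 224x224 patch."""
    L = np.random.randint(min_L, max_L + 1)  # 1. pick random L in [256, 480]
    img = resize(img, short_side=L)          # 2. resize so short side = L
    H, W = img.shape[:2]
    y = np.random.randint(0, H - crop + 1)   # 3. random 224x224 patch
    x = np.random.randint(0, W - crop + 1)
    img = img[y:y+crop, x:x+crop]
    if np.random.rand() < 0.5:               # plus a horizontal flip (step 1 above)
        img = img[:, ::-1]
    return img
```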
3. Color jitter
Simple: randomly jitter contrast.
[Slides: a more complex approach applies PCA-based color jitter to all pixels, as in AlexNet.]
4. Get creative!
Random mixes/combinations of:
- translation
- rotation
- stretching
- shearing
- lens distortions, ... (go crazy)
Transfer Learning
Train on ImageNet, then adapt the network to your own data:
- Small dataset: treat the CNN as a fixed feature extractor. Freeze all layers except the last, and retrain only the final classifier layer.
- Medium dataset: fine-tune. Freeze the early layers and train the top several layers.
"DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition" [Donahue*, Jia*, et al., 2013]
Rule of thumb, depending on how much data you have and how similar it is to ImageNet:

                        very similar dataset         very different dataset
  very little data      linear classifier on the     trouble: try a linear classifier
                        top-layer features           on features from earlier stages
  quite a lot of data   fine-tune a few layers       fine-tune a larger number of layers
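A minimal fine-tuning sketch in PyTorch (one framework among several; torchvision's pretrained ResNet-18 is just an illustrative choice): freeze the pretrained layers and retrain only a fresh final classifier.

```python
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

model = models.resnet18(pretrained=True)    # ImageNet-pretrained weights

for param in model.parameters():            # freeze all pretrained layers
    param.requires_grad = False

num_classes = 10                            # your dataset's class count (assumption)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh last layer

# Only the new layer's parameters are handed to the optimizer:
optimizer = optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```

For the medium-data case, unfreeze the top few blocks as well and use a lower learning rate for the pretrained weights.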
The power of small filters
Suppose we stack three 3x3 conv layers (stride 1). What region of the input does a neuron on the third conv layer "see"? Answer: 7 x 7. Three 3x3 conv layers give similar representational power to a single 7x7 convolution, with fewer parameters and more nonlinearity.
Parameter comparison (input and output both H x W x C, ignoring biases):
- One 7x7 conv with C filters: 7*7*C*C = 49C^2 parameters.
- Three 3x3 convs with C filters each: 3 * (3*3*C*C) = 27C^2 parameters, plus two extra ReLU nonlinearities and less compute.

Going further: the "bottleneck sandwich"
  H x W x C
  -> Conv 1x1, C/2 filters -> H x W x (C/2)
  -> Conv 3x3, C/2 filters -> H x W x (C/2)
  -> Conv 1x1, C filters   -> H x W x C
versus a single 3x3 conv with C filters (H x W x C -> H x W x C). The bottleneck sandwich uses 0.5C^2 + 2.25C^2 + 0.5C^2 = 3.25C^2 parameters versus 9C^2 for the single 3x3 conv, again with more nonlinearity.
Still more factoring: replace a 3x3 convolution with a 1x3 convolution followed by a 3x1 convolution:
  H x W x C -> Conv 1x3, C filters -> H x W x C -> Conv 3x1, C filters -> H x W x C
This uses 3C^2 + 3C^2 = 6C^2 parameters versus 9C^2 for a single 3x3 conv with C filters.
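A few lines to verify the parameter counts claimed above (biases ignored), for an arbitrary channel count C:

```python
C = 64                              # any channel count works; all counts scale as C^2

single_7x7 = 7 * 7 * C * C          # 49 C^2
three_3x3 = 3 * (3 * 3 * C * C)     # 27 C^2
single_3x3 = 3 * 3 * C * C          # 9 C^2
bottleneck = (1*1*C*(C//2)          # 1x1, C/2 filters: 0.5 C^2
              + 3*3*(C//2)*(C//2)   # 3x3 on C/2 channels: 2.25 C^2
              + 1*1*(C//2)*C)       # 1x1, C filters: 0.5 C^2 -> 3.25 C^2 total
asym = 1*3*C*C + 3*1*C*C            # 1x3 + 3x1: 6 C^2

print(three_3x3 / single_7x7)       # ~0.55
print(bottleneck / single_3x3)      # ~0.36
print(asym / single_3x3)            # ~0.67
```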
Implementing CNNs efficiently: im2col
Convolution reduces to matrix multiplication:
1. Reshape each K x K x C receptive field into a column with K^2*C elements; gathering all N receptive-field locations gives a (K^2*C) x N matrix. (Many input values appear in multiple columns, so this costs extra memory.)
2. Reshape the D conv filters into a D x (K^2*C) matrix.
3. Matrix multiply the two: the D x N result is reshaped into the output tensor.
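A compact NumPy sketch of im2col-based convolution (stride 1, no padding, square K x K filters) to make the reshape-and-GEMM idea concrete:

```python
import numpy as np

def im2col(x, K):
    """x: (H, W, C) -> (K*K*C, N) matrix, one column per receptive field."""
    H, W, C = x.shape
    cols = []
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            cols.append(x[i:i+K, j:j+K, :].reshape(-1))  # K*K*C elements
    return np.stack(cols, axis=1)

def conv_im2col(x, filters):
    """filters: (D, K, K, C). Returns (H-K+1, W-K+1, D)."""
    D, K = filters.shape[0], filters.shape[1]
    H, W, _ = x.shape
    Xcol = im2col(x, K)            # (K^2 C) x N data matrix
    Wrow = filters.reshape(D, -1)  # D x (K^2 C) filter matrix
    out = Wrow @ Xcol              # D x N result
    return out.T.reshape(H - K + 1, W - K + 1, D)

y = conv_im2col(np.random.randn(32, 32, 3), np.random.randn(6, 5, 5, 3))
assert y.shape == (28, 28, 6)
```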
Case study: CONV forward in the Caffe library. The forward pass is exactly im2col followed by a matrix multiply (with a bias offset added).
FFT-based convolution
By the convolution theorem, convolution in the spatial domain is elementwise multiplication in the frequency domain: F(f * g) = F(f) . F(g) (figure from Wikipedia). So: compute FFTs of the activations and the filters, multiply them elementwise, and take the inverse FFT. In practice this gives big speedups for large filters, but little for small 3x3 filters.
Vasilache et al., "Fast Convolutional Nets With fbfft: A GPU Performance Evaluation"
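A small NumPy demonstration of the theorem (a sketch of the idea, not how fbfft is implemented): a padded FFT multiply matches direct 2D convolution.

```python
import numpy as np
from scipy.signal import convolve2d

x = np.random.randn(32, 32)
f = np.random.randn(5, 5)

# Direct "full" convolution: output size 36x36.
direct = convolve2d(x, f, mode="full")

# FFT route: zero-pad both to the full output size, multiply spectra, invert.
shape = (x.shape[0] + f.shape[0] - 1, x.shape[1] + f.shape[1] - 1)
via_fft = np.real(np.fft.ifft2(np.fft.fft2(x, shape) * np.fft.fft2(f, shape)))

print(np.allclose(direct, via_fft))  # True (up to floating point error)
```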
Fast algorithms: Lavin and Gray (2015) work out Winograd-style special cases for 3x3 convolutions, giving large speedups in practice.
Lavin and Gray, "Fast Algorithms for Convolutional Neural Networks", 2015
Use of GPUs
CEO of NVIDIA: Jen-Hsun Huang (Stanford EE Masters, 1992).
GTC 2015: introduced the new Titan X GPU by bragging about AlexNet benchmarks.
CPU vs. GPU:
- CPU: few, fast cores (1-16); good at sequential processing.
- GPU: many slower cores (thousands); originally for graphics; good at parallel computation.
[Benchmarks: all comparisons are against a 12-core Intel E5-2679v2 CPU @ 2.4GHz running Caffe with Intel MKL 11.1.3.]
Multi-GPU training: Alex Krizhevsky, "One weird trick for parallelizing convolutional neural networks":
- Data parallelism: split each minibatch across GPUs (works well for conv layers).
- Model parallelism: split the model's parameters across GPUs (used for the fully-connected layers).
A data-parallel sketch follows.
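A toy NumPy sketch of data parallelism (simulated "GPUs" running sequentially; real systems distribute this over devices and all-reduce the gradients). The grad_fn(w, x, y) gradient helper is an assumption:

```python
import numpy as np

def sgd_step_data_parallel(w, batch_x, batch_y, grad_fn, num_gpus=4, lr=0.01):
    """Shard the minibatch across workers, average their gradients, update once."""
    xs = np.array_split(batch_x, num_gpus)  # each "GPU" gets a slice of the batch
    ys = np.array_split(batch_y, num_gpus)
    grads = [grad_fn(w, x, y) for x, y in zip(xs, ys)]  # per-worker gradients
    g = np.mean(grads, axis=0)              # all-reduce: average the gradients
    return w - lr * g                       # single synchronized update
```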
Bottlenecks to be aware of:
- CPU-GPU communication: copying data to the GPU is slow; prefetch the next batch in a background thread while the GPU works on the current one.
- CPU-disk: reading from a spinning hard disk is slow; pre-process data into contiguous files or use an SSD.
- GPU memory is limited; e.g. AlexNet needs ~3GB with batch size 256.
Floating point precision:
- 32-bit single precision is the de facto standard for training; 16-bit half precision is increasingly supported and halves memory and bandwidth.
- Gupta et al., "Deep Learning with Limited Numerical Precision", ICML 2015: trained CNNs on MNIST with 16-bit fixed point, using stochastic rounding to make training work.
- Courbariaux et al., "Training Deep Neural Networks with Low Precision Multiplications", ICLR 2015.
- Courbariaux et al., "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", arXiv 2016.
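A tiny NumPy experiment in the flavor of these results (our illustration, not the papers' method): casting weights to float16 halves storage at a small rounding cost.

```python
import numpy as np

w32 = np.random.randn(1000, 1000).astype(np.float32)
w16 = w32.astype(np.float16)  # half precision: 2 bytes per value

print(w32.nbytes // 2**20, "MB ->", w16.nbytes // 2**20, "MB")  # 3 MB -> 1 MB
rel_err = np.abs(w16.astype(np.float32) - w32) / (np.abs(w32) + 1e-8)
print("median relative rounding error:", np.median(rel_err))    # on the order of 1e-4
```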
Conclusions