Tutorial on DNN (6 of 9): Network and Hardware Co-Design

The document discusses various approaches to optimize Deep Neural Network (DNN) models through hardware co-design, focusing on reducing operand size and the number of operations. It highlights techniques such as quantization, network pruning, and the use of compact architectures to enhance efficiency while maintaining accuracy. Additionally, it presents energy costs associated with different operations and the impact of precision on model performance.


DNN Model and Hardware Co-Design
ISCA Tutorial (2017)

Website: http://eyeriss.mit.edu/tutorial.html
Joel Emer, Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang 1
Approaches

• Reduce size of operands for storage/compute


– Floating point → Fixed point
– Bit-width reduction
– Non-linear quantization

• Reduce number of operations for storage/compute


– Exploit Activation Statistics (Compression)
– Network Pruning
– Compact Network Architectures

2
Cost of Operations

Operation | Energy (pJ) | Area (µm²)
8b Add | 0.03 | 36
16b Add | 0.05 | 67
32b Add | 0.1 | 137
16b FP Add | 0.4 | 1360
32b FP Add | 0.9 | 4184
8b Mult | 0.2 | 282
32b Mult | 3.1 | 3495
16b FP Mult | 1.1 | 1640
32b FP Mult | 3.7 | 7700
32b SRAM Read (8KB) | 5 | N/A
32b DRAM Read | 640 | N/A

(The slide also plots relative energy cost, spanning roughly 1–10^4, and relative area cost, spanning roughly 1–10^3, on log scales.)

[Horowitz, “Computing’s Energy Problem (and what we can do about it)”, ISSCC 2014]
3
Number Representation

Format | Bits (sign / exponent / mantissa) | Range | Accuracy
FP32 | 1 / 8 / 23 | 10^-38 – 10^38 | .000006%
FP16 | 1 / 5 / 10 | 6x10^-5 – 6x10^4 | .05%
Int32 | 1 / – / 31 | 0 – 2x10^9 | ½
Int16 | 1 / – / 15 | 0 – 6x10^4 | ½
Int8 | 1 / – / 7 | 0 – 127 | ½

Image Source: B. Dally 4


Floating Point → Fixed Point

Floating Point
32-bit float: sign (1 bit), exponent (8 bits), mantissa (23 bits)
Example: 10100101000000000101000000000100 represents -1.42122425 x 10^-13 (s=1, e=70, m=20482)

Fixed Point
8-bit fixed: sign (1 bit), mantissa (7 bits), split into integer (4 bits) and fractional (3 bits)
Example: 01100110 represents 12.75 (s=0, m=102)

5
N-bit Precision

Datapath: an N-bit weight and an N-bit activation go through an NxN multiply (2N-bit product), the products are accumulated in a 2N+M-bit register, and the result is quantized back to N bits for the output activation.

For no loss in precision, M is determined based on the largest filter size (in the range of 10 to 16 bits for popular DNNs).

6
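As a rough illustration of this datapath, here is a minimal numpy sketch (the function names, default bit-widths, and fractional split are illustrative assumptions, not taken from the slide): N-bit operands are multiplied into 2N-bit products, accumulated in a wider register, and the partial sum is re-quantized to N bits at the output.

```python
import numpy as np

def quantize_to_nbits(x, n_bits, frac_bits):
    """Round to an n_bits signed fixed-point value with frac_bits fractional bits."""
    scale = 1 << frac_bits
    q = np.round(np.asarray(x, dtype=np.float64) * scale).astype(np.int64)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return np.clip(q, lo, hi)

def fixed_point_dot(weights, activations, n_bits=8, frac_bits=4):
    """N-bit weight x N-bit activation -> 2N-bit products, accumulated in a wide
    (2N+M-bit) register, then re-quantized back to N bits for the output."""
    w_q = quantize_to_nbits(weights, n_bits, frac_bits)      # N-bit operands
    a_q = quantize_to_nbits(activations, n_bits, frac_bits)
    acc = 0                                                   # plays the role of the 2N+M-bit accumulator
    for w, a in zip(w_q, a_q):
        acc += int(w) * int(a)                                # each product fits in 2N bits
    out = acc >> frac_bits                                    # drop the extra fractional bits
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return int(np.clip(out, lo, hi))                          # N-bit output activation
```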
Dynamic Fixed Point

Floating Point
32-bit float: sign (1 bit), exponent (8 bits), mantissa (23 bits)
Example: 10100101000000000101000000000100 represents -1.42122425 x 10^-13 (s=1, e=70, m=20482)

Fixed Point
8-bit fixed: sign (1 bit), integer ([7-f] bits), fractional (f bits)
Example: 01100110 with f=3 represents 12.75 (s=0, m=102)

8-bit dynamic fixed: sign (1 bit), fractional (f bits), where f is allowed to change
Example: 01100110 with f=9 represents 0.19921875 (s=0, m=102)

Allow f to vary based on data type and layer 7
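As an illustration of letting f vary, the following numpy sketch (a heuristic rule for picking f, not the exact procedure used by Ristretto or in the example above) chooses a per-tensor fractional length so that the largest magnitude still fits in the remaining integer bits:

```python
import numpy as np

def dynamic_fixed_point(x, n_bits=8):
    """Quantize x to n_bits with a fractional length f derived from the data;
    in a DNN, f would be chosen separately per layer and per data type."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = np.abs(x).max()
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)  # sign + integer part
    f = n_bits - int_bits                                          # remaining bits go to the fraction
    scale = 2.0 ** f
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    q = np.clip(np.round(x * scale), lo, hi)
    return q / scale, f   # dequantized values and the chosen fractional length
```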


Impact on Accuracy

(Plot: Top-1 accuracy of CaffeNet on ImageNet, w/o fine tuning)

[Gysel et al., Ristretto, ICLR 2016] 8


Avoiding Dynamic Fixed Point

Batch normalization ‘centers’ dynamic range

(Figure: AlexNet Layer 6. Image Source: Moons et al., WACV 2016)

‘Centered’ dynamic ranges might reduce the need for dynamic fixed point
9
Nvidia PASCAL

“New half-precision, 16-bit floating point instructions deliver over 21 TeraFLOPS for unprecedented training performance. With 47 TOPS (tera-operations per second) of performance, new 8-bit integer instructions in Pascal allow AI algorithms to deliver real-time responsiveness for deep learning inference.”

– Nvidia.com (April 2016)


10
Google’s Tensor Processing Unit (TPU)

“With its TPU Google has seemingly focused on delivering the data really quickly by cutting down on precision. Specifically, it doesn’t rely on floating point precision like a GPU…. Instead the chip uses integer math…TPU used 8-bit integer.”

– Next Platform (May 19, 2016)

[Jouppi et al., ISCA 2017] 11


Precision Varies from Layer to Layer

[Judd et al., ArXiv 2016] [Moons et al., WACV 2016] 12


Bitwidth Scaling (Speed)
Bit-Serial Processing: Reduce Bit-width → Skip Cycles
Speed up of 2.24x vs. 16-bit fixed

[Judd et al., Stripes, CAL 2016] 13


Bitwidth Scaling (Power)

Reduce Bit-width → Shorter Critical Path → Reduce Voltage

Power reduction of
2.56x vs. 16-bit fixed
On AlexNet Layer 2

[Moons et al., VLSI 2016] 14


Binary Nets
Binary Filters
• Binary Connect (BC)
– Weights {-1,1}, Activations 32-bit float
– MAC → addition/subtraction
– Accuracy loss: 19% on AlexNet
[Courbariaux, NIPS 2015]

• Binarized Neural Networks (BNN)


– Weights {-1,1}, Activations {-1,1}
– MAC → XNOR
– Accuracy loss: 29.8% on AlexNet
[Courbariaux, arXiv 2016]
15
Scale the Weights and Activations
• Binary Weight Nets (BWN)
– Weights {-α, α} → except first and last layers are 32-bit float
– Activations: 32-bit float
– α determined by the l1-norm of all weights in a layer
– Accuracy loss: 0.8% on AlexNet

• XNOR-Net
– Weights {-α, α}
– Activations {-βi, βi} → except first and last layers are 32-bit float
– βi determined by the l1-norm of all activations across channels for given position i of the input feature map
– Accuracy loss: 11% on AlexNet

(Hardware needs to support both activation precisions.)

Scale factors (α, βi) can change per layer or position in filter

[Rastegari et al., BWN & XNOR-Net, ECCV 2016] 16
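A small numpy sketch of the BWN-style weight scaling (the helper name is made up, and using the mean absolute value as α follows the paper's formulation rather than anything shown on the slide):

```python
import numpy as np

def binarize_weights_bwn(w):
    """Approximate full-precision weights w by {-alpha, +alpha}, where alpha is the
    layer's l1-norm of weights divided by their count (i.e., the mean |w|)."""
    alpha = np.mean(np.abs(w))            # per-layer scale factor
    w_bin = np.where(w >= 0, 1.0, -1.0)   # binary weights in {-1, +1}
    return alpha, w_bin                    # effective weights are alpha * w_bin

# With binary weights, the MAC degenerates to additions/subtractions of activations,
# followed by a single multiply by alpha per output.
```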


XNOR-Net

[Rastegari et al., BWN & XNOR-Net, ECCV 2016] 17


Ternary Nets

• Allow for weights to be zero


– Increase sparsity, but also increase number of bits (2-bits)

• Ternary Weight Nets (TWN) [Li et al., arXiv 2016]


– Weights {-w, 0, w} → except first and last layers are 32-bit float
– Activations: 32-bit float
– Accuracy loss: 3.7% on AlexNet
• Trained Ternary Quantization (TTQ) [Zhu et al., ICLR 2017]
– Weights {-w1, 0, w2} → except first and last layers are 32-bit float
– Activations: 32-bit float
– Accuracy loss: 0.6% on AlexNet

18
Non-Linear Quantization
• Precision refers to the number of levels
– Number of bits = log2 (number of levels)

• Quantization: mapping data to a smaller set of levels


– Linear, e.g., fixed-point
– Non-linear
• Computed
• Table lookup

Objective: Reduce size to improve speed and/or reduce energy while preserving accuracy

19
Computed Non-linear Quantization

Log Domain Quantization

Linear quantization: Product = X * W.  Log domain quantization: Product = X << W (the multiply becomes a shift).

[Lee et al., LogNet, ICASSP 2017] 20
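A toy sketch (numpy; base-2 log quantization of the weights only, with made-up helper names) of why the multiply turns into a shift:

```python
import numpy as np

def quantize_weight_log2(w, n_bits=4):
    """Store a scalar weight as (sign, integer exponent) so that w ≈ sign * 2**exp."""
    sign = 1 if w >= 0 else -1
    exp = int(np.clip(np.round(np.log2(abs(w) + 1e-12)),
                      -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1))
    return sign, exp

def log_domain_multiply(x_int, sign, exp):
    """Multiply an integer activation by a log-quantized weight using only a shift."""
    shifted = x_int << exp if exp >= 0 else x_int >> (-exp)   # X << W replaces X * W
    return sign * shifted

# Example: a weight near 0.25 quantizes to exp = -2, so x * 0.25 becomes x >> 2.
sign, exp = quantize_weight_log2(0.27)
print(sign, exp, log_domain_multiply(64, sign, exp))   # 1 -2 16
```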


Log Domain Computation

Two variants: (1) only the activations are in the log domain; (2) both the weights and the activations are in the log domain, so computation reduces to max, bitshifts, and adds/subs.

[Miyashita et al., arXiv 2016]


21
Log Domain Quantization
• Weights: 5-bits for CONV, 4-bit for FC; Activations: 4-bits
• Accuracy loss: 3.2% on AlexNet

(Figure: shift-and-add (WS) implementation)

[Miyashita et al., arXiv 2016],


[Lee et al., LogNet, ICASSP 2017]

22
Reduce Precision Overview

• Learned mapping of data to quantization levels (e.g., k-means)

Implement with look-up table

[Han et al., ICLR 2016]

• Additional Properties
– Fixed or Variable (across data types, layers, channels, etc.)
23
Non-Linear Quantization Table Lookup
Trained Quantization: Find K weights via K-means clustering
to reduce number of unique weights per layer (weight sharing)
Example: AlexNet (no accuracy loss)
256 unique weights for CONV layer
16 unique weights for FC layer

Weight memory stores a log2(U)-bit index for each of the CRSM weights; a small weight decoder/dequantizer table (U x 16b) expands each index to a 16-bit weight, which then feeds a standard 16-bit MAC together with the 16-bit input activation, producing a 16-bit output activation.

Benefit: smaller weight memory. Consequences: narrow weight memory plus a second access from the (small) table; does not reduce the precision of the MAC.

24
[Han et al., Deep Compression, ICLR 2016]
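A toy sketch of the trained-quantization idea (using scikit-learn's KMeans purely for illustration; the function names and sizes are assumptions, and the real method also fine-tunes the shared weights):

```python
import numpy as np
from sklearn.cluster import KMeans

def share_weights(w, num_unique=16):
    """Cluster a layer's weights into num_unique values (weight sharing).
    Returns per-weight indices (log2(num_unique) bits each) and the small codebook."""
    km = KMeans(n_clusters=num_unique, n_init=10).fit(w.reshape(-1, 1))
    indices = km.labels_.astype(np.uint8)      # what the narrow weight memory stores
    codebook = km.cluster_centers_.ravel()     # the U x 16b decoder/dequant table
    return indices, codebook

def dequantize(indices, codebook, shape):
    """Table lookup that expands indices back to full weights before the MAC."""
    return codebook[indices].reshape(shape)

# 4-bit indices (U=16) for an FC layer, 8-bit indices (U=256) for a CONV layer,
# mirroring the AlexNet example above.
w = np.random.randn(512, 256).astype(np.float32)
idx, book = share_weights(w, num_unique=16)
w_hat = dequantize(idx, book, w.shape)
```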
Summary of Reduce Precision

Category | Method | Weights (# of bits) | Activations (# of bits) | Accuracy Loss vs. 32-bit float (%)
Dynamic Fixed Point | w/o fine-tuning | 8 | 10 | 0.4
Dynamic Fixed Point | w/ fine-tuning | 8 | 8 | 0.6
Reduce weight | Ternary Weight Networks (TWN) | 2* | 32 | 3.7
Reduce weight | Trained Ternary Quantization (TTQ) | 2* | 32 | 0.6
Reduce weight | Binary Connect (BC) | 1 | 32 | 19.2
Reduce weight | Binary Weight Net (BWN) | 1* | 32 | 0.8
Reduce weight and activation | Binarized Neural Net (BNN) | 1 | 1 | 29.8
Reduce weight and activation | XNOR-Net | 1* | 1 | 11
Non-Linear | LogNet | 5 (conv), 4 (fc) | 4 | 3.2
Non-Linear | Weight Sharing | 8 (conv), 4 (fc) | 16 | 0

* first and last layers are 32-bit float

Full list @ [Sze et al., arXiv, 2017] 25


Reduce Number of Ops and Weights

• Exploit Activation Statistics


• Network Pruning
• Compact Network Architectures
• Knowledge Distillation

26
Sparsity in Fmaps
Many zeros in output fmaps after ReLU
ReLU example:
  9 -1 -3        9 0 0
  1 -5  5   →    1 0 5
 -2  6 -1        0 6 0

(Bar chart: normalized # of activations vs. # of non-zero activations for CONV layers 1–5)
27
I/O Compression in Eyeriss

(Block diagram: DCNN accelerator with a 14x12 PE array and a 108KB SRAM; filter, input image, and psum data move between the SRAM and the 64-bit off-chip DRAM interface, with RLC decompression on the way in and compression after ReLU on the way out.)

Run-Length Compression (RLC) example:
Input: 0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22, …
Output (one 64b word): Run=2, Level=12 | Run=4, Level=53 | Run=2, Level=22 | Term=0
Field widths: 5b run, 16b level (three run/level pairs per word), 1b termination flag

[Chen et al., ISSCC 2016] 28
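A plain-Python sketch of this RLC format (the exact bit positions inside the 64-bit word are an assumption for illustration, not the published Eyeriss layout):

```python
def rlc_encode(values, max_run=31):
    """Encode a stream of activations as (run, level) pairs, where run counts the
    zeros (up to 31, i.e. 5 bits) preceding each stored 16-bit level."""
    pairs, run = [], 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1
        else:
            pairs.append((run, v))   # run of zeros followed by a non-zero (or a saturated run)
            run = 0
    return pairs

def pack_word(three_pairs, last=False):
    """Pack up to three (5b run, 16b level) pairs plus a 1b terminate flag into 64 bits."""
    word = 0
    for i, (run, level) in enumerate(three_pairs):
        word |= (run & 0x1F) << (i * 21)           # 5-bit run
        word |= (level & 0xFFFF) << (i * 21 + 5)   # 16-bit level
    return word | (int(last) << 63)                # 1-bit termination flag

print(rlc_encode([0, 0, 12, 0, 0, 0, 0, 53, 0, 0, 22]))   # [(2, 12), (4, 53), (2, 22)]
```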
Compression Reduces DRAM BW

(Bar chart: DRAM access (MB) for uncompressed vs. RLC-compressed fmaps + weights across AlexNet CONV layers 1–5; compression reduces accesses by 1.2x to 1.9x, with larger savings in the later layers.)

Simple RLC within 5% - 10% of theoretical entropy limit

[Chen et al., ISSCC 2016] 29


Data Gating / Zero Skipping in Eyeriss

Skip the MAC and memory reads when the image data is zero: a zero buffer flags zero values in the image scratch pad (12x16b REG), and the resulting enable signal gates the 2-stage pipelined multiplier as well as the reads of the filter scratch pad (225x16b SRAM) and the partial sum scratch pad (24x16b REG). Reduces PE power by 45%.

[Chen et al., ISSCC 2016] 30


Cnvlutin
• Process Convolution Layers
• Built on top of DaDianNao (4.49% area overhead)
• Speed up of 1.37x (1.52x with activation pruning)

[Albericio et al., ISCA 2016] 31


Pruning Activations
Remove small activation values
Cnvlutin [Albericio et al., ISCA 2016]: speed up of 11% (ImageNet). Minerva [Reagen et al., ISCA 2016]: power reduction of 2x (MNIST).

32


Pruning – Make Weights Sparse

• Optimal Brain Damage


1. Choose a reasonable network
architecture
2. Train network until reasonable
solution obtained
3. Compute the second derivative for each weight
4. Compute saliencies (i.e., impact on training error) for each weight
5. Sort weights by saliency and delete low-saliency weights
6. Iterate to step 2 (retraining)

[Lecun et al., NIPS 1989] 33


Pruning – Make Weights Sparse
Prune based on magnitude of weights

Example: AlexNet
Weight Reduction: CONV layers 2.7x, FC layers 9.9x
(Most reduction on fully connected layers)
Overall: 9x weight reduction, 3x MAC reduction

[Han et al., NIPS 2015] 34
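A minimal numpy sketch of magnitude-based pruning (the threshold rule and sparsity target are illustrative; in the actual method the pruned network is then retrained with the removed weights held at zero):

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.
    Returns the pruned weights and the binary mask used during fine-tuning."""
    threshold = np.quantile(np.abs(w), sparsity)    # magnitude cutoff
    mask = (np.abs(w) > threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(4096, 4096).astype(np.float32)
w_pruned, mask = prune_by_magnitude(w, sparsity=0.9)   # ~10x fewer non-zero weights
```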


Speed up of Weight Pruning on CPU/GPU
On Fully Connected Layers Only
Average Speed up of 3.2x on GPU, 3x on CPU, 5x on mGPU

Intel Core i7 5930K: MKL CBLAS GEMV, MKL SPBLAS CSRMV


NVIDIA GeForce GTX Titan X: cuBLAS GEMV, cuSPARSE CSRMV
NVIDIA Tegra K1: cuBLAS GEMV, cuSPARSE CSRMV

Batch size = 1

[Han et al., NIPS 2015] 35


Key Metrics for Embedded DNN

• Accuracy → Measured on Dataset
• Speed → Number of MACs
• Storage Footprint → Number of Weights
• Energy → ?

36
Energy-Aware Pruning

• # of Weights alone is not a good metric for energy
– Example (AlexNet):
• # of Weights (FC Layer) > # of Weights (CONV layer)
• Energy (FC Layer) < Energy (CONV layer)

• Use energy evaluation method to estimate DNN energy
– Account for data movement

[Yang et al., CVPR 2017] 37


Energy-Evaluation Methodology

Inputs: the CNN shape configuration (# of channels, # of filters, etc.), the CNN weights and input data (e.g., [0.3, 0, -0.4, 0.7, 0, 0, 0.1, …]), and the hardware energy costs of each MAC and memory access.

Memory access optimization estimates the # of accesses at each memory level (level 1 … level n), which gives the data energy Edata; the # of MACs gives the computation energy Ecomp. Summing these per layer (L1, L2, L3, …) yields the CNN energy consumption.

Evaluation tool available at http://eyeriss.mit.edu/energy.html 38
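The estimate itself boils down to a weighted sum. A sketch of the calculation (the per-access cost numbers below are placeholders normalized to one MAC, not the published Eyeriss values):

```python
def estimate_layer_energy(num_macs, accesses_per_level, cost_per_level, mac_cost=1.0):
    """Energy = Ecomp + Edata: Ecomp scales with the # of MACs, and Edata sums the
    # of accesses at each memory level weighted by that level's energy per access."""
    e_comp = num_macs * mac_cost
    e_data = sum(n * c for n, c in zip(accesses_per_level, cost_per_level))
    return e_comp + e_data

# Example with made-up relative costs (register file : buffer : DRAM ~ 1 : 6 : 200):
energy = estimate_layer_energy(
    num_macs=1.1e8,                         # MACs in one CONV layer
    accesses_per_level=[3e8, 4e7, 5e6],     # level 1 (RF), level 2 (buffer), level n (DRAM)
    cost_per_level=[1.0, 6.0, 200.0],
)
print(f"normalized layer energy: {energy:.3e}")
```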
Key Observations

• Number of weights alone is not a good metric for energy
• All data types should be considered

Energy consumption breakdown of GoogLeNet: Computation 10%, Input Feature Map 25%, Weights 22%, Output Feature Map 43%

[Yang et al., CVPR 2017] 39


Energy Consumption of Existing DNNs
(Scatter plot: Top-5 accuracy (77%–93%) vs. normalized energy consumption (5E+08 – 5E+10, log scale) for the original DNNs: AlexNet, SqueezeNet, GoogLeNet, VGG-16, ResNet-50)

Deeper CNNs with fewer weights do not necessarily consume less energy than shallower CNNs with more weights

[Yang et al., CVPR 2017] 40
Magnitude-based Weight Pruning
(Same scatter plot, adding AlexNet and SqueezeNet after magnitude-based pruning [Han et al., NIPS 2015])

Reduce number of weights by removing small magnitude weights

41
Energy-Aware Pruning
(Same scatter plot, adding AlexNet, SqueezeNet, and GoogLeNet after energy-aware pruning; a 1.74x gap is marked for AlexNet between magnitude-based and energy-aware pruning)

Remove weights from layers in order of highest to lowest energy
3.7x energy reduction in AlexNet / 1.6x reduction in GoogLeNet

DNN Models available at http://eyeriss.mit.edu/energy.html 42
Energy Estimation Tool
Website: https://energyestimation.mit.edu/
Input DNN Configuration File

Output DNN energy breakdown across layers

[Yang et al., CVPR 2017] 43


Compression of Weights & Activations
• Compress weights and activations between DRAM
and accelerator
• Variable Length / Huffman Coding
Example:
Value: 16’b0 → Compressed Code: {1’b0}
Value: 16’bx → Compressed Code: {1’b1, 16’bx}
• Tested on AlexNet → 2× overall BW Reduction

[Moons et al., VLSI 2016; Han et al., ICLR 2016] 44


Sparse Matrix-Vector DSP
• Use CSC rather than CSR for SpMxV
Compressed Sparse Row (CSR) vs. Compressed Sparse Column (CSC) for an M x N matrix

Reduce memory bandwidth (when not M >> N)
For DNN, M = # of filters, N = # of weights per filter
[Dorrance et al., FPGA 2014] 45
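A small scipy-based sketch (illustrative sizes and sparsity) contrasting the two layouts for a sparse matrix-vector product; the CSC layout is convenient when the input vector is also sparse, since each non-zero input touches exactly one column of weights:

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

# M x N weight matrix (M filters, N weights per filter), ~90% zeros
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 4096)) * (rng.random((1024, 4096)) < 0.1)
x = rng.standard_normal(4096)

y_csr = csr_matrix(W) @ x   # row-major traversal of the non-zeros
y_csc = csc_matrix(W) @ x   # column-major: column j is only needed when x[j] != 0

assert np.allclose(y_csr, y_csc)
```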
EIE: A Sparse Linear Algebra Engine
• Process Fully Connected Layers (after Deep Compression)
• Store weights column-wise in Run Length format
• Read relative column when input is non-zero

Supports Fully Connected Layers Only

Example: a sparse input vector a = (0, a1, 0, a3) is multiplied by a sparse weight matrix W and passed through ReLU to produce the output vector b. The rows of W are interleaved across PE0–PE3, each PE dequantizes its stored weights, and the engine keeps track of the location of the non-zero entries.
Output Stationary Dataflow

[Han et al., ISCA 2016] 46


Sparse CNN (SCNN)
Supports Convolutional Layers

Three pieces: (1) densely packed storage of weights and activations; (2) all-to-all multiplication of the non-zero weights and activations in the PE frontend (e.g., weights a, b against activations x, y, z produce a*x, a*y, a*z, b*x, b*y, b*z); (3) a scatter network that routes the products to accumulators in the PE backend, which add the scattered partial sums.
Input Stationary Dataflow
[Parashar et al., ISCA 2017] 47
Structured/Coarse-Grained Pruning
• Scalpel
– Prune to match the underlying data-parallel hardware
organization for speed up

Example: 2-way SIMD

Dense weights Sparse weights

[Yu et al., ISCA 2017] 48


Compact Network Architectures

• Break large convolutional layers into a series of smaller convolutional layers
– Fewer weights, but same effective receptive field

• Before Training: Network Architecture Design

• After Training: Decompose Trained Filters

49
Network Architecture Design
Build Network with series of Small Filters

GoogleNet/Inception v3: decompose a 5x5 filter into a 5x1 filter followed by a 1x5 filter (separable filters), applied sequentially.
VGG-16: decompose a 5x5 filter into two 3x3 filters, applied sequentially.

50
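A quick sanity check (pure Python, single input/output channel for simplicity) of the weight savings these decompositions give for the same 5x5 effective receptive field:

```python
# Weights per output position for one input/output channel (illustrative):
w_5x5 = 5 * 5                   # 25 weights: direct 5x5 filter
w_two_3x3 = 2 * (3 * 3)         # 18 weights: two stacked 3x3 filters (VGG-style)
w_5x1_plus_1x5 = 5 + 5          # 10 weights: separable 5x1 then 1x5 (Inception v3-style)

print(w_5x5, w_two_3x3, w_5x1_plus_1x5)   # 25 18 10
```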
Network Architecture Design
Reduce size and computation with 1x1 Filter (bottleneck)

Figure Source:
Stanford cs231n

Used in Network In Network(NiN) and GoogLeNet


[Lin et al., ArXiV 2013 / ICLR 2014] [Szegedy et al., ArXiV 2014 / CVPR 2015]
51
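A back-of-envelope sketch (channel counts chosen to match a ResNet-style block; purely illustrative) of why the 1x1 bottleneck cuts both weights and MACs, as used in the modules on the next slide:

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution from c_in to c_out channels (no bias)."""
    return c_in * c_out * k * k

# Direct 3x3 conv on 256 channels vs. 1x1 compress -> 3x3 -> 1x1 expand
direct = conv_weights(256, 256, 3)
bottleneck = (conv_weights(256, 64, 1)      # 1x1 "compress" to 64 channels
              + conv_weights(64, 64, 3)     # cheaper 3x3 on the reduced channels
              + conv_weights(64, 256, 1))   # 1x1 "expand" back to 256 channels
print(direct, bottleneck)                   # 589824 vs. 69632, roughly 8.5x fewer weights
```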
Bottleneck in Popular DNN models

(Diagrams: the ResNet bottleneck block uses a 1x1 compress before and a 1x1 expand after its 3x3 convolution; the GoogleNet Inception module uses 1x1 compress layers before its larger filters.)

54
SqueezeNet
Reduce weights by reducing number of input
channels by “squeezing” with 1x1
50x fewer weights than AlexNet
(no accuracy loss)
Fire Module

[F.N. Iandola et al., ArXiv 2016] 55


Energy Consumption of Existing DNNs
(Scatter plot repeated from earlier: Top-5 accuracy vs. normalized energy consumption for AlexNet, SqueezeNet, GoogLeNet, VGG-16, ResNet-50)

Deeper CNNs with fewer weights do not necessarily consume less energy than shallower CNNs with more weights

[Yang et al., CVPR 2017] 56
Decompose Trained Filters
After training, perform low-rank approximation by applying tensor
decomposition to weight kernel; then fine-tune weights for accuracy

R = canonical rank [Lebedev et al., ICLR 2015] 57


Decompose Trained Filters
(Figure: visualization of original vs. approximated filters)

• Speed up by 1.6 – 2.7x on CPU/GPU for CONV1, CONV2 layers
• Reduce size by 5 – 13x for FC layer
• < 1% drop in accuracy
[Denton et al., NIPS 2014] 58
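A minimal numpy sketch of the low-rank idea (here a plain SVD on a flattened FC weight matrix, rather than the CP/tensor decomposition applied to conv kernels in the papers; rank and sizes are illustrative, and the factors would then be fine-tuned):

```python
import numpy as np

def low_rank_approx(w, rank):
    """Replace an M x N weight matrix by two factors of rank R,
    cutting storage and MACs from M*N to R*(M+N) when R is small."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]    # M x R factor (singular values folded in)
    b = vt[:rank, :]              # R x N factor
    return a, b

w = np.random.randn(4096, 4096).astype(np.float32)
a, b = low_rank_approx(w, rank=256)
print(w.size, a.size + b.size)    # ~16.8M vs. ~2.1M parameters (~8x smaller)
rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
```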
Decompose Trained Filters on Phone
Tucker Decomposition

59
[Kim et al., ICLR 2016]
Knowledge Distillation

One or more teacher DNNs (DNN A, DNN B) each produce class probabilities through a softmax; a smaller student DNN is trained so that its softmax output tries to match the teachers' class probabilities.

[Bucilu et al., KDD 2006], [Hinton et al., arXiv 2015] 60
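A short numpy sketch of the "try to match" objective (the temperature value and the cross-entropy form follow the distillation literature in general, not anything specified on this slide; in practice it is combined with the usual hard-label loss):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the teacher's softened class probabilities and the
    student's, encouraging the student's softmax output to match the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean()
```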
