Deep Nets
Mike Mozer
Department of Computer Science and
Institute of Cognitive Science
University of Colorado at Boulder
Why Stop At One Hidden Layer?
E.g., vision hierarchy for recognizing handprinted text
Credit assignment problem
How is a neuron in layer 2 supposed to know
what it should output until all the neurons
above it do something sensible?
How is a neuron in layer 4 supposed to know
what it should output until all the neurons
below it do something sensible?
Deeper Vs. Shallower Nets
Deeper net can represent any mapping that shallower net can
Use identity mappings for the additional layers
Deeper net in principle is more likely to overfit
But in practice it often underfits on the training set
Degradation due to harder credit-assignment problem
Deeper isn't always better!
[Figures: training/validation error curves on CIFAR-10 and ImageNet; thin lines = training error, thick lines = validation error]
Vanishing gradient problem
With logistic or tanh units
$y_j = \dfrac{1}{1+\exp(-z_j)} \quad\Rightarrow\quad \dfrac{\partial y_j}{\partial z_j} = y_j\,(1-y_j)$

$y_j = \tanh(z_j) \quad\Rightarrow\quad \dfrac{\partial y_j}{\partial z_j} = (1+y_j)\,(1-y_j)$
Error gradients get squashed as they are passed back through a deep network
[Figure: gradient magnitude vs. layer, from layer n back to layer 1]
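To make the squashing concrete, here is a minimal numpy sketch (not from the slides; the depth of 20 layers and the random pre-activations are arbitrary assumptions, and weight factors are ignored) that multiplies the per-layer activation derivative across layers, contrasting logistic units with ReLUs:

```python
# Minimal sketch: product of per-layer activation derivatives across a deep stack.
import numpy as np

rng = np.random.default_rng(0)
n_layers = 20
z = rng.normal(size=n_layers)           # one hypothetical pre-activation per layer

y = 1.0 / (1.0 + np.exp(-z))            # logistic activations
logistic_grads = y * (1.0 - y)          # dy/dz for logistic units, at most 0.25
relu_grads = (z > 0).astype(float)      # dy/dz for ReLU units: exactly 0 or 1

print("logistic: product of layer gradients =", np.prod(logistic_grads))
print("ReLU:     product of layer gradients =", np.prod(relu_grads))
```

Because the logistic derivative never exceeds 0.25, the product decays geometrically with depth; the ReLU derivative is exactly 1 wherever the unit is active (though it is 0 for inactive units, as discussed below).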
Why Deeply Layered Networks Fail
[Figure: gradient magnitude vs. layer]
For a rectified-linear (ReLU) unit $y$ receiving input $x$ through weight $w_{yx}$, the gradient is

$\dfrac{\partial y}{\partial x} = \begin{cases} 0 & \text{if } y = 0 \\ w_{yx} & \text{otherwise} \end{cases}$

which can be a problem when $y = 0$: the unit passes no gradient back at all.
Hack Solutions
Using ReLUs can avoid squashing of the gradient
[Figure: gradient magnitude vs. layer, from layer n back to layer 1, with ReLU units]
Gradient clipping, for exploding gradients:

$\Delta w_{xy} \propto \max\!\left(-\Delta_0,\ \min\!\left(\Delta_0,\ \dfrac{\partial E}{\partial w_{xy}}\right)\right)$

Sign-of-gradient updates, for exploding & vanishing gradients:

$\Delta w_{xy} \propto \operatorname{sign}\!\left(\dfrac{\partial E}{\partial w_{xy}}\right)$
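A hedged sketch of the two update rules above in numpy; the names delta_0 and lr, and the inclusion of an explicit learning-rate step, are my own choices rather than anything from the slides:

```python
# Sketch of clipped and sign-only weight updates (delta_0 and lr are assumed names).
import numpy as np

def clipped_update(grad, lr=0.1, delta_0=1.0):
    """Clip each gradient component to [-delta_0, delta_0] before taking a step
    (guards against exploding gradients)."""
    return -lr * np.maximum(-delta_0, np.minimum(delta_0, grad))

def sign_update(grad, lr=0.01):
    """Step using only the sign of each gradient component
    (guards against both exploding and vanishing gradients)."""
    return -lr * np.sign(grad)

grad = np.array([1e-6, 3.0, -250.0])    # toy gradient with tiny and huge components
print(clipped_update(grad))             # huge components are capped at +/- delta_0
print(sign_update(grad))                # every component moves by the same magnitude
```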
Hack Solutions
Hard weight constraints
$w \leftarrow w \cdot \dfrac{\min(\lVert w \rVert_2,\ l)}{\lVert w \rVert_2}$
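A minimal sketch of the hard weight constraint above: after each update, a unit's incoming weight vector is rescaled so its L2 norm never exceeds the limit l (the value l = 3 below is an arbitrary assumption):

```python
# Max-norm (hard) weight constraint sketch; l = 3.0 is an arbitrary limit.
import numpy as np

def apply_max_norm(w, l=3.0):
    """Rescale w so that ||w||_2 <= l, leaving its direction unchanged
    (equivalent to w <- w * min(||w||_2, l) / ||w||_2)."""
    norm = np.linalg.norm(w)
    return w * (l / norm) if norm > l else w

w = np.array([2.0, 4.0, 4.0])           # norm 6, violates the constraint for l = 3
print(apply_max_norm(w))                # rescaled to norm 3
```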
Batch normalization
As the weights 𝑾𝟏 are learned, the distribution of activations in hidden layer 𝒉𝟏 changes
affects the ideal values of the next layer's weights 𝑾𝟐
affects the appropriate learning rate for 𝑾𝟐
Solution
ensure that the distribution of activations for each unit doesn't change over the course of learning
Hack Solutions
Batch normalization [LINK]
Normalize the activation of each unit, in any layer, across the examples of a minibatch
[Diagram: hidden layer 𝒉𝟏 between weight matrices 𝑾𝟏 and 𝑾𝟐]
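A minimal numpy sketch of the batch-norm computation for a single unit, assuming the usual learned scale and shift parameters (the names gamma, beta, and eps follow common convention, not the slides):

```python
# Batch normalization forward pass for one unit, sketched in numpy.
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: activations of one unit, one entry per minibatch example."""
    mu = x.mean()                          # minibatch mean for this unit
    var = x.var()                          # minibatch variance for this unit
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance across the batch
    return gamma * x_hat + beta            # learned scale and shift restore flexibility

x = np.array([2.0, 4.0, 6.0, 8.0])         # toy activations for a minibatch of 4
print(batch_norm_forward(x))
```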
Unsupervised layer-by-layer pretraining
Goal is to start weights off in a sensible configuration instead of using random
initial weights
Methods of pretraining
autoencoders
restricted Boltzmann machines (Hinton’s group)
The dominant paradigm from ~ 2000-2010
Still useful today if
not much labeled data
lots of unlabeled data
Autoencoders
Self-supervised training procedure
Given a set of input vectors (no target outputs)
Map input back to itself via a hidden layer bottleneck
How to achieve bottleneck?
Fewer neurons
Sparsity constraint
Information transmission constraint (e.g., add noise to unit, or shut off randomly,
a.k.a. dropout)
Autoencoder Combines
An Encoder And A Decoder
[Diagram: encoder maps the input to the bottleneck code; decoder maps the code back to the reconstructed input]
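A minimal numpy sketch of a bottleneck autoencoder; the layer sizes, tanh encoder, and tied decoder weights are illustrative assumptions, not prescriptions from the slides:

```python
# Bottleneck autoencoder sketch: encode, decode, and measure reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 20, 5                   # hidden bottleneck narrower than the input

W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_dec = W_enc.T                          # tied weights: decoder mirrors the encoder

def encode(x):
    return np.tanh(x @ W_enc)            # map input to the bottleneck code

def decode(h):
    return h @ W_dec                     # map the code back to input space

x = rng.normal(size=(8, n_in))           # a small batch of inputs (no targets needed)
loss = np.mean((decode(encode(x)) - x) ** 2)   # reconstruction error to minimize
print(loss)
```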
Stacked Autoencoders
[Diagram: a stack of autoencoders trained one layer at a time; the learned encoder weights are copied into the corresponding layers of the deep network]
Note that decoders can be stacked to produce a generative domain model
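A sketch of the greedy layer-by-layer pretraining procedure, using simple linear autoencoders with tied weights so the whole thing stays short; the layer sizes, learning rate, and epoch count are arbitrary assumptions:

```python
# Greedy layer-wise pretraining sketch with linear, tied-weight autoencoders.
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(data, n_hidden, lr=0.01, epochs=200):
    """Fit a linear autoencoder with tied weights; return the encoder matrix W."""
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        err = data @ W @ W.T - data                    # reconstruction error
        grad = data.T @ err @ W + err.T @ data @ W     # gradient of squared error wrt W
        W -= lr * grad / len(data)
    return W

X = rng.normal(size=(100, 30))           # unlabeled data
weights, layer_input = [], X
for n_hidden in [20, 10, 5]:             # train one autoencoder per layer
    W = train_autoencoder(layer_input, n_hidden)
    weights.append(W)                    # copied in to initialize the deep network
    layer_input = layer_input @ W        # next autoencoder sees this layer's codes
```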
Restricted Boltzmann Machines (RBMs)
For today, an RBM is like an autoencoder with the output layer folded back onto the input, but with probabilistic neurons.
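A minimal sketch of one RBM reconstruction pass in the spirit of the analogy above: the same weights map visible units to hidden units and back, but the units are sampled stochastically (biases are omitted and the sizes are arbitrary assumptions):

```python
# RBM-as-folded-autoencoder sketch: one visible -> hidden -> visible pass.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 12, 6
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)   # stochastic binary units

v = sample(np.full(n_visible, 0.5))      # a random binary visible vector
h = sample(sigmoid(v @ W))               # "encode": sample hidden given visible
v_recon = sample(sigmoid(h @ W.T))       # "decode": sample visible given hidden
print(v, v_recon)                        # training pushes v_recon toward v
```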
Depth of ImageNet Challenge Winner
source: http://chatbotslife.com
Trick From Long Ago To Avoid Local Optima
Add direct connections from input to output layer
Easy bits of the mapping are learned by the direct connections
Easy bit = linear or saturating-linear functions of the input
Often captures 80-90% of the variance in the outputs
Hidden units are reserved for learning the hard bits of the mapping
They don’t need to learn to copy activations forward for the linear portion of the
mapping
Problem
Adds a lot of free parameters
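A minimal sketch of the direct-connection architecture described above: the output is a linear function of the input plus a contribution from a small hidden layer (all layer sizes are arbitrary assumptions):

```python
# Network with direct input->output connections alongside a hidden layer.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 4, 3

W_direct = rng.normal(scale=0.1, size=(n_in, n_out))   # direct input -> output weights
W_ih = rng.normal(scale=0.1, size=(n_in, n_hidden))
W_ho = rng.normal(scale=0.1, size=(n_hidden, n_out))

def forward(x):
    linear_part = x @ W_direct           # easy (linear) bit of the mapping
    hidden = np.tanh(x @ W_ih)           # hidden units handle the hard bits
    return linear_part + hidden @ W_ho

print(forward(rng.normal(size=(2, n_in))).shape)
```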
Latest Tricks
Novel architectures that
skip layers
have linear connectivity between layers
Advantage over direct-connection architectures
no/few additional free parameters
Deep Residual Networks (ResNet)
Add linear short-cut connections to architecture
Basic building block: activation flows along both an identity short cut and a transform path, $y = x + F(x)$ (see the sketch after the variations below)
Variations
allow different input and output dimensionalities by projecting the short cut: $y = F(x) + W_s x$
[Table: top-1 error % for the architecture variations]
Do proper identity mappings
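A minimal numpy sketch of the residual building block, with an optional projection W_s for the case where input and output dimensionalities differ; the two-layer form of F and the sizes are illustrative assumptions:

```python
# Residual block sketch: y = shortcut(x) + F(x).
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, W1, W2, W_s=None):
    """Return shortcut(x) + F(x), where F is a small two-layer ReLU transform."""
    F = np.maximum(0.0, x @ W1) @ W2             # residual (transform) branch
    shortcut = x if W_s is None else x @ W_s     # identity or projection short cut
    return shortcut + F

d = 8
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))
x = rng.normal(size=(4, d))
print(residual_block(x, W1, W2).shape)           # identity short cut: y = x + F(x)
```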
Suppose each layer of the network made a decision whether to
copy the input forward: $y = x$
or perform a nonlinear transform of the input: $y = f(x)$
with a learned weighting coefficient that blends the two options.
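A hedged sketch of the copy-versus-transform idea: a learned weighting coefficient t in (0, 1) blends copying the input with transforming it, in the style of a highway-network gate (the sigmoid gate parameterization and the sizes are my assumptions):

```python
# Gated copy-vs-transform layer sketch.
import numpy as np

rng = np.random.default_rng(0)
d = 8
W_h = rng.normal(scale=0.1, size=(d, d))          # transform weights
W_t = rng.normal(scale=0.1, size=(d, d))          # gate (weighting coefficient) weights

def gated_layer(x):
    h = np.tanh(x @ W_h)                          # nonlinear transform of the input
    t = 1.0 / (1.0 + np.exp(-(x @ W_t)))          # weighting coefficient in (0, 1)
    return t * h + (1.0 - t) * x                  # t near 0 copies, t near 1 transforms

print(gated_layer(rng.normal(size=(4, d))).shape)
```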