Introduction to Deep Learning
TA: Drew Hudson
May 8, 2020
Slides credits: Atharva Parulekar, Jingbo Yang, Drew Hudson, Guanzhi Wang
Overview
● Motivation for deep learning
● Convolutional neural networks
● Recurrent neural networks
● Deep learning tools
But we learned multi-layer perceptron in class?
Expensive to train, and will not generalize well.
Does not exploit the order and local relations in the data!
A single fully connected neuron on a 64x64x3 image already needs 64x64x3 = 12,288 weights.
We also want many layers.
What are areas of deep learning?
● Convolutional NN: images
● Recurrent NN: time series
● Graph NN: networks / relational data
● Deep RL: control systems
What are areas of deep learning?
First up: convolutional neural networks (recurrent NN, deep RL, and graph NN come later).
Let us look at images in detail
Filters
Why not extract features using filters?
Better yet, why not let the data dictate which filters to use?
Learnable filters!
Convolution on multiple channels
Images are generally RGB!
How would a filter work on an image with RGB channels?
The filter should also have 3 channels.
Now the output has a channel for every filter we have used.
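For instance, a minimal PyTorch sketch (assuming `torch` is available; the choice of 8 filters is arbitrary) showing that a filter bank over an RGB image has 3 input channels and one output channel per filter:

```python
import torch
import torch.nn as nn

# A batch of one 64x64 RGB image: (batch, channels, height, width)
x = torch.randn(1, 3, 64, 64)

# 8 learnable 5x5 filters; each filter spans all 3 input channels
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5, padding=2)

out = conv(x)
print(out.shape)  # torch.Size([1, 8, 64, 64]): one output channel per filter
```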
Parameter sharing
The fewer the parameters, the less computationally intensive the training. This is a win-win, since we are reusing the same parameters across the whole image.
Translational invariance
Since we train filters to detect cats and then slide these filters over the data, a differently positioned cat will still be detected by the same set of filters.
Filters? Layers of filters?
Images that maximize filter outputs at certain layers: we observe that the images get more complex as the filters sit deeper in the network.
Deeper layers learn richer embeddings: an eye is made up of multiple curves, and a face is made up of two eyes.
How do we use convolutions?
Let convolutions extract features, and let regular (fully connected) layers make the decisions based on them.
Image credit: LeCun et al. (1998)
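A hedged sketch of this pattern in PyTorch (layer sizes are illustrative, not LeCun et al.'s exact architecture): convolutions and pooling extract features, then fully connected layers decide.

```python
import torch.nn as nn

# Illustrative conv-then-fully-connected network for 32x32 RGB inputs, 10 classes
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                      # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),                      # 16x16 -> 8x8
    nn.Flatten(),                         # 32 * 8 * 8 = 2048 features
    nn.Linear(32 * 8 * 8, 120), nn.ReLU(),
    nn.Linear(120, 10),                   # class logits
)
```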
Convolution really is just a linear operation
In fact, convolution is a giant matrix multiplication.
We can flatten the 2-dimensional image into a vector and expand the convolution operation into a (sparse, highly structured) matrix.
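A small NumPy sketch of the idea (1D for readability; the signal and filter values are made up): sliding a filter over a signal gives the same result as multiplying by a matrix whose rows are shifted copies of the filter.

```python
import numpy as np

signal = np.array([1., 2., 3., 4., 5.])
kernel = np.array([1., 0., -1.])          # a simple edge-detector-like filter

# Direct "valid" convolution (really cross-correlation, as in deep learning)
direct = np.array([signal[i:i + 3] @ kernel for i in range(len(signal) - 2)])

# Same operation as a matrix multiply: each row is a shifted copy of the kernel
C = np.array([
    [1., 0., -1., 0., 0.],
    [0., 1., 0., -1., 0.],
    [0., 0., 1., 0., -1.],
])
as_matmul = C @ signal

print(direct, as_matmul)  # both: [-2. -2. -2.]
```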
Nonlinearities/Activations
For hidden layers, often:
● ReLU: ReLU(x) = max(0, x)
● Hyperbolic tangent: tanh(x)
For output layers, often:
● Linear (identity): f(x) = x
● Sigmoid: sigmoid(x) = 1 / (1 + e^(-x))
● Softmax: softmax(z)_i = e^(z_i) / Σ_j e^(z_j) (normalizes the logits into a discrete probability distribution)
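These can be written in a few lines of NumPy (a sketch; subtracting the max inside softmax is only for numerical stability):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # approx [0.659 0.242 0.099], sums to 1
```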
How do we learn?
Instead of plain gradient descent, w := w - α∇L(w), we use fancier update rules.
They are called "optimizers":
● Momentum: gradient + momentum
● Nesterov: momentum + look-ahead gradient
● Adagrad: normalize by the sum of squared gradients
● RMSprop: normalize by a moving average of the squared gradients
● Adam: RMSprop + momentum
● https://ruder.io/optimizing-gradient-descent/
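As a rough sketch of the update rules (learning rates and decay constants below are common defaults, not prescriptions), here is plain gradient descent next to momentum and a simplified Adam step in NumPy:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v + grad                    # accumulate a velocity
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # momentum-style running average
    s = b2 * s + (1 - b2) * grad ** 2      # RMSprop-style running average of squares
    m_hat = m / (1 - b1 ** t)              # bias correction
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```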
Mini-batch Gradient Descent
It is expensive to compute the gradient over a large dataset:
● Memory size
● Compute time
Mini-batch: compute each update on a small sample of the training data.
How do we sample intelligently?
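A minimal PyTorch sketch of one epoch of mini-batch gradient descent (the data, model, and hyperparameters below are dummies chosen for illustration):

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

# Dummy data standing in for a real dataset: 1,000 examples, 20 features, 2 classes
X, y = torch.randn(1000, 20), torch.randint(0, 2, (1000,))
model = nn.Linear(20, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)  # random mini-batches

for xb, yb in loader:                  # one pass over the loader = one epoch
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)      # loss computed on the mini-batch only
    loss.backward()                    # gradient is an estimate from the sample
    optimizer.step()
```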
Is deeper better?
Deeper networks seem to be
more powerful but harder to train.
● Loss of information during
forward propagation
● Loss of gradient info during
back propagation
There are many ways to “keep
the gradient going”
Solution
Connect the layers with skip connections, creating a gradient highway or information highway.
ResNet (2015)
Image credit: He et al. (2015)
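A hedged sketch of a residual block in PyTorch (simplified relative to He et al. (2015), e.g. no batch norm), where the skip connection is the "gradient highway":

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # skip connection: gradients flow through "+ x" unchanged
```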
Initialization
Can we initialize all neurons to zero? No: if all the weights are the same, we cannot break the symmetry of the network, and all filters will end up learning the same thing.
Large initial values might knock ReLU units out: once a ReLU unit's output is stuck at zero, its gradient flow is also zero.
We need small random numbers at initialization: mean 0, standard deviation on the order of 1/sqrt(n).
Popular initialization setups
Xavier or Kaiming, each in uniform or normal variants.
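In PyTorch these initializers are available directly; a sketch (the layer sizes here are arbitrary):

```python
import torch.nn as nn
import torch.nn.init as init

layer = nn.Linear(256, 128)

init.xavier_uniform_(layer.weight)    # Xavier/Glorot, uniform variant
# or: init.kaiming_normal_(layer.weight, nonlinearity='relu')  # Kaiming/He, normal variant
init.zeros_(layer.bias)               # biases can safely start at zero
```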
Dropout
What does cutting off some network connections do?
It trains multiple smaller networks in an ensemble.
You can drop entire layers too!
Acts as a really good regularizer.
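A minimal sketch of how dropout is usually placed between layers (the drop probability of 0.5 is a common default, not a rule):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Dropout(p=0.5),            # randomly zero half the activations during training
    nn.Linear(256, 10),
)

model.train()   # dropout active: each forward pass uses a different "thinned" sub-network
model.eval()    # dropout disabled at evaluation time; activations are scaled appropriately
```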
Tricks for training
Data augmentation if your dataset is small: it helps the network generalize better.
Early stopping: stop when the validation loss stops improving, even though the training loss keeps decreasing.
Random hyperparameter search or grid
search?
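A sketch of early stopping with patience; the training and validation functions below are dummy placeholders standing in for a real training loop:

```python
import random

def train_one_epoch(model):          # stand-in for a real training pass
    pass

def validate(model):                  # stand-in: returns a (noisy) validation loss
    return random.random()

model = object()                      # placeholder model
best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(100):
    train_one_epoch(model)
    val_loss = validate(model)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0      # validation still improving: keep going
    else:
        bad_epochs += 1
        if bad_epochs >= patience:              # stalled for `patience` epochs: stop early
            break
```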
CNN sounds like fun!
What are some other areas of deep learning?
Next up: recurrent NN, for time series (alongside convolutional NN, deep RL, and graph NN).
We can also have 1D architectures (remember this)
CNNs work on any data where there is a local pattern.
We use 1D convolutions on DNA sequences, text sequences, and music notes.
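For example, a 1D convolution over a one-hot-encoded sequence might look like this sketch (the 4 input channels stand in for the DNA bases A/C/G/T; all sizes are arbitrary):

```python
import torch
import torch.nn as nn

# One sequence of length 100 with 4 channels (one-hot A/C/G/T): (batch, channels, length)
x = torch.randn(1, 4, 100)

# 16 filters, each looking at a local window of 7 positions
conv1d = nn.Conv1d(in_channels=4, out_channels=16, kernel_size=7, padding=3)

print(conv1d(x).shape)  # torch.Size([1, 16, 100])
```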
But what if the time series has a causal dependency, or some other kind of sequential dependency?
To address sequential dependency?
Use a recurrent neural network (RNN), which carries a latent state forward and produces an output at each step.
Unrolling an RNN
At each time step, the previous output (latent state) is fed back into the RNN cell together with the new input.
They are really the same cell, NOT many different cells like the kernels of a CNN.
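A bare-bones sketch of unrolling a vanilla RNN cell with plain PyTorch tensors (sizes are arbitrary), emphasizing that the same weight matrices are reused at every time step:

```python
import torch

input_size, hidden_size, seq_len = 8, 16, 5
W_xh = torch.randn(input_size, hidden_size) * 0.1   # one shared set of weights...
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b = torch.zeros(hidden_size)

xs = torch.randn(seq_len, input_size)               # the input sequence
h = torch.zeros(hidden_size)                        # initial latent state

for x_t in xs:                                      # ...applied at every time step
    h = torch.tanh(x_t @ W_xh + h @ W_hh + b)

print(h.shape)  # final latent state after reading the whole sequence
```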
How does RNN produce result?
An evolving "embedding" is updated as each token is read.
The result after reading the full sentence, e.g. "I love CS !", is used for the prediction.
There are 2 types of RNN cells
● Long Short-Term Memory (LSTM): gates decide what to store in "long-term memory" versus the response to the current input.
● Gated Recurrent Unit (GRU): a reset gate and an update gate play a similar role.
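Both cell types are available off the shelf in PyTorch; a sketch (sizes are arbitrary, batch-first layout assumed):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 20, 32)     # (batch, sequence length, input features)

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

out, (h_n, c_n) = lstm(x)      # LSTM keeps both a hidden state and a cell ("long-term") state
out, h_n = gru(x)              # GRU keeps a single hidden state, managed by reset/update gates
print(out.shape)               # torch.Size([1, 20, 64])
```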
Recurrent AND deep?
● Stacking: layers of RNNs, each taking the last value forward.
● Attention model: pay "attention" to everything, not just the last value.
“Recurrent” AND convolutional?
Temporal convolutional network
Temporal dependency is achieved through "one-sided" (causal) convolution.
More efficient, because deep learning packages are optimized for matrix multiplication, and convolution is a matrix multiplication.
No hard sequential dependency between time steps, so computation can be parallelized.
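A sketch of the "one-sided" idea: pad only on the left so each output position sees the present and the past, never the future (a simplified causal convolution, not a full TCN):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

kernel_size = 3
x = torch.randn(1, 8, 50)                        # (batch, channels, time)

conv = nn.Conv1d(8, 8, kernel_size)              # no built-in padding

x_padded = F.pad(x, (kernel_size - 1, 0))        # pad on the left only ("one-sided")
out = conv(x_padded)                             # out[..., t] depends only on inputs <= t
print(out.shape)                                 # torch.Size([1, 8, 50])
```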
More? Take CS230, CS236, CS231N, CS224N
Not today, but take CS234 (deep RL / control) and CS224W (graph NN / networks).
Tools for deep learning
Popular general-purpose tools, plus more specialized groups of packages.
$50 not enough! Where can I get free stuff?
Notebook services: Google Colab, Azure Notebook, Kaggle kernels???, Amazon SageMaker?
● Free (limited-ish) GPU access
● Works nicely with TensorFlow
● Links to Google Drive
Free cloud credit:
● Register a new Google Cloud account => instant $300??
● AWS free tier (limited compute)
● Azure education account, $200?
To SAVE money: CLOSE your GPU instance when you are done (GPU time is roughly $1 an hour).
Good luck!
Well, have fun too :D