Introduction to Deep Learning

TA: Drew Hudson


May 8, 2020

Slide credits: Atharva Parulekar, Jingbo Yang, Drew Hudson, Guanzhi Wang
Overview
● Motivation for deep learning
● Convolutional neural networks
● Recurrent neural networks
● Deep learning tools
But we learned the multi-layer perceptron in class?
A fully connected network is expensive to learn and will not generalize well.

It does not exploit the order and local relations in the data!

A 64x64x3 image flattened is 12,288 inputs, so every fully connected neuron needs 12,288 weights.
We also want many layers.
What are areas of deep learning?

● Convolutional NN: images
● Recurrent NN: time series
● Graph NN: networks / relational data
● Deep RL: control systems
What are areas of deep learning?

First up: the Convolutional Neural Network (Recurrent NN, Deep RL, and Graph NN come later).

Let us look at images in detail.
Filters
Why not extract features using filters?

Better yet, why not let the data dictate what filters to use?

Learnable filters!
Convolution on multiple channels
Images are generally RGB!

How would a filter work on an image with RGB channels?

The filter should also have 3 channels.

Now the output has a channel for every filter we have used.
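
A minimal sketch in PyTorch (not from the slides; the layer sizes and names are arbitrary) of a convolutional layer over an RGB image. Each of the 16 learnable filters spans all 3 input channels, and the output has one channel per filter:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
    x = torch.randn(1, 3, 64, 64)   # a batch with one 64x64 RGB image
    y = conv(x)
    print(y.shape)                  # torch.Size([1, 16, 64, 64])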
Parameter sharing

The fewer the parameters, the less computationally intensive the training. This is a win-win, as we reuse the same filter weights at every position.
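
For a rough sense of the savings, here is a back-of-the-envelope comparison (illustrative numbers, not from the slides):

    # A fully connected layer from a flattened 64x64x3 image to 256 units,
    # versus a conv layer with 16 filters of size 3x3x3 shared across positions.
    fc_params = 64 * 64 * 3 * 256 + 256   # weights + biases = 3,145,984
    conv_params = 16 * (3 * 3 * 3) + 16   # weights + biases = 448
    print(fc_params, conv_params)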
Translational invariance
Since we train filters to detect cats and then move these filters over the data, a differently positioned cat will also be detected by the same set of filters.
Filters? Layers of filters?

(Figure, left: images that maximize filter outputs at certain layers; the images get more complex as the filters sit deeper in the network. Right: deeper layers learn higher-level embeddings, e.g. an eye is made up of multiple curves, and a face is made up of two eyes.)
How do we use convolutions?

Let convolutions extract features and let regular fully connected layers decide on them.
Image credit: LeCun et al. (1998)
Convolution really is just a linear operation
In fact, convolution is a giant matrix multiplication.

We can expand the 2-dimensional image into a vector and the conv operation into a matrix.
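
A small sketch in NumPy (not from the slides) of this idea in 1D: the sliding filter becomes a matrix whose rows are shifted copies of the kernel, and the convolution becomes a matrix-vector product.

    import numpy as np

    x = np.array([1., 2., 3., 4., 5.])   # input signal
    k = np.array([1., 0., -1.])          # kernel

    # Each row of C is the kernel shifted one position to the right.
    n_out = len(x) - len(k) + 1
    C = np.zeros((n_out, len(x)))
    for i in range(n_out):
        C[i, i:i + len(k)] = k

    print(C @ x)                                  # matrix form: [-2. -2. -2.]
    print(np.convolve(x, k[::-1], mode="valid"))  # sliding-window form, same result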
Nonlinearities/Activations
For hidden layers, often:
● ReLU: f(x) = max(0, x)
● Hyperbolic tangent: f(x) = tanh(x)

For output layers, often:
● Linear (identity): f(x) = x
● Sigmoid: f(x) = 1 / (1 + e^(-x))
● Softmax: f(x)_i = e^(x_i) / Σ_j e^(x_j) (normalizes the logits into a discrete probability distribution)
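
These are standard definitions; a minimal NumPy sketch (not from the slides):

    import numpy as np

    def relu(x):    return np.maximum(0.0, x)
    def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

    def softmax(logits):
        z = np.exp(logits - logits.max())   # subtract the max for numerical stability
        return z / z.sum()

    print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1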
How do we learn?
Instead of plain gradient descent (w ← w − α·∇L), we use "optimizers":

● Momentum: gradient + momentum
● Nesterov: momentum + look-ahead gradient
● Adagrad: normalize by the sum of squared gradients
● RMSprop: normalize by a moving average of squared gradients
● ADAM: RMSprop + momentum
● https://ruder.io/optimizing-gradient-descent/
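
A sketch in NumPy (not from the slides) of two of the update rules above, written out for a single parameter vector; the function names and default hyperparameters are illustrative:

    import numpy as np

    def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
        v = beta * v + grad              # velocity accumulates past gradients
        return w - lr * v, v

    def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * grad     # momentum term (first moment)
        s = b2 * s + (1 - b2) * grad**2  # RMSprop term (second moment)
        m_hat = m / (1 - b1**t)          # bias correction for early steps
        s_hat = s / (1 - b2**t)
        return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s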
Mini-batch Gradient Descent
Computing the gradient over the full dataset is expensive:

● Memory size
● Compute time

Mini-batch: take a sample of the training data at each step.

How do we sample intelligently?
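
A minimal sketch (NumPy, not from the slides) of the usual recipe: shuffle once per epoch, then step through fixed-size chunks. The data here is random and only stands in for a real dataset:

    import numpy as np

    X = np.random.randn(1000, 12288)          # stand-in for 1000 flattened 64x64x3 images
    y = np.random.randint(0, 10, size=1000)
    batch_size = 32

    perm = np.random.permutation(len(X))      # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]
        # ... compute the gradient on (X_batch, y_batch) and take one optimizer step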


Is deeper better?
Deeper networks seem to be more powerful but harder to train:

● Loss of information during forward propagation
● Loss of gradient information during back propagation

There are many ways to "keep the gradient going".
Solution
Connect the layers: create a gradient highway or information highway.

ResNet (2015)
Image credit: He et al. (2015)
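
A sketch in PyTorch (not from the slides) of a residual block in the spirit of ResNet; the real blocks also use batch normalization and other details omitted here. The skip connection adds the input back to the block's output, giving gradients a direct path during backpropagation:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.conv1(x))
            out = self.conv2(out)
            return self.relu(out + x)   # skip connection: the "gradient highway"

    block = ResidualBlock(16)
    print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])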
Initialization
Can we initialize all neurons to zero? If all the weights are the same, we cannot break the symmetry of the network, and all filters will end up learning the same thing.

Large numbers might knock ReLU units out: once a ReLU unit's output is zero, its gradient flow also becomes zero.

We need small random numbers at initialization.
Mean: 0
Variance: 1/n (i.e., standard deviation 1/sqrt(n))

Popular initialization setups:
(Xavier, Kaiming) x (Uniform, Normal)
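
A small sketch (NumPy, not from the slides) following that recipe: mean 0, standard deviation 1/sqrt(fan_in). The exact Xavier and Kaiming formulas differ slightly (Kaiming uses sqrt(2/fan_in) for ReLU layers):

    import numpy as np

    def scaled_normal_init(fan_in, fan_out):
        std = 1.0 / np.sqrt(fan_in)
        return np.random.normal(0.0, std, size=(fan_in, fan_out))

    W = scaled_normal_init(12288, 256)
    print(W.mean(), W.std())   # roughly 0 and 1/sqrt(12288) ≈ 0.009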


Dropout
What does cutting off some network connections do?

It trains multiple smaller networks in an ensemble.

You can drop entire layers too!

It acts as a very effective regularizer.
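
A sketch (NumPy, not from the slides) of "inverted" dropout at training time: each unit is kept with probability p, and surviving activations are scaled by 1/p so the expected activation matches test time, when dropout is switched off:

    import numpy as np

    def dropout(activations, p_keep=0.8, training=True):
        if not training:
            return activations
        mask = (np.random.rand(*activations.shape) < p_keep) / p_keep
        return activations * mask

    h = np.random.randn(4, 8)
    print(dropout(h, p_keep=0.8))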


Tricks for training
Data augmentation if your data set is small; this helps the network generalize better.

Early stopping: stop training once validation loss stops improving, even if training loss keeps decreasing.

Random hyperparameter search or grid search?
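
A minimal sketch (not from the slides) of early stopping with patience; train_one_epoch and validation_loss are stand-ins for a real training loop:

    import random

    def train_one_epoch(): pass                    # stand-in for real training
    def validation_loss(): return random.random()  # stand-in for real validation

    best_val, patience, bad_epochs = float("inf"), 5, 0
    for epoch in range(100):
        train_one_epoch()
        val = validation_loss()
        if val < best_val:
            best_val, bad_epochs = val, 0     # improvement: reset the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                         # validation loss stopped improving
    print("stopped after epoch", epoch)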
CNN sounds like fun!
What are some other areas of deep learning?

Next: the Recurrent NN, for time series (Convolutional NN, Deep RL, and Graph NN are the other areas).

We can also have 1D architectures (remember this).
CNN works on any data where there is a local pattern.

We use 1D convolutions on DNA sequences, text sequences, and music notes.

But what if the time series has a causal dependency, or any kind of sequential dependency?
To address sequential dependency?
Use a recurrent neural network (RNN).
(Figure: an RNN cell maps the input and a latent state to an output.)
Unrolling an RNN
(Figure: the same RNN cell applied at each time step, feeding the previous output/state into the next step.)

They are really the same cell, NOT many different cells like the kernels of a CNN.
How does an RNN produce a result?
The hidden state is an evolving "embedding" of the sequence; the result is read off after the full sentence has been processed, e.g. "I love CS !" one token at a time.
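
A sketch (NumPy, not from the slides) of a vanilla RNN cell unrolled over a short sequence; dimensions and token embeddings are made up. The same weights are reused at every time step, and the final hidden state is the "embedding" of the whole input:

    import numpy as np

    hidden, emb = 8, 4
    Wxh = np.random.randn(emb, hidden) * 0.1
    Whh = np.random.randn(hidden, hidden) * 0.1
    b = np.zeros(hidden)

    def rnn_forward(sequence):
        h = np.zeros(hidden)
        for x in sequence:                        # one step per token embedding
            h = np.tanh(x @ Wxh + h @ Whh + b)    # the same cell at every step
        return h                                  # state after reading the full input

    tokens = [np.random.randn(emb) for _ in range(4)]   # e.g. "I love CS !"
    print(rnn_forward(tokens))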
There are 2 types of RNN cells

● Long Short-Term Memory (LSTM): stores information in a "long-term memory" cell alongside the response to the current input.
● Gated Recurrent Unit (GRU): combines a reset gate and an update gate with the response to the current input.

(Figure: LSTM and GRU cell diagrams.)
Recurrent AND deep?

● Stacking: stack recurrent layers and take the last value.
● Attention model: pay "attention" to everything in the sequence, not just the last value.

(Figure: a stacked RNN vs. an attention model.)
“Recurrent” AND convolutional?

Temporal convolutional network

Temporal dependency achieved through "one-sided" convolution.

More efficient because deep learning packages are optimized for matrix multiplication, and convolution is a matrix multiplication.

No hard sequential dependency during computation (unlike an RNN).
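
A sketch in PyTorch (not from the slides) of the "one-sided" (causal) convolution that a temporal convolutional network builds on; sizes are arbitrary. Padding only on the left means each output position sees only the current and past time steps:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    kernel_size = 3
    conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size)

    x = torch.randn(1, 1, 10)                  # (batch, channels, time)
    x_padded = F.pad(x, (kernel_size - 1, 0))  # pad on the left only
    y = conv(x_padded)
    print(y.shape)                             # torch.Size([1, 1, 10])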
More? Take CS230, CS236, CS231N, CS224N

● Convolutional NN: images
● Recurrent NN: time series
● Graph NN: networks / relational data
● Deep RL: control systems
Not today, but take CS234 and CS224W

● Deep RL: control systems (CS234)
● Graph NN: networks / relational data (CS224W)
Tools for deep learning
(Figure: logos of popular tools and specialized libraries.)
$50 not enough! Where can I get free stuff?
● Google Colab: free (limited-ish) GPU access, works nicely with Tensorflow, links to Google Drive
● Azure Notebook
● Kaggle kernels
● Amazon SageMaker

Register a new Google Cloud account => instant $300??
=> AWS free tier (limited compute)
=> Azure education account, $200?

To SAVE money: CLOSE your GPU instance (a GPU instance costs roughly $1 an hour).


Good luck!
Well, have fun too :D
