Lesson 6: Practical Deep Learning For Coders (V2)

This document covers recurrent neural networks (RNNs) and their ability to process variable-length sequence data with long-term dependencies. It works through examples such as character-level language modelling, where each next character is predicted from the previous characters. The key ideas are the RNN's stateful memory and sequence representation: information about previous inputs and outputs is retained through the hidden state and its hidden-to-hidden connections. Diagrams show how RNNs can be stacked and trained on full sequences using backpropagation through time.


Lesson 6

PRACTICAL DEEP LEARNING FOR CODERS (V2)


Why we need RNNs

“I went to Nepal in 2009”
“In 2009, I went to Nepal”
Variable-length sequences; long-term dependencies

Stateful
Memory
Representation
Basic NN with single hidden layer

Output: batch_size * #classes
    ↑ matrix product; softmax
Hidden: batch_size * #activations
    ↑ matrix product; relu
Input: batch_size * #inputs
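
As a rough sketch (not from the slides), the same stack can be written as a Keras model; tf.keras and the layer sizes below are assumptions.

# Minimal sketch of the single-hidden-layer net above
# (n_inputs / n_activations / n_classes are placeholder sizes).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_inputs, n_activations, n_classes = 784, 256, 10

model = Sequential([
    Dense(n_activations, activation='relu', input_shape=(n_inputs,)),  # matrix product; relu
    Dense(n_classes, activation='softmax'),                            # matrix product; softmax
])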


Image CNN with single dense hidden layer
NB: batch_size dimension and activation function not shown here or in following slides

Output: #classes
    ↑ matrix product
FC1: #activations
    ↑ (flatten); matrix product
Conv1: #filters * (h/2) * (w/2)
    ↑ convolution, stride 2
Input: #channels * h * w

Diagram legend: Input / Hidden / Output
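
A minimal Keras sketch of this architecture (sizes, kernel width, and channels-last layout are assumptions, and activations are added even though the slide omits them):

# Sketch of the CNN above: stride-2 convolution, flatten, dense hidden layer, softmax output.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense

h, w, n_channels = 28, 28, 1                       # placeholder image size
n_filters, n_activations, n_classes = 32, 256, 10

model = Sequential([
    Conv2D(n_filters, 3, strides=2, padding='same', activation='relu',
           input_shape=(h, w, n_channels)),        # Conv1: n_filters * (h/2) * (w/2)
    Flatten(),
    Dense(n_activations, activation='relu'),       # FC1
    Dense(n_classes, activation='softmax'),        # output: n_classes
])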
Predicting char 3 using chars 1 & 2
NB: layer operations not shown; remember that arrows represent layer operations

char 1 input (vocab size) → FC1: #activations
char 2 input (vocab size) + FC1 → FC2: #activations
FC2 → char 3 output: vocab size

Diagram legend: Input / Hidden / Output
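
A sketch of this model with the Keras functional API, using one-hot character inputs as on the slide (layer sizes and the exact way char 2 is merged in are assumptions):

# Sketch: chars 1 and 2 (one-hot, vocab size) predict char 3 via two fully connected layers.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, add

vocab_size, n_activations = 86, 256   # placeholder sizes

c1 = Input(shape=(vocab_size,))       # one-hot char 1
c2 = Input(shape=(vocab_size,))       # one-hot char 2

fc1 = Dense(n_activations, activation='relu')(c1)                # FC1
fc2 = Dense(n_activations, activation='relu')(
        add([fc1, Dense(n_activations)(c2)]))                    # FC2: char 2 merged in
out = Dense(vocab_size, activation='softmax')(fc2)               # char 3 output

model = Model(inputs=[c1, c2], outputs=out)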
Predicting char 4 using chars 1, 2 & 3

Arrow types (weight matrices): input→hidden, hidden→hidden, hidden→output

char 1 input (vocab size) → FC1: #activations
char 2 input + FC1 → FC2: #activations
char 3 input + FC2 → FC3: #activations
FC3 → char 4 output: vocab size
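
The three arrow types correspond to three weight matrices reused at every step. A sketch under that assumption, with shared Keras layers standing in for the shared matrices:

# Sketch: the same input->hidden, hidden->hidden and hidden->output layers
# (i.e. shared weight matrices) are applied at every character position.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Activation, add

vocab_size, n_activations = 86, 256                       # placeholder sizes

in_hidden     = Dense(n_activations)                      # input -> hidden (shared)
hidden_hidden = Dense(n_activations)                      # hidden -> hidden (shared)
hidden_out    = Dense(vocab_size, activation='softmax')   # hidden -> output (shared)

chars = [Input(shape=(vocab_size,)) for _ in range(3)]    # one-hot chars 1..3
hidden = Activation('relu')(in_hidden(chars[0]))          # FC1
for c in chars[1:]:                                       # FC2, FC3
    hidden = Activation('relu')(add([hidden_hidden(hidden), in_hidden(c)]))
out = hidden_out(hidden)                                  # char 4 output

model = Model(inputs=chars, outputs=out)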


Predicting char n using chars 1 to n-1
NB: no hidden/output labels shown

Arrow types (weight matrices): input→hidden, hidden→hidden, hidden→output
Diagram legend: Input / Hidden / Output

char 1 input → Hidden
Repeat for chars 2 to n-1: char input + previous Hidden → Hidden
Hidden → char n output
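
Collapsing the loop into a recurrent layer gives a model that reads chars 1..n-1 and predicts only char n. A Keras sketch (an embedding replaces the one-hot inputs; sizes are assumptions):

# Sketch: a simple RNN reads the sequence of chars 1..n-1 and predicts char n.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, Dense

vocab_size, n_embed, n_activations, seq_len = 86, 42, 256, 8   # placeholder sizes

model = Sequential([
    Input(shape=(seq_len,)),                      # char indices for chars 1..n-1
    Embedding(vocab_size, n_embed),               # input -> embedding
    SimpleRNN(n_activations, activation='relu'),  # hidden state carried step to step
    Dense(vocab_size, activation='softmax'),      # single char n prediction
])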
Predicting chars 2 to n using chars 1 to n-1

Arrow types (weight matrices): input→hidden, hidden→hidden, hidden→output
Diagram legend: Input / Hidden / Output

Hidden state: initialized to zeros
Repeat for chars 1 to n-1: char input + previous Hidden → Hidden → char output
Each step's output predicts the next character, giving chars 2 to n.
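
The same model in Keras form: return_sequences=True gives a prediction at every step, and the hidden state starts at zeros by default (sizes are assumptions):

# Sketch: outputs at every step, so chars 1..n-1 predict chars 2..n.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, TimeDistributed, Dense

vocab_size, n_embed, n_activations, seq_len = 86, 42, 256, 8   # placeholder sizes

model = Sequential([
    Input(shape=(seq_len,)),                                   # chars 1..n-1
    Embedding(vocab_size, n_embed),
    SimpleRNN(n_activations, activation='relu',
              return_sequences=True),                          # hidden state init to zeros
    TimeDistributed(Dense(vocab_size, activation='softmax')),  # one prediction per step
])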
Predicting chars 2 to n using chars 1 to n-1 using stacked RNNs

Two recurrent layers are stacked; each layer's hidden state is initialized to zeros.
Repeat for chars 1 to n-1: char input → Hidden 1 → Hidden 2 → char output
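
A Keras sketch of the stacked version; the second SimpleRNN consumes the full sequence of hidden states from the first (sizes are assumptions):

# Sketch: two stacked recurrent layers, each starting from a zero hidden state.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, TimeDistributed, Dense

vocab_size, n_embed, n_activations, seq_len = 86, 42, 256, 8   # placeholder sizes

model = Sequential([
    Input(shape=(seq_len,)),
    Embedding(vocab_size, n_embed),
    SimpleRNN(n_activations, return_sequences=True),   # RNN layer 1
    SimpleRNN(n_activations, return_sequences=True),   # RNN layer 2, stacked on layer 1
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])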
Unrolled stacked RNNs for sequences

char 1 input, char 2 input, char 3 input each feed one column of the unrolled, stacked network.

Backprop

Arrow types (weight matrices): input→hidden, hidden→hidden, hidden→output

A loss is computed on each output of the unrolled network (chars 2, 3, ... shown as inputs), and gradients flow back through the unrolled connections: backpropagation through time, which updates the shared input→hidden, hidden→hidden and hidden→output weights.
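
As a sketch of what training looks like in Keras terms: compiling with a per-step loss and calling fit() runs backpropagation through time over the unrolled sequence. Model, data shapes and hyperparameters below are assumptions, with random data as a stand-in.

# Sketch: training the sequence model; fit() unrolls the RNN over seq_len steps
# and backpropagates the summed per-step losses.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, TimeDistributed, Dense

vocab_size, n_embed, n_activations, seq_len, n_samples = 86, 42, 256, 8, 1000

model = Sequential([
    Input(shape=(seq_len,)),
    Embedding(vocab_size, n_embed),
    SimpleRNN(n_activations, return_sequences=True),
    SimpleRNN(n_activations, return_sequences=True),
    TimeDistributed(Dense(vocab_size, activation='softmax')),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

X = np.random.randint(0, vocab_size, (n_samples, seq_len))     # chars 1..n-1
y = np.random.randint(0, vocab_size, (n_samples, seq_len, 1))  # chars 2..n, shifted by one
model.fit(X, y, batch_size=64, epochs=1)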
