ECSE484 Intro v2
Deep Learning
Prof. Wyatt Newman, ECSE Dept.
Case Western Reserve University
Nvidia DIGITS
Neuroscience background: neuron cells
Aima UCB
Spiking (typical):
Cat is paralyzed
Skull opened; electrodes on cells in visual cortex
Electrode signals are recorded in response to visual stimuli
Reveals repeated pattern of interconnections
Templated pattern is repeated at higher levels of abstraction
Center-surround neuronal connectivity architecture in visual cortex: simple cells
Complex cells in visual cortex: connectivity patterns templated for specific responses. Convolutional.
Modelling neurons: the McCulloch-Pitts Model
Aima UCB
History: the Perceptron
• Single-layer network
• Weights on inputs
• Weighted inputs summed
• Summed inputs as argument to “squashing” function
• Output as a discriminator
• Geometric interpretation:
  • Weights are components of the surface normal of a separating plane
  • Bias corresponds to planar offset
Perceptron illustration: two inputs, classification acts as an “AND” operation
Modeling Logic Functions
AND: W0 = -1.5, W1 = 1, W2 = 1 → y
OR:  W0 = -0.5, W1 = 1, W2 = 1 → y
NOT: W0 = 0.5, W1 = -1 → y
(each unit also receives a fixed input of 1, weighted by the bias weight W0)
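With a hard threshold at zero as the “squashing” function, the output is y = step(W0·1 + W1·x1 + W2·x2). A minimal sketch in Python/NumPy (the function name and the threshold-at-zero convention are illustrative assumptions) verifying that these weights realize the gates:

import numpy as np

def perceptron(x, w):
    # w[0] is the bias weight on the fixed input 1; w[1:] weight the inputs
    return 1 if w[0] + np.dot(w[1:], x) > 0 else 0

w_and = [-1.5, 1, 1]
w_or = [-0.5, 1, 1]
w_not = [0.5, -1]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", perceptron([x1, x2], w_and),
              "OR:", perceptron([x1, x2], w_or))

for x1 in (0, 1):
    print(x1, "NOT:", perceptron([x1], w_not))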
Training the Perceptron: adjust weights iteratively until input/output relations are consistent with the training data
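A minimal sketch of the perceptron learning rule in Python/NumPy; the learning rate, epoch count, and the AND training set are illustrative assumptions, not taken from the slides:

import numpy as np

# Training data for AND: inputs with a leading fixed bias input of 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
t = np.array([0, 0, 0, 1])          # target outputs

w = np.zeros(3)                     # weights, including bias weight w[0]
eta = 0.1                           # learning rate (assumed)

for epoch in range(20):
    for x, target in zip(X, t):
        y = 1 if np.dot(w, x) > 0 else 0     # threshold "squashing" function
        w += eta * (target - y) * x          # perceptron update rule
print("learned weights:", w)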
Marvin Minsky and the death of (single-layer) perceptrons:
• Limited to linearly separable problems
• XOR problem is a simple example of (single-layer) perceptron limitations
Multilayer networks:
Hyperbolic tangent
Similar to sigmoidal function
S(t) = 2 / (1 + e^(-2t)) - 1 = tanh(t), where t is the neuron's net (weighted-sum) input
Similar ease of derivative computation to sigmoid
Rectified Linear Unit
f(x) = 0 for x < 0; f(x) = x for x >= 0
cs231n Stanford
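A minimal sketch (Python/NumPy) of these squashing functions and their derivatives, which is what backpropagation needs; the helper names are illustrative:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)              # derivative expressed in terms of the output

def tanh_grad(x):
    return 1 - np.tanh(x) ** 2      # similarly cheap derivative

def relu(x):
    return np.maximum(0, x)         # f(x) = 0 for x < 0, f(x) = x for x >= 0

def relu_grad(x):
    return (x >= 0).astype(float)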
Multi-layer Perceptron, or (single-hidden-layer) Feedforward Network
Wikipedia.org
XOR Function (Multi-layer perceptron)
Aima UCB
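A minimal sketch (Python/NumPy) of a two-input network with two hidden threshold units computing XOR, which a single-layer perceptron cannot; the hand-chosen weights (hidden units acting as OR and AND) are an illustrative assumption:

import numpy as np

def step(z):
    return (z > 0).astype(float)

# Hidden layer: unit 1 computes OR, unit 2 computes AND (weights chosen by hand)
W1 = np.array([[1.0, 1.0],      # weights into hidden unit 1 (OR)
               [1.0, 1.0]])     # weights into hidden unit 2 (AND)
b1 = np.array([-0.5, -1.5])

# Output unit computes "OR and not AND", i.e. XOR
W2 = np.array([1.0, -1.0])
b2 = -0.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h = step(W1 @ np.array(x, dtype=float) + b1)
    y = step(W2 @ h + b2)
    print(x, "XOR ->", int(y))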
Perceptron Learning
Aima UCB
Usage Examples
Pattern classification; function approximation
Aima UCB
The art of neural-network design
A key to successful neural-net training is properly pre-processing the input data:
Scale data to range [-1, 1] or [0, 1], or some sensible range
Crop, rotate, resize, filter images (normalize)
Divide data up into training, testing, and validation sets (see the sketch below)
Create multiple unique, deliberately perturbed copies of the normalized images to provide to the NN for training (data augmentation)
(Optional) Principal Component Analysis or autocorrelation-based feature filters: convert possibly correlated variables into linearly uncorrelated variables, yielding the principal components of the data vector
Be lucky: choose good feature filters, number of neurons, connection topology, …
Hope for good training convergence
Can “luck” be automated??
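A minimal sketch (Python/NumPy) of scaling features to [-1, 1] and splitting into training, validation, and test sets; the toy data, feature count, and 70/15/15 split are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))                  # placeholder data set (assumed)

# Scale each feature to the range [-1, 1]
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_scaled = 2 * (X - x_min) / (x_max - x_min) - 1

# Shuffle, then split 70% / 15% / 15% into train / validation / test
idx = rng.permutation(len(X_scaled))
n_train, n_val = int(0.70 * len(idx)), int(0.15 * len(idx))
train = X_scaled[idx[:n_train]]
val = X_scaled[idx[n_train:n_train + n_val]]
test = X_scaled[idx[n_train + n_val:]]
print(train.shape, val.shape, test.shape)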
Neural Network Architectures
Organized Layer-Wise
Fully-connected layers are typical in most neural nets
In fully-connected layers, each neuron in layer i is connected to each neuron in layer i+1
e.g., networks with 1 and 2 hidden layers:
More hidden layers are harder to train… is it worth it? (cs231n Stanford)
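A minimal sketch (Python/NumPy; the layer sizes and the tanh nonlinearity are illustrative assumptions) of fully-connected layers: every neuron in layer i feeds every neuron in layer i+1, which is just a matrix-vector product per layer:

import numpy as np

rng = np.random.default_rng(0)

def fc_layer(x, W, b):
    # Each row of W holds the input weights of one neuron in the next layer
    return np.tanh(W @ x + b)

x = rng.normal(size=4)                           # 4 input neurons
W1 = rng.normal(size=(8, 4)); b1 = np.zeros(8)   # hidden layer: 8 neurons
W2 = rng.normal(size=(3, 8)); b2 = np.zeros(3)   # output layer: 3 neurons

h = fc_layer(x, W1, b1)
y = fc_layer(h, W2, b2)
print(y)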
Biologically-Inspired Web of Neurons
Why Now?
high-performance computing
GPUs
architecture successes
Some Neural-Net Design Tricks: autocorrelation nets
Creating a successful neural network requires constraints to be met. A symmetrical NN can provide a unique answer to this problem, as shown on the right
NN Constraints:
Generalize the input enough to starve all but the strongest neurons, in order to find what represents all of the data in an efficient manner
Recreate the input to test the neural network's hypothesis
Grow data from the starved input neurons
Generate feedback to the neural network system using backpropagation
Repeat until the error is below a certain threshold or a certain amount of training time has elapsed (see the sketch below)
Auto-encoder
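A minimal sketch (Python/NumPy) of an auto-encoder trained with backpropagation: the input is squeezed through a narrow hidden layer, reconstructed, and the reconstruction error is propagated back to the weights. The layer sizes, learning rate, and toy data are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                  # toy data set (assumed)

n_in, n_hidden = 16, 4                          # bottleneck narrower than the input
W1 = 0.1 * rng.normal(size=(n_hidden, n_in))    # encoder weights
W2 = 0.1 * rng.normal(size=(n_in, n_hidden))    # decoder weights
eta = 0.01                                      # learning rate (assumed)

for epoch in range(100):
    for x in X:
        h = np.tanh(W1 @ x)                     # encode
        x_hat = W2 @ h                          # decode (linear reconstruction)
        err = x_hat - x                         # reconstruction error
        # Backpropagate the reconstruction error to both weight matrices
        dW2 = np.outer(err, h)
        dh = (W2.T @ err) * (1 - h ** 2)        # tanh derivative
        dW1 = np.outer(dh, x)
        W2 -= eta * dW2
        W1 -= eta * dW1

print("final reconstruction MSE:", np.mean((W2 @ np.tanh(W1 @ X.T) - X.T) ** 2))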
Goal: find the derivative of the error with respect to the weight connecting unit i to unit j
Reason: propagate the error gradient back upstream to correct the neural-network weights until the error is driven near zero
Yaldex Game Development
Backpropagation Error Derivation
Derivative of the error with respect to the weight connecting unit i to unit j
The gradient of unit j's output with respect to the net sum, defined by the activation function
cs231n Stanford
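A compact restatement of the chain rule behind this derivation, in standard notation (net_j is unit j's weighted input sum, o_i is unit i's output, and δ_j is the error term at unit j; the symbols are assumed, not taken from the slide):

\[
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial o_j}\,
    \frac{\partial o_j}{\partial \mathrm{net}_j}\,
    \frac{\partial \mathrm{net}_j}{\partial w_{ij}}
  = \delta_j \, o_i ,
\qquad
\mathrm{net}_j = \sum_i w_{ij}\, o_i .
\]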
Convolutional Layer
An image convolution of the previous layer, where the weights specify the convolution filter
Consists of a rectangular grid of neurons
Each neuron takes input from a rectangular section of the previous layer
Weights for that rectangular section are the same for all neurons in the conv. layer (see the sketch below)
(Deep Learning Review, Nature)
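A minimal sketch (Python/NumPy; single channel, no padding, stride 1, all assumed for brevity) of the shared-weight convolution that every neuron in such a layer performs on its own patch of the input:

import numpy as np

def conv2d(image, kernel):
    # Every output neuron applies the SAME kernel to its own patch of the input
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge filter
print(conv2d(image, edge_kernel).shape)          # -> (4, 4)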
Pooling
The purpose of the pooling layer in a ConvNet is to perform a downsampling operation
These layers typically follow the convolutional layers in a ConvNet
Given output spatial dimensions (height and width), the pooling layer downsamples data from a larger spatial extent into a volume of size height x width x depth
Several ways to do this include:
Max-Pooling: take the maximum of the values pooled in each block (see the sketch below)
Linear Pooling: a learned linear combination of the neurons in the block
Overall, the algorithm does the following:
Takes small rectangular blocks from the convolutional layer
Subsamples them to produce a single output from each block
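A minimal sketch (Python/NumPy; 2x2 non-overlapping blocks with stride 2 are an illustrative assumption) of max-pooling:

import numpy as np

def max_pool(x, block=2):
    # Take the maximum within each non-overlapping block-by-block window
    h, w = x.shape[0] // block, x.shape[1] // block
    return x[:h * block, :w * block].reshape(h, block, w, block).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(x))        # -> 2x2 array of block maxima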
Yann LeCun is a major contributor to deep learning. This is his custom NN structure (LeNet) for handwritten character recognition:
The following are filters learned by Krizhevsky et al. Each of the 96 filters is of size 11x11x3, and each one is shared by the 55^2 neurons of one depth slice
Levels of abstraction of features
Dropout
A simple way to prevent NNs from overfitting data
Used in deep NNs as an approximation to Bayesian inference in deep Gaussian processes
The MNIST data set consists of 28x28 pixel handwritten digit images
The task is to classify the images into 10 digit classes.
The following table compares the performance of dropout on MNIST with other techniques such as support vector machines
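A minimal sketch (Python/NumPy) of “inverted” dropout applied to a layer's activations; the keep probability is an illustrative assumption:

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_keep=0.5, training=True):
    if not training:
        return activations              # at test time, use the full network
    # Randomly zero each unit, then rescale so the expected activation is unchanged
    mask = (rng.random(activations.shape) < p_keep).astype(float)
    return activations * mask / p_keep

h = rng.normal(size=8)
print(dropout(h))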
http://nikhilbuduma.com, Dive into Deep RNNs
RNN with LSTM
http://nikhilbuduma.com, Dive into Deep RNNs
Regularization with Dropout
Dropout in Long Short-Term Memory (LSTM)
Do not want to over-fit and memorize exact sequences
Rather, want to generalize the input, using dropout to store the most important elements of the data
visualstudiomagazine.com, NN Train-Validate-Test
Stochastic Gradient Descent (SGD)
Stepping down the slope of the error surface, from a high-error starting point toward the (ideally global) minimum
The minimum is the solution
http://scs.ryerson.ca/~aharley/neural-networks/
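A minimal sketch of stochastic gradient descent in Python/NumPy on a toy least-squares problem; the data, squared-error loss, learning rate, and one-sample updates are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = X @ w_true + noise (assumed for illustration)
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(500, 2))
y = X @ w_true + 0.1 * rng.normal(size=500)

w = np.zeros(2)
eta = 0.05                                     # learning rate (assumed)

for epoch in range(20):
    for i in rng.permutation(len(X)):          # visit samples in random order
        grad = (X[i] @ w - y[i]) * X[i]        # gradient of 0.5*(error)^2 for one sample
        w -= eta * grad                        # step down the slope
print("estimated weights:", w)                 # close to w_true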
Training Visualization: cont’d
http://scs.ryerson.ca/~aharley/neural-networks/
Training Error over Time
Deep NNs take significant time to train, i.e. hundreds to millions of iterations, or until a time cap is met (usually several days)
http://scs.ryerson.ca/~aharley/neural-networks/
Advancing Deep Learning
Mark Zuckerberg claims that the best way to grow deep learning is at the university level, taking into consideration individual and group research projects so as to combine ambitious ideas with the professional, well-funded, mutual goals of both companies and researchers
(Jide-salu.com)
Top Research Contributors in the field include the following:
Geoffrey Hinton, Google, University of Toronto
Known for Boltzmann machines, backpropagation, dropout, deep belief nets, etc.
Yann LeCun, Facebook Research, NYU Center for Data Science, AT&T Research
Known for research with Hinton, drafting an initial backpropagation paper, convolutional networks, graph transformer networks, regularization methods, etc.
Andrew Ng, Baidu, Stanford
Known for co-founding Coursera, furthering deep learning education
Cs.tau.ac.il
Summary
Neural networks:
are modeled from the brain
can be simple or complicated
are limited by computational resources
In the past 20 years, neural networks have come a long way
Increased hardware capacity
Increased machine learning research
Increased interest in data analysis
Deep neural networks created the term “deep learning”
Many companies currently employ deep learning to solve many diverse problems
Deep learning foreshadows the future capacity and applicability of machine learning