
The Neural Basis of

Deep Learning
Prof. Wyatt Newman, ECSE Dept.
Case Western Reserve University

Nvidia DIGITS
Neuroscience background: neuron cells

Creating Mind; John E. Dowling


Example neuron:
Types of neurons:
(many!)
More neural taxonomy…
Synapses:
Cellular communications
via neurotransmitters:

• Action potentials lead to the release of neurotransmitters.
• Neurotransmitters induce a response of ion channels on the post-synaptic membrane.
• Information flow is unidirectional.
• Connectivity and strength of connectivity (density of ion channels) constitutes the brain’s program.
SEM image of a synapse and vesicles:

From Kandel, Schwartz, and Jessell, Principles of Neural Science
Interneurons:
Human Brain
 10^11 neurons, 10^14 synapses, 1ms to 10ms cycle time
 Signals between neurons are noisy electric potential
spikes

Aima UCB
Spiking (typical):

• Increased stimulus of a neuron results in faster voltage spikes.
• Voltage spikes can be transmitted over long distances (like a voltage-to-frequency converter).
• Faster spiking results in more neurotransmitter release (like a frequency-to-voltage converter).
• Dynamics can be abstracted as a nonlinear but monotonic input/output behavior.
Horseshoe Crab Vision:

• Crude ommatidia as light receptors.
• Specific, templated interconnectivity architecture.
• Both excitation and inhibition important.
Templated structure of
competitive neurons:

• Neighboring cells have mutual inhibition (tending to silence spiking from neighbors).
• Results in an equivalent bright/dim edge detector.
• Imposed topology of interconnections is a repeated (templated) pattern.
Hubel and Wiesel:
simple, complex and hypercomplex cells in
cat’s visual cortex (1981 Nobel Prize)
 https://www.youtube.com/watch?v=jw6nBWo21Zk

 Cat is paralyzed
 Skull opened; electrodes on cells in visual cortex
 Electrode signals are recorded in response to visual stimuli
 Reveals repeated pattern of interconnections
 Templated pattern is repeated at higher levels of
abstraction
Center-surround neuronal connectivity architecture in visual cortex: simple cells
Complex cells in visual cortex: connectivity patterns
templated for specific responses. Convolutional.
Modelling neurons: the McCulloch-Pitts Model

 Simplified model of a real neuron
 Produces its output by applying a nonlinear “squashing” function g to the weighted sum of its inputs, in

Aima UCB
History: the Perceptron
• Single-layer network
• Weights on inputs
• Weighted inputs summed
• Summed inputs as argument to “squashing” function
• Output as a discriminator
• Geometric interpretation:
• Weights are components of surface normal of a separating plane
• Bias corresponds to planar offset
Perceptron illustration: two inputs,
classify status as “AND” operation
Modeling Logic Functions

Perceptron weights for basic logic gates (step activation; the bias input is fixed at 1 with weight W0); a code sketch follows below:
 AND: W0 = -1.5, W1 = 1, W2 = 1
 NOT: W0 = 0.5, W1 = -1
 OR: W0 = -0.5, W1 = 1, W2 = 1
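A minimal Python sketch of these three gates using the step-activation weights above; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def perceptron(weights, inputs):
    """Single perceptron with a fixed bias input of 1 and a step activation."""
    total = weights[0] * 1 + np.dot(weights[1:], inputs)  # weights[0] is the bias weight W0
    return 1 if total > 0 else 0

AND_W = [-1.5, 1, 1]   # fires only when both inputs are 1
OR_W  = [-0.5, 1, 1]   # fires when at least one input is 1
NOT_W = [0.5, -1]      # inverts its single input

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", perceptron(AND_W, x), "OR:", perceptron(OR_W, x))
print("NOT 0:", perceptron(NOT_W, (0,)), "NOT 1:", perceptron(NOT_W, (1,)))
```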
Training the Perceptron: adjust weights
iteratively until input/output relations are
consistent with the training data
Marvin Minsky and the death of (single-layer)
perceptrons:
• Limited to linearly separable problems
• XOR problem is a simple example of (single-layer) perceptron limitations
Multilayer networks:

• Overcome the limitations of single-layer networks
• Only helps if activation functions are nonlinear (mathematical proof)
Activation Functions: g(in)
 Step
 Outputs -1 below a threshold, then immediately jumps to 1 above it
 Linear
 g(x) = ax + b
 Steadily increases from negative values to positive values
 Threshold (saturating linear)
 Piecewise-linear function
 Saturates at -1 below the lower threshold
 Linearly increases from -1 to 1 between thresholds
 Saturates at 1 above the upper threshold
 Log-Sigmoid (logistic)
 S(t) = 1 / (1 + e^-bt), where t is the input and b is the slope parameter beta
 Derivative is computationally cheap: S'(t) = b S(t) (1 - S(t))
 S-shaped function curve
Yaldex Game Development
Activation Functions: cont’d

 Hyperbolic tangent
 Similar to the sigmoidal function
 S(t) = [2 / (1 + e^-2t)] - 1, where t is the input
 Similar ease of derivative computation to the sigmoid
 Rectified Linear Unit (ReLU)
 f(x) = 0 for x < 0; f(x) = x for x >= 0
 (A code sketch of these functions follows below)

cs231n Stanford
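A minimal numpy sketch of the activation functions listed above; the function names and default slope parameters are illustrative.

```python
import numpy as np

def step(x):                  # jumps from -1 to 1 at x = 0
    return np.where(x < 0, -1.0, 1.0)

def linear(x, a=1.0, b=0.0):  # g(x) = ax + b
    return a * x + b

def sat_threshold(x):         # piecewise-linear, saturates at -1 and 1
    return np.clip(x, -1.0, 1.0)

def log_sigmoid(x, beta=1.0): # S(x) = 1 / (1 + e^(-beta x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def tanh_act(x):              # 2 / (1 + e^(-2x)) - 1, identical to np.tanh(x)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):                  # 0 for x < 0, x otherwise
    return np.maximum(0.0, x)

x = np.linspace(-3, 3, 7)
for f in (step, linear, sat_threshold, log_sigmoid, tanh_act, relu):
    print(f.__name__, np.round(f(x), 3))
```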
Multi-layer Perceptron, or (single-hidden-layer) Feedforward Network

Wikipedia.org
XOR Function (Multi-layer perceptron)

Beyond Linear Separability:


slideplayer.com
XOR Experiments (Matlab)

 Compares hidden-layer sizes (2 vs. 8 neurons) and output activations (purelin vs. tansig)

EECS 484 PS5, S. Howard
Neural Network with Single Hidden Layer:
A Universal Approximator (1989 proof)

 In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function.
 Neural-net research consequently largely focused on single-hidden-layer networks
Feed-forward Network (FF)
 Single-layer perceptron (SLP)
 Multi-layer perceptron (MLP)
 A feed-forward network has no internal state; it only implements a mathematical function per unit
 The hidden layer’s job is to transform the input into a useful internal representation
 The output layer’s job is to transform the hidden layer’s output into an understandable and useful scale

Aima UCB
Perceptron Learning

 Adjust weights to minimize the error over the training set
 Given input x and true output y, define the squared error for an example
 Optimization can be achieved using gradient descent
 Update weights with the corresponding rule (a reconstruction of these equations follows below)

Aima UCB
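The equations on this slide are images in the source; a reconstruction in AIMA's notation, assuming the slide follows that text, with hypothesis h_w(x) = g(in) and in = sum_j w_j x_j:

```latex
% Squared error for one training example (x, y)
E = \tfrac{1}{2}\,\mathrm{Err}^2 = \tfrac{1}{2}\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr)^2

% Gradient of the error with respect to weight w_j
\frac{\partial E}{\partial w_j} = -\,\mathrm{Err}\cdot g'(in)\cdot x_j

% Gradient-descent weight update with learning rate alpha
w_j \leftarrow w_j + \alpha\,\mathrm{Err}\cdot g'(in)\cdot x_j
```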
Usage Examples
 Pattern classification
 Function approximation
 Clustering of data points

Sigmoid Functions and Their Usage in ANNs, UCF EXCEL
How to find good weights? Backpropagation
• Cast as a parameter-optimization problem
• Computationally convenient, but not biologically plausible
• Can be slow
• Subject to overtraining and local minima traps

 Output layer: similar to the SLP update
 Hidden layer: back-propagate the error from the output layer
 Update rule: update the hidden-layer weights (a worked sketch follows below)

Aima UCB
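A minimal numpy sketch of backpropagation in a single-hidden-layer network, trained here on XOR from the earlier experiments; the hidden-layer size, learning rate, and iteration count are illustrative choices, not the slide's.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)              # input -> hidden (8 units)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)              # hidden -> output
lr = 1.0

for epoch in range(5000):
    # forward pass
    H = sigmoid(X @ W1 + b1)                               # hidden activations
    O = sigmoid(H @ W2 + b2)                               # network output
    # backward pass: output-layer delta, then back-propagate to the hidden layer
    dO = (O - Y) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    # gradient-descent weight updates
    W2 -= lr * H.T @ dO;  b2 -= lr * dO.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)

print(np.round(O, 2))   # should approach [[0], [1], [1], [0]]
```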
The art of neural-network design
 A key to successful neural-net training is properly pre-processing the input data
 Scale data to the range [-1, 1] or [0, 1], or some other sensible range
 Crop, rotate, resize, and filter images (normalize)
 Divide data into training, testing, and validation sets (a sketch of scaling and splitting follows below)
 Create multiple augmented (perturbed) copies of the normalized images to provide to the NN for training
 (Optional) Principal Component Analysis or autocorrelation-based feature filters
 Convert possibly correlated variables into linearly uncorrelated variables, yielding the principal components of the data vector
 Be lucky: choose good feature filters, number of neurons, connection topology, …
 Hope for good training convergence
 Can “luck” be automated??
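A minimal sketch of the scaling and train/validation/test split steps above; the helper names, split ratios, and synthetic data are illustrative.

```python
import numpy as np

def scale_to_unit(x):
    """Rescale each feature to the range [0, 1], per the pre-processing advice above."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.maximum(hi - lo, 1e-12)

def split(x, y, frac_train=0.7, frac_val=0.15, seed=0):
    """Shuffle, then divide into training / validation / test sets (ratios are illustrative)."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_tr = int(frac_train * len(x))
    n_va = int(frac_val * len(x))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])

X = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=(100, 3))
y = np.random.default_rng(2).integers(0, 2, size=100)
train, val, test = split(scale_to_unit(X), y)
print([len(s[0]) for s in (train, val, test)])   # e.g. [70, 15, 15]
```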
Neural Network Architectures
 Organized layer-wise
 Fully-connected layers are typical in most neural nets
 In fully-connected layers, each neuron in layer i is connected to each neuron in layer i+1 (e.g., layers 1 and 2 in the figure)
 Partially-connected layers are possible
 Output layer
 Typically has a linear (identity) activation function, but nothing special
 Used to represent all class scores (classification)
 More hidden layers are harder to train… is it worth it? cs231n Stanford
Biologically-Inspired Web of Neurons

 Neural networks can be constructed with brain-based models
 Many layers with many neurons (and many, many synapses!)
 Imposed topology of connections (not fully connected)
 Specific, templated patterns (as in Hubel and Wiesel)
 Some synapses are innate (pre-encoded; feature detectors?)
What Are Deep Neural Networks?
 Deep Neural Networks (DNN) are the foundation of deep
learning
 Deep learning is equivalent to concatenating deep, multi-
layered neural network structures
 Deep means:
 Many hidden layers
 Many (e.g., millions of) parameters

Google Image, Deep NN


Why Deep Neural Networks?
 Early layers can possibly discover useful feature filters
 Less dependence on luck/art?
 Potentially fewer parameters than a single hidden layer with comparable
performance
 Exploit simplifications (e.g. Hubel/Wiesel; autocorrelation)
 Recent performance breakthroughs:
 ImageNet classification performance
 MNIST character recognition
 Google search engine replacement
 Speech recognition
 Text understanding

Why Now?
 high-performance computing
 GPUs
 architecture successes
Some Neural-Net Design Tricks:
autocorrelation nets
 Creating a successful neural network requires constraints to be met. A symmetric (bottlenecked) NN can provide a unique answer to this problem, as shown on the right 
 NN constraints:
 Generalize the input enough to starve all but the strongest neurons, in order to find an efficient representation of all of the data
 Recreate the input to test the neural-network hypothesis
 Grow the data back out from the starved input neurons
 Generate feedback to the neural-network system using backpropagation
 Repeat until the error is below a certain threshold or a certain amount of training time has elapsed
Auto-encoder

 The auto-encoder tries to learn the function h_W,b(x) ≈ x
 Layer L1 is the input layer
 Data is fed through weighted synapses into activation functions, but it is also starved from 7 neurons down to 4 neurons
 Layer L2 then feeds the data through to layer L3 in an attempt to recreate the input data (a sketch follows below)
 If successful, the synapses from L1 to L2 have discovered a simplification, and the outputs of L2 are simpler, more revealing features extracted from the raw inputs.

Ng, Stanford, Sparse Auto-encoder
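A minimal numpy sketch of the 7-to-4-to-7 auto-encoder described above; the layer sizes follow the slide, while the synthetic data, learning rate, and training loop are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))                           # raw inputs with 7 features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = 0.1 * rng.normal(size=(7, 4)), np.zeros(4)     # encoder: starve 7 -> 4 (L1 -> L2)
W2, b2 = 0.1 * rng.normal(size=(4, 7)), np.zeros(7)     # decoder: recreate 4 -> 7 (L2 -> L3)
lr = 0.05

for epoch in range(2000):
    H = sigmoid(X @ W1 + b1)     # bottleneck code: the "simpler, more revealing features"
    Xhat = H @ W2 + b2           # attempted reconstruction of the input, h_{W,b}(x) ~ x
    err = Xhat - X
    # backpropagate the reconstruction error
    dW2 = H.T @ err / len(X)
    dH = (err @ W2.T) * H * (1 - H)
    dW1 = X.T @ dH / len(X)
    W2 -= lr * dW2;  b2 -= lr * err.mean(axis=0)
    W1 -= lr * dW1;  b1 -= lr * dH.mean(axis=0)

print("reconstruction MSE:", round(float((err ** 2).mean()), 4))
```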


Training Weights: Feedforward with
Backpropagation
 Feedforward net with backpropagation:

 Goal: find the derivative of the error with respect to the weight connecting unit i to unit j
 Reason: propagate the error gradient back upstream to correct the network weights until the error is near zero
Yaldex Game Development
Backpropagation Error Derivation
 Derivative of the error w.r.t. the weight connecting unit i to unit j: the product of three factors (composed by the chain rule below)
 The gradient of unit j's output with respect to its net sum, defined by the activation function
 The derivative of the net sum of unit j w.r.t. the weight, dz_j/dw_ij (equal to the output y_i of the previous unit i)
 The first two terms together represent the error term of unit j

Yaldex Game Development
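The three factors above compose by the chain rule; a reconstruction, assuming unit j has net sum z_j = sum_i w_ij y_i and output y_j = g(z_j):

```latex
% net sum of unit j over its incoming units i
z_j = \sum_i w_{ij}\, y_i

% chain rule: the three factors the slide describes
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\cdot
    \frac{\partial y_j}{\partial z_j}\cdot
    \frac{\partial z_j}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\cdot g'(z_j)\cdot y_i
```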


Convolutional NN (ConvNet, CNN)
 In particular, unlike a typical NN, ConvNet layers have neurons arranged in 3
dimensions: width, height, depth. (Note: depth refers to the third dimension of
an activation volume, not to the depth of a full NN, which refers to the total
number of layers in a network)

 Each layer of a ConvNet transforms one volume of activations to another through a differentiable function. Three main types of layers are used to construct the ConvNet: Convolutional, Pooling, and Fully-Connected layers (as in regular NNs).

cs231n Stanford
Convolutional Layer
 An image convolution of the previous layer,
where the weights specify the convolution
filter
 Consists of rectangular grid of neurons
 Each neuron takes input from rectangular
section of previous layer
 Weights for that rectangular section are the same for all neurons in that conv. layer (see the sketch below)
Deep Learning Review, Nature
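A minimal numpy sketch of one convolutional feature map with shared weights, as described above; the kernel values and image are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D correlation: every output neuron applies the SAME weights (the kernel)
    to its own rectangular patch of the previous layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # crude vertical-edge filter (the shared weights)
print(conv2d(image, edge_kernel).shape)          # (4, 4) feature map
```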
Pooling
 The purpose of the pooling layer in a ConvNet is to perform a downsampling
operation
 These layers typically follow the convolutional layers in a ConvNet
 Given target spatial dimensions (height and width), the pooling layer downsamples data from a larger spatial extent into a space of height x width x depth, leaving the depth unchanged (see the sketch below)
 Several ways to do this include:
 Max-Pooling: Take the maximum of the values pooled per each block
 Linear Pooling: a learned linear combination of the neurons in the block
 Overall, the algorithm does the following:
 Takes small rectangular blocks from the convolutional layer
 Subsamples them to produce a single output from each block

Object detection, deep learning, and R-CNNs, slideplayer.com
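A minimal numpy sketch of max-pooling over non-overlapping blocks, as described above; the block size and activation volume are illustrative.

```python
import numpy as np

def max_pool(feature_map, block=2):
    """Downsample by taking the maximum over each non-overlapping block x block region;
    depth (the last axis) is left unchanged."""
    h, w, d = feature_map.shape
    h, w = h - h % block, w - w % block                  # crop so blocks tile evenly
    x = feature_map[:h, :w, :].reshape(h // block, block, w // block, block, d)
    return x.max(axis=(1, 3))

fmap = np.random.default_rng(0).random((8, 8, 16))       # height x width x depth activations
print(max_pool(fmap).shape)                              # (4, 4, 16)
```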
LeNet

 Yann LeCun is a major contributor to deep learning. This is his custom NN structure for handwritten character recognition:
 LeCun currently works at Facebook AI Research in New York

LeNet-5, LeCun, 1998


Deep Neural Networks

 How do deep NNs map features from images to computable or memorized real-valued numbers for image classification?
 Use descriptors (little blocks of object edges) to search the image for patterns

ECCV, Classical Image Classification Methods


Human-understandable ConvNet
representations

ECCV, Classical Image Classification Methods


Internal Representation

 The following are filters learned by Krizhevsky et al. Each of the 96 filters is of size 11x11x3, and each one is shared by the 55^2 neurons of one depth slice
Levels of abstraction of features
Dropout
 A simple way to prevent NNs from overfitting data
 Dropout in deep NNs can also be interpreted as approximate Bayesian inference in deep Gaussian processes

Dropout: A Simple Way to Prevent Neural Networks from Overfitting;
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Dropout: cont’d

With dropout, the feed-forward operation turns into the following (reconstructed below):

Dropout: A Simple Way to Prevent Neural Networks from Overfitting
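The slide's equations are an image in the source; a reconstruction of the dropout feed-forward operation from the Srivastava et al. paper cited above, for layer l with retention probability p and activation f:

```latex
r^{(l)}_j \sim \mathrm{Bernoulli}(p), \qquad
\tilde{\mathbf{y}}^{(l)} = \mathbf{r}^{(l)} \ast \mathbf{y}^{(l)}

z^{(l+1)}_i = \mathbf{w}^{(l+1)}_i \tilde{\mathbf{y}}^{(l)} + b^{(l+1)}_i, \qquad
y^{(l+1)}_i = f\bigl(z^{(l+1)}_i\bigr)
```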
Dropout Results on MNIST

 The MNIST data set consists of 28x28 pixel handwritten digit images
 The task is to classify the images into 10 digit classes.
 The following table compares the performance of dropout on MNIST with other techniques such as support vector machines

Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Recurrent NNs (RNN)

 In RNNs, connections between units form a directed cycle
 An internal state of the network is created, which allows it to exhibit dynamic temporal behavior. Unlike FF NNs, RNNs can use their internal memory to process arbitrarily long sequences of input data, e.g. with long short-term memory (a common formulation follows below)

Deep Learning Review, Nature
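The internal state described above is commonly written as the following recurrence; this formulation and notation are standard, not taken from the slide:

```latex
% the hidden state carries memory forward; the same weights are reused at every time step
\mathbf{h}_t = g\bigl(W_{xh}\,\mathbf{x}_t + W_{hh}\,\mathbf{h}_{t-1} + \mathbf{b}_h\bigr), \qquad
\mathbf{y}_t = W_{hy}\,\mathbf{h}_t + \mathbf{b}_y
```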
Long-Short-Term Memory (LSTM)

http://nikhilbuduma.com,
Dive into Deep RNNs
RNN with LSTM

http://nikhilbuduma.com,
Dive into Deep RNNs
Regularization with Dropout

 Dropout in Long-Short-Term
Memory (LSTM)
 Do not want to over-fit and
memorize exact sequences
 Rather, want to generalize
input using dropout to store
most important elements of
data

Recurrent NN Regularization, Zaremba et al.


Recurrent Convolutional Networks
 Recently, many successes have come from combining ConvNets with recurrent NNs, especially for object detection and classification
 In [1], the authors use deep recurrent convolutional networks (rather than just deep CNNs) for image recognition and description. They call it the Long-term Recurrent Convolutional Network (LRCN). It can be visualized as follows:
 There may be several hundred of these layers ->

Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Testing and Validation
 Proper Training/Testing/validation must occur
 Over-fitting or under-fitting the data is always a risk
 These problems are addressed by dropout, regularization, PCA, ICA, and clustering layers

visualstudiomagazine.com, NN Train-Validate-Test
Stochastic Gradient Descent (SGD)

Yaldex Game Development


SGD Decisions
 Take a slow, incremental approach or a faster, more sporadic, exponentially decaying approach?
 Increase/decrease the learning rate
 Increase/decrease the step size
 Add momentum (a sketch follows below):

 Potential issues: Speed, predictability, convergence, local minima


 Solution: tune parameters, weights, data representation

Yaldex Game Development
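A minimal sketch of one SGD step with momentum, corresponding to the "add momentum" item above; the update form, learning rate, and momentum coefficient are standard illustrative choices.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD update with momentum:
       v <- momentum * v - lr * grad;  w <- w + v
    (learning rate and momentum values here are illustrative)."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 (w - 3)
w, v = 0.0, 0.0
for step in range(200):
    w, v = sgd_momentum_step(w, np.array(2.0 * (w - 3.0)), v, lr=0.05)
print(round(float(w), 3))   # approaches 3.0
```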


Visualization of Training Results

 Stepping down the slope from near a global maximum toward the global minimum
 The minimum is the solution

Tuning the learning rate in Gradient Descent, datumbox.com
Training Visualization: cont’d

One neuron network with two input/target pairs:

http://scs.ryerson.ca/~aharley/neural-
networks/
Training Visualization: cont’d

One neuron network with multiple input/target pairs:

http://scs.ryerson.ca/~aharley/neural-
networks/
Training Error over Time
 Deep NNs take substantial time to train, i.e. hundreds to millions of iterations, or until a time cap is met (often several days)

http://scs.ryerson.ca/~aharley/neural-
networks/
Advancing Deep Learning

 Mark Zuckerberg claims that the best way to grow deep learning is at the university level, combining ambitious individual and group research projects with the professional, well-funded mutual goals of both companies and researchers
Jide-salu.com
 Top Research Contributors in the field include the following:
 Geoffrey Hinton, Google, University of Toronto
 Known for Boltzmann machines, backpropagation, dropout, deep belief nets, etc.
 Yann LeCun, Facebook Research, NYU Center for Data Science, AT&T Research
 Known for research with Hinton, drafting initial backpropagation paper, convolutional
networks, graph transformer networks, regularization methods, etc.
 Andrew Ng, Baidu, Stanford
 Known for co-founding Coursera, furthering deep learning education

Cs.tau.ac.il
Summary
 Neural networks:
 are modeled from the brain
 can be simple or complicated
 are limited by computational resources
 In the past 20 years, neural networks have come a long way
 Increased hardware capacity
 Increased machine learning research
 Increased interest in data analysis
 Deep neural networks created the term “deep learning”
 Many companies currently employ deep learning to solve many diverse problems
 The future capacity and applicability of machine learning are foreshadowed by deep learning
