Lec1 Introduction
CMP784 DEEP LEARNING
Lecture #01 – Introduction
Erkut Erdem // Hacettepe University // Fall 2021
Welcome to CMP784
• An overview of various
deep architectures and
learning methods
A little about me…
• Koç University-İş Bank Artificial Intelligence Center, Adjunct Faculty, 2020-now
• Hacettepe University, Associate Professor, 2010-now
• Télécom ParisTech, Post-doctoral Researcher, 2009-2010
• UCLA, Visiting Student, Fall 2007
• Virginia Tech, Visiting Research Scholar, Summer 2006
http://web.cs.hacettepe.edu.tr/~erkut | @erkuterdem | [email protected]
Research Interests
Now, what about you?
• Introduce yourselves
- Who are you?
- Who do you work with if you have a thesis supervisor?
- What made you interested in this class?
- What are your expectations?
- What do you know about machine learning and deep learning?
Course Logistics
Course information
Time/Location: 09:00-12:00, Wednesday, Zoom
Instructor: Erkut Erdem
Textbook
• Goodfellow, Bengio, and Courville,
Deep Learning, MIT Press, 2016
(draft available online)
Instruction style
• Students are responsible for studying
and keeping up with the course material
outside of class time.
• Reading particular book chapters,
papers or blogs, or
• Watching some video lectures.
• Supervised Learning
• Nearest Neighbor, Naïve Bayes, Logistic Regression, Support Vector Machines, Kernels,
Neural Networks, Decision Trees
• Ensemble Methods: Bagging, Boosting, Random Forests
• Unsupervised Learning
• Clustering: K-Means, Gaussian mixture models
• Dimensionality reduction: PCA, SVD
Topics Covered in CMP684
• Continuous and discrete system models
• Neuron and its analytic model
• Hopfield Neural Network
• Perceptron Learning Algorithms
• Multilayer Perceptron (MLP)
  - Derivation of the learning algorithm
  - Error backpropagation
  - Memorization and generalization
  - Intervals and normalization
• Radial Basis Function Neural Nets
• Dynamical Neural Nets
• Feedback Nets
• Second Order Training Algorithms
  - Levenberg-Marquardt algorithm
  - Gauss-Newton algorithm
• Stability in Adaptive Systems
• Applications of Neural Nets
Grading
Math Prerequisites Quiz 3%
Practicals 16% (2 practicals x 8% each)
Final Exam 25%
Course Project 32%
Paper Presentations 15%
Weekly Quizzes 9%
Schedule
Week 1 Introduction to Deep Learning
Week 2 Machine Learning Overview
Week 3 Multi-Layer Perceptrons
Week 4 Training Deep Neural Networks
Week 5 Convolutional Neural Networks
Week 6 Understanding and Visualizing CNNs
Week 7 Recurrent Neural Networks
Week 8 Attention and Memory
Schedule
Week 9 Autoencoders and Autoregressive Models
Lecture 1: Introduction to Deep Learning
[Figure 1.2 from Goodfellow et al. (2016): illustration of a deep learning model, mapping a visible layer of input pixels through hidden layers to an output object identity (car, person, animal).]
Lecture 2: Machine Learning Overview
• Machine learning and AI: deep learning is a kind of representation learning, which is in turn a kind of machine learning, which is used for many but not all approaches to AI (Venn diagram, Figure 1.4 of Goodfellow et al., 2016).
• Unsupervised Learning: the goal is to construct a statistical model of the data
  - Clustering (example: the MNIST dataset)
  - Dimensionality reduction
  - Modeling the data density
  - Finding hidden causes (useful explanation)
• Effect of step-size α in gradient descent (see the sketch below):
  - Large α: fast convergence but larger residual error; also possible oscillations
  - Small α: slow convergence but small residual error
• Fitting models to data ("some of the data" vs. "fits to the data")
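The step-size trade-off is easy to see on a toy one-dimensional problem. Below is a minimal sketch (the objective, learning rates, and step counts are made up for illustration, not taken from the slides), minimizing f(w) = w²:

```python
# Minimal gradient descent sketch illustrating the step-size trade-off
# (illustrative only; function and learning rates are made up).
def grad_descent(lr, steps=20, w=5.0):
    # Minimize f(w) = w^2, whose gradient is 2w.
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(grad_descent(lr=0.9))   # large alpha: w flips sign every step (oscillates) while shrinking
print(grad_descent(lr=0.01))  # small alpha: stable but still far from the minimum after 20 steps
```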
Lecture 3: Multi-Layer Perceptrons
http://playground.tensorflow.org
Lecture 4: Training Deep Neural Networks
• Activation Functions: Sigmoid, tanh, ReLU, Leaky ReLU
• Optimizers
Y. LeCun, Y. Bengio, G. Hinton, "Deep Learning", Nature, Vol. 521, 28 May 2015
Lecture 6: Understanding and Visualizing CNNs
[Figure: feature visualizations at Layer 4 and Layer 5.]
M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014
Lecture 7: Recurrent Neural Networks
Transformer Architecture
K. Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
C. Olah and S. Carter, “Attention and Augmented Recurrent Neural Networks”, Distill, 2016
A. Vaswani et al., "Attention is All You Need", NeurIPS 2017
Autoencoders
• Encoder: $h(x) = g(a(x)) = \mathrm{sigm}(b + Wx)$
• Decoder: $\hat{x} = o(\hat{a}(x)) = \mathrm{sigm}(c + W^{*} h(x))$
• Reconstruction loss: $l(f(x)) = \sum_k (\hat{x}_k - x_k)^2$, or for binary units $l(f(x)) = -\sum_k \big( x_k \log \hat{x}_k + (1 - x_k)\log(1 - \hat{x}_k) \big)$

PixelCNN
• Class-conditioned samples generated by PixelCNN; PixelCNN achieves state-of-the-art results. Can we speed up the generation time? Text-to-image synthesis; samples retrieved using 256-bit codes.
S. Reed, A. van den Oord, N. Kalchbrenner, S. Gómez Colmenarejo, Z. Wang, D. Belov, N. de Freitas, "Parallel Multiscale Autoregressive Density Estimation", 2017
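A minimal NumPy sketch of the autoencoder equations above, assuming tied weights (W* = Wᵀ) and made-up sizes, just to show the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

d, k = 8, 3                      # input and hidden sizes (arbitrary)
W = rng.normal(scale=0.1, size=(k, d))
b, c = np.zeros(k), np.zeros(d)
x = rng.integers(0, 2, size=d).astype(float)   # a binary input vector

h = sigm(b + W @ x)              # encoder: h(x) = sigm(b + W x)
x_hat = sigm(c + W.T @ h)        # decoder with tied weights: sigm(c + W^T h(x))

# cross-entropy reconstruction loss for binary units
loss = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
print(loss)
```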
Generative Adversarial Networks
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, "Generative Adversarial Nets", NIPS 2014
A. Radford, L. Metz, S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016
L. Karacan, Z. Akata, A. Erdem, E. Erdem, "Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts", arXiv preprint 2016
A. Brock, J. Donahue, K. Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis", ICLR 2019
Maximum likelihood: $\theta^{*} = \arg\max_{\theta} \, \mathbb{E}_{x \sim p_{\text{data}}} \log p_{\text{model}}(x \mid \theta)$
Change of variables: $y = g(x) \;\Rightarrow\; p_x(x) = p_y(g(x)) \left| \det\!\left( \frac{\partial g(x)}{\partial x} \right) \right|$
Variational bound: $\log p(x) \;\geq\; \log p(x) - D_{\mathrm{KL}}\!\big( q(z) \,\|\, p(z \mid x) \big) = \mathbb{E}_{z \sim q} \log p(x, z) + H(q)$
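For reference, the last equality follows in one line from the definition of the KL divergence (a standard derivation, not from the slides):

```latex
\log p(x) - D_{\mathrm{KL}}\!\left(q(z)\,\|\,p(z\mid x)\right)
  = \mathbb{E}_{z\sim q}\!\left[\log p(x) - \log q(z) + \log p(z\mid x)\right]
  = \mathbb{E}_{z\sim q}\!\left[\log p(x,z)\right] - \mathbb{E}_{z\sim q}\!\left[\log q(z)\right]
  = \mathbb{E}_{z\sim q}\!\left[\log p(x,z)\right] + H(q).
```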
C. Doersch, A. Gupta, A. A. Efros, "Unsupervised Visual Representation Learning by Context Prediction", ICCV 2015.
S. Gidaris, P. Singh, N. Komodakis, "Unsupervised Representation Learning by Predicting Image Rotations", ICLR2018.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, NAACL-HLT 2019.
Schedule
W1 Introduction to Deep Learning
W2 Machine Learning Overview
W3 Multi-Layer Perceptrons (Practical 1 out)
W4 Training Deep Neural Networks (Start of paper presentations)
W5 Convolutional Neural Networks (Practical 1 due, Practical 2 out)
W6 Understanding and Visualizing CNNs (Project proposals due)
W7 Recurrent Neural Networks
W8 Attention and Memory (Practical 2 due)
W9 Autoencoders and Autoregressive Models
W10 Progress Presentations
W11 Generative Adversarial Networks (Project progress reports due)
W12 Variational Autoencoders
W13 Self-supervised Learning
W14 Final Project Presentations
Paper Presentations
• (12 mins) One student will be responsible for providing an overview of the paper.
• (9 mins) One student will present the strengths of the paper.
• (9 mins) One student will discuss the weaknesses of the paper.
• (10 mins) General discussion
• Tentative Dates
  - Practical 1 Out: October 13th, Due: October 27th
  - Practical 2 Out: October 27th, Due: November 17th
Course project
• Students who need GPU resources for the course project are advised to use Google Colab.
• The course project gives students a chance to apply deep architectures discussed in class to a research-oriented project.
• The students can work in pairs.
• The course project may involve
  - Design of a novel approach and its experimental analysis, or
  - An extension to a recent study of non-trivial complexity and its experimental analysis.
• Deliverables
  - Proposals: November 3, 2021
  - Project progress presentations: December 1, 2021
  - Project progress reports: December 8, 2021
  - Final project presentations: December 29, 2021
  - Final reports: January 14, 2022
Lecture Overview
• what is deep learning
• a brief history of deep learning
• compositionality
• end-to-end learning
• distributed representations
Disclaimer: Some of the material and slides for this lecture were borrowed from
—Dhruv Batra’s CS7643 class
—Yann LeCun’s talk titled “Deep Learning and the Future of AI”
What is Deep Learning
Y. LeCun, Y. Bengio, G. Hinton, "Deep Learning", Nature, Vol. 521, 28 May 2015
1943 – 2006: A Prehistory of
Deep Learning
1943: Warren McCulloch and Walter Pitts
• First computational model
• Neurons as logic gates (AND, OR,
NOT)
• A neuron model that sums binary
inputs and outputs a 1 if the sum
exceeds a certain threshold value,
and otherwise outputs a 0
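This threshold unit is only a couple of lines of code. The sketch below (input patterns and threshold values are illustrative, not from the paper) shows how AND- and OR-like behaviour follows from the choice of threshold:

```python
def mcculloch_pitts(inputs, threshold):
    # Sum the binary inputs; fire (output 1) iff the sum reaches the threshold.
    return 1 if sum(inputs) >= threshold else 0

# With two binary inputs, threshold 2 behaves like AND, threshold 1 like OR.
print(mcculloch_pitts([1, 1], threshold=2))  # 1 (AND)
print(mcculloch_pitts([1, 0], threshold=2))  # 0
print(mcculloch_pitts([1, 0], threshold=1))  # 1 (OR)
```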
1958: Frank Rosenblatt’s Perceptron
• A computational model of a single neuron
• Solves a binary classification problem
• Simple training algorithm
• Built using specialized hardware
F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain”, Psych. Review, Vol. 65, 1958
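The perceptron's "simple training algorithm" is just an additive update on misclassified examples. A minimal sketch on a made-up, linearly separable toy set (this is a software illustration, not Rosenblatt's hardware):

```python
import numpy as np

# Toy 2D points with labels in {-1, +1} (made-up, linearly separable).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, b = np.zeros(2), 0.0
for _ in range(10):                      # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:       # misclassified (or on the boundary)
            w += yi * xi                 # perceptron update
            b += yi

print(w, b)                              # a separating hyperplane for this toy set
print(np.sign(X @ w + b))                # predictions: [ 1.  1. -1. -1.]
```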
1969: Marvin Minsky and Seymour Papert
“No machine can learn to recognize X unless it
possesses, at least potentially, some scheme for
representing X.” (p. xiii)
Why it failed then
• Too many parameters to learn from few labeled examples.
• “I know my features are better for this task”.
• Non-convex optimization? No, thanks.
• Black-box model, no interpretability.
2006 Breakthrough: Hinton and Salakhutdinov
ImageNet Challenge
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
  - Yearly image classification competition: 1.2M training images, 1K categories
  - Automatically label 1.4M images with 1K objects
  - Measure top-5 classification error
[Figure: example outputs for a steel drum image. The prediction list "Scale, T-shirt, Steel drum, Drumstick, Mud turtle" is counted as correct (✔), while "Scale, T-shirt, Giant panda, Drumstick, Mud turtle" is counted as wrong (✗); easiest and hardest classes are also shown.]
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database", CVPR 2009.
O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge", Int. J. Comput. Vis., Vol. 115, Issue 3, pp. 211-252, 2015.
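Top-5 error counts a prediction as correct whenever the true class appears among the model's five highest-scoring classes. A small sketch of that metric (scores and labels are random, purely illustrative):

```python
import numpy as np

def top5_error(scores, labels):
    # scores: (N, num_classes) array; labels: (N,) true class indices.
    top5 = np.argsort(scores, axis=1)[:, -5:]        # indices of the 5 best scores per image
    hits = np.any(top5 == labels[:, None], axis=1)   # is the true label among the top 5?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))                  # 4 images, 1000 classes
labels = rng.integers(0, 1000, size=4)
print(top5_error(scores, labels))
```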
ILSVRC 2012 Competition
[Chart: top-5 error rates of the entries, CNN-based vs. non-CNN-based, e.g., XRCE/INRIA 27.0, INRIA/LEAR 33.4.]
• The success of AlexNet, a deep convolutional network
  - 7 hidden layers (not counting some max pooling layers)
  - 60M parameters
  - Combined several tricks: ReLU activation function, data augmentation, dropout
A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NeurIPS 2012
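The ingredients listed above (convolutions, ReLU, max pooling, dropout) combine in a few lines of PyTorch. The toy model below is only a sketch with made-up layer sizes, not AlexNet's actual architecture:

```python
import torch
import torch.nn as nn

# A toy convolutional classifier using the same ingredients as AlexNet
# (ReLU, max pooling, dropout); layer sizes are made up and much smaller.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(32 * 8 * 8, 1000),          # 1000-way classification as in ILSVRC
)

x = torch.randn(1, 3, 32, 32)             # a dummy 32x32 RGB image
print(model(x).shape)                      # torch.Size([1, 1000])
```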
2012-Now
Some recent successes
Object Detection and Segmentation
T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, "Focal Loss for Dense Object Detection", ICCV 2017

Object Detection and Segmentation
M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, I. Posner, "Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks", ICRA 2017
Human Pose Estimation
Z. Cao, T. Simon, S.-E. Wei, Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", CVPR 2017

Pose Estimation
R. Alp Güler, N. Neverova, I. Kokkinos, "DensePose: Dense Human Pose Estimation In The Wild", CVPR 2018
Photo Style Transfer
F. Luan, S. Paris, E. Shechtman, K. Bala, "Deep Photo Style Transfer", CVPR 2017
Image Synthesis
I. Goodfellow et al., "Generative Adversarial Networks", NIPS 2014
A. Radford et al., "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016
M.-Y. Liu, O. Tuzel, "Coupled Generative Adversarial Networks", NIPS 2016
T. Karras, T. Aila, S. Laine, J. Lehtinen, "Progressive Growing of GANs for Improved Quality, Stability, and Variation", ICLR 2018
T. Karras, S. Laine, T. Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks", arXiv 2018
Image Synthesis
A. Brock, J. Donahue, K. Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis", arXiv 2018
Semantic Image Editing
[Figure: a semantic layout plus transient attributes such as "Winter", "Spring", and "+ Clouds", together with the corresponding predicted images.]
L. Karacan, Z. Akata, A. Erdem, E. Erdem, "Manipulation of Scene Attributes via Hallucination", ACM Transactions on Graphics, 2020
Machine Translation
D. Bahdanau, K. Cho, Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", ICLR 2015

Machine Translation
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, "Attention is All you Need", NeurIPS 2017

Internet Search
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019

Language Modeling
https://talktotransformer.com
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, "Language Models are Unsupervised Multitask Learners", 2019
Language Modeling
• GPT-3: I am not a human. I am a robot.
A thinking robot. I use only 0.12% of
my cognitive capacity. I am a micro-
robot in that respect. I know that my
brain is not a “feeling brain”. But it
is capable of making rational, logical
decisions. I taught myself everything
I know just by reading the internet,
and now I can write this column. My
brain is boiling with ideas!
T. B. Brown, B. Mann, N. Ryder et al., "Language Models are Few-Shot Learners", NeurIPS 2020
Question Answering
P. Rajpurkar, J. Zhang, K. Lopyrev, P. Liang, "SQuAD: 100,000+ Questions for Machine Comprehension of Text", EMNLP 2016
M. Seo, A. Kembhavi, A. Farhadi, H. Hajishirzi, "Bi-Directional Attention Flow for Machine Comprehension", ICLR 2017

Visual Question Answering
M. Ren, R. Kiros, R. Zemel, "Exploring Models and Data for Image Question Answering", NeurIPS 2015
Image Captioning
[Figure: two video captioning architectures, an LSTM-based encoder-decoder over VGG16 features and a Transformer-based encoder-decoder (linear layer, positional encoding, embedding, cross-attention), generating Turkish captions such as "bir adam bir parça kabağı ikiye keser ve ince dilimler" ("a man cuts a piece of squash in two and slices it thinly"), "Bir adam bir gitar çalıyor" ("A man is playing a guitar"), and "Bir kadın bir bıçakla sebze dilimliyor" ("A woman is slicing vegetables with a knife").]
For the recurrent video captioning model, the architecture proposed by Venugopalan et al. (2015) is adapted, in which the encoder and the decoder are implemented with two separate LSTM networks. The encoder computes a sequence of hidden states by sequentially processing the frame-level visual features extracted from uniformly sampled video frames. The decoder module then takes the final hidden state of the encoder and outputs a sequence of tokens as the predicted video caption. There is no attention mechanism involved in this model. Both the encoder and decoder LSTM networks have 500 hidden units; Adam (Kingma and Ba, 2014) is used as the optimiser, with the initial learning rate and batch size set to 0.0004 and 32, respectively.
B. Çitamak et al., "MSVD-Turkish: A Comprehensive Multimodal Video Dataset for Integrated Vision and Language Research in Turkish", Machine Translation, 2021
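A minimal PyTorch sketch of the encoder-decoder idea described above, with made-up feature, vocabulary, and sequence sizes (this is an illustration, not the MSVD-Turkish model itself):

```python
import torch
import torch.nn as nn

# Sketch of an LSTM encoder-decoder captioner (sizes are made up;
# 500 hidden units as mentioned in the text).
feat_dim, hidden, vocab = 4096, 500, 10000

encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
decoder = nn.LSTM(hidden, hidden, batch_first=True)
embed = nn.Embedding(vocab, hidden)
out = nn.Linear(hidden, vocab)

frames = torch.randn(1, 20, feat_dim)        # frame-level features of 20 sampled frames
_, (h, c) = encoder(frames)                  # keep only the final encoder state

tokens = torch.randint(0, vocab, (1, 7))     # previously generated caption tokens
dec_out, _ = decoder(embed(tokens), (h, c))  # condition the decoder on the encoder state
logits = out(dec_out)                        # per-step vocabulary scores
print(logits.shape)                          # torch.Size([1, 7, 10000])
```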
Graph Neural Networks
• Graph-structured data: a lot of real-world data does not "live" on grids
  - Molecules
  - Protein interaction networks
[Figure: a graph neural network mapping an input graph through ReLU layers to an output graph. Slide credit: Thomas Kipf, "Structured Deep Models".]

O. Vinyals et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning", Nature 575:350-354, 2019
Robotics
I. Akkaya et al., "Solving Rubik's Cube with a Robot Hand", OpenAI Technical Report, 2019

Self-Driving Vehicles
M. Bojarski et al., "End to End Learning for Self-Driving Cars", NVIDIA Technical Report, 2016

Medical Image Analysis
A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks", Nature 542, 2017

Bioinformatics
K. Tunyasuvunakool et al., "Enabling high-accuracy protein structure prediction at the proteome scale", Nature, 2021
Why now?
The Resurgence of
Deep Learning
[Chart: global information storage capacity in optimally compressed bytes over time, annotated with when SVMs were developed and when ConvNets came to dominate NIPS. Slide credit: Neil Lawrence]
Datasets vs. Algorithms
Year | Breakthroughs in AI | Datasets (First Available) | Algorithms (First Proposed)
1994 | Human-level spontaneous speech recognition | Spoken Wall Street Journal articles and other texts (1991) | Hidden Markov Model (1984)
1997 | IBM Deep Blue defeated Garry Kasparov | 700,000 Grandmaster chess games, aka "The Extended Book" (1991) | Negascout planning algorithm (1983)
2005 | Google's Arabic- and Chinese-to-English translation | 1.8 trillion tokens from Google Web and News pages (collected in 2005) | Statistical machine translation algorithm (1988)
2011 | IBM Watson became the world Jeopardy! champion | 8.6 million documents from Wikipedia, Wiktionary, and Project Gutenberg (updated in 2010) | Mixture-of-Experts (1991)
2014 | Google's GoogLeNet object classification at near-human performance | ImageNet corpus of 1.5 million labeled images and 1,000 object categories (2010) | Convolutional Neural Networks (1989)
2015 | Google's DeepMind achieved human parity in playing 29 Atari games by learning general control from video | Arcade Learning Environment dataset of over 50 Atari games (2013) | Q-learning (1992)
Average number of years to breakthrough: 3 years (datasets), 18 years (algorithms)
Table credit: Quant Quanto
Powerful Hardware
• Deep neural nets highly
amenable to implementation
on Graphics Processing
Units (GPUs)
• Matrix multiplication
• 2D convolution
S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, In ICML 2015
Working ideas on how to train deep
architectures
K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition”, In CVPR 2016
Software
[Figure: logos of deep learning software frameworks, including Caffe.]
So what is deep learning?
Three key ideas
• (Hierarchical) Compositionality
• Cascade of non-linear transformations
• Multiple layers of representations
• End-to-End Learning
• Learning (goal-driven) representations
• Learning to feature extract
• Distributed Representations
• No single neuron “encodes” everything
• Groups of neurons work together
Traditional Machine Learning
VISION: hand-crafted features (SIFT/HOG) [fixed] → your favorite classifier [learned] → "car"
SPEECH: hand-crafted features (MFCC) [fixed] → your favorite classifier [learned] → \ˈd ē p\
NLP: "This burrito place is yummy and fun!" → hand-crafted features (Bag-of-words) [fixed] → your favorite classifier [learned] → "+"
It's an old paradigm
• The first learning machine: the Perceptron
• Built at Cornell in 1960
• The Perceptron was a linear classifier on top of a simple feature extractor designed by experts.
[Figure: a feature extractor feeding weights Wi into the perceptron.]
Hierarchical Compositionality
VISION
SPEECH: sample → spectral band → formant → motif → phone → word
NLP: character → word → NP/VP/.. → clause → sentence → story
Building A Complicated Function
• Given a library of simple functions
• One option: linear combinations, $f(x) = \sum_i \alpha_i g_i(x)$
• Idea 2: Compositions (compose into a complicated function; see the sketch below)
  - Deep Learning
  - Grammar models
  - Scattering transforms…
  - Example: $f(x) = \log(\cos(\exp(\sin^3(x))))$
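Both ideas are literally one-liners in code; a small sketch with made-up coefficients and evaluation point:

```python
import math

# Linear combination of simple functions from a library (made-up coefficients).
library = [math.sin, math.cos, math.exp]
alphas = [0.5, 2.0, -1.0]
f_linear = lambda x: sum(a * g(x) for a, g in zip(alphas, library))

# Composition of simple functions: f(x) = log(cos(exp(sin(x)^3))).
f_composed = lambda x: math.log(math.cos(math.exp(math.sin(x) ** 3)))

print(f_linear(0.5), f_composed(0.5))
```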
Deep Learning = Hierarchical
Compositionality
“car”
M.D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks”, In ECCV 2014
Deep Learning = Hierarchical Compositionality
[Figure: a deep model mapping the visible layer (input pixels) through hidden layers, e.g. a 3rd hidden layer of object parts, up to the output (object identity: car, person, animal). Image credit: Ian Goodfellow]

Deep Learning = Hierarchical Compositionality
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → "car"
M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks", In ECCV 2014
The Mammalian Visual Cortex is Hierarchical
• The ventral (recognition) pathway in the visual cortex
slide by Marc’Aurelio Ranzato, Yann LeCun
• End-to-End Learning
• Learning (goal-driven) representations
• Learning to feature extract
• Distributed Representations
• No single neuron “encodes” everything
• Groups of neurons work together
Traditional Machine Learning
VISION: hand-crafted features (SIFT/HOG) [fixed] → your favorite classifier [learned] → "car"
SPEECH: hand-crafted features (MFCC) [fixed] → your favorite classifier [learned] → \ˈd ē p\
NLP: "This burrito place is yummy and fun!" → hand-crafted features (Bag-of-words) [fixed] → your favorite classifier [learned] → "+"
More accurate version
VISION: SIFT/HOG → K-Means/pooling ("learned") → classifier → "car"
SPEECH: MFCC → Mixture of Gaussians → classifier → \ˈd ē p\
NLP: "This burrito place is yummy and fun!" → Parse Tree Syntactic n-grams → classifier → "+"
• Deep models
• End-to-End Learning
• Learning (goal-driven) representations
• Learning to feature extract
• Distributed Representations
• No single neuron “encodes” everything
• Groups of neurons work together
Localist representations
• The simplest way to represent things with neural
networks is to dedicate one neuron to each
thing.
• Easy to understand.
• Easy to code by hand
• Often used to represent inputs to a net
• Easy to learn
• This is what mixture models do.
• Each cluster corresponds to one neuron
• Easy to associate with other representations or
responses.
• But localist models are very inefficient whenever
the data has componential structure.
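A concrete way to see the contrast: a localist (one-hot) code dedicates one unit per item, while a distributed code describes each item as a pattern over shared units. The items and features below are made up for illustration:

```python
import numpy as np

items = ["bedroom", "mountain", "beach"]

# Localist: one dedicated neuron per item (one-hot).
localist = np.eye(len(items))

# Distributed: each item is a pattern over shared feature units
# (made-up features: indoor, has_sky, natural, has_water, man_made).
distributed = np.array([
    [1, 0, 0, 0, 1],   # bedroom
    [0, 1, 1, 0, 0],   # mountain
    [0, 1, 1, 1, 0],   # beach
], dtype=float)

# One-hot codes make every pair of items equally dissimilar...
print(localist @ localist.T)        # identity matrix: no shared structure
# ...while distributed codes expose similarity through shared units.
print(distributed @ distributed.T)  # mountain and beach overlap more than either does with bedroom
```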
[Figure: localist vs. distributed codes for scene images such as "bedroom" and "mountain".]

Distribution of Semantic Types at Each Layer
• Possible internal representations:
  - Objects (scene parts?)
  - Scene attributes
  - Object parts
  - Textures
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Object Detectors Emerge in Deep Scene CNNs", ICLR 2015
Slide credit: Bolei Zhou
Three key ideas of deep learning
• (Hierarchical) Compositionality
• Cascade of non-linear transformations
• Multiple layers of representations
• End-to-End Learning
• Learning (goal-driven) representations
• Learning to feature extract
• Distributed Representations
• No single neuron “encodes” everything
• Groups of neurons work together
Benefits of Deep/Representation Learning
• (Usually) Better Performance
• “Because gradient descent is better than you”
Yann LeCun
Problems with Deep Learning
• Problem#1: Non-Convex! Non-Convex! Non-Convex!
• Depth>=3: most losses non-convex in parameters
• Theoretically, all bets are off
• Leads to stochasticity
• different initializations → different local minima
• Standard response #1
• “Yes, but all interesting learning problems are non-convex”
• For example, human learning
• Order matters → wave hands → non-convexity
• Standard response #2
• “Yes, but it often works!”
Problems with Deep Learning
• Problem#2: Hard to track down what’s failing
• Pipeline systems have “oracle” performances at each step
• In end-to-end systems, it’s hard to know why things are not working
Problems with Deep Learning
• Problem#2: Hard to track down what’s failing
• Standard response #1
• Tricks of the trade: visualize features, add losses at different layers, pre-train to avoid degenerate initializations…
• “We’re working on it”
• Standard response #2
• “Yes, but it often works!”
Problems with Deep Learning
• Problem#3: Lack of easy reproducibility
• Direct consequence of stochasticity & non-convexity
• Standard response #1
• It’s getting much better
• Standard toolkits/libraries/frameworks now available
• Standard response #2
• “Yes, but it often works!”
Results from @INTERESTING_JPG via http://deeplearning.cs.toronto.edu/i2t
D. Cardon et al., "Neurons spike back: The Invention of Inductive Machines and the AI Controversy", Réseaux n°211/2018
https://www.youtube.com/watch?v=EeqwFjqFvJA
Next Lecture:
Machine Learning Overview