
>TensorFlow and deep learning_
without a PhD

deep Science ! deep Code ...

#Tensorflow @martin_gorner
Hello World: handwritten digits classification - MNIST

MNIST = Mixed National Institute of Standards and Technology
Download the dataset at http://yann.lecun.com/exdb/mnist/
Very simple model: softmax classification

[Figure: a 28x28-pixel image (784 pixels) is flattened; each of the 10 output
neurons (digits 0, 1, 2 ... 9) computes a weighted sum of all pixels plus a bias,
then softmax is applied to the 10 neuron outputs.]


In matrix notation, 100 images at a time

X is a batch of 100 images, one per line, each flattened into 784 pixels.
W is the weight matrix with 784 lines and 10 columns, b holds the 10 biases.
The same biases are broadcast onto all 100 lines of the result:

    L[100, 10] = X[100, 784] . W[784, 10] + b[10]

Softmax, on a batch of images

    Predictions           Images         Weights       Biases
    Y[100, 10] = softmax( X[100, 784] .  W[784, 10]  +  b[10] )
                 applied  matrix multiply              broadcast
                 line by line                           on all lines

tensor shapes in [ ]
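
A quick way to convince yourself of the shapes and the broadcast: a minimal
NumPy sketch (not from the slides), using placeholder data.

    import numpy as np

    X = np.random.rand(100, 784)   # 100 images, flattened to 784 pixels each
    W = np.zeros((784, 10))        # weight matrix
    b = np.zeros(10)               # 10 biases

    L = X @ W + b                  # matrix multiply, then b is broadcast onto all 100 lines
    print(L.shape)                 # (100, 10): one line of 10 scores per image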


Now in TensorFlow (Python)

tensor shapes: X[100, 784]  W[784, 10]  b[10]

Y = tf.nn.softmax(tf.matmul(X, W) + b)
#                 ^ matrix multiply   ^ broadcast on all lines


Success ?

Cross entropy: compare the computed probabilities with the actual ones.

actual probabilities, "one-hot" encoded (this is a "6"):
digit:   0    1    2    3    4    5    6    7    8    9
Y_ :     0    0    0    0    0    0    1    0    0    0

computed probabilities:
Y  :    0.1  0.2  0.1  0.3  0.2  0.1  0.9  0.2  0.1  0.1

cross entropy = - Σ ( Y_ . log(Y) )
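
As an illustrative check (not from the slides), the same cross-entropy computed
in NumPy for the numbers shown above, where the true label is "6":

    import numpy as np

    Y_ = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])                       # one-hot label: "6"
    Y  = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1])   # computed probabilities

    print(-np.sum(Y_ * np.log(Y)))   # only the "6" term survives: -log(0.9) ≈ 0.105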


Demo
92%

TensorFlow - initialisation

import tensorflow as tf

# "None" will become the batch size, 100
# 28 x 28 grayscale images
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.initialize_all_variables()

Training = computing variables W and b


TensorFlow - success metrics

# model (flattening the images)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers ("one-hot" encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch ("one-hot" decoding)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))


TensorFlow - training

# 0.003 is the learning rate, cross_entropy the loss function
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)


TensorFlow - run !

# running a TensorFlow computation, feeding placeholders
sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? (Tip: do this every 100 iterations only)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)

TensorFlow - full python code

import tensorflow as tf

# initialisation
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# success metrics: % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# training step
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

# run
sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? (add code to print it)
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)



Cookbook

Softmax
Cross-entropy
Mini-batch

Go deep !
Let's try 5 fully-connected layers !  (overkill ;-)

layer sizes: 784 (input pixels) -> 200 -> 100 -> 60 -> 30 -> 10
sigmoid activation on the hidden layers, softmax on the last layer (outputs 0 1 2 ... 9)


TensorFlow - initialisation

K = 200
L = 100
M = 60
N = 30

# weights initialised with small random values, biases with zeros
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))


TensorFlow - the model

# weights and biases: W1..W5, B1..B5 from the initialisation above
X = tf.reshape(X, [-1, 28*28])

Y1 = tf.nn.sigmoid(tf.matmul(X,  W1) + B1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)
Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)


Demo - slow start ?



RELU !

RELU = Rectified Linear Unit

Y = tf.nn.relu(tf.matmul(X, W) + b)
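
For example, a sketch of the five-layer model from the previous slides with the
sigmoids swapped for RELUs (same weights and biases as before):

    Y1 = tf.nn.relu(tf.matmul(X,  W1) + B1)
    Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
    Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)
    Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)
    Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)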


Demo - noisy accuracy curve ?

yuck!

Slow down . . . learning rate decay

Learning rate 0.003 at start, then dropping exponentially to 0.0001
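
One possible way to implement this (a sketch, assuming the session, mnist loader
and placeholders from the earlier slides; the decay formula is the one suggested
in the workshop section at the end of this deck):

    import math

    lr = tf.placeholder(tf.float32)   # learning rate fed at each step
    train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

    lrmax, lrmin = 0.003, 0.0001
    for i in range(10000):
        learning_rate = lrmin + (lrmax - lrmin) * math.exp(-i / 2000)
        batch_X, batch_Y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})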
Demo - dying neurons

Dying ...


Dropout

pkeep = tf.placeholder(tf.float32)

# TRAINING: pkeep = 0.75        EVALUATION: pkeep = 1

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
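
In the training loop this typically means feeding a different pkeep value at
training and at evaluation time, for example (a sketch, reusing the feed
dictionaries from the earlier run slide):

    train_data = {X: batch_X, Y_: batch_Y, pkeep: 0.75}                     # dropout active
    sess.run(train_step, feed_dict=train_data)

    test_data = {X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0}   # dropout off
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)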
Dropout

[Chart: neurons that were dying or dead without dropout recover with dropout.]
Demo
98%
All the party tricks

[Chart comparing test accuracy: sigmoid with learning rate 0.003, RELU with
learning rate 0.003, and decaying learning rate 0.003 -> 0.0001 with dropout
0.75 — 98.2% peak, 97.9% sustained.]


Overfitting

[Chart: the cross-entropy loss on test data rises again while the training loss
keeps dropping — overfitting.]

Overfitting ?!?
Too many neurons
BAD network
Not enough DATA
Convolutional layer

A stack of convolutional layers, each one subsampling its input (padding, stride).

Two 4x4x3 filters W1[4, 4, 3] and W2[4, 4, 3] stacked into one weights tensor:
W[4, 4, 3, 2] = [filter size, filter size, input channels, output channels]
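
A tiny shape check (not from the slides) of that W[filter, filter, in channels,
out channels] notation, using tf.nn.conv2d on a 28x28 image with 3 input
channels, 2 output channels and stride 2:

    import tensorflow as tf

    img = tf.placeholder(tf.float32, [None, 28, 28, 3])                 # batch of 28x28 RGB images
    W   = tf.Variable(tf.truncated_normal([4, 4, 3, 2], stddev=0.1))    # 4x4 filters, 3 -> 2 channels
    out = tf.nn.conv2d(img, W, strides=[1, 2, 2, 1], padding='SAME')    # stride 2 subsamples

    print(out.get_shape())   # (?, 14, 14, 2)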


Hacker's tip: ALL convolutional
Convolutional neural network  (+ biases on all layers)

28x28x1   input image
28x28x4   convolutional layer, 4 channels    W1[5, 5, 1, 4]   stride 1
14x14x8   convolutional layer, 8 channels    W2[4, 4, 4, 8]   stride 2
7x7x12    convolutional layer, 12 channels   W3[4, 4, 8, 12]  stride 2
200       fully connected layer              W4[7x7x12, 200]
10        softmax readout layer              W5[200, 10]
TensorFlow - initialisation

K = 4    # first convolutional layer output channels
L = 8    # second convolutional layer output channels
M = 12   # third convolutional layer output channels
N = 200  # fully connected layer

# weights initialised with small random values; shape is
# [filter size, filter size, input channels, output channels]
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)

W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
TensorFlow - the model

# input image batch X[100, 28, 28, 1]; weights, strides, biases as above
Y1 = tf.nn.relu(tf.nn.conv2d(X,  W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

# flatten all values for the fully connected layer
# Y3 [100, 7, 7, 12]  ->  YY [100, 7x7x12]
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)


Demo
98.9%

WTFH ???


Bigger convolutional network + dropout  (+ biases on all layers)

28x28x1   input image
28x28x6   convolutional layer, 6 channels    W1[6, 6, 1, 6]    stride 1
14x14x12  convolutional layer, 12 channels   W2[5, 5, 6, 12]   stride 2
7x7x24    convolutional layer, 24 channels   W3[4, 4, 12, 24]  stride 2
200       fully connected layer              W4[7x7x24, 200]   +DROPOUT p=0.75
10        softmax readout layer              W5[200, 10]
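
A sketch of what this bigger model could look like in code (assuming the X and
pkeep placeholders from the earlier slides; the layer sizes are the ones listed
above):

    K, L, M, N = 6, 12, 24, 200

    W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1))
    B1 = tf.Variable(tf.ones([K])/10)
    W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
    B2 = tf.Variable(tf.ones([L])/10)
    W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
    B3 = tf.Variable(tf.ones([M])/10)
    W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
    B4 = tf.Variable(tf.ones([N])/10)
    W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
    B5 = tf.Variable(tf.zeros([10]))

    Y1 = tf.nn.relu(tf.nn.conv2d(X,  W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
    Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
    Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

    YY = tf.reshape(Y3, shape=[-1, 7*7*M])
    Y4 = tf.nn.dropout(tf.nn.relu(tf.matmul(YY, W4) + B4), pkeep)   # dropout on the fully connected layer only
    Y  = tf.nn.softmax(tf.matmul(Y4, W5) + B5)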
Demo
99.3%



YEAH !

[Chart: test accuracy reaches 99.3% with dropout.]
Cookbook (recap)

Softmax
Cross-entropy
Mini-batch
Go deep
RELU !
Learning rate decay
Dropout
ALL convolutional

Overfitting ?!?
Too many neurons
BAD network
Not enough DATA
Cartoon images copyright: alexpokusay / 123RF stock photos
Have fun !

All code snippets are on GitHub:
github.com/martin-gorner/tensorflow-mnist-tutorial

This presentation: goo.gl/pHeXe7

tensorflow.org

cloud.google.com
Cloud ML ALPHA: your TensorFlow models trained in Google's cloud, fast.
Pre-trained models: Cloud Vision API, Cloud Speech API ALPHA, Google Translate API

Martin Görner
Google Developer relations
@martin_gorner
plus.google.com/+MartinGorner

That's all folks...
Workshop

Keyboard shortcuts for the visualisation GUI:

1 ......... display 1st graph only
2 ......... display 2nd graph only
3 ......... display 3rd graph only
4 ......... display 4th graph only
5 ......... display 5th graph only
6 ......... display 6th graph only
7 ......... display graphs 1 and 2
8 ......... display graphs 4 and 5
9 ......... display graphs 3 and 6
ESC or 0 .. back to displaying all graphs
SPACE ..... pause/resume
O ......... box zoom mode (then use mouse)
H ......... reset all zooms
Ctrl-S .... save current image


Workshop
Starter code and solutions: github.com/martin-gorner/tensorflow-mnist-tutorial

1. Theory (sit back and listen)
Softmax classifier, mini-batch, cross-entropy and how to implement them in
Tensorflow (slides 1-14)

2. Practice
Open file: mnist_1.0_softmax.py
Run it, play with the visualisations (see instructions on the previous slide),
read and understand the code as well as the basic structure of a Tensorflow program.

3. Theory (sit back and listen)
Hidden layers, sigmoid activation function (slides 16-19)

4. Practice
Start from the file you have and add one or two hidden layers. Use
cross_entropy_with_logits to avoid numerical instabilities with log(0)
(see the sketch after this list).
Solution in: mnist_2.0_five_layers_sigmoid.py

5. Theory (sit back and listen)
The neural network toolbox: RELUs, learning rate decay, dropout, overfitting
(slides 20-35)

6. Practice
Replace all your sigmoids with RELUs. Test. Then add learning rate decay from
0.003 to 0.0001 using the formula lr = lrmin + (lrmax-lrmin)*exp(-i/2000).
Solution in: mnist_2.1_five_layers_relu_lrdecay.py

7. Practice (if time allows)
Add dropout on all layers using a value between 0.5 and 0.8 for pkeep.
Solution in: mnist_2.2_five_layers_relu_lrdecay_dropout.py

8. Theory (sit back and listen)
Convolutional networks (slides 36-42)

9. Practice
Replace your model with a convolutional network, without dropout.
Solution in: mnist_3.0_convolutional.py

10. Practice (if time allows)
Try a bigger neural network (good hyperparameters on slide 44) and add dropout
on the last layer.
Solution in: mnist_3.0_convolutional_bigger_dropout.py
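
For step 4, a sketch of the numerically stable loss (assuming a TensorFlow 1.x
API and that Ylogits is the last layer's output before the softmax):

    Ylogits = tf.matmul(Y4, W5) + B5
    Y = tf.nn.softmax(Ylogits)

    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
    cross_entropy = tf.reduce_mean(cross_entropy) * 100   # *100: same scale as -reduce_sum over a batch of 100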
