
Deep Learning: Image Recognition

Using Python, TensorFlow, and TensorBoard

Anthony Hershberger

February 22, 2018

Outline

- ImageNet Competition
  - Object Detection
  - Object Localization
  - Current Research
- Convolutional Neural Networks
  - What are CNNs?
  - Activation Functions
- TensorFlow
- TensorBoard
ImageNet Large Scale Visual Recognition Challenge 2018

- Started in 2010, the ImageNet challenge is a competition in which research groups, commercial enterprises, and individuals compete on several image-recognition tasks.
- This is the first year that the competition will be hosted by Kaggle.
- The winner of the object localization challenge is the team that achieves the minimum error across all test images.
- The winner of the object detection challenge is the team that achieves the highest accuracy on the largest number of object categories.
Tasks of Image Recognition

- When you read papers, you will see many references to object localization, object detection, and image segmentation.
- Object localization is the task of detecting a single instance of an object and localizing it with a bounding box.
- Object detection is the task of detecting all instances of known class labels such as cars, boats, or people.
- Image segmentation is the process of assigning a class label to every pixel, so that each pixel is attributed to the object whose boundaries contain it.
ImageNet Dataset

- ImageNet is an image database of over 1.2 million images that are human-annotated with synsets, or synonym sets, which classify the images within a word hierarchy.
- The training data for the object localization challenge is a subset of 150,000 ImageNet images.
- The training set for the object detection challenge is a subset of 450k images containing 475k labeled objects. The test set includes 40k images.
- ImageNet Website
Object Localization

- For each input photo, the algorithm outputs 5 class labels, denoted $c_i,\ i = 1, 2, \dots, 5$, in decreasing order of confidence.
- The algorithm also outputs 5 bounding boxes, denoted $b_i,\ i = 1, 2, \dots, 5$, one for each class label.
- The ground truth labels for the image are the $n$ class labels $C_k,\ k = 1, 2, \dots, n$.
Object Localization

- For each ground truth class label $C_k$, the ground truth bounding boxes are $B_{km},\ m = 1, 2, \dots, M_k$, where $M_k$ is the number of instances of the $k$-th object in the current image.
- Let $d(c_i, C_k) = 0$ if $c_i = C_k$ and $1$ otherwise.
Object Localization

- The function that we are trying to minimize is

  $$e = \frac{1}{n} \sum_{k} \min_{i} \min_{m} \max\big(d(c_i, C_k),\, f(b_i, B_{km})\big)$$

  where $d(c_i, C_k) = 0$ if your algorithm predicts the same class as the ground truth class label, and $f(b_i, B_{km}) = 0$ if your algorithm's bounding box overlaps the ground truth bounding box by more than 50%.
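The slides leave $f$ implicit; the 50% overlap criterion is the usual intersection-over-union (IoU) test. A minimal Python sketch, assuming boxes given as (x1, y1, x2, y2) corners (the box format is an assumption, not from the slides):

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def f(b_i, B_km):
    # 0 if the predicted box overlaps the ground truth by more than 50%
    return 0 if iou(b_i, B_km) > 0.5 else 1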
Object Detection Challenge

- For each input image, your algorithm must produce a set of annotations $(c_i, s_i, b_i)$, where $c_i$ are the class labels, $s_i$ are the confidence scores, and $b_i$ are the bounding boxes.
- There are 200 labeled categories for the test set, which is fully annotated.
- Any objects that are not annotated, or that create duplicate bounding boxes, will be penalized.
Current Research in Image Recognition

1. ImageNet Classification with Deep Convolutional Neural Networks* (2012)
2. ZFNet* (2013)
3. Going Deeper with Convolutions* (2014)
4. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification* (2015)
5. Squeeze and Excitation Networks* (2017)
6. Mask R-CNN (2018)

*ImageNet winner
Current Research in Image Recognition

- In 2012, AlexNet significantly outperformed all other submissions, with roughly 16 percent error, by introducing a deep CNN that featured stacked convolutional layers.
- In 2013, ZFNet improved on AlexNet's work by adjusting the stride and filter sizes, which assisted in the tuning of hyperparameters.
- In 2014, GoogLeNet developed inception modules and used average pooling layers to reduce the number of parameters while also reducing error.
- In 2015, ResNet introduced the concept of skip connections together with batch normalization.
Convolutional Neural Networks

- Convolutional Neural Networks are a subset of traditional feed-forward neural networks. They have seen much success in image recognition thanks to their ability to build deep, accurate networks, to reduce the number of parameters for efficient training, and to provide quick recall at deployment.
- A basic CNN architecture involves one or more convolutional layers, joined through non-linear activation functions to pooling layers, whose output passes through fully connected layers to obtain the class scores and make predictions.
Convolutional Neural Networks

- In image recognition, a three-layer CNN will learn edges from raw pixels on the first ConvLayer, simple shapes on the second ConvLayer, and higher-level features on the third. The third layer is then connected to the class probabilities through a fully connected layer.
- The sliding-window approach of CNNs leads to location invariance, meaning the network is robust to translation (and, to a lesser degree, to scaling and rotation).
Layers of Convolutional Neural Networks

- Convolutional layers are the fundamental layers of CNNs and comprise any layer where convolution operations occur. They are the most computationally expensive layers, but also where most of the information in the model is learned. These layers are connected to other layers through non-linear activation functions.
- Pooling layers are typically applied after convolutional layers to reduce output dimensionality while minimizing the information loss from the ConvLayer. Max pooling with a 2x2 filter and a stride of 2 has been a popular operation in recent research.
- Fully connected layers attach all neurons to all the activations in the previous layer. (A minimal sketch of these three layer types follows below.)
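As a rough illustration of how the three layer types stack, here is a sketch using the TF 1.x layers API. The filter count, kernel size, and variable names are illustrative assumptions, not part of the original slides:

import tensorflow as tf

# Input: a batch of 28x28 grayscale images (e.g., MNIST).
x = tf.placeholder(tf.float32, [None, 28, 28, 1], name='X')

# Convolutional layer: 32 illustrative 5x5 filters with a ReLU activation.
conv1 = tf.layers.conv2d(x, filters=32, kernel_size=5,
                         padding='same', activation=tf.nn.relu)

# Max pooling with a 2x2 filter and stride 2 halves the spatial size.
pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)

# Fully connected layer mapping the flattened activations to 10 class scores.
flat = tf.reshape(pool1, [-1, 14 * 14 * 32])
logits = tf.layers.dense(flat, units=10)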
Spatial Arrangements

- Spatial arrangement deals with how neurons are arranged and with the volume of the output on the next layer.
- The output volume of the layer following a convolutional layer is determined by three hyperparameters: depth, stride, and zero padding.
Depth, Stride, and Zero Padding

- Depth describes the output volume and directly corresponds to the number of filters. The filters convolve around the input by shifting a fixed number of units at a time.
- Stride describes the number of units by which we shift, or slide, the filter when convolving it over an input layer.
- Zero padding is used to pad an input with zeros around its border. Under some circumstances, zero padding is used for convenience; its most important use is to preserve the spatial size of the output.
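These three hyperparameters fix the output size. The relation below is the standard one (it is not stated on the slide itself):

% Output width of a conv layer, for input width W, filter width F,
% zero padding P, and stride S:
\[
  W_{\text{out}} = \frac{W - F + 2P}{S} + 1
\]
% Example: W = 28, F = 5, P = 2, S = 1 gives
% W_out = (28 - 5 + 4)/1 + 1 = 28, so the padding preserves spatial size.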
Activation Functions

- In a CNN, the activation function of a node decides whether that neuron should fire, or activate. Mathematically speaking, an activation function takes the values produced by the convolution, imposes a non-linear function such as a sigmoid, and maps those values toward 0 (don't fire) or 1 (fire).
- Activation functions impose non-linearity on the model. Without an activation function, the weight and bias terms would only ever transform the input linearly.
- CNNs are tasked with complex feature extraction that would perform poorly with a linear activation function.
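As a small illustration (the input values here are made up), the sigmoid mentioned above and the widely used ReLU are both available as TensorFlow ops:

import tensorflow as tf

z = tf.constant([-2.0, 0.0, 3.0])   # illustrative pre-activation values
sigmoid_out = tf.nn.sigmoid(z)      # squashes values into (0, 1)
relu_out = tf.nn.relu(z)            # zeroes negatives: [0.0, 0.0, 3.0]

with tf.Session() as sess:
    print(sess.run([sigmoid_out, relu_out]))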
Backpropagation

- First we will look at some standard notation used when discussing backpropagation.

  $X$: tensor pairs of inputs $x_i$ and class labels $y_i$
  $x_i^\ell$: input $x$ of node $i$ at layer $\ell$
  $y_i^\ell$: output $y$ of node $i$ at layer $\ell$
  $w_{ij}^k$: the weight of node $j$ in layer $k$ coming from node $i$
  $b_i^k$: the bias of node $i$ in layer $k$
  $\theta$: the grouping of the parameters $w$ and $b$
  $y_i'$: the predicted class label
  $\alpha$: the learning rate of our update function
Backpropagation

- When training a convolutional neural network using the gradient descent algorithm, the network requires gradient computations of the chosen error function w.r.t. the weights $w_{ij}^k$ and biases $b_i^k$, commonly grouped together as $\theta$.
- Gradient descent iteratively updates the weights and bias at each node, and its speed is set by the hyperparameter $\alpha$, the learning rate. The update rule is:

  $$\theta^{t+1} = \theta^t - \alpha \frac{\partial E(X, \theta^t)}{\partial \theta}$$
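To make the update rule concrete, here is a minimal NumPy sketch of one gradient descent step; the toy quadratic error and all names are assumptions for illustration only:

import numpy as np

def grad_E(theta, X):
    # Gradient of a toy error E(X, theta) = mean((X @ theta)**2) / 2.
    return X.T @ (X @ theta) / len(X)

alpha = 0.1                              # learning rate
theta = np.array([1.0, -2.0])            # current parameters theta^t
X = np.array([[1.0, 0.5], [0.3, 2.0]])   # toy inputs

# theta^{t+1} = theta^t - alpha * dE(X, theta^t)/dtheta
theta = theta - alpha * grad_E(theta, X)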
Backpropagation

- Commonly used error functions in CNNs are the mean squared error, defined as

  $$E(X, \theta) = \frac{1}{2N} \sum_{i=1}^{N} (y_i' - y_i)^2$$

  as well as the cross-entropy loss

  $$-\sum_i y_i \cdot \log(y_i')$$

- The objective of backpropagation is to minimize the error function using an optimization method such as gradient descent.
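As a sketch, both losses are short expressions in TensorFlow; the tensor names here are illustrative assumptions:

import tensorflow as tf

y_true = tf.placeholder(tf.float32, [None, 10])   # one-hot labels y
y_pred = tf.placeholder(tf.float32, [None, 10])   # predicted probabilities y'

# Mean squared error: (1/2N) * sum over examples of ||y' - y||^2
mse = tf.reduce_mean(tf.reduce_sum(tf.square(y_pred - y_true), axis=1)) / 2.0

# Cross-entropy: -sum(y * log(y')), averaged over the batch
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_pred), axis=1))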
Computational Complexity of CNNs

- CNNs are computationally expensive due to the large W x H x D tensor volumes.
- On a given layer, a 28x28x1 grayscale image would have 784 weights and a 28x28x3 RGB image would have 2,352 weights.
- For most computer systems, the largest bottleneck for a CNN is GPU memory (at most 24 GB).
- Memory usage depends on the number of activations of intermediate tensors (kept for backpropagation), the total number of network parameters, and overhead.
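A quick back-of-the-envelope check of those counts; the 64-filter activation tensor in the last line is an illustrative assumption:

# Weights seen by a single fully connected neuron on each input volume.
grayscale = 28 * 28 * 1   # 784
rgb = 28 * 28 * 3         # 2352

# Rough activation memory for one 28x28x64 float32 tensor, in bytes.
activation_bytes = 28 * 28 * 64 * 4   # ~200 KB per image in the batch
print(grayscale, rgb, activation_bytes)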
Tools for Neural Networks

TensorFlow:

- TensorFlow is built on the use of static computational graphs.
- Encapsulation: define the network architecture within the TF session and then execute.
- TF uses static computational graphs because, under the hood, it can pre-compile functions and pre-allocate buffers in memory, allowing for faster optimization.

PyTorch:

- PyTorch implements dynamic computational graphs.
- This allows for dynamic data structures, including mixed lists, stacks, you name it.
- Since the neural networks are modular and each part of the model is considered a separate piece, you can be very creative when building your network.
Tools for Neural Networks: TensorFlow

- TensorFlow was developed internally by the Google Brain Team to replace a distributed system called DistBelief.
- TF allows the user to schedule tasks between the CPU and GPUs, or between multiple GPUs.
- TF provides computationally efficient algorithms for optimization, node operations, and tools for debugging.
- TF uses static computational graphs to structure the numerical computations.
- Installing TensorFlow on Windows
TensorFlow Installed for CUDA

- TensorFlow programs typically run significantly faster on a GPU than on a CPU.
- An Nvidia GTX 1080 has 2560 CUDA cores, 8 GB of GDDR5X memory, and a 256-bit memory interface (320 GB/s).
CNN Code Explanation

1. Import TensorFlow into your environment:

   import tensorflow as tf

2. Access the MNIST data through the TF module:

   from tensorflow.examples.tutorials.mnist import input_data

   MNIST = input_data.read_data_sets("/tmp/data/", one_hot=True)
CNN Code Explanation

3. Define your placeholders. Placeholders are nodes that will be fed values or labels at the time of execution.

   x = tf.placeholder(tf.float32, [None, 784], name='X')
   y = tf.placeholder(tf.float32, [None, 10], name='Labels')

4. Define your variables. Variables are stateful nodes that will output their current value.

   W = tf.Variable(tf.zeros([784, 10]), name='Weights')
   b = tf.Variable(tf.zeros([10]), name='Bias')

5. Initialize your variables.

   init = tf.global_variables_initializer()
CNN Code Explanation

6. Scope your operations:

   with tf.name_scope('Model'):
       # Model
       pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax
   with tf.name_scope('Loss'):
       cross_entropy = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
   with tf.name_scope('SGD'):
       optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
   with tf.name_scope('Accuracy'):
       acc = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
       acc = tf.reduce_mean(tf.cast(acc, tf.float32))

   Note: by scoping your operations, TensorFlow can define the structure of your computational graph for data visualization. Scoping also assists in scheduling tasks to different components.
CNN Code Explanation

7. Define the metrics that you want to output to monitor the performance of your network:

   tf.summary.scalar("loss", cross_entropy)
   tf.summary.scalar("accuracy", acc)
   merged_summary_op = tf.summary.merge_all()

   Note: TensorFlow requires that you write summaries into log files in order for them to be read by TensorBoard.
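The slide does not show the writer itself; a minimal sketch of writing those summaries to a log directory (the path is an assumption) looks like:

# Hypothetical log path; point TensorBoard at the same directory.
logs_path = '/tmp/tensorflow_logs'
summary_writer = tf.summary.FileWriter(logs_path,
                                       graph=tf.get_default_graph())
# Inside the training loop, after sess.run(...) returns `summary`:
# summary_writer.add_summary(summary, step)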
CNN Code Explanation

8. tf.Session() executes the model:

   with tf.Session() as sess:
       sess.run(init)
       ...
       for i in range(combined_batch):
           batch_xs, batch_ys = MNIST.train.next_batch(batch_size)
           _, c, summary = sess.run([optimizer, cross_entropy, merged_summary_op],
                                    feed_dict={x: batch_xs, y: batch_ys})

   Note: tf.Session() deploys the defined computational graph onto the available devices. We must call sess.run(fetches, feeds) to execute the nodes in our graph. Fetches are lists of graph nodes that return outputs. Feeds are mappings from graph nodes to concrete values: the keys are graph nodes, the values are NumPy data.
Backpropagation

- We calculate our gradients by creating an optimizer object.
- In this model we have chosen the gradient descent optimizer and created a node operation that minimizes the cross-entropy loss.
- TensorFlow has gradient operations attached to every node. When a node is called and executed, TF calculates the gradients of the loss with respect to the parameters using backpropagation.
- TF calculates gradients with respect to the variables, which is why it is important to define variables and placeholders separately.
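To see those attached gradient ops directly, one can request them explicitly; this sketch is not part of the original demo:

# Gradients of the loss with respect to the variables (not the placeholders).
grad_W, grad_b = tf.gradients(cross_entropy, [W, b])
# GradientDescentOptimizer.minimize() builds these same gradients internally
# and adds the update ops W -= alpha * grad_W and b -= alpha * grad_b.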
TensorBoard

- TensorBoard is a suite of visualization tools to help decompose, optimize, and debug even the most complex neural networks.
- TensorBoard operates by reading TensorFlow event files.
How do you set up TensorBoard?

1. Create a TensorFlow graph that you would like to collect summary data from, and decide which nodes you would like to add to your list of summary operations.
2. Store the summary data in a log file within the TensorBoard environment.
3. Open your computer's web browser and enter the log address that is attached to the TB port (see the sketch below).
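Concretely, assuming the log directory used in the earlier summary-writer sketch, launching TensorBoard from a shell and browsing to its default port looks like this:

tensorboard --logdir=/tmp/tensorflow_logs
# then open http://localhost:6006 in your browser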
MNIST Data Demo

- Let's use TensorFlow and Python to train a Convolutional Neural Network and output the results to TensorBoard.
- Special thanks to Martin Gorner for his TensorFlow tutorial video on YouTube:
- Tensorflow and deep learning - without a PhD, by Martin Gorner
