Ch-2-Mathematical Building Blocks NN
Here, our network consists of a sequence of two Dense layers, which
are densely connected (also called fully connected) neural layers. The
second (and last) layer is a 10-way softmax layer, which means it will
return an array of 10 probability scores (summing to 1). Each score will
be the probability that the current digit image belongs to one of our 10
digit classes.
The softmax function, often used in the final layer of a neural network
model for classification tasks, converts raw output scores — also
known as logits — into probabilities by taking the exponential of each
output and normalizing these values by dividing by the sum of all the
exponentials.
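The softmax computation described above can be sketched in NumPy. This is a minimal illustration, not the Keras implementation. The ten logits are made-up values. Subtracting the largest logit before exponentiating is a standard numerical-stability step, and it leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    """Convert a 1D array of raw scores (logits) into probabilities."""
    # Subtracting the max avoids overflow in np.exp; the result is unchanged
    # because the common factor exp(-max) cancels in the normalization.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the 10 digit classes
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 3.0, -0.5, 0.0, 1.5, -2.0])
probs = softmax(logits)
print(probs.round(3))
print(probs.sum())       # 1 up to floating-point rounding
print(np.argmax(probs))  # → 5, the class with the largest logit
```

Softmax preserves the ordering of the scores, so the most probable class is always the one with the largest logit.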
To make the network ready for training, we need to pick three more
things, as part of the compilation step:
• A loss function—How the network will be able to measure its
performance on the training data, and thus how it will be able to
navigate itself in the right direction.
• An optimizer—The mechanism through which the network will
update itself based on the data it sees and its loss function.
• Metrics to monitor during training and testing—Here, we’ll only care
about accuracy (the fraction of the images that were correctly
classified).
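A NumPy sketch on a toy batch shows what the loss and the accuracy metric compute. It assumes categorical cross-entropy as the loss, which is a common choice for multiclass problems such as MNIST digit classification. The section itself does not name a loss. Labels are assumed to be integer class indices. The sketch illustrates the quantities only and is not the Keras implementation. The optimizer, which updates the weights to reduce this loss, is not shown.

```python
import numpy as np

def categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    # probs: (samples, classes) softmax outputs; labels: (samples,) integer classes
    eps = 1e-12  # guard against log(0)
    correct_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(correct_class_probs + eps))

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true class."""
    return np.mean(np.argmax(probs, axis=1) == labels)

# Toy batch: 4 samples, 3 classes (hypothetical values)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
labels = np.array([0, 1, 2, 1])

loss = categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(loss)
print(acc)  # → 0.75 (3 of 4 samples correctly classified)
```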
Data representations for neural networks
In the previous example, we started from data stored in
multidimensional Numpy arrays, also called tensors. In general, all
current machine-learning systems use tensors as their basic data
structure. Tensors are fundamental to the field—so fundamental that
Google’s TensorFlow was named after them.
At its core, a tensor is a container for data—almost always numerical
data. So, it’s a container for numbers. You may be already familiar with
matrices, which are 2D tensors: tensors are a generalization of matrices
to an arbitrary number of dimensions (note that in the context of
tensors, a dimension is often called an axis).
Tensors
A Tensor is an N-dimensional array:
• A Scalar is a 0-dimensional tensor
• A Vector is a 1-dimensional tensor
• A Matrix is a 2-dimensional tensor
A Tensor is a generalization of Vectors and Matrices to higher
dimensions.
Tensor Ranks
The number of axes (independent directions) of a tensor is called the
Rank of the tensor.
The rank is denoted R.
A Scalar is a single number.
• It has 0 Axes
• It has a Rank of 0
• It is a 0-dimensional Tensor
A Vector is an array of numbers.
• It has 1 Axis
• It has a Rank of 1
• It is a 1-dimensional Tensor
A Matrix is a 2-dimensional array.
• It has 2 Axes
• It has a Rank of 2
• It is a 2-dimensional Tensor
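The ranks listed above can be checked with NumPy's ndim attribute, which counts the axes of an array. Here is a short sketch with arbitrary example values:

```python
import numpy as np

scalar = np.array(7)                 # a single number
vector = np.array([1, 2, 3])         # an array of numbers
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # rows and columns
tensor3d = np.zeros((2, 3, 4))       # a stack of two 3 x 4 matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("3D tensor", tensor3d)]:
    # ndim is the rank; shape lists the entries along each axis
    print(f"{name}: rank {t.ndim}, shape {t.shape}")
```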
Vectors
An array of numbers is called a vector, or 1D tensor. A 1D tensor is said
to have exactly one axis. Following is a Numpy vector:
>>> import numpy as np
>>> x = np.array([12, 3, 6, 14, 7])
>>> x
array([12,  3,  6, 14,  7])
>>> x.ndim
1
This vector has five entries and so is called a 5-dimensional vector.
Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one
axis and has five dimensions along its axis, whereas a 5D tensor has five
axes (and may have any number of dimensions along each axis).
Dimensionality can denote either the number of entries along a specific
axis (as in the case of our 5D vector) or the number of axes in a tensor
(such as a 5D tensor), which can be confusing at times.
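In NumPy, the two meanings of "dimension" correspond to two different attributes. ndim gives the number of axes, and shape gives the number of entries along each axis. Here is a minimal sketch:

```python
import numpy as np

vec5 = np.array([12, 3, 6, 14, 7])    # a 5D vector: one axis, five entries
tensor5d = np.zeros((2, 2, 2, 2, 2))  # a 5D tensor: five axes

print(vec5.ndim, vec5.shape)          # 1 (5,)
print(tensor5d.ndim, tensor5d.shape)  # 5 (2, 2, 2, 2, 2)
```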
Scalars (0D tensors)
A tensor that contains only one number is called a scalar (or scalar tensor, or
0-dimensional tensor, or 0D tensor). In Numpy, a float32 or float64 number is
a scalar tensor (or scalar array). You can display the number of axes of a
Numpy tensor via the ndim attribute; a scalar tensor has 0 axes (ndim == 0).
The number of axes of a tensor is also called its rank. Here’s a Numpy scalar:
>>> import numpy as np
>>> x = np.array(12)
>>> x
array(12)
>>> x.ndim
0
Note: ndim represents the number of dimensions (axes) of the ndarray.
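The statement that a single Numpy float32 or float64 number is itself a scalar tensor can be checked directly. NumPy scalar types expose the same ndim, shape and dtype attributes as arrays. Here is a short sketch:

```python
import numpy as np

a = np.float32(3.5)   # NumPy float32 scalar
b = np.float64(3.5)   # NumPy float64 scalar
c = np.array(12)      # 0D array, as in the example above

for s in (a, b, c):
    # every one reports zero axes and an empty shape
    print(type(s).__name__, s.ndim, s.shape, s.dtype)
```

The printed integer dtype of c is platform-dependent, so no output is shown here.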
Matrices (2D tensors)
An array of vectors is a matrix, or 2D tensor. A matrix has two axes
(often referred to as rows and columns).
A rectangular representation of numbers in rows and columns is called
a matrix. If a matrix has n rows and m columns it is said to be an order
“n x m” matrix (read as n by m matrix). Matrices are used to represent
a dataset systematically, with the rows representing different data
points and the columns representing their features.
This is a Numpy matrix:
>>> x = np.array([[5, 78, 2, 34, 0],
...               [6, 79, 3, 35, 1],
...               [7, 80, 4, 36, 2]])
>>> x.ndim
2
The entries from the first axis are called the rows, and the entries from
the second axis are called the columns. In the previous example, [5, 78,
2, 34, 0] is the first row of x, and [5, 6, 7] is the first column.
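Rows and columns correspond to indexing along the first and second axes. Here is a sketch using the same matrix:

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

print(x.shape)    # (3, 5): 3 rows, 5 columns
print(x[0])       # first row (index along axis 0): [ 5 78  2 34  0]
print(x[:, 0])    # first column (index along axis 1): [5 6 7]
```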
3D tensors and higher-dimensional tensors
Most structured data is represented in the form of tables, that is, as
matrices.
If you pack such matrices in a new array, you obtain a 3D tensor, which
you can visually understand as a cube of numbers. By packing 3D tensors
in an array, you can create a 4D tensor, and so on. In deep learning,
you’ll generally manipulate tensors that are 0D to 4D, although you may
go up to 5D if you process video data.
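Here is a minimal sketch of this packing. It assumes we stack three copies of the 3 × 5 matrix from the matrix example, so the values are only illustrative. np.stack adds a new leading axis each time it packs arrays together.

```python
import numpy as np

m = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])   # 2D tensor, shape (3, 5)

x3 = np.stack([m, m, m])             # pack 3 matrices -> 3D tensor
print(x3.ndim, x3.shape)             # 3 (3, 3, 5)

x4 = np.stack([x3, x3])              # pack 2 such 3D tensors -> 4D tensor
print(x4.ndim, x4.shape)             # 4 (2, 3, 3, 5)
```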
Key attributes
A tensor is defined by three key attributes:
Number of axes (rank)—For instance, a 3D tensor has three axes, and a
matrix has two axes. This is also called the tensor’s ndim in Python libraries
such as Numpy.
Shape—This is a tuple of integers that describes how many dimensions the
tensor has along each axis. For instance, the previous matrix example has
shape (3, 5), and a 3D tensor made by packing three such matrices has shape (3, 3, 5). A vector has a
shape with a single element, such as (5,), whereas a scalar has an empty
shape, ().
Data type (usually called dtype in Python libraries)—This is the type of the
data contained in the tensor; for instance, a tensor’s type could be float32,
uint8, float64, and so on. On rare occasions, you may see a char tensor. Note
that string tensors don’t exist in Numpy (or in most other libraries), because
tensors live in preallocated, contiguous memory segments, and strings, being
variable length, would preclude the use of this implementation.
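All three attributes can be read off any NumPy array. Because every element of a given dtype occupies the same number of bytes, the total memory is the number of elements times the size of one element. Here is a short sketch with an arbitrary zero-filled array:

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.uint8)
print(x.ndim)      # number of axes: 2
print(x.shape)     # (3, 5)
print(x.dtype)     # uint8

y = x.astype(np.float32)                  # same shape, different dtype
print(y.dtype, y.itemsize)                # float32 4 (bytes per element)
print(y.nbytes == y.size * y.itemsize)    # True: fixed-size elements
```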
Let’s display the digit at index 4 (the fifth image) of the MNIST training
tensor train_images, a 3D tensor of shape (60000, 28, 28), using the library
Matplotlib (part of the standard scientific Python suite).
Matplotlib is a low-level plotting library for Python that serves as a
visualization utility.
• Matplotlib was created by John D. Hunter.
• Matplotlib is open source and we can use it freely.
• Matplotlib is mostly written in Python; a few segments are written in
C, Objective-C and JavaScript for platform compatibility.
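Displaying the digit at index 4
import matplotlib.pyplot as plt
from keras.datasets import mnist
# Load MNIST: train_images has shape (60000, 28, 28) and dtype uint8
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
digit = train_images[4]  # index 4 is the fifth image
plt.imshow(digit, cmap=plt.cm.binary)
plt.show()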
Real-world examples of data tensors
Let’s make data tensors more concrete with a few examples similar to
what you’ll encounter later. The data you’ll manipulate will almost
always fall into one of the following categories:
• Vector data—2D tensors of shape (samples, features)
• Timeseries data or sequence data—3D tensors of shape (samples,
timesteps, features)
• Images—4D tensors of shape (samples, height, width, channels) or
(samples, channels, height, width)
• Video—5D tensors of shape (samples, frames, height, width,
channels) or (samples, frames, channels, height, width)
Vector data
This is the most common case. In such a dataset, each single data point can
be encoded as a vector, and thus a batch of data will be encoded as a 2D
tensor (that is, an array of vectors), where the first axis is the samples axis
and the second axis is the features axis. Let’s take a look at two examples:
• An actuarial dataset of people, where we consider each person’s age, ZIP
code, and income. Each person can be characterized as a vector of 3 values,
and thus an entire dataset of 100,000 people can be stored in a 2D tensor
of shape (100000, 3).
• A dataset of text documents, where we represent each document by the
counts of how many times each word appears in it (out of a dictionary of
20,000 common words). Each document can be encoded as a vector of
20,000 values (one count per word in the dictionary), and thus an entire
dataset of 500 documents can be stored in a tensor of shape (500, 20000).
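Here is a sketch of the word-count example. The five-word vocabulary and the two documents are hypothetical stand-ins for the 20,000-word dictionary and the 500 documents. Each document becomes one row, so the dataset is a 2D tensor of shape (samples, features).

```python
import numpy as np

# Hypothetical tiny vocabulary standing in for a 20,000-word dictionary
vocabulary = ["the", "cat", "sat", "on", "mat"]
word_index = {word: i for i, word in enumerate(vocabulary)}

documents = ["the cat sat on the mat", "the mat"]

# One row per document (samples axis), one column per word (features axis)
counts = np.zeros((len(documents), len(vocabulary)), dtype=np.int32)
for row, doc in enumerate(documents):
    for word in doc.split():
        if word in word_index:          # ignore out-of-vocabulary words
            counts[row, word_index[word]] += 1

print(counts.shape)  # (2, 5)
print(counts)
```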
Timeseries data or sequence data
Whenever time matters in your data (or the notion of sequence order),
it makes sense to store it in a 3D tensor with an explicit time axis. Each
sample can be encoded as a sequence of vectors (a 2D tensor), and
thus a batch of data will be encoded as a 3D tensor.
A 3D timeseries data tensor
The time axis is always the second axis (axis of index 1), by convention.
Let’s look at a few examples:
• A dataset of tweets, where we encode each tweet as a sequence of
280 characters out of an alphabet of 128 unique characters. In this
setting, each character can be encoded as a binary vector of size 128
(an all-zeros vector except for a 1 entry at the index corresponding to
the character). Then each tweet can be encoded as a 2D tensor of
shape (280, 128), and a dataset of 1 million tweets can be stored in a
tensor of shape (1000000, 280, 128).
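Here is a minimal sketch of this one-hot encoding. It makes two illustrative assumptions that are not part of the original example. First, the 128-character alphabet is taken to be ASCII (code points 0 to 127). Second, tweets shorter than 280 characters are padded with all-zeros rows.

```python
import numpy as np

MAX_LEN = 280      # characters per tweet (timesteps axis)
ALPHABET = 128     # assumed: ASCII code points 0-127 (features axis)

def encode_tweet(text):
    """One-hot encode a tweet into a (MAX_LEN, ALPHABET) 2D tensor."""
    encoded = np.zeros((MAX_LEN, ALPHABET), dtype=np.uint8)
    for t, ch in enumerate(text[:MAX_LEN]):  # truncate long tweets
        code = ord(ch)
        if code < ALPHABET:                  # skip non-ASCII characters
            encoded[t, code] = 1
    return encoded                           # unused rows stay all-zero (padding)

tweets = ["Hello, tensors!", "Deep learning"]
batch = np.stack([encode_tweet(t) for t in tweets])
print(batch.shape)  # (2, 280, 128): (samples, timesteps, features)
```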
Image Data
Images typically have three dimensions: height, width, and color depth.
Although grayscale images (like our MNIST digits) have only a single
color channel and could thus be stored in 2D tensors, by convention
image tensors are always 3D, with a one-dimensional color channel for
grayscale images. A batch of 128 grayscale images of size 256 × 256
could thus be stored in a tensor of shape (128, 256, 256, 1), and a
batch of 128 color images could be stored in a tensor of shape (128,
256, 256, 3).
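Adding the one-dimensional color channel to a batch of grayscale images takes a single call to np.expand_dims. This sketch uses zero-filled placeholder arrays. The uint8 dtype is an assumption chosen to keep memory small.

```python
import numpy as np

gray = np.zeros((128, 256, 256), dtype=np.uint8)   # 128 grayscale images, no channel axis
gray_4d = np.expand_dims(gray, axis=-1)            # add a trailing channel axis
print(gray_4d.shape)   # (128, 256, 256, 1)

color = np.zeros((128, 256, 256, 3), dtype=np.uint8)  # 3 color channels (e.g. RGB)
print(color.shape)     # (128, 256, 256, 3)
```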
A 4D image data tensor (channels-first convention)
There are two conventions for shapes of image tensors: the channels-last
convention (used by TensorFlow) and the channels-first convention
(used by Theano). The TensorFlow machine-learning framework, from
Google, places the color-depth axis at the end: (samples, height, width,
color_depth).
Meanwhile, Theano places the color depth axis right after the batch
axis: (samples, color_depth, height, width). With the Theano
convention, the previous examples would become (128, 1, 256, 256)
and (128, 3, 256, 256). The Keras framework provides support for both
formats.
Theano is a Python library that allows us to evaluate mathematical
operations involving multi-dimensional arrays efficiently. It has mostly
been used to build deep learning projects. Theano runs these operations
much faster on a Graphics Processing Unit (GPU) than on a CPU.
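To convert between the two image-tensor conventions, you move the channel axis. np.transpose does this by returning a view of the original array rather than a copy. Here is a sketch with a placeholder batch:

```python
import numpy as np

# (samples, height, width, color_depth): channels-last
channels_last = np.zeros((128, 256, 256, 3), dtype=np.uint8)

# Move the channel axis (index 3) to position 1: channels-first
channels_first = np.transpose(channels_last, (0, 3, 1, 2))
print(channels_first.shape)  # (128, 3, 256, 256)

# And back again
restored = np.transpose(channels_first, (0, 2, 3, 1))
print(restored.shape)        # (128, 256, 256, 3)
```

In TensorFlow 2, tf.keras.backend.image_data_format() reports which convention Keras is currently configured to use ('channels_last' or 'channels_first').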
Video data
Video data is one of the few types of real-world data for which you’ll need
5D tensors. A video can be understood as a sequence of frames, each frame
being a color image. Because each frame can be stored in a 3D tensor
(height, width, color_depth), a sequence of frames can be stored in a 4D
tensor (frames, height, width, color_depth), and thus a batch of different
videos can be stored in a 5D tensor of shape (samples, frames, height, width,
color_depth).
For instance, a 60-second, 144 × 256 YouTube video clip sampled at 4 frames
per second would have 240 frames. A batch of four such video clips would be
stored in a tensor of shape (4, 240, 144, 256, 3). That’s a total of 106,168,320
values! If the dtype of the tensor were float32, then each value would be
stored in 32 bits (4 bytes), so the tensor would take up 405 MB (more
precisely, 405 MiB). Heavy! Videos you
encounter in real life are much lighter, because they aren’t stored in float32,
and they’re typically compressed by a large factor (such as in the MPEG
format).
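A few lines of Python reproduce the size arithmetic above without allocating any video data:

```python
import math

shape = (4, 240, 144, 256, 3)        # (samples, frames, height, width, color_depth)
frames = 60 * 4                      # 60 seconds sampled at 4 frames per second
assert shape[1] == frames

num_values = math.prod(shape)        # total number of scalar values
bytes_float32 = num_values * 4       # float32 = 32 bits = 4 bytes per value

print(num_values)                    # 106168320
print(bytes_float32 / 2**20)         # 405.0 (MiB)
```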