DL UNIT 3
UNIT -III
Neural Networks: Anatomy of Neural Network, Introduction to Keras: Keras, TensorFlow,
Theano and CNTK, Setting up Deep Learning Workstation, Classifying Movie Reviews: Binary
Classification, Classifying newswires: Multiclass Classification.
You can visualize the relationship between the network, its layers, the loss function, and the optimizer as illustrated in figure 3.1: the network, composed of layers that are chained together, maps the input data to predictions.
The loss function then compares these predictions to the targets, producing a loss value: a measure of how
well the network’s predictions match what was expected.
The optimizer uses this loss value to update the network’s weights.
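As a concrete illustration of this loop, here is a minimal sketch of one training step written with TensorFlow's GradientTape (Keras normally performs these steps for you inside fit(); the SGD optimizer and binary cross-entropy loss are just placeholder choices):
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.BinaryCrossentropy()

def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch)              # the layers map inputs to predictions
        loss = loss_fn(y_batch, predictions)      # the loss compares predictions to targets
    grads = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))  # the optimizer updates the weights
    return loss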
Consider a layer such as layers.Dense(32, input_shape=(784,)). We're creating a layer that will only accept as input 2D tensors where the first dimension is 784 (axis 0, the batch dimension, is unspecified, and thus any value would be accepted). This layer will return a tensor where the first dimension has been transformed to be 32.
When using Keras, you don't have to worry about compatibility, because the layers you add to your models are dynamically built to match the shape of the incoming layer. For instance, suppose you write the following (a sketch in the Sequential style, where only the first layer is given an explicit input_shape):
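from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))   # accepts inputs of shape (batch, 784), outputs (batch, 32)
model.add(layers.Dense(32))                       # input shape inferred from the previous layer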
The second layer didn’t receive an input shape argument—instead, it automatically inferred its input
shape as being the output shape of the layer that came before.
3.1.2 Models: networks of layers
A deep-learning model is a directed, acyclic graph of layers. The most common instance is a linear stack
of layers, mapping a single input to a single output.
But as you move forward, you’ll be exposed to a much broader variety of network topologies. Some
common ones include the following:
o Two-branch networks
o Multihead networks
o Inception blocks
Picking the right network architecture is more an art than a science; and although there are some best
practices and principles you can rely on, only practice can help you become a proper neural-network
architect.
3.1.3 Loss functions and optimizers: keys to configuring the learning process
Once the network architecture is defined, you still have to choose two more things:
• Loss function (objective function)—The quantity that will be minimized during training. It
represents a measure of success for the task at hand.
• Optimizer—Determines how the network will be updated based on the loss function.
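In Keras, both choices are supplied in the compile() step. A minimal sketch (the single-layer model, the RMSprop optimizer, and the binary cross-entropy loss here are only illustrative choices):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Dense(1, activation='sigmoid')])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),  # the optimizer
              loss='binary_crossentropy',                               # the loss (objective) function
              metrics=['accuracy'])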
Deep learning is one of the major subfields of machine learning. Machine learning is the study of the design of algorithms that learn from data, and neural networks in particular are inspired by a model of the human brain. Deep learning is becoming increasingly popular in data science fields such as robotics, artificial intelligence (AI), audio and video recognition, and image recognition. The artificial neural network is the core of deep learning methodologies. Deep learning is supported by various libraries such as Theano, TensorFlow, Caffe, MXNet, etc. Keras is one of the most powerful and easy-to-use Python libraries for creating deep learning models; it is built on top of popular deep learning libraries such as TensorFlow and Theano.
Overview of Keras
Keras runs on top of open-source machine learning libraries such as TensorFlow, Theano, and the Cognitive Toolkit (CNTK). Theano is a Python library used for fast numerical computation. TensorFlow is the most widely used symbolic math library for creating neural networks and deep learning models; it is very flexible, and one of its primary benefits is distributed computing. CNTK is a deep learning framework developed by Microsoft that can be used from Python, C#, or C++, or as a standalone machine learning toolkit. Theano and TensorFlow are very powerful libraries, but they are harder to use directly for creating neural networks.
Keras is built around a minimal structure that provides a clean and easy way to create deep learning models on top of TensorFlow or Theano. Keras is designed to let you define deep learning models quickly, which makes it an optimal choice for deep learning applications.
Features
Keras leverages various optimization techniques to make its high-level neural network API easier and more performant. It supports the following features:
• A consistent, simple, and extensible API.
• A minimal structure that makes it possible to get results without unnecessary complexity.
• Support for multiple platforms and multiple backends.
• A user-friendly framework that runs on both CPU and GPU.
Benefits
Keras is a highly powerful and dynamic framework and comes with the following advantages:
• A large community and strong ecosystem support.
• Models are written in Python, which keeps the code simple and easy to debug.
• Fast prototyping: models are easy to define, test, and modify.
• Support for both convolutional and recurrent networks, as well as combinations of the two.
Keras:
Keras is a high-level neural networks API written in Python, which serves as an interface for building and
training deep learning models. It's designed to be user-friendly, modular, and extensible, allowing
developers to quickly prototype and experiment with neural networks. Keras provides a simple and intuitive
API for constructing various types of neural network architectures, such as convolutional neural networks
(CNNs), recurrent neural networks (RNNs), and more. It supports both CPU and GPU computations.
TensorFlow:
TensorFlow is an open-source machine learning framework developed by Google Brain. It provides a
comprehensive ecosystem of tools, libraries, and resources for building and deploying machine learning
models. TensorFlow includes a low-level API that allows users to define and execute computational graphs,
as well as a high-level API called TensorFlow Keras, which integrates seamlessly with Keras. In fact, since
TensorFlow 2.0, Keras has become the official high-level API for building models in TensorFlow.
Theano:
Theano was an open-source numerical computation library developed by the Montreal Institute for Learning
Algorithms (MILA). It allowed users to define, optimize, and evaluate mathematical expressions involving
multi-dimensional arrays efficiently. Theano was widely used in the early days of deep learning, and Keras
originally supported it as one of its backends. However, Theano development ceased in 2017, and it's no
longer actively maintained.
CNTK (Microsoft Cognitive Toolkit):
The Microsoft Cognitive Toolkit, formerly known as CNTK, is an open-source deep learning framework
developed by Microsoft. Like TensorFlow and Theano, CNTK provides a scalable and efficient platform
for training deep learning models. Keras also supported CNTK as one of its backends, allowing users to
leverage the capabilities of CNTK while using the Keras API. However, CNTK has since been deprecated, and Microsoft has shifted its focus to supporting TensorFlow as the primary deep learning framework on its Azure platform.
Setting up a Deep Learning Workstation
The setup involves the following steps:
• Installation of TensorFlow
• Installing Keras
• Optional installation of Theano
I assume that you already have Ubuntu on your computer; if not, please install the latest version of Ubuntu, the most popular open-source Linux distribution. Although it is possible to run deep learning Keras models on Windows, it is not recommended.
Another prerequisite for running deep learning models is a good-quality GPU. I advise you to have an NVIDIA GPU in your computer for satisfactory performance. It is strongly recommended, although not strictly required, because running sequence processing with recurrent neural networks or image processing with convolutional models on a CPU is a difficult proposition.
Such models may take hours to produce results when run on a CPU, whereas a modern NVIDIA GPU may finish in merely 5-10 minutes. If you are not interested in investing in a GPU, an alternative is to use a cloud computing service and pay an hourly rent.
However, in the long run, using such a service may cost you more than upgrading your local system. So my suggestion is: if you are serious about deep learning and expect even moderate use, go for a good workstation setup.
On a new machine, the usual pip installation command may fail. This created a big problem for me, as I was clueless about why it was happening; on my old computer I had used it a number of times without any issue. After scouring the internet for several hours, I found the solution: it has to do with the Python version installed on your computer.
If you are also facing this problem (most likely if you are using a new computer), first check the Python version with this command.
$ ls /bin/python*
If it returns Python 2 (for example, Python 2.7), use the python2-pip package, or if it returns a higher Python version such as Python 3.8, use the python3-pip package to install pip. So the command will now be as below:
$ sudo apt-get install python3-pip
Ubuntu by default uses Python 2 when updating its packages. If you want to use Python 3, it needs to be mentioned explicitly; 'python' on its own means Python 2 on Ubuntu. So, to install the Python 3 packages, use the following command.
# Installing Python3
$ sudo apt-get install python3-pip python3-dev
Installation steps for the Python scientific suite on Ubuntu
The process discussed here is for Windows and Linux operating systems. Mac users need to install the Python scientific suite via Anaconda, which they can install from the Anaconda repository. Anaconda is continuously updated, and its documentation describes every step in detail.
Installation of the BLAS library
Installing a Basic Linear Algebra Subprograms (BLAS) library is the first step in setting up your deep learning workstation. One thing Mac users should keep in mind is that this installation does not include Graphviz and HDF5, which they have to install separately.
Using the directory nomenclature of a computer file system as an analogy, in the HDF5 data format a "group" plays the role of a directory or folder and a "dataset" plays the role of a file. HDF5 is important in deep learning because it is used to save Keras models to disk and fetch them back.
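For example, saving and reloading a Keras model in the HDF5 format looks like this (a minimal sketch; the tiny model and the filename are only illustrative):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Dense(1, input_shape=(4,))])   # any Keras model
model.compile(optimizer='sgd', loss='mse')
model.save('my_model.h5')                        # writes architecture, weights, and optimizer state to HDF5
model = keras.models.load_model('my_model.h5')   # reads the model back from disk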
In the next step, we will install two packages called Graphviz and pydot-ng. These two packages are necessary to visualize the Keras model. The commands for installing these two packages are as follows:
# Install graphviz
$ sudo apt-get install graphviz
# Install pydot-ng
$ sudo pip install pydot-ng
These two packages will definitely help you when visualizing the deep learning models you create. But for the time being, you can skip their installation and proceed with the GPU configuration part; Keras can also function without these two packages.
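Once graphviz and pydot are available, a Keras model can be visualized with plot_model (a sketch; the example model, the filename, and the show_shapes option are illustrative):
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model

model = keras.Sequential([layers.Dense(32, input_shape=(784,)), layers.Dense(1)])
plot_model(model, to_file='model.png', show_shapes=True)   # writes a diagram of the layer graph to model.png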
# Install opencv
$ sudo apt-get install python-opencv
Setting up the GPU for deep learning
Here comes the most important part. As you know, the GPU plays an important role in deep learning modelling. In this section, we are going to set up GPU support by installing two components, namely CUDA and cuDNN; to function properly, they need an NVIDIA GPU.
Although you can run your Keras model on the CPU alone, it will take much longer to train a model than it would on a GPU. So my advice is: if you are serious about deep learning modelling, plan to procure an NVIDIA GPU (using a cloud service and paying hourly rent is also an alternative).
Let's concentrate on setting up the GPU, assuming that your computer already has a recent NVIDIA GPU.
CUDA installation
To install CUDA, visit the NVIDIA download page at https://developer.nvidia.com/cuda-downloads. The page will ask you to select the OS you are using; as we are using Ubuntu here (to know why Ubuntu is the preferred OS, read this article), click Ubuntu.
Finally, you have to select the installer type. Here I have selected the network installer, mainly because it has a comparatively smaller download size; I am using my mobile internet for the time being, so it was the best option for me. But you can choose any of the local installation options if there is no constraint on internet bandwidth. The plus point of a local installation is that you have to download it only once.
To download the specific cuDNN file for your operating system and Linux distribution, you have to visit the NVIDIA download page.
Downloading cuDNN
To download the library, you have to create an account with NVIDIA. It is a compulsory step.
Once you are in the directory where the library has been downloaded (by default, the Downloads folder of your computer), run the command below, using the actual filename in place of ****.
I hope this article proves helpful in setting up your deep learning workstation. It is indeed a lengthy article, but it covers all the technicalities you may need in case of any difficulty during the process. A little knowledge about every component you are installing also helps you make further changes to the setup.
Two-class classification, or binary classification, may be the most widely applied kind of machine-learning problem.
In this example, you’ll learn to classify movie reviews as positive or negative, based on the text content of the reviews.
3.4.1 The IMDB dataset
You’ll work with the IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie Database.
They’re split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and
50% positive reviews.
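The dataset comes packaged with Keras and can be loaded like this:
from tensorflow.keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)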
The argument num_words=10000 means you'll only keep the top 10,000 most frequently occurring words in the training data. Rare words will be discarded. This allows you to work with vector data of manageable size.
The variables train_data and test_data are lists of reviews; each review is a list of word indices (encoding a sequence of words). train_labels and test_labels are lists of 0s and 1s, where 0 stands for negative and 1 stands for positive.
Because you’re restricting yourself to the top 10,000 most frequent words, no word index will exceed 10,000:
>>> max([max(sequence) for sequence in train_data])
9999
For kicks, here’s how you can quickly decode one of these reviews back to English words:
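A sketch of that decoding step (word indices are offset by 3 because indices 0, 1, and 2 are reserved for padding, start-of-sequence, and unknown tokens):
word_index = imdb.get_word_index()                                        # maps words to integer indices
reverse_word_index = dict((value, key) for (key, value) in word_index.items())
decoded_review = ' '.join(reverse_word_index.get(i - 3, '?') for i in train_data[0])
print(decoded_review)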
1. Pad your lists so that they all have the same length, turn them into an integer tensor of shape (samples,
word_indices), and then use as the first layer in your network a layer capable of handling such integer tensors (the
Embedding layer, which we’ll cover in detail later in the book).
2. One-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning the
sequence [3, 5] into a 10,000-dimensional vector that would be all 0s except for indices 3 and 5, which would be 1s.
Then you could use as the first layer in your network a Dense layer, capable of handling floating-point vector data.
Let's go with the latter solution to vectorize the data, which you'll do manually for maximum clarity (see the vectorize_sequences function in the Data Prep section below).
Finally, you need to choose a loss function and an optimizer. Because you’re facing a binary classification problem
and the output of your network is a probability (you end your network with a single-unit layer with a sigmoid
activation), it’s best to use the binary_crossentropy loss. It isn’t the only viable choice: you could use, for instance,
mean_squared_error. But crossentropy is usually the best choice when you’re dealing with models that output
probabilities.
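A minimal sketch of the model and compilation step being described (the two intermediate layers of 16 units follow the book's example; the exact sizes are not essential):
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')     # single probability output for binary classification
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])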
You’ll now train the model for 20 epochs (20 iterations over all samples in the x_train and y_train tensors), in mini-
batches of 512 samples. At the same time, you’ll monitor loss and accuracy on the 10,000 samples that you set apart.
You do so by passing the validation data as the validation_data argument.
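A sketch of that training call, assuming x_train holds the vectorized reviews and train_labels comes from imdb.load_data; the first 10,000 samples are set apart for validation:
import numpy as np

y_train = np.asarray(train_labels).astype('float32')   # labels as a float vector of 0s and 1s

x_val = x_train[:10000]                                 # 10,000 samples held out for validation
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))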
Let's use Matplotlib to plot the training and validation loss side by side (see figure 3.7), as well as the training and validation accuracy (see figure 3.8).
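A sketch of the loss plot (the accuracy curves can be plotted the same way using the accuracy keys stored in history.history; exact key names such as 'accuracy' vs. 'acc' depend on the Keras version):
import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, 'bo', label='Training loss')       # dots for training loss
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')  # line for validation loss
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()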
As you can see, the training loss decreases with every epoch, and the training accuracy increases with every epoch. That's what you would expect when running gradient-descent optimization: the quantity you're trying to minimize should be less with every iteration.
Data Prep
Vectorize the input data:
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.                # set indices of the words present in the sequence to 1
    return results

x_train = vectorize_sequences(train_data)        # 1. Vectorized training data
x_test = vectorize_sequences(test_data)          # 2. Vectorized test data
Model Definition
model = keras.Sequential([
layers.Dense(64, activation='relu'),
layers.Dense(64, activation='relu'),
layers.Dense(46, activation='softmax')
])
Notes about this architecture:
1. We end the model with a Dense layer of size 46. This means for each input sample, the network will output a
46-dimensional vector. Each entry in this vector (each dimension) will encode a different output class.
2. The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the model will
output a probability distribution over the 46 different output classes — for every input sample, the model will
produce a 46-dimensional output vector, where output[i] is the probability that the sample belongs to class i. The
46 scores will sum to 1.
3. The best loss function to use in this case is categorical_crossentropy. It measures the distance between two
probability distributions: here, between the probability distribution output by the model and the true
distribution of the labels. By minimizing the distance between these two distributions, you train the model to
output something as close as possible to the true labels.
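The compilation step for this model (a sketch; rmsprop is just one reasonable optimizer choice). The training output below was produced by fitting the compiled model for 20 epochs.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',   # distance between the predicted and true label distributions
              metrics=['accuracy'])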
Epoch 1/20
16/16 [==============================] - 2s 81ms/step - loss: 3.1029 - accuracy: 0.4079 - val_loss: 1.7132 - val_accuracy: 0.6440
...
Epoch 14/20
16/16 [==============================] - 1s 37ms/step - loss: 0.1438 - accuracy: 0.9574 - val_loss: 0.9254 - val_accuracy: 0.8190
...
Epoch 20/20
(per-epoch metrics for the remaining epochs omitted)