UNIT 1
DEEP LEARNING
Deep learning is an aspect of artificial intelligence (AI) that is concerned with emulating
the learning approach that human beings use to gain certain types of knowledge. At its
simplest, deep learning can be thought of as a way to automate predictive analytics.
Deep learning algorithms seek to exploit the unknown structure in the input distribution
in order to discover good representations, often at multiple levels, with higher-level
learned features defined in terms of lower-level features.
Deep learning models are trained by using large sets of labeled data and neural network
architectures that learn features directly from the data without the need for manual feature
extraction.
Deep learning is a specific subset of Machine Learning, which is a specific subset of
Artificial Intelligence.
Artificial Intelligence is the broad mandate of creating machines that can think intelligently.
Machine Learning is one way of doing that, by using algorithms to glean insights from data.
Deep Learning is one way of doing machine learning, using a specific algorithm called a Neural Network.
Neural networks are inspired by the structure of the cerebral cortex. At the basic level is the
perceptron, the mathematical representation of a biological neuron.
Like in the cerebral cortex, there can be several layers of interconnected perceptrons.
Input values, or in other words our underlying data, get passed through this “network” of
hidden layers until they eventually converge to the output layer.
The output layer is our prediction: it might be one node if the model just outputs a number, or
a few nodes if it’s a multiclass classification problem.
The hidden layers of a Neural Net perform modifications on the data to gradually work out its relationship with the target variable.
Each node has a weight, and it multiplies its input value by that weight. Do that over a few
different layers, and the Net is able to essentially manipulate the data into something
meaningful.
To figure out what these small weights should be, we typically use an algorithm called
Backpropagation.
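To make the weighting idea concrete, here is a minimal sketch of a single perceptron-style forward pass (my own illustration in Python with NumPy, not part of the original notes); the input values, weights, bias, and sigmoid activation are all arbitrary choices, and in practice backpropagation would adjust the weights.

import numpy as np

def sigmoid(z):
    # Squash the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# One data point with three input features (made-up values)
x = np.array([0.5, -1.2, 3.0])

# Weights and bias would normally be learned via backpropagation;
# here they are just illustrative starting values
w = np.array([0.8, 0.1, -0.4])
b = 0.2

# Each input is multiplied by its weight, the results are summed,
# and the sum is passed through the activation function
output = sigmoid(np.dot(w, x) + b)
print(output)  # this node's output, which a later layer (or the output layer) would consume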
The great reveal about Neural Nets is that they aren’t all that smart – they’re basically just
feeling around, through trial and error, to try and find the relationships in your data.
Picture a hiker trying to find her way down a mountain in thick fog. The hiker doesn’t actually know where she’s going – she just feels around to find a path that might take her down the mountain. Our algorithm is the same – it’s feeling around to figure out how to make the most accurate predictions. The final values that each of our nodes in a Neural Net takes on are a reflection of that process.
In the 1980s, most neural networks had only a single layer, due to the cost of computation and the limited availability of data.
Nowadays we can afford to have more hidden layers in our Neural Nets, hence the moniker
“Deep” Learning.
The different types of Neural Networks available for use have also proliferated. Models like
Convolutional Neural Nets, Recurrent Neural Nets, and Long Short-Term Memory are
finding compelling use cases across the board.
Why is Deep Learning Important?
Deep Learning is important for one reason, and one reason only: we’ve been able to achieve
meaningful, useful accuracy on tasks that matter.
Machine Learning has been used for classification on images and text for decades, but it
struggled to cross the threshold – there’s a baseline accuracy that algorithms need to have to
work in business settings.
Deep Learning is finally enabling us to cross that line in places we weren’t able to before.
Applications
Computer vision is a great example of a task that Deep Learning has transformed into
something realistic for business applications. Using Deep Learning to classify and label
images isn’t only better than any traditional algorithm: it’s starting to be better than actual humans.
Speech recognition is another area that has felt Deep Learning’s impact. Spoken languages are vast and ambiguous. Baidu – one of the leading search engines of China – has developed a voice recognition system that is faster and more accurate than humans at producing text on a mobile phone, in both English and Mandarin.
Deep Learning is important because it finally makes these tasks accessible – it brings previously
irrelevant workloads into the purview of Machine Learning.
Deep Learning Vs Machine Learning
Machine learning offers a variety of techniques and models you can choose based on
your application, the size of data you're processing, and the type of problem you want to
solve.
A successful deep learning application requires a very large amount of data (thousands of
images) to train the model, as well as GPUs, or graphics processing units, to rapidly
process your data.
When choosing between machine learning and deep learning, consider whether you have
a high-performance GPU and lots of labeled data.
Deep learning is generally more complex, so you’ll need at least a few thousand images
to get reliable results. Having a high-performance GPU means the model will take less
time to analyze all those images.
How to Create and Train Deep Learning Models
The three most common ways people use deep learning to perform object classification are:
Training from Scratch
To train a deep network from scratch, you gather a very large labeled data set and design
a network architecture that will learn the features and model.
This is good for new applications, or applications that will have a large number of output
categories.
This is a less common approach because with the large amount of data and rate of
learning, these networks typically take days or weeks to train.
Transfer Learning
Most deep learning applications use the transfer learning approach, a process that
involves fine-tuning a pretrained model.
You start with an existing network, such as AlexNet or GoogLeNet, and feed in new data
containing previously unknown classes.
After making some tweaks to the network, you can now perform a new task, such as
categorizing only dogs or cats instead of 1000 different objects. This also has the
advantage of needing much less data (processing thousands of images, rather than
millions), so computation time drops to minutes or hours.
Transfer learning requires an interface to the internals of the pre-existing network, so it
can be surgically modified and enhanced for the new task.
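As a concrete illustration of this transfer learning workflow, here is a hedged Keras sketch (my own, not from the notes). The original text mentions AlexNet and GoogLeNet; MobileNetV2 is used here simply because it ships with Keras, and the folder name "train_dir", the image size, and the two-class dogs-vs-cats setup are hypothetical.

from tensorflow import keras

# Load a network pretrained on ImageNet, dropping its original 1000-class head
base = keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(160, 160, 3))
base.trainable = False  # freeze the pretrained feature layers

# Add a small new head for the new two-class task (e.g., dogs vs. cats)
model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # scale pixels to [-1, 1] as MobileNetV2 expects
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# "train_dir" is a hypothetical folder of labeled images, one subfolder per class
train_ds = keras.utils.image_dataset_from_directory(
    "train_dir", image_size=(160, 160), batch_size=32)
model.fit(train_ds, epochs=5)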
Feature Extraction
A slightly less common, more specialized approach to deep learning is to use the network
as a feature extractor.
Since all the layers are tasked with learning certain features from images, we can pull
these features out of the network at any time during the training process.
These features can then be used as input to a machine learning model such as support
vector machines (SVM).
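A hedged sketch (mine, not from the notes) of the feature-extraction approach: a pretrained network turns each image into a feature vector, and a classical SVM does the final classification. The choice of pretrained model (VGG16), the image size, and the random placeholder data are all assumptions for illustration.

import numpy as np
from tensorflow import keras
from sklearn.svm import SVC

# Use a pretrained network purely as a fixed feature extractor
extractor = keras.applications.VGG16(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))

# Placeholder data standing in for real preprocessed images and their labels
images = np.random.rand(20, 224, 224, 3).astype("float32")
labels = np.random.randint(0, 2, size=20)

features = extractor.predict(images)  # one feature vector per image
svm = SVC(kernel="linear")
svm.fit(features, labels)             # the SVM learns the final decision boundary
print(svm.predict(features[:5]))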
HISTORY OF AI
1943 – The first mathematical model of a neural network: Walter Pitts and Warren
McCulloch
Walter Pitts, a logician, and Warren McCulloch, a neuroscientist, gave us that piece of
the puzzle in 1943 when they created the first mathematical model of a neural network.
Published in their seminal work “A Logical Calculus of Ideas Immanent in Nervous
Activity”, they proposed a combination of mathematics and algorithms that aimed to
mimic human thought processes.
Their model – typically called McCulloch-Pitts neurons – is still the standard today
(although it has evolved over the years).
1952 – The first computer learning programs: Arthur Samuel
Upon joining the Poughkeepsie Laboratory at IBM, Arthur Samuel would go on to create the first computer learning programs. The programs were built to play the game of checkers.
Arthur Samuel’s program was unique in that each time checkers was played, the
computer would always get better, correcting its mistakes and finding better ways to win
from that data. This automatic learning would be one of the first examples of machine
learning.
1957 – Setting the foundation for deep neural networks: Frank Rosenblatt
In 1957, Frank Rosenblatt developed the perceptron, a simple algorithm for learning a binary classifier from examples (later realized in hardware as the Mark I Perceptron). Layered perceptrons of this kind are the foundation of today's deep neural networks.
1959 – Discovery of simple cells and complex cells: David H. Hubel and Torsten Wiesel
In 1959, neurophysiologists and Nobel Laureates David H. Hubel and Torsten Wiesel
discovered two types of cells in the primary visual cortex: simple cells and complex cells.
Many artificial neural networks (ANNs) are inspired by these biological observations in
one way or another. While not a milestone for deep learning specifically, it was definitely
one that heavily influenced the field.
1960 – Control theory and the roots of backpropagation: Henry J. Kelley
Henry J. Kelley was a professor of aerospace and ocean engineering at the Virginia Polytechnic Institute.
In 1960, he published “Gradient Theory of Optimal Flight Paths,” itself a major and
widely recognized paper in his field.
Many of his ideas about control theory – the behavior of systems with inputs, and how
that behavior is modified by feedback – have been applied directly to AI and ANNs over
the years.
They were used to develop the basics of a continuous backpropagation model (aka the
backward propagation of errors) used in training neural networks.
1965 – The first working deep learning networks: Alexey Ivakhnenko and V.G. Lapa
Mathematician Ivakhnenko and associates including Lapa arguably created the first
working deep learning networks in 1965, applying what had been only theories and ideas
up to that point.
Ivakhnenko developed the Group Method of Data Handling (GMDH) – defined as a
“family of inductive algorithms for computer-based mathematical modeling of multi-
parametric datasets that features fully automatic structural and parametric optimization of
models” – and applied it to neural networks.
For that reason alone, many consider Ivakhnenko the father of modern deep learning.
His learning algorithms used deep feedforward multilayer perceptrons using statistical
methods at each layer to find the best features and forward them through the system.
Using GMDH, Ivakhnenko was able to create an 8-layer deep network in 1971, and he
successfully demonstrated the learning process in a computer identification system called
Alpha.
1979-80 – An ANN learns how to recognize visual patterns: Kunihiko Fukushima
A recognized innovator in neural networks, Fukushima is perhaps best known for the creation of the Neocognitron, an artificial neural network that learned how to recognize visual patterns.
It has been used for handwritten character and other pattern recognition tasks,
recommender systems, and even natural language processing.
His work – which was heavily influenced by Hubel and Wiesel – led to the development
of the first convolutional neural networks, which are based on the visual cortex
organization found in animals.
They are variations of multilayer perceptrons designed to use minimal amounts of
preprocessing.
In 1982, Hopfield created and popularized the system that now bears his name.
Hopfield Networks are a form of recurrent neural network that serves as a content-addressable memory system, and they remain a popular implementation tool for deep learning in the 21st century.
LeCun – another rock star in the AI and DL universe – combined convolutional neural
networks (which he was instrumental in developing) with recent backpropagation
theories to read handwritten digits in 1989.
His system was eventually used to read handwritten checks and zip codes by NCR and
other companies, processing anywhere from 10-20% of cashed checks in the United
States in the late 90s and early 2000s.
Watkins published his PhD thesis – “Learning from Delayed Rewards” – in 1989. In it,
he introduced the concept of Q-learning, which greatly improves the practicality and
feasibility of reinforcement learning in machines.
This new algorithm suggested it was possible to learn optimal control directly without
modelling the transition probabilities or expected rewards of the Markov Decision
Process.
German computer scientist Schmidhuber solved a “very deep learning” task in 1993 that
required more than 1,000 layers in the recurrent neural network.
It was a huge leap forward in the complexity and ability of neural networks.
Support vector machines – or SVMs – have been around since the 1960s, tweaked and
refined by many over the decades.
The current standard model was designed by Cortes and Vapnik in 1993 and presented in
1995.
An SVM is basically a system for recognizing and mapping similar data, and it can be used in text categorization, handwritten character recognition, and image classification as it relates to machine learning and deep learning.
1997 – Long short-term memory was proposed: Jürgen Schmidhuber and Sepp Hochreiter
A recurrent neural network framework, long short-term memory (LSTM) was proposed
by Schmidhuber and Hochreiter in 1997.
They improve both the efficiency and practicality of recurrent neural networks by
eliminating the long-term dependency problem (when necessary information is located
too far “back” in the RNN and gets “lost”). LSTM networks can “remember” that
information for a longer period of time.
Refined over time, LSTM networks are widely used in DL circles, and Google has implemented them in its speech-recognition software for Android-powered smartphones.
LeCun was instrumental in yet another advancement in the field of deep learning when he
published his “Gradient-Based Learning Applied to Document Recognition” paper in
1998.
The stochastic gradient descent algorithm (aka gradient-based learning), combined with the backpropagation algorithm, is the preferred and increasingly successful approach to deep learning.
A professor and head of the Artificial Intelligence Lab at Stanford University, Fei-Fei Li
launched ImageNet in 2009.
As of 2017, it’s a very large and free database of more than 14 million (14,197,122 at last
count) labeled images available to researchers, educators, and students.
Labeled data – such as these images – are needed to “train” neural nets in supervised
learning.
Images are labeled and organized according to WordNet, a lexical database of English words – nouns, verbs, adverbs, and adjectives – sorted by groups of synonyms called synsets.
Between 2011 and 2012, Alex Krizhevsky won several international machine and deep
learning competitions with his creation AlexNet, a convolutional neural network.
AlexNet built off and improved upon LeNet-5 (built by Yann LeCun years earlier). It initially contained only eight layers – five convolutional followed by three fully connected layers – and used rectified linear units (ReLU) to speed up training and dropout to reduce overfitting.
Its success kicked off a convolutional neural network renaissance in the deep learning
community.
2012 – The Cat Experiment
It may sound cute and insignificant, but the so-called “Cat Experiment” was a major step forward. Using a neural network spread over thousands of computers, the Google Brain team presented 10,000,000 unlabeled images – randomly taken from YouTube – to the system and allowed it to run analyses on the data.
When this unsupervised learning session was complete, the program had taught itself to identify and recognize cats, performing nearly 70% better than previous attempts at unsupervised learning.
It wasn’t perfect, though. The network recognized only about 15% of the presented
objects. That said, it was yet another baby step towards genuine AI.
2014 – DeepFace
Monster platforms are often the first to think outside the box, and none is bigger than Facebook.
Developed and released to the world in 2014, the social media behemoth’s deep learning
system – nicknamed DeepFace – uses neural networks to identify faces with 97.35%
accuracy. That’s an improvement of 27% over previous efforts, and a figure that rivals
that of humans (which is reported to be 97.5%).
Cray Inc., as well as many other businesses like it, are now able to offer powerful
machine and deep learning products and solutions.
Using Microsoft’s neural-network software on its XC50 supercomputers with 1,000 Nvidia Tesla P100 graphics processing units, they can perform deep learning tasks on data in a fraction of the time they used to take – hours instead of days.
1960s: Shallow neural networks
1960-70s: Backpropagation emerges
1974-80: First AI Winter
1980s: Convolution emerges
1987-93: Second AI Winter
1990s: Unsupervised deep learning
1990s-2000s: Supervised deep learning back en vogue
2006-present: Modern deep learning
DEEP LEARNING FRAMEWORKS
1. TensorFlow
TensorFlow is arguably one of the best deep learning frameworks and has been adopted by several giants such as Airbus, Twitter, IBM, and others, mainly due to its highly flexible system architecture.
The most well-known use case of TensorFlow has got to be Google Translate coupled
with capabilities such as natural language processing, text classification/summarization,
speech/image/handwriting recognition, forecasting, and tagging.
TensorFlow is available on both desktop and mobile and also supports languages such as
Python, C++, and R to create deep learning models along with wrapper libraries.
TensorFlow comes with two tools that are widely used:
1. TensorBoard for the effective data visualization of network modeling and
performance.
2. TensorFlow Serving for the rapid deployment of new algorithms/experiments while
retaining the same server architecture and APIs. It also provides integration with other
TensorFlow models, which is different from conventional practices and can be extended
to serve other model and data types.
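As a small, hedged sketch (mine, not from the notes) of how these pieces fit together: a tiny tf.keras model is trained on synthetic data while the TensorBoard callback writes logs that can later be visualized with `tensorboard --logdir logs`. The layer sizes, the random data, and the log directory are arbitrary assumptions.

import numpy as np
import tensorflow as tf

# Tiny synthetic dataset, just to have something to fit
x = np.random.rand(256, 10).astype("float32")
y = (x.sum(axis=1) > 5.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The TensorBoard callback writes training curves and the model graph to "logs/"
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs")
model.fit(x, y, epochs=5, batch_size=32, callbacks=[tensorboard_cb])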
2. Caffe
Caffe is a deep learning framework that is supported with interfaces like C, C++, Python,
and MATLAB as well as the command line interface.
It is well known for its speed and transposability and its applicability in modeling
convolution neural networks (CNN).
The biggest benefit of using Caffe’s C++ library (which comes with a Python interface) is the
ability to access available networks from the deep net repository Caffe Model Zoo that
are pre-trained and can be used immediately.
When it comes to modeling CNNs or solving image processing issues, this should be
your go-to library.
Caffe’s biggest USP is speed. It can process over 60 million images on a daily basis with
a single Nvidia K40 GPU. That’s 1 ms/image for inference and 4 ms/image for learning
— and more recent library versions are faster still.
Caffe is a popular deep learning network for visual recognition. However, Caffe does not
support fine-granular network layers like those found in TensorFlow or CNTK.
Given its architecture, its overall support for recurrent networks and language modeling is quite poor, and establishing complex layer types has to be done in a low-level language.
3. Microsoft Cognitive Toolkit (CNTK)
Popularly known for easy training and the combination of popular model types across servers, the Microsoft Cognitive Toolkit (previously known as CNTK) is an open-source deep learning framework used to train deep learning models.
It performs efficient convolution neural networks and training for image, speech, and
text-based data. Similar to Caffe, it is supported by interfaces such as Python, C++, and
the command line interface.
Given its coherent use of resources, the implementation of reinforcement learning models
or generative adversarial networks (GANs) can be done easily using the toolkit. It is
known to provide higher performance and scalability as compared to toolkits like Theano
or TensorFlow while operating on multiple machines.
Compared to Caffe, when it comes to inventing new complex layer types, users don’t
need to implement them in a low-level language due to the fine granularity of the
building blocks.
The Microsoft Cognitive Toolkit supports both RNN and CNN types of neural models
and thus is capable of handling images, handwriting, and speech recognition problems.
Currently, due to the lack of support on ARM architecture, its capabilities on mobile are
fairly limited.
4. Torch/PyTorch
Torch is a scientific computing framework that offers wide support for machine learning
algorithms.
It is a Lua-based deep learning framework and is used widely amongst industry giants
such as Facebook, Twitter, and Google. It employs CUDA along with C/C++ libraries for
processing, and was basically built to scale the production of models and provide overall flexibility.
As of late, PyTorch has seen a high level of adoption within the deep learning framework
community and is considered to be a competitor to TensorFlow.
PyTorch is basically a port of the Torch deep learning framework, used for constructing deep neural networks and executing tensor computations of high complexity.
As opposed to Torch, PyTorch runs on Python, which means that anyone with a basic
understanding of Python can get started on building their own deep learning models.
Given PyTorch framework’s architectural style, the entire deep modeling process is far
simpler as well as transparent compared to Torch.
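To give a feel for the tensor computations and automatic differentiation that PyTorch exposes through Python, here is a minimal sketch (my own illustration; the values and the target of 4.0 are arbitrary).

import torch

# Tensors with requires_grad=True are tracked for automatic differentiation
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# A simple computation: weighted sum, then a squared error against a target value
y = (w * x).sum()
loss = (y - 4.0) ** 2

# backward() fills in w.grad with d(loss)/dw, which an optimizer would use to update w
loss.backward()
print(loss.item(), w.grad)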
5. MXNet
Designed specifically for the purpose of high efficiency, productivity, and flexibility,
MXNet (pronounced as mix-net) is a deep learning framework supported by Python, R,
C++, and Julia.
The beauty of MXNet is that it gives the user the ability to code in a variety of
programming languages. This means that you can train your deep learning models with
whichever language you are comfortable in without having to learn something new from
scratch.
With the backend written in C++ and CUDA, MXNet is able to scale and work with a
myriad of GPUs, which makes it indispensable to enterprises. Case in point: Amazon
employed MXNet as its reference library for deep learning.
MXNet supports long short-term memory (LSTM) networks along with both RNNs and CNNs.
This deep learning framework is known for its capabilities in imaging,
handwriting/speech recognition, forecasting, and NLP.
6. Chainer
Chainer is a Python-based deep learning framework developed by Preferred Networks, best known for pioneering the define-by-run (dynamic computation graph) approach later adopted by frameworks such as PyTorch.
7. Keras
Well known for being minimalistic, the Keras neural network library (with a supporting
interface of Python) supports both convolutional and recurrent networks that are capable
of running on either TensorFlow or Theano.
The library is written in Python and was developed keeping quick experimentation as its
USP.
Because the TensorFlow interface is a tad challenging and, being a relatively low-level library, can be intricate for new users, Keras was built to provide a simplistic interface for quick prototyping of effective neural networks that can work with TensorFlow.
Lightweight, easy to use, and really straightforward when it comes to building a deep
learning model by stacking multiple layers: that is Keras in a nutshell. These are the very
reasons why Keras is a part of TensorFlow’s core API.
The primary usage of Keras is in classification, text generation and summarization,
tagging, and translation, along with speech recognition and more. If you happen to be a
developer with some experience in Python and wish to dive into deep learning, Keras is
something you should definitely check out.
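To show what stacking layers looks like in practice, here is a minimal Keras sketch (my own; the layer sizes, the three-class setup, and the synthetic data are illustrative assumptions, not something from the notes).

import numpy as np
from tensorflow import keras

# Define a model by stacking layers one after another
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),  # e.g., a 3-class classification problem
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic placeholder data, just to demonstrate the training call
x = np.random.rand(300, 8).astype("float32")
y = np.random.randint(0, 3, size=300)
model.fit(x, y, epochs=5, batch_size=32)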
8. Deeplearning4j
Deeplearning4j is a deep learning library written for Java and the JVM, aimed at business environments and able to run distributed training on top of Hadoop and Apache Spark.
Gradient-Based Optimization
Gradient-based optimization is a class of algorithms used to find the minimum or maximum of a function by iteratively adjusting parameters based on the gradient. It is used extensively in machine learning, deep learning, and various fields with optimization problems.
Objective: Minimize (or maximize) an objective function.
1. Objective Function - The function that measures performance, often a loss or cost function in machine learning.
2. Gradient - A vector pointing in the direction of steepest increase of the function. The negative gradient guides parameter updates.
3. Learning Rate - A hyperparameter controlling the step size in each iteration. Choosing an appropriate value is important: too large a step can overshoot the minimum, while too small a step makes convergence slow. A short sketch of the basic update rule follows this list.
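To make the update rule concrete, here is a minimal gradient descent sketch in plain Python (my own illustration; the one-dimensional objective, starting point, and learning rate are arbitrary): each step moves the parameter a small distance against the gradient.

# Minimize f(x) = (x - 3)^2, whose minimum is at x = 3
def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    # Analytic gradient of f: df/dx = 2 * (x - 3)
    return 2.0 * (x - 3.0)

x = 0.0              # arbitrary starting point
learning_rate = 0.1  # step size: too large can overshoot or diverge, too small converges slowly

for step in range(50):
    x = x - learning_rate * grad_f(x)  # move against the gradient

print(x, f(x))  # x ends up close to 3, f(x) close to 0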