AI and Deep Learning

This document provides an overview of artificial intelligence and machine learning. It discusses the history and foundations of AI, including key events and people involved in its development. Applications of AI are also examined, such as in healthcare, astronomy, gaming, finance, social media, and data security. The document then explores machine learning, its relationship to AI, and how it allows computers to learn from experience. Finally, applications of machine learning are described, including image recognition, speech recognition, and more.


UNIT 1

Foundations and History of Artificial Intelligence


The Roots of AI

 The history of AI is now a long one. Its birth coincides with the publication of the
question “Can machines think?”. This question, posed by Alan Turing in his imitation
game, is widely considered the beginning of AI.
 On the other hand, the term owes its paternity to John McCarthy, a computer scientist
who, in 1956, organized the Dartmouth conference at which the term was officially coined.
The initial enthusiasm was followed by the so-called “AI winter”, a period from the
1970s to the 1990s in which problems related to the capabilities of the available
instrumentation brought progress to an abrupt halt.
 Later, thanks to technological advancement, starting from the 2010s, AI has been enjoying a
new renaissance. In this new “AI spring”, AI in Medicine (AIM) is no exception.
 This was also made possible by the widespread digitalization of health data, which made it
possible to create big data systems capable of providing a solid basis for intelligent
algorithms.
 Borges do Nascimento et al. analyzed the impact of big data analysis on the health indicators
and core priorities described in the World Health Organization (WHO) General Programme of
Work 2019–2023 and in the European Programme of Work (EPW). The article highlighted how
the accuracy and management of some chronic diseases can be improved by supporting
real-time analysis for diagnostic and predictive purposes.

Applications of Artificial Intelligence


Artificial Intelligence has various applications in today's society. It has become essential
because it can solve complex problems efficiently in multiple industries, such as healthcare,
entertainment, finance, and education. AI is making our daily lives more comfortable and
faster.
1. AI in Astronomy
 Artificial Intelligence can be very useful for solving complex problems about the universe. AI
technology can help us understand the universe, such as how it works and how it originated.
2. AI in Healthcare
 Over the last five to ten years, AI has become increasingly valuable to the healthcare
industry and is going to have a significant impact on it.
 Healthcare industries are applying AI to make better and faster diagnoses than humans. AI
can help doctors with diagnoses and can warn them when patients are worsening, so that
medical help can reach the patient before hospitalization.
3. AI in Gaming
 AI can be used for gaming purposes. AI machines can play strategic games like chess,
where the machine needs to evaluate a large number of possible positions.
4. AI in Finance
 AI and the finance industry are an excellent match for each other. The finance industry is
implementing automation, chatbots, adaptive intelligence, algorithmic trading, and machine
learning in its financial processes.
5. AI in Data Security
 The security of data is crucial for every company, and cyber-attacks are growing rapidly
in the digital world. AI can be used to make data safer and more secure. Examples
such as the AEG bot and the AI2 platform are used to detect software bugs and cyber-attacks
more effectively.
6. AI in Social Media
 Social media sites such as Facebook, Twitter, and Snapchat contain billions of user profiles,
which need to be stored and managed very efficiently. AI can organize and manage
massive amounts of data, and it can analyse that data to identify the latest trends, hashtags,
and the requirements of different users.

Intelligent Agents:
An intelligent agent is an autonomous entity which acts upon an environment using sensors and
actuators to achieve goals. An intelligent agent may learn from the environment to achieve
its goals. A thermostat is an example of an intelligent agent.
Following are the main four rules for an AI agent:
o Rule 1: An AI agent must have the ability to perceive the environment.
o Rule 2: The observation must be used to make decisions.
o Rule 3: The decision should result in an action.
o Rule 4: The action taken by an AI agent must be a rational action.

Types of Environment 1
There are different sorts of environments, which affect what an agent has to be able to cope
with.
In designing agents, one should always consider the pair of agent and environment together.
• Fully Observable vs. Partially Observable: If an agent’s sensors give it full access to the
complete state of the environment, the environment is fully observable, otherwise it is only
partially observable or unobservable.
• Deterministic vs. Stochastic: If the next state of the environment is completely determined by
the current state and the agent’s selected action, the environment is deterministic. An
environment may appear stochastic if it is only partially observable.

Types of Environment 2
• Episodic vs. Sequential: If future decisions do not depend on the actions an agent has taken,
just the information from its sensors about the state it is in, then the environment is episodic.
• Static vs. Dynamic: If the environment can change while the agent is deciding what to do, the
environment is dynamic.

Types of Environment 3
• Discrete vs. Continuous: If the sets of percepts and actions available to the agent are finite,
and the individual elements are distinct and well-defined, then the environment is discrete.
• Single Agent vs. Multiagent: Must other entities in the environment be modelled as agents?
Are they cooperative or competitive?

Rationality:
The rationality of an agent is measured by its performance measure. Rationality can be judged
on the basis of following points:
o The performance measure, which defines the success criterion.
o The agent's prior knowledge of its environment.
o The best possible actions that the agent can perform.
o The sequence of percepts.

Structure of an AI Agent
The task of AI is to design an agent program which implements the agent function. The
structure of an intelligent agent is a combination of architecture and agent program. It can be
viewed as:
Agent = Architecture + Agent program
Following are the main three terms involved in the structure of an AI agent:
Architecture: Architecture is the machinery that the AI agent executes on.
Agent Function: The agent function maps a percept sequence to an action:
f : P* → A
Agent program: The agent program is an implementation of the agent function. An agent program
executes on the physical architecture to produce the function f.
At its core, an AI agent is made up of four components: the environment, sensors, actuators,
and the decision-making mechanism.
1. Environment
The environment refers to the area or domain in which an AI agent operates. It can be a physical
space, like a factory floor, or a digital space, like a website.
2. Sensors
Sensors are the tools that an AI agent uses to perceive its environment. These can be cameras,
microphones, or any other sensory input that the AI agent can use to understand what is
happening around it.
3. Actuators
Actuators are the tools that an AI agent uses to interact with its environment. These can be
things like robotic arms, computer screens, or any other device the AI agent can use to change
the environment.
4. Decision-making mechanism
A decision-making mechanism is the brain of an AI agent. It processes the information gathered
by the sensors and decides what action to take using the actuators. The decision-making
mechanism is where the real magic happens.
AI agents use various decision-making mechanisms, such as rule-based systems, expert systems,
and neural networks, to make informed choices and perform tasks effectively.
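To make the idea "Agent = Architecture + Agent program" concrete, here is a minimal Python sketch of a thermostat-style reflex agent. The class and method names are illustrative (not from any particular library); it simply maps a percept sequence to an action, as in f : P* → A.

# Minimal sketch of a reflex agent: the agent program maps a percept
# (the sensed temperature) to an action; all names are illustrative.

class ThermostatAgent:
    def __init__(self, target_temp=22.0):
        self.target_temp = target_temp   # prior knowledge about the goal
        self.percept_history = []        # the sequence of percepts P*

    def perceive(self, temperature):
        """Sensor input: record the current temperature reading."""
        self.percept_history.append(temperature)
        return temperature

    def agent_program(self, percept):
        """Agent function f: percept -> action (a simple condition-action rule)."""
        if percept < self.target_temp - 1:
            return "turn_heater_on"
        elif percept > self.target_temp + 1:
            return "turn_heater_off"
        return "do_nothing"

    def act(self, temperature):
        """One perceive-decide-act cycle; the returned action drives the actuator."""
        return self.agent_program(self.perceive(temperature))


if __name__ == "__main__":
    agent = ThermostatAgent(target_temp=22.0)
    for reading in [18.5, 21.9, 24.3]:
        print(reading, "->", agent.act(reading))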

UNIT 2

Introduction to Machine Learning and its Applications


 Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to
“self-learn” from training data and improve over time, without being explicitly
programmed.
 Machine learning algorithms are able to detect patterns in data and learn from them, in
order to make their own predictions. In short, machine learning algorithms and models
learn through experience.
 Machine learning is an automated process that enables machines to solve problems with
little or no human input, and take actions based on past observations.
 While artificial intelligence and machine learning are often used interchangeably, they are
two different concepts.
 Machine learning can be put to work on massive amounts of data and can often perform
certain tasks more accurately than humans.
 It can help you save time and money on tasks and analyses, such as resolving customer pain
points to improve customer satisfaction, automating support tickets, and mining data from
internal sources and from all over the internet.

Applications of Machine Learning:


 Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image recognition
and face detection is automatic friend-tagging suggestions:
Facebook provides a feature of auto friend-tagging suggestions. Whenever we upload a photo
with our Facebook friends, we automatically get a tagging suggestion with names, and the
technology behind this is machine learning's face detection and recognition algorithms.
 Speech Recognition
While using Google, we get an option to "Search by voice"; this comes under speech recognition,
a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known
as "speech to text" or "computer speech recognition." At present, machine learning algorithms
are widely used by various speech recognition applications. Google Assistant, Siri, Cortana,
and Alexa use speech recognition technology to follow voice instructions.
 Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path
with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily
congested, with the help of two sources:
the real-time location of vehicles from the Google Maps app and sensors, and
the average time taken on past days at the same time.
 Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to users. Whenever we search for
a product on Amazon, we start getting advertisements for the same product while
surfing the internet in the same browser, and this is because of machine learning.
 Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways a
fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen
in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking
whether a transaction is genuine or fraudulent.

Performance Metrics for Regression


Regression analysis is a subfield of supervised machine learning. It aims to model the
relationship between a certain number of features and a continuous target variable. Following
are the performance metrics used for evaluating a regression model:
 Mean Absolute Error (MAE)
 Mean Squared Error (MSE)
 Root Mean Squared Error (RMSE)
 R-Squared
 Adjusted R-squared
Mean Absolute Error (MAE)

MAE = (1/n) Σ |yᵢ − ŷᵢ|

where yᵢ is the actual expected output and ŷᵢ is the model's prediction.

It is the simplest evaluation metric for a regression scenario, though it is less popular than
the following metrics.
Say yᵢ = [5, 10, 15, 20] and ŷᵢ = [4.8, 10.6, 14.3, 20.1].
Thus, MAE = 1/4 * (|5-4.8|+|10-10.6|+|15-14.3|+|20-20.1|) = 0.4
Mean Squared Error (MSE)

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

Here, the error term is squared, which makes the metric more sensitive to outliers than Mean
Absolute Error (MAE).
Thus, MSE = 1/4 * ((5-4.8)²+(10-10.6)²+(15-14.3)²+(20-20.1)²) = 0.225

Root Mean Squared Error (RMSE)

RMSE = √MSE = √( (1/n) Σ (yᵢ − ŷᵢ)² )

Since MSE contains squared error terms, we take the square root of the MSE, which gives rise to
Root Mean Squared Error (RMSE).
Thus, RMSE = (0.225)^0.5 ≈ 0.474

R-Squared

R-squared is calculated by dividing the sum of squares of residuals (SSres) from the regression
model by the total sum of squares (SStot) of errors from the average (mean) model, and then
subtracting the result from 1:

R² = 1 − SSres / SStot

R-squared is also known as the Coefficient of Determination. It expresses the degree to which
the input variables explain the variation of the output / predicted variable.
An R-squared value of 0.81 tells us that the input variables explain 81% of the variation in the
output variable. The higher the R-squared, the more variation is explained by the input variables
and the better the model.
However, this metric has a limitation, which is addressed by the Adjusted R-squared.
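The regression metrics above can be reproduced in a few lines of Python. Here is a minimal sketch using scikit-learn and the worked example values from the text; only standard library functions (mean_absolute_error, mean_squared_error, r2_score) are used.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# The worked example from the text
y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

mae = mean_absolute_error(y_true, y_pred)    # 0.4
mse = mean_squared_error(y_true, y_pred)     # 0.225
rmse = np.sqrt(mse)                          # ~0.474
r2 = r2_score(y_true, y_pred)                # 1 - SSres/SStot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")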
Performance Metrics for Classification
Classification is the problem of identifying to which of a set of categories/classes a new
observation belongs, based on the training set of data containing records whose class label is
known. Following are the performance metrics used for evaluating a classification model:
 Accuracy
 Precision and Recall
 Specificity
 F1-score
 AUC-ROC
To understand different metrics, we must understand the Confusion matrix. A confusion matrix
is a table that is often used to describe the performance of a classification model (or "classifier")
on a set of test data for which the true values are known.
TN- True negatives (actual 0 predicted 0) &
TP- True positives (actual 1 predicted 1)
FP- False positives (actual 0 predicted 1) &
FN- False Negatives (actual 1 predicted 0)
Consider the following values for the confusion matrix:
 True negatives (TN) = 300
 True positives (TP) = 500
 False negatives (FN) = 150
 False positives (FP) = 50

Accuracy

Accuracy is defined as the ratio of the number of correct predictions to the total number of
predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

It lies in [0, 1]. In general, higher accuracy means a better model (TP and TN must be high).
However, accuracy is not a useful metric in the case of an imbalanced dataset (a dataset with an
uneven distribution of classes). Say we have data on 1000 patients, of which 50 have cancer
and 950 do not; a dumb model which always predicts "no cancer" will have an accuracy of 95%,
but it is of no practical use, since in this case we want the number of false negatives to be a
minimum. Thus, we have other metrics like recall, precision, F1-score, etc.
Thus, Accuracy using the above values will be (500+300)/(500+50+150+300) = 800/1000 = 80%

Precision and Recall

Recall = TP / (TP + FN)

Recall is a useful metric in the case of cancer detection, where we want to minimize the number of
false negatives for any practical use, since we don't want our model to mark a patient suffering
from cancer as safe. On the other hand, predicting a healthy patient as cancerous is not a big
issue since, in further diagnosis, it will become clear that he does not have cancer. Recall is also
known as Sensitivity.
Thus, Recall using the above values will be 500/(500+150) = 500/650 = 76.92%

Precision = TP / (TP + FP)

Precision is useful when we want to reduce the number of false positives. Consider a system
that predicts whether a received e-mail is spam or not. Taking spam as the positive class, we do
not want our system to predict non-spam e-mails (important e-mails) as spam, i.e., the aim is to
reduce the number of false positives.
Thus, Precision using the above values will be 500/(500+50) = 500/550 = 90.90%

F1-score

F1-score is a metric that combines Precision and Recall and equals their harmonic mean:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Its value lies in [0, 1] (the higher the value, the better the F1-score).
Using precision = 0.9090 and recall = 0.7692, F1-score = 0.8333 = 83.33%
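As a sketch, the same confusion-matrix arithmetic can be checked in plain Python; the counts below are the ones used in the text.

# Confusion-matrix values from the text
TP, TN, FP, FN = 500, 300, 50, 150

accuracy = (TP + TN) / (TP + TN + FP + FN)          # 0.80
recall = TP / (TP + FN)                             # sensitivity, 0.7692
precision = TP / (TP + FP)                          # 0.9090
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, 0.8333

print(f"accuracy={accuracy:.4f} recall={recall:.4f} "
      f"precision={precision:.4f} f1={f1:.4f}")

# With scikit-learn, the same metrics are usually computed directly from
# label arrays, e.g. sklearn.metrics.precision_score(y_true, y_pred).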
Data pre-processing
Data pre-processing is the process of transforming raw data into an understandable format. It is
also an important step in data mining, as we cannot work with raw data. The quality of the data
should be checked before applying machine learning or data mining algorithms.
Why is Data Pre-processing Important?
Pre-processing of data is mainly to check the data quality. The quality can be checked by the
following:
 Accuracy: To check whether the data entered is correct.
 Completeness: To check whether all required data is available or whether values are missing.
 Consistency: To check whether the same data stored in different places matches.
 Timeliness: The data should be up to date.
 Believability: The data should be trustworthy.
 Interpretability: The data should be understandable.
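A minimal pandas sketch of such quality checks, assuming the data has already been loaded into a DataFrame (the file name "patients.csv" is purely illustrative):

import pandas as pd

# Illustrative file name; replace with your own dataset
df = pd.read_csv("patients.csv")

print(df.isnull().sum())        # completeness: missing values per column
print(df.duplicated().sum())    # consistency: duplicated records
print(df.dtypes)                # interpretability: are the column types sensible?
print(df.describe())            # accuracy: value ranges and outliers at a glance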

Data Understanding
 Data Understanding involves several key activities, including reviewing the data, identifying
any problems or inconsistencies in the data, and determining the appropriate techniques for
cleaning and pre-processing the data.
 During this phase, the data analyst must also identify any missing values or outliers and
decide on the best way to handle them.
 This is an important step in ensuring that the data is suitable for analysis and that the results
are accurate and reliable.
 One of the benefits of Data Understanding is that it allows the data analyst to identify any
potential biases or limitations in the data that may impact the results of the analysis.
 For example, if the data is biased towards a particular group or if it contains a large number
of missing values, this can skew the results of the analysis.
 By identifying these issues early on in the process, the data analyst can take the necessary
steps to address them and ensure that the data is of the highest quality.
 Another benefit of Data Understanding is that it allows the data analyst to gain a deeper
understanding of the data and to identify any relationships or patterns that may be of
interest.
 For example, by exploring the data and examining the relationships between different
variables, the data analyst may be able to identify important insights or trends that can be
used to inform the analysis.
 This can lead to more accurate and meaningful results, which can be used to make better
decisions and drive business success.

Neural networks:
 Neural network is the fusion of artificial intelligence and brain-inspired design that reshapes
modern computing.
 Neural networks mimic the basic functioning of the human brain and are inspired by how
the human brain interprets information.
 There are different types of neural networks, from feedforward to recurrent and
convolutional, each tailored for specific tasks.
 They solve various real-time tasks because of their ability to perform computations quickly and
respond fast.

Artificial Neural Network: Perceptron


The perceptron is one of the simplest artificial neural network architectures. It was introduced by
Frank Rosenblatt in 1957. It is the simplest type of feedforward neural network, consisting of a
single layer of input nodes that are fully connected to a layer of output nodes. It can learn
linearly separable patterns. It uses a slightly different type of artificial neuron known as a
threshold logic unit (TLU).

Basic Components of Perceptron


A perceptron, the basic unit of a neural network, comprises essential components that
collaborate in information processing.
 Input Features: The perceptron takes multiple input features, each input feature
represents a characteristic or attribute of the input data.
 Weights: Each input feature is associated with a weight, determining the significance of
each input feature in influencing the perceptron’s output. During training, these weights
are adjusted to learn the optimal values.
 Summation Function: The perceptron calculates the weighted sum of its inputs using
the summation function. The summation function combines the inputs with their
respective weights to produce a weighted sum.
 Activation Function: The weighted sum is then passed through an activation function.
The perceptron uses the Heaviside step function, which takes the summed value as
input, compares it with a threshold, and provides the output as 0 or 1.
 Output: The final output of the perceptron is determined by the activation function's
result. For example, in binary classification problems, the output might represent a
predicted class (0 or 1).
 Bias: A bias term is often included in the perceptron model. The bias allows the model to
make adjustments that are independent of the input. It is an additional parameter that is
learned during training.
 Learning Algorithm (Weight Update Rule): During training, the perceptron learns by
adjusting its weights and bias based on a learning algorithm. A common approach is the
perceptron learning algorithm, which updates weights based on the difference between
the predicted output and the true output.
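A minimal NumPy sketch of these components and of the perceptron weight update rule; the training data (logical AND, which is linearly separable), learning rate and epoch count are illustrative choices.

import numpy as np

# Training data for logical AND (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)      # weights, one per input feature
b = 0.0              # bias term
lr = 0.1             # learning rate

def step(z):
    """Heaviside step activation (threshold logic unit)."""
    return 1 if z >= 0 else 0

for epoch in range(10):
    for xi, target in zip(X, y):
        z = np.dot(w, xi) + b          # summation function (weighted sum)
        pred = step(z)                 # activation -> output 0 or 1
        error = target - pred
        w += lr * error * xi           # perceptron weight update rule
        b += lr * error

print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 0, 0, 1]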

Feedforward Neural Networks


 Feedforward neural networks follow only one direction and one path; that is, the result
always flows from input to output.
 In such a network, loops are not present, and the output layer acts distinctly from the
other layers.
 These neural networks are predominantly used in pattern recognition.
 The organizations that use feedforward neural networks are often given names like bottom-up,
top-down, etc.
 All the outputs are weighted and then transferred to the next layer of neurons,
commonly known as the hidden layer.
 The output of this layer can in turn serve as the input to the next layer, and this process goes
on. Generally, one hidden layer is used in such a network; a minimal sketch of such a network
follows.
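The sketch below builds a small feedforward network with one hidden layer in Keras; it assumes TensorFlow is installed, and the layer sizes and toy data are illustrative rather than taken from the text.

import numpy as np
import tensorflow as tf

# Toy data: 4 input features, binary target (illustrative)
X = np.random.rand(100, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),  # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),                 # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)     # signals flow only from input to output
print(model.predict(X[:3], verbose=0))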

Feedback Neural Network


 Feedback neural networks do not follow any single path of transferring signals.
 These kinds of networks can have signals travelling from both directions, that is, from input
to output as well as from output to input.
 Feedback neural networks are a bit complex when compared to feedforward neural
networks as signals are constantly travelling from both sides.
 These networks also possess a sense of dynamism.
 Feedback neural networks aim to attain a state of equilibrium, and these networks achieve
it by constantly changing themselves and by comparing the signals and units.
 The state of equilibrium is maintained until there is a change in input. When the input
changes, the network tries to achieve a new point of equilibrium.
 Various feedback neural network researchers have defined these networks as recurrent or
interactive networks.
 These are generally associated with organizations that have an individual layer.
 The prime benefit that the feedback network model offers is that the deep neural network
algorithm specifies an actual feedback system and a secondary feedback system acts as a
backup to generate the result.

Introduction and exploration of Sklearn toolkit


Scikit-learn is probably the most useful library for machine learning in Python. The sklearn
library contains a lot of efficient tools for machine learning and statistical modelling including
classification, regression, clustering and dimensionality reduction.
Components of scikit-learn:
Scikit-learn comes loaded with a lot of features. Here are a few of them to help you understand
the spread:
 Supervised learning algorithms: Think of any supervised machine learning algorithm you
might have heard about and there is a very high chance that it is part of scikit-learn. Starting
from Generalized linear models (e.g. Linear Regression), Support Vector Machines (SVM),
Decision Trees to Bayesian methods – all of them are part of scikit-learn toolbox. The spread
of machine learning algorithms is one of the big reasons for the high usage of scikit-learn.
 Cross-validation: There are various methods to check the accuracy of supervised models on
unseen data using sklearn.
 Unsupervised learning algorithms: Again there is a large spread of machine learning
algorithms in the offering – starting from clustering, factor analysis, principal component
analysis to unsupervised neural networks.
 Various toy datasets: Scikit-learn ships with several academic toy datasets (e.g. the Iris
dataset and the Boston house prices dataset), which come in handy while learning the library.
 Feature extraction: Scikit-learn provides utilities for extracting features from images and text
(e.g. bag-of-words).
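A short sketch of this workflow on one of the bundled toy datasets (Iris), combining a supervised estimator with cross-validation; all functions used here are standard scikit-learn APIs.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)                     # bundled toy dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=200)                # supervised learning algorithm
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# Cross-validation: accuracy on unseen folds of the training data
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("5-fold CV accuracy:", scores.mean())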

UNIT 3

Introduction to CNN
 A Convolutional Neural Network (CNN) is a deep learning architecture designed for image
analysis and recognition.
 It employs specialized layers to automatically learn features from images, capturing patterns
of increasing complexity.
 These features are then used to classify objects or scenes.
 CNNs have revolutionized computer vision tasks, exhibiting high accuracy and efficiency in
tasks like image classification, object detection, and image generation.
 The fundamental principle of Convolutional Neural Networks (CNNs) is hierarchical feature
learning.
 CNNs process input data, often images, by applying a series of convolutional and pooling
layers.
 Convolutional layers employ small filters to convolve across the input, detecting spatial
patterns.
 Pooling layers downsample the output, retaining important information. This enables the
network to progressively learn hierarchical features, from simple edges to complex object
parts.
 The learned features are then used for classification or other tasks.
 CNNs’ ability to automatically learn and abstract features from data has made them
exceptionally effective in image analysis, with applications spanning various fields.

Components of CNN
The CNN is made up of three types of layers: convolutional layers, pooling layers, and fully-
connected (FC) layers.

Convolution Layers
This is the very first layer in the CNN that is responsible for the extraction of the different
features from the input images. The convolution mathematical operation is done between the
input image and a filter of a specific size MxM in this layer.
Fully Connected Layer
The Fully Connected (FC) layer comprises the weights and biases together with the neurons and
is used to connect the neurons between two separate layers. These layers are usually placed
before the output layer and form the last few layers of a CNN architecture.
Pooling layer
The Pooling layer is responsible for reducing the spatial size of the Convolved Feature. This
significantly reduces the dimensions of the data and therefore the computing power required
to process it.
There are two main types of pooling:
1. average pooling
2. max pooling
A Pooling Layer is usually applied after a Convolutional Layer. This layer’s major goal is to lower
the size of the convolved feature map to reduce computational expenses. This is accomplished
by reducing the connections between layers and operating independently on each feature map.
There are numerous sorts of Pooling operations, depending on the mechanism utilised.
The largest element is obtained from the feature map in Max Pooling. The average of the
elements in a predefined sized Image segment is calculated using Average Pooling. Sum Pooling
calculates the total sum of the components in the predefined section. The Pooling Layer is
typically used to connect the Convolutional Layer and the FC Layer.
Dropout
To avoid overfitting (when a model performs well on training data but not on new data), a
dropout layer is utilised, in which a few neurons are removed from the neural network during
the training phase, resulting in a smaller model.
Activation Functions
They are utilised to learn and approximate any form of association between network variables
that is both continuous and complex.
They give the network non-linearity. ReLU, softmax, and tanh are some of the most often
utilised activation functions.

Rectified Linear Unit (ReLU) Layer


The rectified linear activation function, or ReLU, is a non-linear (piecewise linear) function that
outputs the input directly if it is positive; otherwise, it outputs zero.
It is the most commonly used activation function in neural networks, especially in Convolutional
Neural Networks (CNNs) and multilayer perceptrons.
It is simple, yet it is more effective than its predecessors like sigmoid or tanh.
Mathematically, it is expressed as:

f(x) = max(0, x)

Graphically, it is a flat line at zero for negative inputs and a straight line of slope 1 for
positive inputs.

The main advantages of the ReLU activation function are:


 Convolutional layers and deep learning: It is the most popular activation function for
training convolutional layers and deep learning models.
 Computational simplicity: The rectifier function is trivial to implement, requiring only a
max() function.
 Representational sparsity: An important benefit of the rectifier function is that it is capable
of outputting a true zero value.
 Linear behaviour: A neural network is easier to optimize when its behaviour is linear or
close to linear.
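A one-line NumPy sketch of the rectifier function defined above:

import numpy as np

def relu(x):
    """Rectified linear unit: outputs x when positive, 0 otherwise."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]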

Basic Architecture
There are two main parts to a CNN architecture
 A convolution tool that separates and identifies the various features of the image for
analysis, in a process called Feature Extraction.
 The network of feature extraction consists of many pairs of convolutional or pooling
layers.
 A fully connected layer that utilizes the output from the convolution process and
predicts the class of the image based on the features extracted in previous stages.
 This CNN model of feature extraction aims to reduce the number of features present in a
dataset. It creates new features which summarise the existing features contained in the
original set of features. A typical CNN architecture stacks many such layers.

1. Convolutional Layer
This layer is the first layer that is used to extract the various features from the input images. In
this layer, the mathematical operation of convolution is performed between the input image
and a filter of a particular size MxM. By sliding the filter over the input image, the dot product is
taken between the filter and the parts of the input image with respect to the size of the filter
(MxM).
The output is termed as the Feature map which gives us information about the image such as
the corners and edges. Later, this feature map is fed to other layers to learn several other
features of the input image.
The convolution layer in a CNN passes the result to the next layer after applying the convolution
operation to the input. Convolutional layers are particularly beneficial because they ensure the
spatial relationship between the pixels stays intact.
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of this layer
is to decrease the size of the convolved feature map to reduce the computational costs. This is
performed by decreasing the connections between layers and operating independently on each
feature map. Depending upon the method used, there are several types of pooling operations. It
basically summarises the features generated by a convolution layer.
In Max Pooling, the largest element is taken from the feature map. Average Pooling calculates the
average of the elements in a predefined-size image section. The total sum of the elements in
the predefined section is computed in Sum Pooling. The Pooling Layer usually serves as a bridge
between the Convolutional Layer and the FC Layer.
This CNN model generalises the features extracted by the convolution layer, and helps the
networks to recognise the features independently. With the help of this, the computations are
also reduced in a network.
3. Fully Connected Layer
The Fully Connected (FC) layer consists of the weights and biases along with the neurons and is
used to connect the neurons between two different layers. These layers are usually placed
before the output layer and form the last few layers of a CNN Architecture.
In this layer, the output from the previous layers is flattened and fed to the FC layer. The
flattened vector then passes through a few more FC layers, where the mathematical
operations usually take place. In this stage, the classification process begins. The
reason two layers are connected is that two fully connected layers will perform better than a
single connected layer. These layers in a CNN reduce the need for human supervision.
4. Dropout
Usually, when all the features are connected to the FC layer, it can cause overfitting on the
training dataset. Overfitting occurs when a particular model works so well on the training data
that it has a negative impact on the model's performance when used on new data.
To overcome this problem, a dropout layer is utilised wherein a few neurons are dropped from
the neural network during training process resulting in reduced size of the model. On passing a
dropout of 0.3, 30% of the nodes are dropped out randomly from the neural network.
Dropout results in improving the performance of a machine learning model as it prevents
overfitting by making the network simpler. It drops neurons from the neural networks during
training.
5. Activation Functions
Finally, one of the most important parameters of the CNN model is the activation function. They
are used to learn and approximate any kind of continuous and complex relationship between
variables of the network. In simple words, the activation function decides which information
should fire in the forward direction and which should not at the end of the network.
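Putting the five layer types together, here is a minimal Keras sketch of such an architecture; the filter counts, kernel sizes, input shape and number of classes are illustrative choices, not values from the text.

import tensorflow as tf

model = tf.keras.Sequential([
    # 1. Convolutional layer: 3x3 filters slide over the image, producing feature maps
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    # 2. Pooling layer: max pooling halves the spatial size of each feature map
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # 3. Fully connected layers operate on the flattened feature maps
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    # 4. Dropout: randomly drop 30% of the nodes during training to reduce overfitting
    tf.keras.layers.Dropout(0.3),
    # 5. Activation function: softmax turns the outputs into class probabilities
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()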
Introduction to Tensorflow Hub
 TensorFlow Hub is a library for reusable machine learning modules, where
a module contains a self-contained piece of a TensorFlow graph along with its weights and
assets, so it can be reused for transfer learning across different tasks.
 It is very easy to use, since there is no need to have a detailed understanding of the model
architecture for retraining or inference.
 Just add a small snippet of code to your program to convert it into a fantastic deep
learning application. Sounds cool, right?
 Installation
Just pip install the tensorflow_hub package.
pip install tensorflow-hub
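A minimal usage sketch: a pre-trained module is loaded and dropped into a Keras model as an ordinary layer. The module handle below (a small text-embedding model) is only an example of what a reusable module looks like, and the classifier head around it is illustrative.

import tensorflow as tf
import tensorflow_hub as hub

# Load a pre-trained text-embedding module from TensorFlow Hub and reuse it
# as a Keras layer (the handle is an example module).
embed = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim50/2",
                       input_shape=[], dtype=tf.string, trainable=False)

model = tf.keras.Sequential([
    embed,                                          # reused, pre-trained weights
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # e.g. binary text classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])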

Advanced CNN Networks – Alex Net, Residual Networks (ResNet)


Alex Net
AlexNet consists of five convolutional layers with max pooling, and three FC layers, with ReLU
activation functions used throughout. In addition to 3×3 filters, it also has 5×5 and 11×11
convolutional filters.
The network uses a dropout technique to reduce overfitting, and employs data augmentation
methods to increase the effective size of the training set. AlexNet’s success demonstrated the
power of deep learning methods in computer vision and led to a surge of interest in the field. It
also popularized the use of GPUs for accelerating the training of deep neural networks.

ResNet
ResNet (short for “Residual Neural Network”) is a family of deep convolutional neural networks
designed to overcome the problem of vanishing gradients that are common in very deep
networks. The idea behind ResNet is to use “residual blocks” that allow for the direct
propagation of gradients through the network, enabling the training of very deep networks.
A residual block consists of two or more convolutional layers followed by an activation function,
combined with a shortcut connection that bypasses the convolutional layers and adds the
original input directly to the output of the convolutional layers after the activation function.
This allows the network to learn residual functions that represent the difference between the
convolutional layers’ input and output, rather than trying to learn the entire mapping directly.
The use of residual blocks enables the training of very deep networks, with hundreds or
thousands of layers, significantly alleviating the issue of vanishing gradients.
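A sketch of a single residual block in Keras; the filter count, kernel size and input shape are illustrative. Note how the shortcut connection adds the block's input back to the output of the convolutional layers.

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """A simple residual block: two conv layers plus an identity shortcut."""
    shortcut = x                                   # the skip connection
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.Add()([shortcut, y])                # add the original input back
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()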

Applications of CNN
Convolutional Neural Networks (CNNs) are widely used in various applications such as:
 Object Detection: CNN can detect and locate objects in images or videos.
 Image Segmentation: CNNs can segment images into different regions and tag each region
with a semantic class.
 Video Analytics: CNNs can be used for action detection, object tracking, and video scene
segmentation.
 Natural Language Processing: CNNs can be used for text classification, sentiment analysis,
and language translation tasks.
 Autonomous Systems: CNNs can be used in autonomous systems such as self-driving cars
for lane detection, obstacle detection, and traffic sign recognition.
 Decoding Facial Recognition: One of the main applications of this architecture is facial
recognition. Using this technique, facial images are broken down into multiple components.
The significant components are separating facial features from external features like light or
pose and unique facial features.
 Document analysis: Documents, including handwritten materials, can be analysed
using CNN architectures. The error rate when comparing documents with available content
is reduced to near zero. Thousands of commands run simultaneously to analyse
handwritten content using CNNs, which would be very difficult otherwise.
 Recognition of Speech: Besides Image processing, neuron networks are also useful for
recognizing speech with a huge range of vocabulary and phonics. Emotional detection using
CNN is also a focus area for researchers.

UNIT 4

A brief Overview of Modelling Sequences


 Sequence Modelling is the ability of a computer program to model, interpret, make
predictions about or generate any type of sequential data, such as audio, text etc.
 For example, a computer program that can take a piece of text in English and translate it to
French is an example of a Sequence Modelling program (because the type of data being
dealt with is text, which is sequential in nature).
 An AI algorithm called the Recurrent Neural Network is a specialised form of the
classic Artificial Neural Network (Multi-Layer Perceptron) and is used to solve Sequence
Modelling problems.
 Recurrent Neural Networks are like Artificial Neural Networks but with loops in them.
 This means that the activation of each neuron or cell depends not only on its current input
but also on its previous activation values.

Introduction to Recurrent Neural Network


 Neural networks imitate the function of the human brain in the fields of AI, machine
learning, and deep learning, allowing computer programs to recognize patterns and solve
common issues.
 RNNs are a type of neural network that can be used to model sequence data.
 RNNs, which are formed from feedforward networks, are similar to human brains in their
behaviour.
 Simply said, recurrent neural networks can anticipate sequential data in a way that other
algorithms can’t.

 All the inputs and outputs in standard neural networks are independent of one another,
however in some circumstances, such as when predicting the next word of a phrase, the
prior words are necessary, and so the previous words must be remembered.
 As a result, RNN was created, which used a Hidden Layer to overcome the problem.
 The most important component of RNN is the Hidden state, which remembers specific
information about a sequence.
 RNNs have a memory that stores information about previous calculations. An RNN employs the
same parameters for each input, since it produces its outcome by performing the same task on
all inputs or hidden states.
Advantages of RNNs:
 Handle sequential data effectively, including text, speech, and time series.
 Process inputs of any length, unlike feedforward neural networks.
 Share weights across time steps, enhancing training efficiency.
Disadvantages of RNNs:
 Prone to vanishing and exploding gradient problems, hindering learning.
 Training can be challenging, especially for long sequences.
 Computationally slower than other neural network architectures.
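A minimal Keras sketch of an RNN for sequence data; the sequence length, feature count, layer sizes and the binary-label head are illustrative.

import tensorflow as tf

# Each input is a sequence of 10 time steps with 8 features per step
model = tf.keras.Sequential([
    # The hidden state is carried from one time step to the next
    tf.keras.layers.SimpleRNN(32, input_shape=(10, 8)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. a binary sequence label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()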

Long Short Term Memory (LSTM)


LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture widely used
in Deep Learning. It excels at capturing long-term dependencies, making it ideal for sequence
prediction tasks.
Unlike traditional neural networks, LSTM incorporates feedback connections, allowing it to
process entire sequences of data, not just individual data points. This makes it highly effective in
understanding and predicting patterns in sequential data like time series, text, and speech.
LSTM has become a powerful tool in artificial intelligence and deep learning, enabling
breakthroughs in various fields by uncovering valuable insights from sequential data.

LSTM Architecture
At a high level, an LSTM cell works very much like an RNN cell. Here is the internal functioning of
the LSTM network. The LSTM network architecture consists of three parts, and each part
performs an individual function.

 The first part chooses whether the information coming from the previous timestamp is to be
remembered or is irrelevant and can be forgotten.
 In the second part, the cell tries to learn new information from the input to this cell.
 At last, in the third part, the cell passes the updated information from the current
timestamp to the next timestamp. This one cycle of LSTM is considered a single-time step.
 These three parts of an LSTM unit are known as gates.
 They control the flow of information in and out of the memory cell or lstm cell.
 The first gate is called Forget gate, the second gate is known as the Input gate, and the last
one is the Output gate.
 An LSTM unit that consists of these three gates and a memory cell (or LSTM cell) can be
considered a layer of neurons, as in a traditional feedforward neural network, with each
neuron having a hidden state and a current (cell) state.
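In Keras, the forget, input and output gates are handled internally by the LSTM layer, so the earlier RNN sketch only needs its recurrent cell swapped out. The input shape and the regression head below are illustrative.

import tensorflow as tf

model = tf.keras.Sequential([
    # The LSTM layer implements the forget, input and output gates internally
    tf.keras.layers.LSTM(64, input_shape=(50, 1), return_sequences=False),
    tf.keras.layers.Dense(1),   # e.g. predicting the next value of a time series
])
model.compile(optimizer="adam", loss="mse")
model.summary()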

UNIT 5

Introduction to Word Embeddings


 Word embedding is the collective name for a set of language modelling and feature learning
techniques in natural language processing where words or phrases from the vocabulary are
mapped to vectors of real numbers.
 The core concept of word embeddings is that every word used in a language can be
represented by a set of real numbers (a vector).
 They are learned representations of text in an n-dimensional space where words that have
the same meaning have a similar representation.
 That means two similar words are placed very close together in the vector space, having
almost identical vector representations. So, when constructing a word embedding space, the
goal is to capture some sort of relationship in that space, be it meaning, morphology, context,
or some other kind of relationship.
A few main characteristics of word embeddings are listed below:
 Every word has a unique word embedding (or “vector”), which is just a list of numbers for
each word.
 The word embeddings are multidimensional; typically for a good model, embeddings are
between 50 and 500 in length.
 For each word, the embedding captures the “meaning” of the word.
 Similar words end up with similar embedding values.
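As a sketch, an embedding table in Keras maps each word index to such a vector; the vocabulary size and embedding length below are illustrative, and the vectors only acquire meaning once the layer is trained as part of a model.

import numpy as np
import tensorflow as tf

vocab_size, embedding_dim = 1000, 50      # one 50-number vector per word
embedding = tf.keras.layers.Embedding(input_dim=vocab_size,
                                      output_dim=embedding_dim)

word_ids = np.array([[3, 17, 42]])        # a "sentence" of three word indices
vectors = embedding(word_ids)             # shape: (1, 3, 50)
print(vectors.shape)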

Named Entity Extraction using spaCy library


 Named Entity Recognition is the most important, or I would say, the starting step in
Information Retrieval.
 Information Retrieval is the technique to extract important and useful information from
unstructured raw text documents.
 Named Entity Recognition (NER) works by locating and classifying the named entities
present in unstructured text into standard categories such as person names,
locations, organizations, time expressions, quantities, monetary values, percentages,
codes, etc.
 spaCy comes with an extremely fast statistical entity recognition system that assigns
labels to contiguous spans of tokens.
 spaCy provides an option to add arbitrary classes to the entity recognition system and
to update the model with new examples beyond the entities already defined
within the model.
 spaCy has an ‘ner’ pipeline component that identifies token spans fitting a
predetermined set of named entities. These are available as the ‘ents’ property of a Doc
object.
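A minimal usage sketch; it assumes spaCy is installed and the small English model has been downloaded with "python -m spacy download en_core_web_sm". The example sentence is illustrative.

import spacy

nlp = spacy.load("en_core_web_sm")        # pipeline containing the 'ner' component
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion in 2024.")

# Named entities are exposed as the 'ents' property of the Doc object
for ent in doc.ents:
    print(ent.text, ent.label_)           # e.g. Apple ORG, U.K. GPE, $1 billion MONEY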

Introduction to BERT architecture


 BERT stands for Bidirectional Encoder Representations from Transformers.
 BERT is a “deeply bidirectional” model. Bidirectional means that BERT learns information
from both the left and the right side of a token’s context during the training phase.
 As a result, the pre-trained BERT model can be fine-tuned with just one additional
output layer to create state-of-the-art models for a wide range of NLP tasks.
 BERT is based on the Transformer architecture.
 BERT is pre-trained on a large corpus of unlabelled text, including the entire
Wikipedia (that's 2,500 million words!) and BookCorpus (800 million words).
 This pre-training step is half the magic behind BERT's success, because as we train
a model on a large text corpus, the model starts to pick up a deeper, more intimate
understanding of how the language works.

BERT Example
The bidirectionality of a model is important for truly understanding the meaning of a
language. Let's see an example to illustrate this. There are two sentences in this example,
and both of them involve the word “bank”:

Sentence 1: We went to the river bank.
Sentence 2: I need to go to the bank to make a deposit.

BERT captures both the left and right context.

If we try to predict the nature of the word “bank” by only taking either the left or the right
context, then we will be making an error in at least one of the two given examples.
One way to deal with this is to consider both the left and the right context before making a
prediction. That’s exactly what BERT does!
And finally, the most impressive aspect of BERT: we can fine-tune it by adding just a couple
of additional output layers to create state-of-the-art models for a variety of NLP tasks.
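As a sketch, a pre-trained BERT model can be queried (or fine-tuned) through the Hugging Face transformers library; the library and the masked sentence are assumptions for illustration, since the text does not name a specific toolkit.

# Assumes: pip install transformers (Hugging Face library, not named in the text)
from transformers import pipeline

# Masked-language-model head: BERT predicts the hidden word from BOTH contexts
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("I withdrew some cash from the [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))

# For downstream tasks, the same pre-trained encoder is fine-tuned by adding a
# small output layer, e.g. BertForSequenceClassification for text classification.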
