Artificial Intelligence and
Machine Learning for
Business (AIMLB)
Mukul Gupta
(Information Systems Area)
Biological Neurons
• The neural system of the human body consists of
three stages:
• Receptors,
• The receptors receive stimuli either internally or from the
external world, then pass the information to the neurons in
the form of electrical impulses.
• A neural network,
• The neural network then processes the inputs and makes
the appropriate decisions about the outputs.
• Effectors
• Finally, the effectors translate electrical impulses from the
neural network into responses to the outside environment.
2
Biological Neurons
3
Biological Neurons
• The brain works like a big computer.
• It processes information that it receives from the senses
and body and sends messages back to the body.
• But the brain can do much more than a machine
can:
• humans think and experience emotions with their brain,
and it is the root of human intelligence.
4
Biological Neurons
5
Biological Neurons
• A neuron mainly consists of three parts:
• Dendrites
• Dendrites are tree-like structures that receive signals
from surrounding neurons, where each branch is
connected to one neuron.
• Soma
• Cell body, contains the nucleus.
• Axon
• The axon is a thin cylinder that transmits the signal from one
neuron to others.
• At the end of the axon, contact with the dendrites of other
neurons is made through a synapse.
6
Biological Neurons: Working
7
Biological neurons
https://www.kdnuggets.com/2019/10/introduction-artificial-neural-networks.html
8
Biological neurons - Firing
Captured using two-photon calcium imaging, this video depicts neurons firing in the brain of a mouse in response to stimulation of its whiskers.
Neurons firing in the brain of a mouse | UCLA Health Newsroom
https://www.youtube.com/watch?v=4GleKfxW288
9
Perceptron
• Frank Rosenblatt, an American psychologist,
proposed the classical perceptron model in 1958.
10
Artificial Neuron - Perceptron: History
• Frank Rosenblatt created the first perceptron, first
simulating it on an IBM® 704 computer and later
implementing it as custom hardware (called the
Mark I Perceptron), with an array of 400 photocells
for vision applications.
• The photocells were randomly connected to
neurons, and the weights were implemented as
potentiometers (variable resistors) that could be
adjusted by attached motors as part of the learning
process.
11
Perceptron
• Mark I Perceptron machine, the first implementation
of the perceptron algorithm.
12
Artificial Neuron: Perceptron
13
Artificial Neuron: Perceptron
14
Neuron as a linear classifier
15
Perceptron: Final Look
16
Boolean Functions Using Perceptron
• OR Function — Can Do!
17
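• As a concrete illustration of the slide above, here is a minimal sketch (not from the slides) of a single perceptron with hand-picked weights that computes the OR function; the weights [1, 1] and bias −0.5 are an illustrative choice, since any separating line works.
```python
# Minimal perceptron sketch: hand-picked weights that realize the OR function.
# The weight/bias values are illustrative assumptions, not taken from the slides.

def perceptron(x, w, b):
    """Weighted sum of the inputs followed by a step (threshold) activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

w, b = [1.0, 1.0], -0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(x, w, b))   # prints 0, 1, 1, 1 (the OR truth table)
```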
Boolean Functions Using Perceptron
• XOR Function — Cannot Do!
• There is no perceptron solution for data that are not linearly
separable. So, the key takeaway is that a single perceptron
cannot learn to separate data that are non-linear in nature.
18
BNN vs ANN
19
ANN: Perceptron [video]
20
A non-linear classifier?
21
Non-linearity = activation function
• A smooth (differentiable) nonlinear function that is
applied after the inner product with the weights
22
Activation Function
• The purpose of the activation function is to introduce
non-linearity into the output of a neuron.
• This is important because most real-world data is
nonlinear, and we want neurons to learn these
nonlinear representations.
• Every activation function (or non-linearity) takes a
single number and performs a certain fixed
mathematical operation on it.
23
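• To make this concrete, below is a small sketch (assuming NumPy; the input, weight, and bias values are made up for illustration) of a neuron output computed as a nonlinear activation applied after the inner product with the weights.
```python
import numpy as np

# Common activations: each takes a single number (applied element-wise)
# and performs a fixed nonlinear operation on it.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

x = np.array([1.0, -1.0])   # illustrative input
w = np.array([0.5, 2.0])    # illustrative weights
b = 0.1                     # illustrative bias

z = np.dot(w, x) + b        # inner product with the weights, plus bias
print(sigmoid(z), tanh(z), relu(z))   # non-linearity applied after the inner product
```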
Activation Function
24
Activation Function: Softmax
• A generalization of the logistic function that maps a 𝐾-dimensional
vector of real values (e.g., a set of 𝐾 scores) to values
in the range (0, 1) such that all values of the vector
add up to 1
• ML (Machine Learning) NNs (Neural Networks)
often use the Softmax function in the output layer of
classifiers to turn raw scores into class probabilities.
25
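• A minimal NumPy sketch of the softmax mapping described above (the max-subtraction is a standard numerical-stability trick, not something shown on the slide):
```python
import numpy as np

def softmax(z):
    """Map a K-dimensional vector of real values to values in (0, 1) that sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative K = 3 scores
p = softmax(scores)
print(p, p.sum())                    # e.g., [0.659 0.242 0.099], sums to 1.0
```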
Activation Function: Softmax
• Example:
26
Activation Function: Softmax
• Example:
27
Activation Function: Softmax
• Advantages:
• Softmax is optimal for maximum-likelihood estimation of
the model parameters.
• The properties of softmax (all output values in the range
(0, 1) and sum up to 1.0) make it suitable for a probabilistic
interpretation that’s very useful in machine learning.
• Softmax normalization is a way of reducing the influence
of extreme values or outliers in the data without removing
data points from the set.
28
Single Layer Perceptron: Feedforward
• Because the SLP (single-layer perceptron) is a linear classifier,
if the cases are not linearly separable the learning process will
never reach a point where all the cases are classified
properly.
29
Multi-Class: Non-Linearly Separable
Solution: MLP
30
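• To show why an MLP overcomes the XOR limitation from the earlier slide, here is a small sketch with hand-picked weights (the specific values are an illustrative assumption, since many choices work): one hidden unit fires for OR, one for AND, and the output combines them as "OR and not AND".
```python
def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: h1 fires for OR(x1, x2), h2 fires for AND(x1, x2).
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Output layer: XOR = OR and not AND.
    return step(1.0 * h1 - 2.0 * h2 - 0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", xor_mlp(*x))   # prints 0, 1, 1, 0 (the XOR truth table)
```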
Multi Layer Perceptron
• Neurons are arranged into networks of neurons.
• A row of neurons is called a layer and one network
can have multiple layers. The architecture of the
neurons in the network is often called the network
topology.
31
Multi Layer Perceptron
• Input Layer
• The bottom (first) layer that takes input from your dataset is
called the visible (or input) layer, because it is the exposed part
of the network.
• Its units are not true neurons; they simply pass the input values
on to the next layer.
• Hidden Layer(s)
• Layers after the input layer are called hidden layers because
they are not directly exposed to the input.
• Deep learning can refer to having many hidden layers in your
neural network.
• Output Layer
• The final layer is called the output layer and it is responsible for
outputting a value or vector of values that correspond to the
format required for the problem.
32
Output activation
● Usually, a non-linear activation after each layer
● Typically, ReLU between the layers
● At the output layer we need to consider the task, i.e.,
what kinds of outputs we want, e.g.,
○ Multi-label classification
each of the K outputs is a separate probability (values 0.0-1.0)
→ sigmoid
○ Multi-class classification
probability distribution over K classes (sums to 1.0)
→ softmax
○ Regression
free range of values → linear
33
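● A brief NumPy sketch (the logit values are made up) contrasting the three output choices listed above:
```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])   # illustrative raw outputs of the last layer

# Multi-label classification: each output is an independent probability -> sigmoid
multi_label = 1.0 / (1.0 + np.exp(-logits))

# Multi-class classification: probability distribution over K classes -> softmax
e = np.exp(logits - logits.max())
multi_class = e / e.sum()             # sums to 1.0

# Regression: free range of values -> linear (identity), logits are used as-is
regression = logits

print(multi_label, multi_class, regression)
```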
NN: A toy example
• A simple network, toy example
• Inputs x = [1, −1]; the first neuron has weights [1, −2] and bias 1, the second neuron has weights [−1, 1] and bias 0
• First neuron: 1·1 + (−1)·(−2) + 1 = 4, and σ(4) ≈ 0.98
• Second neuron: 1·(−1) + (−1)·1 + 0 = −2, and σ(−2) ≈ 0.12
• Sigmoid function: σ(z) = 1 / (1 + e^(−z))
34
NN: A toy example
• A simple network, toy example (cont’d)
• For an input vector [1, −1], the output is [0.62, 0.83]
• Layer by layer, the activations are [0.98, 0.12] → [0.86, 0.11] → [0.62, 0.83]
• The network defines a function f: R² → R², with f([1, −1]) = [0.62, 0.83]
35
NN: Matrix Operations
• Matrix operations are helpful when working with multidimensional inputs and
outputs
σ(W x + b) = a
For the first layer of the toy example:
σ( [[1, −2], [−1, 1]] · [1, −1]ᵀ + [1, 0]ᵀ ) = σ([4, −2]ᵀ) ≈ [0.98, 0.12]ᵀ
36
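• The first-layer computation above can be reproduced in a few lines (a sketch assuming NumPy, using the weights and biases read off the toy example):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])   # weight matrix of the first layer
b = np.array([1.0, 0.0])      # bias vector
x = np.array([1.0, -1.0])     # input vector

a = sigmoid(W @ x + b)        # sigma(W x + b) = a
print(a)                      # approximately [0.98, 0.12]
```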
Deep Learning
• DL applies a multi-layer process for learning rich hierarchical features (i.e., data representations)
• For images: input image pixels → edges → textures → parts → objects
• Pipeline: low-level features → mid-level features → high-level features → trainable classifier → output
37
Deep NN
• Deep NNs have many hidden layers
Fully-connected (dense) layers (a.k.a. Multi-Layer Perceptron or MLP)
Each neuron is connected to all neurons in the succeeding layer
[Figure: inputs x1 … xN pass through hidden Layers 1, 2, …, L to produce outputs y1 … yM]
38
What is a model? Recap
• A model is a function specified by a set of parameters 𝜃
f_θ(x)
• Example: linear predictor
f_θ(x) = wᵀ · x + b    (parameters θ = (w, b))
39
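• As a one-line reminder of the recap above, a linear predictor in NumPy (the parameter values are made up):
```python
import numpy as np

w = np.array([0.5, -1.0])   # parameters theta = (w, b); values are illustrative
b = 0.2

def f(x):
    """Linear predictor f_theta(x) = w^T x + b."""
    return np.dot(w, x) + b

print(f(np.array([1.0, 2.0])))   # 0.5*1 - 1*2 + 0.2 = -1.3
```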
Training NNs
• The network parameters 𝜃 include the weight matrices and bias vectors from all
layers
𝜃 = {W1, b1, W2, b2, ⋯, WL, bL}
Often, the model parameters 𝜃 are referred to as weights
• Training a model to learn a set of parameters 𝜃 that are optimal (according to a
criterion) is one of the greatest challenges in ML
[Figure: a 16 × 16 = 256-pixel digit image is fed to the network (inputs x1 … x256); a softmax output layer over y1 … y10 gives class scores, e.g., 0.1 for "is 1", 0.7 for "is 2", 0.2 for "is 0"]
40
Training NNs
• Define a loss function/objective function/cost function ℒ(𝜃) that
calculates the difference (error) between the model prediction and the
true label
E.g., ℒ(𝜃) can be mean-squared error, cross-entropy, etc.
[Figure: the network's outputs y1 … y10 (e.g., 0.2, 0.3, …, 0.5) are compared with the true label "1" to compute the cost ℒ(𝜃)]
41
Loss Functions
• Neural networks are trained using an optimizer, and
we are required to choose a loss function when
configuring the model.
• During optimization, a function is used to evaluate a
candidate set of weights, and we try to minimize the
resulting error.
• This objective function is our loss function, and the
evaluation score calculated by it is called the loss.
• In simple words, the loss is a quantity computed from
the model's predictions that we try to minimize during
model training.
42
Loss Functions: Examples
• Regression Loss Functions
• Mean Squared Error Loss
• Mean Squared Logarithmic Error Loss
• Mean Absolute Error Loss
• Binary Classification Loss Functions
• Binary Cross-Entropy
• Hinge Loss
• Squared Hinge Loss
• Multi-Class Classification Loss Functions
• Multi-Class Cross-Entropy Loss
• Sparse Multiclass Cross-Entropy Loss
• Kullback Leibler (KL) Divergence Loss
43
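• A short NumPy sketch (the predictions and labels are made up) of two of the losses listed above, Mean Squared Error and Binary Cross-Entropy:
```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])   # illustrative true labels
y_pred = np.array([0.9, 0.2, 0.6])   # illustrative model predictions

# Mean Squared Error Loss (regression)
mse = np.mean((y_true - y_pred) ** 2)

# Binary Cross-Entropy (binary classification); eps avoids log(0)
eps = 1e-12
bce = -np.mean(y_true * np.log(y_pred + eps) +
               (1 - y_true) * np.log(1 - y_pred + eps))

print(mse, bce)
```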
Thank You