
Machine Learning

Artificial Neural Networks

Ms. Qurat-ul-Ain
Outline
 Neural Networks
 Structure Of ANN
 Activation Functions
 ANN Learning Techniques
 ANN Working
 Perceptron
 Back Propagation
 Forward Propagation
 Feed-forward Network
 ANN Advantages & Disadvantages
 ANN Applications
What Is It?
 An artificial neural network is a crude way of trying to simulate the human brain (digitally)
 Human brain – approx. 10 billion neurons
 Each neuron is connected with thousands of others
 Parts of a neuron:
 Cell body
 Dendrites – receive input signals
 Axons – give output


Neural Networks
 Networks of processing units (neurons) with connections (synapses) between them
 Large number of neurons: ~10^10
 Large connectivity: ~10^5 connections per neuron
 Parallel processing
 Distributed computation/memory
 Robust to noise and failures


Understanding the Brain
 Levels of analysis (Marr, 1982):
1. Computational theory
2. Representation and algorithm
3. Hardware implementation
 Reverse engineering: from hardware to theory
 Parallel processing: SIMD vs. MIMD
Neural net: SIMD with modifiable local memory
Learning: update by training/experience


Introduction
 ANN – made up of artificial neurons
 An artificial neuron is a digitally modeled biological neuron
 Each input into the neuron has its own weight associated with it
 As each input enters the nucleus of the neuron, it is multiplied by its weight.
Introduction
 The nucleus sums all these weighted input values, which gives us the activation
 For n inputs and n weights – each weight is multiplied by its input and the results are summed:

a = x1w1 + x2w2 + x3w3 + ... + xnwn
Introduction
 If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1)
 If the activation is less than the threshold, the neuron outputs zero
 This behavior is typically called a step function


Introduction
 The combination of summation and thresholding is called a node
 For the step (activation) function, the output is 1 if:

x1w1 + x2w2 + x3w3 + ... + xnwn > T


Introduction
x1w1 + x2w2 + x3w3 + ... + xnwn > T

x1w1 + x2w2 + x3w3 + ... + xnwn - T > 0

Let w0 = -T and x0 = 1

D = x0w0 + x1w1 + x2w2 + x3w3 + ... + xnwn > 0

Output is 1 if D > 0;
Output is 0 otherwise

w0 is called a bias weight
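
To make the rule concrete, here is a minimal Python sketch (using numpy) of a single threshold unit with the bias folded in as w0; the input values and the threshold T = 0.5 are illustrative assumptions, not taken from the slides:

import numpy as np

def step_neuron(x, w):
    # Threshold unit: x and w include the bias terms x0 = 1 and w0 = -T
    d = np.dot(x, w)          # D = x0*w0 + x1*w1 + ... + xn*wn
    return 1 if d > 0 else 0  # step activation

# Example: two inputs, threshold T = 0.5, so w0 = -0.5
x = np.array([1, 0.4, 0.3])     # x0 = 1 prepended to the real inputs
w = np.array([-0.5, 0.6, 0.2])  # w0 = -T, then the input weights
print(step_neuron(x, w))        # 0.24 + 0.06 - 0.5 < 0, so the output is 0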


Structure Of ANN
 Artificial Neural Networks are processing elements, either in the form of algorithms or hardware devices, modeled after the neuronal structure of the human cerebral cortex.
 These networks are also simply called Neural Networks. A NN is formed of many layers.
 A network of multiple interconnected layers is often called a “Multilayer Perceptron”.
 The neurons in one layer are called “Nodes”.
 These nodes have an “Activation function”.


Structure Of ANN
The ANN has 3 main layers:
 Input Layer: The input patterns are fed to the input layer. There is exactly one input layer.
 Hidden Layers: There can be one or more hidden layers. These inner layers calculate their outputs from the “weights”, i.e. the sum of the weighted synapse connections. The hidden layers refine the input by removing redundant information and send the result to the next hidden layer for further processing.
 Output Layer: The last hidden layer connects to the “output layer”, where the output is shown.
Comparison Between ML & ANN

Machine Learning: ML learns from input data and discovers output data patterns of interest.
Artificial Neural Network: ANNs are used within machine learning algorithms to train the system using synapses, nodes and connection links.

Machine Learning: ML is a subset of the field of artificial intelligence.
Artificial Neural Network: ANN is also part of the artificial intelligence field and a subset of machine learning.

Machine Learning: ML algorithms learn from data fed to the algorithm for decision-making purposes. Some of these algorithms are classification, clustering, and association data mining.
Artificial Neural Network: ANN underlies deep learning, which analyses data with logical structures as humans do. Some of the ANN learning schemes are Hebbian, Perceptron, Back propagation, etc.

Machine Learning: ML algorithms have self-learning capabilities but require human intervention if the outcome is inaccurate.
Artificial Neural Network: ANN algorithms can adjust themselves using connection weights if the outcome comes out wrong.
Comparison Between ML & ANN (continued)

Machine Learning: ML algorithms require programming skills and knowledge of data structures, big data and databases.
Artificial Neural Network: ANN also requires strong skills in mathematics, probability, data structures, etc.

Machine Learning: ML programs can predict the outcome for a learned set of data and adjust themselves for new data.
Artificial Neural Network: ANNs can learn and make intelligent decisions on their own for new data, but they go deeper than classical machine learning.

Machine Learning: ML is applied in eCommerce, healthcare, product recommendations, etc.
Artificial Neural Network: ANN is applied in the finance domain, machine learning and artificial intelligence.

Machine Learning: Supervised and unsupervised learning fall under machine learning.
Artificial Neural Network: Learning schemes such as Kohonen maps, radial basis functions and feed-forward neural networks fall under ANN.

Machine Learning: Some examples of ML are Google search results, etc.
Artificial Neural Network: Some examples of ANN are face recognition, image recognition, etc.
Neural Networks & Deep Learning
 Deep Learning networks contain several hidden layers between the input and the output.
 These networks are distinguished by the depth of the hidden layers in them. The input data passes through multiple steps before the output is shown.
 These networks differ from earlier NNs, such as the perceptron, which had a single layer of weights and were called Shallow Networks.
 Each hidden layer in the deep learning network trains the data on certain features based on the output of the previous layer.
 The data passes through many layers of nonlinear functions at the nodes.
 The ability of the network to learn from unlabeled data is an advantage over the other learning algorithms.


Neural Networks & Deep Learning
 The more layers there are, the more complex the features that can be recognized, as each layer aggregates features from the previous layers.
 Multiple hidden layers in the network increase complexity and abstraction. This depth is also termed a feature hierarchy.
 Due to this, deep learning networks are capable of handling high-dimensional data.
 Some examples of deep learning networks include clustering millions of images based on their characteristics and similarities, filtering email messages, applying filters to messages in CRM, identifying speech, etc.
Neural Networks & Deep Learning
 Deep Learning Networks can be trained on both labeled and unlabeled sets of data. For unlabeled data, networks such as Boltzmann machines perform automatic feature extraction.
 The network learns automatically by analyzing the input through sampling and minimizing the difference between the output and the distribution of the input. The neural network here finds correlations between the features and outcomes.
 Deep learning networks trained on labeled data can then be applied to unstructured data.
 The more training data is fed to the network, the more accurate it will become.
SVM Vs. Neural Network
 SVM utilizes a nonlinear mapping to make the data linearly separable, hence the kernel function is the key.
 ANN, in contrast, employs multi-layer connections and various activation functions to deal with nonlinear problems.
 A neural network requires a large amount of input data compared to SVM.
 The more data that is fed into the network, the better it will generalize and the more accurately it will predict, with fewer errors. SVM and Random Forest, on the other hand, require much less input data.
 They are both parametric.
 They can both embed non-linearity.

SVM Vs. Neural Network
 Prediction time for neural networks is generally faster than that of SVMs.
 They both classify with comparable accuracy.
 If given as much training data and computational power as possible, however, NNs tend to outperform SVMs.
 The CNN outperformed the SVM classifier in terms of testing accuracy.
 Comparing the overall accuracies of the CNN and SVM classifiers, the CNN was determined to have a statistically significant advantage over the SVM when pixel-based reflectance samples were used, independent of segmentation size.
Activation Functions
 Activation functions are attached to each neuron and are mathematical equations that determine whether a neuron should be activated or not, based on whether the neuron’s input is relevant for the model’s prediction. The purpose of the activation function is to introduce nonlinearity into the data.

The various types of activation functions (implemented in the sketch below) are:

 Sigmoid Activation Function
 TanH / Hyperbolic Tangent Activation Function
 Rectified Linear Unit Function (ReLU)
 Leaky ReLU
 Softmax
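
As a reference, here is a minimal numpy sketch of these five functions; the 0.01 leak factor for Leaky ReLU is a common default assumed here, not specified in the slides:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))          # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                    # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)              # zero for negative inputs

def leaky_relu(x, leak=0.01):
    return np.where(x > 0, x, leak * x)  # small slope avoids dead neurons

def softmax(x):
    e = np.exp(x - np.max(x))            # shift for numerical stability
    return e / e.sum()                   # outputs sum to 1 (class probabilities)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))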
Choice of the Activation Function
For classification tasks:
 We prefer to use the sigmoid and tanh functions and their combinations.
 Due to the saturation problem, the sigmoid and tanh functions are sometimes avoided.
 As indicated earlier, the ReLU function is mostly used (it is computationally fast).
 ReLU variants are used to resolve the dead-neuron issue (e.g., Leaky ReLU).
 It must be noted that the ReLU function is only used in the hidden layers.
 Start with ReLU or leaky/randomized ReLU, and if the results are not satisfactory, try other activation functions.


Typical Activation Functions
The figure showed four common choices, each plotted with output Y against input X:

Step function:    Y = 1 if X >= 0, else Y = 0
Sign function:    Y = +1 if X >= 0, else Y = -1
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function:  Y = X

The activation function controls when the unit is “active” or “inactive”.


An Artificial Neuron – Summary So Far
Receives n inputs
Multiplies each input by its weight
Applies the activation function to the sum of the results
Outputs the result
ANN Learning Techniques
1: Supervised Learning
 In this learning, the user trains the model using labeled data, meaning some data is already marked with the correct answers. Supervised learning can be compared to learning that takes place in the presence of a supervisor.

2: Unsupervised learning
 In this learning, the model does not need supervision. It usually deals with unlabeled data. The user permits the model to work on its own to classify the data. It sorts the data according to similarities and patterns without any prior training on the data.
ANN Working
 The input nodes take the information in numerical form.
 The information represents an activation value, where each node is given a number.
 The higher the number, the greater the activation.
 Based on the weights and the activation function, the activation value passes to the next node.
 Each node calculates the weighted sum of its inputs and transforms that sum with the transfer function (activation function).
 From this result, the neuron concludes whether it needs to forward the signal or not.
 The ANN decides the signal extension through adjustments of the weights.
ANN Working
 The activation runs through the network until it reaches an output node.
 The output layer presents the information in an understandable way.
 The network uses a cost function to compare the output with the expected output. The cost function refers to the difference between the actual value and the predicted value (see the sketch after this list).
 The lower the cost function, the closer the output is to the desired output.
 There are two processes involved in minimizing the cost function:
 Back Propagation
 Forward Propagation
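
As a minimal illustration, here is one common way to implement such a cost function; mean squared error is an assumption here, since the slides only describe the cost as the difference between actual and predicted values:

import numpy as np

def cost(predicted, expected):
    # Mean squared error: one common measure of the difference
    # between the predicted and the actual values
    return np.mean((predicted - expected) ** 2)

y_hat = np.array([0.8, 0.2, 0.6])  # network output
y     = np.array([1.0, 0.0, 1.0])  # expected output
print(cost(y_hat, y))              # 0.08; lower means closer to the target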
Perceptron
 In a neural network we have the same basic principle, except that the inputs and the outputs are binary.
 The objects that do the calculations are perceptrons.
 They adjust themselves to minimize the loss function until the model is very accurate. For example, we can get handwriting analysis to be 99% accurate.
 Neural networks are designed to work just like the human brain does. In the case of recognizing handwriting or faces, the brain makes decisions very quickly. For example, in the case of facial recognition, the brain might start with “Is it female or male? Is it black or white?” and so forth.
Perceptron
 The Perceptron is the first neural network ever created. It consists of 2 neurons in the input column and 1 neuron in the output column.
 This configuration allows one to create a simple classifier to distinguish 2 groups.
 Let’s say you want your neural network to be able to return outputs according to the rules of the “inclusive or”.

Perceptron
 if A is true and B is true, then A or B is true.
 if A is true and B is false, then A or B is true.
 if A is false and B is true, then A or B is true.
 if A is false and B is false, then A or B is false.
 If you replace the “true”s by 1 and the “false”s by 0 and plot the 4 possibilities as points with coordinates on a plane, then you realize that the two final groups, “false” and “true”, may be separated by a single line. This is what a Perceptron can do (see the sketch after this list).
 On the other hand, if we check the case of the “exclusive or” (in which the case “true or true”, the point (1,1), is false), then we can see that a single line cannot separate the two groups, and a Perceptron isn’t able to deal with this problem.
 So the Perceptron is indeed not a very powerful neural network, but it is simple to create and may still be useful as a classifier.
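
A minimal sketch of such a linear separator; the weights (1, 1) and the threshold 0.5 are one possible hand-picked choice, not values given in the slides:

def perceptron_or(a, b):
    # One possible separating line for inclusive-or: a + b > 0.5
    return 1 if (1 * a + 1 * b) > 0.5 else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron_or(a, b))  # matches the inclusive-or table

# For exclusive-or no such weight choice exists, because no single
# straight line separates {(0,1), (1,0)} from {(0,0), (1,1)}.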


Perceptron Learning Rule
Implement the AND Function Using a Perceptron Network
Solution:
Initialize the weights and bias to 0 and the learning rate to 1:
w1 = w2 = b = 0 and a = 1

x1   x2   t    b
 1    1    1   1
 1   -1   -1   1
-1    1   -1   1
-1   -1   -1   1

For input pattern 1 1 1, i.e. x1=1, x2=1, t=1:
Calculate the net input
yin = b + w1x1 + w2x2
    = 0 + 0*1 + 0*1 = 0
y = f(yin) = 0
Check y = t? No, so a weight change is required.
w1(new) = w1(old) + a*t*x1 = 0 + 1*1*1 = 1
w2(new) = w2(old) + a*t*x2 = 0 + 1*1*1 = 1
b(new)  = b(old) + a*t     = 0 + 1*1   = 1         so w1 = w2 = b = 1
For input pattern 1 -1 -1, i.e. x1=1, x2=-1, t=-1:
Calculate the net input
yin = b + w1x1 + w2x2
    = 1 + 1*1 + 1*(-1) = 1
y = f(yin) = 1
Check y = t? No, so a weight change is required.
w1(new) = w1(old) + a*t*x1 = 1 + 1*(-1)*1    = 0
w2(new) = w2(old) + a*t*x2 = 1 + 1*(-1)*(-1) = 2
b(new)  = b(old) + a*t     = 1 + 1*(-1)      = 0   so w1 = 0, w2 = 2, b = 0
For input pattern -1 1 -1, i.e. x1=-1, x2=1, t=-1:
Calculate the net input
yin = b + w1x1 + w2x2
    = 0 + 0*(-1) + 2*1 = 2
y = f(yin) = 1
Check y = t? No, so a weight change is required.
w1(new) = w1(old) + a*t*x1 = 0 + 1*(-1)*(-1) = 1
w2(new) = w2(old) + a*t*x2 = 2 + 1*(-1)*1    = 1
b(new)  = b(old) + a*t     = 0 + 1*(-1)      = -1  so w1 = 1, w2 = 1, b = -1


For input pattern -1 -1 -1, i.e. x1=-1, x2=-1, t=-1:
Calculate the net input
yin = b + w1x1 + w2x2
    = -1 + 1*(-1) + 1*(-1) = -3
y = f(yin) = -1
Check y = t? Yes, so no weight change is required.

Final weights: w1 = 1, w2 = 1, b = -1 (reproduced in the sketch below)
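
The same hand computation as a short Python sketch; the bipolar step f returning 1, 0 or -1 follows the worked example above, where f(0) = 0:

def f(yin):
    # Bipolar step used in the worked example
    return 1 if yin > 0 else (-1 if yin < 0 else 0)

# AND training set with bipolar inputs and targets, as in the table above
patterns = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w1 = w2 = b = 0.0
a = 1.0  # learning rate

for (x1, x2), t in patterns:
    yin = b + w1 * x1 + w2 * x2
    if f(yin) != t:          # update the weights only on a wrong output
        w1 += a * t * x1
        w2 += a * t * x2
        b  += a * t

print(w1, w2, b)  # 1.0 1.0 -1.0, matching the hand computation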
Back Propagation
 But here the question is: how are the weights updated and the new weights calculated?
 The weights are updated with the help of optimizers. Optimizers are the methods / mathematical formulations that change the attributes of the neural network, i.e. the weights, to minimize the error (a basic example follows below).
 “Back Propagation is the process of updating and finding the optimal values of the weights or coefficients which helps the model minimize the error, i.e. the difference between the actual and predicted values.”
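
The slides do not fix a particular optimizer; plain gradient descent is the simplest case, sketched here for a single linear neuron with squared error (all numeric values are illustrative):

import numpy as np

# One gradient-descent step for a single linear neuron with squared error:
# E = (y - w.x)^2, so dE/dw = -2 * (y - w.x) * x
x = np.array([1.0, 2.0])   # input (hypothetical values)
y = 1.0                    # target
w = np.array([0.5, -0.3])  # current weights
lr = 0.1                   # learning rate

y_hat = np.dot(w, x)         # forward pass
grad = -2 * (y - y_hat) * x  # gradient of the error w.r.t. the weights
w = w - lr * grad            # optimizer step: move against the gradient
print(y_hat, w)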
Classification by Back-propagation
 During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples.
Forward Propagation
 The information enters the input layer and flows forward through the network to produce the output value.
 The user compares this value to the expected result.
 The next step is calculating the errors and propagating the information backward.
 This permits the user to train the neural network and update the weights.
 Due to the structured algorithm, the user can adjust the weights simultaneously.
 It helps the user see which weights of the neural network are responsible for the error.
Feed-forward Network
 This network contains an input, hidden, and output layer.
 Signals can move in only one direction. The input data passes to the hidden layer, where the mathematical calculations are performed.
 Each processing element computes its output from the weighted sum of its inputs.
 The output of the previous layer becomes the input of the following layer.
 This continues through all the layers and determines the output (see the sketch below).
 E.g.: data mining
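
A minimal numpy sketch of this layer-by-layer flow; the 3-4-2 layer sizes and the random weights are illustrative assumptions only:

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

# Illustrative 3-4-2 network: one weight matrix and bias vector per layer
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # hidden -> output

x = np.array([0.5, -1.0, 2.0])  # input vector

h = relu(W1 @ x + b1)  # hidden layer: weighted sum, then activation
y = W2 @ h + b2        # the previous layer's output feeds the next layer
print(y)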


Feedback Network
 This network has feedback paths.
 This means signals can travel in both directions using loops. Neurons can have all possible connections.
 Due to the loops, it becomes a dynamic system that changes continuously until it reaches an equilibrium state.
 E.g.: recurrent neural network
A Multi-Layer Feed-Forward Neural Network
[Figure: the input vector X feeds the input layer; weights wij connect it to the hidden layer, which in turn connects to the output layer producing the output vector.]
Nets Without Hidden Layers
Input layer
Output layer – one or more output nodes
Simplest classifier – a single neuron
A Motivating Example
 Each day you get lunch at the cafeteria.
 Your diet consists of fish, chips, and drink.
 You get several portions of each.
 The cashier only tells you the total price of the meal.
 After several days, you should be able to figure out the price of each portion.
 Each meal price gives a linear constraint on the prices of the portions:

price = x_fish * w_fish + x_chips * w_chips + x_drink * w_drink
Solving The Problem
The prices of the portions are like the weights of a linear neuron:

w = (w_fish, w_chips, w_drink)

We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.

The Cashier’s Brain
[Figure: a linear neuron whose weights are the true portion prices 150, 50 and 100; with 2 portions of fish, 5 portions of chips and 3 portions of drink, the price of the meal is 850.]
A Model Of The Cashier’s Brain
With Arbitrary Initial Weights
 With arbitrary initial weights of 50, 50, 50, the model predicts a price of 500 for the same meal, so the residual error is 850 - 500 = 350.
 The learning rule is:

Δwi = ε * xi * (y - ŷ)

 With a learning rate ε of 1/35, the weight changes are +20, +50, +30.
 This gives new weights of 70, 100, 80.
 Notice that the weight for chips got worse! (The sketch below reproduces these numbers.)
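
A short sketch reproducing the numbers above; the true prices 150, 50 and 100 come from the previous slide:

import numpy as np

x = np.array([2, 5, 3])            # portions of fish, chips, drink
w_true = np.array([150, 50, 100])  # actual prices per portion
w = np.array([50.0, 50.0, 50.0])   # arbitrary initial guesses
eps = 1 / 35                       # learning rate

y = np.dot(w_true, x)   # cashier's price: 850
y_hat = np.dot(w, x)    # model's guess: 500
print(y - y_hat)        # residual error: 350

w += eps * x * (y - y_hat)  # delta rule: weight changes of +20, +50, +30
print(w)                    # [ 70. 100.  80.] -- the chips weight got worse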
ANN Advantages & Disadvantages
Advantages
 It has parallel processing ability: the network can perform more than one task at the same time.
 Failure of one element of the network does not affect the working of the whole system. This characteristic makes it fault-tolerant.
 A neural network learns from experience and does not need reprogramming.

Disadvantages of ANN
 Its black-box nature is the most prominent disadvantage of an ANN.
 The neural network does not give a proper explanation of how it determines the output, which reduces trust in the network.
 The duration of the development of the network is unknown.
 There is no assurance of a proper network structure; there is no firm rule for determining the structure.
Limitations Of Neural Networks
 These networks are black boxes for the user: the user has no role except feeding the input and observing the output.
 The user is unaware of the training happening inside the algorithm.
 These algorithms are rather slow and require many iterations (also called epochs) to give accurate results.
 This is because the CPU computes the weights and activation function of each node separately, which consumes time as well as resources.
 This also causes problems with large amounts of data.
ANN Applications
 Neural Networks have been successfully used in a variety of solutions, as shown below.
 Pattern Recognition: ANNs are used in pattern recognition, image recognition, visualization of images, handwriting, speech, and other such tasks.
 Optimization Problems: Problems such as finding the shortest route, scheduling, and manufacturing, where problem constraints must be satisfied and optimal solutions achieved, use NNs.
 Forecasting: NNs can predict the outcome of situations by analyzing past trends. Applications in banking, the stock market, and weather forecasting use Neural Networks.
 Control Systems: Control systems for computer products, chemical products, and robotics use neural networks.
Implementing ANN in Python
 #Importing necessary libraries
 import numpy as np
 import pandas as pd
 import tensorflow as tf
 #Loading the dataset
 data = pd.read_csv("Churn_Modelling.csv")

Generating the Matrix of Features (X)

The basic principle when creating a machine learning model is to generate X, also called the Matrix of Features. This X contains all our independent variables. Let's create it here:

 #Generating Matrix of Features
 X = data.iloc[:,3:-1].values
Implementing ANN in Python
 #Generating Matrix of Features
 X = data.iloc[:,3:-1].values
 #Generating Dependent Variable Vector
 Y = data.iloc[:,-1].values
 #Encoding categorical variable Gender
 from sklearn.preprocessing import LabelEncoder
 LE1 = LabelEncoder()
 X[:,2] = np.array(LE1.fit_transform(X[:,2]))
 #Encoding categorical variable Geography
 from sklearn.compose import ColumnTransformer
 from sklearn.preprocessing import OneHotEncoder
 ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder="passthrough")
 X = np.array(ct.fit_transform(X))
Implementing ANN in Python
 #Splitting the dataset into training and testing sets
 from sklearn.model_selection import train_test_split
 X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
 #Performing feature scaling
 from sklearn.preprocessing import StandardScaler
 sc = StandardScaler()
 X_train = sc.fit_transform(X_train)
 X_test = sc.transform(X_test)
 #Initialising the ANN
 ann = tf.keras.models.Sequential()
 #Adding the first hidden layer
 ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
Implementing ANN in Python
 #Adding the second hidden layer
 ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
 #Adding the output layer
 ann.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))
 #Compiling the ANN
 ann.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
 #Fitting the ANN
 ann.fit(X_train, Y_train, batch_size=32, epochs=100)
Implementing ANN in Python
 #Predicting the result for a single observation
 print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

Output:

[[False]]

 You can save your created neural network by writing the following command:
 #Saving the created neural network
 ann.save("ANN.h5")
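
To reuse the network later, the saved model can be loaded back with Keras; this simply mirrors the save call above:

 #Loading the saved neural network
 loaded = tf.keras.models.load_model("ANN.h5")
 print(loaded.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)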
