Artificial Neural Networks

A neural network consists of artificially created groups of neurons, called layers. An artificial neural network usually consists of an input layer, a number of hidden layers and an output layer, as shown in the figure below. Each layer consists of a number of artificial neurons, and the flow of information between the neurons is indicated by the black lines. The first layer shown in the figure is the input layer. This layer is considered passive, meaning that it does not modify the data: the neurons in the input layer receive values on their input channel and pass them on unchanged through their individual connections. Unlike the input layer, the hidden layers are considered active layers, meaning that they can modify the incoming data. In the figure, each value is sent to all hidden neurons (represented by the arrows); this is called a fully interconnected structure. The number of hidden layers differs from network to network and depends strongly on the type of problem the network is trying to solve. The hidden layers can also use different types of transfer functions, such as ReLU, Tanh and Sigmoid. The problem to be solved and the data to be processed are the two decisive factors in choosing the right transfer function.

Figure: An example of a three-layered neural network, comprising an input layer, a hidden layer and an output layer.
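
The fully interconnected structure and the role of the transfer functions can be sketched in a few lines of NumPy. This is a minimal illustration, not taken from the text: the layer sizes, the random weights, and the choice of ReLU for the hidden layer and Sigmoid for the output are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU transfer function used by the hidden (active) layer
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid transfer function used by the output layer
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 4 inputs, 8 hidden neurons, 2 outputs
W_hidden = rng.normal(size=(4, 8))   # every input connects to every hidden neuron
b_hidden = np.zeros(8)
W_out = rng.normal(size=(8, 2))
b_out = np.zeros(2)

x = rng.normal(size=(1, 4))          # one sample presented to the passive input layer

h = relu(x @ W_hidden + b_hidden)    # hidden layer modifies the incoming data
y = sigmoid(h @ W_out + b_out)       # output layer produces the prediction
print(y)
```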

Convolutional neural network

Deep learning networks are generally distinguished from shallow neural networks by their depth: a neural network that consists of more than one hidden layer is generally defined as a deep neural network. Convolutional neural networks are similar to ordinary neural networks in that they consist of neurons with learnable weights and biases, and the entire network still expresses a single differentiable score function. However, a convolutional neural network is generally defined by the types of hidden layers it uses, such as convolution, fully connected, normalization, and pooling layers. Using neural networks for image pattern recognition works well for small, single-channel images consisting of simple shapes. However, using fully connected neural networks on high-resolution, three-color-channel images would result in a considerable number of parameters to train, and with so many parameters the model will likely tend to overfit. The neurons in a convolutional neural network are arranged in a three-dimensional structure: width, height and depth. Rather than focusing on a single pixel at a time, the convolutional network takes square patches of pixels and passes them through a filter, called a kernel. The goal of this process is to transform the image into a form that is easier to process, without losing the features that are essential for prediction.
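
To make the parameter count concrete, here is a back-of-the-envelope calculation; the image size and layer width are illustrative assumptions, not values from the text:

```python
# Rough parameter count: fully connected layer vs. one convolution filter
height, width, channels = 224, 224, 3       # assumed high-resolution RGB input
hidden_units = 1000                         # assumed width of a single dense layer

dense_weights = height * width * channels * hidden_units
conv_weights_per_filter = 3 * 3 * channels  # one 3x3 kernel spanning all channels

print(dense_weights)            # 150,528,000 weights for one dense layer
print(conv_weights_per_filter)  # 27 weights per convolution filter
```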

Convolution

The convolution operation is the main building block of a convolutional neural network. The convolution takes a filter, or kernel, of a specified size and slides it over the input with a given stride, which specifies how many columns the filter moves across the input image at each step. At each position, a dot product between the filter and the section of the input it covers (the same size as the filter) is computed, as shown in the figure. The value produced at each step becomes one entry of the feature map, which is assembled into the final output. In purely mathematical terms, the operation is defined as a linear operator that transforms data from one domain to another:
$(f * g)(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t-\tau)\, d\tau$

The input image I is on the left side of the figure and the filter K is in the middle. Due to the shape of the filter, it is annotated as a 3x3 convolution. To produce the feature map, element-wise matrix multiplication is applied at every location and the results are summed, annotated as I * K on the right side of the figure.

Figure: An example of a convolution where I is the input, K the kernel and I * K is the output.
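
A minimal NumPy sketch of the sliding-window operation described above: a 3x3 kernel is moved over the input with stride 1, an element-wise product is taken at each location, and its sum becomes one entry of the feature map. The input and kernel values are made up for illustration; as in most CNN implementations, the kernel is applied without flipping (strictly a cross-correlation).

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    # Slide the kernel over the image; each step yields one feature-map entry
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return feature_map

I = np.arange(25, dtype=float).reshape(5, 5)   # illustrative 5x5 input image
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)         # illustrative 3x3 kernel
print(convolve2d(I, K))                        # 3x3 feature map
```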

Pooling

Pooling, or down-sampling, is used to reduce the dimensionality of feature maps. Pooling with a 2x2 window, for example, takes four pixel values as input and outputs a single value for those pixels. As shown in the figure below, 2x2 pooling on an 8x8 image results in 16 pooled values arranged in a 4x4 matrix. There are different types of pooling, such as average (mean), max, min and stochastic pooling, and they are distinguished by how they choose the pixel values that make up their output. The least obvious of these, stochastic pooling, picks a value at random within each window, where high pixel values have a higher chance of being picked.

Figure: (a) Example of a 2x2 max-pooling with stride = 2, where (b) is the resulting output.
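
A NumPy sketch of the 2x2 max pooling with stride 2 shown in the figure: each non-overlapping 2x2 block of an 8x8 input is replaced by its largest value, giving a 4x4 output. The input values are arbitrary.

```python
import numpy as np

def max_pool2d(image, size=2, stride=2):
    # Replace each size x size block with its maximum value
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = image[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            pooled[i, j] = block.max()
    return pooled

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(8, 8)).astype(float)  # illustrative 8x8 input
print(max_pool2d(image).shape)                           # (4, 4): 16 pooled values
```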

Preventing overfitting

Overfitted models are one of the most common problems encountered by researchers and companies in the field of deep neural networks. One reason is that many state-of-the-art architectures have a large number of parameters to learn during training. Overfitting occurs when a trained model achieves high accuracy on the training data but does not perform well on test data. Specifically, the trained model learns the noise patterns in the training data, which creates a large gap between the training error and the test error; this gap is the defining sign of overfitting and is visualized in the figure below.

Deep neural networks typically have a large number of trainable parameters. A complex model that contains more parameters than warranted by the amount of training data leads to an overfitted model; this is one of the major pitfalls of working with big data, which is becoming increasingly important. In the figure, the model fits the data perfectly during training but performs poorly on samples outside the training data. To avoid overfitting, one can either increase the amount of training data or improve the generalization ability of the network by applying the techniques explained below:

Figure: Left, a model whose line intersects the black dots perfectly (overfitted); right, the same model performing poorly on the test data (green dots), where the error is large (blue lines).

1. Regularization is a technique used to prevent overfitting. The most commonly used regularization methods are L1, L2 and dropout. L1 and L2 regularization update the cost function by adding a regularization term that decreases the values of the weight matrices; the assumption is that a network with smaller weight matrices leads to a simpler model. The difference between L2 and L1 regularization is that L2 decays the weights towards zero, but not exactly to zero, while L1 regularization can reduce weights exactly to zero.

L1 regularization penalizes the absolute value of the weights, so a neuron with a high weight costs more than a neuron with a low weight; L1 can therefore be useful when trying to compress the model. The penalty added to the cost function is $\lambda \sum_{w} |w|$, where w ranges over each weight in the neural network and $\lambda$ controls the strength of the regularization. A small numerical sketch of both penalty terms, together with dropout, follows after this list.

2. Dropout is a commonly used technique to enhance the network's ability to generalize. The method randomly drops neurons in the neural network and temporarily removes their incoming and outgoing connections, so the dropped neurons' contributions to the activation of downstream neurons are temporarily removed. Neighboring units have to compensate for the removed unit and handle the specific representation of the missing neuron. The effect of dropout is that the network becomes less dependent on specific weighted neurons and therefore generalizes better.
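
The following is a minimal NumPy sketch of both ideas; the weight values, the regularization strength lambda and the dropout rate are assumptions chosen for illustration. The L1 and L2 penalties are simply added to the cost so that large weights become expensive, and dropout zeroes a random subset of hidden activations during training (with the usual inverted-dropout rescaling so the expected activation scale is preserved).

```python
import numpy as np

rng = np.random.default_rng(2)

# --- L1 / L2 regularization: penalty terms added to the cost function ---
weights = rng.normal(size=(8, 4))   # illustrative weight matrix
lam = 1e-3                          # assumed regularization strength

l1_penalty = lam * np.sum(np.abs(weights))  # L1: can drive weights exactly to zero
l2_penalty = lam * np.sum(weights ** 2)     # L2: decays weights towards zero

data_loss = 0.42                    # placeholder loss computed from the data
cost = data_loss + l1_penalty + l2_penalty

# --- Dropout: randomly remove hidden activations during training ---
drop_rate = 0.5
activations = rng.normal(size=(1, 8))
mask = (rng.random(activations.shape) >= drop_rate).astype(float)
dropped = activations * mask / (1.0 - drop_rate)  # inverted dropout keeps the scale

print(cost)
print(dropped)
```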

Loss

When optimizing an algorithm, a function called the objective function is used to evaluate a candidate solution. By applying an objective function, such as a cost or loss function, one seeks to maximize or minimize it. In deep learning it is normally preferred to minimize the objective function, i.e. to find the candidate solution, in this case the set of weights, with the lowest cost. The value returned by the loss function is called the "loss". The loss is a measurement of the classification model's performance; for example, a model that predicts a probability of 0.1 when the observation's label is 1 results in a high loss. Functions commonly used as loss functions are cross-entropy and mean squared error. When calculating the cross-entropy, one seeks the set of model weights that minimizes the difference between the model's predicted probability distribution given the dataset and the true distribution of the labels. Depending on the task of the neural network, binary cross-entropy or categorical cross-entropy can be applied: if the network is trying to classify multiple classes in the dataset, categorical cross-entropy is used, and if the dataset contains only two classes one can apply binary cross-entropy, which predicts the probability of a sample belonging to one of the two classes.
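
A short NumPy sketch of the two variants, with made-up predictions and labels (not from the text): a confident wrong prediction, such as probability 0.1 for a label of 1, produces a large loss, while a confident correct prediction produces a small one.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Two-class case: y_pred is the predicted probability of class 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Multi-class case: rows of y_pred are predicted class distributions,
    # rows of y_true are one-hot labels
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

print(binary_cross_entropy(np.array([1.0]), np.array([0.1])))  # ~2.30, high loss
print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))  # ~0.11, low loss

y_true = np.array([[0, 0, 1], [0, 1, 0]], dtype=float)          # one-hot labels
y_pred = np.array([[0.1, 0.2, 0.7], [0.2, 0.6, 0.2]])
print(categorical_cross_entropy(y_true, y_pred))                # ~0.43
```
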
Figure: The effect of dropout. The left network is the original neural network; the right network is the result after applying dropout.

References

Daphne Cornelisse. A Comprehensive Guide to Convolutional Neural Networks. 2018. url: https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050.

Shaeke Salman and Xiuwen Liu. "Overfitting Mechanism and Avoidance in Deep Neural Networks". In: (2019). url: https://www.youtube.com/watch?v=Qi1Yry33TQE, http://arxiv.org/abs/1901.06566.

Shubham Jain. An Overview of Regularization Techniques in Deep Learning (with Python code). 2018. url: https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/.

Nitish Srivastava et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". 2014, pp. 1929–1958. url: http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf.

Hamed H. Aghdam and Jahani H. Elnaz. Guide to Convolutional Neural Networks - A Practical Application to Traffic-Sign Detection and Classification. Springer International Publishing, 2017, p. 282. isbn: 9783319861906.

Ihab S. Mohamed et al. "Detection, localisation and tracking of pallets using machine learning techniques and 2D range data". In: (Mar. 2018). url: http://arxiv.org/abs/1803.11254.
