
House Dzone Refcard 383 Neural Network Essentials

The document discusses the essential components of neural networks. It explains that neural networks are composed of layers: an input layer, one or more hidden layers, and an output layer. The key components that connect these layers are perceptrons, activation functions, weights, and biases. It also gives an overview of common neural network architectures and applications.


383

Neural Network Essentials

DR. TUHIN CHATTOPADHYAY
FOUNDER & CEO, TUHIN AI ADVISORY

CONTENTS

•  What Are Neural Networks?
•  Essentials of Neural Networks
   −  Neural Network Components
   −  Common Neural Architectures
   −  Neural Network Model Optimization
   −  Neural Network Chips
•  Conclusion
   −  Additional Resources

A massive interconnection of computing cells, also known as neurons, comprises a neural network, which is the heart and soul of all artificial intelligence (AI) algorithms. Almost 80 years ago, in 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts from the University of Illinois at Chicago first proposed the neural network. After a series of evolutions, deep neural networks (DNNs), which are usually those with more than two hidden layers, are now used for image recognition, image classification, object detection, speech recognition, language translation, natural language processing (NLP), and natural language generation (NLG).

The present Refcard first introduces the concept of a neural network by drawing an analogy with the biological neural network and subsequently walks readers through the essential components of neural networks, which are considered the building blocks. Common neural architectures are discussed thereafter, defining the underlying arrangement of the neurons and the specific purposes they serve.

Then we'll discuss the different AI accelerators specifically designed for the efficient processing of DNN workloads, along with the neural network optimizers that work on the learning rate to reduce overall loss and improve accuracy. Finally, we will cover various applications of DNNs across industries and explore the power of leveraging high-performance computing (HPC) with AI.

WHAT ARE NEURAL NETWORKS?
Our biological neural network receives sensations from the external environment through our five senses. Then the neurons transmit these sensations to our other nerve cells. Artificial neural networks (ANNs), in the same manner, collect the data as inputs, and the neurons consolidate all the information. Like the synapses of the biological neural network, the interconnections in an ANN transform the input data within the hidden layers. The output of the ANN is generated much as the axon in a biological neural network channels the output.

ESSENTIALS OF NEURAL NETWORKS
This section will first illuminate the constituent components of the neural network, followed by the various network architectures that illustrate how neurons are arranged. The subsequent deliberations on neural network chips and optimizers will demonstrate a seamless implementation of the network for improved accuracy and speed.

NEURAL NETWORK COMPONENTS
It's critical to first appreciate the nuts and bolts of a neural network, which is composed of the three layers (input, hidden, and output) as well as the perceptron, activation functions, weights, and biases.

INPUT, HIDDEN, AND OUTPUT LAYERS
The single input layer accepts external independent variables that help in predicting the desired outcome. All the independent variables of the model are a part of the input layer. The one or more interconnected hidden layers are configured based on the purpose that the neural network is going to serve, like object detection and classification through visual recognition and NLP.

REFCARD | AUGUST 2022 1


REFCARD | NEURAL NETWORK ESSENTIALS

Hidden layers are a function of the weighted sum of the inputs/predictors. When the network contains multiple hidden layers, each hidden unit is a function of the weighted sum of the units of the previous hidden layer.

The output layer, as a function of the hidden layers, contains the target (dependent) variables. For image classification, the output layer segregates the input data into multiple nodes as per the desired objective of the model.

Figure 1: Input, hidden, and output layers

Table 1

| TYPE | DESCRIPTION | FORMULA |
|------|-------------|---------|
| Rectified Linear Activation (ReLU) | A piecewise linear function that outputs the input directly if it is positive; if the input is negative, the output is 0 | $\max(0, x)$ |
| Logistic (Sigmoid) | An "S" curve that generates an output between 0 and 1, which can be read as a probability | $\dfrac{1}{1 + e^{-x}}$ |
| Hyperbolic Tangent (TanH) | Like a Sigmoid function but symmetric around zero, which often helps training; takes real-valued arguments and transforms them to the range (-1, 1) | $\dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ |
| Linear | Takes real-valued arguments and returns them unchanged | $f(x) = x$ |
| Softmax | Commonly used for a classification problem with multiple classes; returns the probability of each class | $\mathrm{softmax}(z_i) = \dfrac{\exp(z_i)}{\sum_j \exp(z_j)}$ |
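The formulas in Table 1, and the idea that a hidden unit is a function of a weighted sum, can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the `dense_layer` helper, and the example weights are assumptions made for this sketch and do not come from the Refcard.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def sigmoid(x):
    # 1 / (1 + e^-x), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x), output in (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def linear(x):
    # f(x) = x
    return x

def softmax(z):
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def dense_layer(x, weights, bias, activation):
    # Hypothetical helper: each unit is an activation of a weighted sum plus a bias.
    return activation(weights @ x + bias)

x = np.array([2.0, 1.0])
weights = np.array([[1.0, -1.0], [0.5, 0.5]])  # illustrative values only
bias = np.array([0.0, -2.0])

hidden = dense_layer(x, weights, bias, relu)
class_probabilities = softmax(np.array([1.0, 2.0, 3.0]))
print(hidden)
print(class_probabilities, class_probabilities.sum())
```

Swapping `relu` for `sigmoid` or `tanh` in the `dense_layer` call is enough to compare how each function shapes the same weighted sums.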

PERCEPTRON
Frank Rosenblatt, an experimental psychologist from Cornell, was intrigued by the ability of neurons to learn and created a simple perceptron with multiple inputs, a single processor, and a single output. Thus, the perceptron is a building block of a neural network that comprises a single layer.

Figure 2: Perceptron

ACTIVATION FUNCTIONS
An activation function, also known as a transfer function, controls the amplitude of a neuron's output. In a deep neural network with multiple hidden layers, the activation function links the weighted sums of units in a layer to the values of units in the succeeding layer. Typically, the same activation function is used for all the hidden layers.

Activation functions can be either linear or non-linear, and the most commonly used ones are summarized in Table 1.

If you are unsure which activation function best suits your use case, it is advisable to use ReLU since it helps to overcome the vanishing gradient problem and allows models to be trained more effectively.

WEIGHTS AND BIASES
Weights signify the importance of the corresponding feature (input variable) in predicting the output. They also explain the relationship between that feature and the target output. The figure below illustrates that the output is the sum of the input x times the connection weight w0 and the bias b times the connection weight w1.

Figure 3: Weights and biases

Biases are like the constant in a linear function y = mx + c, where m is the weight and c is the bias. Without a bias, the model can only pass through the origin, and such scenarios are far from reality. Thus, the bias shifts the line and makes the model more flexible.

COMMON NEURAL ARCHITECTURES
The neural network architecture is composed of neurons, and the way these neurons are arranged creates the structure that defines how the algorithm is going to learn. The arrangements have the input and the output layers with hidden layers in between that increase the model's


computational and processing power. The key architectures are discussed below.

RADIAL BASIS FUNCTION
The radial basis function (RBF) network has a single non-linear hidden layer called a "feature vector," where the number of neurons in the hidden layer should be more than the number of neurons in the input layer to cast the data into a higher-dimensional space. Thus, RBF increases the dimension of the feature vector to make the classes more separable in high-dimensional space. The figure below illustrates how the inputs (x) are transformed to the output (y) through a single hidden layer (i.e., feature vector), which connects to x and y through the weights.

Figure 4: Radial basis function

RECURRENT NEURAL NETWORKS
Recurrent neural networks (RNNs) treat input as a time series and generate output as a time series, with at least one connection cycle. RNNs are universal approximators: They can approximate virtually any dynamical system. RNNs are used for time series analyses like stock predictions, sales forecasting, natural language processing and translation, chatbots, image captioning, and music synthesis.

Figure 6: Recurrent neural networks
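As a rough illustration of the connection cycle in an RNN, the sketch below feeds the hidden state from one time step back in at the next. It assumes the common tanh recurrence; the function names, shapes, and random weights are hypothetical and chosen only for this example.

```python
import numpy as np

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    # New hidden state depends on the current input AND the previous hidden state.
    return np.tanh(w_xh @ x_t + w_hh @ h_prev + b_h)

def run_rnn(inputs, w_xh, w_hh, b_h):
    # Process a sequence one time step at a time, carrying the state forward.
    h = np.zeros(w_hh.shape[0])
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))           # 5 time steps, 3 features each
w_xh = rng.normal(scale=0.5, size=(4, 3))  # input-to-hidden weights
w_hh = rng.normal(scale=0.5, size=(4, 4))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)

states = run_rnn(inputs, w_xh, w_hh, b_h)
print(states.shape)  # (5, 4): one hidden state per time step
```

The `w_hh @ h_prev` term is the cycle: without it, each time step would be processed independently, as in an ordinary feedforward layer.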

Long short-term memory (LSTM), which is composed of forget, input, and output gates, has several applications including time series predictions and natural language understanding and generation. LSTM is primarily used to capture long-term dependencies. The forget gate decides whether to retain the information from the previous time step or "forget" it. A less complex variant with fewer gates is the gated recurrent unit (GRU).
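The role of the three LSTM gates can be sketched as a single time step in NumPy. This follows the widely used textbook formulation and is only an illustration; the parameter dictionary, its key names, and the sizes are assumptions for this example, not an API from any framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    # All gates look at the previous hidden state and the current input.
    z = np.concatenate([h_prev, x_t])
    forget_gate = sigmoid(params["W_f"] @ z + params["b_f"])  # keep or drop old memory
    input_gate = sigmoid(params["W_i"] @ z + params["b_i"])   # admit new information
    output_gate = sigmoid(params["W_o"] @ z + params["b_o"])  # expose memory as output
    candidate = np.tanh(params["W_c"] @ z + params["b_c"])    # proposed new memory
    c_t = forget_gate * c_prev + input_gate * candidate       # updated cell state
    h_t = output_gate * np.tanh(c_t)                          # updated hidden state
    return h_t, c_t

hidden_size, feature_size = 4, 3
rng = np.random.default_rng(0)
params = {}
for gate in ("f", "i", "o", "c"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_size, hidden_size + feature_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(6, feature_size)):  # 6 time steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how information can survive across many time steps.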

The GRU is a simplified variant of LSTM where the forget and input gates are combined into a single update gate, and the cell state and hidden state are also combined. Thus, a GRU uses less memory and is generally faster than LSTM.

RESTRICTED BOLTZMANN MACHINES
Restricted Boltzmann machines (RBMs) are unsupervised learning algorithms with two-layer neural networks comprising a visible/input layer and a hidden layer without any intra-layer connections (i.e., no two nodes within the same layer are connected), which creates the restriction. RBMs are used for movie recommendation engines, pattern recognition (e.g., understanding handwritten text), and radar target recognition for real-time intra-pulse recognition.

Figure 5: Restricted Boltzmann machines

CONVOLUTIONAL NEURAL NETWORKS
Convolutional neural networks (CNNs) are widely popular for image classification. A CNN assigns weights and biases to objects in the image for classification purposes. An image comprising a matrix of pixel values is processed through the convolutional layer, pooling layer, and fully connected (FC) layer. The pooling layer reduces the spatial size of the convolved feature.
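To make the convolution and pooling stages concrete, here is a minimal NumPy sketch on a single-channel image. It assumes a "valid" sliding window with stride 1 and non-overlapping 2x2 max pooling; the function names and the toy kernel are illustrative choices, and real frameworks add channels, padding, strides, and learned kernels.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and sum the element-wise products.
    # (As in most deep learning libraries, the kernel is not flipped.)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # simple 3x3 horizontal-difference filter

feature_map = conv2d(image, kernel)  # convolved feature
pooled = max_pool2d(feature_map)     # spatially reduced feature
print(feature_map.shape, pooled.shape)  # (4, 4) (2, 2)
```

The pooled output has a quarter as many values as the convolved feature, which is the spatial reduction that keeps later fully connected layers manageable.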

The final output layer generates a confidence score to determine how likely it is that an image belongs to a defined class. CNNs are widely used by Facebook and other social media platforms to monitor content.

DEEP REINFORCEMENT LEARNING
Deep RL, short for deep reinforcement learning, combines the power of deep neural networks with reinforcement learning. Reinforcement learning trains an algorithm by rewarding it for the right decision and penalizing it for the wrong one. Applications of deep RL include load balancing, robotics, industrial operations, traffic control, and recommendation systems.
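The reward-and-penalty idea can be shown with a deliberately tiny sketch. Note the simplification: this is a plain lookup table of action values, not deep RL; in deep RL a neural network would stand in for the table. The action names, reward values, and learning rate are all hypothetical.

```python
def update_value(value, reward, learning_rate=0.1):
    """Move the estimated value of an action toward the reward it just received."""
    return value + learning_rate * (reward - value)

# Hypothetical environment: the "right" decision earns +1, the "wrong" one earns -1.
rewards = {"right": 1.0, "wrong": -1.0}
values = {"right": 0.0, "wrong": 0.0}

# Try both actions repeatedly and learn from the feedback.
for _ in range(50):
    for action, reward in rewards.items():
        values[action] = update_value(values[action], reward)

# After learning, the agent prefers the action with the highest estimated value.
best_action = max(values, key=values.get)
print(best_action, values)
```

Each update nudges the estimate a fraction of the way toward the observed reward, so rewarded actions drift upward and penalized ones drift downward.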


GENERATIVE ADVERSARIAL NETWORKS
Generative adversarial networks (GANs) use two neural networks: a generator and a discriminator. While the generator helps in generating image, voice, and video content, the discriminator classifies them as either from the domain or generated. The two models are trained together in a zero-sum game until the generator model is producing reasonable results.

NEURAL NETWORK CHIPS
Neural network chips provide the power of computing infrastructure through processing speed, storage, and networking that make the chips capable of quickly running neural network algorithms on vast amounts of data. These chips break tasks into multiple sub-tasks, which can run on multiple cores concurrently to increase the processing speed.

TYPES OF AI ACCELERATORS
Specialized AI accelerators have been designed that vary significantly depending on the model size, supported framework, programmability, learning curve, target throughput, latency, and cost. Such hardware includes the graphics processing unit (GPU), vision processing unit (VPU), field-programmable gate array (FPGA), central processing unit (CPU), and Tensor Processing Unit (TPU).

While some accelerators like GPUs are more capable of handling computer graphics and image processing, FPGAs demand field programming using hardware description languages (HDLs) like VHDL and Verilog, and TPUs by Google are more specialized for neural network machine learning. Let's look at each of them separately below to understand their capabilities.

GPUs were originally developed for graphics processing and are now widely used for deep learning (DL). Their benefit is parallel processing through the following five architectures:

•  Single instruction, single data (SISD)
•  Single instruction, multiple data (SIMD)
•  Multiple instructions, single data (MISD)
•  Multiple instructions, multiple data (MIMD)
•  Single instruction, multiple threads (SIMT)

A GPU computes faster than a CPU as it devotes more transistors to data processing, which helps to maximize the memory bandwidth for large datasets with medium-to-large models and larger effective batch sizes.

VPUs are optimized DL processors aimed at enabling computer vision tasks with ultra-low power requirements without compromising performance. Thus, VPUs are optimized for deep learning inference by leveraging the power of pre-trained CNN models.

An FPGA has thousands of memory units that run parallel architectures at low power consumption. It consists of reprogrammable logic gates to create custom circuits. FPGAs are used for autonomous driving and automated spoken language recognition and search.

CPUs with MIMD architecture excel at task optimization and are more suitable for applications with limited parallelism, such as sparse DNNs, RNNs that have dependencies between steps, and small models with small effective batch sizes.

A TPU is Google's custom-developed application-specific integrated circuit (ASIC) that is used to accelerate DL workloads. TPUs provide high throughput for large batch sizes and are suitable for models that train for weeks, dominated by matrix computations.

AI ACCELERATORS FOR DEEP LEARNING INFERENCE
AI accelerators are required for DL inference for faster computation through parallel computational capabilities. They have high bandwidth memory that can allocate four to five times more bandwidth between processors than traditional chips. A couple of leading AI accelerators for DL inference are AWS Inferentia, a custom-designed ASIC, and Open Visual Inference and Neural Network Optimization (OpenVINO), an open-source toolkit for optimizing and deploying AI inference.

They both boost deep learning performance for tasks like computer vision, speech recognition, NLP, and NLG. OpenVINO uses models trained in frameworks including TensorFlow, PyTorch, Caffe, and Keras, and optimizes model performance with acceleration from CPU, GPU, VPU, and iGPU.

NEURAL NETWORK MODEL OPTIMIZATION
Deep learning model optimizations are used for various scenarios, including video analytics as well as computer vision. Since most of these computation-intensive analyses are done in real time, the following objectives are critical:

•  Faster performance
•  Reduced computational requirements
•  Optimized space usage

For example, OpenVINO provides seamless optimization of neural networks with the help of the following tools:

•  Model Optimizer – Converts models from multiple frameworks to Intermediate Representation (IR); these can then be run for inference with OpenVINO Runtime. OpenVINO Runtime plugins are software components that comprise full implementation for inference on hardware such as CPUs, GPUs, and VPUs.
•  Post-Training Optimization Toolkit (POT) – Accelerates the inference speed of IR models by applying post-training automated model quantization through the DefaultQuantization and AccuracyAwareQuantization algorithms.
•  Neural Network Compression Framework (NNCF) – Integrates with PyTorch and TensorFlow to quantize and compress the model through pruning. The commonly used compression algorithms are 8-bit quantization, filter pruning, sparsity, mixed-precision quantization, and binarization.
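To give a feel for what 8-bit quantization does to a model's weights, the sketch below maps float32 values to int8 and back. It is a generic symmetric, per-tensor scheme written in NumPy for illustration only: it does not use the OpenVINO, POT, or NNCF APIs, it is not their algorithm, and the function names are assumptions.

```python
import numpy as np

def quantize_int8(weights):
    # One scale for the whole tensor, chosen so the largest magnitude maps to 127.
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate float values from the stored integers.
    return quantized.astype(np.float32) * scale

weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print(quantized.dtype, quantized.nbytes, "bytes vs", weights.nbytes, "bytes")
print("max rounding error:", np.max(np.abs(weights - restored)))
```

Storage drops to a quarter of the float32 size, and the rounding error per weight is bounded by half the scale, which is the trade-off between space and accuracy that such tools manage automatically.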


CONCLUSION
AI has wide applications across industries and specific sectors in the current era. Keeping that in view, in this Refcard, we delved into the roots of AI algorithms, as well as explored the fundamentals of neural networks, the architectural intricacies of DNNs, and hardware requirements for the best performance. The table below presents some of the major applications of deep neural networks across the industry:

Table 2

| INDUSTRY | EXAMPLES |
|----------|----------|
| Retail | Self-checkouts; automated measurement of product dimensions for space optimization in planograms; automated stock replenishment |
| Healthcare | Medical image classification; CANcer Distributed Learning Environment (CANDLE) |
| Government | Crime prevention |
| Logistics and warehouse | Robots for material handling and delivery |
| Manufacturing | Quality control and defect classification |
| Automotive | Autonomous driving |
| Financial services | Fraud detection; anti-money laundering and risk analysis; loan processing; clearing and settlement of trades; options pricing |

With advances in the sophistication of AI algorithms, it's become increasingly important to look beyond AI chips to reduce time to model/time to insight and increase accuracy. Since high-performance computing is a shared resource, containerization solutions like Kubernetes pave the way to give users more control and to process data at scale.

The National Center for Supercomputing Applications (NCSA) at Illinois is spearheading applications of the confluence between AI and HPC across industries ranging from genome mapping to autonomous transportation.

ADDITIONAL RESOURCES
TensorFlow is an end-to-end open-source deep learning framework developed by Google:
•  TensorFlow website – https://www.tensorflow.org
•  TensorFlow GitHub – https://github.com/tensorflow/tensorflow

TensorFlow Graph Neural Networks (GNNs) is a library for graph-structured data using TensorFlow:
•  "Introducing TensorFlow Graph Neural Networks" – https://blog.tensorflow.org/2021/11/introducing-tensorflow-gnn.html
•  TensorFlow GNN GitHub – https://github.com/tensorflow/gnn

PyTorch is a Python package to use the power of GPUs and other accelerators built on a tape-based autograd system:
•  PyTorch website – https://pytorch.org
•  PyTorch GitHub – https://github.com/pytorch/pytorch

Keras is a high-level neural network application programming interface (API) written in Python for DNNs that can run on top of CNTK, TensorFlow, and Theano:
•  Keras website – https://keras.io
•  Keras GitHub – https://github.com/keras-team/keras

OpenVINO is an open-source toolkit for optimizing and deploying AI inference:
•  OpenVINO overview – https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html
•  OpenVINO Documentation – https://docs.openvino.ai/latest/index.html

WRITTEN BY DR. TUHIN CHATTOPADHYAY,
FOUNDER & CEO, TUHIN AI ADVISORY
Tuhin spent the first 10 years of his career in academia and research, teaching business statistics, analytics, and technology at several reputed B-Schools of India. As a corporate practitioner, Tuhin has a proven record of accomplishment as a transformational leader in organizations like The Nielsen Company. Currently, he runs his own consultancy for providing a full suite of AI, analytics, CTO/CAO-as-a-Service, and digital transformation services to clients.

600 Park Offices Drive, Suite 300
Research Triangle Park, NC 27709
888.678.0399 | 919.678.0300

At DZone, we foster a collaborative environment that empowers developers and tech professionals to share knowledge, build skills, and solve problems through content, code, and community. We thoughtfully, and with intention, challenge the status quo and value diverse perspectives so that, as one, we can inspire positive change through technology.

Copyright © 2022 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means of electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
