Deep Learning and
Parallel Computing
Environment for
Bioengineering
Systems
Edited by
Arun Kumar Sangaiah
Elsevier
3251 Riverport Lane
St. Louis, Missouri 63043
Deep Learning and Parallel Computing Environment for Bioengineering Systems ISBN: 978-0-12-816718-2
Copyright © 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted
herein).
Notices
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information,
methods, compounds or experiments described herein. Because of rapid advances in the medical sciences, in particular, independent
verification of diagnoses and drug dosages should be made. To the fullest extent of the law, no responsibility is assumed by Elsevier,
authors, editors or contributors for any injury and/or damage to persons or property as a matter of products liability, negligence or
otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
PREFACE

Deep machine learning is an emergent area in the field of computational intelligence (CI) research concerned with the analysis and design of learning algorithms and representations of data at multiple levels of abstraction. Deep learning is a technique for implementing machine learning that provides an effective solution for parallel computing environments in bio-engineering problems. It encompasses artificial intelligence (AI), artificial neural networks, reasoning, and natural language processing, which are helpful to human intelligence and the decision making process. Heterogeneous parallel computing architectures have been significant for real-time bio-engineering applications, which require the design of a high-level operating system for matching processing tasks to the appropriate machine learning paradigm in a mixed-machine parallel system. This effort investigates the feasibility of a deep machine learning technique for implementing a high-level operating system for heterogeneous parallel computers.

The new frontier research era and convergence of deep machine learning and parallel computing with reference to bio-engineering has three main streams needing to be addressed in the current scenario: bioinformatics, medical imaging, and sustainable engineering. This book integrates machine learning, cognitive neural computing, parallel computing paradigms, advanced data analytics and optimization opportunities to bring more computing power to bio-engineering problems and challenges. Further, it is important to note that the convergence of parallel computing architectures, deep machine learning and its intelligence techniques has not been adequately investigated from the perspective of the bioengineering research streams (bioinformatics, medical imaging, and sustainable engineering) and their related research issues. Obviously, these challenges also create immense opportunities for researchers.

The book presents novel, in-depth and fundamental research contributions, from either a methodological or an application perspective, in understanding the fusion of deep machine learning paradigms and their capabilities in solving a diverse range of problems in bio-engineering and its real-world applications. The overall objective of the book is to illustrate the state-of-the-art and recent developments in the new theories and applications of deep learning approaches applied to parallel computing environments in bioengineering systems. Presently, there are many noteworthy issues (health informatics, bio-image informatics, energy efficiency, etc.) that need to be addressed in the context of deep machine learning, parallel computing and bioengineering. For the aforementioned reasons, this book focuses on the comprehensive nature of cognitive neural computing and parallel computing, and on emphasizing their character in human intelligence and learning systems, complex analysis tasks mimicking human cognition and learning behavior, and the prediction and control of bio-engineering systems. This book intends to give an overview of the state of the art of issues and solution guidelines in the new era of the deep machine learning paradigm and its recent trends of techniques for bioengineering.

ORGANIZATION OF THE BOOK
The volume is organized into 15 chapters. A brief description of each chapter is given as follows:

Chapter 1 illustrates basic parallel processing concepts with examples in order to highlight the significance of parallel deep learning. The types of parallelization techniques are addressed, and the relation between computational intelligence and parallel deep learning, the challenges in combining them (parallel computing, graphics processing units and new hardware for deep learning in computational intelligence research) and the benefits are discussed in this chapter.

Chapter 2 presents big data analytics with regard to the Hadoop framework for storing and processing big data, described in the context of bioinformatics. The authors have highlighted the importance of the machine learning approach for performing predictive and prescriptive analytics. Machine and deep learning approaches currently being used in the context of big data analytics in the Hadoop framework are also presented, and the current uses of such techniques and tools in bioinformatics are illustrated in this chapter.
Chapter 3 deals with a survey of image fusion algorithms based on deep convolutional neural networks, and the results obtained by these methods are interpreted and discussed. The chapter authors have addressed the significance of combining the outcomes of different modalities to utilize the complementary information from each modality to form a better image. With image fusion, the multi-sensor data with complementary information about the particular region are comparatively analyzed in this chapter.

Chapter 4 illustrates the necessity of integrating machine and deep learning methodology with the diagnosis of brain tumor, and recent segmentation and classification techniques on magnetic resonance images (MRI) are reviewed. This chapter addresses the current trends in the grading of brain tumors, with a focus on gliomas, which include astrocytoma. The current state-of-the-art, software packages, and the evaluation and validation metrics used in different approaches are discussed, along with integration into the clinical environment.

Chapter 5 provides the essentials of deep learning methods with convolutional neural networks and analyzes their achievements in medical image analysis, such as deep feature representation, detection, segmentation, classification, and prediction. This chapter reviews the different deep learning convolutional neural network methods. The features, benefits, and applications of convolutional neural network methods are also discussed in this chapter.

Chapter 6 investigates how deep learning can be applied to the classification of images in the CIFAR-10 database. The chapter authors note that deep learning technologies are becoming more accessible for corporations and individuals and give better results than a conventional neural network. In this chapter, deep convolutional neural networks are used for classification and GPU technology is used for parallel processing.

Chapter 7 discusses the basic deep learning network models and outlines some of the applications in health informatics. In this chapter, biomedical data are efficiently processed by deep learning networks, which in turn increases the predictive power for many specific applications in the health informatics domain. Thus, this chapter highlights that deep learning algorithms can provide better outcomes and predictions in health informatics with the integration of advanced parallel processors.

Chapter 8 illustrates the role of deep learning and semi-supervised and transfer learning algorithms for medical imaging. The chapter authors have used the classification of supervised, semi-supervised and unsupervised learning algorithms for real time medical imaging data sets and justified their profound impact on the medical field.

Chapter 9 describes the role of machine learning algorithms on both linear and nonlinear data in addressing regression and classification problems. The results obtained in this chapter are applicable to real time problems like classification and regression. The chapter results state that the support vector machine (SVM) performs better than all other classification algorithms, and the neural network (NN) approach gives the lowest mean squared error (MSE) in the regression problem.

The main objective of Chapter 10 is to consolidate the benefits of classification using singular value decomposition (SVD-QR) and limited memory subspace optimization SVD (LMSVD-QR) calculations for the preprocessing of profound learning in multilayer neural systems. This chapter has indicated why the singular value decomposition (SVD)-QR calculation is required for preprocessing of profound learning for vast scale information input.

Chapter 11 presents the challenges in storing and processing big data using Hadoop and Spark. The authors have highlighted the new analytical platforms such as Hadoop and Spark, along with MapReduce programming. The objective of this chapter is to make readers understand the challenges in storing and processing big data and how to use different big data frameworks effectively to store and process big data.

Chapter 12 presents a novel mixed-integer linear programming (MILP) model to consider a location routing problem (LRP) for multiple perishable products with vehicles having multiple trips, intermediate depots, and soft time windows. To cope with the solution complexity of the problem, an efficient biogeography-based optimization (BBO) algorithm is investigated in this chapter.

Chapter 13 gives a brief overview of evolutionary procedures, systolic arrays and methods to transform an iterative algorithm into an architecture. The significance of the parameters derived from the grey tone distribution matrix (GTDM) is mentioned, and the parameters involved in selecting the best of the addressed algorithms are clearly justified. The chapter authors have revealed that ant colony optimization performed the best among the selected evolutionary algorithms in addressing a systolic array mapping of GTDM computation.

The ultimate aim of Chapter 14 is to design a complete combinatorial model for the results from various screening experiments involving a multimodal deep learning technique that projects into a better solution of autism identification. This chapter mainly focuses on the emotional sequence identification of children who are autism spectrum disorder (ASD) positive and ASD negative (i.e., normal TD).

Chapter 15 gives parallel machine learning and deep learning approaches for bioinformatics. The authors have outlined the deep learning and other deep representation learning algorithms which have been applied successfully in image understanding, speech recognition, text classification, etc.

… applied to parallel computing environments in bioengineering systems. The book aims to present concepts and technologies that are successfully used in the implementation of today’s intelligent data-centric critical systems and multimedia cloud big data, having a good chance to be used in future computing systems. The book will constitute teaching material for the course titled Computational Intelligence for New Computing Environments, and hence is suitable for university level courses as well as research scholars.
This book delivers a significant forum for the technical advancement of deep learning in a parallel computing environment across diversified bio-engineering domains and its applications. Pursuing an interdisciplinary approach, it focuses on methods used to identify and acquire valid, potentially useful knowledge sources. Managing the gathered knowledge and applying it to multiple domains, including health care, social networks, mining, recommendation systems, image processing, pattern recognition and prediction using deep learning paradigms, is the major strength of this book. Effective data and knowledge management has become key to the success of engineering applications and business organizations and can offer a substantial competitive edge.

The book “Deep Learning and Parallel Computing Environment for Bioengineering Systems” focuses on domain experts and developers who want to understand and explore the application of deep learning and computational intelligence aspects (opportunities and challenges) for the design and development of a parallel computing environment in the context of bio-engineering systems and their related applications, such as smarter health care, homeland security, computational biology, robotics, and intelligent assistance. This book is a significant collection of 15 chapters exploring the significance of deep learning systems and bio-engineering in the next paradigm of computing.

This book gives an intensive and in-depth coverage of the use of deep learning in the field of bio-engineering systems and various interesting findings. This book is a significant step in this field’s maturation and will serve to unify, advance, and challenge the scientific community in many important ways. In addition, this book is well suited for researchers exploring the significance of deep-learning systems and bio-engineering. This book integrates the core ideas of deep learning and its applications in bio-engineering application domains so as to be accessible to all scholars and academicians. The proposed techniques and concepts in this book can be extended in the future to accommodate changing business organizations’ needs, as well as practitioners’ innovative ideas.

I am pleased to congratulate the editors and authors on their accomplishment, and hope that the readers will find the book worthwhile and a source of inspiration in their research and professional activity.

Prof. Vincenzo Piuri, PhD
IEEE Fellow
Department of Computer Science
University of Milan
Milan, Italy
Acknowledgment
The editors would like to recognize the help of all the people involved in this project, and especially the authors and reviewers who took part in the peer review process. Without their support, this book would not have become a reality.

First, the editors would like to thank each of the authors for their contributions. Our sincere gratitude goes to the chapters’ authors, who contributed their time and expertise to this book.

Second, the editors wish to acknowledge the valuable contributions of the reviewers regarding the improvement of quality, coherence, and content presentation of the chapters. We deeply appreciate the comments of the reviewers who helped us to refine the content of this book. Most of the authors also served as referees; we highly appreciate their double task.

Finally, our gratitude goes to all of our friends and colleagues, who were so generous with their encouragement, advice and support.

Arun Kumar Sangaiah
School of Computing Science and Engineering
Vellore Institute of Technology
Vellore, Tamil Nadu, India
Contents
6.7.5 Output of the Model, 116
6.7.6 Training a Model Using Multiple GPU Cards/CUDA, 118
6.8 Conclusions, 120
References, 121

7 Efficient Deep Learning Approaches for Health Informatics, 123
T.M. Navamani, ME, PhD
7.1 Introduction, 123
Machine Learning Vs. Deep Learning, 123
7.2 Deep Learning Approaches, 125
Deep Autoencoders, 125
Recurrent Neural Networks (RNNs), 126
Restricted Boltzmann Machine (RBM), 127
Deep Belief Network, 127
Deep Boltzmann Machine (DBM), 127
Convolutional Neural Network, 128
7.3 Applications, 130
Translational Bioinformatics, 130
Clinical Imaging, 130
Electronic Health Records, 130
Genomics, 131
Mobile Devices, 131
7.4 Challenges and Limitations, 131
7.5 Conclusions, 134
References, 134

8 Deep Learning and Semi-Supervised and Transfer Learning Algorithms for Medical Imaging, 139
R. Mohanasundaram, PhD, Ankit Sandeep Malhotra, BE, R. Arun, ME, P.S. Periasamy, PhD
8.1 Introduction, 139
8.2 Image Acquisition in the Medical Field, 139
8.3 Deep Learning Over Machine Learning, 140
8.4 Neural Network Architecture, 140
8.5 Defining Deep Learning, 140
8.6 Deep Learning Architecture, 141
8.6.1 Convolution Neural Network (CNN), 141
8.6.2 Recurrent Neural Network, 141
8.6.3 Deep Neural Network, 142
8.7 Deep Learning in Medical Imaging [5], 142
8.7.1 Diabetic Retinopathy, 142
8.7.2 Cardiac Imaging, 143
8.7.3 Tumor Classification in Homogeneous Breast Tissue, 143
8.8 Developments in Deep Learning Methods, 144
8.8.1 Black Box and Deep Learning, 144
8.8.2 Semi-Supervised and Transfer Learning Algorithms, 144
8.8.3 Applications of Semi-Supervised Learning in Medical Imaging, 145
8.8.4 Method, 145
8.9 Transfer Learning, 146
8.9.1 Transfer Learning in Image Data, 146
8.9.2 Transfer Learning Technique for the Detection of Breast Cancer, 149
References, 151

9 Survey on Evaluating the Performance of Machine Learning Algorithms: Past Contributions and Future Roadmap, 153
Syed Muzamil Basha, MTech, Dharmendra Singh Rajput, PhD
9.1 Introduction, 153
9.2 Methodology, 154
9.3 Linear Regression, 156
9.4 Nonlinear Regression, 156
9.4.1 Support Vector Machine, 156
9.4.2 K-Nearest Neighbors, 157
9.4.3 Neural Network, 158
9.5 Nonlinear Decision Tree Regression, 158
Parallel Computing, Graphics Processing Unit (GPU) and New Hardware for Deep Learning in Computational Intelligence Research
M. MADIAJAGAN, MS, PHD • S. SRIDHAR RAJ, BTECH, MTECH
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00008-7
Copyright © 2019 Elsevier Inc. All rights reserved.
…tency and other factors make this approach untenable. As others have noted, GPUs are designed to handle high-dimensional matrices, which is a feature of many ML models. TPUs are designed specifically for ML models and don’t include the technology required for image display.

1.1.3 Computational Intelligence
Computational intelligence deals with automatic adaptation and organization with respect to the implementation environment. By possessing attributes such as knowledge discovery, data abstraction, association and generalization, a system can learn and deal with new situations in changing environments. Silicon-based computational intelligence comprises hybrids of paradigms such as artificial neural networks, fuzzy systems and evolutionary algorithms, augmented with knowledge elements, which are often designed to mimic one or more aspects of carbon-based biological intelligence [3].

1.1.4 GPU, Deep Learning and Computational Intelligence
The GPU is inherently parallel in nature, which helps in improving the execution time of deep learning algorithms. By running parallel deep learning on a GPU, computational intelligence research applications involving images, videos, etc., can be trained at a very fast rate, and the overall execution time is reduced drastically.

The rest of the chapter is organized as follows. In Section 1.2, we discuss the role and types of parallelization in deep learning. Section 1.3 covers the role of the GPU in parallel deep learning. Section 1.4 presents the data flow of parallelization and a numerical example of how parallelization works in deep learning with a real time application. Section 1.5 shows the implementation details and screenshots, while Section 1.6 summarizes the contents discussed in the above sections.

1.2 DEEP LEARNING AND PARALLELIZATION
In this section, we discuss what parallel processing is and, through analysis, which algorithms are suitable for deep learning. The analysis is based on the time and throughput of the algorithms.

1.2.1 Parallel Processing Concepts
The parallel processing concept arose to facilitate the analysis of huge data and to acquire meaningful information from it. Speech processing, medical imaging, bioinformatics and many similar fields face the difficulty of analyzing huge amounts of complex data. There are some problems in which the run-time complexity cannot be improved even with many processors.

Parallel algorithms are called efficient when their run-time complexity divided by the number of processors equals the best run-time complexity of sequential processing. Not everything should be parallelized. User experience, for example, is a serial task: if one thread redraws the screen while another thread is trying to click something, parallel processing cannot be encouraged; it has to be sequential. Sometimes sequential processing is faster than parallel processing, since the latter requires gathering all the data in one place while the former does not [4].

In single processor systems, a set of inputs is given to the processor, which returns the output after processing. The performance of the processor can be improved by increasing the clock frequency. But there is a certain limit beyond which the processor emits a huge amount of heat: the heat emitted by the electrons moving through the processor is so high that, beyond a certain frequency, the processor melts down.

FIG. 1.1 Single processor execution.

To rectify the issue shown in Fig. 1.1, we move to parallel processing, where more than one processor is used to process the data. This way the workload is divided between multiple processors. See Fig. 1.2.

Parallel computing has its own disadvantages, such as dependency between processors, i.e., one processor might wait for the results of a process running on another processor. In modern computing, we denote the number of processors with the term “core”: dual-core, multi-core, i3, i5, i7, etc., all denote the number of processors.
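The efficiency criterion above (parallel run time compared against the best sequential time, normalized by the number of processors) can be tried out directly. The following sketch, using only Python's standard `multiprocessing` module, times a toy data-parallel summation on one worker and on four; the task is illustrative only, and the measured speed-up will vary with the machine and with process start-up overhead.

```python
import time
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi); a stand-in for any data-parallel task."""
    lo, hi = bounds
    return sum(range(lo, hi))

def timed_sum(n, workers):
    """Return (result, elapsed seconds) for summing 0..n-1 with `workers` processes."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # absorb any remainder into the last chunk
    start = time.perf_counter()
    with Pool(workers) as pool:
        result = sum(pool.map(partial_sum, chunks))
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 2_000_000
    r1, t1 = timed_sum(n, 1)  # sequential baseline
    r4, t4 = timed_sum(n, 4)  # 4-way parallel
    assert r1 == r4
    speedup = t1 / t4
    efficiency = speedup / 4  # close to 1.0 means the algorithm parallelizes well
    print(f"speed-up = {speedup:.2f}, efficiency = {efficiency:.2f}")
```

For small inputs the process start-up cost dominates and the efficiency falls well below 1, which is exactly the "not everything should be parallelized" point made above.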
1.2.2.1 Understanding the Needs and Benefits of Parallel Algorithms in Deep Learning
• Neural networks take a huge number of parameters from the datasets and learn to define the model. This learning of many parameters amounts to a very long computation time, on the order of days (“q” denotes the number of cores in the processor). The VGGNet application takes about 10 hours for training even on an 8q machine. This is a computationally intensive process which takes a lot of time [6].
• In some cases, the datasets are too large for a single machine to store and process. Therefore we need parallel and distributed processing methodologies which reduce the training time.
• The very nature of deep learning is distributed across processing units or nodes. Using simulated parallelism is slow, but implementing deep learning in its “natural form” would mean improvements in training time from months to weeks, or even days. What matters here is the acceleration, nothing else; one can run deep learning solutions on a single processor or machine provided one can tolerate the sluggishness [5]. Hence, the sure way of speeding things up is to use hardware acceleration, just as in computer graphics, since both graphics and deep learning are inherently parallel in nature [7].

1.2.3 Parallelization Methods to Distribute Computation Across Multiple Machines
The methodologies to perform parallelized or distributed computation on multi-core machines are given below.

1.2.3.1 Local Training
Local training means that the data is trained on a single machine which has multi-core processors: the entire dataset is loaded onto the same machine, and the cores take care of the processing task. Multi-core machines can be used in two ways:
• By loading multiple data in a single layer and processing them using the multi-core processor, which is a lengthy parallelization process;
• By using a batching system to separate the dataset into many small batches and sending each batch to a core for processing.

1.2.3.2 Distributed Training
When the datasets are so huge that they cannot be stored on a single system, distributed training resolves the problem: the data is stored across many machines in a distributed manner. Either the model or the data can be distributed, as discussed below.
• In data parallelism, data is distributed across multiple machines. When the data set is large, or its faster processing is required, data parallelism can be used.
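The batching scheme and the data-parallelism idea above can be sketched in a few lines. The example below is an illustrative, assumed implementation rather than code from the chapter: it splits a dataset into one mini-batch per worker process, computes the gradient of a simple linear least-squares model on each batch in parallel, and averages the results into one synchronous update.

```python
import numpy as np
from multiprocessing import Pool

def batch_gradient(args):
    """Gradient of mean squared error for a linear model y ~ X @ w on one batch."""
    X, y, w = args
    return 2.0 * X.T @ (X @ w - y) / len(y)

def data_parallel_step(X, y, w, workers=4, lr=0.1):
    """One synchronous data-parallel step: split the data into one batch per
    worker, compute batch gradients in parallel, then average them."""
    Xs = np.array_split(X, workers)
    ys = np.array_split(y, workers)
    with Pool(workers) as pool:
        grads = pool.map(batch_gradient, [(Xb, yb, w) for Xb, yb in zip(Xs, ys)])
    return w - lr * np.mean(grads, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w
    w = np.zeros(3)
    for _ in range(100):
        w = data_parallel_step(X, y, w)
    print(w)  # approaches true_w
```

In a real distributed setting the same structure appears, except the batches live on different machines and the gradient averaging happens over the network (e.g., through a parameter server or an all-reduce step).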
• In model parallelism, the model is typically too big to fit on a single system. When the model is partitioned, the part placed on one machine demands the output of another part; this forward and backward propagation establishes communication between the model parts on different machines in a serial fashion [9].

1.2.4 Methods to Train Deep Neural Networks Using Parallelization
Deep neural networks, or deep artificial neural networks, follow the structure of the actual brain and its functions. They use multiple layers of artificial neurons for classification and pattern recognition.

Fig. 1.3 shows the structure of a non-deep neural network, having only one hidden layer, whereas Fig. 1.4 depicts a deep neural network with three hidden layers. Networks having between 3 and 10 hidden layers are called very deep neural networks. There are four ways to parallelize neural network training; they are discussed in what follows.

1.2.4.1 Inter-Model Parallelism
Generally, when inter-model parallelism is used, there are different models, and each model can have different parameters, such as the equation function, layer types, number of neurons per layer, etc. All three different model cases are trained with the same dataset. See Fig. 1.5.
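A minimal sketch of inter-model parallelism, under the assumption that the "different models" are simply different configurations fitted to the same dataset: here three polynomial models of different degree (a stand-in for varying layer types or neuron counts) are trained concurrently, one per core.

```python
import numpy as np
from multiprocessing import Pool

def train_model(config):
    """Fit one least-squares polynomial model of the given degree to the shared
    dataset and return (degree, training error). Each configuration plays the
    role of one 'model' in inter-model parallelism."""
    degree, x, y = config
    coeffs = np.polyfit(x, y, degree)
    err = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return degree, err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 200)
    y = np.sin(3 * x) + 0.05 * rng.normal(size=x.size)
    configs = [(d, x, y) for d in (1, 3, 5)]  # three model variants, same data
    with Pool(3) as pool:  # each variant trains on its own core
        for degree, err in pool.map(train_model, configs):
            print(f"degree {degree}: training MSE {err:.4f}")
```

Because the variants never communicate during training, this form of parallelism scales trivially; it is the same pattern used when comparing candidate architectures side by side.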
…between model complexity and training data size can be found from our earlier work. We could still use Hadoop or Spark: a sequential learning algorithm can operate on the whole data set without any partitioning [11].

1.2.5.2 Function Partitioning
This is the flip side of data partitioning. A function is decomposed into several independent functions, each of which operates in parallel on the whole data set. Results are consolidated when all the functions have been computed. There is no learning algorithm that is amenable to functional decomposition. Moreover, Hadoop and Spark provide parallel processing capabilities only through data partitioning.

1.2.5.3 Hyperparameter Learning
This is an area where we can exploit parallel processing in Hadoop and Spark very effectively. Generally, any learning algorithm has many parameters that influence the final result, i.e., the test or generalization error. Our goal is to select a set of parameter values that will give us the best performance, i.e., minimum error.

This is essentially an optimization problem where we want to minimize the error over a multi-dimensional parameter space. However, the error cannot be expressed as a function of the parameters in closed form, and hence many classical optimization techniques cannot be used.

Several optimization techniques can be used for finding the optimal set of parameter values; the number of available techniques is by no means limited to this list. For some parameter value sets we build a predictive model and test it to find the test or generalization error [12]. In ensemble learning, multiple predictive models are built. Random forest is a good example, where an ensemble of decision trees is used. The ensemble of models is then used for prediction, e.g., by taking a majority vote. With ensemble learning, error due to variance can be reduced.

The models in the ensemble are generally built by using a subset of the training data and a subset of the features. There are other generic ways to create ensembles, e.g., bagging, boosting and stacking, and there are also ensemble techniques specific to particular learning algorithms. Since the models in the ensemble are trained independently, they can be trained in parallel.

1.2.5.4 Prediction at Scale
Having built a predictive model, sometimes the model needs to be deployed to predict on a massive amount of data and with low latency. Additionally, the prediction may have to be made in close to real time. Here is an example where predictions are to be made in near real time and a large amount of data is involved. Consider a model that predicts the probability of a customer buying something during the current visit to an e-commerce site, based on real time click stream data [5]. This could be done with Spark Streaming, with click stream data arriving through a Kafka topic. To maximize throughput, the data could be processed with multiple Spark partitions; each Spark task processing a partition would load the predictive model [13]. The output of the prediction could be written back to another Kafka topic, and the website could personalize content based on the prediction from the model [6].

1.2.6 Types of Speed-Up and Scaling
Speeding up and scaling the capacity of the processors reduces the execution time. The types of scaling models are discussed below.

The performance gained is nearly proportional to the number of resources added for scaling; the resources denote the processors, memory size and bandwidth offered in the case of a distributed environment. Adding “y” times more resources yields “y” times speed-up [3]. The idea is to scale the number of processors and check the efficiency of the machine. There are two scalability models:
• Problem constrained;
• Time constrained.

1.2.6.1 Problem Constrained (PC) Scaling
Here the size of the problem is fixed, and reducing the execution time is the aim. Therefore, without increasing the size of the problem, the number of processors and the memory size are increased. The speed-up is computed by the equation below:

S_PC = Time(1 processor) / Time(“p” processors).

The ratio of the time taken by one processor to the time taken by the total number of processors used yields the speed-up value.

1.2.6.2 Time Constrained (TC) Scaling
Unlike in the problem constrained case, in the time constrained situation the execution time is fixed at a maximum limit, and increasing the problem size is the objective. Speed-up is defined in terms of the work done, with the time kept constant:

S_TC = Work(“p” processors) / Work(1 processor).
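The two speed-up definitions can be captured directly in code. The numbers below are assumed for illustration, not measurements:

```python
def problem_constrained_speedup(t_one: float, t_p: float) -> float:
    """S_PC = Time(1 processor) / Time(p processors); the problem size is fixed."""
    return t_one / t_p

def time_constrained_speedup(work_p: float, work_one: float) -> float:
    """S_TC = Work(p processors) / Work(1 processor); the wall time is fixed."""
    return work_p / work_one

# Hypothetical numbers: a fixed-size job that takes 120 s on one processor and
# 20 s on eight gives a 6x PC speed-up (6/8 = 75% efficiency); in the same
# fixed time window, eight processors completing 7000 work units versus 1000
# on one processor give a 7x TC speed-up.
print(problem_constrained_speedup(120.0, 20.0))  # 6.0
print(time_constrained_speedup(7000.0, 1000.0))  # 7.0
```

Note the two ratios answer different questions: PC scaling asks "how much faster is the same job", while TC scaling asks "how much more work fits in the same time".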
1.3.3 CPU vs. GPU
• CPUs are designed for more general computing workloads. GPUs, in contrast, are less flexible; however, GPUs are designed to compute the same instructions in parallel. See Fig. 1.10.

FIG. 1.10 Architecture difference between CPU and GPU.

• In image processing applications, the GPU's graphics-specific capabilities can be exploited to speed up the calculations further.
• The primary weakness of GPUs as compared to CPUs is memory capacity, which is lower on GPUs than on CPUs. The best known GPU contains 24 GB of RAM; in contrast, CPUs can reach 1 TB of RAM. A secondary weakness is that a CPU is required to transfer data into the GPU card. This takes place through the PCI-E connector, which is much slower than CPU or GPU memory. The final weakness is that GPU clock speeds are one-third those of high-end CPUs, so on sequential tasks a GPU is not expected to perform comparatively well.
• GPUs are so fast because they are so efficient at matrix multiplication and convolution, and the reason for this is memory bandwidth, not necessarily parallelism. In short, and in order of importance, high-bandwidth main memory, hiding memory access latency under thread parallelism, and large, fast register and L1 memory that is easily programmable are the components which make GPUs so well suited to deep learning.
• CPUs are latency optimized, while GPUs are bandwidth optimized.
• The CPU L1 cache only operates at about 5 TB/s, which is quite slow, and has a size of roughly 1 MB; CPU registers usually have sizes of around 64–128 KB and operate at 10–20 TB/s. Of course, this comparison of numbers is a bit flawed because CPU registers operate a bit differently than GPU registers (a bit like comparing apples and oranges), but the difference in size here is more crucial than the difference in speed, and it does make a difference.
• It is easy to tweak GPU code to make use of the right amount of registers and L1 cache for fast performance. This gives GPUs an advantage over other architectures like the Xeon Phi, where this utilization is difficult to achieve and difficult to debug, which in the end makes it difficult to maximize performance on a Xeon Phi [5].

1.3.4 Advantages of Using GPU in Parallel Deep Learning
• The advantage of the GPU here is that it can have a small pack of registers for every processing unit (stream processor, or SM), of which it has many. Thus we can have a lot of register memory in total, which is very small and thus very fast. This leads to the aggregate GPU register size being more than 30 times larger than on CPUs and still twice as fast, which translates into up to 14 MB of register memory that operates at a whopping 80 TB/s.
• A neural network involves lots of matrix manipulations, such as multiplication, addition and element-wise calculations. These manipulations can be significantly sped up because they are highly parallelizable.
• GPUs are massively parallel calculators that allow performing many mathematical operations very quickly and at once. Using GPUs cuts down the training time.
• GPU programming must be vectorized to be effective. This is because GPU processors are built to do computations on images, which come in the form of matrices, so vectorized operations are natural in this domain.
• Deep neural networks and most machine learning (ML) workloads can thus be cast as parallel problems, which means parallel computing solutions like GPUs can speed up 90% or so of the algorithms in AI; only a few algorithms, such as tree traversal or recursive algorithms, are not parallelizable, so those can be handled more efficiently on a CPU.
• GPUs are best for speeding up distributed algorithms in which each unit of the distributed system works independently of the other units. An ensemble of processing nodes in a neural network, like most AI algorithms, falls into this category.

1.3.5 Disadvantages of Using GPU in Parallel Deep Learning
• Full register utilization in GPUs seems difficult to achieve at first because it is the smallest unit of computation and needs to be fine-tuned by hand for good performance. However, NVIDIA has developed good compiler tools which indicate exactly when you are using too many or too few registers per stream processor.
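The claim that matrix manipulations parallelize well can be illustrated in plain Python: every element of the output matrix depends only on one row of A and one column of B, so the rows can be computed concurrently. A CPU thread pool here stands in for the thousands of GPU cores; `parallel_matmul` and `one_row` are illustrative names, not library functions:

```python
from concurrent.futures import ThreadPoolExecutor

def one_row(args):
    # Each output row depends only on one row of A and all of B,
    # so every row can be computed independently of the others --
    # the same independence that GPU hardware exploits at scale.
    row, B = args
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]

def parallel_matmul(A, B, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_row, ((row, B) for row in A)))

print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

On real hardware the per-element arithmetic is trivial; it is the memory bandwidth feeding those independent computations that dominates, as the bullets above note.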
• One example algorithm that is hard to speed up on GPUs is the Fibonacci sequence calculation, which is sequential. By speeding up the calculations, neural networks can be optimized using more data and more parameters, thanks to progress in deep learning.
• CPUs are better suited to perform a wider range of operations, at the cost of slower performance for some of the rendering-specific operations.
• A CPU usually has fewer cores; current CPUs commonly have between 4 and 16, while newer high-end GPUs have more than a thousand. Each of these cores is essentially a computer in itself. So, why isn't a GPU always better than a CPU if it has way more cores? One reason is that the clock speed is typically much higher on a CPU, meaning that each of the individual cores can perform more calculations per second than the individual cores of the GPU. This makes CPUs faster on sequential tasks.
• There are other things to consider when calculating the benefit of using a GPU instead of a CPU, such as memory transfer. If you want to multiply 1000 numbers on the GPU, you need to tell it first what those 1000 numbers are, so the GPU tends to be more useful in cases where you need to do a lot of things with little change in input and a small volume of output.

1.3.6 Famous GPUs on the Market
NVIDIA and AMD are the two leading manufacturers, whose GPUs are widely used in the technology world. Both are discussed below.

1.3.6.1 NVIDIA
All NVIDIA products fall under the standard compute unified device architecture (CUDA). CUDA is defined at the architecture level, and the binary files required for execution on one CUDA GPU do not necessarily work on another CUDA based GPU. A CUDA GPU consists of multiprocessors which execute multiple threads in blocks. Multiple blocks can be executed by a multiprocessor simultaneously. Each multiprocessor contains 8 CUDA cores at compute capability 1.x. As the capability increases to 2, 3, etc., the number of CUDA cores also increases.

To hide the memory access and arithmetic latencies, multiple threads have to be executed concurrently. NVIDIA runs 192 to 256 threads per multiprocessor for GPUs having compute capability 1.x. It is better to run more threads in the case of data parallelism to free up the registers. High performance can be achieved only by knowing the optimal number of threads that can be concurrently executed in a multiprocessor. The unified virtual address is useful in establishing the connection between two GPUs.

1.3.6.2 AMD
AMD accelerated parallel processing (APP), or ATI Stream, is the technology which is used to execute general computations. Each APP device consists of multiple compute units, each compute unit contains multiple stream cores, and each core contains multiple processing elements. Instances of a GPU program are executed concurrently, and each instance is called a work item. Multiple work items are executed in parallel, in lockstep, by all the cores of a compute unit. The total number of work items is decided based on the hardware and the programmer's requirements for a particular work group [3].

1.4 GPU BASED PARALLEL DEEP LEARNING ON COMPUTATIONAL INTELLIGENCE APPLICATIONS WITH CASE STUDY
In this section, we discuss computational intelligence applications and how GPU based parallel deep learning is applied over those applications by considering examples.

1.4.1 Dataflow of the Deep Parallelized Training and Testing of Computational Intelligence Applications
Applying the parallel deep learning methodology over computational intelligence research applications brings a greater challenge in implementation. The overheads, applicability problems and related issues are addressed in this section.

A general model has been designed in order to execute the data of computational intelligence applications in parallel with deep learning algorithms. Fig. 1.8 depicts the data flow of the parallel execution using deep learning. The data flow comprises the following steps:
1. The required data is collected by means of a sensor or similar devices from the subject.
2. Once the data is ready for training, the entire data is separated into training and test data.
3. The training data is fed into the model.
4. In order to perform parallel processing, the dataset has to be separated into halves for parallel processing. Certain scheduling algorithms can be used to schedule the processes based on the number of cores available.
5. The dataset gets trained in the separate cores.
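Steps 2–5 of this dataflow can be sketched in a few lines of Python. Everything here is a stand-in: `parallel_train` and `train_on_core` are hypothetical names, a thread pool substitutes for the separate cores, and a trivial running mean substitutes for the deep network:

```python
from concurrent.futures import ThreadPoolExecutor

def train_on_core(chunk):
    # Stand-in "training": each worker reduces its chunk to a partial
    # result (a sum/count pair) that the combine step can merge.
    return sum(chunk), len(chunk)

def parallel_train(data, cores=2):
    # Step 2: hold out test data; step 4: split the training data into
    # one chunk per core; step 5: "train" the chunks concurrently.
    split = int(len(data) * 0.8)
    train, held_out = data[:split], data[split:]
    size = (len(train) + cores - 1) // cores
    chunks = [train[i:i + size] for i in range(0, len(train), size)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        partials = list(pool.map(train_on_core, chunks))
    total, count = map(sum, zip(*partials))
    return total / count, held_out  # combined "model" (a mean) + test set

model, held_out = parallel_train(list(range(10)))
print(model, held_out)  # 3.5 [8, 9]
```

In a real deployment the chunks would go to separate processes or GPU devices, and the combine step would average gradients or model parameters rather than plain sums.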
10 Deep Learning and Parallel Computing Environment for Bioengineering Systems
1.6 SUMMARY
GPUs work well on parallel deep neural network com-
putations because:
• GPUs have many more resources and faster band-
width to memory;
• Deep neural networks’ computations fit well with
GPU architecture. Computational speed is extremely
important because training of deep neural networks
can take from days to weeks. In fact, many of the successes of deep learning might not have been discovered if it were not for the availability of GPUs.
• Deep learning involves huge amounts of matrix mul-
tiplications and other operations, which can be mas-
sively parallelized and thus sped up on GPUs.
In this chapter, the basic concepts of parallel processing are explained with examples in order to pave the way to parallel deep learning. Various parallelization techniques are discussed with diagrammatic explanations, and the ways in which they can be internally classified are highlighted. The relation between computational intelligence and parallel deep learning, the challenges in combining them, and the resulting benefits are discussed. The applicability of parallel deep learning algorithms to real-time datasets is explained with simple numerical examples.

FIG. 1.22 Snippet while the dataset is getting trained in the Python shell.

4. Expert systems have financial applications, robot production, diagnostics and various industry-based operations.
8. K.R. Foster, R. Koprowski, J.D. Skufca, Machine learning, medical diagnosis, and biomedical engineering research – commentary, Biomedical Engineering Online 13 (1) (2014) 94.
9. Y.B. Kim, N. Park, Q. Zhang, J.G. Kim, S.J. Kang, C.H. Kim, Predicting virtual world user population fluctuations with deep learning, PLoS ONE 11 (12) (2016) e0167153.
10. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, vol. 1, MIT Press, Cambridge, 2016.
11. A. Ike, T. Ishihara, Y. Tomita, T. Tabaru, Technologies for practical application of deep learning, Fujitsu Scientific and Technical Journal 53 (5) (2017) 14–19.
12. N. Friedman, M. Linial, I. Nachman, D. Pe'er, Using Bayesian networks to analyze expression data, Journal of Computational Biology 7 (3–4) (2000) 601–620.
13. S.S. Raj, M. Nandhini, Ensemble human movement sequence prediction model with a priori based probability tree classifier (APTC) and bagged J48 on machine learning, Journal of King Saud University: Computer and Information Sciences (2018).
14. H. Greenspan, B. Van Ginneken, R.M. Summers, Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique, IEEE Transactions on Medical Imaging 35 (5) (2016) 1153–1159.
15. I.-H. Chung, T.N. Sainath, B. Ramabhadran, M. Picheny, J. Gunnels, V. Austel, U. Chauhari, B. Kingsbury, Parallel deep neural network training for big data on Blue Gene/Q, IEEE Transactions on Parallel and Distributed Systems 28 (6) (2017) 1703–1714.
CHAPTER 2
Big Data Analytics and Deep Learning in Bioinformatics With Hadoop
genomics is the study of the structure, function, evolution, mapping, and editing of an organism's genome. DNA sequences constitute the most abundant data in bioinformatics. DNA is made up of molecules called nucleotides. The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The order of these bases is what determines the genetic code. DNA sequencing is the process of determining the precise order of the A, G, C and T bases in a strand of DNA. A typical bacterial genome can have several million bases. The human genome consists of about 3.2 billion bases, and the size of a single sequenced human genome is approximately 200 gigabytes [4]. The first human genome was completely sequenced in June 2000, and as of 2014, about 228,000 human genomes had been sequenced [5]. Recently, Illumina, the largest maker of DNA sequencers, has sequenced more than 500,000 human genomes [6]. Eventually, biological data will be sequenced at an ever-faster pace. Examples of two large datasets are the Cancer Genome Atlas [7] and the Encyclopaedia of DNA Elements [8]. The European Bioinformatics Institute (EBI) has biology-data repositories with a size of 40 petabytes [9]. Given that biological data from different sources are often used, such data are heterogeneous, as they are stored in different formats. Biological and medical data are also generated in real-time and fast (e.g., medical imaging in healthcare). Another characteristic of biological big data is that it is geographically distributed [10].

Performing data analysis to harvest the wealth of biological and biomedical data, such as genetic mapping of the DNA sequence, will only help to advance our understanding of the human condition, health and disease, which will consequently allow curing diseases and improving human health and lives by supporting the development of precision methods for healthcare. This is a typical big data problem, and organizations such as the National Institutes of Health (NIH) recognize the need to address the big data challenges related to the processing and analysis of biological data. In 2012, the NIH launched the Big Data to Knowledge initiative to enable biomedical research and development of innovative approaches and tools in the area of big data science for enhancing the utility of biomedical big data. Some $200 million was disbursed in research grants in its first phase (from 2014 till 2017) to address some major data science challenges and to stimulate data-driven discovery [11].

This chapter aims at presenting big data technologies and big data analysis which integrate deep learning for addressing complex computational needs using open source solutions, namely the Hadoop framework and ecosystem. In Section 2.2, the big data workflow is described and the Hadoop big data framework for processing, storing and analyzing big data is discussed, as well as its application in the field of bioinformatics. Section 2.3 describes the machine and deep learning algorithms and open source tools which can be integrated in the Hadoop ecosystem for more intelligent analysis of big data and their applications in bioinformatics. Section 2.4 concludes the chapter by discussing future directions.

2.2 FROM BIG DATA TO KNOWLEDGE DISCOVERY WITH HADOOP
There is tremendous potential and highly useful value hidden in the huge volume of biological data, which is now available and which is growing exponentially. Big data analytics is inevitable for handling biological data and for making evolutionary breakthroughs. The big data knowledge discovery process is shown in Fig. 2.1.

Typically, big data has to be collected and ingested into the system. Such data is often structured, semi-structured and mostly unstructured, thus different tools and techniques are used for collecting data. Collected data often goes through a staging phase where the data is cleaned, i.e., inconsistent, incomplete or noisy data are discarded. Some data may require pre-processing so as to improve its quality before analysis. During the staging and pre-processing phase, data is kept in temporary storage. Such pre-processing may include techniques such as data extraction, data annotation, data integration, data transformation, and data reduction. Data is then stored in a suitable storage, from where it is accessed for analytics, after which the results of the analysis can be visualized. Such results can then be interpreted accordingly. Performing data analytics of big data usually requires high server processing capability, often involving massively parallel processing (MPP) technologies. Big data processing and analysis
thus involves a shift in computing architecture to handle the challenges of storing, analyzing and extracting meaningful and valuable data from the large volume, variety and high velocity data in the area of bioinformatics.

It is unlikely that big data can be stored on a single server, as the amount of storage required is prohibitive. Similarly, it is unfeasible to process big data on a single server node unless multi-core high performance computing (HPC) servers, which are quite costly, are used. Thus, to collect, process and analyze big data, a cluster of computing nodes may be more suitable than a single compute node. Cloud computing, which can provide a scalable and cost-effective solution for big data storage and computation, is becoming more and more popular and has an important role in the development of bioinformatics tools. According to [12], the cloud computing model is the only storage model that can provide the elastic scale needed for DNA sequencing, whose rate of technology advancement could now exceed Moore's Law.

The National Institute of Standards and Technology (NIST) defines cloud computing as a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction [13]. Cloud service providers like Amazon, Microsoft, Oracle and IBM have several geographically distributed data centers which house massive arrays of compute server nodes as well as storage. By means of virtualization technology, the required hardware resources and computational power can be provisioned instantaneously. The cloud thus provides the storage and computing infrastructure for storing, processing and analyzing big data in a shared pool of resources, i.e., a cluster of compute nodes. However, for big data analytics, apart from the infrastructure, there is a need for middleware to enable distributed processing across the cluster of compute nodes. It should be possible to develop custom applications that can be executed in parallel on distributed biological datasets.

Hadoop is one of the most popular and significant open source platforms for big data storage and processing [14]. It enables distributed processing across clusters of commodity servers, scaling up from a single server to thousands of servers in the cluster. According to [15], the Hadoop big data analytics market is expected to grow at a compound annual growth rate (CAGR) of 26.5% from 2016 to 2022. The Hadoop platform by itself cannot perform all types of processing and analytics. Hadoop is thus often integrated with other software solutions. The Apache Hadoop ecosystem consists of dozens of projects such as Apache Hive, Apache Mahout, and Apache Spark, providing various functionalities such that a number of these projects can be streamlined to deliver the required big data services. Hadoop is a flexible platform, as it can also be integrated with non-Apache software solutions. The Gartner Magic Quadrant for Analytics and Business Intelligence Platforms 2018 [16] identifies Microsoft, Tableau and Qlik as the leaders for analytics and business intelligence. All three big data analytics solutions support the Hadoop platform, e.g., Azure HDInsight can be integrated with Hadoop [17], Qlik solutions can be used with Cloudera, which is a distribution of Hadoop packaged with other tools [18], and Tableau can very easily be made to work on Hadoop data [19]. With the growing popularity of Hadoop, more and more software solutions are constantly being developed to work with Hadoop. Research work is also being carried out to make Hadoop more efficient and faster [20,21]. Besides, the Hadoop platform, especially the MapReduce module, is commonly used in the field of bioinformatics for processing data [22–26]. The following subsections describe the Hadoop platform to demonstrate why it is an important platform for big data storage and processing. The various other tools that can be integrated with Hadoop to achieve the big data workflow for knowledge discovery are also discussed, as well as their applications in the area of bioinformatics.

2.2.1 Hadoop Big Data Framework
The Hadoop platform has several benefits, which make it the platform of choice for big data analytics. Hadoop is flexible and cost-effective, as it has the ability to store and process huge amounts of any kind of data (structured, unstructured) quickly and efficiently by using a cluster of commodity hardware. By means of resource pooling, more processing power is available in the cluster in a more cost-effective manner than on a single server. Moreover, Hadoop is massively scalable, as more compute nodes can easily be added to the cluster if more processing power is required. Likewise, Hadoop has a very high degree of fault tolerance; if one node in the cluster fails, the processing tasks are redistributed among the other nodes in the cluster, and multiple copies of the data are stored in the Hadoop cluster.

Hadoop is made up of four core modules: the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), Hadoop Common and MapReduce, as shown in Fig. 2.2. Hadoop Common is simply a
set of libraries and utilities used by the other Hadoop modules.

non-Hadoop clusters are managed using Mesos. Cluster resources can be dynamically shared, i.e., a YARN cluster can be resized as required. MapReduce is a programming model for the parallel processing of large datasets on the distributed computing nodes in the cluster. MapReduce is the default processing framework for Hadoop, but Hadoop can also be used with other processing frameworks. MapReduce is further discussed in Section 2.2.4.1.
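The MapReduce model can be mimicked in a few lines of plain Python to make the flow concrete. This is a single-process word-count sketch, not the Hadoop API: the map phase emits key/value tuples, a shuffle groups them by key, and the reduce phase combines each group into a smaller set of tuples.

```python
from collections import defaultdict

def map_phase(line):
    # Map: break the input into (key, value) tuples.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each group into a single (key, count) entry.
    return {key: sum(values) for key, values in groups.items()}

lines = ["adenine guanine cytosine", "thymine adenine"]
pairs = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(shuffle(pairs)))
# {'adenine': 2, 'guanine': 1, 'cytosine': 1, 'thymine': 1}
```

In Hadoop, the map and reduce functions run on different cluster nodes and the shuffle moves intermediate tuples between them over the network; the logical data flow, however, is exactly this one.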
that holds a huge amount of raw data in its native format, whereby the data structure and requirements are not defined until the data is to be used. Thus, data lakes have the schema-on-read characteristic and typically store data using a flat architecture, unlike data warehouses, which store data in a highly structured repository and which adopt a relational or dimensional data model. Data warehouses have the schema-on-write characteristic, which means that the data structure is defined before the data is stored. Data lakes are thus more agile, as data can be easily configured and reconfigured following different models during the analysis. Today, several data ingestion tools are available for ingesting a variety of data onto Hadoop. The following three types of input data can be distinguished:
• Structured data has a strong schema, e.g., from relational databases and FASTA sequence files.
• Unstructured data does not have any structure and can be of any form, e.g., medical imaging, electronic health records, clinical trial results, medical sensors.
• Semi-structured data has some structure but is not strictly organized, e.g., XML files for electronic patient records.
The complexity of ingestion tools thus depends on the format and the quality of the data sources. These ingestion tools are capable of some pre-processing and staging. Some of these tools are described as follows.

2.2.2.1 Apache Sqoop (SQL-to-Hadoop)
If the data to be processed comes from structured datastores, Apache Sqoop can be used for transferring bulk data in both directions between relational databases or data warehouses and HDFS or Hadoop data stores such as HBase or Hive. Sqoop reads the relational database management system (RDBMS) schema description to gather the metadata for the data to be imported, and then it transfers the table data required. The data is captured as a set of serialized files or SequenceFiles containing a copy of the imported table data or datasets. Such files are then saved as comma-separated files with the name of the source table to a directory on HDFS [45]. SequenceFiles are flat files consisting of binary key/value pairs, extensively used in MapReduce as input and output formats. Intermediate or processed data can also be exported from Hadoop to an RDBMS datastore using Sqoop.

2.2.2.2 Apache Flume
Bioinformatics also involves high throughput streaming real-time data, such as the output of DNA sequencing or data captured from health sensors. Such data consist of a continuous stream of data at a specific rate. Flume is a distributed and reliable ingestion tool that can be used to collect and aggregate streaming data from many different sources and to push out the serialized data, using mechanisms called data sinks, to a centralized data store such as HDFS or HBase on Hadoop, or Cassandra. Flume is more tightly integrated with the Hadoop ecosystem, i.e., Flume has HDFS sinks to integrate data onto HDFS. The Flume topology consists of the source, channel and sink. Flume clients send data to the source, which keeps the data in a temporary buffer called the channel. Data flows from the channel to a sink. A typical Flume architecture is shown in Fig. 2.3.

2.2.2.3 Apache Kafka
Apache Kafka is an open source system for ingesting data from several sources in real-time. While it was not specifically designed for Hadoop, it can be used to collect high throughput parallel data for loading into Hadoop. Kafka uses a publish–subscribe system similar to a messaging system. The Kafka system is made up of publishers, the Kafka cluster, and subscribers (consumers of data). Data (messages) emitted by publishers are stored as logs in the Kafka cluster. A typical architecture of Kafka is shown in Fig. 2.4. Kafka forwards data to the subscriber as and when required. Messages are orga-
nized into topics, topics are further split into partitions, and partitions are replicated across the nodes – called brokers – in the cluster. Subscribers can be publishers and vice-versa. Kafka is more easily scalable and more fault-tolerant than Flume.

Apache Kafka has been used in several works related to bioinformatics. In [46], Kafka was used to ingest data from the SeqRef dataset from the NCBI's datastores. Using Kafka, data was lightly structured and stored, such that the data is more amenable to parallel access and streamed processing. In [47], the authors proposed the adoption of Kafka stream processing to simplify the genomic processing pipeline, to improve the performance and to improve fault-tolerance. The European Bioinformatics Institute (EMBL-EBI), which maintains a comprehensive range of molecular data resources, supports and encourages the use of the Kafka Streams API. A prototype using the Kafka Streams API to ingest data, aggregate logs and display results online in a dashboard has been detailed in [48].

2.2.3 Data Staging and Storage on Hadoop
Big data typically consists of data that is semi-structured or unstructured and which cannot always be represented in rows and columns as in traditional databases. With the Hadoop big data framework, data is stored as files in the HDFS distributed file system, which allows storing data across multiple nodes in a Hadoop cluster. However, Apache Hadoop is not suitable for real-time random-access capabilities. For applications or processing that require reading and writing data in real-time, i.e., with very low latency, the Apache HBase NoSQL Hadoop database is preferred. HBase provides low latency, it is highly scalable, and it can be integrated with MapReduce for processing. HBase is a column oriented big data store and, being built on top of HDFS, the data stored in HBase is eventually stored in HDFS. In [49], the authors experimented with using HBase for storing 9 billion patient records.

For structured data, there is Apache Hive, which is a data warehouse infrastructure built on top of Hadoop. Data from HDFS can be moved into Hive by using extract, transform and load (ETL) tools. Hive can also be used to query data stored in HDFS, HBase or other file systems or databases such as Cassandra. Hive consists of an SQL engine on top of Hadoop for processing big data using MapReduce jobs through SQL-like queries, and thus is very convenient for data analysts who are more familiar with SQL than with MapReduce programming [50]. Apache Hive is also often used with Apache Pig to process, transform and analyze data in Hive. Apache Pig is a high-level language (Pig Latin) for processing data in Hadoop; the processing is eventually done using MapReduce jobs [51]. Pig is specifically designed for the ETL data pipeline and iterative data processing, and it supports user defined functions (UDF). In [52], BioPig – MapReduce and Pig – was used to analyze large-scale sequence bioinformatics data.

2.2.4 Data Processing and Analysis Frameworks
After data has been collected and ingested, big data is available for data processing and analysis. Data processing for analysis varies depending on the type of insights desired and the flow of data along the data analysis pipeline. Often data is processed a number of times, using a single tool or different tools, to get useful insights. Big data processing frameworks are often classified into: (i) batch only frameworks; (ii) stream only frameworks; and (iii) hybrid frameworks.

CHAPTER 2 Big Data Analytics and Deep Learning in Bioinformatics With Hadoop 23

2.2.4.1 Batch Processing Only Framework – MapReduce
Batch processing frameworks are ideal for processing extremely large datasets that require significant computation. Such datasets are typically bounded (a finite collection of data) and persistent, i.e., stored on some permanent storage. Batch processing is suitable for processing which is not time sensitive, as processing a large dataset takes time. The most popular batch processing framework is Apache Hadoop's MapReduce. MapReduce is a Java based system for processing large datasets in parallel. It reads data from HDFS and divides the dataset into smaller pieces. Each piece is then scheduled and distributed for processing among the nodes available in the Hadoop cluster. Each node performs the required computation on its chunk of data, and the intermediate results obtained are written back to HDFS. These intermediate outputs may then be assembled, split and redistributed for further processing, until the final results are written back to HDFS. The MapReduce programming model for processing data consists of two distinct tasks performed by programs: a Map job and a Reduce job. Typically, the Map job starts by taking a set of data and converting it into another set of data where individual elements of the data are broken into tuples consisting of key/value pairs. These key/value pairs may then be shuffled, sorted, and processed by one or more Map jobs. The Reduce job usually takes the outputs of a Map job as its input and combines those data tuples into a smaller set of tuples.

Many bioinformatics applications and tools use the MapReduce framework. According to [53], the following MapReduce-based tools and programming environments for the development of bioinformatics applications are available: BioPig [52], Cloudgene, FASTdoop, GATK, Hadoop-BAM, SeqPig and SparkSeq. MapRe-

Blast (basic local alignment tool) algorithm in Hadoop MapReduce [57].

Today, it is possible to access a Hadoop cluster on the cloud. However, when using MapReduce based bioinformatics tools in the cloud, if the Hadoop parameters are not set appropriately, there can be resource underutilization while having to pay considerable cloud computing costs. Several recent research works have been conducted on the deployment and use of MapReduce on the cloud for bioinformatics computations. A cloud framework has been proposed in [58] to easily deploy bioinformatics tools (several MapReduce based tools) on a cloud virtualization platform based on Hadoop for Bioinformatics-as-a-Service. In [59], the authors worked on defining the Hadoop parameters for fine tuning MapReduce so as to achieve better performance on the cloud. In [60], to address the difficulty of composing complex workflows from multiple bioinformatics MapReduce tools, the authors proposed that two existing systems, namely Cloudgene and CloudMan, be integrated to enable the delivery of MapReduce applications in the cloud. In [61], a novel implementation of the partial order alignment (POA) algorithm on a multi-node Hadoop cluster running on the MapReduce framework, implemented in the Amazon AWS cloud, was proposed. In [62], a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework on the cloud was proposed.

2.2.4.2 Stream Processing Only Framework
For datasets which require real-time processing, e.g., sensor data being captured in real-time, the MapReduce framework is not suitable. Often in such cases, the available data has to be processed immediately as soon as it is collected, so as to be able to take reactive measures based on the results of the output, e.g., the control system of a manufacturing plant. Such a dataset is an unbounded set, and processing is done on the data which is available, i.e., the working dataset, which is the amount of data that has been ingested by the system so far. Stream processing frameworks usually continu-
duce has also been adopted in (a) algorithms for single ously process data and do not “end” unless explicitly
nucleotide polymorphism identification, e.g., BlueSNP stopped. The results of processing are available in near-
and Crossbow; (b) gene expression analysis, e.g., Eoul- real time and are continually updated in a dashboard
san, FX, MyRNA, YunBe; (c) sequence comparison, as new data is ingested and processed. A characteristic
e.g., CloudBLAST [54], bCloudBLAST [55], HAFS, K- of stream processing frameworks is in-memory comput-
mulus, Nephele, and Strand; (d) genome assembly, e.g., ing whereby most processing is done in the cluster’s
CloudBrush and Contrail; (e) sequencing reads map- memory and only the final output is stored on a stor-
ping, e.g., BlastReduce, CloudAligner, CloudBurst, and age disk. Popular stream processing frameworks, which
SEAL. Other MapReduce based application in bioinfor- can be integrated with Hadoop, are Apache Storm and
matics include Big-Bio [56], an implementation of the Apache Samza. Bioinformatics workloads usually do
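The Map and Reduce jobs of Sect. 2.2.4.1 can be illustrated with a minimal word-count example in plain Python — a single-process simulation of the model, not Hadoop's Java API; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) tuple for every word in every input split.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle/sort: group all values belonging to the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a smaller set of tuples.
    return {key: sum(values) for key, values in groups.items()}

docs = ["ACGT ACGT TTGA", "ttga acgt"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'acgt': 3, 'ttga': 2}
```

On a real cluster only the map and reduce logic is written by the developer; the shuffle-and-sort step between the two phases is performed by the framework itself, across nodes.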
24 Deep Learning and Parallel Computing Environment for Bioengineering Systems
Bioinformatics workloads usually do not require real-time processing, and thus such frameworks would not be used by themselves for processing biological data. However, a stream processing framework may be adopted along the data processing pipeline to improve processing efficiency.

2.2.4.3 Hybrid Processing Framework

Hybrid processing frameworks are capable of handling both batch and stream processing. Two examples of hybrid frameworks are Apache Spark and Apache Flink, both of which are adopted in bioinformatics, Apache Spark being the more popular of the two. Both frameworks offer lower data processing latencies compared to MapReduce, and both use in-memory processing. Both can be plugged into Hadoop and used instead of MapReduce, though they can also work on other underlying frameworks such as Mesos.

Apache Spark is mainly a batch processing framework with stream processing capabilities which operates using a master/slave architecture. The master coordinator, called the driver, takes streaming data and converts it into small microbatches. These microbatch datasets are stored in memory as a resilient distributed dataset (RDD) and are dynamically distributed for processing across slave nodes (known as executors) in the cluster for load balancing and faster processing. All processing is done in memory, unlike with MapReduce, where intermediate results are written to HDFS and have to be fetched for the next stage of the computation. Spark is 100 times faster than MapReduce when data is processed in memory and 10 times faster than Hadoop in terms of disk access. Only the end results are persisted on storage, which reduces the processing latency in Spark. The processing is further optimized by the use of a directed acyclic graph (DAG) for defining the graph of tasks, which allows complex data processing algorithms to be implemented more efficiently. Moreover, Spark is also highly fault-tolerant; if one node fails, the failed tasks are redistributed across the other nodes. Fig. 2.5 depicts the Spark architecture. Apart from data processing, Apache Spark also includes other components, such as an SQL engine, a machine learning library and a graph processing engine, built atop the Spark Core, as shown in Fig. 2.6.
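Spark's execution model — transformations recorded lazily into a DAG and executed in memory only when an action is called — can be imitated with a toy class (a local, single-process sketch; `ToyRDD` and its methods are invented for illustration and are not Spark's API):

```python
class ToyRDD:
    """A toy stand-in for a resilient distributed dataset: transformations
    are recorded lazily and only executed when an action is called."""
    def __init__(self, data, ops=None):
        self._data = data
        self._ops = ops or []          # the lineage: a chain (DAG) of transformations

    def map(self, fn):                 # transformation: recorded, not executed
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):            # transformation: recorded, not executed
        return ToyRDD(self._data, self._ops + [("filter", pred)])

    def collect(self):                 # action: replay the whole lineage in memory
        items = list(self._data)
        for kind, fn in self._ops:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

reads = ToyRDD(["ACGT", "ACGTTGCA", "AC"])
long_reads = reads.filter(lambda r: len(r) >= 4).map(len)  # nothing computed yet
print(long_reads.collect())  # [4, 8]
```

In real Spark, `rdd.filter(...).map(...)` likewise builds only a lineage graph; computation happens at actions such as `collect()` or `count()`, and `cache()` keeps hot intermediate data in cluster memory instead of writing it to HDFS.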
Several bioinformatics applications exist on Apache Spark. In a recent survey [63], the authors identified the following Spark based applications: (a) for sequence alignment and mapping: SparkSW, DSA, CloudSW, SparkBWA, StreamBWA, PASTASpark, PPCAS, SparkBLAST, and MetaSpark; (b) for the assembly phase in the sequence analysis workflow: Spaler, SA-BR-Spark; (c) for sequence analysis: HiGene, GATK-Spark, and SparkSeq. Spark is also used in other biological applications, such as (a) epigenetics — for example, in [64] the authors proposed a novel CpG box model and a Markov model to investigate CpG islands so as to make the analytic process faster; (b) phylogeny, e.g., CloudPhylo; (c) drug discovery, e.g., S-CHEMO; (d) single-cell RNA sequencing (scRNA-seq), e.g., Falco; (e) variant association and population genetics studies, e.g., VariantSpark, SEQSpark. Moreover, the Biospark framework [65], which uses Hadoop and Spark, allows storing and analyzing large numerical datasets generated from biological simulations and experiments.

Apache Flink is still a new technology, unlike Spark, which is more mature. Flink is independent of Hadoop but can be integrated with it. Just like Spark, Flink supports in-memory computation, which makes it as fast as Spark, but Flink is more powerful, as it can perform batch, true stream, as well as graph processing.

A few works in the field of bioinformatics have started using Flink. In [66], Apache Flink and MapReduce were used to constitute a sequence alignment pipeline for processing raw data produced by Illumina sequencers. The authors demonstrated that the proposed pipeline has very good scalability and is fault tolerant. In [67], the authors further exploited the use of Apache Kafka together with Apache Flink to implement the first processing phases for Illumina sequencing data, with positive results and improvements.

2.2.5 Big Data Analysis and Visualization

According to [68], data analytics can be categorized into three levels of analysis: descriptive, predictive and prescriptive analytics. Descriptive data analysis is used to provide summaries about the data, identify basic features of the data, and identify patterns and relationships that describe the data properties. It is perhaps the easiest type of analysis that can be done on bioinformatics data. Predictive data analytics aims to observe and determine patterns in the dataset so as to be able to predict future outcomes, such as viral evolution. Prescriptive data analytics is usually the final stage of data analytics; it allows taking a course of action to bring improvements based on findings from the descriptive and predictive analysis of the data. Descriptive data analysis can be easily handled by the tools presented in Section 2.2.4. Predictive and prescriptive data analytics are still in their early stages in the area of bioinformatics; machine learning techniques are usually best suited for such analytics, as described in Section 2.3. In [69], the authors summarize the different data mining algorithms for data analytics for solving bioinformatics problems, such as clustering, association rule mining, logistic regression, support vector machine (SVM), and decision trees.

Several tools that can capture data from databases, analyze them and display the results on a dashboard exist for business intelligence. Typical examples include Tableau, Qlik, Pentaho, and Datameer. However, for bioinformatics, given that the processing is more diverse and complex, such almost "plug-and-play" business intelligence tools are not suitable. For the Hadoop environment, a few open-source tools, such as the Elastic Stack, Apache Zoomdata, and Apache Zeppelin, can be used for data analysis and visualization. In [70], it was reported that researchers at the Scripps Research Institute are using Elasticsearch and Kibana to analyze and track data from DNA. The Elastic Stack is a collection of open-source tools for analyzing big data. It is composed of the following tools: Beats and Logstash for collecting machine data, Elasticsearch, which is based on Apache Lucene, for searching and analyzing data, and Kibana for visualization. The Elasticsearch–Hadoop (ES–Hadoop) connector allows using the Elastic Stack on data processed and stored on Hadoop.

2.3 MACHINE LEARNING FOR BIG DATA ANALYSIS

Today machine learning is commonly used for predictive analytics of big data. Machine learning is a subfield of artificial intelligence (AI) and is based on the idea that systems can learn from examples and experiences, by training on data inputs without relying on explicit programming. Machine learning can thus facilitate data analysis, as it automates analytical model building. Today machine learning is used extensively in various industries, such as the automobile industry (e.g., self-driving cars), genetics (to immensely improve the understanding of the human genome), healthcare, financial services, environment and climate change, retail, energy, entertainment media, and social media [71]. According to Gartner [72], data science and machine learning are becoming critical for differentiation, and sometimes survival, in business.

Machine learning plays an important role in solving numerous bioinformatics problems, such as gene finding algorithms, gene expression, genome alignment, GWAS and genomic selection.
There is widespread application of machine learning in bioinformatics because immense amounts of molecular biology data are now available [73] and because, given the highly complex nature of many problems in bioinformatics, manually developing specialized algorithms that will solve them perfectly is impractical, if not impossible. Machine learning algorithms have proven to be quite effective in detecting patterns and are being applied in bioinformatics applications with great success. In [74], four machine learning models, namely neural networks (NNs), cellular automata (CAs), random forests (RFs) and multifactor dimensionality reduction (MDR), which have been used to successfully detect and characterize gene–gene interactions, are discussed. In [75], a convolutional neural network (CNN) architecture has been proposed and evaluated for detecting cancer metastases in gigapixel microscopy images. In [76], machine learning techniques, namely the SVM, the artificial neural network (ANN), and a hybrid of these techniques, are reviewed for DNA sequence classification. Many machine learning algorithms are readily available, and in [77], the authors conducted a thorough analysis of 13 state-of-the-art machine learning algorithms (e.g., support vector classifier (SVC), K-nearest neighbor (KNN), and decision tree (DT)) out of a set of 165 algorithms for solving the problem of data classification, to help researchers identify the best algorithm for tackling similar bioinformatics problems. In the following subsections, different machine learning approaches and deep learning techniques are introduced. Open source tools for solving bioinformatics problems using machine learning on Hadoop are also described.

2.3.1 Machine Learning Methods

The four main machine learning methods are supervised machine learning, unsupervised machine learning, semi-supervised machine learning and reinforcement learning.

2.3.1.1 Supervised Machine Learning

Supervised machine learning consists of training a program on a training dataset comprising inputs and outputs (labeled with the correct output), such that when new data is input, the system can reach an accurate conclusion. The machine learning task is thus to infer a function that maps an input to an output based on the input–output pairs of the example training dataset. The algorithm learns by comparing its actual output with the correct outputs to find errors, and consequently improves the model iteratively until an acceptable level of performance is reached.

Assuming an input variable (x) and an output variable (Y), the machine learning algorithm is used to learn the mapping function f, where Y = f(X), such that when new input data (x) is used, it can accurately predict the output variable (Y) for that input. In practice, the input x often represents multiple data points, such as x1, x2, x3, ..., in which case the predictor function f(x) has the following form, assuming three input components, where a0, a1, a2 and a3 are constants:

f(x) = a0 + a1x1 + a2x2 + a3x3    (2.1)

By finding the best possible values for a0, a1, a2 and a3 iteratively, the predictor function f(x) is perfected. Fig. 2.7 depicts the two-step supervised machine learning process.

Most practical machine learning uses the supervised learning method. The supervised learning task can be either a regression or a classification problem. A classification problem is one where the output variable is a category, i.e., a prediction can take a finite number of values. For example, given the set of input features, the predictor function should predict whether a tumor is benign or malignant.
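Eq. (2.1) can be fitted in exactly this iterative fashion; the following NumPy sketch (synthetic data, illustrative constants) recovers a0, ..., a3 by gradient descent on the prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 200 examples, three inputs x1, x2, x3
true_a = np.array([2.0, 0.5, -1.0, 3.0])      # a0 (intercept), a1, a2, a3
y = true_a[0] + X @ true_a[1:]                # outputs generated by f(x)

a = np.zeros(4)                               # start from a0 = a1 = a2 = a3 = 0
lr = 0.1
for _ in range(500):                          # iteratively improve the constants
    pred = a[0] + X @ a[1:]
    err = pred - y                            # compare actual vs. correct outputs
    a[0] -= lr * err.mean()                   # gradient step on the intercept a0
    a[1:] -= lr * (X.T @ err) / len(y)        # gradient step on a1, a2, a3

print(np.round(a, 2))  # ≈ [ 2.   0.5 -1.   3. ]
```

Each pass corresponds to one iteration of the two-step process of Fig. 2.7: predict with the current constants, then adjust them against the known correct outputs.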
The classification problem can be of two types: binary classification or multi-class classification. Binary classification is where the output can be one of two possible values or classes, usually 1 or 0; multi-class classification is where the output can be one of three or more classes, e.g., when predicting the type of cancer. Machine learning algorithms for classification problems include decision trees, logistic regression, naive Bayes, K-nearest neighbors, random forests, and the linear SVC (support vector classifier). In [78], the authors describe the use of decision tree-based methods in computational and systems biology. An important classification task in bioinformatics is the classification of microarray data. In [79], a random forest has been proposed for gene selection and classification of microarray data. In [80], the support vector machine (SVM) has been used as an effective method for gene classification.

A regression problem is one where the output variable to be predicted takes a real or continuous value, such as temperature or weight. Typical regression algorithms include linear regression, regression trees (e.g., random forests), and support vector regression (SVR). The simplest model is simple linear regression, which tries to find a statistical relationship between two continuous variables by drawing the line that best fits the data. In [81], several regression approaches for microarray data analysis were presented, including the support vector machine (SVM).

2.3.1.2 Unsupervised Machine Learning

In supervised machine learning, training datasets with labeled data are used, whereas in unsupervised machine learning no labeled datasets are used. With unsupervised machine learning, the system is required to analyze the actual data to find similarities, patterns and correlations in the data, in order to explore and learn about relationships within the data. Unsupervised machine learning is suitable for data about which little prior knowledge is available — for instance, to address questions such as "What patterns exist in the gene expression of cancers?". The two popular unsupervised machine learning tasks are the clustering of data and the dimensionality reduction of data.

Clustering is the process of finding similarities in unlabeled data so as to group similar data items together into a cluster. Different types of clustering methods are available, whereby every methodology follows a different notion or set of rules for defining the degree of similarity among data points. Fig. 2.8 depicts the different clustering techniques. According to [82], the most typical use of clustering in bioinformatics is the clustering of genes in expression data. Typically, a few samples of DNA microarrays allow measuring the expression levels of a large number of genes. Clustering can be used to group genes with a similar expression level across all the samples into a cluster.

The two most widely used clustering algorithms in machine learning are K-means clustering and hierarchical clustering. K-means clustering is a type of partitional clustering algorithm; more specifically, it follows the centroid model. It is an iterative clustering algorithm whereby the notion of similarity is based on the closeness of a data point to the centroids of the clusters. The K-means algorithm partitions the given data into K clusters (K is defined by the user, which implies some prior knowledge of the dataset), where each cluster has a cluster center known as the centroid. Initially, the K cluster centers are randomly set, and the data items are assigned to the nearest cluster center. The K cluster centers are then reevaluated based on the initial membership of the data items to the clusters. The closeness of the data points to the new cluster centers is evaluated, and the process is iterated until the data items no longer change cluster membership.
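The K-means procedure just described — random initial centers, nearest-center assignment, center re-evaluation, repeat — fits in a few lines of NumPy (a minimal sketch on synthetic two-dimensional data; real gene expression profiles would have far more dimensions, and a production implementation would iterate until membership stops changing rather than for a fixed count):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Initially, the K cluster centers are set randomly (here: random data points).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every data item to its nearest cluster center.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Re-evaluate each center as the mean of its current members
        # (keeping the old center if a cluster ever empties: a sketch-level guard).
        centroids = np.array([points[labels == j].mean(axis=0)
                              if (labels == j).any() else centroids[j]
                              for j in range(k)])
    return labels, centroids

rng = np.random.default_rng(1)
# Two well-separated groups of simulated two-dimensional expression profiles.
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centers = kmeans(pts, k=2)
```

On this toy data the two groups end up in two distinct clusters, mirroring how co-expressed genes would be grouped together.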
Expectation maximization (EM), also called soft clustering, is another popular clustering technique, which is of the partitional type but model-based. In [83], the authors propose a novel clustering algorithm which is based on the K-means algorithm and incorporates gene information from the Gene Ontology into the clustering process to obtain more biologically meaningful clusters. Similarly, in [84], K-means clustering has been enhanced to obtain better performance for cancer subtype prediction from gene expression data. In [85], both the K-means and Markov clustering algorithms were used to identify key genes of interest to the study.

Unlike the partitional clustering techniques, which attempt to place each data item in exactly one cluster, i.e., non-overlapping clusters, hierarchical clustering is an approach that allows for subclusters, i.e., a set of nested clusters that are organized as a tree. There are two types of hierarchical clustering: divisive and agglomerative. With agglomerative hierarchical clustering, the algorithm initially assigns each data point to a cluster of its own. Then the two nearest clusters are merged into one to form a subcluster, and the algorithm iterates until finally there is a single cluster. The result of the clustering can be visualized as a dendrogram. An example of agglomerative hierarchical clustering is single-linkage clustering (SLC). Divisive hierarchical clustering starts with all data points in one cluster and then splits the cluster recursively into subclusters until, finally, subclusters consisting of only one data point remain. In [86], a hierarchical clustering approach has been adopted whereby many hierarchical organizations of gene clusters corresponding to some subhierarchies in the gene ontology were successfully captured. Another example of a method for cluster analysis is the self-organizing map (SOM), which uses neural networks.

Another unsupervised learning task consists of dimension reduction. Extremely large datasets of unlabeled data are available in bioinformatics. These datasets may contain thousands of records with numerous attributes or features. Working with such large numbers of high-dimensional records is quite computing intensive, and often a lot of the data is redundant [87]. Dimensionality reduction refers to methods used to reduce or combine the complexity of the data by using fewer features while keeping as much relevant structure as possible, i.e., with minimal loss of information. Often dimension reduction is performed along the data analytics pipeline before applying a supervised machine learning algorithm. Two popular algorithms used to reduce dimensionality are principal component analysis (PCA), which aims to find the linear combination that best preserves the variance in the data, and singular-value decomposition (SVD), which factorizes the data matrix into three smaller matrices.

A self-organizing map (SOM) can also be used for dimensionality reduction. It uses an ANN that is trained using unsupervised learning to produce a low-dimensional representation of the input space of the training samples, called a map. The authors of [88] describe PCA based methods in bioinformatics studies. In [89], SVD is used for pathway level analysis of gene expression. In [90], an improved prediction of protein functional associations through singular value decomposition of phylogenetic profiles is presented.

2.3.1.3 Semi-Supervised Machine Learning

In many real-world machine learning problems, typically a small amount of labeled data and a large dataset of unlabeled data are available. Often a large amount of unlabeled data is easily acquired, but the cost of labeling data to generate a labeled training dataset for supervised learning is high. It is therefore desirable to combine the explicit classification information of the labeled data with the information in the unlabeled data to construct an accurate learning model. Semi-supervised learning is a combination of supervised and unsupervised learning which aims to address such cases. Several algorithms have been proposed for semi-supervised learning, such as EM based algorithms [91], self-training [92], co-training [93], the semi-supervised SVM (S3VM) [94], graph-based methods [95], and boosting based semi-supervised learning methods [96]. Deep generative models are also being used for semi-supervised learning [97,98].

Semi-supervised learning is widely adopted in bioinformatics. In [99], to make the most of the vast amount of microarray data that does not have sufficient follow-up information, the low-density separation (LDS) semi-supervised learning technique has been applied to predict the recurrence risk in cancer patients. Similarly, in [100], the authors use the harmonic Gaussian, graph-based semi-supervised learning algorithm to predict disease genes, taking into account the imbalance between known and unknown disease genes. Given the small amount of annotation of the functional and structural attributes of protein sequence data, in [101] it is shown that classification based semi-supervised learning can increase the overall accuracy of classifying partly labeled data and thus improve predictive performance. In [102], the authors investigated the use of semi-supervised learning and successfully improved recall in the BioNLP Gene Regulation Network task.
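The self-training scheme [92] mentioned above can be sketched with a nearest-centroid base classifier: train on the labeled seed, repeatedly pseudo-label the most confidently classified unlabeled point, and add it to the training set (plain NumPy; the data and the margin-based confidence measure are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes, only one labeled example each, plus many unlabeled points.
X = np.array([[0.0, 0.0], [5.0, 5.0]])        # labeled seed examples
y = np.array([0, 1])                           # their known labels
unlabeled = np.vstack([rng.normal(0, 0.5, (30, 2)),
                       rng.normal(5, 0.5, (30, 2))])

pool = unlabeled.copy()
while len(pool):
    # Nearest-centroid classifier trained on the current labeled set.
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(pool[:, None] - centroids[None], axis=2)
    margin = np.abs(d[:, 0] - d[:, 1])         # distance margin as confidence proxy
    best = margin.argmax()                     # most confidently classified point
    X = np.vstack([X, pool[best]])             # add it with its pseudo-label
    y = np.append(y, d[best].argmin())
    pool = np.delete(pool, best, axis=0)

# Predictions of the final model on all the originally unlabeled points.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
pred = np.linalg.norm(unlabeled[:, None] - centroids[None], axis=2).argmin(axis=1)
```

Starting from just two labeled examples, the unlabeled points gradually bootstrap a larger training set — the essence of combining labeled and unlabeled information described above.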
CHAPTER 2 Big Data Analytics and Deep Learning in Bioinformatics With Hadoop 29
Likewise, to address the task of gene regulatory network reconstruction from high-throughput data, in [103] the authors exploited an iterative, semi-supervised, ensemble-based algorithm for making inferences on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae). Their approach shows improved performance as compared to other techniques.

2.3.1.4 Reinforcement Learning

Reinforcement learning is a computational approach to learning from actions in the absence of a training dataset, i.e., learning from experience by trial and error to determine which actions yield the greatest reward. Reinforcement learning consists of three primary components: (i) the agent (the learning agent); (ii) the environment (the agent interacts with the environment); and (iii) the actions (the agent can take actions). An agent learns from the environment by interacting with it and receiving rewards for performing actions. Such learning is goal or task oriented; the agent learns how to attain its goal by taking the best actions so as to maximize the reward over a given time period. A task can be either episodic or continuous. Episodic tasks have a starting point and an ending point (terminal state), whereas continuous tasks are those that have no terminal state, i.e., the agent will run continuously until explicitly stopped. Reinforcement learning is often used for robotics and gaming. Two popular methods of reinforcement learning are the Monte Carlo method and temporal difference learning. In bioinformatics, reinforcement learning has been used for solving the fragment assembly problem [104], the bidimensional protein folding problem [105], RNA reverse folding [106], and the 2D-HP protein folding problem [107], amongst others.

2.3.2 Deep Learning and Neural Networks

Deep learning is a subfield of machine learning focusing on algorithms that attempt to imitate the way the human brain learns. A deep learning architecture is thus inspired by the brain's structure of neural networks. With the huge computing power available today (e.g., graphics processing units (GPUs)), deep learning powered by artificial neural networks (ANNs) is used for analyzing big data and solving complex problems. Neural networks have been around for a while, but modern ANNs are "deep": a traditional neural network typically consists of two or three hidden layers, whereas ANNs nowadays can have as many as 150 layers. It is possible to use ANNs to build and train models in a time-efficient manner. With a deep learning model, the algorithms not only learn but can also determine on their own whether a prediction is accurate or not. Applications of deep learning include automatic driving, automatic hearing and speech translation, and automatic detection of cancer cells.

2.3.2.1 Artificial Neural Networks (ANNs)

The way humans learn and process information is controlled by the nervous system, which is made up of neurons and the different connections between the neurons. ANNs likewise consist of neurons connected to each other; however, ANNs have discrete layers (cascades of nonlinear processing unit layers), connections and directions of data propagation. The output of one layer is used as the input of the next layer. The simplest ANN consists of three layers: the input layer, a hidden layer and an output layer, as shown in Fig. 2.9. The circles represent the neurons, and the arrows represent the connections between the different neurons. For more complex tasks, ANNs are composed of additional hidden layers. Fig. 2.10 depicts an ANN with two hidden layers. Multiple hidden layers result in a larger or deeper neural network, which usually results in enhanced learning capability.

FIG. 2.9 Simple artificial neural network.

The ANN has certain parameters: each of the connections has a number associated with it called the connection weight, and each neuron has a threshold value and an activation function associated with it. Initially, the ANN is trained with labeled data, i.e., inputs and their expected correct outputs. The ANN runs the inputs with certain values of the parameters, and the results obtained are then compared with the expected correct results. If the computed results are far from the expected correct results, the ANN adjusts its parameters iteratively by means of a training algorithm such as gradient descent or back-propagation, until the computed outputs are as close as possible to the expected correct outputs. This is the learning process of the ANN. After the training phase, when new inputs are run through the ANN, there is high confidence that the predicted outputs will be close to the actual outputs.
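The training loop just described — run the inputs forward, compare with the expected correct outputs, and back-propagate the error to adjust the weights — can be sketched for a minimal three-layer network on a toy benign/malignant-style classification task (NumPy; all sizes, seeds and learning rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy labeled data: two features per sample, class 0 vs. class 1
# (think benign vs. malignant, as in the classification example above).
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([[0.0]] * 20 + [[1.0]] * 20)       # expected correct outputs

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden layer
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output layer

for _ in range(2000):
    hidden = sigmoid(X @ W1 + b1)               # forward pass through the layers
    out = sigmoid(hidden @ W2 + b2)
    err = out - y                               # compare with the correct outputs
    # Back-propagate the error and adjust every weight a little (gradient descent).
    d_out = err * out * (1 - out)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_hid
    b1 -= 0.5 * d_hid.sum(axis=0)

accuracy = ((out > 0.5) == (y > 0.5)).mean()
```

Deep learning frameworks automate exactly this loop (with automatic differentiation and GPU acceleration) for networks with many more layers and parameters.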
The ANN structure is independent of the task, that is, ter insights and intelligence. In the Hadoop framework,
the neutral network designed for one task can be used data is stored on a cluster of nodes. Machine learning
for another task except that the parameters may have to tools can be easily integrated in the Hadoop ecosystem
be reconfigured as they may be different, and the ANN to exploit the scalability, reliability and resource pool-
will have to be retrained for the other task. ing characteristic offered by the distributed storage and
Typically, neural networks can be classified into processing solution offered by Hadoop. The most pop-
one of the following four types: classification neural ular machine learning tools that can be integrated with
network, prediction neural network, clustering neural Hadoop are Apache Mahout, Spark MLlib, and H20.
network and association neural network. Several neu-
ral network architectures are suitable for deep learn- 2.3.3.1 Spark, Mahout and MLlib
ing, namely feed-forward neural network (FNN), recur- Spark supports iterative computation and has improved
rent neural network (RNN), recursive neural network, convolutional neural network (CNN), deep belief networks, and convolutional deep belief networks [108]. An FNN has no feedback connections through which outputs of the model could be fed back into the system. RNNs are architectures in which the outputs are looped back into the system. A CNN comprises one or more convolutional layers followed by one or more fully connected layers. The neurons in one convolutional layer do not connect to all the neurons in the next layer but only to a small region of it. The inputs of a CNN are images, and the layers are therefore organized in three dimensions: width, height and depth. The final output is reduced to a single vector of probability scores, organized along the depth dimension. CNNs are very effective in areas such as image recognition and classification. In [109], applications of neural networks in protein bioinformatics are discussed and summarized; these applications include the prediction of protein structure, binding sites and ligands, and protein properties. The most popular NN architectures identified for protein bioinformatics are the FNN and RNN.

2.3.3 Machine Learning and Hadoop
Big data analytics requires algorithms based on machine learning techniques to process data in real time. Spark suits such workloads because it offers better processing speed compared to MapReduce, as it utilizes in-memory computation using resilient distributed datasets (RDDs). RDDs store data in memory for fast access during computation and provide fault tolerance [110]. An RDD is an immutable distributed collection of key–value pairs, stored across the nodes of the cluster, that can be operated on in parallel. Moreover, recent versions of Spark also support DataFrames and Datasets, which are built on top of RDDs. A DataFrame is also an immutable distributed collection of data, but one in which the data is organized into named columns, similar to a table in a relational database. Datasets are an extension of DataFrames that provides a type-safe, object-oriented programming interface. The RDD, DataFrame and Dataset APIs make data processing easy and give developers more flexibility. Using programming languages such as Python, R, Java, or Scala, machine learning algorithms and applications can easily be implemented with Spark.

Moreover, Spark ships with the MLlib library for machine learning. Prior to MLlib, Apache Mahout was typically used for machine learning on Hadoop. However, Mahout is built atop MapReduce and is therefore slower than MLlib, which runs on Spark. Both MLlib and Mahout provide numerous machine learning algorithms, such as classification (logistic regression, linear support vector machine (SVM), naive Bayes), regression (linear regression), collaborative filtering, clustering (k-means), decomposition (singular value decomposition (SVD)), and principal component analysis (PCA), though Mahout, being more mature, has a more extensive library of algorithms [111].

Other machine learning libraries include Scikit-Learn, which is a Python library, and H2O, which is an open-source library. H2O is a fast, scalable machine and deep learning library. Both can be used on Spark for analyzing data on a Hadoop cluster. The Sparkling Water connector allows the integration of H2O algorithms with the capabilities of the Spark platform [112], and spark-sklearn is the integration package for Scikit-Learn with Apache Spark.

2.3.4 Distributed Deep Learning and Hadoop
A cluster manager is a platform on which Spark can be run. Spark supports three cluster managers: the standalone cluster manager, Hadoop YARN and Apache Mesos. Thus, Spark can run stand-alone on one machine, on a Hadoop cluster, or on a Mesos datacenter. Machine learning may be possible on stand-alone Spark. However, as datasets increase in size and deep neural networks grow in complexity, the computational intensity and memory demands of deep learning increase, and a cluster of high-performance machines is required [113]. Two approaches may be adopted: data parallelism and model parallelism. Data parallelism involves partitioning the data equally among several processing nodes; each node then processes its data independently, in parallel. Data parallelism is more suitable when there is a large amount of data, and it is supported by MapReduce and Spark running on a cluster. Model parallelism attempts to partition the machine learning model itself; it is more complex and challenging, and is more suitable for large learning models such as deep neural networks. In [114], BigDL, a distributed deep learning framework for big data based on data parallelism and operating on Spark, has been proposed. Most of the deep learning frameworks available today can be run on a cluster of computers.

TensorFlow [115] is the most popular framework for deep learning. Its TensorFlowOnSpark framework supports distributed deep learning on Spark clusters. According to [116], TensorFlow is a powerful and flexible gateway to deep learning in biology. In [117], TensorFlow has been used to implement a neural network for predicting the severity of Parkinson's disease. TensorFlow has also been used to implement a deep neural network to predict asthma severity level or the imminence of an asthma attack [118].

Theano is another Python library for developing deep learning models. To ease the use of TensorFlow and Theano, which may be difficult to work with directly, Keras can be used to quickly develop a deep learning model. Keras is a minimalist Python library for deep learning that can run on top of Theano or TensorFlow, and it leverages the Dist-Keras framework for achieving data parallelism on Apache Spark.

Caffe is a machine learning framework that was designed with expression, speed, and modularity as the focus points [119]. It was developed for computer vision/image classification by leveraging convolutional neural networks (CNNs). CaffeOnSpark can be used to bring deep learning onto Hadoop and Spark clusters. H2O's Deep Learning is based only on a multi-layer feedforward deep neural network that is trained with stochastic gradient descent using back-propagation. In [120], using the Extended-Caffe framework, a 3D convolutional neural network was constructed to generate lung nodule proposals.

Other deep learning frameworks include Torch, Apache MXNet, Microsoft's open-source Cognitive Toolkit (CNTK) and Apache Singa. The latter is primarily focused on distributed deep learning using model parallelism on a cluster of nodes.

2.4 CONCLUSIONS
Previously, biological data computations were mostly done using HPC-based multi-core processing architectures. Such computing infrastructure can be quite expensive and is not easily available. Moreover, with next generation sequencing technologies, massive amounts of bioinformatics data have become available. Big data analytics and distributed computing, i.e., cloud computing, are increasingly adopted in bioinformatics applications, whereby a cluster of compute nodes is used for processing and analyzing data. The Hadoop big data framework is one of the most popular frameworks for processing big data, as it provides fault tolerance, scalability, and reliability, as well as being cost effective. In this chapter, we take a holistic approach to big data analytics and present the big data analytics workflow with regard to the Hadoop framework. The emergence of such an approach has changed the context of bioinformatics computation. We discussed the background of Hadoop technology, its core components, as well as other components, which form the Hadoop ecosystem. The study shows that bioinformatics is fully embracing the Hadoop big data framework.
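The RDD abstraction described in the text — an immutable collection split into partitions, transformed with operations such as map and aggregated with reduce — can be made concrete with a pure-Python sketch. This is deliberately not the real pyspark API: `TinyRDD` and its two methods are hypothetical stand-ins, shown only to illustrate the partition-wise style of computation.

```python
from functools import reduce

# Pure-Python sketch (not the actual Spark API) of the RDD idea:
# an immutable collection split into partitions that can be
# transformed and aggregated partition by partition.
class TinyRDD:
    def __init__(self, data, num_partitions=4):
        n = max(1, num_partitions)
        self.partitions = [data[i::n] for i in range(n)]

    def map(self, fn):
        # Transformations return a new "RDD"; the original is untouched,
        # mirroring RDD immutability.
        out = TinyRDD([], 1)
        out.partitions = [[fn(x) for x in part] for part in self.partitions]
        return out

    def reduce(self, fn):
        # Reduce each partition independently, then combine the
        # per-partition results -- the shape of a cluster-wide reduce.
        partials = [reduce(fn, part) for part in self.partitions if part]
        return reduce(fn, partials)

rdd = TinyRDD(list(range(1, 11)), num_partitions=3)
squared_sum = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(squared_sum)  # sum of squares of 1..10 = 385
```

On a real cluster the partitions would live on different nodes and the per-partition reductions would run concurrently; here they simply run in a loop.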
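As an illustration of one algorithm from these libraries, here is a deliberately tiny pure-Python sketch of k-means clustering, restricted to one-dimensional data. In practice one would call the MLlib or Mahout implementation instead; the function below is an invented example, not library code.

```python
# Pure-Python sketch of k-means, one of the clustering algorithms
# the text lists for MLlib and Mahout; illustrative only.
def kmeans_1d(points, k, iters=20):
    # Initialize centroids with the first k distinct values.
    centroids = sorted(set(points))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated groups around 1.0 and 10.0.
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
print(kmeans_1d(data, k=2))  # approximately [1.0, 10.0]
```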
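The training recipe the text attributes to H2O's Deep Learning — a multi-layer feedforward network fitted with stochastic gradient descent and back-propagation — can be sketched in plain Python. The toy 2-2-1 network below learns XOR, a task that needs the hidden layer; every name and hyperparameter here is invented for the example and bears no relation to any framework's actual implementation.

```python
import math
import random

random.seed(0)  # deterministic toy run

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR is not linearly separable, so the hidden layer is essential.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# A 2-2-1 feedforward net: two hidden neurons and one output neuron,
# each weight row holding [weight_x0, weight_x1, bias].
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
W2 = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + W2[2])
    return h, y

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

loss_before = mse()
lr = 0.5
for step in range(20000):
    x, t = random.choice(DATA)      # "stochastic": one example per update
    h, y = forward(x)
    dy = (y - t) * y * (1 - y)      # error signal at the output neuron
    for j in range(2):              # back-propagate into the hidden layer
        dh = dy * W2[j] * h[j] * (1 - h[j])
        W2[j] -= lr * dy * h[j]
        W1[j][0] -= lr * dh * x[0]
        W1[j][1] -= lr * dh * x[1]
        W1[j][2] -= lr * dh
    W2[2] -= lr * dy                # output bias

print(round(loss_before, 3), "->", round(mse(), 3))  # error shrinks with training
```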
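A minimal sketch of the data-parallel training loop described here — each node computes a gradient on its own shard of the data and the shard gradients are averaged before every synchronous update, in the spirit of BigDL-style training — might look as follows in plain Python. The cluster is simulated by a list of shards; all names are illustrative, not a real API.

```python
# Plain-Python sketch of synchronous data-parallel training: each
# "node" owns one shard of the data, computes a gradient on its shard,
# and the shard gradients are averaged before every update.
def grad_on_shard(w, shard):
    # Gradient of mean squared error for the model y = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard) if shard else 0.0

def data_parallel_sgd(data, num_nodes=4, lr=0.01, steps=100):
    shards = [data[i::num_nodes] for i in range(num_nodes)]  # partition once
    w = 0.0
    for _ in range(steps):
        grads = [grad_on_shard(w, s) for s in shards]  # concurrent on a real cluster
        w -= lr * sum(grads) / len(grads)              # synchronous averaging step
    return w

# Synthetic data from y = 3x: the learned weight should approach 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
print(round(data_parallel_sgd(data), 2))  # approximately 3.0
```

Model parallelism, by contrast, would split the model's parameters themselves across nodes, which is harder to simulate in a few lines.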
Another significant technology that can revolutionize bioinformatics applications is machine learning. Machine learning is widely proposed in the literature for solving bioinformatics problems, and the different approaches of machine learning algorithms have been presented in this chapter. To address more complex problems in bioinformatics, deep learning is also being used. Moreover, machine learning can easily be plugged into the data processing and analysis pipeline of the Hadoop framework. It is expected that in the future the use of deep learning in the area of bioinformatics will greatly improve the understanding of the human genome and help find cures for numerous diseases.

Finally, bioinformatics being a complex field, no single computational method will be optimal for every dataset and every task. A successful data analysis in this area almost certainly requires a combination of multiple data analysis methods. The Hadoop big data framework, which can easily be coupled with other processing, analytic or machine learning engines, is found to be most suitable. In the future, it is expected that a combination and orchestration of various solutions on the Hadoop big data framework will support enhanced bioinformatic computations. Thus, researchers and practitioners should collaborate to work towards this goal.

REFERENCES
1. D. Laney, 3D Data Management: Controlling Data Volume, Velocity, and Variety, Technical report, META Group, available at https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf, February 6, 2001.
2. Internet Live Stats, [online] Internetlivestats.com, available at http://www.internetlivestats.com/. (Accessed 24 April 2015).
3. Dany, 37 Mind Blowing YouTube Facts, Figures and Statistics – 2018, available at https://merchdope.com/youtube-statistics/, April 2018.
4. R.J. Robison, How big is the human genome? in: Precision Medicine, Jan. 2014.
5. Eugene Rosenberg, The human genome, Ch. 11, in: It's in Your DNA. From Discovery to Structure, Function and Role in Evolution, Cancer and Aging, 2017, pp. 97–98.
6. Matthew Herper, Illumina promises to sequence human genome for $100 but not quite yet, Forbes (January 2017), available at https://www.forbes.com/sites/matthewherper/2017/01/09/illumina-promises-to-sequence-human-genome-for-100-but-not-quite-yet/#2ce9b178386d.
7. Jun Li, Yiling Lu, Rehan Akbani, Zhenlin Ju, Paul L. Roebuck, Wenbin Liu, Ji-Yeon Yang, Bradley M. Broom, Roeland G.W. Verhaak, David W. Kane, Chris Wakefield, John N. Weinstein, Gordon B. Mills, Han Liang, TCPA: a resource for cancer functional proteomics data, Nature Methods 10 (2013) 1046–1047.
8. The ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome, Nature 489 (September 2012) 57–74.
9. EMBL-European Bioinformatics Institute, EMBL-EBI Annual Scientific Report 2013, 2014.
10. H. Kashyap, H.A. Ahmed, N. Hoque, S. Roy, D.K. Bhattacharyya, Big data analytics in bioinformatics: a machine learning perspective, CoRR, arXiv:1506.05101, 2015.
11. National Institutes of Health, Big data to knowledge phase I & II, available at https://commonfund.nih.gov/bd2k, June 2018. (Accessed 27 June 2018).
12. Fabricio F. Costa, Big data in biomedicine, Drug Discovery Today 19 (4) (Apr. 2014), Reviews.
13. Peter Mell, Tim Grance, The NIST Definition of Cloud Computing, Recommendations of the National Institute of Standards and Technology, Special Publication 800-145, Sept. 2011.
14. The Apache Hadoop project, http://www.hadoop.org.
15. Market Research Future, Hadoop Big Data Analytics Market Research Report – Global Forecast to 2022, Report, July 2018.
16. Sisense, Gartner magic quadrant for analytics and business intelligence platforms, available at https://www.sisense.com/gartner-magic-quadrant-business-intelligence/, Feb. 2018.
17. C. Arindam, Microsoft deepens its commitment to Apache Hadoop and open source analytics, Microsoft Azure, https://azure.microsoft.com/en-us/blog/microsoft-reaffirms-its-commitment-to-apache-hadoop-open-source-analytics/, June 2018.
18. Qlik, Why Cloudera + Qlik? Cloudera, https://www.cloudera.com/partners/solutions/qlik.html.
19. Tableau, Just point at your Hadoop cluster to analyze your data, https://www.tableau.com/solutions/workbook/hadoop_flavors.
20. WenTai Wu, WeiWei Lin, Ching-Hsien Hsu, LiGang He, Energy-efficient Hadoop for big data analytics and computing: a systematic review and research insights, Future Generation Computer Systems 86 (Sept. 2018) 1351–1367, https://doi.org/10.1016/j.future.2017.11.010.
21. M. Malik, K. Neshatpour, T. Mohsenin, A. Sasan, H. Homayoun, Big vs little core for energy-efficient Hadoop computing, in: 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2017, pp. 1480–1485.
22. Giuseppe Cattaneo, Raffaele Giancarlo, Umberto Ferraro Petrillo, Gianluca Roscigno, MapReduce in computational biology via Hadoop and Spark, in: Reference Module in Life Sciences, Elsevier, 2018.
23. Aisling O'Driscoll, Jurate Daugelaite, Roy D. Sleator, 'Big data', Hadoop and cloud computing in genomics, Journal of Biomedical Informatics 46 (5) (Oct. 2013) 774–781.
24. Q. Zou, X.B. Li, W.R. Jiang, Z.Y. Lin, G.L. Li, K. Chen, Survey of MapReduce frame operation in bioinformatics, Briefings in Bioinformatics 15 (4) (Feb. 2013) 637–647, https://doi.org/10.1093/bib/bbs088.
25. P. Singh, Big genomic data in bioinformatics cloud, Applied Microbiology: Open Access 2 (2016) 113, https://doi.org/10.4172/2471-9315.1000113.
26. Lizhen Shi, Zhong Wang, Weikuan Yu, Xiandong Meng, A case study of tuning MapReduce for efficient bioinformatics in the cloud, Parallel Computing 61 (Jan. 2017) 83–95.
27. Apache Mesos, What is Mesos? A distributed systems kernel, http://mesos.apache.org/.
28. Apache Myriad, Deploy Apache YARN applications using Apache Mesos, https://myriad.apache.org/.
29. N. Peek, J.H. Holmes, J. Sun, Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics, Yearbook of Medical Informatics 9 (2014) 42–47.
30. The ENCODE Project Consortium, An integrated encyclopedia of DNA elements in the human genome, Nature 489 (September 2012) 57–74.
31. Encode, ENCODE: Encyclopedia of DNA elements, https://www.encodeproject.org/.
32. O. Gottesman, H. Kuivaniemi, G. Tromp, et al., The electronic medical records and genomics (eMERGE) network: past, present, and future, Genetics in Medicine 15 (2013) 761–771.
33. National Human Genome Research Institute (NHGRI), Electronic medical records and genomics (eMERGE) network, https://www.genome.gov/27540473/electronic-medical-records-and-genomics-emerge-network/.
34. G. Rustici, N. Kolesnikov, M. Brandizi, et al., ArrayExpress update – trends in database growth and links to data analysis tools, Nucleic Acids Research 41 (2013) D987–D990.
35. ArrayExpress, Functional genomics data, https://www.ebi.ac.uk/arrayexpress/.
36. dbGaP, https://www.ncbi.nlm.nih.gov/gap.
37. GEO DataSets, https://www.ncbi.nlm.nih.gov/gds.
38. K.D. Pruitt, T. Tatusova, D.R. Maglott, NCBI reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins, Nucleic Acids Research 33 (2005) D501–D504.
39. RefSeq: NCBI reference sequence database, https://www.ncbi.nlm.nih.gov/refseq/.
40. SEER-medicare linked database, https://healthcaredelivery.cancer.gov/seermedicare/.
41. J.L. Warren, C.N. Klabunde, D. Schrag, et al., Overview of the SEER-medicare data: content, research applications, and generalizability to the United States elderly population, Medical Care 40 (2002) IV-3–18.
42. I. Lobo, Basic local alignment search tool (BLAST), Nature Education 1 (1) (2008) 215.
43. W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, The globus striped GridFTP framework and server, in: SC'05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, 12–18 Nov. 2005, Seattle, WA, USA, ISBN 1-59593-061-2, 2005.
44. Brendan Lawlor, Richard Lynch, Micheál Mac Aogáin, Paul Walsh, Field of genes: using Apache Kafka as a bioinformatic data repository, GigaScience 7 (4) (April 2018), https://doi.org/10.1093/gigascience/giy036.
45. Benjamin Bengfort, Jenny Kim, Data ingestion, Ch. 7, in: Data Analytics With Hadoop: An Introduction for Data Scientists, O'Reilly, 2016, pp. 157–173.
46. Brendan Lawlor, Richard Lynch, Micheál Mac Aogáin, Paul Walsh, Field of genes: using Apache Kafka as a bioinformatic data repository, GigaScience 7 (4) (April 2018).
47. Francesco Versaci, Luca Pireddu, Gianluigi Zanetti, Kafka interfaces for composable streaming genomics pipelines, in: 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Las Vegas, NV, USA, Mar. 2018.
48. Szymon Chojnacki, Web Production Team EBI, Genome campus software community, Apache Kafka streams API, EMBL-EBI, available at https://github.com/ebi-wp/kafka-streams-api-websockets/blob/jdisp/docs/kafka-streams-10-10-2017.pdf, Oct. 2017.
49. Dillon Chrimes, Hamid Zamani, Using distributed data over HBase in big data analytics platform for clinical services, Computational and Mathematical Methods in Medicine 2017 (2017), Article 6120820, 16 pages, https://doi.org/10.1155/2017/6120820.
50. A. Thusoo, J.S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, R. Murthy, Hive — a petabyte scale data warehouse using Hadoop, in: Proceedings of the International Conference on Data Engineering, 2010, pp. 996–1005.
51. Hortonworks, Apache pig, https://hortonworks.com/apache/pig/#section_1.
52. Henrik Nordberg, Karan Bhatia, Kai Wang, Zhong Wang, BioPig: a Hadoop-based analytic toolkit for large-scale sequence data, Bioinformatics 29 (23) (December 2013) 3014–3019, https://doi.org/10.1093/bioinformatics/btt528.
53. G. Cattaneo, R. Giancarlo, S. Piotto, U. Ferraro Petrillo, G. Roscigno, L. Di Biasi, MapReduce in computational biology – a synopsis, in: F. Rossi, S. Piotto, S. Concilio (Eds.), Advances in Artificial Life, Evolutionary Computation, and Systems Chemistry, WIVACE 2016, in: Communications in Computer and Information Science, vol. 708, Springer, Cham, 2017.
54. Andréa Matsunaga, Maurício Tsugawa, José Fortes, CloudBLAST: combining MapReduce and virtualization on distributed resources for bioinformatics applications, in: 2008 IEEE Fourth International Conference on eScience, Indianapolis, IN, USA, Dec. 2008, pp. 7–12.
55. Zhen Meng, Jianhui Li, Yunchun Zhou, Qi Liu, Yong Liu, Wei Cao, bCloudBLAST: an efficient MapReduce program for bioinformatics applications, in: 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI), vol. 4, Shanghai, China, 2011, pp. 2072–2076.
56. Rania Ahmed Abdel Azeem Abul Seoud, Mahmoud Ahmed Mahmoud, Amr Essam Eldin, BIG-BIO: big data Hadoop-based analytic cluster framework for bioinformatics, in: 2017 International Conference on Informatics, Health & Technology (ICIHT), Riyadh, Saudi Arabia, Feb. 2017, INSPEC Accession Number: 16836504.
57. Tadist Khawla, Mrabti Fatiha, Zahi Azeddine, Najah Said, A Blast implementation in Hadoop MapReduce using low cost commodity hardware, Procedia Computer Science 127 (2018) 69–75.
58. Guan-Jie Hua, Chuan Yi Tang, Che-Lun Hung, Yaw-Ling Lin, Cloud computing service framework for bioinformatics tools, in: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA, IEEE, Nov. 2015, pp. 9–12.
59. Lizhen Shi, Zhong Wang, Weikuan Yu, Xiandong Meng, A case study of tuning MapReduce for efficient bioinformatics in the cloud, Parallel Computing (ISSN 0167-8191) 61 (2017) 83–95, https://doi.org/10.1016/j.parco.2016.10.002.
60. L. Forer, T. Lipic, S. Schonherr, H. Weisensteiner, D. Davidovic, F. Kronenberg, E. Afgan, Delivering bioinformatics MapReduce applications in the cloud, in: Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on, 2014, pp. 373–377.
61. Nafis Neehal, Dewan Ziaul Karim, Ashraful Islam, Cloud-POA: a cloud-based map only implementation of PO-MSA on Amazon multi-node EC2 Hadoop Cluster, in: 2017 20th International Conference of Computer and Information Technology (ICCIT), 22–24 Dec. 2017, Dhaka, Bangladesh, 2017.
62. Matti Niemenmaa, Aleksi Kallio, André Schumacher, Petri Klemelä, Eija Korpelainen, Keijo Heljanko, Hadoop-BAM: directly manipulating next generation sequencing data in the cloud, Bioinformatics Applications Note 28 (6) (2012) 876–877, https://doi.org/10.1093/bioinformatics/bts054.
63. R. Guo, Y. Zhao, Q. Zou, X. Fang, S. Peng, Bioinformatics applications on Apache Spark, GigaScience 7 (8) (2018), https://doi.org/10.1093/gigascience/giy098.
64. N. Yu, B. Li, Y. Pan, A cloud-assisted application over Apache Spark for investigating epigenetic markers on DNA genome sequences, in: Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), 2016 IEEE International Conferences on, IEEE, 2016, pp. 67–74.
65. Max Klein, Rati Sharma, Chris H. Bohrer, Cameron M. Avelis, Elijah Roberts, Biospark: scalable analysis of large numerical datasets from biological simulations and experiments using Hadoop and Spark, Bioinformatics 33 (2) (January 2017) 303–305, https://doi.org/10.1093/bioinformatics/btw614.
66. Francesco Versaci, Luca Pireddu, Gianluigi Zanetti, Scalable genomics: from raw data to aligned reads on Apache YARN, in: 2016 IEEE International Conference on Big Data (Big Data), December 5–8, 2016, Washington, DC, USA, IEEE, 2016.
67. F. Versaci, L. Pireddu, G. Zanetti, Distributed stream processing for genomics pipelines, PeerJ Preprints 5 (e3338v1) (2017), https://doi.org/10.7287/peerj.preprints.3338v1.
68. B. Gavin, Analytics network – O.R & analytics, available at https://www.theorsociety.com/Pages/SpecialInterest/AnalyticsNetwork_analytics.aspx, 2013.
69. Kalyan Nagaraj, G.S. Sharvani, Amulyashree Sridhar, Emerging trend of big data analytics in bioinformatics: a literature review, International Journal of Bioinformatics Research and Applications (ISSN 1744-5485, EISSN 1744-5493) 14 (Jan. 2018) 144–205.
70. Mark Harwood, Uncoiling the data in DNA with elasticsearch, Big Data Zone, https://dzone.com/articles/uncoiling-the-data-in-dna-with-elasticsearch, June 2016.
71. Bernard Marr, 27 incredible examples of AI and machine learning in practice, Forbes (April 2018), https://www.forbes.com/sites/bernardmarr/2018/04/30/27-incredible-examples-of-ai-and-machine-learning-in-practice/#7168f1f17502.
72. Rob van der Meulen, 5 ways data science and machine learning impact business, in: Smarter with Gartner, February 2018, https://www.gartner.com/smarterwithgartner/5-ways-data-science-and-machine-learning-impact-business/.
73. H. Bhaskar, D.C. Hoyle, S. Singh, Intelligent technologies in medicine and bioinformatics, Computers in Biology and Medicine 36 (2006) 1104.
74. B.A. McKinney, D.M. Reif, M.D. Ritchie, J.H. Moore, Machine learning for detecting gene–gene interactions: a review, Applied Bioinformatics 5 (2006) 77.
75. Y. Liu, K. Gadepalli, M. Norouzi, G.E. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P.Q. Nelson, G.S. Corrado, J.D. Hipp, L. Peng, M.C. Stumpe, Detecting Cancer Metastases on Gigapixel Pathology Images, 2017.
76. Pooja Dixit, Ghanshyam I. Prajapati, Machine learning in bioinformatics: a novel approach for DNA sequencing, in: 2015 Fifth International Conference on Advanced Computing & Communication Technologies, February 21–22, 2015, Haryana, India, 2015.
77. R.S. Olson, W. La Cava, Z. Mustahsan, A. Varik, J.H. Moore, Data-driven advice for applying machine learning to bioinformatics problems, in: Pacific Symposium on Biocomputing, vol. 23, 2018, pp. 192–203.
78. Pierre Geurts, Alexandre Irrthum, Louis Wehenkel, Supervised learning with decision tree-based methods in computational and systems biology, Molecular BioSystems 5 (12) (2009) 1593, https://doi.org/10.1039/b907946g.
79. Ramón Díaz-Uriarte, Sara Alvarez de Andrés, Gene selection and classification of microarray data using random forest, BMC Bioinformatics 7 (1) (2006) 1.
80. C. Devi Arockia Vanitha, D. Devaraj, M. Venkatesulu, Gene expression data classification using support vector machine and mutual information-based gene selection, Procedia Computer Science (ISSN 1877-0509) 47 (C) (2015) 13–21.
81. M.R. Segal, K.D. Dahlquist, B.R. Conklin, Regression approaches for microarray data analysis, Journal of Computational Biology 10 (6) (2003) 961–980.
82. Pedro Larrañaga, Borja Calvo, Roberto Santana, Concha Bielza, Josu Galdiano, Iñaki Inza, José A. Lozano, Rubén Armañanzas, Guzmán Santafé, Aritz Pérez, Victor Robles, Machine learning in bioinformatics, Briefings in Bioinformatics 7 (1) (March 2006) 86–112, https://doi.org/10.1093/bib/bbk007.
83. G. Macintyre, J. Bailey, D. Gustafsson, A. Boussioutas, I. Haviv, A. Kowalczyk, Gene ontology assisted exploratory microarray clustering and its application to cancer, in: M. Chetty, A. Ngom, S. Ahmad (Eds.), Pattern Recognition in Bioinformatics, PRIB 2008, in: Lecture Notes in Computer Science, vol. 5265, Springer, Berlin, Heidelberg, 2008.
84. N. Nidheesh, K.A. Abdul Nazeer, P.M. Ameer, An enhanced deterministic K-means clustering algorithm for cancer subtype prediction from gene expression data, Computers in Biology and Medicine 91 (December 2017) 213–221, https://doi.org/10.1016/j.compbiomed.2017.10.014.
85. B.A. Rosa, S. Oh, B.L. Montgomery, J. Chen, W. Qin, Computing gene expression data with a knowledge-based gene clustering approach, International Journal of Biochemistry and Molecular Biology 1 (1) (2010) 51–68.
86. Jinze Liu, Wei Wang, Jiong Yang, A framework for ontology-driven subspace clustering, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04), ACM, New York, NY, USA, 2004, pp. 623–628.
87. M. Verleysen, D. François, The curse of dimensionality in data mining and time series prediction, in: J. Cabestany, A. Prieto, F. Sandoval (Eds.), Computational Intelligence and Bioinspired Systems, IWANN 2005, in: Lecture Notes in Computer Science, vol. 3512, Springer, Berlin, Heidelberg, 2005.
88. S. Ma, Y. Dai, Principal component analysis-based methods in bioinformatics studies, Briefings in Bioinformatics 12 (6) (Nov. 2011) 714–722, https://doi.org/10.1093/bib/bbq090, Epub January 17, 2011.
89. John Tomfohr, Jun Lu, Thomas B. Kepler, Pathway level analysis of gene expression using singular value decomposition, BMC Bioinformatics 6 (2005) 225, https://doi.org/10.1186/1471-2105-6-225.
90. Andrea Franceschini, Jianyi Lin, Christian von Mering, Lars Juhl Jensen, SVD-PHY: improved prediction of protein functional associations through singular value decomposition of phylogenetic profiles, Bioinformatics 32 (7) (April 2016) 1085–1087, https://doi.org/10.1093/bioinformatics/btv696.
91. K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2–3) (May 2000) 103–134.
92. Y. Li, C. Guan, H. Li, Z. Chin, A self-training semi-supervised SVM algorithm and its application in an EEG-based brain computer interface speller system, Pattern Recognition Letters 29 (2008) 1285–1294.
93. J. Tanha, M. van Someren, H. Afsarmanesh, Disagreement-based co-training, in: Tools with Artificial Intelligence (ICTAI), 2011, pp. 803–810.
94. K. Bennett, A. Demiriz, Semi-supervised support vector machines, in: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems (NIPS), vol. 11, MIT Press, Cambridge, 1999, pp. 368–374.
95. M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research 7 (2006) 2399–2434.
96. J. Tanha, M. Van Someren, H. Afsarmanesh, Boosting for multiclass semi-supervised learning, Pattern Recognition Letters 37 (2014) 63–77.
97. Andre S. Yoon, Taehoon Lee, Yongsub Lim, Deokwoo Jung, Philgyun Kang, Dongwon Kim, Keuntae Park, Yongjin Choi, Semi-supervised learning with deep generative models for asset failure prediction, in: KDD17 Workshop on Machine Learning for Prognostics and Health Management, August 13–17, 2017, Halifax, Nova Scotia, Canada, Sept. 2017.
98. Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling, Semi-supervised learning with deep generative models, in: Proceedings of Neural Information Processing Systems (NIPS), Oct. 2014.
99. Mingguang Shi, Bing Zhang, Semi-supervised learning improves gene expression-based prediction of cancer recurrence, Bioinformatics 27 (21) (November 2011) 3017–3023, https://doi.org/10.1093/bioinformatics/btr502.
100. Thanh Phuong Nguyen, Tu Bao Ho, A semi-supervised learning approach to disease gene prediction, in: 2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007), IEEE, November 2007.
101. Brian R. King, Chittibabu Guda, Semi-supervised learning for classification of protein sequence data, Scientific Programming 16 (1) (January 2008) 5–29, https://doi.org/10.1155/2008/795010.
102. Thomas Provoost, Marie-Francine Moens, Semi-supervised learning for the BioNLP gene regulation network, BMC Bioinformatics 16 (Suppl 10) (2015) S4, https://doi.org/10.1186/1471-2105-16-S10-S4.
103. M. Ceci, G. Pio, V. Kuzmanovski, S. Džeroski, Semi-supervised multi-view learning for gene network reconstruction, in: Thomas Wennekers (Ed.), PLoS ONE 10 (12) (December 2015) e0144031, https://doi.org/10.1371/journal.pone.0144031.
104. Maria-Iuliana Bocicor, Gabriela Czibula, Istvan-Gergely Czibula, A reinforcement learning approach for solving the fragment assembly problem, in: 2011 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, September 26–29, 2011, Timisoara, Romania, 2011.
105. Gabriela Czibula, Maria-Iuliana Bocicor, Istvan-Gergely Czibula, A reinforcement learning model for solving the folding problem, International Journal of Computer Applications in Technology (ISSN 2229-6093) (2017) 171–182.
106. Parastou Kohvaei, Reinforcement Learning Techniques in RNA Inverse Folding, Master's Thesis, Albert-Ludwigs-Universität Freiburg, Aug. 2015.
107. Berat Doğan, Tamer Ölmez, A novel state space representation for the solution of 2D-HP protein folding problem using reinforcement learning methods, Applied Soft Computing (ISSN 1568-4946) 26 (Jan. 2015) 213–223.
108. Qingchen Zhang, Laurence T. Yang, Zhikui Chen, Peng Li, A survey on deep learning for big data, Information Fusion (ISSN 1566-2535) 42 (2018) 146–157.
109. K. Chen, L.A. Kurgan, Neural networks in bioinformatics, in: G. Rozenberg, T. Bäck, J.N. Kok (Eds.), Handbook of Natural Computing, Springer, Berlin, Heidelberg, 2012, pp. 566–583.
110. M. Zaharia, M. Chowdhury, T. Das, A. Dave, Fast and interactive analytics over Hadoop data with Spark, USENIX Login 37 (4) (2012) 45–51.
111. Sara Landset, Taghi M. Khoshgoftaar, Aaron N. Richter, Tawfiq Hasanin, A survey of open source tools for machine learning with big data in the Hadoop ecosystem, Journal of Big Data 2 (Dec. 2015) 24, https://doi.org/10.1186/s40537-015-0032-1.
112. Michal Malohlava, Nidhi Mehta, Machine learning with sparkling water: H2O + Spark, in: Michal Malohlava, Nidhi Mehta, Brandon Hill, Vinod Iyengar (Eds.), Machine Learning with Sparkling Water: H2O + Spark, H2O.ai, Inc., Feb. 2016, https://h2o-release.s3.amazonaws.com/h2o/rel-tukey/2/docs-website/h2o-docs/booklets/SparklingWaterVignette.pdf.
113. Bilal Jan, Haleem Farman, Murad Khan, Muhammad Imran, Ihtesham Ul Islam, Awais Ahmad, Shaukat Ali, Gwanggil Jeon, Deep learning in big data analytics: a comparative study, Computers & Electrical Engineering (ISSN 0045-7906) (2017).
114. Jason Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Zhang, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, Zhongyuan Wu, Yang Wang, Yuhao Yang, Bowen She, Dongjie Shi, Qi Lu, Kai Huang, Guoqiong Song, BigDL: a distributed deep learning framework for big data, arXiv:1804.05839, 2018.
115. TensorFlow, An open source machine learning framework for everyone, https://www.tensorflow.org/.
116. Ladislav Rampasek, Anna Goldenberg, TensorFlow: biology's gateway to deep learning? Cell Systems 2 (January 2016), https://doi.org/10.1016/j.cels.2016.01.009.
117. Srishti Grover, Saloni Bhartia, Akshama, Abhilasha Yadav, K.R. Seeja, Predicting severity of Parkinson's disease using deep learning, in: International Conference on Computational Intelligence and Data Science (special issue), Procedia Computer Science 132 (2018) 1788–1794, https://doi.org/10.1016/j.procs.2018.05.154.
118. Quan Do, Tran Cao Son, Jamil Chaudri, Classification of asthma severity and medication using TensorFlow and multilevel databases, Procedia Computer Science (ISSN 1877-0509) 113 (2017) 344–351, https://doi.org/10.1016/j.procs.2017.08.343.
119. Caffe, Deep learning framework by BAIR, http://caffe.berkeleyvision.org/.
120. Hui Wu, Matrix Yao, Albert Hu, Gaofeng Sun, Xiao Kun, Yu Jian Tang, A systematic analysis for state-of-the-art 3D lung nodule proposals generation, in: Recent Advancement in Information and Communication Technology: Proceedings of the International Conference of Information and Communication Technology – 2018 (special issue), Procedia Computer Science 131 (2018) 302–310.
CHAPTER 3
Image Fusion Through Deep Convolutional Neural Network
3.1 INTRODUCTION
To impart exhaustive medical information to clinicians for effective diagnosis and treatment, the fusion of images from different modalities has long been a concern in medical image analysis. Multi-modal medical images can be categorized into functional and anatomical imaging. Positron emission tomography (PET) and single photon emission computed tomography (SPECT) are the two types of functional imaging modality; they render metabolic information without any anatomical context. Anatomical imaging modalities, such as sonography, computed tomography (CT), and magnetic resonance imaging (MRI), represent the morphologic details of the human body. In multi-modal medical image fusion, a new fused image is formed by combining complementary features of functional and anatomical modalities. Fusing MRI and CT images can deliver bone information together with the normal and pathological soft tissue information [1]. The fusion of an anatomical image with a functional image, such as PET with MRI or PET with CT, is preferred for localization in radiation therapy treatment planning and for tumor segmentation in oncology [2]. Before consolidating the information obtained from different modalities, proper alignment of the input images is paramount; this process is referred to as image registration. In the registration process, each pixel location in a reference image is mapped to a new location in the image that is to be registered. The optimality criterion of the mapping depends on the anatomy of the two input images that need to be matched. A concise summary of different medical imaging modalities is given in Table 3.1 [3].

Further, to serve the vital objective of image fusion, a few requirements should be considered: essential information present in any of the source images should not be discarded, artifacts and incompatibilities have to be eliminated, and the final fused image must be robust and carry authentic information. However, the hindrances for research in image fusion are generally image artifacts, the rendering of the essential features of each modality, and the dissimilarity between modalities, since their data formations are neither similar nor statistically correlated. To enhance performance in real-time applications, image fusion for guiding diagnosis and disease prognosis [4] has been addressed to assist doctors in making decisions, given the limits of human interpretation of medical images.

To overcome the above difficulties in image fusion, deep learning (DL) based methods, and in recent years convolutional neural network (CNN) based techniques, have been widely employed in the study of natural image and video super-resolution [5]. The strength of deep learning approaches stems from their capability to describe complex relationships among various signals. In [6], a CNN-based method was presented to measure the local similarity between two given source image patches, and the results revealed the advantages of CNN-based approaches over conventional image fusion techniques. Hence, the direction of designing effective fusion methods with CNNs seemed propitious.

Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00010-5
Copyright © 2019 Elsevier Inc. All rights reserved.

TABLE 3.1
Multi-modal imaging modalities.
1. CT SCAN
Principle: Image data are obtained using X-ray equipment from different angles around the human body.
Outcomes: Cross-sections of all types of body tissue and organs.
Usage: Best suited for studying the chest and abdomen.
2. MRI SCAN
Principle: Uses magnetic fields and radio waves to get a detailed picture of the inside of the body, and hence is free from X-rays and damaging forms of radiation.
Outcomes: T1-weighted images render excellent anatomic detail, and T2-weighted images provide excellent contrast between normal and abnormal tissues.
Usage: Best suited for finding brain tumors and for examining the spinal cord.
3. SPECT SCAN
Principle: Works on the principle of nuclear medicine; used in both diagnostic and therapeutic procedures.
Outcomes: Delivers details about blood flow, temperature of the body, etc.
Usage: Diagnosis of the kidneys, heart, brain, etc., and also detection of tumors.

3.2 IMAGE FUSION
The motivation of medical image fusion from different modalities is to obtain a high quality image by intelligently combining the essential information collected from the multi-modal input images [4]. In general, a classification of image fusion, as sketched in Fig. 3.1, describes three basic levels: feature, pixel, and decision [7,8]. Pixel based fusion is the simplest image fusion method, where fusion is performed at the pixel level to merge the physical parameters. The limitation of pixel based methods is the effect of large pixel intensity variations on the resultant image. At the decision level, each input is processed separately and its information is extracted; then, based on decision rules, the extracted features are combined. Thus, in a high-level fusion scheme, considerable precision in decision making is achieved with the support of feature based decision analysis [9]. Hence, fusion at the highest level provides the groundwork for controls where the extracted features are the inputs and, in order to meet specific decisions, the extracted features are classified using a cluster of classifiers.

…image as shown in Fig. 3.2. Image registration involves four processes as depicted in Fig. 3.3.
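The pixel-level fusion rule described in Sect. 3.2 can be sketched in a few lines; the following is an illustrative NumPy sketch, not the method of any cited work (the weighted-average rule, the max rule, and the 0.5/0.5 weights are assumptions for demonstration):

```python
import numpy as np

def fuse_pixel_level(img_a, img_b, w=0.5):
    """Weighted-average pixel-level fusion of two co-registered images."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    return w * a + (1.0 - w) * b

def fuse_max_rule(img_a, img_b):
    """Choose, per pixel, the source with the larger intensity."""
    return np.maximum(img_a, img_b)
```

Such rules assume the two inputs are already registered; large intensity differences between modalities directly distort the result, which is exactly the pixel-level limitation noted above.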
3.3.1.4 Resampling
The transformation of the floating image can be done in a forward or a backward manner. Using the estimated mapping functions, every pixel of the sensed image can be transformed directly.
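The backward variant can be sketched as follows; this is an assumed illustration (the chapter gives no code): each output pixel is mapped back through an inverse transform and sampled from the floating image, here with nearest-neighbor interpolation for brevity.

```python
import numpy as np

def backward_resample(floating, inv_map, out_shape):
    """Backward resampling: for each output pixel (r, c), look up the
    source coordinate inv_map(r, c) in the floating image and sample it
    (nearest neighbor). Out-of-bounds pixels are filled with 0."""
    out = np.zeros(out_shape, dtype=floating.dtype)
    rows, cols = floating.shape
    for r in range(out_shape[0]):
        for c in range(out_shape[1]):
            sr, sc = inv_map(r, c)
            ir, ic = int(round(sr)), int(round(sc))
            if 0 <= ir < rows and 0 <= ic < cols:
                out[r, c] = floating[ir, ic]
    return out

# Example inverse map: undoing a translation of (+1, +2) pixels.
inv_map = lambda r, c: (r + 1, c + 2)
```

Backward mapping is generally preferred over forward mapping because every output pixel receives exactly one value, avoiding holes in the resampled image.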
TABLE 3.2
Comparison of feature detectors [15].
S.NO Detectors Year Performance Computation time
1 Moravec Corner 1980 Poor Poor
2 Harris Corner 1988 Good Fair
3 SUSAN 1997 Better Better
4 Harris-Affine 2004 Best Better
5 SIFT 2004 Best Better
6 SURF 2006 Good Excellent
7 FAST 2006 Good Good
8 BRISK 2011 Good Good
9 ORB 2011 Good Excellent
3.3.2.1 SURF Based Registration
The SURF feature descriptor was proposed by H. Bay [11] and works on the principle of Gaussian scale space analysis. SURF is preferred over SIFT because of its fast computation time, which enables various real-time applications such as object tracking, image fusion, object detection and image mosaicing [14]. The SURF descriptor exploits integral images [16] to speed up feature detection [17]. The implementation of the SURF algorithm includes the following fundamental steps:
• Selecting salient features such as blobs, edges, intersections at particular regions and corners in the integral image. SURF uses the fast-Hessian detector for detecting feature points.
• Using a descriptor to portray the surrounding neighborhood of each feature point. This feature vector must be unique and, at the same time, resilient to errors, geometric deformations and noise.
• Assigning an orientation to each keypoint descriptor by calculating the Haar wavelet responses along the coordinates of the image.
• Finally, matching SURF features by adopting the nearest neighbor approach.

…feature matching results, and their computation time is less compared to SURF.

3.3.2.3 Implementation
As an example, image registration based on SURF and on BRISK is implemented in MATLAB 2017, and the results are sketched in Figs. 3.6 and 3.8. Here MRI T1 and T2 are taken as input images, as shown in Fig. 3.4, where T1 is considered the reference image and T2 the floating image. The transformation is applied to the T2 image with reference to T1. In both cases, the algorithm employed to remove outliers is random sample consensus (RANSAC). The matched inliers and outliers of both algorithms are given in Figs. 3.5 and 3.7. A detailed study of the SURF and BRISK feature detectors can be found in [19], and SURF based medical image registration in [20].
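The RANSAC outlier-removal step mentioned above can be sketched as follows. This is a minimal NumPy illustration that estimates a 2D translation between matched keypoints; the point sets, the 2-pixel threshold and the iteration count are assumptions for demonstration, not values from [19,20]:

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, thresh=2.0, seed=0):
    """Estimate a 2D translation dst ~ src + t from point matches,
    rejecting outliers RANSAC-style: repeatedly fit t from one random
    match and keep the model with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                      # model from a minimal sample
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers for the final estimate
    t_hat = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t_hat, best_inliers
```

Real registration pipelines fit richer models (affine or projective) from minimal samples of 3 or 4 matches, but the sample-score-refit loop is the same.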
3.4 EXISTING IMAGE FUSION METHODS – OVERVIEW
The dimensionality reduction methods most frequently employed comprise principal component analysis (PCA), independent component analysis (ICA), intensity–hue–saturation (IHS), and multi-resolution based analysis. When hybrid methods are implemented by combining two of the above mentioned methods, the result is good spatial resolution without color distortion. In [21], pixel level fusion activities are organized into four families: model-based algorithms, component substitution (CS), multi-resolution analysis (MRA), and hybrid methods combining CS and MRA. Indeed, the problem of image registration can be addressed by MRA methods, including contourlet, ridgelet, curvelet and shearlet learning-based approaches. The drawback of pixel level methods is spatial distortion in the fused image caused by spatial correlation between image pixels. In our opinion, low-level methods like pixel level fusion are computationally efficient and simple to execute with high rendition of authentic information, yet they are prone to image misalignment and noise.

The various algorithms based on the pixel and feature levels are compared in [22]. Enhancement of the fused image with high spatial information is achieved by performing fusion at the feature level. The first step at the feature level is the extraction of objects; then features with similar attributes are fused together in order to improve classification performance. Finally, at the interpretation level, after preliminary classification of each data source, the features are converged. Other prominent approaches for feature level medical image fusion are summarized in [8], including neural networks, fuzzy logic, multi-scale image transformations, and classifiers such as support vector machines (SVM). Image fusion based on fuzzy logic can be implemented either as a decision level function or as a feature level function. Most of the feature processing methods, like neural networks and SVM, exploit wavelets as a fusion function. Such approaches include neuro-fuzzy wavelet, wavelet-SVM, wavelet-edge feature, etc.

Conventional multi-resolution analysis (MRA) based methods typically perform the following procedure: they begin with a decomposition, then apply a fusion rule, and finally proceed to reconstruction. Activity level measurement, grouping of coefficients, coefficient combination and consistency verification are the major components exploited in a fusion rule. Decision based image fusion methods are mainly supported by machine learning, neural networks and fuzzy logic [23]. On the decision level, the weight map is constructed by measuring the activity level of each wavelet coefficient and, based on the resultant weight map, the fusion rule is framed. Finally, based on the constructed fusion rule, the images are fused.

Multi-scale transforms (MST), like contourlets, wavelets and curvelets, have been considered unfit for locating directional features. Hence, the use of shearlets in image fusion enabled researchers to observe anisotropic features and capture sharp transitions at different scales and orientations [24]. Edge representations are more evident in shearlet based approaches than in traditional MST methods, but shift invariance is not provided by shearlets because of the subsampling process. The challenges involved in shearlets were surmounted by the non-subsampled shearlet transform (NSST) [25]. With reference to the above discussion, the challenges that endure in traditional image fusion research can be outlined as follows:
• Difficulty in formulating image transforms and fusion strategies that achieve state-of-the-art results.
• Inadequacy of image fusion methods for successful image representation.
• Lack of standard fusion metrics for result evaluation.
The above mentioned difficulties of traditional medical image fusion techniques can be avoided by implementing the fusion mechanism with deep learning.

3.5 DEEP LEARNING
Machine learning techniques, a subject of artificial intelligence, have revolutionized the computer vision research field as a determinant factor in upgrading performance. They have supported image processing for decades, and several specialized imaging areas, like content-based image retrieval, image segmentation, face recognition, and multi-modality image fusion, have been studied. Progressively, these image processing applications have found a method in deep learning (DL), a machine learning topic that builds insight into data by separating multiple stages of representation. By extracting high-level features from low-level features, DL forms a hierarchical description [26]. DL architectures can be shaped by the following four networks: convolutional neural networks (CNNs), sparse coding, restricted Boltzmann machines (RBMs), and auto-encoders [27]. Among these architectures, CNNs have achieved good results in image fusion.
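The conventional MRA procedure of Sect. 3.4 (decompose, apply a fusion rule, reconstruct) can be sketched with a single-level low/high frequency split. The following NumPy sketch is illustrative only: the box-blur decomposition, the averaging rule for low frequencies and the max-absolute-value rule for high frequencies are assumptions, not a method from the cited works.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple box filter used as a crude low-pass decomposition."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dr in range(k):
        for dc in range(k):
            out += padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (k * k)

def mra_fuse(img_a, img_b):
    """One-level decompose -> fuse -> reconstruct.
    Low frequencies are averaged; for high frequencies, the coefficient
    with the larger absolute value (higher activity) wins."""
    low_a, low_b = box_blur(img_a), box_blur(img_b)
    high_a, high_b = img_a - low_a, img_b - low_b
    fused_low = 0.5 * (low_a + low_b)
    fused_high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    return fused_low + fused_high
```

Real MRA schemes replace the box blur with a wavelet, contourlet or shearlet decomposition over several scales, but the rule structure (activity-driven selection of detail coefficients, smoothing of approximation coefficients) is the same.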
• The DSCNN is constructed by stacking multiple well-trained basic units, and the parameters of the complete network are fine-tuned end-to-end.
• Images S and R that are ready for fusion are decomposed into their corresponding high and low frequency images, respectively, through the same DSCNN.
• The high and low frequency images of S and R are fused using appropriate fusion rules to obtain the fused high and low frequency images.
• The results obtained by the above steps are put back into the DSCNN in order to reconstruct the final fused image.

3.7.2 CNN Based Similarity Learning
In [4], CNN based image fusion through transfer learning is performed in the shearlet domain. Measuring the similarity between representations is one of the fundamental visual understanding methods in both low- and high-level vision. Since the CNN is fully convolutional, convolution is taken as the similarity metric between the feature maps F_PS detected for a given patch PS and the feature maps F_PR related to the second patch PR:

Xcorr(F_PS, F_PR) = F_PS ∗ F_PR. (3.1)

The advantage of using knowledge transfer from the spatial to the frequency domain is that it relies on fully converged pre-trained models and does not require random initialization of weights. This reduces computation time in transfer learning and fine tuning.

Let S and R represent unregistered CT and MRI source images of size m × n capturing the same anatomical structure. The fusion framework in Fig. 3.12 starts by decomposing the source images into approximation and detailed coefficients with the non-subsampled shearlet transform (NSST). In the shearlet domain, fusion rules are framed in such a way that the algorithm decides which extracted coefficients should be considered for fusion. By fine-tuning the extraction section of the CNN, the high-frequency subnets correspond to the fusion of the extracted feature maps, which are consolidated. To detect which of the correlation coefficients has a high impact on the fused subbands, normalized cross-correlations are performed between the resultant shearlet coefficient maps. Low-frequency coefficients are fused based on the calculation of local energy. To obtain the final fused image, the inverse NSST is applied to the fused coefficients.

3.7.3 CNN Based Fusion Using Pyramidal Decomposition
The CNN architecture used in [34] is a Siamese network, in which the weights of the two branches are constrained to be the same. Each branch of the CNN contains three layers for performing convolution and one for max-pooling, which resembles the architecture used in [35]. To reduce memory consumption and increase computational efficiency, the fully connected layer has been eliminated from the defined architecture. The 512 feature maps are directly connected to a 2D vector, which is fed as input to a 2-class softmax layer that provides a probability distribution over two defined classes. The two classes represent different normalized weight assignment results, namely "first patch 1 and second patch 0" and "first patch 0 and second patch 1", respectively. The probability of each class indicates the likelihood of each weight assignment. Since the sum of the two output probabilities is 1, the probability of each class directly gives the weight assigned to its corresponding input patch.

FIG. 3.13 CNN based image fusion using Laplacian pyramid method.

High-quality image patches and their blurred versions are taken as training data for the Siamese network in Fig. 3.13, which is trained using the method specified in [35]. During training, the spatial size of the input patch is set to 16 × 16 according to the analysis. The training examples are created by multi-scale Gaussian filtering and random sampling. The softmax loss function is taken as the optimization objective and minimized with the stochastic gradient descent (SGD) algorithm. Training is performed in the deep learning framework Caffe [36]. Since the Siamese network has a fully-connected layer with pre-defined input and output dimensions, the input to the network must have a fixed size.

In medical image fusion, inconsistency in the size of the input images can be managed by separating the images into overlapping patches and feeding each patch pair to the DL architecture, but this causes a large number of repeated calculations. To rectify this problem, the fully-connected layer is converted to an equivalent convolutional layer containing two kernels of size 8 × 8 × 512 [37]. Once the conversion is done, the network can process the input images as a whole to generate a dense decision map, in which each prediction (a 2D vector) contains the relative clarity information of the original patch pair at its respective position. The result can be simplified to the weight of the first (or second) source, because there are only two dimensions in each prediction and their sum is normalized to 1. Finally, a weight map of the same size as the input images is acquired by assigning the value as the weight of all the pixels within the patch location and averaging the overlapped pixels.

3.8 EVALUATION METRICS
Fusion quality metrics are employed in various CNN based image fusion works in order to evaluate their efficiency [4].
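The conversion of a fully-connected layer into an equivalent convolutional layer, used in Sect. 3.7.3 to obtain a dense decision map, can be illustrated in NumPy. The tiny layer sizes below are assumptions for demonstration, not the 8 × 8 × 512 kernels of [37]:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy fully-connected layer: flattens a 2 x 4 x 4 patch to 2 scores.
C, H, W, n_out = 2, 4, 4, 2
fc_weights = rng.normal(size=(n_out, C * H * W))

def fc_forward(patch):
    """Fully-connected layer applied to one fixed-size patch."""
    return fc_weights @ patch.ravel()

# The same weights reshaped as n_out convolution kernels of size C x H x W.
kernels = fc_weights.reshape(n_out, C, H, W)

def conv_forward(feature_map):
    """Valid cross-correlation of each kernel over the whole feature map,
    producing a dense map of scores (one 2-vector per location)."""
    _, rows, cols = feature_map.shape
    out = np.zeros((n_out, rows - H + 1, cols - W + 1))
    for k in range(n_out):
        for r in range(rows - H + 1):
            for c in range(cols - W + 1):
                out[k, r, c] = np.sum(kernels[k] * feature_map[:, r:r+H, c:c+W])
    return out

def softmax2(v):
    """2-class softmax: each prediction becomes a pair of fusion weights
    summing to 1, as described for the decision map."""
    e = np.exp(v - v.max())
    return e / e.sum()
```

Because the reshaped kernels reuse the FC weights exactly, the score at each location of the dense map equals what the FC layer would produce on the corresponding patch, but without re-running overlapping patches.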
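Some of the metrics tabulated below can be computed as in the following NumPy sketch, which uses common textbook definitions of entropy (S), standard deviation (σ) and spatial frequency (SF); the exact formulations used in [4,33,34] may differ, so treat these as illustrative:

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy S of the gray-level histogram, in bits."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def std_dev(img):
    """Standard deviation (sigma): spread of intensities around the mean."""
    return float(np.std(img))

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2), from row-wise and column-wise
    first differences of the image."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))
```

Higher entropy, standard deviation and spatial frequency all indicate a fused image that retains more information, contrast and detail, which is how the tabulated values are read.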
TABLE 3.3
Result analysis of similarity based image fusion [4].
Metrics Values
S 5.5572
σ 80.4659
MI 3.4492
SF 8.8984
IQI 0.7104
SSIM 0.8123
QCB 0.4505

FIG. 3.14 Analysis of similarity based image fusion [4].

The architecture produces good values for SSIM, IQI and SF, but entropy is lacking and the time cost is high. A graphical representation is also given in Fig. 3.14. The objective assessment of the method proposed in [34] is listed in Table 3.4 and the corresponding graphical representation is in Fig. 3.15.

TABLE 3.4
Result analysis of CNN architecture using pyramidal decomposition [34].
Metrics MRI and CT
S 6.1741
FSIM 0.8872
QG 0.6309
QE 0.6547
T 12.1 s

FIG. 3.15 Comparative analysis of DCNN using Laplacian pyramidal decomposition on different datasets [34].

Table 3.5 shows the performance metrics of the DSCNN based image fusion method. The datasets from different modalities, such as CT+PET and MRI+CT, are investigated in [33]. The computation time for fusing CT and PET is longer than that for fusing MRI and CT; see Fig. 3.16.

TABLE 3.5
Result analysis of DSCNN architecture [33].
Metrics | CT and MRI | CT and PET (abdomen) | CT and MRI (brain)
S 7.622 6.188 7.16
σ 75.422 21.386 45.907
C 33.973 64.426 11.503
MI 6.344 3.464 5.147
AG 6.545 3.395 7.112
SF 11.354 8.031 13.640
IQI 0.632 0.832 0.611
T 8.763 s 11.046 s 3.115 s

FIG. 3.16 Comparative analysis of DSCNN on different datasets [33].
TABLE 3.6
Result analysis of CNN architecture using pyramidal decomposition [42].
Metrics Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5
AG 0.0844 0.0822 0.886 0.0892 0.0835
σ 0.7782 0.8031 0.954 0.8023 0.7541
QRS/F 0.8662 0.6524 0.7172 0.6012 0.7327
MI 0.8753 1.7621 1.516 1.5721 1.374
T 2.094 s 2.014 s 2.051 s 3.021 s 2.016 s
TABLE 3.7
Result analysis of CNN architecture using pyramidal decomposition [42].
Metrics Dataset 6 Dataset 7 Dataset 8 Dataset 9
AG 0.0837 0.089 0.0874 0.0893
σ 0.7951 0.8023 0.7231 0.998
QRS/F 0.6251 0.5872 0.7517 0.7832
MI 1.6821 1.2451 0.8026 1.4351
T 1.962 s 2.127 s 2.172 s 2.274 s
FIG. 3.17 Comparative analysis of DCNN using pyramidal decomposition on different datasets [42].
The evaluation metrics of the Laplacian pyramidal decomposition based CNN for different datasets are given in Tables 3.6 and 3.7. The experimented datasets are from CT, MRI and PET. The computation time is very small, which is evident from Fig. 3.17.

The results indicate that the similarity based image fusion method has a higher σ value than DSCNN, but a comparatively lower one than the DCNN based on pyramidal decomposition. In terms of entropy, similarity learning has a smaller value than the other two methods. The AG of DSCNN shows better results, and the mutual information between source images is also well defined. The value of IQI is high for similarity based learning but lower for DSCNN. Comparing the computation time of each network, the method proposed in [42] requires the least processing time. Regarding spatial frequency, the DSCNN method shows improvement. Overall, DSCNN based image fusion and the similarity based method seem to perform better on medical image fusion than the CNN using pyramidal decomposition.
3.10 ISSUES IN EXISTING CNN BASED IMAGE FUSION METHODS
The results in all the tables indicate that the advantages of CNNs in the field of medical image fusion have been widely verified. Specifically, supervised learning of a CNN based classification-type network has great potential for different image fusion challenges. Manual design of accurate fusion rules and complicated activity level measurements can be avoided by generating the weight map directly from the input images with a CNN. In particular, each output neuron of the DCNN can represent a normalized weight assignment result, in which the sum of the weights equals 1, and it represents the corresponding probability. Correspondingly, the probability distribution of weight assignment is defined by the output vector of the network, and the weight assignment to be measured is based on its mathematical expectation. Therefore, image fusion can also be viewed as a two-class classification problem. The approach can also be applied to various multi-modal image fusion issues, because a generalized multi-class output provides greater flexibility. Moreover, besides the output weight map, the intermediate feature maps formed by the hidden layers also include clarity or authenticity information of the source images, which is relevant to a variety of image fusion issues [5].

To achieve high fusion performance, traditional image fusion approaches, including MRA, consistency verification, activity level measurement, etc., existing in conventional image fusion techniques must be taken into consideration. On the other hand, for a certain specific image fusion issue, the conventional approaches should be included to frame the whole fusion scheme together with the CNN-based approach. Meanwhile, mutuality between the fused data remains cumbersome, which can be rectified by selecting similarity based learning in the CNN. The input manner of the DCNN architecture is also not restricted to the Siamese architecture mentioned in [45]; the advantages of other models, like pseudo-Siamese and 2-channel networks, also deserve to be explored with DCNNs [5]. The critical point of such methods lies in creating an effective large training dataset. One possible solution is to use the method mentioned in [45] based on multi-scale Gaussian filtering, but further investigation is essential to devise more suitable algorithms for complex CNN models.

Moreover, there is no ground-truth fused image in most image fusion schemes. A major issue in this category is the creation of training datasets. Fortunately, for a few image fusion problems, such as multi-exposure and multi-focus images, in which the source images are captured by the same imaging modality, there exists a way to generate ground truth images for DCNN training artificially. The CNN-based methods can resolve the challenge of manually designing complicated ghost-removal algorithms in conventional methods by preventing motion/ghosting artifacts via a deep learning network, and are more likely to achieve better performance.

3.11 CONCLUSIONS
Despite the aforementioned superiority, the study of DCNN based medical image fusion is still at an initial stage, and there is great space for further improvement of DCNNs in the field of image fusion. In this chapter, some prospects have been put forward for the study of deep learning models for image fusion. On the other hand, designing objective fusion metrics based on a DL framework still requires more attention.

REFERENCES
1. R.C. Krempien, S. Daeuber, F.W. Hensley, M. Wannenmacher, W. Harms, Image fusion of CT and MRI data enables improved target volume definition in 3D-brachytherapy treatment planning, Brachytherapy 2 (3) (2003) 164–171.
2. A.C. Paulino, W.L. Thorstad, T. Fox, Role of fusion in radiotherapy treatment planning, in: Seminars in Nuclear Medicine, vol. 33, Elsevier, 2003, pp. 238–243.
3. P. Shajan, N. Muniraj, J.T. Abraham, 3D/4D image registration and fusion techniques: a survey.
4. H. Hermessi, O. Mourali, E. Zagrouba, Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain, Neural Computing & Applications 30 (7) (2018) 2029–2045.
5. Y. Liu, X. Chen, Z. Wang, Z.J. Wang, R.K. Ward, X. Wang, Deep learning for pixel-level image fusion: recent advances and future prospects, Information Fusion 42 (2018) 158–173.
6. S. Zagoruyko, N. Komodakis, Learning to compare image patches via convolutional neural networks, in: Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, IEEE, 2015, pp. 4353–4361.
7. S. Li, X. Kang, L. Fang, J. Hu, H. Yin, Pixel-level image fusion: a survey of the state-of-the-art, Information Fusion 33 (2017) 100–112.
8. A.P. James, B.V. Dasarathy, Medical image fusion: a survey of the state-of-the-art, Information Fusion 19 (2014) 4–19.
9. D. Wu, A. Yang, L. Zhu, C. Zhang, Survey of multi-sensor image fusion, in: International Conference on Life System Modeling and Simulation and International Conference on Intelligent Computing for Sustainable Energy and Environment, Springer, 2014, pp. 358–367.
10. D.G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
11. H. Bay, T. Tuytelaars, L. Van Gool, Surf: speeded up robust features, in: European Conference on Computer Vision, Springer, 2006, pp. 404–417.
12. C. Harris, M. Stephens, A combined corner and edge detector, in: Alvey Vision Conference, vol. 15, Citeseer, 1988, pp. 147–151.
13. S. Leutenegger, M. Chli, R.Y. Siegwart, Brisk: binary robust invariant scalable keypoints, in: Computer Vision (ICCV), 2011 IEEE International Conference on, IEEE, 2011, pp. 2548–2555.
14. P. Ghosh, A. Pandey, U.C. Pati, Comparison of different feature detection techniques for image mosaicing, ACCENTS Transactions on Image Processing and Computer Vision 1 (1) (2015) 1–7.
15. R.M. Kumar, K. Sreekumar, A survey on image feature descriptors, International Journal of Computer Science & Information Technologies 5 (2014) 7668–7673.
16. P. Viola, M.J. Jones, D. Snow, Detecting pedestrians using patterns of motion and appearance, International Journal of Computer Vision 63 (2) (2005) 153–161.
17. S.A.K. Tareen, Z. Saleem, A comparative analysis of Sift, Surf, Kaze, Akaze, Orb, and Brisk, in: Computing, Mathematics and Engineering Technologies (iCoMET), 2018 International Conference on, IEEE, 2018, pp. 1–10.
18. E. Mair, G.D. Hager, D. Burschka, M. Suppa, G. Hirzinger, Adaptive and generic corner detection based on the accelerated segment test, in: European Conference on Computer Vision, Springer, 2010, pp. 183–196.
19. K. Sharma, A. Goyal, Classification based survey of image registration methods, in: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), IEEE, 2013, pp. 1–7.
20. S. Sergeev, Y. Zhao, M.G. Linguraru, K. Okada, Medical image registration using machine learning-based interest point detector, in: Medical Imaging 2012: Image Processing, vol. 8314, International Society for Optics and Photonics, 2012, p. 831424.
26. Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual understanding: a review, Neurocomputing 187 (2016) 27–48.
27. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F.E. Alsaadi, A survey of deep neural network architectures and their applications, Neurocomputing 234 (2017) 11–26.
28. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
29. G. Kutyniok, D. Labate, Shearlets: Multiscale Analysis for Multivariate Data, Springer Science & Business Media, 2012.
30. H. Greenspan, B. van Ginneken, R.M. Summers, Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique, IEEE Transactions on Medical Imaging 35 (5) (2016) 1153–1159.
31. N. Tajbakhsh, J.Y. Shin, S.R. Gurudu, R.T. Hurst, C.B. Kendall, M.B. Gotway, J. Liang, Convolutional neural networks for medical image analysis: full training or fine tuning?, IEEE Transactions on Medical Imaging 35 (5) (2016) 1299–1312.
32. H.-C. Shin, H.R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, R.M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Transactions on Medical Imaging 35 (5) (2016) 1285–1298.
33. Kai-jian Xia, Hong-sheng Yin, Jiang-qiang Wang, A novel improved deep convolutional neural network model for medical image fusion, Cluster Computing (2018) 1–13.
34. Y. Liu, X. Chen, J. Cheng, H. Peng, A medical image fusion method based on convolutional neural networks, in: Information Fusion (Fusion), 2017 20th International Conference on, IEEE, 2017, pp. 1–7.
35. Y. Liu, X. Chen, H. Peng, Z. Wang, Multi-focus image fusion with a deep convolutional neural network, Information Fusion 36 (2017) 191–207.
36. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: convolutional architecture for fast feature embedding, in: Proceedings of
21. H. Ghassemian, A review of remote sensing image fusion the 22nd ACM International Conference on Multimedia,
methods, Information Fusion 32 (2016) 75–89. ACM, 2014, pp. 675–678.
22. D.E. Nirmala, V. Vaidehi, Comparison of pixel-level and 37. P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y.
feature level image fusion methods, in: Computing for Sus- LeCun, Overfeat: integrated recognition, localization and
tainable Global Development (INDIACom), 2015 2nd In- detection using convolutional networks, arXiv preprint,
ternational Conference on, IEEE, 2015, pp. 743–748. arXiv:1312.6229.
23. J. Du, W. Li, K. Lu, B. Xiao, An overview of multi-modal 38. P. Jagalingam, A.V. Hegde, A review of quality metrics for
medical image fusion, Neurocomputing 215 (2016) 3–20. fused image, Aquatic Procedia 4 (2015) 133–142.
24. G. Easley, D. Labate, W.-Q. Lim, Sparse directional im- 39. S. Singh, D. Gupta, R. Anand, V. Kumar, Nonsubsampled
age representations using the discrete shearlet transform, shearlet based ct and mr medical image fusion using bio-
Applied and Computational Harmonic Analysis 25 (1) logically inspired spiking neural network, Biomedical Sig-
(2008) 25–46. nal Processing and Control 18 (2015) 91–101.
25. H. Hermessi, O. Mourali, E. Zagrouba, Multimodal image 40. C. Xydeas, V. Petrovic, Objective image fusion performance
fusion based on non-subsampled shearlet transform and measure, Electronics Letters 36 (4) (2000) 308–309.
neuro-fuzzy, in: International Workshop on Representa- 41. S. Li, X. Kang, J. Hu, Image fusion with guided filter-
tions, Analysis and Recognition of Shape and Motion FroM ing, IEEE Transactions on Image Processing 22 (7) (2013)
Imaging Data, Springer, 2016, pp. 161–175. 2864–2875.
52 Deep Learning and Parallel Computing Environment for Bioengineering Systems
42. B. Rajalingam, D.R. Priya, A novel approach for multi- 44. Y. Chen, R.S. Blum, A new automated quality assessment
modal medical image fusion using hybrid fusion algo- algorithm for image fusion, Image and Vision Computing
rithms for disease analysis, International Journal of Pure 27 (10) (2009) 1421–1432.
and Applied Mathematics 117 (15) (2017) 599–619. 45. G. Bhatnagar, Q.J. Wu, Z. Liu, Directive contrast based
43. G. Piella, H. Heijmans, A new quality metric for image multimodal medical image fusion in NSCT domain, IEEE
fusion, in: Image Processing, 2003. ICIP 2003. Proceed- Transactions on Multimedia 15 (5) (2013) 1014–1024.
ings. 2003 International Conference on, vol. 3, IEEE, 2003,
pp. III–173.
CHAPTER 4
Medical Imaging With Intelligent Systems: A Review

KEY TERMS
Magnetic Resonance Imaging (MRI), Magnetic Resonance (MR), World Health Organization (WHO), Central Nervous System (CNS), Hospital-Based Brain Tumor Registry (HBBTR), Central Brain Tumor Registry of the United States (CBTRUS), American Brain Tumor Association (ABTA), Confidence Interval (CI), Computed Tomography (CT), Proton Density (PD), Fluid Attenuation Inversion Recovery (FLAIR), Cerebrospinal Fluid (CSF), Magnetic Resonance Spectroscopy (MRS), N-acetyl aspartate (NAA), Choline (Cho), Creatine (Cr), myo-Inositol (mI), Lipid (Lip), Lactate (Lac), Dynamic Susceptibility Contrast (DSC), relative Apparent Diffusion Coefficient (rADC), Advanced Normalization Tools (ANTs), Virtual Skeleton Database (VSD), Brain Tumor Segmentation (BRATS), Statistical Parametric Mapping (SPM), Convolutional Neural Network (CNN), Random Forest (RF), Conditional Random Field (CRF), Region of Interest (ROI), Fully Convolutional Neural Network (FCNN)
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00011-7
Copyright © 2019 Elsevier Inc. All rights reserved.
FIG. 4.1 All primary brain tumors with other CNS tumors distribution [52].
specific brain tumor histology have all together made a great contribution towards the therapeutics of a brain tumor.

The World Health Organization (WHO) scheme classifies and grades central nervous system (CNS) tumors for clinical and research purposes. The tumors are segregated based on their predicted biological behavior and histological type. Over 100 histologically distinct types of primary CNS tumors are marked by their own band of clinical presentations, treatments, and outcomes. The first edition of the WHO Classification of Tumors of the CNS was issued in 1979. It has been revised four times, most recently in 2016, and is considered the international standard for the classification of CNS tumors. This classification system acts as a benchmark for conversation between basic science investigators and clinicians worldwide [35]. The latest update moved beyond diagnosis based entirely on microscopy by including molecular parameters in the classification of CNS tumor entities [37].

While other types of cancer are staged, brain tumors are classified according to the WHO classification of tumors of the CNS, which assigns a grade from I through IV based on predicted clinical behavior [52]. On the basis of histopathology, brain tumors are classified into tumors of neuroepithelial tissue (hereafter referred to as glioma, which includes the grade II astrocytoma, the grade III anaplastic astrocytoma, the grade IV glioblastoma, oligodendroglioma, and ependymoma), germ cell tumors, and tumors of the meninges and of the sellar region [12].

4.2.1 Why Are Studies Concentrated Mostly on Gliomas?
A brain tumor that grows from glial cells is called a glioma. The various types of glial cells are astrocytes, oligodendrocytes, microglia and ependymal cells, each having different functions. Collectively, the term glioma is used to describe the different types of glial tumors: oligodendroglioma, astrocytoma and glioblastoma. Microglia are part of the immune system and not truly glial cells. Pilocytic astrocytoma is of grade I; grade II includes astrocytoma, oligodendroglioma and oligoastrocytoma; grade III includes anaplastic astrocytoma, anaplastic oligodendroglioma and anaplastic oligoastrocytoma; and grade IV is glioblastoma multiforme [40].

The primary malignant brain tumors are gliomas [66], so it is crucial to identify the conditions that can be modified to avert this disease. These tumors vary in shape, size, and contrast and can develop anywhere in the brain, but they mostly affect the cerebral hemispheres [17]. Broadly categorized, glioma forms approximately 24.7% of all primary brain and other CNS tumors (Fig. 4.1) and accounts for 74.6% of malignant tumors (Fig. 4.2).

FIG. 4.2 Malignant primary tumors with other CNS tumors distribution [52].

Among all malignant tumors, the number of cases is estimated to be the highest for glioblastoma, as shown in Table 4.1, with 12,150 cases in 2016 and 12,390 in 2017 [52]. Glioblastoma forms 46.6% of primary malignant brain tumors (Fig. 4.2) and 14.9% of all primary brain and other CNS tumors (Fig. 4.1). More commonly seen in older adults and rarely in children, the incidence of glioblastoma increases with age, showing maximum rates for the age group from 75 to 84 years.

TABLE 4.1 Overall estimation of astrocytoma cases [52].
Histology                Estimated new cases by 2016   Estimated new cases by 2017
Pilocytic astrocytoma    1100                          1120
Diffuse astrocytoma      1180                          1110
Anaplastic astrocytoma   1330                          1340
Glioblastoma             12,150                        12,390

The increased incidence of brain tumors is at least partially traceable to improved radiographical diagnosis rather than histological analysis of tumors [52]. To contribute to large-scale national studies, the Hospital-Based Brain Tumor Registry (HBBTR) data was generated, examining the brain tumor distribution within the Indian population [24]. This provided a clinical benchmark for comparison of previous and future studies of CNS tumor data and also a comparison with the international registries. The registry, generated from the years 2010 to 2014, reported a total of 4295 cases. Over this five-year period, 18.9% of the tumors in the registry were grade IV, 20% grade III, 11.4% grade II, and 36.3% grade I. The most common tumors among adults were reported to be glioblastomas (38%), followed by anaplastic oligodendrogliomas (24.5%), while in the pediatric group pilocytic astrocytomas (44%) contributed almost half of all gliomas seen in children, followed by ependymomas (31%). In an overall comparison between CBTRUS and HBBTR, the most frequently reported histology is meningioma with 36% vs. 20%, followed by glioblastoma with 16% vs. 14%.

The childhood CNS tumors are the most severe group of tumors due to their high incidence and mortality rate [23]. Data regarding the rate of occurrence of various primary brain tumors was collected from seven tertiary care hospitals in India, covering 3936 pediatric patients. The results revealed astrocytic tumors to be the most frequent primary pediatric brain tumors (34.7%), of which pilocytic astrocytoma accounted for 23.1%, followed by grade II with 5.1%, grade III with 2.1%, and grade IV with 4.4%. When comparing the incidence of CNS tumors in different countries with India, the highest occurrence has been reported for astrocytomas in other countries also.
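The registry proportions quoted above are simple count ratios. As a minimal sketch (the function name and counts below are illustrative, not the actual HBBTR data), a reported grade distribution can be computed as:

```python
# Sketch: turning raw registry counts per WHO grade into the percentage
# distribution reported by a registry. Counts here are hypothetical.

def grade_distribution(counts):
    """Return {grade: percentage of all cases}, rounded to one decimal."""
    total = sum(counts.values())
    return {grade: round(100.0 * n / total, 1) for grade, n in counts.items()}

# Hypothetical case counts for a small registry.
counts = {"I": 1452, "II": 456, "III": 800, "IV": 756}
print(grade_distribution(counts))
```

The percentages are taken over all registered cases, so they sum to 100 up to rounding.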
TABLE 4.2 Grading of astrocytoma.
Tumor                    Atypia   Mitosis   Endothelial proliferation   Necrosis   WHO grade   General grade
Pilocytic astrocytoma    –        –         –                           –          I           Low-grade gliomas
Diffuse astrocytoma      ✓        –         –                           –          II          Low-grade gliomas
Anaplastic astrocytoma   ✓        ✓         –                           –          III         High-grade/malignant gliomas
Glioblastoma             ✓        ✓         ✓                           ✓          IV          High-grade/malignant gliomas
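The cumulative criteria summarized in Table 4.2 can be read as a small decision rule. The sketch below is illustrative (the function name and boolean encoding are not from the chapter), assuming each histological feature has been recorded as present or absent:

```python
# Sketch of the WHO feature-to-grade mapping for astrocytomas (Table 4.2):
# grade I has none of the features, grade II adds cytological atypia,
# grade III adds mitotic activity, and grade IV additionally shows
# microvascular (endothelial) proliferation and/or necrosis.

def who_astrocytoma_grade(atypia, mitosis, endothelial_proliferation, necrosis):
    if endothelial_proliferation or necrosis:
        return "IV"   # glioblastoma
    if mitosis:
        return "III"  # anaplastic astrocytoma
    if atypia:
        return "II"   # diffuse astrocytoma
    return "I"        # pilocytic astrocytoma

print(who_astrocytoma_grade(atypia=True, mitosis=False,
                            endothelial_proliferation=False, necrosis=False))  # II
```

Because the features are cumulative, checking from the most aggressive criterion downwards gives the correct grade with a single pass.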
4.2.2 Grading
For patient management, the grading of CNS neoplasms has profound importance [37]. Based on morphological features, the grading scheme predicts the clinical behavior of the tumor. The concept of CNS tumor grading is the progressiveness of neoplasias from localized and benign tumors to infiltrating and malignant tumors. Histological grading is used to predict the biological behavior of a neoplasm. Clinically, a tumor grade has a major influence on the selection of therapies (radiotherapy and chemotherapy). The WHO grading scheme is a "malignancy scale" defining a broad variety of neoplasms [36]. The treatment and prognosis of a brain tumor are linked with the need for an accurate pathological grading.

To provide clarity for diagnoses, the WHO scheme provides a four-tiered histological grading for astrocytomas. They are designated by a grade ranging from 1 to 4, with 4 being the most aggressive and 1 the least aggressive. This system is established on the basis of the appearance of attributes like atypia (structural abnormality of the cell), mitosis (division of the nucleus), endothelial proliferation (an apparent multilayering of endothelium), and necrosis (cell injury resulting in premature cell death). These features indicate the tumor malignancy level in terms of growth rate and invasion, as shown in Table 4.2. WHO describes grade I tumors as having these features absent, grade II (diffuse astrocytoma) as having only cytological atypia, grade III (anaplastic astrocytoma) as also having anaplasia (morphological changes in a cell) and mitotic activity, and grade IV as additionally exhibiting microvascular proliferation and/or necrosis.

The pilocytic astrocytomas (grade I), being the most benign of astrocytomas, are frequently encountered in the first and second decades of life. They appear on MRI as large cysts, have low proliferative potential, and can be cured by resection. The survival rate at 5 years from diagnosis is 87%. Diffuse astrocytomas (grade II), often referred to as low-grade gliomas, show less proliferation and often recur. On MRI they appear as an area of low density with little mass effect. The survival rate at 5 years from diagnosis is 35%. These infiltrative lesions are localized in the white matter of the cerebral hemispheres in adults between 30 and 40 years of age. The infiltrating tumors, anaplastic astrocytomas (grade III), have an average age at diagnosis between those of the low-grade astrocytomas (grades I and II) and glioblastoma multiforme. Despite originating as primary tumors, most grade III tumors form from grade I and II tumors and can possibly advance to grade IV. At 5 years from diagnosis, the survival rate for this tumor is 31%. Grade III lesions show nuclear atypia and brisk mitotic activity. Therapy for grade III tumor patients usually includes radiation and/or chemotherapy. The highly malignant and fast-growing brain tumor glioblastoma multiforme (grade IV) has a 3% survival rate at 5 years from diagnosis. On MRI, it displays heterogeneous enhancement with centrally non-enhancing regions and mass effect. These mitotically active, cytologically malignant, necrosis-prone neoplasms have a fatal outcome as they develop within the main mass of the brain, invading nearby tissues. Grade IV neoplasms are usually characterized by widespread infiltration into neighboring tissue [7,82].

Grade I–II tumors are primarily located in midline locations, such as the diencephalic region and cerebellum, including the hypothalamus and visual pathway. Grade III–IV tumors are usually located in the pontine areas of the brain stem or cerebral hemispheres. Most low-grade astrocytomas are curable by surgical resection alone while, despite the addition of radiotherapy and chemotherapy, the prognosis remains poor for high-grade astrocytomas.

According to the ABTA, a patient's treatment response depends on the patient's age, the tumor malignancy grading, the amount of tumor removed, and his/her general health. This shows the requirement of an efficient grading system [65]. See Tables 4.3–4.6.

TABLE 4.3 Astrocytoma distribution, 2009–2013.
Histology                5-year total   Annual average   % of all tumors   Median age   Proportion of all gliomas   Median survival (yrs.)
Pilocytic astrocytoma    5106           1021             1.4%              12.0         <5%                         >10
Diffuse astrocytoma      8081           1616             2.2%              48.0         25–30%                      >5
Anaplastic astrocytoma   6245           1249             1.7%              53.0         25–30%                      3
Glioblastoma             54,980         10,996           14.9%             64.0         40–50%                      1

TABLE 4.4 Distribution of histologically confirmed astrocytomas, 2011–2013 [52].
Histology                Newly diagnosed   Histologically   Grade 1   Grade 2   Grade 3   Grade 4
                         tumors            confirmed
Pilocytic astrocytoma    3078              79.1%            92.6%     6.2%      0.8%      0.5%
Diffuse astrocytoma      4523              79.2%            4.2%      58.1%     22.7%     15.0%
Anaplastic astrocytoma   3867              92.8%            0.1%      0.9%      90.2%     8.7%
Glioblastoma             33,631            79.1%            0.2%      0.2%      1.0%      98.7%

TABLE 4.5 Age-specific incidence rates for astrocytoma, 2009–2013 [52]; rate (95% CI) by age at diagnosis.
Histology   0–19 years         20–34 years        35–44 years        45–54 years        55–64 years        65–74 years           75–84 years           85+ years
Grade I     0.88 (0.85–0.91)   0.24 (0.22–0.25)   0.12 (0.11–0.14)   0.09 (0.08–0.10)   0.08 (0.07–0.10)   0.06 (0.04–0.07)      0.07 (0.05–0.09)      –
Grade II    0.27 (0.25–0.29)   0.50 (0.48–0.53)   0.56 (0.53–0.60)   0.58 (0.55–0.61)   0.77 (0.73–0.81)   0.97 (0.91–1.03)      1.08 (1.00–1.16)      0.60 (0.52–0.70)
Grade III   0.09 (0.08–0.10)   0.30 (0.28–0.31)   0.41 (0.38–0.44)   0.46 (0.44–0.49)   0.65 (0.61–0.68)   0.92 (0.86–0.98)      0.91 (0.84–0.99)      0.42 (0.34–0.50)
Grade IV    0.16 (0.15–0.17)   0.42 (0.40–0.45)   1.21 (1.16–1.26)   3.55 (3.47–3.63)   8.11 (7.98–8.24)   13.09 (12.87–13.31)   15.27 (14.97–15.57)   9.16 (8.81–9.52)

TABLE 4.6 Average annual age-adjusted incidence rates of astrocytomas, 2009–2013 [52]; rate (95% CI) by age at diagnosis.
Histology                0–14 years         15–39 years        40+ years
Pilocytic astrocytoma    0.98 (0.95–1.02)   0.28 (0.27–0.30)   0.08 (0.08–0.09)
Diffuse astrocytoma      0.26 (0.24–0.28)   0.45 (0.43–0.47)   0.68 (0.66–0.70)
Anaplastic astrocytoma   0.09 (0.08–0.10)   0.29 (0.27–0.30)   0.62 (0.60–0.64)
Glioblastoma             0.15 (0.14–0.17)   0.48 (0.46–0.50)   6.95 (6.89–7.01)

4.2.3 Symptoms of a Brain Tumor
The most common brain tumor symptoms are difficulty in thinking, finding words or speaking, seizures or convulsions, headaches, weakness or paralysis in one part or one side of the body, personality or behavior changes, changes in vision, hearing loss, dizziness or loss of balance, disorientation or confusion, and memory loss. See Table 4.7.

TABLE 4.7 Symptoms of brain tumors [7]; percentage of patients with each symptom by tumor type.
Symptom                       Low-grade glioma   Malignant glioma
Headache                      40                 50
Seizure                       65–95              15–25
Hemiparesis                   5–15               30–50
Mental status abnormalities   10                 40–60

4.2.4 Diagnosis and Treatment of a Brain Tumor
An invasive method for brain tumor diagnosis is the spinal tap. A biopsy is another method, in which tissue is taken out to check for tumor cells. The only assured way for brain tumor diagnosis, grade determination and treatment planning is a biopsy. Being an invasive technique, a needle biopsy is currently the only reliable diagnosis for a brain tumor, and it is not generally recommended in the initial stage of diagnosis [69].

Treatment of a brain tumor in its early stages is a challenging task due to variation in size, shape and location, and can only be performed by a trained professional neuroradiologist [60]. A system proposed in [55], based on qualitative information, determines the degree of tumor abnormality using stained hematoxylin–eosin tissue biopsies (the gold standard in biopsy). These stains are examined by a histopathologist. Although the identification of tumor grade is done accurately by the WHO grading scheme, there is significant intra- and inter-observer variability that significantly influences diagnosis quality [55].

The prognosis for a brain tumor depends on its location, type, grade, the spread of the tumor inside the brain, how long the symptoms existed prior to diagnosis and how much the patient's functionality is affected by the tumor. Similarly, the treatment of a brain tumor will rely on factors like the tumor location, type and size, the patient's symptoms and general health, whether the tumor is malignant or benign, and treatment preferences. Surgery, chemotherapy and radiation therapy are the preeminent treatments for a brain tumor. Depending on the severity and various other factors, patients may undergo only one treatment method or a combination of treatments.
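The age-specific rates in Tables 4.5 and 4.6 are reported per 100,000 with 95% confidence intervals. A minimal sketch of the standard normal approximation to a Poisson rate, which is a common way such intervals are derived (the case count and person-years below are illustrative, not CBTRUS figures):

```python
import math

# Sketch: an incidence rate per 100,000 person-years with a 95% CI,
# using the normal approximation to the Poisson count (var = count).

def incidence_rate_ci(cases, person_years, per=100_000, z=1.96):
    rate = cases / person_years * per
    se = math.sqrt(cases) / person_years * per   # SE of the scaled Poisson count
    return rate, rate - z * se, rate + z * se

rate, lo, hi = incidence_rate_ci(cases=400, person_years=5_000_000)
print(f"{rate:.2f} per 100,000 (95% CI {lo:.2f}-{hi:.2f})")  # 8.00 per 100,000 (95% CI 7.22-8.78)
```

For small case counts an exact Poisson interval would be preferred; the normal approximation shown here is only reasonable when the count is large.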
4.3 IMAGING TECHNIQUES
A cranial MRI is the only non-invasive test required for a brain tumor diagnosis [7]. On the contrary, a computed tomography (CT) may fail to show structural lesions, particularly non-enhancing tumors like the low-grade gliomas. Diagnostically, the best choice to rule out the possibility of a brain tumor is an MRI with gadolinium enhancement. As MRI can be sensitized to various contrast parameters, this enables a comprehensive assessment of normal and abnormal brain physiology. MRI, being in vivo, longitudinal, and multiparametric, provides unique opportunities for characterizing and delineating experimental models of neurological diseases like stroke, brain tumors, etc.

The plethora of available MR contrast mechanisms, in conjunction with its superior dynamic functional range, bestows on MRI the potential to be a formidable tool in the noninvasive, in vivo, multilevel assessment of tumor physiology [20]. The main advantages of MRI are the absence of harmful ionizing radiation; being a non-invasive, painless technique that can possibly be performed without contrast; great soft tissue contrast along with high spatial resolution; and direct multiplanar imaging (sagittal, coronal and axial planes), displaying many images and oblique cuts [50].

There are various formats used for storing an MR image. We can broadly classify them into two groups. The output of the machine which captures the MR images is known as the scanner format. The other type is known as the image processing format, obtained by a conversion of the original MRI scanner format. The work in [14] uses the Neuroimaging Informatics Technology Initiative (NIFTI-1.1) format as the image processing format of MRI. See Fig. 4.3.

3D volumes made up of a set of slices are the general output of MRI systems. The latest MRI systems provide 3D images of 16-bit depth [51]. An MRI contains an enormous amount of information, but our eyes are unable to discriminate beyond several tens of gray levels. This inability can be overcome with the aid of computers to attain the entire information contained in an MRI.

The histopathological grading of a glioma, with its inaccurate and limited tissue samples, subjective grading criteria and tumor inaccessibility, paves the path towards
the immense requirement of an automatic segmentation procedure [11]. The high contrast of soft tissues, the high spatial resolution, the absence of harmful radiation, and the non-invasiveness of MR imaging all aid in the development of automated diagnostic tools [9].

4.3.1 Reading an MR Image
The various attributes of a tumor region can be obtained from the different MRI sequences. Hence, the intensity profiles of tumor tissues change between sequences. Observing, analyzing image features and interpreting multispectral MR images turns out to be a time-consuming and challenging task for radiologists. Furthermore, the heterogeneous intensity profiles, tumor orientation, shape, and overlapping intensities add to the difficulty in diagnosis. This results in a differential diagnosis. Simultaneously differentiating between distinctive tumor types, each having identical features, is a demanding task [33,83].

T1-, T2-weighted and proton density MRIs are sensitized, respectively, to the longitudinal MR relaxation time (T1), the transverse MR relaxation time (T2) of tissue water, and the water concentration. T2*-weighted MRI is sensitized to transverse MR relaxation that is not corrected for phase shifts caused by local field inhomogeneities. These MR techniques are commonly applied to detect brain lesions, because pathologies like tumors, edema, and hemorrhage are associated with changes in water content and relaxation rates.

The infiltrative tumors, glioblastomas, have borders that are often fuzzy and hard to distinguish from healthy tissues. Hence, more than one MRI modality is often employed, e.g., T1 (spin–lattice relaxation), T2 (spin–spin relaxation), T1-contrasted (T1C), proton density (PD) contrast imaging, diffusion MRI (dMRI), and fluid attenuation inversion recovery (FLAIR) pulse sequences. The contrast between these modalities gives uniqueness to each tissue type [17].

The frequently used sequence for structural analysis is the T1-w sequence, which gives an easy annotation of healthy tissues. The tumor border appears brighter in T1-weighted contrast-enhanced images because of the accumulation of the contrast agent due to blood–brain barrier disruption in the proliferative tumor region. In a T2-weighted MRI, the brightly appearing region is the edema that encircles the tumor. A special sequence is T2 FLAIR (FLAIR), which aids in separating the edema region from the CSF [1]. See Fig. 4.4.

FIG. 4.4 An axial slice of high-grade glioma. From left to right: T1-w, T1-wc, T2-w, and T2 FLAIR.

T1-w images show gray matter as gray and white matter as white, while in T2-w images white matter appears gray and gray matter appears white. FLAIR images aid in viewing the tissues by suppressing the cerebrospinal fluid and water content in the brain [62]. Typically, astrocytomas are iso-intense on T1-w and hyperintense on T2-w images. While low-grade astrocytoma rarely enhances on MRI, most anaplastic astrocytomas enhance when contrast agents are used. T2-w MRI has been the most preferred method for delineating lesions in clinical and experimental diagnostic studies because of its histologically validated superior sensitivity in detecting tissue damage [20]. To perceive the extent of the tumor and delineate its presence, contrast-enhanced T1-w MRI is used. A single ROI in a single MR slice should not be used to identify tumor volume and growth, because a tumor is a 3D object. Hence it should be assessed from 3D data, i.e., a stack of 2D slices [11]. So multiple MRI modalities are to be used.

4.3.2 Advanced Magnetic Resonance Imaging
4.3.2.1 MRS
While MRI is the most sensitive modality available for the detection of brain tumors, it has low specificity, and many types of tumor share a similar appearance on an MRI. It is difficult to find the grade and type of a
tumor using MRI [38]. These demerits can be overcome margins defined by conventional MR sequences. Many
by the use of MRS. studies show a good correlation of WHO grade and
The magnetic resonance spectroscopy is another metabolite ratios (Cho/NAA, Cho/Cr, and Lip-Lac/Cr).
noninvasive technique that allows quantitative and Many gliomas display high levels of citrate (not present
qualitative assessment of specific metabolites in the in the normal brain), particularly in the pediatric popu-
brain parenchyma or intracranial extra-axial spaces. lation. Pilocytic astrocytoma is found to have decreased
MRS analysis of brain tumors can be done using 1H Cr levels and variable degrees of Cho/Cr ratios. Rapalino
(proton) MRS or, less frequently, with 31P (phospho- and Ratai [59] state that MRS complements the infor-
rus) or 13C (carbon) MRS techniques. For 1H MRS, mation provided by the conventional MR imaging se-
the most common metabolites evaluated in routine quences and should be always used in conjunction with
clinical practice are N-acetyl aspartate (NAA), choline- the other imaging studies.
containing compounds (Cho), creatine (Cr), myoinos- In astrocytomas, Cho concentrations are more for
itol (mI), lipid (Lip), and lactate (Lac) [59]. MRS does the higher grade of tumors, yet it is noticed that the
not utilize high-energy radiation and contrast agents or high-grade tumors like glioblastoma multiforme (grade
labeled markers. The metabolite spectra from the MRS IV) show lower levels of Cho compared to astrocytoma
imaging add new dimension towards discrimination of grade II or III. This is because high-grade tumors have
lesions. In [48], three metabolite markers of neuronal necrotic cores and necrosis is related to diminished lev-
integrity were evaluated (Cho, NAA, and Cr). els of all metabolites. In non-necrotic, high-grade brain
Metabolic inhomogeneity is a notable feature of tumors Cho levels are typically seen to be high.
brain tumors. The spectrum from the necrotic core of MRS can precisely provide quantitative metabolite
a high-grade tumor varies from that of an actively grow- maps of the brain. It enables viewing the tumor’s het-
ing rim [21]. The spectra of brain tumors were different erogeneous spatial extent outside and inside the MRI-
from the spectra of normal brain tissue. NAA is a neu- detectable lesion. The studies in [38] show that the ac-
ronal metabolite which is decreased in processes with curacy of brain tumor classifiers can be improved by use
neuronal destruction or dysfunction. Most of the brain of image intensities and spectroscopic information. In
tumors have reduced NAA signals and heightened lev- [11], a combination of conventional MR imaging and
els of Cho, resulting in elevated Cho/NAA ratios. Cho dynamic susceptibility contrast MR imaging is a posi-
is increased in brain tumors due to increased mem- tive pre-surgical glioma grade indicator. Generally, the
membrane turnover. Cho correlates well with the degree of tumor infiltration into neighboring tissue and tumor cellular density. Cr is a metabolite related to cellular energy metabolism and is relatively stable across different pathologic processes affecting the CNS; hence it is useful as a reference metabolite. Lip peaks are an indication of areas of necrosis, and Lac peaks originate from processes resulting in anaerobic metabolism. mI is a marker of astrocytic metabolism and is elevated in certain pathologic processes [59]; specifically, mI has been reported to be high in grade II gliomas. Neoplastic processes have metabolic byproducts related to their mitotic activity (Cho) and neuronal dysfunction (NAA) that can be detected by MRS and hence improve the accuracy of the clinical diagnosis. Rapalino and Ratai [59] give an elaborate account of the metabolites evaluated with MRS in brain tumor imaging.
Various studies show that MRS values can be used for predicting the histological grade of gliomas. A higher Cho/NAA ratio represents higher WHO grades among glial tumors, and the Cho/Cr ratio is more accurate for differentiating high-grade from low-grade gliomas. MRS can potentially identify areas with abnormal Cho/NAA ratios that extend beyond the tumor. Grading of glioma using dynamic susceptibility contrast is done on the basis of cerebral blood volume (CBV) value analysis inside the tumor area, using either a histogram analysis method or a hot-spot method. An experienced operator with sound anatomical knowledge must precisely identify glioma tissue, which makes the current grading approaches inherently time-consuming and operator-dependent. Conventional MR imaging protocols for brain tumors consist of 2D MR images, which are suboptimal for tumor segmentation in comparison to 3D MR images. As the appearance of a tumor on anatomical MR images (signal heterogeneity, edema, T1-w contrast enhancement, and necrosis) correlates well with grade, low- and high-grade gliomas can be evaluated separately. In [11], the appearances of high- and low-grade gliomas were evaluated separately.
The major challenges in combining MRI and MRS signals are (i) the difference in spatial resolution between MRI (high) and MRS (low) and (ii) achieving low computational complexity with the best discrimination accuracy. The combination of metabolite-distribution features from MRS with 3D volumetric texture features from MRI is therefore significant [48]. MRI and MRS can be equally used for the purpose
62 Deep Learning and Parallel Computing Environment for Bioengineering Systems
of classification and provide comparable results, but the collection of MRS data is laborious and needs expertise in signal conditioning [63].
Magnetic resonance spectroscopy (MRS), perfusion-weighted imaging (PWI) and diffusion-weighted imaging (DWI) are advanced MR techniques which have added value over conventional MRI in predicting neoplastic histology. They provide additional information about tumor histological features such as neovascularization, grade of cellularity and mitotic index [15]. The imaging features recognized as independent predictors of tumor grade were enhancement and necrosis, with a specificity of 76% and a sensitivity of 97.6% when the variables of conventional MRI, PWI, and DWI were combined [15]. Despite MRI being highly accurate in tumor grade assessment, a combination of traditional and advanced MR imaging features resulted in enhanced grading of the tumor by the addition of rADC to the variables of conventional MRI.

4.3.2.2 Ultra-High-Field 7 T MRI
MRI became a standard diagnostic tool between the late 1970s, with 0.6 T systems, and the mid-1980s, with 1.5 T scanners. Clinical 3 T systems emerged in 2000. In 1998, for research purposes, the first human 8 T scanner was introduced. By early 2014, approximately 45 UHF scanners at or above 7 T, with around 10% working at 9.4 T, were operational [73]. Siemens developed its first actively shielded 7 T whole-body magnet scanner, MAGNETOM Terra, which was installed at the Erlangen University Clinic, Germany, in April 2015. The units were scheduled for delivery in early 2016, and serial production was scheduled to begin by 2017. An accelerating feature of MAGNETOM Terra is its eight parallel transmitter channels, whereas clinical MRI scanners have worked with only one transmitter channel. Multiple channels excite a scanned anatomical structure more uniformly so as to get improved image contrast. The multi-channel transmitter feature is only available in the research mode on the MAGNETOM Terra, providing very high spatial resolution of up to 0.2 millimeters in all directions.
Various studies have already been conducted using ultra-high-field (7 T) MRI [43,73,77]. The proposed algorithm in [26] was tested using 3 and 7 T MRIs; all the remaining works in this review used conventional 1.5/3 T MRI scanners. Imaging at 7 T provided advantages in signal-to-noise ratio, image contrast, resolution, sensitivity and spectral dispersion [43].
The gray matter tissue of the brain was precisely segmented from a 3D MRI obtained from a high-field (7 T) MR scanner [70]. Powerful 7 T scanners provide images having a high signal-to-noise ratio and high inhomogeneity at the same time. The inhomogeneity of voxel intensities within similar tissue types causes improper parameter initialization and placement of seed points, resulting in bad segmentation. A statistical modeling approach and the level set segmentation method are combined to overcome this problem: the image voxels are multiplied with the bias field in order to correct the inhomogeneity, and the bias-corrected image is then segmented using the level set method.
At the same time, [77] gives a wide review of the clinical applications of 7 T brain MRI. Contrast-rich images with high resolution of diverse pathologies can be procured. Compared to low-field-strength methods, additional pathophysiological information can be obtained for these diseases. The most relevant imaging marker to differentiate between high- and low-grade tumors is the absence or presence of MRI enhancement on post- versus pre-gadolinium contrast images. In comparison to high-grade tumors, low-grade brain tumors do not show contrast enhancement, except for pilocytic astrocytomas, which nearly always enhance. Comparison between lower field strengths and 7 T MRI with respect to tumor enhancement after contrast administration has shown no variation in the presence and size of the enhancing region. Using 7 T along with alternate MRI methods like T2*-weighted imaging may give supplementary information relative to 1.5 and 3 T MRI. Currently, only very few studies have been conducted on brain tumor patients using 7 T MRI, and it is not yet evident whether it can overcome the limitations of conventional MRI.

4.4 MACHINE LEARNING (ML) – SUPERVISED AND UNSUPERVISED METHODS
Basically, machine learning focuses on bringing out information from an image. After extraction of the features, this valuable information is further processed at a higher level to make cognitive decisions [49]. Machine learning algorithms become relevant when a clear learning problem requires an unambiguous task, performance metric, and experience. While generative methods involve modeling, discriminative methods solve classification directly. The generative and discriminative methods naïve Bayes and logistic regression form an analogous pair for classification, while the hidden Markov model (HMM) and conditional random field (CRF) form a corresponding pair for sequential data. In pattern classification, neural networks are commonly
CHAPTER 4 Medical Imaging With Intelligent Systems: A Review 63
VSD was developed [32]. It is a system with a data-centric concept and a search option for anatomical structures. The medical image community utilizes it for scientific collaboration, and it is a useful tool for enhancing segmentation algorithms. The Multimodal Brain Tumor Segmentation (BRATS) challenge organizers selected VSD to host their data and challenge.
The SPM software package was designed to analyze brain imaging data sequences. These sequences may be series of images from various time-series or cohorts obtained from the same subject. SPM5 is used in [11]; it is outdated now, and the latest version is SPM12. The latest release is created for the analysis of PET, fMRI, EEG, SPECT and MEG.
The FSL (FMRIB's Software Library, containing statistical and image analysis tools for structural, diffusion and functional MRI brain imaging) Linear Image Registration Tool, FLIRT, is an automated, accurate and robust tool for linear inter- and intra-modal brain image registration. It was embedded to automatically calculate the transformation between T2-w and T1-w images for each patient [85]. FMRIB's Automated Segmentation Tool, FAST, segments a 3D brain image into various tissue types, i.e., CSF, gray and white matter, etc. It simultaneously corrects spatial intensity variations (also known as RF or bias field inhomogeneities). It is based on an associated expectation-maximization algorithm and a hidden Markov random field model [85].
Various other software packages applied in this area of research are TumorSim, which is used for simulation of synthetic brain tumor images, the DoctorNo suite of tools as a plugin for the GUI-based "DoctorEye" software platform, the GLISTR toolkit, etc.

4.5 DEEP LEARNING (DL)
The drawbacks of ML algorithms when compared with DL algorithms are that human knowledge is required for feature extraction, the curse of dimensionality, and poor generalization capability. Mrudang D. Pandya et al. [47] explain the need for DL over ML, the tools and technology available for DL, the applicability of DL in medical image analysis, hybrid DL architectures in the field of medical image processing, and the challenges of DL; a comparison of ML vs. DL is given in a table. Also, Morgan P. McBee et al. [45] explain the intrusion of ML, followed by DL, into radiology.
DL uses multiple layers and multiple units within layers to represent highly complex functions. These hierarchical layers extract low- and high-level features using the nonlinear processing units in the layers; hence they do supervised learning of the features. Among the merits of DL: the time-consuming process of creating hand-crafted features can be skipped, DL generalizes well to a new problem, and the performance improvement is sufficiently higher than in traditional methods. DL networks include the convolutional neural network (CNN), which does supervised learning on variable and fixed length data, and the recurrent neural network (RNN), which works on variable length sequential data only. Hence CNN dominates when the challenge is image classification. This multi-layered network uses spatial relationships for feature generation from two- or three-dimensional image data. Where ML uses matrix multiplication operations, DL uses convolution, detection and pooling operations. The common form of CNN architecture is shown in Fig. 4.6. The fundamental deep learning architectures are deep belief networks (DBN), stacked auto-encoders (SAE) and the multilayer perceptron (MLP). See Table 4.8.
An adaptation of the traditional ANN architecture is the so-called CNN. In a CNN, a multidimensional input image is transformed into the desired output using nonlinear activation functions and stacks of convolutional filter kernels. The number of layers, the units in each layer and the between-layer connections together form the structure of the neural network, resulting in the CNN architecture. The depth of the network is determined by the number of layers. Based on the problem at hand, this generalized network architecture can be modified. For example, if the input is an image, it is a matrix multiplied with the weighted convolution kernel. Using a small kernel, fewer parameters are extracted from the input. The layer outputs are calculated using kernels. The convolution operation is followed by a linear activation operation. Commonly a rectified linear unit (ReLU) is used in the detection stage, followed by the pooling stage that modifies the output. Finally, a softmax function, i.e., a normalized exponential function, is used at the output layer.
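The convolution → detection (ReLU) → pooling → softmax pipeline described above can be sketched as a single forward pass in NumPy. This is an illustrative sketch only: the 8 × 8 input, 3 × 3 kernel, and random untrained weights are assumptions for demonstration, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernel):
    """Valid 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)  # detection stage

def max_pool(x, size=2):
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One forward pass: 8x8 "image" -> conv -> ReLU -> 2x2 max-pool -> dense -> softmax
img = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))            # learned in a real CNN; random here
feat = max_pool(relu(conv2d(img, kernel)))  # 3x3 feature map
w = rng.normal(size=(2, feat.size))         # dense output layer for 2 classes
probs = softmax(w @ feat.ravel())           # class probabilities, sums to 1
print(probs)
```

A trained CNN stacks many such conv/ReLU/pool layers and learns the kernels and dense weights by backpropagation; the sketch only shows how one layer of each kind transforms the data.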
TABLE 4.8
A comparison of ML vs. DL.

ML | DL
Works even on low-end machines | High-end machine dependent, hence uses GPUs
ML algorithms work well with large amounts of data | Performance increases only as the scale of data increases
Handpicked features by experts | Learns high-level features from input data
Comparatively takes less training time (a few seconds to a few hours) | Takes more training time (at times weeks) as it uses multiple parameters
Solves the problem part by part | Does end-to-end problem solving
Easy to interpret the solution | Hard to defend the solution
The data variables are analyzed by an analyst and the algorithm is directed by them | The DL algorithms are self-directed once they are implemented
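Section 4.4 named naïve Bayes (generative) and logistic regression (discriminative) as an analogous pair of classifiers. A minimal NumPy sketch of that pair, assuming synthetic one-dimensional two-class data (the Gaussian parameters, equal class priors, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D feature: two classes drawn from Gaussians centered at -1 and +1.
x0 = rng.normal(-1.0, 1.0, 200)   # class 0
x1 = rng.normal(+1.0, 1.0, 200)   # class 1
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(200), np.ones(200)])

def gnb_predict(xs, xtr, ytr):
    """Generative: Gaussian naive Bayes models p(x|y), applies Bayes' rule."""
    params = [(xtr[ytr == c].mean(), xtr[ytr == c].var()) for c in (0, 1)]
    preds = []
    for xi in xs:
        # Log-likelihood per class; equal priors cancel out.
        logp = [-0.5 * np.log(2 * np.pi * v) - (xi - m) ** 2 / (2 * v)
                for m, v in params]
        preds.append(int(logp[1] > logp[0]))
    return np.array(preds)

def logreg_fit_predict(xs, ys, steps=500, lr=0.1):
    """Discriminative: logistic regression models p(y|x) directly."""
    w, b = 0.0, 0.0
    for _ in range(steps):                      # plain gradient descent
        p = 1 / (1 + np.exp(-(w * xs + b)))
        w -= lr * np.mean((p - ys) * xs)
        b -= lr * np.mean(p - ys)
    return ((1 / (1 + np.exp(-(w * xs + b)))) > 0.5).astype(int)

acc_gnb = np.mean(gnb_predict(x, x, y) == y)
acc_lr = np.mean(logreg_fit_predict(x, y) == y)
print(f"naive Bayes acc={acc_gnb:.2f}, logistic regression acc={acc_lr:.2f}")
```

On this toy data both reach similar accuracy; the point is the contrast in what is modeled, the class-conditional densities versus the decision boundary itself.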
(NCC), normalized absolute error (NAE) [6], Williams' index [29], etc.

4.7 EMBEDDING INTO CLINICS
Despite enormous research in this field over decades, applications of these methodologies in clinics are limited, as clinicians still trust manual tumor delineations. This could be because of the communication gap between clinicians and researchers: the research tools developed so far are not familiar to clinicians, hence efforts must be concentrated on making them user-friendly in the future. Another reason is the challenge that the transfer of technology from bench to bedside is valid only when the efficient outputs obtained in a controlled research environment are reproducible in clinical routine. Robustness is a crucial factor for daily use of these protocols: robustness towards slight changes in acquisition protocols, and flexibility for upgrades.

4.8 CURRENT STATE-OF-THE-ART
The advanced technologies in automated brain tumor segmentation have been compared in the Multimodal Brain Tumor Image Segmentation (BRATS) MICCAI challenges since 2012. An annotated data set with about 60 high- and low-grade cases is publicly available from the VSD and MIDAS webpages, two online platforms for hosting and evaluating image segmentation benchmarks. MICCAI 2016 emphasized longitudinal segmentation tasks: estimating the size of relevant tumor structures and predicting whether the tumor was progressing, shrinking, or remained stable for a set of two scans of a given patient [41].
In [41], Mustafa Arikan used an anisotropic diffusion filter for noise reduction, extracted a bounding box containing the tumor, and randomly selected a certain number of seeds from the dataset; finally, segmentation was done using SVM on FLAIR, T1, T1c and T2 modalities. Peter D. Chang proposed a fully convolutional neural network with hyper-local features for brain tumor segmentation; the network was composed of only 130,400 parameters and could complete segmentation for an entire brain volume in less than one second. Dimah Dera proposed a non-negative matrix factorization level set segmentation technique, i.e., a decomposition technique that reduces the dimensionality of an image; segmentation of 465 images took 240 minutes. Abdelrahman Ellwaa introduced an iterative random forest approach which tried to improve its accuracy by iteratively choosing the best patients. Konstantinos Kamnitsas employed DeepMedic, a 3D CNN architecture for lesion segmentation, extended with residual connections. Varghese Alex proposed 5-layer deep stacked denoising autoencoders (SDAE) for segmentation of gliomas, where the training was done using patches of size 21 × 21 extracted from various MRI sequences like T1, T2, FLAIR, and T1 post-contrast images. Tseng Kuan Lun presented a fully-automatic segmentation method utilizing a CNN. Laszlo Lefkovits used a discriminative model based on RF to accomplish brain tumor segmentation in multimodal MR images, wherein a feature vector with 960 elements was obtained from 240 image features extracted from each modality. Loic Le Folgoc proposed cascades of lifted decision forests for segmentation, which used an SMM-MRF (Student mixture-MRF) layer to locate the ROI for the whole tumor. Richard McKinley proposed Nabla-net, a deep DAG-like convolutional architecture, for application to high- and low-grade glioma segmentation; a nabla net is a deep encoder/decoder network. Raphael Meier proposed a dense CRF which can overcome the shrinking bias inherent in many grid-structured CRFs, with a focus on the segmentation of glioblastoma. Xiaomei Zhao integrated an FCNN and CRF for segmentation, rather than adopting CRF as a post-processing step of the FCNN. Balaji Pandian proposed a fully automated approach based on a 3D CNN with subvolume training procedures for brain tumor segmentation in multi-modal MRIs, while Adria Casamitjana proposed two fully convolutional 3D CNN architectures which are variants of the two-pathway DeepMedic net. Bi Song proposed anatomy-guided brain tumor segmentation and classification to delineate tumor structures into the active tumorous core, necrosis, and edema.
The proposals in [41] provide a broad spectrum of segmentation methodologies. It is clear that in the last two years there was an increase in the use of deep learning methods, specifically CNN, in several computer vision tasks. The pre-processing stage commonly used N4ITK for bias correction, and the focus was on gliomas. A detailed survey of the current state-of-the-art techniques in this field is provided in [44].

4.8.1 Deep Learning Concepts for Brain Tumor Grading
In the last few years, there has been an increase in the use of deep learning methods, especially CNN. Rather than using hand-crafted features, DL models learn complex, task-adaptive and high-level features from the data directly. Due to these benefits, DL models are used for brain tumor detection, segmentation and classification.
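Several of the methods surveyed above train on small patches, e.g., the 21 × 21 patches extracted from T1, T2, FLAIR, and post-contrast sequences for the SDAE. A minimal NumPy sketch of such patch extraction (the slice size and center coordinates are hypothetical, chosen for illustration):

```python
import numpy as np

def extract_patches(mr_slice, centers, size=21):
    """Extract size x size patches around (row, col) centers,
    skipping centers too close to the image border."""
    half = size // 2
    h, w = mr_slice.shape
    patches = []
    for r, c in centers:
        if half <= r < h - half and half <= c < w - half:
            patches.append(mr_slice[r - half:r + half + 1,
                                    c - half:c + half + 1])
    return np.stack(patches)

# Fake 240x240 MR slice; in practice one slice per sequence (T1, T2, FLAIR, ...)
mr_slice = np.random.default_rng(0).normal(size=(240, 240))
centers = [(30, 40), (120, 120), (5, 5)]   # last center is dropped (border)
patches = extract_patches(mr_slice, centers)
print(patches.shape)  # (2, 21, 21)
```

In a real pipeline the centers would be sampled from labeled tumor and healthy regions, and patches from all modalities would be stacked as channels before training.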
Convolutional neural network (CNN), stacked denoising auto-encoder (SDAE) and recurrent neural network (RNN) are the common deep learning models used [74]. Schmidhuber [67] and Weibo Liu et al. [79] provide good reviews of deep learning networks. Usually, the training sample size is of great consideration in the case of deep learning methodologies. Various methods using deep learning were proposed for brain tumor classification. See Table 4.9.
The relevant literature has a vast collection of uses of DL models for tissue, tumor, lesion, subcortical structure and whole brain segmentation. Bradley J. Erickson et al. [3], Weibo Liu et al. [79], Geert Litjens et al. [13], Rupal R. Agravat and Mehul S. Raval [61] and Jose Bernal et al. [27] give detailed surveys of MRI brain tumor segmentation and MR image analysis using deep learning. Bradley J. Erickson et al. [3] explain how much data is really required when we use DL methods for medical image analysis; the work concludes that variability in the process being studied actually decides the amount of data. Two ways to work well with decreased training data are transfer learning and data augmentation. In transfer learning, we train the first layers using images having similar features, while in data augmentation variants of the original data are created, i.e., rotated images or images with added noise.

4.9 DISCUSSION
Gliomas being the most commonly reported type of brain tumor, this review sheds light on the grading of gliomas, facilitating easy diagnosis. The WHO classification of CNS tumors [35] and its update and grading [12,52] were elaborated initially, as well as histological and clinical grading based on malignancy level [36], the biological characteristics that aid grading [7,65,82], and the need for grading. From the statistical studies in [17,23,24,40,66], it is clear why gliomas are still an area of great interest for research. The increasing need for non-invasive grade determination of a brain tumor leads to in-depth multidisciplinary studies.
As the prevalence of brain tumors has increased over time, various approaches incorporating a combination of MRI and MRS [11,15,38,48,63], supervised and unsupervised learning methods [11,39,48,60,84], as well as hybrid systems using more than one classifier, are a growing requirement to build an efficient and accurate system. These approaches help to clarify the ambiguity prevailing in current methods. The features that can be studied by MR imaging [7,14,20,33,50,51,62,78,83] and those by MRS [21,38,48,59] can be combined to design a system providing better diagnosis and higher accuracy than when individually considered.
Computer-aided tumor detection along with pattern recognition software [32,76,85] helps radiologists review results more easily. According to the above review, fusing several good algorithms gave results that steadily ranked above most of the individual algorithms [42]. For subsequent fusion, user interaction proved to be useful in selecting the best segmentation maps. Beyond pushing the limits of individual tumor segmentation algorithms, future advances are possible by inspecting how to design and combine algorithms by fusion strategies. Furthermore, we should look into collecting a dataset consisting of a class like glioma and its subclasses for classification. The MICCAI-BRATS Challenge suggests that methods based on random forests are among the most accurate [41]. At the same time, research shows good performance of CNN-based algorithms, especially in the field of 2D data classification. Their merit is that each kernel in the different layers is learned spontaneously, so that no prior feature setting is needed, because of which the number of training examples becomes critical [81].
DL models have had a huge impact in the domains of natural language processing, speech recognition and computer vision for problem solving. Researchers now work on small image patches rather than the whole volume/slice, using computationally efficient CNNs to obtain accurate segmentation results. These DL networks have gained wider acceptance, and the architectures are growing more sophisticated by including more layers and better optimization ideas.
For good generalization, one should use an architecture with optimized layers, select the best hyperparameters and advanced training methods, and overcome class imbalance problems. Among the drawbacks of DL models, computational requirements are to be addressed by using quicker matrix multiplication methods and FFT algorithms. Above all, there is more room to consider distributed and parallelized implementations.

4.10 CONCLUSIONS
The necessity of integrating machine learning and deep learning methodology with the diagnosis of brain tumors, and the recent segmentation and classification techniques on brain MR images, was reviewed. The current trends in the grading of brain tumors, with a focus on gliomas, which include astrocytoma, were elucidated. The current state-of-the-art, software packages, and evaluation and validation metrics used in different approaches were discussed along with integration into the clinical
TABLE 4.9
Summary of deep learning methodologies for brain tumor classification.

Paper | Pre-processing | Feature extraction/classification | Classes | Dataset | Performance
[54] | intensity normalization | CNN | HGG, LGG | BRATS 2014 (170 high grade, 25 low grade) | specificity and sensitivity with intersected value of 0.6667
[68] | intensity normalization, bias field correction by N4ITK method | CNN | normal tissue, necrosis, edema, non-enhancing, enhancing tumor | BRATS 2013 (65 scans), BRATS 2015 (327 scans) | Dice (BRATS 2013: complete, core, enhancing regions 0.88, 0.83, 0.77; BRATS 2015: 0.78, 0.65, 0.75), speed 8 min
[17] | N4ITK for bias correction | CNN | non-tumor, necrosis, edema, enhancing tumor and non-enhancing tumor | BRATS 2013 (20 HGG, 10 LGG) | Dice (complete 0.88, core 0.79, enhancing 0.73), specificity (complete 0.89, core 0.79, enhancing 0.68), sensitivity (complete 0.87, core 0.79, enhancing 0.80), speed 25 s to 3 min
Peter D. Chang [2] | intensity normalization algorithm | CNN | enhancing tumor, core tumor and complete tumor | BRATS 2016 (144 HGG patients) | Dice (enhancing tumor 0.72, core tumor 0.81, complete tumor 0.87), speed 0.52 s for 64 images
Varghese Alex [2] | normalization | SDAE | whole tumor, tumor core, active tumor | BRATS 2015 (9 image volumes) | Dice (whole tumor 0.84, tumor core 0.71, active tumor 0.81)
Tseng Kuan Lun [2] | nil | SegNet | complete, core, enhancing | BRATS 2015 | Dice (complete 0.75, core 0.77, enhancing 0.76)
Richard McKinley [2] | nil | Nabla-net | high and low grade glioma | BRATS 2012 (30 images) | Dice (whole tumor 0.87, tumor core 0.69, enhancing 0.56)
Balaji Pandian [2] | N4ITK bias correction, normalization | 3D CNN | whole tumor, tumor core, active tumor | BRATS 2016 (40 images) | Dice (whole tumor 0.725, tumor core 0.611, active tumor 0.572)
Ramandeep Randhawa [2] | nil | DNN | HGG, LGG | BRATS 2015 (275 patients) | Dice (complete tumor 0.87, tumor core 0.75, enhanced tumor 0.71)
Adria Casamitjana [2] | normalization | 3D CNN | HGG, LGG | BRATS 2015 | Dice (whole 0.89, core 0.76, active 0.37)
Xiaomei Zhao [2] | N4ITK bias correction, normalization | FCNN + CRF | HGG, LGG | BRATS 2013, BRATS 2015 | Dice (BRATS 2013: complete 0.87, core 0.82, enhancing 0.76; BRATS 2015: complete 0.8, core 0.68, enhancing 0.65), average run time 8 min
14. M. Gupta, B.V.V.S.N. Prabhakar Rao, V. Rajagopalan, A. Das, C. Kesavadas, Volumetric segmentation of brain tumor based on intensity features of multimodality magnetic resonance imaging, in: IEEE International Conference on Computer Communication and Control, IC4 2015, 2015.
15. J.A. Guzmán-De-Villoria, J.M. Mateos-Pérez, P. Fernández-García, E. Castro, M. Desco, Added value of advanced over conventional magnetic resonance imaging in grading gliomas and other primary brain tumors, Cancer Imaging 14 (2014) 1–10, https://doi.org/10.1186/s40644-014-0035-8.
16. Hao Chen, Qi Dou, Lequan Yu, Jing Qin, Pheng-Ann Heng, VoxResNet: deep voxelwise residual networks for brain segmentation from 3D MR images, NeuroImage 170 (April 2018) 446–455, https://doi.org/10.1016/j.neuroimage.2017.04.041.
17. M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.M. Jodoin, H. Larochelle, Brain tumor segmentation with deep neural networks, Medical Image Analysis 35 (2017) 18–31, https://doi.org/10.1016/j.media.2016.05.004.
18. M. Havaei, P.M. Jodoin, H. Larochelle, Efficient interactive brain tumor segmentation as within-brain kNN classification, in: Proceedings – International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers Inc., 2014, pp. 556–561.
19. Heba Mohsen, El-Sayed A. El-Dahshan, El-Sayed M. El-Horbaty, Abdel-Badeeh M. Salem, Classification using deep learning neural networks for brain tumors, Future Computing and Informatics Journal (2018), https://doi.org/10.1016/j.fcij.2017.12.001.
20. M. Horn, Magnetic Resonance Imaging Methods and Biologic Applications, Methods in Molecular Medicine, Humana Press, New Jersey, 2006.
21. A. Horska, P.B. Barker, Imaging of brain tumors: MR spectroscopy and metabolic imaging, Neuroimaging Clinics of North America 20 (2011) 293–310, https://doi.org/10.1016/j.nic.2010.04.003.
22. Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi Mirza, Razvan Pascanu, James Bergstra, Frederic Bastien, Yoshua Bengio, Pylearn2: a machine learning research library, http://deeplearning.net/software/pylearn2, August 2013.
23. A. Jain, M.C. Sharma, V. Suri, S.S. Kale, A.K. Mahapatra, M. Tatke, G. Chacko, A. Pathak, V. Santosh, P. Nair, N. Husain, C. Sarkar, Spectrum of pediatric brain tumors in India: a multi-institutional study, Neurol India 59 (2011) 208–211, https://doi.org/10.4103/0028-3886.79142.
24. J. Jaiswal, A.H. Shastry, A. Ramesh, Y.T. Chickabasaviah, A. Arimappamagan, V. Santosh, Spectrum of primary intracranial tumors at a tertiary care neurological institute: a hospital-based brain tumor registry, Neurol India 64 (2016) 494–501.
25. Javeria Amin, Muhammad Sharif, Mussarat Yasmin, Steven Lawrence Fernandes, Big data analysis for brain tumor detection: deep convolutional neural networks, Future Generation Computer Systems (2018), https://doi.org/10.1016/j.future.2018.04.065.
26. Z. Ji, Q. Sun, Y. Xia, Q. Chen, D. Xia, D. Feng, Generalized rough fuzzy c-means algorithm for brain MR image segmentation, Computer Methods and Programs in Biomedicine 108 (2012) 644–655, https://doi.org/10.1016/j.cmpb.2011.10.010.
27. Jose Bernal, Kaisar Kushibar, et al., Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review, Artificial Intelligence in Medicine (April 2018) 1–18, https://doi.org/10.1016/j.artmed.2018.08.008.
28. Jose Dolz, Nacim Betrouni, Mathilde Quidet, Dris Kharroubi, Henri A. Leroy, Nicolas Reyns, Laurent Massoptier, Maximilien Vermandel, Stacking denoising auto-encoders in a deep network to segment the brainstem on MRI in brain cancer patients: a clinical study, Computerized Medical Imaging and Graphics (2016), https://doi.org/10.1016/j.compmedimag.2016.03.003.
29. T. Kalaiselvi, K. Somasundaram, S. Vijayalakshmi, A novel self initiating brain tumor boundary detection for MRI, Communications in Computer and Information Science 283 (2012) 54–61, https://doi.org/10.1007/978-3-642-28926-2.
30. Ken C.L. Wong, Tanveer Syeda-Mahmood, Mehdi Moradi, Building medical image classifiers with very limited data using segmentation networks, Medical Image Analysis (2018), https://doi.org/10.1016/j.media.2018.07.010.
31. Keras, n.d., https://keras.io/.
32. M. Kistler, S. Bonaretti, M. Pfahrer, R. Niklaus, P. Büchler, The virtual skeleton database: an open access repository for biomedical research and collaboration, Journal of Medical Internet Research 15 (2013) 1–14, https://doi.org/10.2196/jmir.2930.
33. S. Koley, A.K. Sadhu, P. Mitra, B. Chakraborty, C. Chakraborty, Delineation and diagnosis of brain tumors from post contrast T1-weighted MR images using rough granular computing and random forest, Applied Soft Computing 41 (2016) 453–465, https://doi.org/10.1016/j.asoc.2016.01.022.
34. Konstantinos Kamnitsas, Christian Ledig, Virginia F.J. Newcombe, Joanna P. Simpson, et al., Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation, Medical Image Analysis (2016), https://doi.org/10.1016/j.media.2016.10.004.
35. L.R. Lym, Q.T. Ostrom, C. Kruchko, M. Couce, D.J. Brat, D.N. Louis, J.S. Barnholtz-Sloan, Completeness and concordancy of WHO grade assignment for brain and central nervous system tumors in the United States, 2004–2011, Journal of Neuro-Oncology 123 (1) (2015) 43–51, https://doi.org/10.1007/s11060-015-1775-4.
36. D.N. Louis, H. Ohgaki, O.D. Wiestler, W.K. Cavenee, P.C. Burger, A. Jouvet, B.W. Scheithauer, P. Kleihues, The 2007 WHO classification of tumours of the central nervous system, Acta Neuropathologica 114 (2007) 97–109, https://doi.org/10.1007/s00401-007-0243-4.
37. D.N. Louis, A. Perry, G. Reifenberger, A. von Deimling, D. Figarella-Branger, W.K. Cavenee, H. Ohgaki, O.D. Wiestler, P. Kleihues, D.W. Ellison, The 2016 world health organization classification of tumors of the central nervous system:
a summary, Acta Neuropathologica 131 (2016) 803–820, https://doi.org/10.1007/s00401-016-1545-1.
38. J. Luts, A. Heerschap, J.A.K. Suykens, S. Van Huffel, A combined MRI and MRSI based multiclass system for brain tumour recognition using LS-SVMs with class probabilities and feature selection, Artificial Intelligence in Medicine 40 (2007) 87–102, https://doi.org/10.1016/j.artmed.2007.02.002.
39. U. Maya, K. Meenakshy, Unified model based classification with FCM for brain tumor segmentation, in: 2015 IEEE International Conference on Power, Instrumentation, Control and Computing (PICC), 2015, pp. 7–10.
40. C. McPherson, Glioma Brain Tumors, Mayfield Clinic & Spine Institute, Ohio, 2016.
41. B. Menze, M. Reyes, Multimodal brain tumor image segmentation benchmark: "change detection", in: Proceedings of MICCAI-BRATS 2016 Multimodal, Munich, 2016.
42. B.H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, L. Lanczi, E. Gerstner, M.A. Weber, T. Arbel, B.B. Avants, N. Ayache, P. Buendia, D.L. Collins, N. Cordier, J.J. Corso, A. Criminisi, T. Das, H. Delingette, C. Demiralp, C.R. Durst, M. Dojat, S. Doyle, J. Festa, F. Forbes, E. Geremia, B. Glocker, P. Golland, X. Guo, A. Hamamci, K.M. Iftekharuddin, R. Jena, N.M. John, E. Konukoglu, D. Lashkari, J.A. Mariz, R. Meier, S. Pereira, D. Precup, S.J. Price, T.R. Raviv, S.M.S. Reza, M. Ryan, D. Sarikaya, L. Schwartz, H.C. Shin, J. Shotton, C.A. Silva, N. Sousa, N.K. Subbanna, G. Szekely, T.J. Taylor,
48. D.S. Nachimuthu, A. Baladhandapani, Multidimensional texture characterization: on analysis for brain tumor tissues using MRS and MRI, Journal of Digital Imaging 27 (2014) 496–506, https://doi.org/10.1007/s10278-013-9669-5.
49. A. Nandi, Detection of human brain tumour using MRI image segmentation and morphological operators, in: 2015 IEEE International Conference on Computer Graphics, Vision and Information Security (CGVIS), 2015, pp. 55–60.
50. M.S. Norhashimah, S.A.R. Syed Abu Bakar, A. Sobri Muda, M. Mohd Mokji, Review of brain lesion detection and classification using neuroimaging analysis techniques, Jurnal Teknologi 6 (2015) 73–85.
51. A. Ortiz, J.M. Gorriz, J. Ramirez, D. Salas-Gonzalez, Improving MRI segmentation with probabilistic GHSOM and multiobjective optimization, Neurocomputing 114 (2013) 118–131, https://doi.org/10.1016/j.neucom.2012.08.047.
52. Q.T. Ostrom, H. Gittleman, J. Xu, C. Kromer, Y. Wolinsky, C. Kruchko, J.S. Barnholtz-Sloan, CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2009–2013, Neuro-Oncology 18 (2016) v1–v75, https://doi.org/10.1093/neuonc/now207.
53. P. Shanthakumar, P. Ganeshkumar, Performance analysis of classifier for brain tumor detection and diagnosis, Computers & Electrical Engineering 45 (2015) 302–311, https://doi.org/10.1016/j.compeleceng.2015.05.011.
54. Y. Pan, W. Huang, Z. Lin, W. Zhu, J. Zhou, J. Wong, Z. Ding, Brain tumor grading based on neural networks and convolutional neural networks, in: Engineering in Medicine and
O.M. Thomas, N.J. Tustison, G. Unal, F. Vasseur, M. Win-
Biology Society (EMBC), 2015 37th Annual International
termark, D.H. Ye, L. Zhao, B. Zhao, D. Zikic, M. Prastawa,
Conference of the IEEE, 2015, pp. 699–702.
M. Reyes, K. Van Leemput, The multimodal brain tumor
55. E.I. Papageorgiou, P.P. Spyridonos, D.T. Glotsos, C.D.
image segmentation benchmark (BRATS), IEEE Transac-
Stylios, P. Ravazoula, G.N. Nikiforidis, P.P. Groumpos,
tions on Medical Imaging 34 (2015) 1993–2024, https://
Brain tumor characterization using the soft computing
doi.org/10.1109/TMI.2014.2377694.
technique of fuzzy cognitive maps, Applied Soft Com-
43. M. Metcalf, D. Xu, D.T. Okuda, L. Carvajal, D.A.C.
puting 8 (2008) 820–828, https://doi.org/10.1016/j.asoc.
Kelley, P. Mukherjee, S.J. Nelson, D.R. Nat, D.B. Vi-
2007.06.006.
gneron, D. Pelletier, High-resolution phased-array MRI
56. Pylearn2, n.d., http://deeplearning.net/software/
of the human brain at 7 Tesla: initial experience in
pylearn2/.
multiple sclerosis patients, Journal of Neuroimaging 57. Pytorch, n.d., http://torch.ch/.
20 (2010) 141–147, https://doi.org/10.1111/j.1552-6569. 58. Pytorch, n.d., https://pytorch.org/.
2008.00338.x.High-Resolution. 59. O. Rapalino, E.M. Ratai, Multiparametric imaging analy-
44. G. Mohan, M. Monica Subashini, MRI based medical sis: magnetic resonance spectroscopy, Magnetic Resonance
image analysis: survey on brain tumor grade classifica- Imaging Clinics of North America 24 (2016) 671–686,
tion, Biomedical Signal Processing and Control 39 (2018) https://doi.org/10.1016/j.mric.2016.06.001.
139–161, https://doi.org/10.1016/j.bspc.2017.07.007. 60. S. Roy, S. Sadhu, S.K. Bandyopadhyay, D. Bhattacharyya,
45. Morgan P. McBee, Omer A. Awan, et al., Deep learning T.H. Kim, Brain tumor classification using adaptive neuro-
in radiology, Academic Radiology (2018) 1–9, https://doi. fuzzy inference system from MRI, International Journal
org/10.1016/j.acra.2018.02.018. of Bio-Science and Bio-Technology 8 (2016) 203–218,
46. Mostefa Ben Naceur, Rachida Saouli, Mohamed Akil, Ros- https://doi.org/10.14257/ijbsbt.2016.8.3.21.
tom Kachouri, Fully automatic brain tumor segmenta- 61. Rupal R. Agravat, Mehul S. Raval, Deep learning for au-
tion using end-to-end incremental deep neural networks tomated brain tumor segmentation in MRI images, Chap-
in MRI images, Computer Methods and Programs in ter 11, in: Soft Computing Based Medical Image Analysis,
Biomedicine (2018), https://doi.org/10.1016/j.cmpb.2018. 2018, pp. 183–201.
09.007. 62. J. Sachdeva, V. Kumar, I. Gupta, N. Khandelwal, C.K. Ahuja,
47. Mrudang D. Pandya, Parth D. Shah, Sunil Jardosh, Medical A package-SFERCB-“segmentation, feature extraction, re-
image diagnosis for disease detection: a deep learning ap- duction and classification analysis by both SVM and ANN
proach, Chapter 3, in: U-Healthcare Monitoring Systems, for brain tumors”, Applied Soft Computing 47 (2016)
2019, pp. 37–60. 151–167, https://doi.org/10.1016/j.asoc.2016.05.020.
CHAPTER 4 Medical Imaging With Intelligent Systems: A Review 73
63. J. Sachdeva, V. Kumar, I. Gupta, N. Khandelwal, C.K. Ahuja, 76. N.J. Tustison, K.L. Shrinidhi, M. Wintermark, C.R. Durst,
Multiclass brain tumor classification using GA-SVM, in: B.M. Kandel, J.C. Gee, M.C. Grossman, B.B. Avants, Op-
2011 Developments in E-systems Engineering, vol. 97, timal symmetric multimodal templates and concatenated
2011, pp. 182–187. random forests for supervised brain tumor segmenta-
64. Saddam Hussain, Syed Muhammad Anwar, Muhammad tion (simplified) with ANTsR, Neuroinformatics 13 (2015)
Majid, Segmentation of glioma tumors in brain using 209–225, https://doi.org/10.1007/s12021-014-9245-2.
deep convolutional neural network, Neurocomputing 282 77. A.G. Van Der Kolk, J. Hendrikse, J.J.M. Zwanenburg, F.
(2018) 248–261, https://doi.org/10.1016/j.neucom.2017. Visser, P.R. Luijten, Clinical applications of 7 T MRI in the
12.032. brain, European Journal of Radiology 82 (2013) 708–718,
65. A. Sarkar, E.A. Chiocca, Glioblastoma and malignant astro- https://doi.org/10.1016/j.ejrad.2011.07.007.
cytoma, in: Brain Tumors, American BrainTumor Associa- 78. G. Vishnuvarthanan, M.P. Rajasekaran, P. Subbaraj, A.
tion, Chicago, 2016, pp. 1–22. Vishnuvarthanan, An unsupervised learning method with
66. J.A. Schwartzbaum, J.L. Fisher, K.D. Aldape, M. Wrensch, a clustering approach for tumor identification and tissue
Epidemiology and molecular pathology of glioma, Nature segmentation in magnetic resonance brain images, Ap-
Clinical Practice Neurology 2 (2006) 494–503, https://doi. plied Soft Computing 38 (2016) 190–212, https://doi.org/
org/10.1038/ncpneuro0289. 10.1016/j.asoc.2015.09.016.
67. Juergen Schmidhuber, Deep learning in neural networks: 79. Weibo Liu, Zidong Wang, Xiaohui Lui, Nianyin Zeng,
an overview, Neural and Evolutionary Computing 61 Yurong Liu, et al., A survey of deep neural network archi-
(2015) 85–117, https://doi.org/10.1016/j.neunet.2014.09. tectures and their applications, Neurocomputing 11 (26)
003. (2017), https://doi.org/10.1016/j.neucom.2016.12.038.
80. Xiaomei Zhao, Yihong Wu, Guidong Song, Zhenye Li, et
68. Sérgio Pereira, Adriano Pinto, Victor Alves, Carlos A. Silva,
al., A deep learning model integrating FCNNs and CRFs for
Brain tumor segmentation using convolutional neural net-
brain tumor segmentation, Medical Image Analysis (2017),
works in MRI images, IEEE Transactions on Medical Imag-
https://doi.org/10.1016/j.media.2017.10.002.
ing 35 (5) (2016) 1240–1251, https://doi.org/10.1109/
81. Yuehao Pan, Weimin Huang, Zhiping Lin, Wanzheng Zhu,
TMI.2016.2538465.
Jiayin Zhou, Jocelyn Wong, Zhongxiang Ding, Brain tu-
69. K.A. Smitha, A.K. Gupta, R.S. Jayasree, Relative percent-
mor grading based on neural networks and convolutional
age signal intensity recovery of perfusion metrics-an effi-
neural networks, in: Engineering in Medicine and Biology
cient tool for differentiating grades of glioma, British Jour-
Society (EMBC), 2015 37th Annual International Confer-
nal of Radiology 88 (2015), https://doi.org/10.1259/bjr. ence of the IEEE, 2015, pp. 699–702.
20140784. 82. M. Zarinbal, M.H. Fazel Zarandi, I.B. Turksen, M. Izadi, A
70. M. Strumia, D. Feltell, N. Evangelou, P. Gowland, C. type-2 fuzzy image processing expert system for diagnosing
Tench, L. Bai, Grey matter segmentation of 7T MR images, brain tumors, Journal of Medical Systems 39 (2015) 110,
in: IEEE Nuclear Science Symposium Conference Record, https://doi.org/10.1007/s10916-015-0311-6.
2011, pp. 3710–3714. 83. N. Zhang, S. Ruan, S. Lebonvallet, Q. Liao, Y. Zhu, Ker-
71. Tensorflow, n.d., https://www.tensorflow.org/. nel feature selection to fuse multi-spectral MRI images
72. Theano, n.d., http://deeplearning.net/software/theano/. for brain tumor segmentation, Computer Vision and Im-
73. J.M. Theysohn, O. Kraff, K. Eilers, D. Andrade, M. Gerwig, age Understanding 115 (2011) 256–269, https://doi.org/
D. Timmann, F. Schmitt, M.E. Ladd, S.C. Ladd, A.K. Bitz, 10.1016/j.cviu.2010.09.007.
Vestibular effects of a 7 Tesla MRI examination compared 84. Y. Zhang, Z. Dong, L. Wu, S. Wang, A hybrid method for
to 1.5 T and 0 T in healthy volunteers, PLoS ONE 9 (2014) MRI brain image classification, Expert Systems with Appli-
3–10, https://doi.org/10.1371/journal.pone.0092104. cations 38 (2011) 10049–10053, https://doi.org/10.1016/j.
74. Thierry Bouwmans, Caroline Silva, Cristina Marghes, Mo- eswa.2011.02.012.
hammed Sami Zitouni, Harish Bhaskar, Carl Frelicot, On 85. Y. Zhu, G.S. Young, Z. Xue, R.Y. Huang, H. You, K. Se-
the role and the importance of features for background tayesh, H. Hatabu, F. Cao, S.T. Wong, Semi-automatic seg-
modeling and foreground detection, Computer Science mentation software for quantitative clinical brain glioblas-
Review 28 (2018) 26–91, https://doi.org/10.1016/j.cosrev. toma evaluation, Academic Radiology 19 (2012) 977–985,
2018.01.004. https://doi.org/10.1016/j.acra.2012.03.026.
75. Torch, n.d., http://torch.ch/.
CHAPTER 5 Medical Image Analysis With Deep Neural Networks
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00012-9
Copyright © 2019 Elsevier Inc. All rights reserved.
FIG. 5.1 Common organization of deep learning for medical image analysis.
this actual reason, that is, to interpret the features in a multidimensional space.

(2) Unlike multi-layer perceptrons, convolutional neural networks recognize that closer image pixels are more strongly correlated than pixels which are distant.

ConvNets also differ from multi-layer perceptrons in the types of hidden layers present in the network. In ConvNets, the neurons are organized in three dimensions: depth, width, and height. Each layer converts three-dimensional inputs into three-dimensional output features using an activation function. In a multi-layer perceptron, the number of parameters is increased, due to the fact that it accepts only a vector as input. ConvNets address this problem in order to process images having input information. In addition, ConvNets accept input in the form of a matrix. Convolutional layers preserve spatial information.

In general, neural networks can be useful for a search problem. Each neuron in the network searches for the association between the input data and its output feature. Dropout dynamically turns neurons off during forward transmission, preventing their weights from converging to matching points. Afterwards, it turns every neuron in the network back on and propagates backwards. During forward propagation, the dropped layer values are set to zero to perform dropout.

In ConvNets, multiple layers are computed in a well-organized manner. Each layer has an essential part in the network. The input layer receives the data image.
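The dropout behavior just described, where a random subset of activations is zeroed during the forward pass and every neuron is re-enabled at test time, can be sketched in NumPy. This is an illustrative sketch, not the chapter's code; the "inverted" rescaling by 1/(1 - p) is one common convention.

```python
import numpy as np

def dropout_forward(activations, drop_prob=0.5, train=True, rng=None):
    """Inverted dropout: zero a random subset of activations during
    training and rescale the survivors; at test time, pass through
    unchanged (every neuron is turned back on)."""
    if not train:
        return activations
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= drop_prob  # kept neurons
    return activations * mask / (1.0 - drop_prob)

x = np.ones((4, 4))
y = dropout_forward(x, drop_prob=0.5)
# dropped units are exactly zero; kept units are rescaled
```

At test time, calling `dropout_forward(x, train=False)` returns the activations untouched, matching the description of switching every neuron back on.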
Each neuron in the fully connected layer is connected to two neighboring layers and does not share its information within a layer. The neurons in the present layer have full activations to the preceding layer. The activations are processed with matrix multiplication, and a bias is added to that term. In ConvNets, each neuron is related to a local area and shares its parameters. In the convolutional layer, the main aim of ConvNets is to mine the information from the given image. The convolutional layer performs most of the processing in ConvNets. Their organization for image classification is shown in Fig. 5.4. The process of the convolutional layer is shown in Fig. 5.5. A ConvNet in 3D is shown in Fig. 5.6. The learning of ConvNets comprises two stages – forward and backward. The forward stage computes the dot-product of the input with the weights and adds a bias term in each layer. The actual output from the network is compared with the stored objective output [4]. In the backward stage, the weight modifications are performed in every level of the layer.
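The forward stage just described, a dot-product of the input with the weights plus a bias in each layer, can be sketched as follows; the layer sizes and the ReLU activation are illustrative assumptions, not values from the text.

```python
import numpy as np

def forward_stage(x, layers):
    """Forward stage: each layer computes dot(input, W) + b,
    followed here by a ReLU activation."""
    for W, b in layers:
        x = np.maximum(0.0, x @ W + b)  # dot-product, bias, activation
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), np.zeros(4)),   # 8 -> 4 units
          (rng.standard_normal((4, 2)), np.zeros(2))]   # 4 -> 2 units
out = forward_stage(rng.standard_normal(8), layers)
# one value per output neuron; comparison against the stored
# objective output would drive the backward stage
```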
One of the most effective, and also award-winning, innovations in the architecture of convolutional neural networks is the AlexNet [3] architecture. The CNN architecture of AlexNet is shown in Fig. 5.8. A large dataset consisting of 1.2 million high-resolution images is trained and recognized for 1000 different categories. The convolutional neural network has 60 million trainable parameters and 650,000 neurons. The final layer is a 1000-way softmax. AlexNet uses non-saturating neurons and an efficient GPU implementation for faster processing of the network. AlexNet uses the dropout method to reduce the overfitting problem. The architecture contains eight learned layers: five convolutional layers, interleaved with max-pooling layers, and three fully-connected layers ending in a softmax classifier. The output layer provides a 1000-way softmax, which produces 1000 different class scores. The AlexNet architecture is proficient in attaining the best computational results on large datasets, but its performance degrades if a single convolutional layer is removed.

ZFNet [6] is an innovative technique for visualizing the intermediate layers and enhancing them. The CNN architecture of ZFNet is shown in Fig. 5.9. ZFNet has eight layers, including five convolutional layers that are associated with three fully connected layers. The operation of the convolutional layer is executed on a GPU. ZFNet employs a multi-layered deconvolutional network (DeconvNet).

A DeconvNet is an opponent model of a ConvNet that maps features to pixels instead of mapping pixels to features. The DeconvNet performs filtering and pooling in the reverse order of the ConvNet. Employing a DeconvNet is a method of performing unsupervised learning. In the ZFNet architecture, a DeconvNet is attached to every layer of the ConvNet. Initially, an input data image is presented to the ConvNet, and features are calculated through the convolutional and pooling layers. To examine a given ConvNet activation, the value zero is assigned to all other activations. The output feature map of the ConvNet is passed through the DeconvNet. In the DeconvNet, unpooling is applied; rectification and filtering are used to reconstruct the input data image. The process is repeated until the input space is reached. ZFNet is mainly used for image classification and is able to handle large training data sets. The dropout technique is used to reduce overfitting.

Network-In-Network (NIN) [7] is an innovative deep neural network used for improving the discriminability of local image patches within their local regions. The CNN architecture of NIN is shown in Fig. 5.10. In general, scanning the input by a conventional convolutional layer uses kernels for filtering the image through a nonlinear activation function. As an alternative, NIN forms micro-neural networks with more composite architectures to abstract the image patches.
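The DeconvNet unpooling step mentioned above can be illustrated with a small NumPy sketch (an illustration under assumptions, not the chapter's code): max-pooling records the location of each maximum, often called a switch, and unpooling places each pooled value back at its recorded location, filling the remaining positions with zeros.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """2x2 max-pooling that also records the argmax locations (switches)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((h, w))
    switches = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            block = x[i*size:(i+1)*size, j*size:(j+1)*size]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            pooled[i, j] = block[r, c]
            switches[i, j] = (i*size + r, j*size + c)
    return pooled, switches

def unpool(pooled, switches, shape):
    """DeconvNet-style unpooling: put each max back where it came from,
    zeros everywhere else."""
    out = np.zeros(shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = switches[i, j]
            out[r, c] = pooled[i, j]
    return out

x = np.arange(16.0).reshape(4, 4)
p, s = max_pool_with_switches(x)
y = unpool(p, s, x.shape)
# y keeps only the maxima of x, at their original positions
```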
learning followed by a supervised training. When data is scarce, R-CNN is an efficient method: the network is trained on the large dataset with supervised learning, followed by fine-tuning on a small dataset.

Fully convolutional networks (FCNs) [9] are a deep as well as influential architecture in semantic segmentation. An FCN consists of 22 layers, including 19 convolutional layers, and is associated with 3 fully connected layers. An FCN takes input of any size and produces correspondingly-sized output with effective training and interpretation. In semantic segmentation, an FCN processes the input data image pixels-to-pixels, which yields state-of-the-art results without any need for supplementary processing. An FCN is trained end-to-end (i) after supervised learning and (ii) for pixel-wise prediction of the input data image. The fully convolutional layers of the network calculate dense outputs from arbitrary-sized inputs. Both training and interpretation are accomplished for the entire image at once by computation of the feedforward and backpropagation algorithms. Patchwise learning is common to all methods but lacks the efficiency of training fully convolutional layers. FCN leaves out the complications of pre- and post-processing: it recasts trained classification networks as fully convolutional and transfers their learned representations for prediction. Each layer in an FCN is a three-dimensional array of size height × width × depth, where height and width are spatial dimensions and depth is the feature dimension. The first layer is the image itself, whose pixel size is height × width, with a depth equal to the number of color channels. Locations in higher layers are related to the image locations they are connected to, which are called their receptive fields. The basic structure of an FCN includes convolutional layers, pooling layers and activation functions that operate on local regions of the image and depend only on their associated coordinates. In general, an FCN works on any arbitrary-sized input and produces output based on the spatial dimensions. When the receptive fields overlap considerably, an FCN performs layer-by-layer computation using the feedforward and backpropagation algorithms instead of processing images patch-by-patch.

Visual Graphics Group (VGG) [10] architecture is used for evaluating the effect of growing network depth using large-scale images. Initially, the architecture takes 3 × 3 convolution filters and increases the depth up to 16–19 weight layers. The CNN architecture of VGG Net is shown in Fig. 5.12. The input data image given to the network is a 224 × 224 fixed-size RGB image. The pre-processing of the image is done by subtracting the mean value from each image pixel. The data image is forwarded through convolutional layers with 3 × 3 filters for further processing. The VGG network consists of a series of five stacks of convolutional layers, which are associated with three fully connected layers. The first and second fully connected layers have 4096 channels each. The third fully connected layer, followed by the soft-max layer, contains 1000 channels for producing 1000-way classifications. Every hidden layer is processed with the rectification (ReLU) nonlinear activation function. Initially, the VGG
network contains 11 weight layers, having 8 convolutional layers that are associated with 3 fully connected layers. After that, the network increases the depth to 19 weight layers, having 16 convolutional layers that are associated with 3 fully connected layers. The convolutional layer width starts at 64 and is doubled iteratively until it reaches 512.

In general, a deep convolutional neural network accepts fixed-size input data images. This constraint is artificial and may decrease the accuracy of recognizing images of arbitrary size. The constraint is removed by an innovative pooling approach called the spatial pyramid pooling network (SPP-Net) [11]. The CNN architecture of SPP-Net is shown in Fig. 5.13. SPP-Net can produce a fixed-size representation irrespective of the image size. The SPP-Net calculates the feature maps only once from the given input data image and applies them to the pooling layer for producing fixed-size representations, thus avoiding repetitive computation in the convolutional layers. It is useful for image classification and also important in object detection. The output classification of SPP-Net achieves better performance and requires no fine-tuning of the given input image representation. SPP-Net is one of the most effective techniques in computer vision. The network structure partitions the given input image into divisions and combines their local regions. Multi-level pooling makes SPP-Net robust to object deformations. The network structure not only generates variable-size images for testing but also trains with them, and it reduces over-fitting with the dropout method.

FIG. 5.13 CNN architecture of SPP-Net.

A convolutional neural network structure called the inception module performs better image classification and object detection. The network built from this inception module is also referred to as GoogLeNet [12]. The CNN architecture of GoogLeNet is shown in Fig. 5.14. The GoogLeNet architecture optimizes the use of computational resources, increasing the width and depth of the convolutional neural network at the least cost. The optimization quality of the architecture is based on the Hebbian principle and the intuition of multi-scale processing. Every convolutional layer uses the rectified linear activation function. GoogLeNet consists of 22 layers, including 21 convolutional layers that are associated with one fully connected layer. The network is made from building blocks of convolutional layers and is used to calculate the optimal local construction repeatedly, with the spatial feature. The lower layers concentrate the input on local regions, and 1 × 1 convolutions are enclosed by the next layer. In the subsequent convolutions, the filter size varies among 1 × 1, 3 × 3 and 5 × 5 for the alignment of image patches. Finally, a softmax classifier produces the output classification of the given input data image.

Fast region-based convolutional neural network (Fast R-CNN) [13] is an efficient technique compared to R-CNN for object detection that reaches a higher mean average precision. The CNN architecture of Fast R-CNN is shown in Fig. 5.15. Fast R-CNN introduces numerous advances in training the network, improves the time complexity of testing, and also increases the accuracy of object detection. Fast R-CNN trains a VGG16 deep network 9 times faster than R-CNN and is 213 times faster at test time while reaching a higher mean average precision. When compared to SPP-net, Fast R-CNN trains the VGG16 deep network 3 times faster, tests 10 times faster, and is more precise. The notable drawbacks of R-CNN networks are: (i) R-CNN requires a multi-stage pipeline for training the network; (ii) the time and space complexity of training the network is high; and (iii) the detection of objects is slow. The notable drawbacks of SPP-nets are: (i) SPP-nets require a multi-stage pipeline for training the network; (ii) the time and space complexity of training the network is high; and (iii) the features extracted from the object proposals are written to disk.

The Fast R-CNN technique has numerous benefits:
(1) Training of the network in a single stage by means of the multi-task loss function;
(2) Every network layer is updated during training;
(3) Better object detection quality via a higher mean average precision than R-CNN and SPP-nets;
(4) Disk space is not required for storing the object proposal features.

A Fast R-CNN network accepts an input data image and a group of object proposals. The training network forwards the entire image to the convolutional and pooling layers to produce the convolutional feature map. The region of interest (RoI) of each object proposal is selected and passed to the pooling layer of the network. Finally, every feature vector is passed to fully connected layers that end with a division into two output layers: a softmax classifier and bounding-box regression.
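The RoI pooling step described above can be sketched in NumPy: each proposal is cropped from the shared feature map and max-pooled onto a fixed grid, so proposals of any size yield feature vectors of the same length. The grid size and coordinates below are illustrative assumptions, not the chapter's values.

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=2):
    """RoI max-pooling: crop the proposal from the feature map and
    max-pool it onto a fixed out_size x out_size grid, so every
    proposal yields a feature vector of the same length."""
    x0, y0, x1, y1 = roi                      # proposal in feature-map coords
    crop = feature_map[y0:y1, x0:x1]
    h_edges = np.linspace(0, crop.shape[0], out_size + 1).astype(int)
    w_edges = np.linspace(0, crop.shape[1], out_size + 1).astype(int)
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = crop[h_edges[i]:h_edges[i+1], w_edges[j]:w_edges[j+1]]
            out[i, j] = cell.max()
    return out.ravel()                        # fixed-length feature vector

fmap = np.arange(64.0).reshape(8, 8)
vec = roi_pool(fmap, roi=(0, 0, 4, 6))       # a 6x4 proposal
# vec is the same length as for any other proposal size
```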
Region proposal network (RPN) [14] is a deep convolutional neural network architecture that detects objects using regions. The CNN architecture of RPN is shown in Fig. 5.16. RPN calculates object boundaries and objectness scores for each object and allows for least-cost region proposals. RPN performs end-to-end training to produce high-quality region proposals. The network accepts an input data image and produces a set of rectangular object proposals as output, each with an objectness score. This objectness score measures the relationship to a group of object categories. The RPN slides a small network over the spatial positions of the given input data image's feature map. Each sliding window is mapped to a low-dimensional feature representation. This feature is passed to two fully connected layers: a box-classification layer and a box-regression layer. At each sliding window, the object proposals from multiple regions are predicted. The classification layer has 2k scores that evaluate the probability of an object, and the regression layer has 4k output coordinates for the k anchor boxes. An anchor is placed at the sliding window and is characterized by an aspect ratio and a scale. The RPN method performs object detection over a range of scales and aspect ratios. The training of the network is achieved by the backpropagation algorithm and the stochastic gradient descent method. The RPN produces better results on the PASCAL VOC dataset.

Faster R-CNN [14] is a deep neural network that is composed of two different modules. The first module identifies the object proposals, and the second uses the object proposals for detection. The CNN architecture of Faster R-CNN is shown in Fig. 5.17. Faster R-CNN combines RPN and Fast R-CNN into a single network. The RPN module performs end-to-end training of the network to predict object boundaries and objectness scores from the given input data image. RPN produces region proposals from the input image, and these region proposals are then used by Fast R-CNN for the detection of objects. So, RPN and Fast R-CNN share their convolutional features and use the popular attention mechanism, in which the RPN tells the network where to look in the input image for object detection. An innovative anchor-box method is introduced to avoid enumerating images or filters at multiple scales. The anchor-box method is based on a pyramid of anchors. The box-classification and box-regression are performed with the help of anchor boxes of different scales and aspect ratios. The feature map belongs to a single scale, and the filters are of a single size. Faster R-CNN also uses multi-scale anchors for sharing information without any additional cost. Faster R-CNN produces better results on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO datasets than other networks.

A novel residual learning network structure called ResNet [15] was invented for the learning of networks that are significantly deeper than all other networks used previously.
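The anchor scheme described above, with its 2k classification scores and 4k regression coordinates for the k anchors at each sliding-window position, can be sketched as follows; the scales and aspect ratios are illustrative choices rather than the chapter's exact values.

```python
import numpy as np

def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes centered at
    (cx, cy), each as (x0, y0, x1, y1), keeping the anchor area equal
    to scale**2 while varying the aspect ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(1.0 / r)   # width shrinks as ratio grows
            h = s * np.sqrt(r)         # height grows with ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

anchors = make_anchors(32, 32)         # k = 9 anchors at this position
k = len(anchors)
cls_scores = np.zeros(2 * k)           # 2k object / not-object scores
box_deltas = np.zeros(4 * k)           # 4k box regression coordinates
```

Each sliding-window position gets the same k anchors shifted to its own center, which is how the RPN covers a range of scales and aspect ratios with single-size filters.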
the final convolutional layer to train position-sensitive score maps.

A novel modularized architecture, the residual learning network with next dimension (ResNeXt) [17], outperforms 101-layer, 152-layer and 200-layer ResNets on image classification. The ResNeXt network is built by repeating a building block that aggregates a group of transformations with the same topology. ResNeXt results in a regular, multi-branch network that has only a small number of hyper-parameters, such as width, filter sizes and strides, to initialize. The method uses a new dimension called cardinality, which defines the size of the set of transformations, in addition to width and depth. Increasing the cardinality of the network enhances the accuracy of image classification and is more efficient than going deeper. The network design becomes complex when more layers get involved and the number of hyper-parameters (width, filter sizes, strides, etc.) grows. The VGG network contains an efficient method of building a deep architecture by stacking blocks of the same shape. ResNeXt inherits this feature from VGG and stacks modules of identical topology; this optimized constraint narrows the selection of hyper-parameters. The inception module in ResNeXt splits the input into low-dimensional embeddings with 1 × 1 convolutions, transforms them with specific filters such as 3 × 3 and 5 × 5, and merges the results by concatenation.

Feature pyramids are an essential part of recognition systems for object detection at multiple scales, but current research work in object detection has avoided feature pyramids due to their memory and computation cost. The main objective of feature pyramid networks (FPN) [18] is to build feature pyramids with minimum cost. The FPN structure is merged with adjacent connections and enhanced for constructing high-level feature maps at different scales. The network accepts any arbitrary-size input of a single-scale image and produces output feature maps at different levels. The FPN network incorporates two different approaches, bottom-up and top-down. In the bottom-up approach, the feedforward computation of the ConvNet computes the feature map at multiple scales with a scaling step of 2. The feature pyramid defines one pyramid stage for each level, and the output feature map of the last layer of each stage is used to construct the pyramid. The deepest layer of every stage has the most robust features. In the top-down approach, stronger feature maps are created from the higher pyramid levels, and these features are enriched with the features of the bottom-up approach through adjacent connections of the network. In a top-down architecture, predictions are computed at the optimum stage with skip network connections. In a bottom-up architecture, a feature pyramid with a prediction is made individually at all levels of the network.

Instance segmentation is an inspiring task which needs accurate detection of the object image and also segmentation of each instance. It therefore merges object detection with semantic segmentation. Object detection classifies and localizes objects using bounding-box regression, while semantic segmentation classifies each pixel into a set of classes. Mask R-CNN [19] is a simple and general method for object instance segmentation. It effectively detects objects and produces a superior segmentation mask for each instance. The network extends Faster R-CNN by adding a step for predicting the object mask alongside the existing step for bounding-box classification. Mask R-CNN trains the network in a simpler manner and improves on Faster R-CNN with a small modification. The mask branch is a small, fully convolutional network that is applied to every RoI and determines a segmentation mask. Faster R-CNN does not perform pixel-to-pixel alignment in the network, so an RoIAlign layer is proposed for alignment between the network's inputs and output.

RetinaNet [20] is a distinct, integrated network made up of a backbone network along with two subnetworks. In RetinaNet, a feature pyramid network (FPN) is used as the backbone network, which is responsible for calculating the feature map in a convolutional layer of a given input data image. The FPN structure is merged with adjacent connections and enhanced for constructing high-level feature maps at different scales. The network accepts any arbitrary-size input of a single-scale image and produces output feature maps at different levels. The two subnets are used to perform bounding-box classification and regression. RetinaNet is a one-stage dense detection method for object detection. The classification subnet calculates the likelihood of an object being present at each spatial location, for each of the anchors and object classes. The classification subnet is attached at every FPN level, and its parameters are shared among all levels of the pyramid. The design of the classification subnet is simple: it accepts an input feature map with a number of channels, applies 3 × 3 convolutional layers with filters, and uses the ReLU activation function. The classification subnet is deeper, using 3 × 3 convolutional layers, and does not forward its parameter information to the box-regression subnet,
Every adjacent connection of the network combines fea- which is attached to the network parallel to the classi-
ture maps of identical spatial size from both the top- fication subnet and terminates at 4 anchors sequential
86 Deep Learning and Parallel Computing Environment for Bioengineering Systems
output per spatial location. RetinaNet uses a focal loss function to address the class imbalance issue in the one-stage detector. A comparison of CNN methods is shown in Table 5.1.

5.4 CONVOLUTIONAL LAYER
The convolutional layer is made up of numerous convolution filters for calculating different feature maps. Every neuron of a feature map is connected to the adjacent neurons in the preceding layer. Such a local region is denoted as the receptive field of the neuron in the preceding layer. Initially, the input is convolved with a trained filter to obtain the feature map, and the convolved outcome is passed through a nonlinear activation function. The filter is shared across spatial locations of the input for constructing each feature map. The entire set of feature maps is generated by using numerous filters.

5.4.1 Tiled Convolution
In convolutional neural networks, weight sharing methods can significantly reduce the number of parameters. However, this constrains the network in learning different kinds of invariance. Tiled CNN [21] is a method that learns scale and rotational invariant information using tiles and feature maps. Individual filters can be trained within the layer, and the difficult invariances can be learned from pooling layers. The convolution operation is applied to every kth unit, where k is the size of the tile that controls how the network shares its weights. If the size of the tile is 1, the feature maps have identical weights and represent the identical features of a traditional CNN. If the size of the tile is 2, the network provides better results than a traditional CNN [22].

5.4.2 Transposed Convolution
Transposed convolution [6], [9], [23], [24] is a competing model of the convolutional network that converts features to pixels, instead of mapping pixels to features. Transposed convolution is the backward approach of a traditional convolution network. It is also referred to as deconvolution and occasionally as strided convolution [25]. The deconvolution performs filtering and pooling in reverse order of convolution, and is a method of performing unsupervised learning. In the ZFNet architecture, deconvolution is attached to every layer of the convolution network. Initially, an input data image is presented to the convolution. The features are calculated through convolutional and pooling layers. To evaluate the activation function of a convolution, the value of zero is assigned to all other activations. The output feature map of the convolution is passed through deconvolution. In deconvolution, unpooling is applied; rectification and filtering are used to restructure the input data image. The process is repeated until the input space is reached. In recent times, deconvolution has been used for super-resolution [26], visualization [6], semantic segmentation [27], recognition [28], visual question answering [29], and localization [30].

5.4.3 Dilated Convolution
Dilated convolution [31] is a modern improvement of the convolutional neural network that introduces one new parameter to the convolutional layer. This new parameter inserts zeros between the kernel elements. Dilated convolution increases the size of the local receptive field, so the network includes more information about the input image. This characteristic is essential for computations which require a large local receptive field. In general, 1-dilated convolution is extended to 2-dilated convolution, 2-dilated convolution is extended to 3-dilated convolution, and so on; in dilated convolutional layers, the factor of dilation increases rapidly at each layer. The middle feature map FM2 is created from the bottom level feature map FM1 using 1-dilated convolution; FM3 is created from the feature map FM2 using 2-dilated convolution; FM4 is created from the feature map FM3 using 3-dilated convolution, and so on. Dilated convolutions have achieved impressive performance for applications in speech recognition [32], scene segmentation [31], speech synthesis [33], and machine translation [34].

5.4.4 Network-in-Network
Network-in-network (NIN) [7] is an essential deep neural network. In general, a convolutional layer uses a linear filter for producing the feature map and a nonlinear activation function for scanning the input data image. In NIN, a micro-neural network of function approximators is introduced with a multilayer perceptron. The feature maps are found by sliding the micro-networks of the multilayer perceptron and are then forwarded to the next layer of the network. The output feature map from the final multilayer convolution layer is forwarded to the global average pooling layer, and the resultant vector is passed to the softmax classifier.

The nonlinear activation function for the feature map of a linear convolution layer is calculated as

f_{p,q,r}^{l} = \max(W_r^T I_{p,q} + b_r, 0),    (5.3)
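The convolution-plus-ReLU feature-map computation of Eq. (5.3), with an optional dilation factor in the spirit of Sect. 5.4.3, can be sketched in plain NumPy. This is a minimal illustration under our own assumptions — function names, shapes, and the single-channel setting are ours, not taken from any cited network:

```python
import numpy as np

def conv2d_relu(image, kernel, dilation=1):
    """Valid correlation-style 2D convolution followed by ReLU.
    With dilation > 1 the kernel taps are spread apart, enlarging the
    receptive field without adding parameters (Sect. 5.4.3)."""
    kh, kw = kernel.shape
    # effective kernel extent after inserting (dilation - 1) zeros between taps
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    H, W = image.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)   # W^T I + (bias omitted)
    return np.maximum(out, 0)                    # ReLU, as in Eq. (5.3)

img = np.random.randn(8, 8)
k = np.random.randn(3, 3)
print(conv2d_relu(img, k).shape)               # (6, 6)
print(conv2d_relu(img, k, dilation=2).shape)   # (4, 4)
```

The dilated call covers a 5 × 5 input region with only 3 × 3 weights, which is exactly the receptive-field enlargement described above.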
TABLE 5.1
Comparison of CNN methods.

LeNet-5 [2]
  Organization: 5 CONV layers with 3 FC layers
  Features: Structured in six planes
  Benefits: 1. Six different types of feature map are extracted. 2. Implements weight sharing techniques.
  Applications: Handwritten recognition; face recognition; online handwritten recognition; machine-printed character recognition

AlexNet [3]
  Organization: 5 CONV layers with 3 FC layers
  Features: 60 million trained parameters and 650,000 connections
  Benefits: 1. Non-saturating neurons are used for faster training. 2. Effective GPU implementation. 3. Dropout is used for reducing the overfitting problem.
  Applications: Image classification; object detection

ZFNet [6]
  Organization: 5 CONV layers with 3 FC layers
  Features: An innovative technique for understanding intermediary layers and their enhancement
  Benefits: 1. The operation of the convolution layer is executed with a GPU. 2. ZFNet is a multi-layered deconvolutional network. 3. ZFNet is able to handle large training data sets. 4. The dropout technique is used for reducing the parameters.
  Applications: Image classification

NIN [7]
  Organization: 3 mlpconv layers with 1 global average pooling layer
  Features: An innovative method for improving classical discriminability of local data image patches within their local regions
  Benefits: 1. NIN forms micro neural networks to abstract the image patch. 2. The global average pooling method is used for reducing the overfitting problem in the network.
  Applications: Image classification; object detection

R-CNN [8]
  Organization: 5 CONV layers with 1 FC layer
  Features: Object recognition using regions
  Benefits: 1. Localization of objects with the help of the R-CNN architecture. 2. Training a high performance network with a minimum number of annotated data images. 3. Enhances mean average precision.
  Applications: Image classification; object detection

FCN [9]
  Organization: 19 convolutional layers with 3 fully connected layers
  Features: FCN takes any size of input and produces fixed-size output with effective training and interpretation
  Benefits: 1. The complications of pre- and post-processing are not included in FCN. 2. FCN performs layer-by-layer computation using a feedforward and backpropagation algorithm instead of processing images patch-by-patch.
  Applications: Semantic segmentation

VGG [10]
  Organization: 11 weight layers having 8 CONV layers with 3 FC layers, increasing the depth to 19 weight layers having 16 CONV layers with 3 FC layers
  Features: Evaluating the growth of network depth using large-scale images
  Benefits: 1. Enhancement of weight layers from 16 to 19.
  Applications: Image classification

SPP-Net [11]
  Organization: 5 CONV layers with 3 FC layers including spatial pyramid pooling
  Features: SPP-Net can produce a fixed-size representation irrespective of the image size
  Benefits: 1. SPP-Net is one of the most effective techniques in computer vision. 2. SPP-Net calculates feature maps only once from the given input data image and applies them to the pooling layer for producing fixed-size representations. 3. Multi-level pooling in SPP-Net performs faster under object deformations.
  Applications: Image classification; object detection

GoogLeNet [12]
  Organization: 21 CONV layers with 1 FC layer
  Features: Inception module network for better classification and detection of images
  Benefits: 1. The optimization quality of the network is based on the Hebbian principle. 2. The network increases the width and depth of the convolutional neural network with lower cost.
  Applications: Image classification; object detection

Fast R-CNN [13]
  Organization: 13 CONV layers, 4 max pooling layers with 1 RoI pooling layer and several FC layers
  Features: Efficient technique compared to R-CNN for object detection that reaches a higher mean average precision
  Benefits: 1. Training of the network is single-stage by means of a multi-task loss function. 2. Every network layer is updated during training. 3. Better object detection quality via higher mean average precision than R-CNN and SPP-Net. 4. Disk space is not required for storing the object proposal features.
  Applications: Object detection

RPN [14]
  Organization: Classification layer with 2000 scores and regression layer with 4000 output coordinates
  Features: Object recognition using regions
  Benefits: 1. The RPN method performs object detection at different scales and for different aspect ratios. 2. The training of the network is achieved by the backpropagation algorithm and the stochastic gradient descent method.
  Applications: Object detection

Faster R-CNN [14]
  Organization: Merges RPN with Fast R-CNN
  Features: Object recognition using regions
  Benefits: 1. Both RPN and Fast R-CNN share their convolutional features. 2. RPN produces region proposals, and Fast R-CNN detects the object.
  Applications: Object detection

ResNet [15]
  Organization: 34-layer, 50-layer, 101-layer, and 152-layer ResNet
  Features: Learning of networks that are significantly deeper than all other networks used before
  Benefits: 1. Optimization of the deep residual network is easy. 2. The deep residual network provides better accuracy from increased depth than previous networks.
  Applications: Image classification; object detection

R-FCN [16]
  Organization: Convolutional layers with RoI pooling layer
  Features: Region-based object detection
  Benefits: 1. The RoI pooling layer aggregates the output and creates position-sensitive scores for each class. 2. The position-sensitive RoI layer performs discriminatory pooling and combines responses from one out of all score maps.
  Applications: Image classification; object detection

ResNeXt [17]
  Organization: VGG/ResNet method of repeating layers with cardinality 32
  Features: The ResNeXt network is built by iterating a building block that combines a group of transformations of similar topology
  Benefits: 1. Increasing the cardinality of the network enhances the accuracy of image classification and is more efficient than going with a deeper network. 2. ResNeXt inherits the features from VGG.
  Applications: Image classification; object detection

FPN [18]
  Organization: Feature pyramid recognition system
  Features: FPN structure merged with adjacent connections and enhanced for constructing high-level feature maps at different scales
  Benefits: 1. Builds the feature pyramids with minimum cost. 2. The network accepts any arbitrary-size input and produces feature maps at different levels.
  Applications: Object detection

Mask R-CNN [19]
  Organization: Inherits Faster R-CNN with RoIAlign layer
  Features: Effectively detects the objects and also produces a superior segmentation mask for each occurrence
  Benefits: 1. Simple and general method for object instance segmentation. 2. The RoIAlign layer is used for alignment of the network between inputs and output.
  Applications: Object detection; semantic segmentation

RetinaNet [20]
  Organization: Fully convolutional network made up of a ResNet-FPN backbone
  Features: RetinaNet is a one-stage dense detection method for object detection
  Benefits: 1. RetinaNet uses a focal loss function to address the class imbalance issue in the one-stage detector. 2. RetinaNet uses the ReLU activation function.
  Applications: Object detection
where f_{p,q,r}^{l} is the nonlinear activation score of the rth feature map, I_{p,q} is the input patch, W_r is the weight vector, and b_r is the bias term.

The nonlinear activation function for the feature map of a multilayer perceptron convolution layer is calculated as

f_{p,q,r}^{n} = \max((W_r^{n})^T f_{p,q,:}^{n-1} + b_r^{n}, 0),    (5.4)

where n is the number of the layer in the multilayer perceptron. The global average pooling layer has few hyper-parameters, reducing the problem of overfitting and decreasing the computational cost.
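The global-average-pooling step that lets NIN feed a softmax classifier without fully connected layers can be sketched in a few lines of NumPy. The shapes and names below are illustrative assumptions, not taken from the NIN implementation:

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Collapse each HxW feature map to a single scalar by averaging,
    yielding one value per channel with no trainable parameters."""
    return feature_maps.mean(axis=(0, 1))

def softmax(v):
    # numerically stable softmax over the pooled class scores
    e = np.exp(v - v.max())
    return e / e.sum()

fmaps = np.random.randn(6, 6, 10)   # final mlpconv output: one 6x6 map per class
scores = softmax(global_average_pooling(fmaps))
print(scores.shape, round(scores.sum(), 6))  # (10,) 1.0
```

Because each class is tied to one whole feature map, the pooled vector can be interpreted directly as class confidence, which is part of why this layer reduces overfitting relative to large fully connected heads.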
5.4.5 Inception Module
The inception module was introduced in the network GoogLeNet [12]. The network uses variable size filters for constructing the feature maps and estimates the optimal sparse construction by the inception module. The inception module consists of three different convolution operations and only one pooling layer operation. The operation of convolution places a 1 × 1 convolution before the 3 × 3 and 5 × 5 convolutions, growing the depth and width of the CNN without additional cost. The number of network hyper-parameters is decreased to 5 million, which is less when compared to ZFNet (75 million) and AlexNet (60 million).

5.5 POOLING LAYER
Pooling is an essential concept of CNNs. It reduces the computational complexity by decreasing the number of network connections between convolutional layers. In this section, we bring together the details of the latest pooling approaches used in CNNs.

5.5.1 Lp Pooling
Lp pooling is a biologically inspired pooling approach demonstrated on complex cells [35]. The analysis of the Lp pooling approach [36] shows improved generalization compared to max pooling. The Lp pooling can be computed as

O_{p,q,r} = \left( \sum_{(u,v) \in R_{pq}} f_{u,v,r}^{l} \right)^{1/l},    (5.5)

where O_{p,q,r} is the output of the pooling operation at location (p, q), with r denoting the feature map; f_{u,v,r} is the value of the feature at location (u, v); and R_{pq} is the local pooling region. Average pooling (up to normalization) is obtained when l = 1, and max pooling when l → ∞.

5.5.2 Mixed Pooling
Mixed pooling [37] is an efficient methodology constructed as a combination of average and max pooling. The mixed pooling can be computed as

O_{p,q,r} = \theta \max_{(u,v) \in R_{pq}} f_{u,v,r} + (1 - \theta) \frac{1}{|R_{pq}|} \sum_{(u,v) \in R_{pq}} f_{u,v,r},    (5.6)

where θ is a random value selecting either max pooling (1) or average pooling (0). The value is stored during the forward propagation pass and utilized for backward propagation. The results in [37] show that mixed pooling performs better than the other methods.

5.5.3 Stochastic Pooling
Stochastic pooling [38] is an essential method which selects the activations based on a multinomial distribution and makes sure that activations of non-maximal feature maps are also used. The stochastic pooling calculates the activation function within the region and selects a location within the region to set the pooling activation. The problem of overfitting is reduced more in stochastic pooling than in other methods.

5.5.4 Spectral Pooling
Spectral pooling [39] is useful for reducing the dimensionality of the input image. The given input data image is passed to a network. The spectral pooling transforms the input feature map using the discrete Fourier transform (DFT) method and retains only the required part of the spectrum. The spectral pooling finally uses the inverse DFT for mapping the feature map back to the spatial domain. The operation of spectral pooling amounts to low-pass filtering, which performs better than max pooling, and its matrix truncation costs little in CNNs.

5.5.5 Spatial Pyramid Pooling
K. He et al. proposed spatial pyramid pooling (SPP) [11] to produce a fixed-size representation irrespective of input size. The SPP layer pools the input feature map in local regions and outputs a fixed number of bins. The spatial pyramid pooling computation is better than sliding window pooling. If the final pooling layer is replaced with spatial pyramid pooling, the network is able to handle variable size input images.

5.5.6 Multi-Scale Orderless Pooling
The multi-scale orderless pooling method [40] is used to improve the performance of various CNN methods.
CHAPTER 5 Medical Image Analysis With Deep Neural Networks 91
The method computes activation features for the entire image and also for image patches at multiple scales. The multi-scale orderless pooling method acquires the global spatial information. The activation functions of local image patches are combined by an encoding method called VLAD [41], which acquires local and fine-grained features of the image. The novel image representation is attained by combining the features of global information and the VLAD characteristics of local patch information.
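The pooling variants above differ only in how a local region R_pq is reduced to one value. A small NumPy sketch of max, average, and mixed pooling (Eq. (5.6)) over non-overlapping 2 × 2 regions — the function name and fixed region size are our illustrative assumptions:

```python
import numpy as np

def pool2x2(fmap, theta=1.0):
    """Pool non-overlapping 2x2 regions of a feature map.
    theta = 1 gives max pooling, theta = 0 average pooling,
    and values in between give mixed pooling (Eq. (5.6))."""
    H, W = fmap.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(0, H - 1, 2):
        for j in range(0, W - 1, 2):
            region = fmap[i:i + 2, j:j + 2]
            out[i // 2, j // 2] = theta * region.max() + (1 - theta) * region.mean()
    return out

f = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(f, theta=1.0))  # max pooling:     [[5, 7], [13, 15]]
print(pool2x2(f, theta=0.0))  # average pooling: [[2.5, 4.5], [10.5, 12.5]]
```

In mixed pooling proper, θ is drawn at random per pooling operation during training and the drawn value is remembered for the backward pass, as described above.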
5.6 ACTIVATION FUNCTION
An appropriate activation function considerably increases the performance of a CNN for a definite process. In this section, we present the newly used activation functions in CNNs.

5.6.1 Rectified Linear Unit (ReLU)
One of the efficient activation functions is the rectified linear unit (ReLU) [42]. The activation function is determined as

f_{p,q,r} = \max(I_{p,q,r}, 0),    (5.7)

where I_{p,q,r} is the input at location (p, q) on the rth kernel. The ReLU activation function keeps the positive section and reduces the negative section to zero. The max(·) operation of ReLU is more robust than the tanh or sigmoid functions, and ReLU leads the network to sparse representations.

5.6.2 Leaky ReLU
A latent drawback of the ReLU function is that the negative section is set to zero, which makes the unit inactive. This may result in the unit never becoming active and never modifying its weights. The leaky rectified linear unit (LReLU) [43] function is determined as

f_{p,q,r} = \max(I_{p,q,r}, 0) + \theta \min(I_{p,q,r}, 0),    (5.8)

where θ is a predetermined argument in the range (0, 1). LReLU compresses the negative section instead of mapping it to zero.

5.6.3 Parametric ReLU
The PReLU function is determined as

f_{p,q,r} = \max(I_{p,q,r}, 0) + \theta_r \min(I_{p,q,r}, 0),    (5.9)

where θ_r is the trained argument of the rth channel. There is no additional risk of overfitting in PReLU and also no additional cost. It can also be learned in parallel with the other arguments by the backpropagation algorithm.

5.6.4 Randomized ReLU
The randomized leaky rectified linear unit (RReLU) [45] is an extension of Leaky ReLU. In this activation function, the arguments of the negative section are randomly selected during training and then kept constant in testing. The RReLU function is determined as

f_{p,q,r}^{(i)} = \max(I_{p,q,r}^{(i)}, 0) + \theta_r^{(i)} \min(I_{p,q,r}^{(i)}, 0),    (5.10)

where I_{p,q,r}^{(i)} is defined as the input at the location (p, q) on the rth kernel of the ith sample, θ_r^{(i)} is the sampled parameter, and f_{p,q,r}^{(i)} is the output. The RReLU function reduces the problem of overfitting because of the random selection of the parameter.

5.6.5 Exponential Linear Unit (ELU)
An efficient activation function was proposed by D.A. Clevert et al. [46], called the exponential linear unit (ELU), which performs robust training of deep networks and leads to greater classification precision. When compared to other activation functions, ELU introduces a saturation function for handling the negative section. If the unit is deactivated, the activation function decreases, which makes ELU perform faster when noise is present. The ELU function is determined as

f_{p,q,r} = \max(I_{p,q,r}, 0) + \min(\theta (e^{I_{p,q,r}} - 1), 0),    (5.11)

where θ is a predetermined parameter controlling the saturation of the ELU for the negative section.
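The activation functions of Eqs. (5.7)–(5.11) can be compared side by side in NumPy. The θ values below are illustrative choices, not values prescribed by the cited papers:

```python
import numpy as np

def relu(x):                                     # Eq. (5.7)
    return np.maximum(x, 0)

def leaky_relu(x, theta=0.1):                    # Eq. (5.8)
    return np.maximum(x, 0) + theta * np.minimum(x, 0)

def elu(x, theta=1.0):                           # Eq. (5.11)
    return np.maximum(x, 0) + np.minimum(theta * (np.exp(x) - 1), 0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # positives kept, negatives zeroed: [0, 0, 0, 1.5]
print(leaky_relu(x))  # negatives compressed by theta:    [-0.2, -0.05, 0, 1.5]
print(elu(x))         # negatives saturate toward -theta
```

PReLU (Eq. (5.9)) and RReLU (Eq. (5.10)) share the form of `leaky_relu`, with θ learned per channel or drawn at random per sample, respectively.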
5.6.6 Maxout
The maxout function is determined as

f_{p,q} = \max_{r \in [1, R]} I_{p,q,r},    (5.12)

where f_{p,q} is the output of the activation function and the maximum is taken over the R channels at location (p, q).

5.6.7 Probout
J.T. Springenberg et al. [48] proposed a variant of maxout called probout. The operation of maxout is replaced by a probabilistic sampling method. It determines the likelihood of each of the z linear units as

P_x = e^{\theta o_x} / \sum_{y=1}^{z} e^{\theta o_y},    (5.13)

where θ is an essential parameter of the distribution. If a multinomial distribution is considered for P_1, ..., P_z, one of the z units is used to initialize the value of the activation. The probabilities are redefined as

\hat{P}_0 = 0.5, \quad \hat{P}_x = e^{\theta o_x} / \left( 2 \sum_{y=1}^{z} e^{\theta o_y} \right).    (5.14)

The activation function is determined as

f(x) = \begin{cases} 0 & \text{if } x = 0, \\ o_x & \text{otherwise}, \end{cases}    (5.15)

where x is a draw from the multinomial distribution. Probout achieves a balance between preserving the properties of maxout units and enhancing their invariance properties. The computational cost of probout is much greater than for maxout, since it performs extra probability computations.
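The probout probabilities of Eqs. (5.13)–(5.14) amount to a temperature-scaled softmax over the z linear unit outputs, with half the probability mass reserved for a "zero" outcome. A sketch under our own naming assumptions (θ and z are illustrative):

```python
import numpy as np

def probout_probs(o, theta=1.0):
    """Return [P̂_0, P̂_1, ..., P̂_z]: probability 0.5 for the zero outcome
    and half of a softmax over theta * o for the z linear units."""
    s = theta * o
    e = np.exp(s - s.max())        # numerically stable softmax numerator
    p = e / e.sum()                # Eq. (5.13)
    return np.concatenate(([0.5], 0.5 * p))   # Eq. (5.14)

rng = np.random.default_rng(0)
o = rng.normal(size=4)                      # outputs of z = 4 linear units
probs = probout_probs(o)
print(probs.shape, round(probs.sum(), 6))   # (5,) 1.0
x = rng.choice(len(probs), p=probs)         # multinomial draw
activation = 0.0 if x == 0 else o[x - 1]    # Eq. (5.15)
```

The extra exponentials and the sampling step are exactly the additional probability computations that make probout more expensive than plain maxout.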
5.7 APPLICATIONS OF CNN IN MEDICAL IMAGE ANALYSIS
5.7.1 Image Classification
Image classification is the primary domain in which deep neural networks play the most important role in medical image analysis. Image classification accepts the given input images and produces an output classification identifying whether the disease is present or not. E. Kim et al. [49] proposed a CNN method which achieves excellent image classification accuracy in cytopathology. The Inception v3 architecture [50] is one of the best methods for medical data analysis and has approached proficient human performance. The CNN architecture proposed by E. Hosseini-Asl et al. [51] uses three-dimensional convolutions to classify Alzheimer's disease. J. Kawahara et al. [52] proposed a CNN-like architecture used for predicting the development of the brain. In image classification, CNNs are the recent state-of-the-art methods. CNNs trained on natural images show strong performance, approaching the accuracy of human experts. Finally, these statements conclude that CNNs can be adapted to exploit the essential structure of medical images [53].

5.7.2 Object Classification
In general, object classification takes a small portion of the medical image as input and produces two or more categories for classification. Both local and global information are required for better classification. W. Shen et al. [54] proposed a method that uses CNNs which accept multi-scale inputs for classification. J. Kawahara et al. [55] proposed a multi-stream CNN method for detecting skin lesions, having each stream compute with the help of various resolutions of the image. X. Gao et al. [56] proposed a method that merges CNNs and RNNs for classification of images, containing pre-trained CNN filters. This combination performs processing of related information irrespective of the image size. A. Setio et al. [57] proposed an efficient network called multi-stream CNN for classification of chest CT into either a nodule or a non-nodule. Candidates having nine different patches are extracted in the individual streams and finally forwarded to the classification layer. D. Nie et al. [58] proposed a network that accepts three-dimensional MRI images and trains three-dimensional convolutional neural networks to assess whether a patient has high-grade gliomas.

5.7.3 Region, Organ, and Landmark Localization
Organ and landmark localization is an essential step in segmentation or therapy planning. In medical image analysis, localization requires parsing of three-dimensional volumes. The deep neural network treats the three-dimensional space as an alignment of two-dimensional orthogonal planes. D. Yang et al. [59] proposed a regular CNN for processing individual sets of two-dimensional MRI slices. The landmark of the three-dimensional position was determined as a combination of three two-dimensional slices, which produced better classification results. B. De Vos et al. [60] proposed a method that selects the region of interest in anatomical regions such as the descending aorta, heart, and aortic arch by recognizing a bounding box classification. C. Payer et
al. [61] proposed a method for predicting the locations directly; the CNN directly regresses the locations of the landmarks. Every landmark is represented by a Gaussian function, and the network learns directly from the landmark map. Only a few CNNs are able to address the problem of landmark localization, as regions have a three-dimensional image space. Y. Zheng et al. [62] proposed a method that reduces the time complexity by dividing three-dimensional convolutions into three one-dimensional convolutions for detecting carotid artery bifurcation in CT data. B. Kong et al. [63] proposed a method for detection of end-systole and end-diastole frames of the heart using a combination of techniques such as an LSTM-RNN with a CNN. The CNN is one of the essential techniques for localization of regions, landmarks, and organs using two-dimensional classification of images.

5.7.4 Object or Lesion Detection
The detection of objects or lesions in a given input image is an essential part of diagnosis, which is time consuming for clinicians. It is the process of identifying and localizing small lesions in a given input image. Research in this area aims to detect lesions automatically, enhancing the accuracy of detection and reducing the time demanded of human experts. The first object detection method using convolutional neural networks was introduced in 1995; the CNN used different layers for detecting nodules in X-ray images [64]. Most of the research work in deep learning for object detection is performed with CNNs. The CNNs process pixel classification and perform post-processing to get candidates of the object. The three-dimensional information in object detection is processed using multi-stream CNNs [65]. A. Teramoto et al. [66] proposed a multi-stream CNN methodology to combine CT data with positron emission tomography data. Q. Dou et al. [67] proposed a novel three-dimensional CNN technique for the discovery of micro-bleeds in brain MRI.

5.7.5 Organ and Substructure Segmentation
In medical image analysis, such as of the brain or heart, organ and substructure segmentation permits investigation of quantifiable parameters associated with shape and volume. It is an essential part of computer-aided detection of objects. The processing of segmentation allows recognizing the group of voxels which produces either the interior or the contour of the objects. The most distinguished CNN architecture in medical image analysis is U-net [68], which consists of two different feature pathways, namely downsampling and upsampling. The network merges these two pathways between deconvolution and convolution operations. The given input image is processed by the network in the forward pass and results in a segmentation. O. Cicek et al. [69] proposed an extension of U-net for performing full three-dimensional segmentation. F. Milletari et al. [70] proposed a different three-dimensional U-net structure referred to as V-net. The network computes image segmentation using three-dimensional convolutional layers with a Dice coefficient loss function. R. Korez et al. [71] proposed a three-dimensional fCNN architecture used for segmentation of MR images. R. Moeskops et al. [72] proposed an fCNN method to perform segmentation of brain MRI, coronary blood vessels in cardiac CT angiography, and the pectoral muscle in breast MRI.

5.7.6 Lesion Segmentation
In the application of deep learning methods, lesion segmentation merges the tasks of substructure segmentation, organ segmentation, and object detection. K. Kamnitsas et al. [73] and M. Ghafoorian et al. [74] proposed methods to perform precise segmentation with the help of local and global information using multi-stream networks. The U-net architecture also uses local and global information for lesion segmentation. Brosch et al. [75] proposed a method that uses three-dimensional convolutions and skip connections between the first and final layers of the network for segmentation of lesions in brain MRI.

5.8 DISCUSSION
CNNs have accomplished notable performance in a collection of areas such as brain, retinal, chest X-ray, chest CT, breast, cardiac, abdominal and musculoskeletal image analysis. Thus, convolutional neural networks have almost limitless applications. The deep neural networks applied to medical image analysis still face many challenges:
(1) Complexity in training large datasets;
(2) The picture archiving and communication systems (PACSs) are not readily usable for deep learning in medicine;
(3) Attainment of significant classifications for the images;
(4) The time complexity of labeling large datasets is high;
(5) Class imbalance is one of the essential challenges in image classification;
(6) Providing the entire image to the network is not a feasible solution due to memory restrictions.
5.9 CONCLUSIONS
This chapter reviewed the different deep learning convolutional neural network methods. The features, benefits, and applications of convolutional neural network methods were also discussed. Deep learning systems are beneficial in the exploration of disorder classification; tissue, anatomy, lesion and tumor segmentation; lesion and tumor detection and classification; survival and disease activity prediction; as well as image construction and enhancement. We have also presented the future research challenges of deep neural networks.

Author contributions: Both authors contributed equally.

REFERENCES
1. L. Deng, A tutorial survey of architectures, algorithms, and applications for deep learning, APSIPA Transactions on Signal and Information Processing 3 (2014) 1–29, https://doi.org/10.1017/ATSIP.2014.4.
2. Y. LeCun, L. Bottou, Y. Bengio, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324, https://doi.org/10.1109/5.726791.
3. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, in: NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, 2012, pp. 1097–1105.
4. K. Balaji, K. Lavanya, Recent trends in deep learning with applications, in: Cognitive Computing for Big Data Systems Over IoT, vol. 14, Springer International Publishing AG, 2018, pp. 201–222.
5. Y-Lan Boureau, Jean Ponce, Yann LeCun, A theoretical analysis of feature pooling in visual recognition, in: ICML 2010 – Proceedings, 27th International Conference on Machine Learning, 2010, pp. 111–118.
6. Matthew D. Zeiler, Rob Fergus, Visualizing and understanding convolutional networks, CoRR abs/1311.2901 (2013) 1–11, arXiv:1311.2901.
7. Min Lin, Qiang Chen, Shuicheng Yan, Network in network, in: ICLR 2014, 2014, pp. 1–10, arXiv:1312.4400v3.
8. Ross B. Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1–21, arXiv:1311.2524v1.
9. Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully convolutional networks for semantic segmentation, in: Computer Vision and Pattern Recognition – 2015, 2015, pp. 1–10, arXiv:1411.4038v2.
10. Karen Simonyan, Andrew Zisserman, Very deep convolutional networks for large-scale image recognition, in: Computer Vision and Pattern Recognition – 2015, 2015, pp. 1–15, arXiv:1409.1556v6.
11. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, in: Computer Vision – ECCV 2014, vol. 8691, 2014, pp. 1–14.
12. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2015, 2015, pp. 1–9.
13. Ross B. Girshick, Fast R-CNN, in: Proceedings of the International Conference on Computer Vision – ICCV, 2015, pp. 1–9.
14. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in: Computer Vision and Pattern Recognition, 2016, pp. 1–14, arXiv:1506.01497v3.
15. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR, 2016, pp. 770–778.
16. Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: object detection via region-based fully convolutional networks, Advances in Neural Information Processing Systems 29 (2016) 379–387, arXiv:1605.06409v2.
17. Saining Xie, Ross Girshick, Piotr Dollar, Zhuowen Tu, Kaiming He, Aggregated residual transformations for deep neural networks, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2017, 2017, pp. 5987–5995, arXiv:1611.05431v2.
18. Tsung-Yi Lin, Piotr Dollar, Ross B. Girshick, Kaiming He, Bharath Hariharan, Serge J. Belongie, Feature pyramid networks for object detection, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2017, 2017, pp. 936–944.
19. Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick, Mask R-CNN, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2018, 2018, pp. 1–12, arXiv:1703.06870v3.
20. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar, Focal loss for dense object detection, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2018, 2018, pp. 1–10, arXiv:1708.02002v2.
21. Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng, Tiled convolutional neural networks, Advances in Neural Information Processing Systems (2010) 1279–1287.
22. Yi Zheng, Qi Liu, Enhong Chen, Yong Ge, J. Leon Zhao, Time series classification using multi-channels deep convolutional neural networks, in: Web-Age Information Management – WAIM 2014, vol. 8485, 2014, pp. 298–310.
23. Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, Rob Fergus, Deconvolutional networks, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition – CVPR 2010, 2010, pp. 2528–2535.
24. Matthew D. Zeiler, Graham W. Taylor, Rob Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: International Conference on Computer Vision, 2011, pp. 2018–2025.
CHAPTER 5 Medical Image Analysis With Deep Neural Networks 95
25. Francesco Visin, Marco Ciccone, Adriana Romero, Kyle Kastner, Kyunghyun Cho, Yoshua Bengio, Matteo Matteucci, Aaron Courville, ReSeg: a recurrent neural network-based model for semantic segmentation, in: Computer Vision and Pattern Recognition – 2016, 2016, pp. 1–8, arXiv:1511.07053v3.
26. Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2) (2016) 295–307, https://doi.org/10.1109/TPAMI.2015.2439281.
27. Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning deconvolution network for semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2015, 2015, pp. 1520–1528, arXiv:1505.04366v1.
28. Yuting Zhang, Kibok Lee, Honglak Lee, Augmenting supervised neural networks with unsupervised objectives for large-scale image classification, in: International Conference on Machine Learning – ICML 2016, vol. 48, 2016, pp. 612–621, arXiv:1606.06582v1.
29. Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra, Human attention in visual question answering: do humans and deep networks look at the same regions?, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing – EMNLP 2016, 2016, pp. 932–937, arXiv:1606.03556v2.
30. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, Learning deep features for discriminative localization, in: IEEE Conference on Computer Vision and Pattern Recognition – CVPR 2015, 2015, pp. 2921–2929, arXiv:1512.04150v1.
31. Fisher Yu, Vladlen Koltun, Multi-scale context aggregation by dilated convolutions, Proceedings of the IEEE 1 (2016) 1–13, arXiv:1511.07122v3.
32. Tom Sercu, Vaibhava Goel, Dense prediction on sequences with time-dilated convolutions for speech recognition, in: Proceedings of the Advances in Neural Information Processing Systems, 2016, pp. 1–5, arXiv:1611.09288v2.
33. Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu, WaveNet: a generative model for raw audio, Proceedings of the IEEE 1 (2016) 1–5, arXiv:1609.03499.
34. Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu, Neural machine translation in linear time, Computation and Language 1 (2017) 1–9, arXiv:1610.10099v2.
35. Aapo Hyvarinen, Urs Koster, Complex cell pooling and the statistics of natural images, Network: Computation in Neural Systems 18 (2) (2009) 81–100, https://doi.org/10.1080/09548980701418942.
36. Joan Bruna, Arthur Szlam, Yann LeCun, Signal recovery from pooling representations, in: Proceedings of the International Conference on Machine Learning – ICML 2014, 2014, pp. 307–315, arXiv:1311.4025v3.
37. Dingjun Yu, Hanli Wang, Peiqiu Chen, Zhihua Wei, Mixed pooling for convolutional neural networks, in: International Conference on Rough Sets and Knowledge Technology – RSKT 2014, vol. 8818, 2014, pp. 364–375.
38. Matthew D. Zeiler, Rob Fergus, Stochastic pooling for regularization of deep convolutional neural networks, in: Proceedings of the International Conference on Learning Representations – ICLR 2013, 2013, pp. 1–9, arXiv:1301.3557v1.
39. Oren Rippel, Jasper Snoek, Ryan P. Adams, Spectral representations for convolutional neural networks, Advances in Neural Information Processing Systems 2 (2015) 2449–2457, arXiv:1506.03767.
40. Yunchao Gong, Liwei Wang, Ruiqi Guo, Svetlana Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in: Computer Vision and Pattern Recognition – CVPR 2014, 2014, pp. 392–407, arXiv:1403.1840v3.
41. H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, C. Schmid, Aggregating local image descriptors into compact codes, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (9) (2012) 1704–1716, https://doi.org/10.1109/TPAMI.2011.235.
42. V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: International Conference on Machine Learning – ICML'10, 2010, pp. 807–814.
43. Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng, Rectifier nonlinearities improve neural network acoustic models, in: Proceedings of the International Conference on Machine Learning, vol. 28, 2013, pp. 1–6.
44. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: Proceedings of the International Conference on Computer Vision – ICCV 2015, 2015, pp. 1026–1034, arXiv:1502.01852v1.
45. Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li, Empirical evaluation of rectified activations in convolutional network, in: Proceedings of the International Conference on Machine Learning – ICML 2015, 2015, pp. 1–5, arXiv:1505.00853v2.
46. Djork-Arne Clevert, Thomas Unterthiner, Sepp Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), in: Proceedings of the International Conference on Learning Representations – ICLR 2016, 2016, pp. 1–14, arXiv:1511.07289v5.
47. Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio, Maxout networks, in: Proceedings of the International Conference on Machine Learning – ICML 2013, vol. 28(3), 2013, pp. 1319–1327, arXiv:1302.4389v4.
48. Jost Tobias Springenberg, Martin Riedmiller, Improving deep neural networks with probabilistic maxout units, in: Proceedings of the International Conference on Machine Learning – ICML 2014, 2014, pp. 1–10, arXiv:1312.6116v2.
49. Edward Kim, Miquel Corte-Real, Zubair Baloch, A deep semantic mobile application for thyroid cytopathology, Proceedings of the SPIE 9789 (2016) 1–10, https://doi.org/10.1117/12.2216468.
50. A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 7639 (2017) 115–118, https://doi.org/10.1038/nature21056.
51. Ehsan Hosseini-Asl, Georgy Gimel'farb, Ayman El-Baz, Alzheimer's disease diagnostics by a deeply supervised adaptable 3D convolutional network, in: Proceedings of the International Conference on Machine Learning – ICML 2016, 2016, pp. 1–12, arXiv:1607.00556v1.
52. J. Kawahara, C.J. Brown, S.P. Miller, B.G. Booth, V. Chau, R.E. Grunau, J.G. Zwicker, G. Hamarneh, BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment, NeuroImage 146 (2017) 1038–1049, https://doi.org/10.1016/j.neuroimage.2016.09.046.
53. Yipeng Hu, Marc Modat, Eli Gibson, Wenqi Li, Nooshin Ghavami, Ester Bonmati, Guotai Wang, Steven Bandula, Caroline M. Moore, Mark Emberton, Sébastien Ourselin, J. Alison Noble, Dean C. Barratt, Tom Vercauteren, Weakly-supervised convolutional neural networks for multimodal image registration, in: Computer Vision and Pattern Recognition – CVPR 2018, 2018, pp. 1–19.
54. W. Shen, M. Zhou, F. Yang, C. Yang, J. Tian, Multi-scale convolutional neural networks for lung nodule classification, in: Proceedings of the Medical Imaging, vol. 24, 2015, pp. 588–599.
55. Jeremy Kawahara, Ghassan Hamarneh, Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers, in: International Workshop on Machine Learning in Medical Imaging, 2016, pp. 164–171.
56. X. Gao, S. Lin, T.Y. Wong, Automatic feature learning to grade nuclear cataracts based on deep learning, IEEE Transactions on Biomedical Engineering 62 (11) (2015) 2693–2701, https://doi.org/10.1109/TBME.2015.2444389.
57. A.A. Setio, F. Ciompi, G. Litjens, P. Gerke, C. Jacobs, S.J. van Riel, M.M. Wille, M. Nagibullah, C.I. Sanchez, B. van Ginneken, Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks, IEEE Transactions on Medical Imaging 35 (5) (2016) 1160–1169, https://doi.org/10.1109/TMI.2016.2536809.
58. D. Nie, H. Zhang, E. Adeli, L. Liu, D. Shen, 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients, in: Medical Image Computing and Computer-Assisted Intervention, vol. 9901, 2016, pp. 212–220.
59. Dong Yang, Shaoting Zhang, Zhennan Yan, Chaowei Tan, Kang Li, Dimitris Metaxas, Automated anatomical landmark detection on distal femur surface using convolutional neural network, in: International Symposium on Biomedical Imaging – 2015, 2015, pp. 17–21.
60. Bob D. de Vos, Jelmer M. Wolterink, Pim A. de Jong, Max A. Viergever, Ivana Isgum, 2D image classification for 3D anatomy localization: employing deep convolutional neural networks, in: Proceedings of the SPIE Medical Imaging, vol. 9784, 2016, pp. 1–10.
61. C. Payer, D. Stern, H. Bischof, M. Urschler, Regressing heatmaps for multiple landmark localization using CNNs, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, vol. 9901, 2016, pp. 230–238.
62. Yefeng Zheng, David Liu, Bogdan Georgescu, Hien Nguyen, Dorin Comaniciu, 3D deep learning for efficient and robust landmark detection in volumetric data, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, vol. 9349, 2015, pp. 565–572.
63. B. Kong, Y. Zhan, M. Shin, T. Denny, S. Zhang, Recognizing end-diastole and end-systole frames via deep temporal regression network, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, vol. 9902, 2016, pp. 264–272.
64. S.-C.B. Lo, S.-L.A. Lou, Jyh-Shyan Lin, M.T. Freedman, M.V. Chien, S.K. Mun, Artificial convolution neural network techniques and applications for lung nodule detection, IEEE Transactions on Medical Imaging 14 (4) (1995) 711–718, https://doi.org/10.1109/42.476112.
65. Adrian Barbu, Le Lu, Holger Roth, Ari Seff, Ronald M. Summers, An analysis of robust cost functions for CNN in computer-aided diagnosis, Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 6 (11) (2018) 253–258, https://doi.org/10.1080/21681163.2016.1138240.
66. A. Teramoto, H. Fujita, O. Yamamuro, T. Tamaki, Automated detection of pulmonary nodules in PET/CT images: ensemble false-positive reduction using a convolutional neural network technique, Medical Physics 43 (6) (2016) 2821–2827, https://doi.org/10.1118/1.4948498.
67. Qi Dou, Hao Chen, Lequan Yu, Lei Zhao, Jing Qin, Defeng Wang, Vincent C.T. Mok, Lin Shi, Pheng-Ann Heng, Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks, IEEE Transactions on Medical Imaging 35 (5) (2016) 1182–1195, https://doi.org/10.1109/TMI.2016.2528129.
68. Olaf Ronneberger, Philipp Fischer, Thomas Brox, U-net: convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, vol. 9351, 2015, pp. 234–241, arXiv:1505.04597v1.
69. O. Cicek, A. Abdulkadir, S.S. Lienkamp, T. Brox, O. Ronneberger, 3D U-net: learning dense volumetric segmentation from sparse annotation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, vol. 9901, 2016, pp. 424–432.
70. Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi, V-net: fully convolutional neural networks for volumetric medical image segmentation, in: Computer Vision and Pattern Recognition – CVPR 2016, 2016, pp. 1–11, arXiv:1606.04797v1.
71. R. Korez, B. Likar, F. Pernus, T. Vrtovec, Model-based segmentation of vertebral bodies from MR images with 3D CNNs, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, vol. 9901, 2016, pp. 433–441.
72. Pim Moeskops, Jelmer M. Wolterink, Bas H.M. van der Velden, Kenneth G.A. Gilhuijs, Tim Leiner, Max A. Viergever, Deep learning for multi-task medical image segmentation in multiple modalities, in: International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, 2016, pp. 1–9.
73. Konstantinos Kamnitsas, Christian Ledig, Virginia F.J. Newcombe, Joanna P. Simpson, Andrew D. Kane, David K. Menon, Daniel Rueckert, Ben Glocker, Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation, in: Computer Vision and Pattern Recognition – CVPR 2017, vol. 36, 2017, pp. 61–78.
74. M. Ghafoorian, N. Karssemeijer, T. Heskes, I.W.M. van Uder, F.E. de Leeuw, E. Marchiori, B. van Ginneken, B. Platel, Non-uniform patch sampling with deep convolutional neural networks for white matter hyperintensity segmentation, in: International Symposium on Biomedical Imaging, 2016, pp. 1–10.
75. Tom Brosch, Lisa Y.W. Tang, Youngjin Yoo, David K.B. Li, Anthony Traboulsee, Roger Tam, Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation, IEEE Transactions on Medical Imaging 35 (5) (2016) 1229–1239, https://doi.org/10.1109/TMI.2016.2528821.
CHAPTER 6
Deep Convolutional Neural Network for Image Classification on CUDA Platform
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00013-0 99
Copyright © 2019 Elsevier Inc. All rights reserved.
…oped model, GPU technology has been utilized while training.

6.2 IMAGE CLASSIFICATION
Image processing covers digitizing a scene and performing operations on it, or extracting useful information from it. Image classification is a very wide area of image processing. Classification is the process of ensuring that unclassified images are assigned to their class within certain categories [1]. Image classification is a computer vision problem that deals with a lot of basic information from fields such as healthcare, agriculture, meteorology and safety. The human brain can easily classify images, but for a computer this is not easy if the image contains noise. Different methods have been developed to perform the classification operation. General classification procedures can be divided, based on the method used, into two broad categories: supervised classification and unsupervised classification [2].

In supervised classification, the investigator defines homogeneous representations of the information classes in the image. These examples are called training areas. The choice of appropriate training areas is based on the analyst's knowledge of the classification; thus, the analyst is able to control the classification of certain classes [3]. Unsupervised classification reverses the supervised classification process: programs using clustering algorithms are employed to determine statistical groupings or structures in the data. Generally, the analyst specifies how many groups or clusters are to be searched for in the data. In addition to specifying the required number of classes, the analyst can also determine the separation distance between the clusters and the parameters for the variation within each cluster. Unsupervised classification does not start with a predefined class set. Supervised learning has been extremely successful in learning good visual representations that not only give good results on the task they are trained on, but also transfer to other tasks and data sets [3].

Scientists have developed many methods for solving the image classification problem, and these methods compete to achieve perfection in image classification. ImageNet [4] is an image classification competition in which the data to be processed and the number of categories to be classified are increased every year. The competition organized in 2012 was a milestone in image classification. A. Krizhevsky et al. [5] achieved the best result of recent times with the approach they developed using convolutional networks. In the years that followed, the vast majority of those who participated in the competition used CNNs; nearly all of the entries were developed using CNNs. This contest proved the success of the CNN in image classification and made its name widely known. CNNs have become powerful models in feature learning and image classification, and researchers aiming for ever better results have developed many methods using the CNN approach.

Michael et al. [6] call attention to the fact that how to encode data and invariance properties in deep CNN architectures is still an open issue. They propose modifying the standard convolution block so that it transmits more information to the following layer, while keeping the network reasonably stable. The fundamental idea is to exploit both the positive and negative responses in the convolution maps, which is accomplished by adjusting the conventional activation step before information is discarded. Extensive experiments on two established datasets (MNIST and CIFAR-10) demonstrate that the proposed approach performs better than a standard CNN. H. Dou et al. [7] proposed a multi-scale CNN with a depth-decreasing multi-branch structure to address the issue of scale variation in images.

Image processing is now routinely used by a wide range of individuals who have access to digital cameras and computers. With a minimum investment, one can readily enhance contrast, detect edges, quantify intensity, and apply a variety of mathematical operations to images. Although these techniques can be extremely powerful, the average user often digitally manipulates images with abandon, seldom understanding the most basic principles behind the simplest image-processing routines [8]. Although this may be acceptable to some individuals, it often leads to an image that is significantly degraded and does not achieve the results that would be possible with some knowledge of the basic operations of an image-processing system.

One of the main problems in computer vision is the image classification problem, which is concerned with determining the presence of visual structures in an input image. As we know, image classification is a complex process that may be affected by many factors. Because classification results are the basis for many environmental and socioeconomic applications, scientists and practitioners have made great efforts in developing advanced classification approaches and techniques for improving classification accuracy [9].

Classification between objects is an easy task for humans, but it has proved to be a complex problem for machines. The rise of high-capacity computers, …
CHAPTER 6 Deep Convolutional Neural Network for Image Classification on CUDA Platform 101
TABLE 6.1
Different image classification techniques.

Neural network
• Benefits: can be used for classification or regression; able to represent Boolean functions (AND, OR, NOT); tolerant of noisy inputs; instances can be classified by more than one output.
• Assumptions and/or limitations: difficult to understand the structure of the algorithm; too many attributes can result in overfitting; the optimal network structure can only be determined by experimentation.

Support vector machine
• Benefits: models nonlinear class boundaries; overfitting is unlikely to occur; computational complexity is reduced to a quadratic optimization problem; easy to control the complexity of the decision rule and the frequency of error.
• Assumptions and/or limitations: training is slow compared to Bayes and decision trees; difficult to determine optimal parameters when the training data are not linearly separable; difficult to understand the structure of the algorithm.

Fuzzy logic
• Benefits: different stochastic relationships can be identified to describe properties.
• Assumptions and/or limitations: prior knowledge is very important to get good results; precise solutions are not obtained if the direction of the decision is not clear.

Genetic algorithm
• Benefits: can be used in feature classification and feature selection; primarily used in optimization; always finds a "good" solution (not always the best solution); can handle large, complex, nondifferentiable and multimodal spaces; efficient search method for a complex problem space; good at refining irrelevant and noisy features selected for classification.
• Assumptions and/or limitations: computation or development of the scoring function is nontrivial; not the most efficient method, and may find local optima rather than the global one; complications involved in the representation of training/output data.
TABLE 6.2
Comparative analysis of different image classification techniques.

Parameter | Artificial neural networks | Support vector machines | Fuzzy logic | Genetic algorithm
Type of approach | Nonparametric | Nonparametric with binary classifier | Stochastic | Large time series data
Nonlinear decision boundaries | Efficient when the data have only a few input variables | Efficient when the data have more input variables | Depends on prior knowledge for decision boundaries | Depends on the direction of decision
Training speed depends on | Network structure, momentum rate, learning rate, convergence criteria | Training data size, kernel parameter, class separability | Iterative application of the fuzzy integral | Refining irrelevant and noise genes
Accuracy depends on | Number of input classes | Selection of the optimal hyperplane | Selection of cutting threshold | Selection of genes
General performance determined by | Network structure | Kernel parameter | Fused fuzzy integral | Feature selection
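The unsupervised procedure outlined in Sect. 6.2, where the analyst supplies only the number of clusters and a clustering algorithm finds the statistical groupings, can be sketched with a minimal one-dimensional k-means (pure Python with toy values; illustrative only, not a method proposed in this chapter):

```python
# Minimal 1-D k-means sketch of unsupervised classification:
# the analyst supplies only the number of clusters k, and the
# algorithm finds statistical groupings in the data.

def kmeans_1d(values, k, iterations=20):
    # Initialize centers by spreading them over the sorted data.
    data = sorted(values)
    centers = [data[i * len(data) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

if __name__ == "__main__":
    # Two obvious groupings, around 1 and around 10.
    values = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
    centers, clusters = kmeans_1d(values, k=2)
    print(sorted(round(c, 1) for c in centers))  # [1.0, 10.0]
```

No class labels are given anywhere; the "classes" emerge purely from the statistical structure of the data, which is exactly the contrast with the supervised procedure based on training areas.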
the ImageNet challenge. To improve the classification accuracy and achieve competitive ImageNet challenge accuracy, the proposed work considers classification of multiple images into different categories (classes) with more accuracy in classification, reduction in cost and in a shorter time by applying parallelism using a deep neural network model. (http://www.jatit.org/volumes/research-papers/Vol4No11/5Vol4No11.pdf)

6.2.3 Research Gaps
After surveying the literature on the different classification techniques, we can observe that:
1. The accuracy of a current algorithm for the most part relies upon adequate labeled data.
2. Graph based learning shows good accuracy, but high computational complexity.
3. The accuracy of an algorithm depends on the sample selection (selection of the most informative unlabeled samples) and consideration of spectral data.

6.2.4 Research Challenge
1. Getting high classification accuracy while considering the computational complexity of designing a semi-supervised classification algorithm is a challenging task.
2. Designing an effective and robust algorithm by exploiting spectral as well as spatial information.
3. The sample selection method needs to be carefully chosen (e.g., by using active learning) while designing algorithms.

6.2.5 Problem Definition
We strive to classify multiple images into the correct categories (classes) with more accuracy in classification, reduction in cost and in a shorter time by applying parallelism.

6.2.6 Objective
1. Training of the system, followed by testing. The training process implies taking the characteristic properties of the images (from a class) and framing a unique description for a specific class.
2. The testing step means categorizing the test images into the various classes for which the system was trained. This assignment of a class is done based on the partitioning between classes derived from the training features.

6.3 DEEP CONVOLUTIONAL NEURAL NETWORK
An artificial neural network was derived by mimicking the working mechanism of the human brain; its layered network structure was made by simulating the way nerve cells work. This structure has been used for a long time in many artificial intelligence problems and has achieved great results [1]. The foundations of deep learning include artificial neural networks, which were originally produced to provide researchers with a better model of the brain. In the 1980s, hardware constraints did not permit intensive matrix manipulation, so deep networks could not be trained. In the late 1980s, Hinton and LeCun's backpropagation algorithm [2] attracted the attention of the Canadian Institute for Advanced Research, despite the fact that it did not initially resonate in scientific circles, and research groups at the institute's affiliated universities were perhaps the reference groups for deep learning on this issue [3]. Deep learning has since been used in many machine learning problems and has achieved successful results. Recent developments in deep learning enable efficient processing of large numbers of images. Many famous architectures were developed, such as Boltzmann machines, restricted Boltzmann machines, autoencoders and convolutional neural networks. Convolutional networks, in particular, achieve great results in image classification. Along with the progress of technology, almost every field has inevitably benefited from it. Image classification is valuable for many key areas such as medicine, safety and education.

Deep learning is a subfield of machine learning which endeavors to learn high-level abstractions in data by utilizing hierarchical architectures. It is a developing methodology and has been broadly applied in traditional artificial intelligence domains, such as semantic parsing [1], transfer learning [2,3], natural language processing [4], computer vision [5,6], and many more. There are mainly three important reasons for the booming of deep learning today: dramatically increased chip processing abilities (e.g., GPU units), significantly lowered cost of computing hardware, and considerable advances in machine learning algorithms [9]. Various deep learning approaches have been broadly reviewed and discussed in recent years [8–12]. Among those, Schmidhuber et al. emphasized the important inspirations and technical contributions in a historical timeline format, while Bengio examined the challenges of deep learning research and proposed a few forward-looking research directions. Deep networks have been shown to be effective for computer vision tasks because they can extract appropriate features while jointly performing discrimination [9,13]. In recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions [11], deep learning methods have been widely adopted by different researchers and achieved top accuracy scores [7].

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection, and many other domains such as drug discovery and genomics. Deep learning discovers intricate structures in large data sets by using the backpropagation algorithm to indicate how a machine should change the internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shed light on sequential data such as text and speech. The rise of deep learning [10] from its roots to becoming the state-of-the-art of AI has been fueled by three recent trends: the explosion in the amount of training data, the use of accelerators such as graphics processing units (GPUs), and advancements in the design of the models used for training. These three trends have made the task of training deep-layer neural networks with large amounts of data both tractable and useful. Using any of the deep learning frameworks (e.g., Caffe [11], TensorFlow [12], MXNet [7]), users can develop and train their models. Neural network models range in
CHAPTER 6 Deep Convolutional Neural Network for Image Classification on CUDA Platform 105
Neural network models range in size from small (5 MB) to very large (500 MB). Training neural networks can take a significant amount of time, and the goal is to find suitable weights for the different variables in the neural network. Once model training is complete, the model can be used for inference: serving and applying the trained model to new data in domains such as natural language processing, speech recognition, or image classification.

6.4 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. During the past six years, there has been a marked increase in the performance and capabilities of GPUs. A modern GPU is not only a powerful graphics engine, but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU [15]. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. Due to the rapid growth of GPU processing capability, using the GPU as a coprocessor to assist the central processing unit (CPU) in computing massive data has become essential. Computational scientists have long been interested in GPUs due to their relatively low cost per unit of floating-point (FP) performance. Unlike conventional multiprocessors, the GPU's processor cores are specialized for program behaviors common to graphics shaders: thousands of independent threads, each comprising only dozens or hundreds of instructions, performing few memory accesses and producing a small number of output values. Recent advances in hardware and programmability have opened GPUs to a broader community of developers. GPUs' throughput-optimized architectural features can outstrip CPU performance on numerical computational workloads, depending on how well the workload matches the computational behavior for which the GPU is designed. An important question for many developers is whether they can map particular applications to these new GPUs to achieve significant performance increases over contemporary multicore processors.

A CUDA program includes at least one stage that is executed on either the host (CPU) or a device, for instance, a GPU. The stages that exhibit no data parallelism are executed in host code, while the stages that exhibit a rich amount of data parallelism are executed in device code [4]. A CUDA program is thus a unified source code encompassing both host and device code. The NVIDIA C compiler (nvcc) separates the two during the compilation process. The host code is plain ANSI C; it is compiled with the host's standard C compilers and runs as an ordinary CPU process. The device code is written in ANSI C extended with keywords for labeling data-parallel functions, called kernels, and their associated data structures. The device code is further compiled by nvcc and executed on a GPU device. In cases where no device is available, or where a kernel is more appropriately executed on a CPU, one can also execute kernels on a CPU using the emulation features of the CUDA software development kit (SDK) or the MCUDA tool [Stratton 2008]. In the matrix multiplication example, the entire matrix multiplication computation can be implemented as a kernel where each thread is used to compute one element of the output matrix P. The kernel functions (or, simply, kernels) typically generate a large number of threads to exploit data parallelism. In this example, the number of threads used by the kernel is a function of the matrix dimension. For a 1000 × 1000 matrix multiplication, the kernel that uses one thread to compute one element of P would generate 1,000,000 threads when invoked. It is worth noting that CUDA threads are of much lighter weight than CPU threads [5]. CUDA programmers can assume that these threads take very few cycles to generate and schedule, thanks to efficient hardware support; this is in contrast with CPU threads, which typically require thousands of clock cycles to generate and schedule. CUDA is an extension to the C language that permits GPU code to be written in ordinary C. The code is either targeted at the host processor (the CPU) or at the device processor (the GPU). The host processor launches multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. The GPU has its own internal scheduler that then assigns the kernels to whatever GPU hardware is available. Given that there is adequate parallelism in the task, as the number of streaming multiprocessors (SMs) in the GPU grows, so should the speed of the program [8]. However, this hides an important issue: one has to ask how much of the code can actually run in parallel. The best possible speedup is bounded by the proportion of sequential code. Even given unlimited processing power, with all parallel work completed in zero time, we would still be left with the sequential part of the code.
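This limit on speedup is Amdahl's law. As a quick sketch (plain Python; the function name is ours, not from the chapter), the best-case speedup for a program whose parallelizable fraction is p, run on n processors, is 1/((1 − p) + p/n):

```python
def amdahl_speedup(p, n):
    """Best-case speedup when a fraction p of the work is
    parallelized across n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# With 5% sequential code, even a huge processor count cannot
# push the speedup past 1 / 0.05 = 20.
print(amdahl_speedup(0.95, 10**9))   # approaches 20
print(amdahl_speedup(0.95, 100))     # far less with 100 processors
```

With p = 0.95 the speedup approaches 20 but never exceeds it, which is exactly the point made above about the remaining sequential part of the code.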
Thus, we have to consider from the outset whether we can parallelize the bulk of the heavy work.

NVIDIA is committed to supporting CUDA. Extensive documentation, examples, and tools to help with development are available from its site at http://www.nvidia.com under CUDA Zone. CUDA, unlike its predecessors, has now genuinely started to gain momentum, and it seems likely that a programming language of choice for GPU programming will finally emerge. Given that CUDA-enabled GPUs now number in the millions, there is a huge market waiting for CUDA-enabled applications [21]. Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA created the CUDA parallel computing architecture that now ships in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application development [28]. See Fig. 6.3.

Apart from the device DRAM, CUDA supports several additional types of memory that can be used to increase the CGMA (compute to global memory access) ratio for a kernel. Accessing DRAM is slow and expensive. To overcome this problem, several low-capacity, high-bandwidth memories, both on- and off-chip, are present on a CUDA GPU. If some data is used frequently, CUDA caches it in one of the low-level memories, so the processor does not need to access the DRAM every time. The following figure illustrates the memory architecture supported by CUDA and typically found on NVIDIA cards. The following are the different types of memory used in CUDA [24]:
A. Local Memory. Each streaming processor (SP) uses local memory. All variables declared in a kernel (a function to be executed on the GPU) are saved in local memory.
B. Registers. A kernel may consist of several expressions. During the execution of an expression, values are saved into the registers of the SP.
C. Global Memory. This is the main memory of the GPU. Whenever GPU memory is allocated for variables using the cudaMalloc() function, it comes from global memory by default.
D. Shared Memory. Shared memory is shared by every thread in a block, and is used to reduce latency (memory access delay).
When shared memory is used for a variable, the keyword __shared__ must be prefixed to its declaration, e.g., __shared__ int x;.
E. Constant Memory. Constant memory is also used to reduce latency, but only in those situations where multiple threads have to access the same value; this is how constant memory reduces latency.
F. Texture Memory. Texture memory is again used to reduce latency, and is used in a special case. Consider an image: when a particular pixel is accessed, there is a high chance that its surrounding pixels will be accessed as well. Such groups of values that are accessed together are saved in texture memory.

6.5 TENSORFLOW
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as tensor processing units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms [25]. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, so it was released as an open-source project and has become widely used for machine learning research. TensorFlow uses a single dataflow graph to represent all computation and state in a machine learning algorithm, including the individual mathematical operations, the parameters and their update rules, and the input preprocessing [26]. The dataflow graph expresses the communication between subcomputations explicitly, thus making it easy to execute independent computations in parallel and to partition computations across multiple devices. TensorFlow differs from batch dataflow systems in two respects: the model supports multiple concurrent executions on overlapping subgraphs of the overall graph, and individual vertices may have mutable state that can be shared between different executions of the graph.

TensorFlow [28] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards [14]. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This chapter describes the TensorFlow interface and an implementation of that interface developed at Google [29].

6.6 IMPLEMENTATION
6.6.1 Deep Convolutional Neural Networks
Deep learning. Deep learning is a new area of machine learning research, which has been introduced with the goal of moving machine learning closer to one of its original objectives.

Deep learning is an artificial intelligence technique that imitates the workings of the human brain in processing data and creating patterns for decision making. Deep learning for images simply uses more attributes extracted from the image rather than only its signature; however, this extraction is done automatically in the hidden layers rather than supplied as input (as is the case in a plain NN) [7], as can be seen in Fig. 6.4. The neurons in the first layer pass input data to the network.

FIG. 6.4 Neural network representation.

Similarly, the last layer is called the output layer. The layers in between the input and output layers are called hidden layers.
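The flow just described, an input layer passing data through a hidden layer to an output layer, can be sketched in a few lines of numpy (the layer sizes and random weights are illustrative, not taken from the chapter):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # input layer: 4 input values
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)    # input -> hidden layer
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)    # hidden -> output layer

hidden = sigmoid(W1 @ x + b1)                    # the one hidden layer
output = sigmoid(W2 @ hidden + b2)               # the output layer
print(output.shape)                              # (3,)
```

Stacking more weight/bias pairs between input and output yields the deeper networks the text describes.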
In this example, the network has only one hidden layer, shown in blue [13]. Networks which have many hidden layers tend to be more accurate and are called deep networks; hence, machine learning algorithms which use these deep networks are called deep learning.

A typical convolutional network [9] is a sequence of convolution and pooling pairs, followed by a few fully connected layers. A convolution is like a small neural network that is applied repeatedly, once at each location of its input. As a result, the network layers become much smaller but increase in depth. Pooling is the operation that usually decreases the size of the input image. Max pooling is the most common pooling algorithm, and has proven to be effective in many computer vision tasks.

Suppose we try to teach a computer to recognize images and classify them into one of these 10 categories [22]; see Fig. 6.5. To do so, we first need to teach the computer what a cat, a dog, a bird, etc., looks like before it is able to recognize a new image. The more cats the computer sees, the better it gets at recognizing cats [30]. This is known as supervised learning. We can carry out this task by labeling the images: the computer will start recognizing patterns present in cat images that are absent from the others, and will start building its own perception. We will use Python and TensorFlow [28] to write the program. TensorFlow is an open-source deep learning framework created by Google that gives engineers granular control over every neuron (known as a "node" in TensorFlow) so that the weights can be adjusted to achieve optimal performance. TensorFlow has many mature libraries (a few of which will be used here for image classification) and an impressive community, so one can find an open-source implementation of essentially any deep learning topic.

6.6.2 Dataset
In this work we have chosen to use the CIFAR-10 dataset, which comprises 60,000 images of size 32 × 32 pixels. The dataset contains 10 classes that are mutually exclusive (do not overlap), with each class containing 6000 images [22]. The images are small, clearly labeled, and noise-free, which makes the dataset ideal for this task and requires considerably less pre-processing. A few images taken from the dataset are presented in Fig. 6.6.

I. CIFAR-100 dataset. This dataset is just like CIFAR-10 [30], except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes of CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). In the following we describe the class and superclass structure of the CIFAR-100 dataset. Every superclass contains five classes. Where the name of the class is plural, the labelers were instructed not to reject images in which multiple instances of the object appear. See Table 6.3.

6.6.3 Implementing an Image Classifier
Since the project specification did not present a concrete problem to solve, a problem first needed to be found and decided on. The problem needs to be not too complex, while still not being too trivial. It requires implementing some kind of image classifier; what kind of images, however, and how specific the image classification should be, e.g., classifying different objects or, more specifically, classifying different types of the same object, remains part of the implementation [15]. A sufficient amount of data of sufficient quality is also needed to be able to train and test the image classifier; what "sufficient" means in numerical terms and in practice falls to the implementation as well. The first task in this part of the project is therefore to find and look at different sources of data sets, then decide on a problem of reasonable complexity for which a sufficient amount of data can be acquired for training and testing the image classifier. After that, the data, of course, needs to be downloaded [16].

The second task is to begin implementing the image classifier. The image classifier will be implemented in both Microsoft CNTK and Google TensorFlow, using TensorFlow as back end, with Keras, a third-party API for deep learning, as front end.
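Returning to the pooling operation described in Sect. 6.6.1: max pooling keeps only the largest value in each group of the input. A numpy sketch for a 2 × 2 window with stride 2 (the window size is an illustrative choice):

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling with stride 2 over a 2D array with even sides."""
    h, w = img.shape
    # Split the image into 2x2 blocks, then take each block's maximum.
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 2, 5, 6],
                [3, 4, 7, 8],
                [9, 1, 2, 0],
                [5, 6, 3, 1]])
print(max_pool_2x2(img))   # each 2x2 block collapses to its maximum
```

The 4 × 4 input shrinks to 2 × 2, which is exactly the size reduction the pooling layer is meant to provide.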
TABLE 6.3
List of classes in CIFAR-100.

Superclass: Classes
aquatic mammals: beaver, dolphin, otter, seal, whale
fish: aquarium fish, flatfish, ray, shark, trout
flowers: orchids, poppies, roses, sunflowers, tulips
food containers: bottles, bowls, cans, cups, plates
fruit and vegetables: apples, mushrooms, oranges, pears, sweet peppers
household electrical devices: clock, computer keyboard, lamp, telephone, television
household furniture: bed, chair, couch, table, wardrobe
insects: bee, beetle, butterfly, caterpillar, cockroach
large carnivores: bear, leopard, lion, tiger, wolf
large man-made outdoor things: bridge, castle, house, road, skyscraper
large natural outdoor scenes: cloud, forest, mountain, plain, sea
large omnivores and herbivores: camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals: fox, porcupine, possum, raccoon, skunk
non-insect invertebrates: crab, lobster, snail, spider, worm
people: baby, boy, girl, man, woman
reptiles: crocodile, dinosaur, lizard, snake, turtle
small mammals: hamster, mouse, rabbit, shrew, squirrel
trees: maple, oak, palm, pine, willow
vehicles 1: bicycle, bus, motorcycle, pickup truck, train
vehicles 2: lawn-mower, rocket, streetcar, tank, tractor

Keras is usable as a front end to TensorFlow [29] today; the process of adding Keras to the TensorFlow core is ongoing as of January 2019 [25], and it will probably be possible to use CNTK as a back end at some point in the future as well [24]. All the different models of the image classifier in the different frameworks will be implemented and developed in the same programming language and development environment, to make the models more comparable. The programming language that will be used for the implementation is Python 3 [32], and the development environment is Microsoft Visual Studio 2015 Enterprise with the Python Tools for Visual Studio plug-in [31] installed, using the same language and IDE throughout [27].

6.6.4 Installation and System Requirements
In order to gather the necessary information to evaluate the system requirements, software and hardware support, and programming language support, the documentation of TensorFlow, Keras, and CNTK was studied. In order to evaluate the ease and speed of installation, the frameworks were downloaded and installed. This is the more subjective part of the evaluation. The aspects that the conclusions are based on are the number of steps required to be able to use the framework, and the perceived ease of following those steps. Written below are the steps the authors used to install each respective framework. First, the development environment needed to be set up. Since the development is to be done in Python, Python 3.5.3 was downloaded and installed from Python's homepage https://www.python.org/. The IDE used was Microsoft Visual Studio 2015 Enterprise [34], which was already installed on the computers used in this study. To be able to use Python in Visual Studio, the Python Tools for Visual Studio (PTVS) extension [33] needed to be installed. To install PTVS, the Visual Studio installation was modified through the Windows Control Panel by adding the PTVS extension. Google TensorFlow [29] was downloaded and installed through Visual Studio, with PTVS, using the built-in tool pip. To be able to use the GPU, TensorFlow's GPU version 0.12.1 was installed; pip handled and installed all Python-related dependencies [21]. When using TensorFlow's GPU version, two additional downloads were required: the NVIDIA CUDA Toolkit 8.0 and the NVIDIA cuDNN v5.1 (CUDA Deep Neural Network) library, which were downloaded from https://developer.nvidia.com/cuda-downloads and https://developer.nvidia.com/cudnn, respectively. The cuDNN dll-file was placed in the CUDA folder created during installation. Keras was downloaded and installed through Visual Studio, with PTVS, using the built-in tool pip. The version installed was 1.2.2. Pip handled and installed all Python-related dependencies; however, the scipy and numpy versions installed through pip were wrong and needed to be downloaded and installed manually.
The correct versions of scipy and numpy needed by Keras were downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/. The downloaded whl-files of the correct versions of scipy and numpy were installed using pip through the Windows Command Prompt [26].

Now we discuss in detail the mapping mechanism between the CUDA programming model and the deep CNN on the GPU.

A. Mapping implementation
The training procedure and the mapping relationship between deep CNN layers and CUDA kernels are shown in Fig. 6.7. Two CUDA kernel functions are designed for the forward and backward propagation of each of the layers, respectively. At the start of each cycle, the forward convolution kernel loads a sample from global memory according to its index, which is the record provided by the CPU; this way training data are fed in continuously. On the CPU side, network outputs need to be copied back to calculate the current value of the loss function after a forward pass, while the learning rate needs to be adjusted after a backward pass. By comparing the present loss with the predefined minimum loss value, a decision whether to jump out of the cycle is made by the CPU at the end of each cycle.

Convolutional layer. Instead of directly using the 2D convolution routine provided by Caffe, here we decompose the data convolution operations into fine-grained tasks and then map them to threads on the GPU.

Pooling layer. In this layer, according to a predefined group size, components in a convolutional result array are divided into different groups. The main objective is finding the max value or calculating the mean value of each group, depending on the chosen sampling method.

Full connection layer. The forward propagation and backward propagation on a single neuron in a full connection layer are shown in Fig. 6.7. In the forward pass, a dot product z between an input vector x and a weight vector ω is computed, and the final output of the neuron, a, is calculated by a = f(z). In the backward pass, the gradient δz(l) is calculated by δz(l) = δa(l) · da/dz, where δa(l) is the dot product between the weight vector ω and the gradient δa(l+1) propagated from the next layer.

Output layer. The mapping mechanism of output layers to CUDA kernels is similar to that of a full connection layer, in general. It likewise maps operations on the same neuron to one thread when applying the softmax function as the activation function, considering that communications among threads become more frequent under this condition.

FIG. 6.7 DCNN training on GPU. Green parts are CUDA kernel functions executed on GPU, blue parts are operations executed on CPU.
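The per-neuron arithmetic of the full connection layer can be sketched in numpy for a sigmoid activation f (the input and weight values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([0.1, 0.4, -0.2])   # weight vector omega

# Forward pass: dot product z, then activation a = f(z)
z = np.dot(w, x)                 # z = -0.75 here
a = sigmoid(z)

# Backward pass: for the sigmoid, da/dz = a * (1 - a);
# delta_a would come from the next layer (set to 1.0 for illustration)
delta_a = 1.0
delta_z = delta_a * a * (1 - a)
print(z, a, delta_z)
```

This is one neuron's worth of work; the mapping described above assigns such a computation to one CUDA thread.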
6.6.5 Algorithms
Gradient descent is the dominant method used to train deep learning models. The proposed work contains the following two algorithms, which are used for training:
1. Stochastic gradient descent
2. Mini-batch gradient descent
All classifiers are trained using stochastic gradient descent with one of three loss functions: perceptron, hinge, or logistic. For each label, a binary classifier is trained, and an image is classified with the label corresponding to the largest "score". The parameters of gradient descent include the number of training iterations and the learning step. Finally, the mini-batch gradient descent (MBGD) algorithm is improved with the compute unified device architecture (CUDA) multi-streaming technique, which further speeds up network training on the CIFAR-10 dataset in the GCN framework compared to the state-of-the-art TensorFlow framework [17].

6.7 RESULT ANALYSIS
6.7.1 Neural Networks in TensorFlow
The graph containing a neural network (see Fig. 6.8) should contain the following steps:
1. The input datasets: the training dataset and labels, the test dataset and labels (and the validation dataset and labels).
2. The test and validation datasets can be placed inside a tf.constant(), while the training dataset is placed in a tf.placeholder() so that it can be fed in batches during training (stochastic gradient descent).
3. The neural network model with all of its layers. This can be a simple fully connected neural network consisting of only 1 layer, or a more complicated neural network consisting of 5, 9, 16, etc., layers.
4. The weight matrices and bias vectors, of proper size and initialized to their initial values. (One weight matrix and bias vector per layer.)
5. The loss value: the model has as output the logit vector (estimated training labels), and by comparing the logits with the actual labels we can calculate the loss value (with the softmax with cross-entropy function). The loss value is an indication of how close the estimated training labels are to the actual training labels, and it will be used to update the weight values.
6. An optimizer, which will use the calculated loss value to update the weights and biases with backpropagation.

6.7.2 Understanding the Original Image Dataset
The original one-batch data is a 10,000 × 3072 matrix expressed as a numpy array. The number of rows, 10,000, indicates the number of data samples. As stated in the CIFAR-10/CIFAR-100 dataset description, each row vector (of size 3072) represents a color image of 32 × 32 pixels. Since this project is going to use a CNN for the classification tasks, the original row-vector form is not appropriate. In order to feed image data into a CNN model, the dimensions of the input tensor should be either (width × height × num_channel) or (num_channel × width × height). It depends on our choice; we select the first form, because it is the default choice in TensorFlow's CNN operations [20]. How do we reshape into such a form? The row vector for an image has exactly the right number of elements, since 32 · 32 · 3 = 3072. In order to reshape the row vector into (width × height × num_channel) form, two steps are required. The first step is to use the reshape function, and the second step is to use the transpose function in numpy.
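These two steps can be sketched with a small fake batch (two rows of 3072 synthetic values standing in for real CIFAR-10 rows):

```python
import numpy as np

# A fake CIFAR-10 batch of 2 rows, each 32 * 32 * 3 = 3072 values
batch = np.arange(2 * 3072).reshape(2, 3072)

# Step 1: reshape each row into (num_channel, width, height);
# the last dimension must be given explicitly.
images = batch.reshape(2, 3, 32, 32)

# Step 2: move the channel axis to the end with transpose(0, 2, 3, 1),
# the batched analogue of transpose(1, 2, 0) for a single image.
images = images.transpose(0, 2, 3, 1)
print(images.shape)   # (2, 32, 32, 3)
```

The result is in the (width × height × num_channel) form that TensorFlow's CNN operations and matplotlib expect.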
By definition from the numpy official documentation, reshape transforms an array to a new shape without changing its data. Here, the phrase "without changing its data" is important, since we do not want to damage the data [21]. The reshape operation should be delivered in three more detailed steps. The following describes the logic behind it:
1. Divide the row vector into 3 pieces, where each piece means a color channel. The resulting array is a 3 × 1024 matrix, which makes 10,000 × 3 × 1024 tensors in total.
2. Divide each of the 3 pieces further by 32, which is the width and height of an image. This results in 3 × 32 × 32, making 10,000 × 3 × 32 × 32 tensors in total. In order to realize this logical concept in numpy, reshape should be called with the following arguments: (10,000, 3, 32, 32). As we have noticed, the reshape function doesn't automatically divide further when the third value (32, width) is provided; we need to explicitly specify the value for the last dimension (32, height). See Fig. 6.9.
This is not the end of the story yet. Now, one image data point is represented in (num_channel, width, height) form. However, this is not the shape TensorFlow and matplotlib are expecting; they are expecting a different shape, namely (width, height, num_channel), instead. So it is required to swap the order of the axes, and this is where transpose comes in. The transpose function can take a list of axes, where each value specifies an index of a dimension we want to move. For example, calling transpose with argument (1, 2, 0) on a numpy array of shape (num_channel, width, height) will return a new numpy array of shape (width, height, num_channel). See Fig. 6.10.

6.7.3 Understanding the Original Labels
The label data is just a list of 10,000 numbers ranging from 0 to 9, which correspond to each of the 10 classes in CIFAR-10:
airplane : 0
automobile : 1
bird : 2
cat : 3
deer : 4
dog : 5
frog : 6
horse : 7
ship : 8
truck : 9
Code 1 defines a function to return a handy list of image categories. This function will be used in the production phase: because the predicted output is a number, it should be converted to a string so humans can read it. See Fig. 6.11. The display_stats function defined below answers some questions about a given batch of data [23].
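A sketch of what such a category-list helper looks like (mirroring the role of the chapter's Code 1; the function name load_label_names is our own):

```python
def load_label_names():
    """Return the CIFAR-10 category names, indexed by label number."""
    return ['airplane', 'automobile', 'bird', 'cat', 'deer',
            'dog', 'frog', 'horse', 'ship', 'truck']

# Convert a predicted label number into a human-readable string.
print(load_label_names()[6])   # a prediction of 6 reads as "frog"
```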
"What are all possible labels?" "What is the range of values in the image data?" "Are the labels in order or random?" See Fig. 6.12.

We have tried the third batch and its 7000 images. As a result, Fig. 6.13 shows that the numbers of image data for each class are about the same.

6.7.4 Implementing Pre-process Functions
One can probably notice that some frameworks/libraries like TensorFlow, numpy, or Scikit-learn provide functions similar to the ones we are going to build. To get the best accuracy out of machine learning algorithms on a dataset, some algorithms require the information to be in a specific form, whereas other algorithms can perform better if the data is set up in a certain way, but not always. Finally, the raw data may not be in the required format to best expose the underlying structure and relationships to the predicted variables. It is important to prepare the available data in such a way that it gives various different machine learning algorithms the best chance on the problem.

Normalize. The normalize function takes data, x, and returns it as a normalized numpy array. Data x can be anything, and it can be an N-dimensional array. In this chapter, it will be a 3D array for an image. The min–max normalization (y = (x − min)/(max − min)) technique is used, but there are other options, too. By applying min–max normalization, the original image data is transformed into the range from 0 to 1 (inclusive). A simple answer to why normalization should be performed is related to activation functions. See Fig. 6.14.

For example, a sigmoid activation function takes an input value and outputs a new value ranging from 0 to 1. When the input value is somewhat large, the output value easily reaches the max value of 1. Similarly, when the input value is somewhat small, the output value easily reaches the min value of 0. See Fig. 6.15.
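A minimal sketch of the min–max normalization described above (the function name and the sample pixel values are illustrative, not taken from the chapter's code):

```python
import numpy as np

def normalize(x):
    """Min-max normalize image data into the range [0, 1]."""
    min_val = np.min(x)
    max_val = np.max(x)
    return (x - min_val) / (max_val - min_val)

# 8-bit pixel intensities in [0, 255] map onto [0, 1].
pixels = np.array([[0, 64], [128, 255]], dtype=np.float32)
print(normalize(pixels))  # smallest value -> 0.0, largest -> 1.0
```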
CHAPTER 6 Deep Convolutional Neural Network for Image Classification on CUDA Platform 115
For another example, the ReLU activation function takes an input value and outputs a new value ranging from 0 to infinity. When the input value is somewhat large, the output value increases linearly. However, when the input value is somewhat small, the output value easily reaches the min value of 0. See Fig. 6.16.

FIG. 6.13 Showing sample image in batch.
FIG. 6.16 ReLU function.

Now, when thinking about the image data, all values originally range from 0 to 255. It appears that when such data is passed to the sigmoid function, the output is almost always 1, and when it is passed into the ReLU function, the output can be very large. When the backpropagation process is performed to optimize the networks, this can lead to exploding/vanishing gradient problems. In order to avoid these issues, it is better to let all the values be around 0 and 1.

6.7.5 Output of the Model
For now, what we need to know is the output of the model. It is a set of probabilities of each class of image based on the model's prediction result. In order to express those probabilities in code, a vector having the same number of elements as the number of classes of the image is needed. For instance, CIFAR-10 provides 10 different classes of image, so we need a vector of size 10 as well. See Fig. 6.17.

Also, our model should be able to compare the prediction with the ground truth label. This means the shape of the label data should also be transformed into a vector of size 10, too; because the label is the ground truth, we set the value 1 for the corresponding element [21]. The one_hot_encode function takes the input, x, which is a list of labels (ground truth). The total number of elements in the list is the total number of samples in a batch. One_hot_encode returns a two-dimensional tensor, where the number of rows is the size of the batch, and the number of columns is the number of image classes. See Fig. 6.18.

Process all the data and save it.
Code 6 below uses the previously implemented functions, normalize and one_hot_encode, to preprocess the given dataset. As depicted in Fig. 6.19, 10% of the data from every batch will be combined to form the validation dataset. The remaining 90% of the data is used as a training dataset. Lastly, there is a testing dataset that is already provided. The code cell below will preprocess all the CIFAR-10 data and save it to an external file [22].

    def _preprocess_and_save(normalize, one_hot_encode, features, labels, filename):
        features = normalize(features)
        labels = one_hot_encode(labels)
        pickle.dump((features, labels), open(filename, 'wb'))

    def preprocess_and_save_data(cifar10_dataset_folder_path, normalize, one_hot_encode):
        n_batches = 5
        valid_features = []
        valid_labels = []
        for batch_i in range(1, n_batches + 1):
            features, labels = load_cfar10_batch(cifar10_dataset_folder_path, batch_i)
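One plausible implementation of the one-hot encoding just described, sketched in numpy (the exact body of the chapter's one_hot_encode is not shown, so this is an assumption consistent with the stated input/output shapes):

```python
import numpy as np

def one_hot_encode(x, n_classes=10):
    """Turn a list of integer labels into a (batch_size, n_classes) one-hot matrix."""
    encoded = np.zeros((len(x), n_classes))
    # Set a single 1 per row, at the column matching the ground-truth label.
    encoded[np.arange(len(x)), x] = 1
    return encoded

print(one_hot_encode([0, 9, 3]))
```

Each row sums to 1, so a batch of three CIFAR-10 labels becomes a 3 x 10 tensor, as described above.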
            np.array(test_features), np.array(test_labels), 'preprocess_training.p')

Code 6: Overall process code

6.7.6 Training a Model Using Multiple GPU Cards/CUDA
Present day workstations may contain numerous GPUs for scientific computation. TensorFlow can use this environment to run the training task concurrently over multiple cards. Training a model in a parallel, distributed fashion requires coordinating the training processes. In what follows, we use the term model replica for one copy of the model training on a subset of the data. Naively using asynchronous updates of model parameters leads to suboptimal training performance, because an individual model replica may be trained on a stale copy of the model parameters. On the other hand, using fully synchronous updates will be as slow as the slowest model replica [18].

On a workstation with numerous GPU cards, each GPU will have comparable speed and contain enough memory to run an entire CIFAR-10 model. Therefore, we design our training framework in the following way:
1. Place an individual model replica on each GPU.
2. Update model parameters synchronously by waiting for all GPUs to finish processing a batch of data.

A diagram of this model is presented in Fig. 6.20. This setup divides a bigger batch of data over the GPUs and requires that all GPUs share the model parameters. A well-known fact is that transferring data to and from GPUs is quite slow [19]. Therefore, we choose to store and update every model parameter on the CPU (see the green box). A fresh set of model parameters is transferred to the GPUs when a new batch of data is to be processed. The GPUs are synchronized in operation: all gradients are gathered from the GPUs and averaged (see the green box), and the model parameters are updated with the gradients averaged over every single model replica.

The variation in training time is relatively low in both frameworks, although the variation is slightly higher using Keras with TensorFlow as back end; the last run on CIFAR-10 using Keras with TensorFlow as back end especially stands out, being 30 seconds from its nearest neighbor, see Table 6.4. Interestingly, the first epoch was consistently the epoch that took the most time to finish, see the Maximum Epoch Time column in the tables. After some testing, and after the results presented in the tables were compiled, we came to the conclusion that the first epochs took more time because we ran the scripts with debugging on in Visual Studio. When we ran the scripts without debugging, the first epochs took approximately the same time as the rest of the epochs.

The train_neural_network function runs the optimization task on a given batch. Because the CIFAR-10 dataset comes with 5 separate batches, and each batch contains different image data, train_neural_network should be run over every batch. This can be done with simple code, as shown in Code 7. It runs the training over 10 epochs for every batch, see Fig. 6.21, and Fig. 6.22 shows the training results.
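The synchronous update scheme above, gathering gradients from all replicas, averaging them, and applying one update to the CPU-held parameters, can be illustrated with a numpy toy (the parameter and gradient values are made up; this is not the chapter's TensorFlow code):

```python
import numpy as np

# Shared model parameters, stored and updated on the CPU.
params = np.zeros(4)

# Gradients as computed by two simulated GPU replicas,
# each on its own slice of the batch.
replica_grads = [np.array([1.0, 2.0, 3.0, 4.0]),   # from GPU 0
                 np.array([3.0, 2.0, 1.0, 0.0])]   # from GPU 1

# Gather and average over all replicas, then perform a single
# synchronized SGD step with learning rate 0.1.
avg_grad = np.mean(replica_grads, axis=0)
params -= 0.1 * avg_grad

print(params)  # [-0.2 -0.2 -0.2 -0.2]
```

Because every replica waits for the averaged update, all GPUs always start the next batch from identical parameters, which is exactly the property the asynchronous scheme lacks.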
TABLE 6.4
Results of Keras/TensorFlow CIFAR-10.

Run (no)   Total time (s)   Mean epoch time (s)   Maximum epoch time (s)
1          3733             93.4                  98
2          3746             93.6                  97
3          3757             94.1                  99
4          3759             94.0                  101
5          3701             92.4                  97
1–5        18692            93.5                  100

FIG. 6.20 CUDA based working model.
tion). The loss is computed on the training and validation sets, and its interpretation is how well the model is doing on these two sets. As opposed to accuracy, loss is not a percentage. It is a sum of the errors made for each example in the training or validation sets. In the case of neural networks, the loss is typically the negative log-likelihood for classification and the residual sum of squares for regression, respectively. Naturally, then, the fundamental objective in a learning model is to reduce (minimize) the loss function's value with respect to the model's parameters, by changing the weight vector values through various optimization techniques, for example, backpropagation in neural networks.

Cross-validation is a method used to assess machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is picked, it may be used in place of k in the reference to the model; for example, k = 10 implies 10-fold cross-validation.

In the k-fold cross-validation method, the dataset is distributed into k subsets, and the method is repeated k times. Each time, one of the k subsets is used as the test set and the remaining k − 1 subsets are put together to form a training set. The average error across all k trials is then calculated. The advantage of this method is that every data point is in a test set exactly once, and is in a training set k − 1 times. The variance of the resulting estimate is reduced as k becomes larger. See Fig. 6.22.

The accuracy of a model is generally determined after the model parameters are learned and fixed, and no more learning occurs. Then the test samples are fed to the model and the number of mistakes (zero–one losses) the model makes is recorded, after comparison with the true targets. Then the percentage of misclassification is computed. See Fig. 6.23.

After getting the trained model, it is very easy to predict whether images from the test dataset are cats or dogs, etc. (with a predicted probability for each). The intuition behind this is that even if a test image is not easy to make a prediction for, the transformations change it so that the model has a higher chance of capturing the dog/cat shape and predicting accordingly. A few of the images have been misclassified due to poor contrast, rectangular rather than square images, or because the dog/cat is in a very small portion of the image. Take a look, for example, at the rectangular image of a dog. When the model tries to predict for this image, it sees just the center of the image (cropping by default is center). Thus, it cannot predict whether the image is of a dog or a cat. From the above it is observed that our machine has predicted the image of the puppy and classified it as a dog, and similarly for the car, classifying it as an automobile.
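The k-fold split and the zero–one accuracy computation described above can be sketched in plain Python (k_fold_splits and accuracy are hypothetical helper names, not functions from the chapter):

```python
def k_fold_splits(n_samples, k):
    """Yield (test_indices, train_indices) pairs for k-fold cross-validation."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield test, train
        start += size

def accuracy(predictions, targets):
    """1 minus the zero-one loss rate: the fraction of predictions matching the targets."""
    mistakes = sum(p != t for p, t in zip(predictions, targets))
    return 1 - mistakes / len(targets)

# Every sample lands in a test fold exactly once and in training folds k - 1 times.
folds = list(k_fold_splits(10, 5))
print([test for test, _ in folds])   # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(accuracy([1] * 952 + [0] * 48, [1] * 1000))  # 0.952
```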
A learning curve is a plot of the training and test losses as a function of the number of iterations. These plots are very useful to visualize the train/validation losses and validation accuracy. The learning curve of the model is shown in Fig. 6.24. As seen in the figure, the accuracy rate remains constant after 3000 iterations. Following the training, the technique finished with 90% precision in the test phase, using untagged data from the database. It is inevitable that the classification accuracy obtained from our modest amount of data will increase when the quantity of data is increased.

For instance, if the number of test samples is 1000 and the model classifies 952 of those correctly, then the model's accuracy is 95.2%. See Fig. 6.25.

6.8 CONCLUSIONS
The present study investigated whether deep learning could be applied to the classification of images on the CIFAR-10 database. Deep learning technologies are be-
REFERENCES
FIG. 6.24 Learning curve of the model.

1. S.J. Lee, T. Chen, L. Yu, C.H. Lai, Image classification based on the boost convolutional neural network, IEEE Access 6 (2018) 12755–12768.
2. G. Wang, W. Li, M.A. Zuluaga, R. Pratt, P.A. Patel, M. Aertsen, T. Doel, A.L. David, J. Deprest, S. Ourselin, T. Vercauteren, Interactive medical image segmentation using deep learning with image-specific fine-tuning, IEEE Transactions on Medical Imaging (2018).
3. J. Ker, L. Wang, J. Rao, T. Lim, Deep learning applications in medical image analysis, IEEE Access 6 (2018) 937589.
4. H. Wu, S. Prasad, Semi-supervised deep learning using pseudo labels for hyperspectral image classification, IEEE Transactions on Image Processing 27 (3) (2018) 1259–1270.
5. Emine Engil, Ahmet Çinar, Zafer Güler, A GPU-based convolutional neural network approach for image classification, in: Artificial Intelligence and Data Processing Symposium (IDAP), 2017 International, IEEE, 2017, pp. 1–6.
6. Waseem Rawat, Zenghui Wang, Deep convolutional neural networks for image classification: a comprehensive review, Neural Computation 29 (9) (2017) 2352–2449.
7. X. Jia, Image recognition method based on deep learning, in: Control and Decision Conference (CCDC), 2017 29th Chinese, IEEE, May 2017, pp. 4730–4735.
8. A. Isin, S. Ozdalili, Cardiac arrhythmia detection using deep learning, Procedia Computer Science 120 (2017) 268–275.
9. H. Zhang, Z. Zheng, S. Xu, W. Dai, Q. Ho, X. Liang, Z. Hu, J. Wei, P. Xie, E.P. Xing, Poseidon: an efficient communication architecture for distributed deep learning on GPU clusters, arXiv preprint, 2017.
10. A. Awan, K. Hamidouche, J. Hashmi, D.K. Panda, S-Caffe: co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters, in: 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2017.
11. F. Tschopp, J.N. Martel, S.C. Turaga, M. Cook, J. Funke, Efficient convolutional neural networks for pixelwise classification on heterogeneous hardware systems, in: Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, IEEE, April 2016, pp. 1225–1228.
12. Y. Demir, A. Uçar, C. Güzeliş, Moving towards in object recognition with deep learning for autonomous driving applications, 2016.
13. H.R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, R.M. Summers, Improving computer-aided detection using convolutional neural networks and random view aggregation, IEEE Transactions on Medical Imaging 35 (5) (2016) 1170–1181.
14. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, TensorFlow: a system for large-scale machine learning, in: OSDI, vol. 16, 2016, pp. 265–283.
15. B. Alipanahi, A. Delong, M.T. Weirauch, B.J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology 33 (8) (2015) 831.
16. K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
17. S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv preprint, arXiv:1502.03167, 2015.
18. T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems, arXiv preprint, arXiv:1512.01274, 2015.
19. T. Li, Y. Dou, J. Jiang, Y. Wang, Q. Lv, Optimized deep belief networks on CUDA GPUs, in: Neural Networks (IJCNN), 2015 International Joint Conference on, IEEE, July 2015, pp. 1–8.
20. S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, E. Shelhamer, cuDNN: efficient primitives for deep learning, 2014.
21. H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in: Proceedings of the International Conference on Machine Learning, 2009, pp. 609–616.
22. A. Awan, K. Hamidouche, J. Hashmi, D.K. Panda, S-Caffe: co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters, in: 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2017.
23. S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, E. Shelhamer, cuDNN: efficient primitives for deep learning, 2014.
24. NVIDIA Corp., NVIDIA DGX-1, [Online], available: http://www.nvidia.com/object/deep-learning-system.html, 2016.
25. TensorFlow, http://tensorflow.org/.
26. TensorFrames, https://github.com/databricks/tensorframes.
27. TensorFlowOnSpark, https://github.com/yahoo/TensorFlowOnSpark.
28. The CIFAR-10 Dataset, https://www.cs.toronto.edu/~kriz/cifar.html.
29. The MNIST Database, http://yann.lecun.com/exdb/mnist/.
30. Yahoo Flickr Creative Commons 100M.
31. CNTK, [Online], available: http://www.cntk.ai/, Feb. 2017.
32. http://junyelee.blogspot.com/2018/01/deep-learning-with-python.html.
33. http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/215900921.
34. https://www.tutorialspoint.com/cuda/cuda_memories.htm.
CHAPTER 7
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00014-2 123
Copyright © 2019 Elsevier Inc. All rights reserved.
size also made deep learning techniques more popular. When big data emerged, deep learning models also became popular in providing solutions by processing and analyzing big data. To train a large scale deep learning model, high performance computing systems are required. By utilizing a GPU based deep learning framework, the training time is reduced from several weeks to one day, or even to several hours. A typical deep learning model initially undergoes unsupervised training, and then a supervised training methodology is applied for better fine-tuning and to learn features and representations of big data for classification and pattern recognition tasks [2]. Beyond the health care domain, deep learning approaches have shown better performance in natural language processing, speech recognition, computer vision, etc.

The multi-layer perceptron (MLP) network was designed to simulate the process of biological neural networks, which can alter themselves, generate new neural connections and also involve a learning process carried out according to the stimulations raised by neurons. An MLP network comprises an input layer, one or more hidden layers, and an output layer [1]. Training the MLP network is carried out over many epochs; a new input sample is presented in each epoch, and the weights between neurons are adjusted on the basis of the learning algorithm. During the initial training cycle, random weights are assigned between neurons, and in the subsequent training cycles the weights are fine-tuned to reduce the difference between target outputs and network outputs. The gradient descent method is the most widely used training method to reach a minimal error surface.

Adding multiple hidden layers to the MLP network forms a deep architecture which can solve more complex problems, since the hidden layers capture nonlinearity features, as shown in Fig. 7.2. Such deep architectures are called deep neural networks (DNNs). To train these DNNs, more sophisticated approaches were proposed. In short, DNNs can be trained by supervised and unsupervised learning techniques. In supervised learning, target outputs are specified and weights are fine-tuned to reduce the error, which in turn predicts a desired value for the process of classification or regression. In the case of unsupervised learning, the training process is carried out without any target data. Hence, unsupervised learning is most widely used in feature extraction, dimensionality reduction and clustering. Also, for applications in health informatics, it is better to combine an unsupervised learning method with the initial training process of a DNN to extract more abstract features, and then use these features for classification by incorporating a supervised learning algorithm.

In the past, DNNs received less attention due to the requirement of high computational capacity both for training and processing, specifically for some real time applications. Recently, through advancements in hardware technology and the possibility of parallelization, enabled by GPU acceleration, multicore processing and cloud computing, these limitations were overcome, which allowed DNNs to become a popular learning methodology based on artificial intelligence.
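The training cycle described above, starting from random weights and nudging them down the error gradient so that network outputs approach the targets, can be sketched in numpy. For brevity this toy omits hidden layers and uses a plain linear model with a mean squared error loss; the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=2)                              # random initial weights
x = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # input samples
t = np.array([1.0, 0.0, 1.0])                       # target outputs

for epoch in range(200):
    y = x @ w                           # network output for this epoch
    grad = 2 * x.T @ (y - t) / len(t)   # gradient of the mean squared error
    w -= 0.1 * grad                     # gradient descent weight adjustment

print(np.round(x @ w, 2))  # outputs approach the targets [1. 0. 1.]
```

Each pass reduces the difference between the target outputs and the network outputs, which is exactly the role the learning algorithm plays in the MLP description above.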
CHAPTER 7 Efficient Deep Learning Approaches for Health Informatics 125
Although different deep learning approaches are discussed in the literature, a detailed analysis and comparison of the deep learning models is missing, and their construction and training requirements, specifically in the health care domain, are not discussed thoroughly. Thus this chapter deals with an in-depth exploration of various deep learning approaches and their applications in health informatics. The subsequent sections of this chapter discuss the structural design, description and applications of the deep learning approaches, specifically in the health informatics domain. Also the required input features, data format and outcomes of each model in health care applications are discussed. A comparative analysis of various deep learning models is also presented with respect to the health informatics domain.

7.2 DEEP LEARNING APPROACHES
Deep learning architectures are capable of extracting hierarchical representation features automatically, and they use the higher layers for learning intricate features from simpler ones. Hence, these approaches can be incorporated to design an end-to-end network model that learns features automatically from raw inputs and processes them accordingly. In the literature, several DNN architectures are known, such as the deep autoencoder, recurrent neural network, deep belief network (DBN), deep Boltzmann machine (DBM), restricted Boltzmann machine (RBM), convolutional neural network, etc., which are introduced and discussed below.

Deep Autoencoders
An autoencoder is a kind of neural network that extracts features using data driven learning. It has equally many nodes in the input and output layers, and training is carried out to recreate the input vector instead of assigning a target label to it. In general, the number of neurons in a hidden layer is smaller than the number of input layer neurons, so that the encoding of the data is done in a lower-dimensional space and abstract features are extracted. For some applications, high dimensional input data is fed to the input layer, and in that case maintaining one hidden layer is not sufficient to encode all the data. Hence, deep autoencoder architectures are designed by stacking multiple autoencoders on top of each other. Fig. 7.3 shows a simple model of a deep autoencoder with input, output, encoder and decoder layers [3]. These networks also experience the problem of vanishing gradients during the training process. To solve this problem, commonly the network is initially assigned random weight values in a pre-training phase, after which the standard backpropagation algorithm is incorporated to fine-tune the parameters. In the literature, various autoencoders are designed to make the representation learning stronger and more efficient against the deviations occurring in the input space.

parative analysis of manually found and deep learned features.

Recurrent Neural Networks (RNNs)
An RNN is a neural network designed for analyzing streams of data by means of hidden units. In some applications, like text processing, speech recognition and DNA sequence analysis, the output depends on the previous computations. Since RNNs deal with sequential data, they are well suited for the health informatics domain, where enormous amounts of sequential data are available to process [3]. Fig. 7.4 shows a model of an RNN.
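The bottleneck structure described for autoencoders can be sketched in numpy. The weights below are random, purely to show the shapes; actual training would adjust them so the reconstruction x_hat matches x, and the layer sizes are made-up illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_input, n_hidden = 8, 3          # bottleneck: 3 hidden units < 8 inputs

w_enc = rng.normal(size=(n_input, n_hidden))   # encoder weights
w_dec = rng.normal(size=(n_hidden, n_input))   # decoder weights

x = rng.normal(size=n_input)      # one input vector
code = np.tanh(x @ w_enc)         # encoder: 8 -> 3 abstract features
x_hat = code @ w_dec              # decoder: 3 -> 8, same size as the input

print(code.shape, x_hat.shape)    # (3,) (8,)
```

The input and output layers have equally many nodes, while the code layer lives in the lower-dimensional space the text describes.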
to learn. Zheng et al. [12] discussed LSTMs for RUL estimation. Comparisons of three kinds of recurrent network, the simple RNN, the LSTM and the GRU, are done in [13] for the purpose of prognostics of auto engines and fault diagnosis. A health monitoring system based on empirical evaluation using LSTMs was carried out by Zhao et al. [14] to predict tool wear. An integrated architecture, which is a combination of a CNN and an LSTM, was proposed in [15] and outperformed several baseline approaches.

Restricted Boltzmann Machine (RBM)
Hinton et al. [16] designed the restricted Boltzmann machine model, which is a variation of the Boltzmann machine and a kind of neural network. It is a stochastic model with normal input, output and hidden units, restricted to form a bipartite graph [1] as shown in Fig. 7.5. A pair of nodes from each of these units can form a symmetric connection between them. However, the nodes within a unit have no direct connection. To extract more abstract features, multiple RBM models are stacked and the upper layers are fully connected, as in conventional learning models, to differentiate feature vectors [4]. The network learns a probabilistic distribution over its input space. It has the ability to detect patterns even if some data is missing [2]. However, challenges like inactive hidden nodes, class variation and high sensitivity to large scale data sets make the training process more difficult, and tracking of the cost or loss function is also difficult [3]. These issues are overcome by the RBM-based methods proposed by Nair et al. [17] and Li et al. [18]. RBMs provide better solutions for feature extraction, dimensionality reduction and collaborative filtering. Two popular RBM-based models are the deep belief network and the deep Boltzmann machine.

Deep Belief Network
A deep belief network is a kind of deep learning network formed by stacking several RBMs. Fig. 7.6 shows a model of a deep belief network (DBN) [1]. The training process is carried out in a greedy layer-wise manner with weight fine-tuning to abstract hierarchical features derived from the raw input data. The DBN was designed to model the perceived distribution between the input and hidden layers' space, such that directed connections exist among the lower layer nodes and undirected connections exist at the upper layer nodes [4]. The training process is carried out layer-wise by adjusting the weight parameters using contrastive divergence (CD) to establish a balanced estimate of the learning probability. Besides, the conditional probability distribution of the input samples is determined to learn abstract features which are robust and invariant to transformation, noise, etc. [19].

FIG. 7.6 Deep belief network (DBN).
Deep Boltzmann Machine (DBM)
A deep Boltzmann machine is a model with multiple hidden layers and undirected connections between the nodes, as shown in Fig. 7.7. The DBM learns features hierarchically from the raw data, and the features extracted in one layer are applied as hidden variables serving as input to the subsequent layer. As in the DBN, the DBM incorporates a Markov random field for layer-wise pre-training on the large unlabeled data and then provides feedback from the upper layer to the backward layers. By applying the backpropagation method, the training algorithm is fine-tuned [20]. The training process in a DBM needs to be adapted to define the training information, weight initialization and adjustment parameters. It is observed for the DBM that time complexity constraints will occur when setting the parameters to be optimal [4]. A centering optimization method was proposed by Montavon et al. [21] to make the learning mechanism more stable, also for midsized DBMs, for the purpose of designing a generative, faster and discriminative model.

FIG. 7.5 Restricted Boltzmann machine.

Convolutional Neural Network
A neural network which was designed to process multi-dimensional data like image and time series data is called a convolutional neural network (CNN). It includes feature extraction and weight computation during the training process. The name of such networks comes from the convolution operator, which is useful for solving complex operations. The true fact is that CNNs provide automatic feature extraction, which is their primary advantage [2]. The specified input data is initially forwarded to a feature extraction network, and then the resultant extracted features are forwarded to a classifier network, as shown in Fig. 7.8. The feature extraction network comprises stacks of convolutional and pooling layer pairs. A convolutional layer consists of a collection of digital filters that perform the convolution operation on the input data. The pooling layer is used as a dimensionality reduction layer and decides the threshold. During backpropagation, a number of parameters are required to be adjusted, which in turn minimizes the connections within the neural network architecture.

The design of CNNs is stimulated by a biological model of the visual cortex. According to the functional process of the visual cortex, a typical CNN design comprises an ordering of convolution and subsample layers. The purpose of keeping fully connected layers after the last subsampling layer is to perform dimensionality reduction. These fully connected layers are treated like those in traditional neural networks. In most cases, CNNs are used to analyze image data, and hence the operations performed at these layers are within a two-dimensional plane. CNNs are most widely used in health care applications since they have the capability of automatically generating features from time series data and frequency representation images. These features are then forwarded to a classifier network for classification and regression. CNNs are also used in other applications like speech recognition, time series prediction, etc. [3]. Janssens et al. [22] have incorporated CNNs to monitor rotating machinery conditions. The discrete Fourier transform of two accelerometers is applied as the network input. In [23], the authors have designed a CNN for prediction purposes. Here, data from sensors at periodic intervals are considered as input. A regression layer is added since the input data is fed as a continuous value. The authors clearly demonstrated how well the designed CNN based regression model outperforms traditional regression methods such as the multilayer perceptron algorithm, support vector regression, and also relevance vector regression. A deep CNN architecture was designed by Ding and He [24] and by Guo et al. [26] to give solutions for fault diagnosis applications. In Abdeljaber et al. [25], the authors have designed CNNs for vibration analysis. The benefit of this approach is the automatic detection of robust features from the input without the need for additional processing. In Lee et al. [27], the authors have examined the application of CNNs in analyzing noisy auditory signals. Generally, the training process of a deep learning network takes more time. However, during testing, a deep learning network can be fast when run on GPUs.

FIG. 7.7 Deep Boltzmann machine.

Table 7.1 shows the various deep learning methods which were developed over the decade and their com-
TABLE 7.1
Comparison of deep learning methods.

Deep learning algorithm | Description | Strengths | Weaknesses
Denoising autoencoder | Designed to correct corrupted input data values | Better for feature extraction and compression | More computational time; addition of random noise; less scalability to high dimensional data
Sparse autoencoder | Sparsity parameter can be applied to the loss function to construct robust features independent of applications | Linearly separable features can be produced | More computational time, since more forward passes are required for every input sample
Restricted Boltzmann machine (RBM) | Designed as a generative model with layer by layer feature learning | Ability to create patterns even when some data are missing | Training process will be difficult; tracking of the loss or cost function takes more time
Deep Boltzmann machine (DBM) | Architecture is designed with undirected connections between the layers of the network | Robust feature extraction through unsupervised training is possible by allowing a feedback mechanism | More training time is required; joint optimization of parameters will be difficult when a large data set is applied to the network
Deep belief network (DBN) | Designed with undirected connections between the topmost two layers and directed connections between lower layers | Able to extract global features from data; shows better performance for one-dimensional data; good for data dimensionality reduction problems | Slowest training process
Convolutional neural network (CNN) | Deep neural network structure whose interconnections reflect the biological visual cortex | Most widely used in deep learning applications with different variations of training strategies; provides good performance for multi-dimensional data; representational abstract features can be extracted from raw data | Large volume of data is required, with more hyperparameter tuning to extract optimal features
Recurrent neural network (RNN) | Neural network structure to model sequential time series data; a temporal layer is added to learn about complex variations in data | Most widely used to model time series data | Training process is difficult and sometimes affected by vanishing gradients; more parameters have to be updated, which in turn makes the real time prediction process more difficult
130 Deep Learning and Parallel Computing Environment for Bioengineering Systems
parative analysis. Some techniques are more popular DNA methylation states derived from DNA sequences.
and most widely used for health care applications and Compared to deep learning approaches, conventional
for big data analysis. machine learning approaches provide more benefits,
specifically for processing small data sets.
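The CNN-based regression approach described above (a convolutional and pooling feature extractor feeding a regression layer, as in [23]) can be sketched as a minimal NumPy forward pass. The window length, filter sizes and random weights below are illustrative assumptions, not values from the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels):
    """Valid 1-D convolution with ReLU: x (L,), kernels (n_k, k) -> (n_k, L-k+1)."""
    n_k, k = kernels.shape
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (L-k+1, k)
    return np.maximum(windows @ kernels.T, 0.0).T

def avg_pool(feat, size=2):
    """Non-overlapping average pooling along the time axis."""
    n_k, length = feat.shape
    length = (length // size) * size
    return feat[:, :length].reshape(n_k, -1, size).mean(axis=2)

# Toy sensor window: 32 samples of a periodic signal (stand-in for periodic sensor data).
x = np.sin(np.linspace(0.0, 4.0 * np.pi, 32))

filters = rng.normal(size=(4, 5))             # 4 convolutional filters of width 5
features = avg_pool(conv1d_relu(x, filters))  # pooled feature maps, shape (4, 14)

w_out = rng.normal(size=features.size)        # regression layer: one continuous output
y_hat = float(features.ravel() @ w_out)       # predicted continuous target
```

In a trained model the filters and the regression weights would be learned from labeled sensor windows; here they are random, so only the data flow and shapes are meaningful.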
performance results in processing the health records [45]. In the literature, some works designed supervised deep learning networks and also presented unsupervised deep learning models to process electronic health records in terms of learning representations of patient data, which are then evaluated using shallow classifiers. Liu et al. [46] have proposed a multiple-layer CNN for predicting various kinds of heart diseases and also specified its significant benefits. In [47], the authors designed RNNs with hidden units, named Deepcare, to infer current illness states and predict future medical outcomes. They also modified the network with a decay effect to process irregularly timed events. The Deepcare model has also been evaluated in various applications like disease progression modeling, future risk prediction of diabetes, and mental health patient cohorts. Miotto et al. [48] have developed a three-layer stacked denoising autoencoder (SDA) to learn deep patient representations from the EHRs. They also applied this representation to disease risk prediction using random forests as classifiers. This deep representation provides better prediction results than conventional machine learning algorithms (like PCA and k-means). Liang et al. [49] have applied RBMs to learn representations from EHRs and demonstrated better prediction accuracy for various diseases.

Deep learning models are also used to model time series data, such as laboratory test results, with respect to the identification of specific phenotypes. Lipton et al. [54] have designed RNNs with LSTM units to identify patterns from clinical measurements, which are a kind of time series data source. This model was trained to classify various diagnoses from irregularly sampled clinical measurements. The model provided better results compared to traditional machine learning algorithms in terms of processing time series data. In [59], deep architectures based on RNNs were used by the authors for processing free-text patient summary information and for obtaining improved results in removing protected health information from clinical notes. Table 7.2 describes the deep learning models and some applications with respect to health informatics.

Genomics
Deep learning models are widely used in extracting high-level abstract features, providing improved performance over traditional models, increasing interpretability and also for understanding and processing biological data. To predict the splicing action of exons, a fully connected feedforward neural network was designed by Xiong et al. [60]. In recent years, CNNs were applied to DNA datasets directly, without the requirement of defining features a priori [2], [44]. Compared to a fully connected network, CNNs use fewer parameters by applying a convolution operation on the input data space, and the parameters are shared between regions. Hence, large DNA sequence data can be trained using these models and improved pattern detection accuracy can be obtained. Deepbind, a deep architecture based on CNNs, was proposed by Alipanahi et al. [57], which predicts the specificities of DNA- and RNA-binding proteins. CNNs were also used for predicting chromatin marks from a DNA sequence [44]. Angermueller et al. [35] have incorporated CNNs for predicting DNA methylation states. Like CNNs, other deep architectures were also applied for extracting features from raw DNA sequence data and for processing the data.

Mobile Devices
Smartphones and wearable devices which are embedded with sensors play a significant role in health monitoring. By using these devices, direct access to the personal analytics of patients is possible, which can contribute to monitoring their health, facilitating preventive care and also helping in managing ongoing illness [2]. Deep learning plays a crucial role in analyzing these new kinds of data. There exist some challenging issues, like the efficient implementation of deep neural architecture designs on a mobile device for processing data from sensors. To overcome these challenges, several suggestions were proposed in the literature. Lane and Georgiev [61] have proposed a low-power deep neural network, which exploits both the CPU and the digital signal processor of mobile devices without burdening the hardware. They also proposed another deep architecture, DeepX, a software accelerator that can minimize resource usage, which is the major need for mobile adoption. It also enabled large-scale deep learning to execute on mobile devices and outperformed cloud-based off-loading solutions [62]. In [63], the authors have incorporated CNNs and RNNs with LSTM units to predict frozen gait problems in Parkinson's disease patients; these patients struggle to initiate movements such as walking. A deep learning technique was also used to predict poor or good sleep of persons using actigraphy measurements of physical activity during their waking time.

7.4 CHALLENGES AND LIMITATIONS
From the review of deep learning approaches and applications in health informatics, it can be inferred that DBN and autoencoders are better suited for fault diagnosis purposes. CNN and RNN architectures are most widely designed to learn representations from health data.
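The parameter economy that the Genomics discussion attributes to CNNs (local convolution plus weight sharing across the sequence) can be made concrete with a back-of-envelope parameter count for a one-hot encoded DNA window. The layer sizes below are hypothetical, chosen only for illustration:

```python
# Parameter count for a 1000-bp one-hot DNA window (4 channels: A, C, G, T).
# All layer sizes are hypothetical, chosen only for illustration.
seq_len, channels = 1000, 4
hidden_units = 128                 # width of a fully connected hidden layer
n_filters, filter_width = 128, 11  # a convolutional layer of comparable width

# Fully connected: every input position connects to every hidden unit.
dense_params = seq_len * channels * hidden_units + hidden_units   # weights + biases

# Convolutional: one small filter bank, shared across all sequence positions.
conv_params = n_filters * filter_width * channels + n_filters     # weights + biases

print(dense_params, conv_params)  # 512128 vs 5760
```

The shared filter bank needs roughly two orders of magnitude fewer parameters here, which is why large DNA sequence collections can be trained without a proportional growth in model size.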
TABLE 7.2
Deep learning models and applications with respect to specific area and input data set.

Model | Data | Area | Application | Reference
Deep neural network | Genomics | Bioinformatics | Representing gene variants from microarray data | [50]
Deep neural network | Genomics | Bioinformatics | Drug design from the given molecule compounds | [1]
Deep neural network | Genomics | Bioinformatics | Interaction of compound-protein, RNA-binding protein from protein structures, genes/RNA/DNA sequences and molecule compounds | [35]
Stacked AE | Electronic health records | Medical informatics | Detection of distinguished patterns of physiology in clinical time series data | [51]
Stacked AE | Electronic health records | Medical informatics | Designing a structure for measuring sequences of serum uric acid to represent the signatures of gout and acute leukemia | [2]
Stacked sparse AE | Clinical imaging | Medical imaging | Early diagnosis of Alzheimer's disease from brain MRIs | [36]
Stacked sparse AE | Genomics | Bioinformatics | Diagnosis of cancer from gene expression profiles | [32]
Stacked sparse AE | Genomics | Bioinformatics | To predict protein backbones from protein sequences | [2]
Stacked denoising AE | Electronic health records | Medical informatics | To produce unsupervised representations of patients which can be used for the prediction of future clinical events | [48]
Stacked denoising AE | Electronic health records | Medical informatics | To predict future diseases from the available patient clinical data status | [52]
Stacked denoising AE | Clinical imaging | Medical imaging | To diagnose breast nodules and lesions from ultrasound images | [39]
Stacked denoising AE | Clinical imaging | Medical imaging | To detect different modes of variations in Alzheimer's disease from brain MRIs | [37]
RBM | Electronic health records | Medical informatics | Segmentation of multiple sclerosis lesions in multi-channel 3D MRIs | [2]
RBM | Electronic health records | Medical informatics | Automatic diagnosis and disease classification from the patient clinical data status | [49]
RBM | Electronic health records | Medical informatics | Predicting suicide risk of mental health patients through representations of the medical concepts which are embedded in the EHRs | [2]
RBM | Mobile data | Pervasive sensing | To identify photoplethysmography signals for effective health monitoring | [53]
LSTM RNN | Electronic health records | Medical informatics | To effectively diagnose and classify from the clinical measurements of patients in a pediatric intensive care unit | [54]
LSTM RNN | Electronic health records | Medical informatics | To design a memory model with dynamic nature for predictive treatment based on patient history | [47]
LSTM RNN | Electronic health records | Medical informatics | To predict disease levels from longitudinal lab tests | [2]
LSTM RNN | Electronic health records | Medical informatics | Effective collection of details from patient clinical notes | [2]
CHAPTER 7 Efficient Deep Learning Approaches for Health Informatics 133
Even though all these models produce better outcomes in health informatics, expert domain knowledge is an essential requirement for their successful realization. Deep learning architectures depend heavily on representative feature data; however, getting such data for any application is an expensive and time-consuming process. Deep learning approaches are nevertheless desirable for health informatics applications, since they can establish better outcomes compared to traditional machine learning approaches. The key challenges observed from the literature are as follows.

There exist a few questions, like why the authors have chosen a CNN or RNN for some applications, how deep these architectures have to be designed, and so on. Hence, it will be difficult to realize the reasons for certain results produced by a network. Thus, it can be concluded that developments in deep learning approaches are oriented towards using modern GPUs for solving computational problems in parallel. Even though some of the deep learning models are being incorporated for giving solutions to particular diagnostic problems in health informatics, the authors have not given enough proof and clarification for why they have chosen these architectures. From Table 7.2, we can see that deep autoencoders, DBN, RNN and CNN models played a crucial role in producing solutions for fault diagnosis applications, because the researchers chose these models to solve specific fault diagnostic problems. Hence, analyzing the models from various perspectives also gives rise to many open challenges for researchers. Thus, in existing systems, deep learning approaches are applied as normal machine learning techniques without proper evidence of why the network produces good results.

In the previous sections, it was discussed that deep learning approaches require large amounts of training data for better predictive results. Even though many medical organizations have taken steps to convert medical documents from paper to electronic records, the dataset related to a particular disease is always in demand [1]. Hence, deep learning approaches cannot be suited to all types of applications, particularly in the case of diagnosis of diseases which occur rarely. Another concern is that, while training deep learning networks, the overfitting problem will exist when the number of network parameters is comparable to the size of the input data set. Thus the DNN model can memorize the training samples, but cannot generalize to new input samples which it has not already seen. Hence, solutions have to be explored to prevent overfitting problems and to improve in deriving solutions for all input data sets.

When using DNN models in health care applications, the collected data set cannot be used as-is as input to the network; it has to be pre-processed and normalized before being fed to the processing layers. Besides, the initial assumptions on parameter values which affect the design of a DNN, like the size of the input data set and the minimum number of filters to be used for a CNN or its depth, still have to be explored well and validated for standard input. Hence, the process of implementing successful pre-processing and determining the optimal set of hyperparameters can be a challenging problem to be solved. These issues directly affect the training time of the network and also play a crucial role towards designing an effective classification model for biomedical applications.

Another important aspect is that most DNNs can be easily misled. For example, it was observed in [64] that applying small changes to the input samples will cause misclassification of the samples. However, it is noted that most machine learning algorithms are prone to these issues. The feature values can be intentionally set too high or too low to bring about misclassification. Similarly, the decision tree classification process can also be misled by adding changes to the feature set. Thus all kinds of machine learning models are prone to such manipulations. In [65], the authors have proven that there is a possibility of obtaining a meaningless artificial dataset which is divided into a finite set of classes even though they are not classified by the algorithm. It is one of the drawbacks of DNN algorithms and also a limitation of all other traditional learning algorithms as well.

Thus, when large amounts of biomedical data are available, deep learning techniques can better develop and produce good results, particularly in the applications where human interpretation is difficult. Hence, these techniques lead to faster and smarter diagnosis of diseases and also improve the decision-making process. Thus, designing efficient deep learning models to produce good predictive results in the health informatics domain is always a challenging problem for researchers.

7.5 CONCLUSIONS
In recent years, deep learning approaches have dominated pattern recognition and classification through learning. In this work, we have discussed the basic deep learning network models and outlined some of their applications in health informatics. Biomedical data can be efficiently processed by deep learning networks, which in turn increases the predictive power for many specific applications in the health informatics domain. Moreover, several applications of health informatics involve processing medical data as an unstructured source; the sources of unstructured data arise from clinical imaging, medical informatics, bioinformatics, etc. However, electronic health records represent data like the patient's information, pathology, treatment given, diagnosis details, etc., in a structured format. Deep learning approaches handle both representations efficiently to produce better outcomes. Deep learning provides more opportunities toward designing predictive data models, especially in health informatics. However, there exist some technical challenges to be resolved. For example, it is too expensive to get patient and clinical data, and a large fraction of health data sets represent healthy control individuals. Deep learning algorithms mostly depend on large amounts of training data, and the algorithms have been incorporated in applications where the input data set is balanced. Sometimes the network is given fabricated biological data samples. All these challenges act as a barrier to satisfying basic requirements such as data availability and privacy. Advancements made in the development of health care monitoring equipment and diagnosis instruments will have a vital role in future deep learning research. With respect to computational power, in the future more ad hoc hardware platforms with excellent computation power and storage for network models will be available. Thus we conclude that deep learning algorithms have established better outcomes and prediction in health informatics with the integration of advanced parallel processors.

REFERENCES
1. D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, G.Z. Yang, Deep learning for health informatics, IEEE Journal of Biomedical and Health Informatics 21 (1) (2017) 4–21.
2. R. Miotto, F. Wang, S. Wang, X. Jiang, J.T. Dudley, Deep learning for healthcare: review, opportunities and challenges, Briefings in Bioinformatics (2017).
3. S. Khan, T. Yairi, A review on the application of deep learning in system health management, Mechanical Systems and Signal Processing 107 (2018) 241–265.
4. P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, July, pp. 1096–1103.
5. S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio, Contractive auto-encoders: explicit invariance during feature extraction, in: Proceedings of the 28th International Conference on International Conference on Machine Learning, Omnipress, 2011, June, pp. 833–840.
6. H.O.A. Ahmed, M.L.D. Wong, A.K. Nandi, Intelligent condition monitoring method for bearing faults from highly compressed measurements using sparse over-complete features, Mechanical Systems and Signal Processing 99 (2018) 459–477.
7. C. Lu, Z.Y. Wang, W.L. Qin, J. Ma, Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification, Signal Processing 130 (2017) 377–388.
8. S. Tao, T. Zhang, J. Yang, X. Wang, W. Lu, Bearing fault diagnosis method based on stacked autoencoder and softmax regression, in: Control Conference (CCC), 2015 34th Chinese, IEEE, 2015, July, pp. 6331–6335.
9. R. Kishore, K. Reddy, S. Sarkar, M. Giering, Anomaly detection and fault disambiguation in large flight data: a multi-modal deep auto-encoder approach, in: Annual Conference of the Prognostics and Health Management Society, Denver, Colorado, 2016.
10. Z. Chen, W. Li, Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network, IEEE Transactions on Instrumentation and Measurement 66 (7) (2017) 1693–1702.
11. R. Thirukovalluru, S. Dixit, R.K. Sevakula, N.K. Verma, A. Salour, Generating feature sets for fault diagnosis using denoising stacked auto-encoder, in: Prognostics and Health Management (ICPHM), 2016 IEEE International Conference on, IEEE, 2016, June, pp. 1–7.
12. S. Zheng, K. Ristovski, A. Farahat, C. Gupta, Long short-term memory network for remaining useful life estimation, in: Prognostics and Health Management (ICPHM), 2017 IEEE International Conference on, 2017, June, pp. 88–95.
13. M. Yuan, Y. Wu, L. Lin, Fault diagnosis and remaining useful life estimation of aero engine using LSTM neural network, in: Aircraft Utility Systems (AUS), IEEE International Conference on, IEEE, 2016, October, pp. 135–140.
14. R. Zhao, J. Wang, R. Yan, K. Mao, Machine health monitoring with LSTM networks, in: Sensing Technology (ICST), 2016 10th International Conference on, IEEE, 2016, November, pp. 1–6.
15. P. Malhotra, V. TV, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, G. Shroff, Multi-sensor prognostics using an unsupervised health index based on LSTM encoder-decoder, arXiv preprint, arXiv:1608.06154, 2016.
16. G.E. Hinton, T.J. Sejnowski, Learning and relearning in Boltzmann machines, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, 1986, pp. 282–317.
17. V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
18. G. Li, L. Deng, Y. Xu, C. Wen, W. Wang, J. Pei, L. Shi, Temperature based restricted Boltzmann machines, Scientific Reports 6 (2016) 19133.
19. G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7) (2006) 1527–1554.
20. R. Salakhutdinov, H. Larochelle, Efficient learning of deep Boltzmann machines, in: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, March, pp. 693–700.
21. G. Montavon, K.R. Müller, Deep Boltzmann machines and the centering trick, in: Neural Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, 2012, pp. 621–637.
22. O. Janssens, V. Slavkovikj, B. Vervisch, K. Stockman, M. Loccufier, S. Verstockt, et al., S. Van Hoecke, Convolutional neural network based fault detection for rotating machinery, Journal of Sound and Vibration 377 (2016) 331–345.
23. G.S. Babu, P. Zhao, X.L. Li, Deep convolutional neural network based regression approach for estimation of remaining useful life, in: International Conference on Database Systems for Advanced Applications, Springer, Cham, 2016, April, pp. 214–228.
24. X. Ding, Q. He, Energy-fluctuated multiscale feature learning with deep Convnet for intelligent spindle bearing fault diagnosis, IEEE Transactions on Instrumentation and Measurement 66 (8) (2017) 1926–1935.
25. O. Abdeljaber, O. Avci, S. Kiranyaz, M. Gabbouj, D.J. Inman, Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks, Journal of Sound and Vibration 388 (2017) 154–170.
26. X. Guo, L. Chen, C. Shen, Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis, Measurement 93 (2016) 490–502.
27. D. Lee, V. Siu, R. Cruz, C. Yetman, Convolutional neural net and bearing fault analysis, in: Proceedings of the International Conference on Data Mining (DMIN), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2016, January, p. 194.
28. L.A. Pastur-Romay, F. Cedrón, A. Pazos, A.B. Porto-Pazos, Deep artificial neural networks and neuromorphic chips for big data analysis: pharmaceutical and bioinformatics applications, International Journal of Molecular Sciences 17 (8) (2016) 1313.
29. M.K. Leung, A. Delong, B. Alipanahi, B.J. Frey, Machine learning in genomic medicine: a review of computational problems and data sets, Proceedings of the IEEE 104 (1) (2016) 176–197.
30. C. Angermueller, T. Pärnamaa, L. Parts, O. Stegle, Deep learning for computational biology, Molecular Systems Biology 12 (7) (2016) 878.
31. E. Gawehn, J.A. Hiss, G. Schneider, Deep learning in drug discovery, Molecular Informatics 35 (1) (2016) 3–14.
32. R. Fakoor, F. Ladhak, A. Nazi, M. Huber, Using deep learning to enhance cancer diagnosis and classification, in: Proceedings of the International Conference on Machine Learning, vol. 28, 2013, June.
33. R. Ibrahim, N.A. Yousri, M.A. Ismail, N.M. El-Makky, Multi-level gene/MiRNA feature selection using deep belief nets and active learning, in: Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE, IEEE, 2014, August, pp. 3957–3960.
34. S. Kearnes, K. McCloskey, M. Berndl, V. Pande, P. Riley, Molecular graph convolutions: moving beyond fingerprints, Journal of Computer-Aided Molecular Design 30 (8) (2016) 595–608.
35. C. Angermueller, H. Lee, W. Reik, O. Stegle, Accurate prediction of single-cell DNA methylation states using deep learning, BioRxiv, 055715, 2017.
36. S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, D. Feng, Early diagnosis of Alzheimer's disease with deep learning, in: Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on, IEEE, 2014, April, pp. 1015–1018.
37. T. Brosch, R. Tam, Alzheimer's Disease Neuroimaging Initiative, Manifold learning of brain MRIs by deep learning, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Berlin, Heidelberg, 2013, September, pp. 633–640.
38. A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, M. Nielsen, Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Berlin, Heidelberg, 2013, September, pp. 246–253.
39. J.Z. Cheng, D. Ni, Y.H. Chou, J. Qin, C.M. Tiu, Y.C. Chang, C.M. Chen, Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans, Scientific Reports 6 (2016) 24454.
40. V. Gulshan, L. Peng, M. Coram, M.C. Stumpe, D. Wu, A. Narayanaswamy, et al., R. Kim, Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, JAMA 316 (22) (2016) 2402–2410.
41. A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (7639) (2017) 115.
42. E. Choi, M.T. Bahadori, A. Schuetz, W.F. Stewart, J. Sun, Doctor AI: predicting clinical events via recurrent neural networks, in: Machine Learning for Healthcare Conference, 2016, December, pp. 301–318.
43. D.R. Kelley, J. Snoek, J.L. Rinn, Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks, Genome Research 26 (7) (2016) 990–999.
44. J. Zhou, O.G. Troyanskaya, Predicting effects of noncoding variants with deep learning-based sequence model, Nature Methods 12 (10) (2015) 931.
45. H. Schütze, C.D. Manning, P. Raghavan, Introduction to Information Retrieval, vol. 39, Cambridge University Press, 2008.
46. C. Liu, F. Wang, J. Hu, H. Xiong, Risk prediction with electronic health records: a deep learning approach, in: ACM International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 2015, pp. 705–714.
47. T. Pham, T. Tran, D. Phung, S. Venkatesh, Deepcare: a deep dynamic memory model for predictive medicine, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Cham, 2016, April, pp. 30–41.
48. R. Miotto, L. Li, B.A. Kidd, J.T. Dudley, Deep patient: an unsupervised representation to predict the future of patients from the electronic health records, Scientific Reports 6 (2016) 26094.
49. Z. Liang, G. Zhang, J.X. Huang, Q.V. Hu, Deep learning for healthcare decision making with EMRs, in: Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on, IEEE, 2014, November.
50. D. Quang, Y. Chen, X. Xie, DANN: a deep learning approach for annotating the pathogenicity of genetic variants, Bioinformatics 31 (5) (2014) 761–763.
51. Z. Che, D. Kale, W. Li, M.T. Bahadori, Y. Liu, Deep computational phenotyping, in: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, August, pp. 507–516.
52. R. Miotto, L. Li, J.T. Dudley, Deep learning to predict patient future diseases from the electronic health records, in: European Conference on Information Retrieval, Springer, Cham, 2016, March, pp. 768–774.
53. V. Jindal, J. Birjandtalab, M.B. Pouyan, M. Nourani, An adaptive deep learning approach for PPG-based identification, in: Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the IEEE, 2016, August, pp. 6401–6404.
54. Z.C. Lipton, D.C. Kale, C. Elkan, R. Wetzel, Learning to diagnose with LSTM recurrent neural networks, arXiv preprint, arXiv:1511.03677, 2015.
55. A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, M. Nielsen, Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Berlin, Heidelberg, 2013, September, pp. 246–253.
56. A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (7639) (2017) 115.
57. B. Alipanahi, A. Delong, M.T. Weirauch, B.J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology 33 (8) (2015) 831.
58. A. Sathyanarayana, S. Joty, L. Fernandez-Luque, F. Ofli, J. Srivastava, A. Elmagarmid, S. Taheri, Correction of: sleep quality prediction from wearable data using deep learning, JMIR mHealth and uHealth 4 (4) (2016).
59. F. Dernoncourt, J.Y. Lee, O. Uzuner, P. Szolovits, De-identification of patient notes with recurrent neural networks, Journal of the American Medical Informatics Association 24 (3) (2017) 596–606.
60. H.Y. Xiong, B. Alipanahi, L.J. Lee, H. Bretschneider, D. Merico, R.K. Yuen, et al., Q. Morris, The human splicing code reveals new insights into the genetic determinants of disease, Science 347 (6218) (2015) 1254806.
61. N.D. Lane, P. Georgiev, Can deep learning revolutionize mobile sensing?, in: Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications, ACM, 2015, February, pp. 117–122.
62. N.D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, F. Kawsar, DeepX: a software accelerator for low-power deep learning inference on mobile devices, in: Information Processing in Sensor Networks (IPSN), 2016 15th ACM/IEEE International Conference on, IEEE, 2016, April, pp. 1–12.
63. N.Y. Hammerla, S. Halloran, T. Ploetz, Deep, convolutional, and recurrent models for human activity recognition using wearables, arXiv preprint, arXiv:1604.08880, 2016.
64. A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: high confidence predictions for unrecognizable images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.
65. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint, arXiv:1312.6199, 2013.
CHAPTER 8
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00015-4 139
Copyright © 2019 Elsevier Inc. All rights reserved.
MYCIN, in simple words, advised diverse regimens of antibiotic treatment for patients. At that time, AI algorithms relied on manual feature extraction techniques, and later became supervised learning techniques. Later on, in 2015, symbolic research was done in the field of unsupervised learning and its applications in the medical industry.

…encouraged by examining the human brain, which collects multiple signals as input, aggregates them using weights, and subsequently passes them through nonlinear functions to create output signals. Amazing fact – it is said that AI will capture around 40 to 50% of jobs in the coming years.

8.6.3 Deep Neural Network
Deep is a word we have all been reading for a long time. Let us understand why it is called a deep neural network. The early networks, such as the perceptron, HNN, etc., worked on the concept of one input layer and one output layer. But if a network has more than 3 layers, including the input and output layers, it is called a deep neural network. Hence, deep in its rawest form means more than just one hidden layer. In DNNs, every layer of nodes [9] trains on a different set of features based on the previous layer's output. So, the deeper you go into the net, the more complex the features the nodes can recognize [12], because each layer recombines and learns from the features of the previous layer. This is what is called a feature hierarchy. These nets can model complex nonlinear relationships. The major advantage which DNNs hold is the ability to deal with unstructured and unlabeled data, which is actually most of the world's data. Especially in the medical field, most of the data is unstructured and unlabeled, so DNNs are apt for processing this large amount of data. See Fig. 8.3.

So, basically, deep learning networks can take all the raw, unlabeled, unstructured data and cluster it, as well as process it, into similar groups. A simple example is that deep learning can take a billion images and cluster them according to their similarities: dogs in one corner, chips in another, and in the last one, all the pictures of a car. This is the basis of the so-called smart photo albums. Have you ever encountered a situation while browsing the internet wherein the website asks you to prove that you're not a bot by choosing a particular type of photo, such as street signs? Yes, that's what a DNN can do on its own, and much more with extremely complicated data, which the human brain could take as long as a decade to analyze.

8.7 DEEP LEARNING IN MEDICAL IMAGING [5]
8.7.1 Diabetic Retinopathy
Diabetic retinopathy (DR) is a disease of the eye caused by diabetes, leading to blindness as time passes. Among people suffering from diabetes, at least 15% (according to sources) are at risk of vision impairment. Manual screening for DR is tedious and complicated, and in the early stages the disease hardly shows any symptoms, so due to lack of expertise and equipment even clinicians sometimes make mistakes [1]. But thanks to deep learning, automated detection of DR is possible, and with high accuracy, too. A research work on this is presented below [2].

The classification and detection of moderate and worse referable DR on the Messidor-2 dataset and the implementation of a deep convolutional neural network (DCNN) on the EyePACS-1 dataset were conducted by Gulshan et al., where 874 patients provided 1700 images to the Messidor-2 dataset and approximately 10,000 retinal images to EyePACS-1. The authors reported 96.1% sensitivity and 93.9% specificity on Messidor-2, and 97.5% sensitivity and 93.4% specificity on EyePACS-1. Testing on publicly available datasets for fundus classification was conducted by Kathirvel, who trained a DCNN with
CHAPTER 8 Deep Learning and Semi-Supervised and Transfer Learning Algorithms 143
TABLE 8.1
Details of diabetic retinopathy.

Authors | Model | Dataset | Accuracy (acc), sensitivity (sensi) or specificity (spec) (%)
Gulshan et al. | Deep convolutional neural network | EyePACS-1; Messidor-2 | 97.5 sensi & 93.4 spec; 96.1 sensi & 93.9 spec
Kathirvel | CNN with dropout layer | Kaggle-fundus, DRIVE and STARE | 94–96 acc
Pratt et al. | Cu-DCNN library | Kaggle | 75 acc
Haloi et al. | Five-layer CNN | Messidor; ROC | 98 AUC; 97 AUC
Alban et al. | DCNN | EyePACS | 45 acc
Lim et al. | DCNN | DIARETDB1, SiDRP | –
dropout layer techniques as well. The accuracy reported was up to 94–96%.

Haloi claimed sensitivity, specificity, accuracy and area under the curve (AUC) of up to 97%, 96%, 96% and 0.988, respectively, on the Messidor dataset, and an AUC of up to 0.98 on the ROC dataset, by implementing a five-layer CNN [3] with a dropout mechanism for the discovery of early stage DR on the Retinopathy Online Challenge (ROC) [1] and Messidor datasets. Alban used CNNs for detection of DR and also de-noised the angiograph images of EyePACS. A diagnosis of five severity classes was conducted, which provided 79% AUC and 45% accuracy. Lim et al. drew features from identified regions using the proposed method; the feature vector was then passed to a DCNN for classification. The model was realized on the DIARETDB1 and SiDRP datasets. All the above works are summarized in Table 8.1.

8.7.2 Cardiac Imaging
In the arena of cardiac imaging, deep learning has unequivocally shown very satisfying results with great accuracy, especially for calcium score quantification, where MRI and CT scans were the most widely used imaging modalities. Manual detection of CAC in CT scans needs extensive expert interaction, which makes it time-consuming and infeasible for large-scale or epidemiological studies.

8.7.3 Tumor Classification in Homogeneous Breast Tissue
Due to the dielectric contrast between malignant tumors and adipose breast tissue, breast cancer detection has seen remarkable improvements, and this is largely due to deep learning. The research work done in this category of tumor classification is presented below.

The research here involves data from finite difference time domain (FDTD) numerical simulations of tumor models [6] in adipose tissue. To reduce the dimensionality of the FDTD backscatter signals, they are preprocessed and features are extracted. The extracted features are then used as the input to a trained deep learning based classifier. Feature extraction here was performed [6] with the help of principal component analysis (PCA), which avoids the need for handcrafting features for the input signals and hence can generally be applied to any sort of signal. The short-time Fourier transform was also used. Two different deep learning architectures were used here:
1. Regular, or vanilla, feedforward deep neural networks (DNNs),
2. Convolutional neural networks (CNNs).

Below is an extract from the research work by Branislav Gerazov and Raquel C. Conceicao:

"The experiments for optimizing the classifier hyper parameters and assessing their performance were accomplished using 10-fold cross-validation on randomized subsets of the whole dataset, without keeping together data from the same model. To take into account the unbalanced count of the two classes in the dataset, we used stratified folds which preserve the percentage of both classes. The hyper parameters were changed in the following ranges. The length of the feature vector as extracted using PCA was varied in the range: 10–100. For the CNN architecture, we generated spectrograms
using a 213 ps frame length, Hamming window, and 256 Fast Fourier Transform bins."

Table 8.2 shows the original results obtained. They were obtained by first down-sampling and trimming the signals. In the first approach, around 30 PCA components were extracted and used as the input of an RBF kernel SVM classifier.

TABLE 8.2
Classification accuracy obtained for the binary task of detecting tumor malignancy.

Approach | Accuracy
PCA30 + SVM (CT) | 89.20%
DWT + SVM (CSCT) | 91.19%
PCA50 + DNN (2 × 300 + 1) | 92.81%
Spectrograms + CNN (4 × 20 + 9 × 300) | 89.58%
DNN (2 × 300) + SVM | 93.44%

In the final step, they used around 300 outputs of the penultimate layer of the best performing DNN as the input to an SVM classifier. The SVM classifier showed 93.44% accuracy.

8.8 DEVELOPMENTS IN DEEP LEARNING METHODS
Labeled medical or other specialized image data is not always available. What this means here is that there might be a rare disease or a lack of experts, although most deep learning methods rely heavily on supervised learning. To overcome the issue of lack of big data, a shift is required from supervised to unsupervised or semi-supervised learning. In spite of significant efforts, deep learning theories have been unable to provide exhaustive solutions, leaving many questions unanswered, which extends the unlimited opportunity for improvement and growth.

8.8.1 Black Box and Deep Learning
Around 10 years ago, when medical imaging came into existence, it broke out to the world and gave rugged solutions to a lot of unsolved mysteries in the medical industry. We can't deny that medical imaging has genuinely solved multiple problems which were presumed to be impossible to solve. There is still a major issue which deep learning has to overcome, called the black box problem. Even though the math used to build a neural network is fairly direct, comprehending how the output was finally obtained is not as easy. What it means is that the machine learning model gets its input, processes it and identifies patterns, but how the model works and how the processing takes place is utterly complicated. Even the researchers using it do not know the way the model works and processes, or why it provides better results.

8.8.2 Semi-Supervised and Transfer Learning Algorithms
We have had a look at deep learning models, their implementation, and their contribution to the medical image field, etc. Now let us see the developments made by semi-supervised and transfer learning techniques in the field of medical imaging [13].

8.8.2.1 Semi-Supervised Learning
Let's go through the concepts of supervised and unsupervised learning again briefly.

8.8.2.2 Supervised Learning
This kind of learning involves the algorithm learning to assign labels to types of data inputs based on the labels that were provided by a human during the training process.

8.8.2.3 Unsupervised Learning
This algorithm doesn't involve any guidance from the user. It analyzes the data and then sorts out inherent similarities between the input samples.

So, it can be quite obviously presumed that semi-supervised learning is a hybrid version. Whatever challenges are faced in each type, semi-supervised learning provides a win–win situation. Let's go ahead and define semi-supervised clustering then.

Semi-supervised learning makes use of both labeled and unlabeled data. So with the use of some labeled and unlabeled data, the accuracy of the decision boundary becomes much higher. See Fig. 8.4. The advantages of using semi-supervised learning are:
1. Labeled data is often expensive and difficult to find;
2. The model becomes more robust by using a more precise decision boundary.
TABLE 8.3
The parameters for regular learning techniques.
Learning techniques Bladder Prostate Rectum
Supervised 0.952 (0.007) 0.891 (0.019) 0.884 (0.027)
Conventional 0.960 (0.006) 0.895 (0.024) 0.884 (0.031)
Semi-supervised 0.963 (0.007) 0.903 (0.022) 0.890 (0.029)
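The semi-supervised self-training scheme compared in Table 8.3 grows the labeled set by repeatedly pseudo-labeling the most confident unlabeled samples and retraining. A minimal sketch in plain Python, with a toy nearest-centroid model standing in for the segmentation network (all names and the confidence measure are illustrative):

```python
from collections import defaultdict

def train(labeled):
    """Nearest-centroid 'network': per-class mean of 1-D features."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}

def predict(theta, x):
    """Return (label, confidence); confidence = negative distance to centroid."""
    label = min(theta, key=lambda c: abs(x - theta[c]))
    return label, -abs(x - theta[label])

def self_train(labeled, unlabeled, k=5):
    """Self-training loop: move the k most confident pseudo-labeled
    samples from U to L, retrain, and repeat until U is empty."""
    L, U = list(labeled), list(unlabeled)
    theta = train(L)
    while U:
        scored = sorted(((predict(theta, x), x) for x in U),
                        key=lambda t: t[0][1], reverse=True)
        for (y, _), x in scored[:k]:   # k best pairs
            L.append((x, y))
            U.remove(x)
        theta = train(L)               # optimize theta on the enlarged L
    return theta
```

Usage: starting from two labeled points, `self_train([(1.0, 'pos'), (-1.0, 'neg')], [0.5, 2.0, -0.7, -3.0], k=2)` absorbs the unlabeled points two at a time, exactly as in the listing later in this section.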
The function can be shown in the formulated form as

y = F(x, W) + x,

where x and y denote the input and output, respectively, and W signifies the parameters present in the block. Once feature learning is over, the feature maps are directed into 4 tasks: 3 regression tasks and one segmentation task.

Semi-Supervised Stage. The semi-supervised learning stage begins by training the network θ on a labeled set L = {I, S}, which comprises N MR images I = {I1, I2, ..., IN} and their corresponding segmentation maps S = {S1, S2, ..., SN}. In the semi-supervised stage there is also an unlabeled data set U of images without segmentation maps. With this, a semi-supervised algorithm can be constructed, taking as inputs θ, L, U and producing an updated θ as output, in order to gradually train the network with unlabeled data.

The algorithm for the same is shown below.

Input: θ, L, U
Output: updated θ
1: while len(U) > 0 do
2:    Estimate S_U by θ
3:    Move k best pairs (I_U, S_U) from U to L
4:    Optimize θ by training on L
5: return θ

The training method can be chosen freely. In the work performed in this particular experiment, the network parameters were initialized by the Xavier algorithm and updated by the backpropagation algorithm using the Adam optimizer.

The network parameters were initialized on a training set of 30 labeled MR images, and after that updated with another 30 unlabeled MR images using various semi-supervised learning techniques. The parameter k in the algorithm given was set to 5, and the alternate updating in the traditional semi-supervised learning algorithm was performed for 6 cycles. The performances of the three algorithms are presented in Table 8.3. The outcomes demonstrate that, by utilizing semi-supervised learning to expand the training data, the segmentation performance can be better enhanced. Likewise, continuously including the unlabeled information in the training set for updating the network parameters performs better than the regular learning techniques.

8.9 TRANSFER LEARNING
Another very important machine learning method is the transfer learning method. In this learning approach, a model developed for a particular task is reused as the starting point for a model on a second task. The basic idea behind it is to use the knowledge obtained from tasks for which labeled data is accessible in settings where only little labeled data is available. Creation of labeled data can be expensive, hectic and very tedious, so through this approach the creation of labeled data is not required. See Fig. 8.5.

Multi-task learning and the featurizer are the two main requirements in transfer learning and its architecture.

8.9.1 Transfer Learning in Image Data
One of the most common applications of transfer learning involves image data. Whether it is predictive modeling or some imaging application, transfer learning gives rugged solutions with high precision. This can include any prediction task which takes photographs or videos as input. These problem statements use a deep learning model pre-trained on a large and challenging image classification task. This approach is quite useful because the model was trained on a large dataset of photographs or videos and required to make predictions over a large number of classes, demanding that it learn to extract features from photographs in order to solve the problem.
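The reuse described above can be sketched in plain Python: a "pre-trained" feature extractor is kept frozen and only a small logistic-regression head is fit on the new task. The feature map and the data here are illustrative stand-ins, not a real pre-trained network:

```python
import math

def pretrained_features(x):
    """Stand-in for a frozen network body (illustrative, not a real model)."""
    return [x, x * x]

def fit_head(samples, lr=0.1, epochs=200):
    """Fit a logistic-regression head on the frozen features by
    stochastic gradient descent on the log-loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            f = pretrained_features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of log-loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def head_prob(w, b, x):
    """Probability of the positive class for a new input x."""
    z = sum(wi * fi for wi, fi in zip(w, pretrained_features(x))) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Only `w` and `b` are trained; `pretrained_features` never changes, which is what makes the approach cheap when labeled data is scarce.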
8.9.1.1 Mechanisms of Deep Transfer Learning for Medical Imaging
Transfer learning has been successfully used and appreciated in situations where data is not abundant. Below are the results of a research work done in the field of medical imaging using transfer learning.

In this research, the authors considered a set of training images and built classifiers to discriminate between kidney and non-kidney regions [17]. For a single test image, detecting the best kidney region of interest (ROI) S* from a set of candidate ROIs {S} is done in two steps. The whole set {S} is passed through the classifier models and the candidates with positive class labels Y are retained (Eq. (8.1)). The ROI with the largest likelihood L from the set {S+} is chosen as the identified kidney region (Eq. (8.2)):

{Y, L} = MLClassifier({S}) and S+ = {S ∈ {S} | Y = 1},  (8.1)
S* = argmax(L+), where L+ = L(S+).  (8.2)

CNNs are employed as feature extractors to complement traditional texture features.

8.9.1.2 Dataset and Training
In total, 90 long-axis kidney images were acquired on a GE Healthcare LOGIQ E9 scanner and split into two equal and distinct sets for training and validation. The images were acquired at variable ultrasound acquisition depths ranging between 9 and 16 cm. Precise rectangular ground truth kidney ROIs were manually marked by a clinical expert. See Fig. 8.6.

These ROIs were down-sampled to a common size and divided into 2 classes based on their intersection with the ground truth annotations. The dice similarity coefficient (DSC) is used as the metric.

8.9.1.3 Transferred Learned Features
1. Full network adaptation (CaffeNet_FA).
2. Partial network adaptation (CaffeNet_PA).
3. Zero network adaptation (CaffeNet_NA).

8.9.1.4 Traditional Texture Feature Results
Haar features are reported to have the best performance for kidney detection; hence, this study in particular used Haar features.

To quantitatively evaluate the performance on the 45 validation images, two metrics were used: (i) the number of localization failures, i.e., the number of images for which the dice similarity coefficient between the detected kidney ROI and the ground truth annotation was < 0.80, and (ii) detection accuracy, i.e., the average dice overlap between detection results and ground truth across the 45 images.

Table 8.4 shows the results.

Fig. 8.7 shows the results. In (A) and (B), the baseline method was affected by the presence of diaphragm, kidney and liver boundaries creating a texture similar to the renal-sinus portion, while CaffeNet had excellent localization. In (C) and (D), CaffeNet resulted in over-segmentation, illustrating that limited data problems require careful feature engineering; incorporating domain knowledge still carries a lot of relevance.
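The two-step selection of Eqs. (8.1)–(8.2), keep the candidates labeled positive and then take the one with the largest likelihood, can be sketched as follows. The classifier here is a hypothetical stand-in (a mean-intensity threshold), not the trained model from the study:

```python
def select_kidney_roi(candidates, classifier):
    """Eq. (8.1): run all candidate ROIs through the classifier and keep
    the positives; Eq. (8.2): return the positive ROI with the largest
    likelihood, or None when no candidate is classified positive."""
    scored = [(roi,) + classifier(roi) for roi in candidates]   # (S, Y, L)
    positives = [(roi, lik) for roi, label, lik in scored if label == 1]
    if not positives:
        return None
    return max(positives, key=lambda item: item[1])[0]          # S* = argmax L+

def toy_classifier(roi):
    """Illustrative stand-in: 'kidney-like' if the mean intensity of the
    ROI's pixels exceeds 0.5; the mean doubles as the likelihood score."""
    mean = sum(roi) / len(roi)
    return (1 if mean > 0.5 else 0), mean
```

For example, with candidate ROIs given as flat pixel lists, `select_kidney_roi([[0.2, 0.3], [0.7, 0.9], [0.6, 0.6]], toy_classifier)` picks the brightest positive candidate.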
TABLE 8.4
Results of traditional texture features.
Method Haar features CaffeNet_NA CaffeNet_PA CaffeNet_FA Haar + CaffeNet_FA
Average dice overlap 0.793 0.825 0.831 0.842 0.857
No. of failures 12/45 12/45 11/45 10/45 3/45
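The average dice overlap reported in Table 8.4 measures the agreement between a detected ROI and the ground truth annotation; a minimal sketch over sets of pixel coordinates, including the < 0.80 failure threshold mentioned in the text:

```python
def dice_overlap(detected, ground_truth):
    """Dice similarity coefficient: 2*|A ∩ B| / (|A| + |B|) over pixel sets."""
    a, b = set(detected), set(ground_truth)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def is_localization_failure(detected, ground_truth, threshold=0.80):
    """A detection is counted as a failure when its dice overlap with the
    ground truth falls below the threshold."""
    return dice_overlap(detected, ground_truth) < threshold
```

Two ROIs sharing half their pixels, e.g. `{(0, 0), (0, 1)}` and `{(0, 1), (1, 1)}`, score 0.5 and would therefore count as a failure.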
Images of lower magnification factors were used for training in this research in order to identify the region of interest (ROI) in the whole image: 625 images of benign and 1397 images of malignant tumors were taken into consideration here. The training set comprises 438 benign and 960 malignant images, and the validation set comprises 187 benign and 410 malignant images.

8.9.2.2 Preprocessing
To compensate for the lack of data in the training images, data augmentation techniques were used. This was
done by rotating the images by certain angles, mirroring them, and even adding randomly distorted images to the original data set. The total then came to 11,184 images.

8.9.2.3 Transfer Learning Part
Deep CNN and ConvNet models were built to classify breast cancer histopathological images into malignant and benign classes. Transfer learning was applied here to overcome the issues of insufficient data and training time. See Fig. 8.8.

A pre-trained Google Inception v3 was utilized using the Python API. See Fig. 8.9.

8.9.2.4 Results
• Training accuracy and cross-entropy
• Cross-entropy. See Fig. 8.10.
• Accuracy of the training model. See Fig. 8.11.

From the results shown above, it's clear that accuracy increases as the training proceeds. See Table 8.6. Cross-entropy was used as the cost function, which is calculated as

H(x) = H(p) = − Σ_i p(x_i) log p(x_i).

TABLE 8.6
The classification accuracy for different cut-off values.

Cut-off | Benign | Malignant
0.3 | 0.74 | 0.93
0.4 | 0.83 | 0.89
0.5 | 0.89 | 0.82
0.6 | 0.91 | 0.76

Finally, it can be concluded that Google's Inception v3 model, trained on breast cancer microscopic biopsy images, performed classification with an accuracy of 0.83 for the benign class and 0.89 for the malignant class.

REFERENCES
1. T. Chandrakumar, R. Kathirvel, Classifying diabetic retinopathy using deep learning architecture, International Journal of Research in Engineering and Technology 5 (6) (2016) 19–24.
2. G. Lim, M.L. Lee, W. Hsu, T.Y. Wong, Transformed representations for convolutional neural networks in diabetic retinopathy screening, in: AAAI Workshop: Modern Artificial Intelligence for Health Analytics, 2014, June.
3. H. Pratt, F. Coenen, D.M. Broadbent, S.P. Harding, Y. Zheng, Convolutional neural networks for diabetic retinopathy, Procedia Computer Science 90 (2016) 200–205.
4. R. Zhu, R. Zhang, D. Xue, Lesion detection of endoscopy images based on convolutional neural network features, in: Image and Signal Processing (CISP), 2015 8th International Congress on, IEEE, 2015, October, pp. 372–376.
5. M.I. Razzak, S. Naz, A. Zaib, Deep learning for medical image processing: overview, challenges and the future, in: Classification in BioApps, Springer, Cham, 2018, pp. 323–350.
6. C. DeSantis, J. Ma, L. Bryan, A. Jemal, Breast cancer statistics, 2013, CA: A Cancer Journal for Clinicians 64 (1) (2014) 52–62.
7. Risk of breast cancer, http://www.breastcancer.org/symptoms/understand_bc/risk/understanding, 2016.
8. Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.
9. L.V. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms and Applications, vol. 3, Prentice-Hall, Englewood Cliffs, 1994.
10. N. Tajbakhsh, J.Y. Shin, S.R. Gurudu, R.T. Hurst, C.B. Kendall, M.B. Gotway, J. Liang, Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Transactions on Medical Imaging 35 (5) (2016) 1299–1312.
11. J. Ker, L. Wang, J. Rao, T. Lim, Deep learning applications in medical image analysis, IEEE Access 6 (2018) 9375–9389.
12. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
13. M. Borga, T. Andersson, O.D. Leinhard, Semi-supervised learning of anatomical manifolds for atlas-based segmentation of medical images, in: Pattern Recognition (ICPR), 2016 23rd International Conference on, IEEE, 2016, December, pp. 3146–3149.
14. Z. Feng, D. Nie, L. Wang, D. Shen, Semi-supervised learning for pelvic MR image segmentation based on multi-task residual fully convolutional networks, in: Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, IEEE, 2018, April, pp. 885–888.
15. https://machinelearningmastery.com/transfer-learning-for-deep-learning.
16. https://www.datacamp.com/community/tutorials/transfer-learning.
17. C.K. Shie, C.H. Chuang, C.N. Chou, M.H. Wu, E.Y. Chang, Transfer representation learning for medical image analysis, in: Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, IEEE, 2015, August, pp. 711–714.
CHAPTER 9
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00016-6 153
Copyright © 2019 Elsevier Inc. All rights reserved.
with involves two things. First, we need to turn each input object, often called the sample, into a set of features that describe it. Second, we need to pick a learning model, typically the type of classifier that learns the system. Addressing a machine learning problem often involves an iterative process. The present work is focused on finding out the performance of machine learning algorithms and rating them using evaluation parameters, as well as making the reader understand how a model can be fitted on real-time data and how to perform the analysis, as plotted in Fig. 9.1. In the past, authors have demonstrated the implementation of two or three algorithms each when addressing classification and regression problems [1], whereas in the present research work we aim to demonstrate linear and nonlinear versions of regression and classification algorithms for addressing each type of problem using the datasets detailed in the next section. This research work aims to identify the problems faced by beginners who want to learn the concepts of machine learning algorithms (regression and classification), apply them on a real-time dataset (Iris), fit a model, evaluate the performance of a model and rate the performance of an algorithm using an evaluation metric like mean squared error, confusion matrix, ROC curve, etc. The experiment carried out in the present research work helps learners in identifying past research contributions and the current research gap, as well as in conducting an experiment when analyzing the performance of an algorithm in the R language. This paper is organized as follows. In the introduction, we highlight the importance of applying MLAs in real time. In the methodology section, we give an introduction to the MLAs with past and future trends, and evaluate the performance of each MLA using evaluation metrics. In the results section, a predictive model is built, and we measure its performance on the Iris dataset in detail.

9.2 METHODOLOGY
The experiments carried out throughout this research use two datasets: Longley's Economic Regression Data, consisting of seven variables observed from 1947 to 1962 and used in predicting the number of people employed yearly [2], and the Iris dataset available in the R datasets package. Longley's dataset is used in evaluating linear and nonlinear regression algorithms, whereas
CHAPTER 9 Survey on Eval. the Performance of MLAs: Past Contributions and Future Roadmap 155
the Iris dataset is used in evaluating linear and nonlinear classification algorithms (data(iris), data(longley)). The R packages and libraries used in the experiments conducted are:
1. library(pls), library(ipred)
2. library(earth), library(randomForest)
3. library(kernlab), library(gbm)
4. library(caret), library(Cubist)
5. library(nnet), library(MASS)
6. library(rpart), library(GGally)
7. library(party), library(mda)
8. library(RWeka), library(klaR)
The evaluation parameter is the mean squared error for comparing the performance of linear and nonlinear regression algorithms. Similarly, in evaluating the performance of linear and nonlinear classification algorithms, we computed a confusion matrix and used precision, recall, and F-score as the evaluation metrics. Weighted fuzzy logic is used in assigning weights to the data while training [3,4], instead of assigning weights manually, towards extracting sentiments from text, by constructing 250 rules using "AND", "OR", "NOT" connectors for each emotion in the FuzzySet application available in MATLAB. The drawback of this research work is that only text data is considered, whereas in [5] a detailed comparison is made among predictive models (moving average with period 3) on a time series dataset. The drawback of that research work is that only a single instance of time series data is considered. In [6], an analysis of the PIMA diabetes dataset is conducted and prediction is made based on insulin. The drawback of this work is that a generalized linear model is used in deciding the important features, whereas in [7] a gradient ascent algorithm (incremental model) is used to find the exact weights of the terms used in determining the sentiment of a tweet, and a boosting approach is utilized to improve the accuracy of a linear classifier. See Table 9.1.
TABLE 9.1
Comparative study of recent trends in machine learning technology.

Reference | Focused on | Approach | Future scope
[8] | Random forest | Combination with the cross-validation | To relieve pedestrian safety issues
[9] | Predictive approach to knowledge and investigation | Reasonable suspicion and collective interests | To update the existing data protection legal framework
[10] | Newton–Raphson maximum-likelihood optimizations as a new large-scale machine learning classifier | Proposed functional networks based on propensity score and Newton–Raphson maximum-likelihood | Mixture models simulation and studies
[11] | ICT-producing firms in the UK | Developed a novel sector-product approach and used text mining to provide further detail on key sector-product cells | Finds ICT employment shares over double the conventional estimates
[12] | To understand the relationships among traditional marketing analytics (TMA) | The knowledge fusion taxonomy | Improving NPS is not automatic and requires strategic choices to obtain its benefits
[13] | A priori-based hierarchical clustering | Presenting SUBSCALE, a novel clustering algorithm to find nontrivial subspace clusters with minimal cost | This algorithm scales very well with the dimensionality of the dataset and is highly parallelizable
[14] | Data preprocessing algorithms | Illustrative study on two different datasets, banana and sonar | To connect and accommodate several data pre-processing algorithms
[15] | Machine learning techniques | Literature survey | To analyze machine learning with modern signal processing technologies
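The evaluation metrics used in this chapter, mean squared error for the regression algorithms and precision, recall, and F-score derived from the confusion matrix counts for the classifiers, can be computed as in this plain-Python sketch:

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-score for one class from confusion-matrix
    counts (true positives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In the experiments described here these quantities are computed in R (e.g., via caret's confusion matrix utilities); the sketch above only makes the definitions concrete.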
In [21], the authors formulated a new version of SVM for a classification problem with two classes, whereas in [9] a new version of SVM was given that can effectively perform active learning when determining the topic of a particular document. In [10], an SVM was used in developing a computer-aided diagnosis (CAD) system which can perform analysis on MR images, helping in medical interpretation. In [22], a combined SVM and fuzzy inference system was suggested for making the decision of placing vehicles in a lane. In our experiment, SVM achieved an MSE of 0.1448464, which is far better than the other two linear regression models and one nonlinear regression model.

9.4.2 K-Nearest Neighbors
The k-nearest neighbors algorithm can be used for classification and regression. The k-NN classifiers are often called instance- or memory-based supervised learning. The k in k-NN refers to the number of nearest neighbors the classifier will retrieve and use to make its prediction. In particular, the k-NN algorithm has three steps that can be specified. First, when given a new, previously unseen instance of something to classify, a k-NN classifier will look into its set of memorized training examples to find the k examples that have the closest features. Second, the classifier will look up the class labels for those k nearest neighbor examples. Generally, to use the nearest neighbor algorithm, one should specify four things. First, one needs to define what distance means in the feature space, to properly select the nearby neighbors; e.g., one can use the simple straight-line, or Euclidean, distance to measure the distance between points. Second, one needs to tell the algorithm how many of these nearest neighbors to use in making a prediction. Third, one
has to determine which neighbors have more influence on the outcome. Finally, given the labels of the k nearby points, one has to specify how to combine them to produce a final prediction. The most common distance metric is the Euclidean straight-line distance; the Euclidean metric is a particular case of the more general Minkowski metric. A straightforward way to assess whether the classifier is likely to be good at predicting the labels of future, previously unseen data instances is to compute the classifier's accuracy on the test set data items. The accuracy is defined as the fraction of test set items whose true label was correctly predicted by the classifier; this gives a more reliable estimate of likely future accuracy for a particular value of k. The best choice of the value of k, that is, the one that leads to the highest accuracy, can vary greatly depending on the data set. In general, with k-nearest neighbors, using a larger k suppresses the effects of noisy individual labels. Consider that there are (x_n, y_n) points in the given dataset and that x_1 is the new point to be classified based on the k value, using the distance

d(x_i, x_j) = ||x_i − x_j|| = ( Σ_{k=1}^{d} (x_{ik} − x_{jk})^2 )^{1/2}, where x_i, x_j ∈ R^d.  (9.7)

…programmer while constructing the neural network. In [27], an artificial intelligence system is proposed which can measure heart rate variability with good accuracy. In [28], the authors aimed to study the understanding of the emotions of others, which helps in critical decision making; an NN was used in training the proposed system. In [18], the NN approach was used in forecasting photovoltaic energy in terms of sensitivity, and achieved a good accuracy of 99.1% compared to the other MLAs used in the experiment. In [29], a fuzzy inference system was combined with a conventional neural network for controlling the state of a thermostat. In the experiment conducted, the NN achieved an MSE of 0.0002765618, which is better than for all nonlinear regression algorithms discussed so far.

9.5 NONLINEAR DECISION TREE REGRESSION
9.5.1 Regression With Decision Trees
Decision trees are used to solve both classification and regression problems in the form of trees that can be incrementally updated by splitting the dataset into smaller datasets (numerical and categorical), where the results are represented in the leaf nodes. The compara-
In [23], Mahanalobis distance was used as a met- tive results are presented in Table 9.2, and a complete
ric in addressing the multi-classification problem and summary is provided in Fig. 9.3.
achieved 1.3% of test error. In [24], neighbor-weighted
K-nearest neighbor algorithm was proposed, in which
Algorithm 1 Greedy decision tree learning algorithm.
big weights were assigned to small class neighbors, and
low weights were assigned to large class neighbors. 1: procedure R EGRESSION WITH D ECISION
In [25], graphical processing unit (GPU) was used in T REES(empty tree, data)
evaluating the distance between points with an aim to 2: Start
optimize the time taken to perform the necessary oper- 3: Consider an empty tree
ations. In [26], KNN was used on Hadoop environment 4: Select a feature to split data. This selection is
in addressing classification problem. In the experiment based on the importance of an attribute, which is
conducted, KNN has achieved 0.9259962 of MSE. From evaluated using GLM model.
the results obtained, it is clear that in evaluating the per- 5: F or each split of a tree check whether is there
formance of KNN, MSE is not the only parameter and something to make the prediction.
that there is need a to learn other evaluation metrics. 6: else Go to step 3 and continue the split.
7: endf or
9.4.3 Neural Network 8: Stop
The behavior of the neuron with only two states is as in
the following equation:
n 9.5.2 Random Forest
y= xi wi Random forest is a versatile machine learning method
i=1 (9.8) capable of performing both regression and classifica-
{if y ≥ T output is 1, otherwise output is 0}, tion tasks. A summary of random forest performance
is presented in Fig. 9.4, and the MSE obtained using the
where xi denotes inputs, wi weights associated, Y out- same algorithm is 0.2940453, which is better than for
come, and T is the threshold value provided by the all other decision tree algorithms.
CHAPTER 9 Survey on Eval. the Performance of MLAs: Past Contributions and Future Roadmap 159
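The two-state neuron of Eq. (9.8) — a weighted sum of the inputs compared against a threshold T — can be sketched as below. The AND-gate weights and threshold are assumed values for illustration, not from the chapter.

```python
import numpy as np

def threshold_neuron(x, w, T):
    """Two-state neuron of Eq. (9.8): y = sum_i x_i * w_i;
    the output is 1 if y >= T, otherwise 0."""
    y = float(np.dot(x, w))
    return 1 if y >= T else 0

# An AND gate realized with weights (1, 1) and threshold 1.5.
print(threshold_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
print(threshold_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0
```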
TABLE 9.2
Performance of regression algorithms.
Name of the algorithm MSE
Decision Tree 0.4776895
Conditional decision tree 0.4776895
Model trees 0.663211
Rule system 0.663211
Bagging CART 0.663211
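Algorithm 1 grows a regression tree by repeatedly selecting the most important feature to split on. A minimal sketch of one greedy split step is shown below; it uses plain squared-error reduction as the split criterion rather than the chapter's GLM-based importance, and the tiny dataset is invented for illustration.

```python
import numpy as np

def best_split(X, y):
    """One greedy step of a regression tree: pick the (feature, threshold)
    whose split minimizes the summed squared error of the
    per-side mean predictions."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() \
                + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, t, sse)
    return best

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.2, 3.0, 3.1])
j, t, sse = best_split(X, y)
print(j, t)  # -> 0 2.0  (separates the low responses from the high ones)
```

Recursing on the two sides of the chosen split, until a side carries enough information for a prediction, yields the full tree of Algorithm 1; averaging many such trees over bootstrap samples is the random forest idea of Sect. 9.5.2.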
plotted in Fig. 9.7, whereas for nonlinear classification algorithms their respective confusion matrices are plotted in Fig. 9.8. The way to interpret the confusion matrix is shown in Table 9.2. The total number of instances in the dataset representing positive (1) and negative (0) is 2014; this is represented in the top-left and bottom-right cells. The kappa statistic is

Kappa = (ObservedAccuracy − ExpectedAccuracy) / (1 − ExpectedAccuracy).  (9.9)

The best model is chosen based on accurately predicting ones (sensitivity, TPR) and zeros (specificity, TNR). The next step is to combine both measures in evaluating the performance of a model using a receiver operating characteristics (ROC) curve. The area under the ROC curve can be used as an evaluation metric to measure the efficiency of the predictive model. On the other hand, the question arises of how to handle data which is nonlinear in nature; here, applying classification can be done by constructing the confusion matrix. In Fig. 9.8, the performance of the nonlinear classification algorithms can be seen. Similarly, for a decision tree and nonlinear classification, the generated confusion matrix can be analyzed in Fig. 9.10, illustrating the efficiency of the predictive model.
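Sensitivity, specificity, and the kappa statistic of Eq. (9.9) can all be computed directly from the four confusion-matrix counts. The sketch below uses invented counts for illustration, not the chapter's 2014-instance dataset.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual ones predicted as one."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual zeros predicted as zero."""
    return tn / (tn + fp)

def kappa_from_confusion(tp, fp, fn, tn):
    """Cohen's kappa as in Eq. (9.9):
    (observed accuracy - expected accuracy) / (1 - expected accuracy)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement, computed from the row/column marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (observed - expected) / (1 - expected)

# Illustrative counts (not the chapter's data).
tp, fp, fn, tn = 90, 10, 20, 80
print(round(sensitivity(tp, fn), 3))              # -> 0.818
print(round(specificity(tn, fp), 3))              # -> 0.889
print(round(kappa_from_confusion(tp, fp, fn, tn), 3))  # -> 0.7
```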
FIG. 9.12 Performance of MLA.

• The performance of the LDA algorithm in terms of accuracy is near to one.

9.8 CONCLUSIONS
In the present research, we aim to make the reader understand the procedure of applying MLAs on both linear and nonlinear data in addressing regression and classification problems. The impact of the present research is to find the different ways to evaluate a machine learning algorithm. The results obtained in this research are applicable to real-time problems like classification and regression. The findings of our research are that a support vector machine (SVM) performs better than all other classification algorithms, and the neural network (NN) approach gives the lowest mean squared error (MSE) for the regression problem, MSE being a metric used in evaluating the performance of both classification and regression algorithms. We found that ROC is best for regression, and constructing a confusion matrix works best for classification. In the future, we would like to work with data having N dimensions on a distributed platform. The performance of each of the MLAs would then be evaluated using metrics like throughput, response time, and overload on each machine in a cluster of machines.

REFERENCES
1. L. Bottou, F.E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning, SIAM Review 60 (2) (2018) 223–311.
2. R. Durrett, Lecture Notes on Particle Systems and Percolation, Brooks/Cole Pub Co, 1988.
3. S.M. Basha, Y. Zhenning, D.S. Rajput, N. Iyengar, D. Caytiles, Weighted fuzzy rule based sentiment prediction analysis on tweets, International Journal of Grid and Distributed Computing 10 (6) (2017) 41–54.
4. S.M. Basha, D.S. Rajput, Sentiment Analysis: Using Artificial Neural Fuzzy Inference System, IGI Global, 2018.
5. S.M. Basha, Y. Zhenning, D.S. Rajput, R.D. Caytiles, N.C.S. Iyengar, Comparative study on performance analysis of time series predictive models, International Journal of Grid and Distributed Computing 10 (8) (2017) 37–48.
6. S.M. Basha, H. Balaji, N.C.S. Iyengar, R.D. Caytiles, A soft computing approach to provide recommendation on pima diabetes, International Journal of Advanced Science and Technology 106 (2017) 19–32.
7. S.M. Basha, D.S. Rajput, K. Vishu Vandana, Impact of gradient ascent and boosting algorithm in classification, International Journal of Intelligent Engineering and Systems (IJIES) 11 (1) (2018) 41–49.
8. X. Jiang, M. Abdel-Aty, J. Hu, J. Lee, Investigating macro-level hotzone identification and variable importance using big data: a random forest models approach, Neurocomputing 181 (2016) 53–63.
9. X. Wu, X. Zhu, G.-Q. Wu, W. Ding, Data mining with big data, IEEE Transactions on Knowledge and Data Engineering 26 (1) (2014) 97–107.
10. E. Elsebakhi, F. Lee, E. Schendel, A. Haque, N. Kathireason, T. Pathare, N. Syed, R. Al-Ali, Large-scale machine learning based on functional networks for biomedical big data with high performance computing platforms, Journal of Computational Science 11 (2015) 69–81.
11. M. Nathan, A. Rosso, Mapping digital businesses with big data: some early findings from the UK, Research Policy 44 (9) (2015) 1714–1733.
12. Z. Xu, G.L. Frankwick, E. Ramirez, Effects of big data analytics and traditional marketing analytics on new product success: a knowledge fusion perspective, Journal of Business Research 69 (5) (2016) 1562–1566.
13. A. Kaur, A. Datta, A novel algorithm for fast and scalable subspace clustering of high-dimensional data, Journal of Big Data 2 (1) (2015) 17.
14. S. García, J. Luengo, F. Herrera, Tutorial on practical tips of the most influential data preprocessing algorithms in data mining, Knowledge-Based Systems 98 (2016) 1–29.
15. J. Qiu, Q. Wu, G. Ding, Y. Xu, S. Feng, A survey of machine learning for big data processing, EURASIP Journal on Advances in Signal Processing 2016 (1) (2016) 67.
16. K.J. Preacher, P.J. Curran, D.J. Bauer, Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis, Journal of Educational and Behavioral Statistics 31 (4) (2006) 437–448.
17. H. Tanaka, K. Asai, S. Uejima, Linear regression analysis with fuzzy model, IEEE Transactions on Systems, Man and Cybernetics 12 (6) (1982) 903–907.
18. K.M. Shaffer, J.M. Jacobs, R.D. Nipp, A. Carr, V.A. Jackson, E.R. Park, W.F. Pirl, A. El-Jawahri, E.R. Gallagher, J.A. Greer, et al., Mental and physical health correlates among family caregivers of patients with newly-diagnosed incurable cancer: a hierarchical linear regression analysis, Supportive Care in Cancer 25 (3) (2017) 965–971.
19. Y.-L. He, X.-Z. Wang, J.Z. Huang, Fuzzy nonlinear regression analysis using a random weight network, Information Sciences 364 (2016) 222–240.
20. B. Nagy, C. Mânzatu, A. Măicăneanu, C. Indolean, L. Barbu-Tudoran, C. Majdik, Linear and nonlinear regression analysis for heavy metals removal using Agaricus bisporus macrofungus, Arabian Journal of Chemistry 10 (2017) S3569–S3579.
21. J.A. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters 9 (3) (1999) 293–300.
22. E. Balal, R.L. Cheu, Modeling of lane changing decisions: comparative evaluation of fuzzy inference system, support vector machine and multilayer feed-forward neural network, Tech. rep., 2017.
23. K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, Journal of Machine Learning Research 10 (Feb) (2009) 207–244.
24. S. Tan, Neighbor-weighted k-nearest neighbor for unbalanced text corpus, Expert Systems with Applications 28 (4) (2005) 667–671.
25. V. Garcia, E. Debreuve, M. Barlaud, Fast k nearest neighbor search using GPU, in: Computer Vision and Pattern Recognition Workshops (CVPRW'08), IEEE Computer Society Conference on, IEEE, 2008, pp. 1–6.
26. J. Maillo, S. Ramírez, I. Triguero, F. Herrera, kNN-IS: an iterative Spark-based design of the k-nearest neighbors classifier for big data, Knowledge-Based Systems 117 (2017) 3–15.
27. M. Patel, S.K. Lal, D. Kavanagh, P. Rossiter, Applying neural network analysis on heart rate variability data to assess driver fatigue, Expert Systems with Applications 38 (6) (2011) 7235–7242.
28. Y. Fan, N.W. Duncan, M. de Greck, G. Northoff, Is there a core neural network in empathy? An fMRI based quantitative meta-analysis, Neuroscience and Biobehavioral Reviews 35 (3) (2011) 903–911.
29. S. Leva, A. Dolara, F. Grimaccia, M. Mussetta, E. Ogliari, Analysis and validation of 24 hours ahead neural network forecasting of photovoltaic output power, Mathematics and Computers in Simulation 131 (2017) 88–100.
CHAPTER 10
DM. Data mining is the process of discovering patterns in large data sets, involving methods at the intersection of machine learning, statistics, and database systems. It is an essential process in which intelligent methods are applied to extract data patterns. It is an interdisciplinary subfield of computer science. The overall goal of the data mining process is to extract information from a data set and transform it into a comprehensible structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
Deep Learning. Deep learning is one of the machine learning techniques, which are themselves a subset of the broader field of artificial intelligence. Deep learning is a class of machine learning algorithms that uses several layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Deep neural networks, deep belief networks and recurrent neural networks have been applied to fields such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bioinformatics, where they produced results comparable to, and in some cases superior to, what human experts have achieved.
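The layer-wise structure described above — each successive layer consuming the previous layer's output — can be sketched minimally as below. The random weights and ReLU units are assumptions for illustration only.

```python
import numpy as np

def layer(x, W, b):
    """One nonlinear processing layer: affine transform followed by ReLU.
    Each successive layer consumes the previous layer's output."""
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # raw input features
h1 = layer(x, rng.normal(size=(8, 4)), np.zeros(8))     # layer 1 transforms x
h2 = layer(h1, rng.normal(size=(3, 8)), np.zeros(3))    # layer 2 feeds on h1
print(h2.shape)  # -> (3,)
```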
Internet-of-Things. The Internet-of-Things (IoT) refers to a network of physical objects capable of gathering and sharing electronic data. The Internet-of-Things includes a wide variety of "smart" devices, from industrial machines that transmit information about the production process to sensors that track data about the human body. Frequently, these devices use the Internet protocol (IP), the same protocol that identifies computers over the internet and enables them to communicate with each other. The goal behind the Internet-of-Things is to have devices that self-report in real time, improving efficiency and bringing important information to the surface more quickly than a system depending on human intervention.
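A device "self-reporting" over IP typically amounts to emitting small structured messages. A minimal sketch is shown below; the message field names are illustrative choices, not from any IoT standard.

```python
import json
import time

def self_report(device_id, reading):
    """Build one self-report message for an IoT device as a JSON string
    (illustrative fields: device id, timestamp, sensor reading)."""
    return json.dumps({
        "device": device_id,
        "timestamp": int(time.time()),
        "reading": reading,
    })

msg = self_report("body-sensor-1", 36.6)
print(json.loads(msg)["reading"])  # -> 36.6
```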
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00017-8 165
Copyright © 2019 Elsevier Inc. All rights reserved.
1. To thoroughly understand IoT, and how it works;
2. To thoroughly understand irregularities of processes;
3. To develop a model for intrusion detection relying on clustering and decision trees (DT) and apply it to IoT;
4. To evaluate the model;
5. To offer proposals on improving the model.

IoT is a promising technology, yet it is still in its infancy and faces various hurdles [4]. First, there is no standard framework for IoT systems. Since IoT is still being developed, vendors rush to make products incompatible with other vendors' things to achieve financial benefit and push customers toward the vendor's own ecosystem. Moreover, IoT is heterogeneous, which makes partners' cooperating, managing and, furthermore, testing of IoT concepts a hard task [1]. IoT objects use different communication protocols to communicate over different networks (e.g., GSM, WAN, WSN, Bluetooth). Given the number and diverse sorts of IoT objects, and also their constrained hardware capabilities, it is relatively hard to deploy and maintain the needed security among the devices. This pushes towards centralized and decentralized network-based authentication, anomaly detection and protection mechanisms.

10.2 INSPIRATION
Decision support systems is the term used for data-based support of automated suggestions meant to drive decisions and assist a manager. At a very basic level, the data streams from several sources are assessed by models inside a computer program. The incoming data stream generally originates from databases and results, e.g., in the form of a report which is visualized in a straightforward way. The analytics can be described as a mix of model-based computation, learned practices, model checking and quantitative analysis that basically uses statistics to support decisions by means of deliberately written computer programs, which is called continuous analytics. This analytics typically goes through two main periods. The first period is generally focused on combining internal data and transforming it into an organized shape. The second period is initiated by the frequent appearance of big data, and is characterized by the development of new systems and types, and the integration of data streaming from outside sources. In the reference section, a model visual layout of a descriptive program can be found. The best approach to improving business methods and, along these lines, decisions of any kind is to process the data to be examined. Likewise, this can save essential resources, valuable time and money. At its core, a descriptive program ought to have the capacity to distinguish important data from unhelpful data.

Analytics is a part of every process, and this chapter also shows how the computations may function. Inside an IoT-based organic network, analytics surveys and reviews streaming data continuously, including historical data from databases, to achieve the best results [8]. Since different sorts of data stream need to be handled in an informative system, in-streams are inspected in batches or groups. The result is an out-stream with significant information. Despite analytics appearing to be a useful tool, there might be areas where it cannot be implemented. The clearest example is innovation, because analytics is simply not able to access data from past events, so decisions must rely upon similar endeavors executed previously, for instance. Also, certain quantitative models cannot cover specific innovative fields. This places decision making in a vague framework. With the ultimate objective of handling this issue and making a complete picture of decision making reliant on analytics, intuition and individual experience are required, and thus subjective data [5].

Besides, explanation capabilities grant, through the supporting semantic matchmaking, a license to justify behavior as expected, thereby extending trust in a system's response. In case pervasive micro-scale devices are capable of onboard processing of the data they retrieve, they can describe themselves, and the setting where they are deployed, toward external devices and, furthermore, applications. This would improve interoperability and, furthermore, versatility, enabling pervasive data-based structures with a high degree of automation not yet allowed by customary IoT infrastructures and systems. Machine learning [22], Internet-of-Things and big data are reliant on each other, while performing an amazing activity as shown in Fig. 10.1.

Two basic results follow: in any case, the human–computer interaction could additionally be upgraded, by diminishing the customer effort required to benefit from managing systems. In conventional IoT settings, a customer explicitly interfaces with one device when given a chance to perform a task [10]. In contrast to what may be common, customer agents running on mobile computing devices should have the capacity to orchestrate at the same time the various embedded micro-scale components, providing customers with context-aware task and decision help. Moreover, regardless of whether machine learning frameworks, estimations and instruments have attracted novel classes of analysis, strikingly basic for the big data Internet-of-Things perspective,
CHAPTER 10 Miracle of Deep Learning Using IoT 167
the exploitation of premise-based and anomaly-detection frameworks using non-conventional planning happens as expected, compensating for possible faults in data acquisition, device unreliability and the lacking quality of wireless communications, making adaptable IoT infrastructures greatly flexible however one looks at their application [11].

Big data analytics is a rapidly developing research area, comprising the fields of computer science and information management, and it has become an unavoidable term in understanding and solving complex issues in different disciplinary fields, for instance, engineering, applied mathematics, medicine, computational science, healthcare, social networks, finance, business, government, regulation, transportation and telecommunications. The utility of big data is found, above all, in the area of the IoT. Big data is used to build IoT structures which include things-driven, data-driven and service-driven architectures, and cloud-based IoT. Advanced IoT interfacing supports sensors and radio, attestation, low power and energy harvesting, and sensor networks; IoT services basically include semantic service management, security and privacy-preserving protocols, and design patterns of smart services [9]. To effectively fuse big data and communicate among devices using IoT, machine learning frameworks are used. Machine learning extracts value from massive data using well-known methods, which broadly fall into classification analysis, clustering, Bayesian networks, decision trees and random forests, support vector machines, reinforcement learning, ensemble learning, and deep learning.

10.3 DECISIONS IN AN AREA OF DEEP LEARNING AND IOT
In this section, the subject of data-driven decision making is reviewed in more detail. The central figure in the process still remains a human, yet the decision, or more precisely the choice among decisions, relies upon factual data and evidence and not on learning, long experience or intuition. Telecommunication companies and the financial industry actively applied such systems in the 1990s to evaluate the enormous amount of data they aggregated. These systems supported trading, targeted marketing, fraud detection and credit scoring. Along with the pace of relationship development among companies, data-driven decision making progressed as a result of the persistent advances in information technology. The past couple of years have seen a rapid growth of artificial intelligence (AI) technologies joined with machine learning that uses statistics with the ultimate objective of satisfying needs in a more accurate and, on the whole, automated way. Keeping this in mind, the potential of advancing decision making is apparently endless. Colossal companies deploying information-driven decision making are Amazon and Google. While Amazon gathers profits from data by making strong product recommendations, Google aims at making decisions entirely dependent on amassed data. Data is amassed, stored,
as well as updates for memory utilization, are essentially used. The downside of this approach is that it requires specific hardware. Hardware-based updating agents are most suitable for high-value applications that require low power use and fast estimation. Deep learning is not out of the reach of IoT devices. Weight pruning, approximate computing, and hardware acceleration all enable running a large neural net on devices with limited CPU, memory, and power. As with any specific technique, there are limits and trade-offs. Weight pruning and approximate computing are suitable for applications with tolerance for some loss of exactness. The two methodologies can be joined with updating agents to further improve execution, but at the cost of using specific hardware. Weight pruning and approximate computing both reduce the data used in computations and should be used together cautiously. Our knowledge of deep learning on IoT devices is limited, and one should proceed with caution in a couple of areas.

10.3.2 Rudiments of Machine Learning
Machine learning is a branch of artificial intelligence which intends to make systems capable of acquiring knowledge from past experience. In particular, in contrast to expert systems, ML algorithms and methodologies are typically data-driven, inductive and general in nature; they are designed and applied with a focus on predictions or decisions in some general class of tasks, e.g., spam filtering, handwriting recognition or activity detection. Three basic groupings of ML problems exist. Classification involves the assignment of an input sample to one of a set of possible categories, e.g., deciding whether an email message is spam or not, as illustrated in Fig. 10.3; classification has thus a discrete n-ary output, and is the main problem considered in this part. Regression is delineated as the estimation of the relationship of a dependent variable to one or more independent variables, e.g., predicting the purchase cost of a house considering its size, age, location and other features [12]; generally speaking, both the inputs and the predicted quantity can vary in continuous ranges. Clustering, i.e., grouping a set of observations, increases the similarity of samples within every group and the separation between groups. Various pattern recognition problems build on these.

The execution of an ML system routinely comprises two essential stages: training and testing. In the training stage, the underlying ML algorithm builds a model of the particular problem inductively from prepared data [13]. Each available dataset to be classified is separated into a training set for model building and a test set for validation. There exist several procedures for properly picking training and test partitions. Among others, in k-fold cross-validation, the dataset is split into k subsets of comparable size; one of them is used for testing and the remaining k − 1 for training. The procedure is repeated k times, each time using a different subset for testing. The less complex holdout method, instead, splits the dataset randomly, typically assigning a larger share of samples to the training set and the remaining, smaller one to the test set [17].
Fig. 10.4 demonstrates how information is structured, especially for the attributive language with unqualified number restrictions (ALN) description logic (DL). It gives sufficient expressiveness to support the modeling process, while providing polynomial complexity for both standard and non-standard inference services [19].

10.3.3 Algorithms for Efficient Training
Deeper models with more parameters seem to achieve better accuracy. However, the drawback of using a deeper model is a longer training time. With batch normalization, the mean and variance of the internal nodes can be adjusted to be favorable for training [14]. This permits a higher learning rate, which enables faster training. Batch normalization, moreover, regularizes the model, thus reducing the need for other regularization procedures like dropout [16]. A technique for training networks, referred to as dense–sparse–dense (DSD), proceeds in three phases: it trains the network first with many parameters, secondly with few parameters, which amounts to pruning, and then again with many parameters. The authors showed better accuracy for the network trained with DSD than with standard training. The essence of this approach is to learn the important parameters first, and then learn the remaining parameters with the already learned important parameters in place. Another technique for efficient training uses low-precision arithmetic, which is less computationally expensive than its floating-point counterpart; dynamic fixed-point computation can help lower the precision needed during training.

10.3.4 Secure Deep Learning
Various hardware-supported strategies have been used to speed up training, for instance, distributed processing and GPU-accelerated computing. Large neural networks and other machine learning classifiers have proved to be vulnerable to small perturbations of their inputs. Those perturbations can be imperceptible to the human eye; hence, it is possible that the system fails without any noticeable changes being made [20]. Image classification using deep learning is all about comprehensively understanding and distinguishing things. For power-hungry edge devices, managing the trade-off between power use and the quality of the received images is of particular concern. Fine controllability of the camera parameters enables smart control of sensor data generation to maximize the information within a power budget. For example, region-of-interest (RoI) based coding at the sensor can be enabled by controlling the spatial resolution of the camera, trading a loss of information for constrained energy use. Furthermore, intrinsic image sensor noise must be considered for robust image classification on low-end devices [21]. Various prior works have examined the impact of low-quality images on image classification. Conventional denoising can be used to enhance the quality of the photos as a pre-processing step, and super-resolution can be applied to low-resolution images. Unnecessary processing of the clean images, due to applying such preprocessing to every input, results in degraded accuracy on the clean images. Data augmentation approaches can, moreover, be used for both noisy and low-resolution images.
CHAPTER 10 Miracle of Deep Learning Using IoT 171
The proportion of clean and perturbed images, and the loss definition used during training, determine performance on each data distribution. The trade-off between accuracy on clean images and accuracy on noisy images must be considered. Object detection under these degradations also deserves attention, as there are almost no works on object detection for noisy and low-resolution images.

10.3.5 Robust and Resolution-Invariant Image Classification
Image classification with deep learning is, broadly, about recognizing objects. For power-constrained edge devices, it is essential to manage the trade-off between energy and the quality of the received image. Region-of-interest (RoI) based coding is becoming a standard way of controlling this energy–quality trade-off on resource-poor edge devices. Moreover, the unavoidable image sensor noise must be taken into account for effective image classification on low-end hardware. The objective of this work is to improve the robustness of a classifier against such disturbances [18]. Much prior work has examined the impact of low-quality images on classification, and there are two broad ways of improving classification accuracy. One is to remove the disturbance itself before classifying. A two-stage training of a partially coupled network for LR image classification analyzed the effect of noise and image-quality degradation on a network's accuracy, and retrained or fine-tuned the network with data augmentation.

10.3.6 Training With Clean and Noisy Images
Training with clean plus noisy images and a rotation loss exhibits a striking trade-off between training only with clean images and training with noisy images. The rotation loss alone increases accuracy for both clean and noisy images compared with the pure data-augmentation approach. This shows that an auxiliary loss with joint embedding provides better regularization than training with augmented data alone when dealing with pixel-level noise. Low-resolution (LR) images are produced with sub-sampling factors of either 2 or 4. After down-sampling, the images are up-sampled with the "nearest" or "bicubic" method; a randomly chosen sub-sampling factor and up-sampling method are used for each image during training. The resulting LR images, together with the original high-resolution (HR) images, are mixed during training with or without the rotation loss. Again, networks trained with clean images and with LR images are compared by measuring the test accuracy of the trained networks on clean and LR images. In this case too, training with LR images clearly improves accuracy on LR images compared with training only on clean images, on both MNIST and CIFAR10. When training only with LR images on MNIST, accuracy is excellent for clean images as well as LR images, because MNIST's LR images retain enough features to generalize well to clean HR images. For complex images such as CIFAR10, training only with LR images hurts accuracy on clean HR images, since the LR images lose features. In all cases, accuracy on LR images does not reach the accuracy on clean images, owing to the features lost in down-sampling. Interestingly, networks trained with both clean and LR images show better accuracy on both, compared with a network trained on clean images only. This suggests that augmentation with LR images also acts as a regularizer for the clean images. However, such a network shows reduced accuracy on LR images compared with the network trained only on LR images: having seen more LR images during training, the latter performs better on them. Once more, the rotation loss improves accuracy for clean and LR images relative to the network trained without it.

A conventional neural network contains three sorts of layers: an input layer, hidden layers, and an output layer. Input and output layers are single layers, while the hidden part can have a multilayer structure. The number of hidden layers depends on the size of the data set; commonly a network contains 2–4 hidden layers, while a complex model may have more than 10. Each layer comprises a set of nodes, the basic units of the neural network, called neurons. Except in the input layer, where the input is the raw data, a neuron's input comes from the output of the previous layer; likewise, each neuron's output is fed to the next layer as input. As a basic unit, a neuron performs a sequence of simple, nonlinear computations. After defining the model structure, the next step is to train the network's parameters [6]. Raw data passes through the whole network, the output units produce final results, and these are compared with the desired outputs by a loss function. Here we pick the Softmax function to determine the labels of the input samples. The idea of the "Softmax loss" is straightforward: given an input sample x(i) with true label k, the loss function focuses only on the predicted probability of the kth class.
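The low-resolution augmentation of Sect. 10.3.6 — down-sample each training image by a randomly chosen factor of 2 or 4, then up-sample back to the original size — can be sketched as follows; this version implements only the "nearest" up-sampling branch, and the function name is ours:

```python
import random
import numpy as np


def make_lr(image, factors=(2, 4)):
    """Simulate a low-resolution training image (Sect. 10.3.6 style).

    Down-sample by a randomly chosen factor via strided slicing, then
    up-sample back to the original size with nearest-neighbour repeats,
    so the LR image can be mixed with its HR original during training.
    """
    f = random.choice(factors)
    small = image[::f, ::f]                       # sub-sample rows/cols
    lr = np.repeat(np.repeat(small, f, axis=0), f, axis=1)
    return lr[:image.shape[0], :image.shape[1]]   # crop any overshoot
```

During training, each HR image would be paired with `make_lr(image)` so that the network sees both distributions, as described above.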
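The "Softmax loss" just described is ordinary cross-entropy: only the predicted probability of the true class k enters the loss. A minimal pure-Python sketch (function names are ours):

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def softmax_loss(logits, true_class):
    """Cross-entropy with an indicator target: -log p[true_class].

    Classes other than `true_class` have indicator 0, so they
    contribute nothing to the loss, exactly as described in the text.
    """
    probs = softmax(logits)
    return -math.log(probs[true_class])
```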
For the predicted probabilities of the other classes, since the indicator function I(·) evaluates to 0, their loss contribution is 0. Essentially, the network's training stage finely tunes the model's parameters so that the loss decreases toward a global minimum.

10.3.7 Smart and Fast Data Processing
In this section we are interested in combining the benefits of the above two categories, namely keeping the physical features of the original data while using a linear transform in the data-subset selection process. We target two settings: large-scale data sets and massive data input. For a large-scale data set, we propose SVD-QR for data-subset selection: the SVD is used to sort the singular values and the corresponding singular vectors, and the size of the data subset can be determined from the singular values; QR is then used to choose which data samples should be selected as input to deep learning [3]. The SVD is a linear transform, while the QR step determines the index list of the data subset to be selected, so the chosen subset keeps the same features as the original data set. For deep learning with massive data input, say a matrix of size thousands by thousands, how to extend the SVD-QR method is not obvious. A major challenge in massive data processing is to move beyond the current practice of single-machine, medium- or large-size data preprocessing, especially considering real-world systems and architectural constraints. Many approaches to massive data processing focus on dimensionality reduction and on computing the dimensionality-reducing mapping. A prime example is random projection methods, which select the mapping at random. Several other approaches instead tailor the mapping to a given smaller data set, for example data-aware dimensionality-reduction methods, where the mapping is not pre-determined but data-dependent. The PCA algorithm uses the data to compute the mapping, and the mapping is genuinely time-varying as the data changes, so PCA can identify the underlying structure of the data. One PCA algorithm is based on a simplified neuron model, a single neuron with Hebbian-type learning of the connection weights. Ordinarily, PCA can be computed by SVD or by eigenvalue decomposition of the data covariance matrix. The outputs of a PCA are often described in terms of component scores, to which each standardized original variable contributes. A PCA-based edge-preserving feature method has been applied to hyperspectral image classification, and a convex formulation of kernel PCA has been studied and applied to semi-supervised learning.

10.4 SIMULATION RESULTS AND PERFORMANCE ANALYSIS OF HANDWRITTEN DIGITS RECOGNITION IN IOT
Handwritten digits serve as an interface between humans and IoT devices, as the following examples illustrate. (1) Smartphones are the most popular IoT devices, and many of them accept handwritten digits and letters on their touch screens; ideally, this handwriting would be recognized in real time with 100% accuracy. (2) In many autonomous systems, such as automatic mail scanners and sorters, handwritten-digit recognition is essential for reading zip codes and recipient phone numbers, so that postal routes can be determined and recipients can receive text messages on the delivery status of their mail; autonomous systems are vital parts of the IoT. (3) In surveillance systems, handwritten/printed digit recognition is needed to identify certain entities, for example vehicles, and surveillance systems are mission-critical IoT. (4) In educational IoT, such as machine-based grading systems, handwritten digit and letter recognition helps grade students' exams quickly and reduce teachers' workload; machine grading is otherwise largely limited to multiple-choice questions, which are tedious for students. (5) In financial IoT, such as automatic teller systems, handwritten digits on bank checks must be recognized with 100% accuracy to guarantee trustworthy transactions, making 24-hour banking possible. In this chapter, handwritten digit recognition is used in our simulation [15]. We apply SVD-QR preprocessing and LMSVD-QR preprocessing to deep learning neural networks for handwritten digit (0 to 9) recognition; handwritten letter recognition will be considered in future work. The simulation data for the SVD-QR approach contain 5000 training examples of handwritten digits. Each training/testing example is a 20 × 20 pixel grayscale image of a digit from 0 to 9, and each pixel is represented by a floating-point number (from −0.1320 to 1.1277 in the data set we used) indicating the grayscale intensity at that location.
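Before feeding such a samples-by-features matrix to the network, the SVD-QR selection of Sect. 10.3.7 picks which rows to keep. A numpy sketch follows; the energy threshold and the hand-rolled pivoting loop (standing in for a library pivoted-QR routine) are our assumptions:

```python
import numpy as np


def svd_qr_select(X, energy=0.95):
    """Pick a representative subset of rows (samples) of X.

    The SVD sorts the singular values; k is the smallest rank whose
    leading singular values capture `energy` of the spectral energy.
    A greedy column-pivoting pass over the rank-k projection (one
    column per sample) then decides which k samples to keep, so the
    subset retains the physical features of the original data set.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(frac, energy)) + 1
    B = (U[:, :k] * s[:k]).T          # k x n_samples projection
    picked = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(B, axis=0)))
        picked.append(j)
        q = B[:, j] / np.linalg.norm(B[:, j])
        B = B - np.outer(q, q @ B)    # deflate the chosen direction
    return sorted(picked)
```

On a 5000 × 400 digit matrix this returns the indices of the rows selected as deep learning inputs; the rest of the data set is discarded.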
The 20 × 20 grid of pixels is vectorized into a 400-dimensional vector, so a matrix can be built in which each training example becomes a single row. This gives a 5000 × 400 matrix whose every row is a training example of a handwritten-digit (0 to 9) image. The second part of the training set is a 5000-dimensional vector of labels (the true digit, 0 to 9). In total, the database holds 5000 examples of handwritten digits, with 500 examples per digit.

10.5 AN INTELLIGENT TRAFFIC LOAD PREDICTION
A real Internet-of-Things deployment is fundamentally heterogeneous, and software-defined networking (SDN) is a well-known technique used in the IoT to manage heterogeneous resources and structures. In such an SDN-IoT, heterogeneous devices sense and collect data in the sensing plane and then send the data, after aggregation, toward the gateway through the switches of the data plane. With the growing number of devices, the aggregated traffic load on the switches can become substantial, and the available channels must be evenly assigned to the links to balance the load. Since high interference exists between non-orthogonal channels and the number of orthogonal channels is limited, partially overlapping channels (POC) are a good way to reduce interference and improve network throughput [7]. Existing POC algorithms mostly focus on optimizing network performance after channel assignment, but ignore the throughput wasted by suspended transmissions during the channel-assignment process. Given the high dynamics of today's IoT, the assigned channels must be changed frequently to adapt to the dynamically changing network traffic, which imposes a basic requirement of fast channel-assignment processing [25]. To address this, anti-coordination based POC assignment (ACPOCA) can efficiently reduce the number of iterations of the channel-assignment process and improve network throughput. However, without a central controller, both the signaling and the convergence time of the network are limited by the distributed setting. We therefore address these difficulties, in the first part of this chapter, with a deep learning based intelligent POC assignment algorithm working with the centralized SDN. The contributions of the deep learning based proposal are twofold. First, under the central control paradigm of SDN, switches no longer need to exchange their channel states; all channel-assignment procedures are done in the local controller, so the signaling overhead of the network is substantially reduced. Second, since the deep learning approach can learn from past channel-assignment processes by training on data collected from existing channel-assignment algorithms (e.g., ACPOCA), the channel assignment can be completed in a single iteration.

First, we use deep learning to predict the complex traffic, achieving over 90% accuracy with a quick response time (5 ms for each of the three prediction techniques), and we further compare the prediction accuracy of the three different control frameworks. The results show that the prediction accuracy of centralized SDN-based forecasting is consistently better than that of the two other frameworks [23]. Finally, with centralized SDN control, we combine the deep learning based traffic prediction with a channel assignment that uses the predicted traffic load as its guide. This intelligent channel assignment, which we refer to as TP-DLCA, effectively improves both the accuracy and the processing speed of channel assignment. The simulation results demonstrate that both the throughput and the delay in the SDN-IoT under our proposal are better than those of the conventional algorithms.

10.6 PERFORMANCE OF DEEP LEARNING BASED CHANNEL ASSIGNMENT
This section compares the learning performance of POC under different learning structures and learning parameters, then examines the POC accuracy of our proposal, and finally compares the throughput of deep learning based POCA (DLPOCA) against conventional channel-assignment algorithms (i.e., orthogonal channel assignment, POC, and AC-POCA). To compare training accuracy across learning structures, we first use a deep belief network (DBN) with 2 and 3 hidden layers, setting the number of nodes per layer to 20 and 100. We then change the structure to a deep CNN with 1 and 2 convolution layers plus 2 fully connected layers, respectively. In the CNN, the convolution kernel size is set to 3 × 3, the number of nodes in the fully connected layer is 100, the number of channels in the convolution layer is 20, and the padding and stride are set to 1.
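For convolution layers with 3 × 3 kernels and padding and stride of 1, as configured above, the standard size rule out = (in + 2·padding − kernel)/stride + 1 leaves the spatial size unchanged. A small sketch (the helper names are ours; the 20 × 20 input size is borrowed from Sect. 10.4):

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel=3):
    """Parameter count: weights plus one bias per output channel."""
    return out_ch * (in_ch * kernel * kernel + 1)
```

With these settings, `conv_out(20)` stays 20, so the layers can be stacked without shrinking the feature maps, and a 20-channel layer over a 1-channel input costs only 200 parameters.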
We then compare the different training structures across network architectures, running each of the training processes with mini-batch sizes of 20 and 500 to obtain accuracy results. The outcome shows that accuracy depends strongly on the training structure, and that the deep CNN is markedly superior to the DBN in our scenario. The number of iterations is also consistently small, significantly outperforming conventional algorithms. In conventional algorithms, a switch's channel choice for its links depends on the decisions of other switches, which means a switch must wait until earlier switches have completed their channel assignment. The larger the number of iteration rounds, the longer each switch spends on channel assignment, which causes long convergence times, and with long convergence the redundant signaling grows correspondingly. During the convergence time, all links are down because of the channel reassignment, and throughput shrinks with such repeated convergence.

10.6.1 A Deep Learning System for the Person Search
Unlike conventional approaches that separate the problem into two distinct tasks, pedestrian detection and person re-identification, we handle both aspects jointly in a single CNN. The CNN comprises two parts: given a whole input gallery image, a pedestrian proposal net produces bounding boxes of candidate people, which are fed into an identification net that extracts features for comparison with the target person. The pedestrian proposal net and the identification net adapt to each other during joint optimization. For example, the proposal net can concentrate on recall rather than precision, since false alarms can be eliminated by the final feature-matching stage; meanwhile, misalignments of the proposals are also acceptable, as they can be further adjusted by the identification net. To improve the scalability of the whole system, inspired by recent advances in object detection, we let the two parts share the underlying convolutional feature maps, which substantially accelerates inference. Conventional re-id feature learning mainly uses pairwise or triplet distance loss functions. However, these are inefficient, since only a few data samples are compared at a time and there are O(N²) potential data combinations, where N is the number of images. Different sampling strategies can significantly affect the convergence rate and quality, but finding efficient sampling strategies becomes much harder as N increases. Another approach is to learn to classify identities with the Softmax loss, which effectively compares all the samples at the same time; but as the number of classes grows, training such a large Softmax classifier matrix becomes much slower, or even infeasible. In this section we suggest a novel Online Instance Matching (OIM) loss function to cope with these problems, as shown in Fig. 10.5. We keep a lookup table of the features of all the labeled identities.
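The lookup-table mechanism just introduced can be sketched as follows: entries are L2-normalized identity features, each mini-batch sample is scored against every entry by cosine similarity, and the matched entry is refreshed with a running average. The momentum value and class layout are our assumptions, not the exact OIM formulation:

```python
import numpy as np


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    return v / (np.linalg.norm(v) + 1e-12)


class OIMTable:
    """Lookup table of features for all labeled identities."""

    def __init__(self, num_ids, dim, momentum=0.5):
        self.table = np.zeros((num_ids, dim))
        self.momentum = momentum

    def scores(self, feature):
        """Cosine similarity of one sample against every identity.

        A temperature-scaled softmax over these scores would give the
        classification probabilities used by the OIM loss.
        """
        return self.table @ l2_normalize(feature)

    def update(self, identity, feature):
        """Running-average update of the matched identity's entry."""
        mixed = self.momentum * self.table[identity] \
            + (1.0 - self.momentum) * l2_normalize(feature)
        self.table[identity] = l2_normalize(mixed)
```

Because every sample is compared against the whole table, one mini-batch effectively sees all classes at once, avoiding the O(N²) pair enumeration described above.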
Distances are then compared between the mini-batch samples and all the registered entries. Moreover, the many unlabeled identities that appear in scene images can serve as negatives for the labeled identities, so we additionally keep a circular queue to store their features for comparison — another advantage brought by the person-search problem setting. The parameter-free OIM loss converges considerably faster, and performs better, than the Softmax loss in our experiments.

10.6.2 Network Architecture
A deep learning framework that jointly handles pedestrian detection and person re-identification in a single convolutional neural network (CNN) is shown in Fig. 10.5. Given a whole scene image as input, we first use a stem CNN to transform the raw pixels into convolutional feature maps. A pedestrian proposal net is built on these feature maps to predict bounding boxes of candidate people, which are then fed into an identification net with RoI pooling to extract an L2-normalized 256-D feature for each of them. At inference, we rank the gallery people according to their feature distances to the target person. At the training stage, we propose an OIM loss function [24] on the feature vectors to supervise the identification net, together with several other loss functions for training the proposal net in a multi-task manner. Below we first detail the CNN model structure and then elaborate on the OIM loss function. We adopt ResNet-50 as the CNN model. It has a 7 × 7 convolution layer in front (named conv1), followed by four blocks (named conv2_x to conv5_x) containing 3, 4, 6, and 3 residual units, respectively. Given an input image, the stem produces 1024 channels of feature maps at 1/16 the resolution of the original image. A 512 × 3 × 3 convolution layer is first added to transform the features specifically for pedestrians. We then associate 9 anchors with each feature-map location, and use a Softmax classifier to predict whether each anchor is a pedestrian or not, as well as a linear regression to adjust their locations. We keep the top 128 adjusted bounding boxes after non-maximum suppression as our final proposals. To find the target person among all these proposals, we build an identification net that extracts the features of each proposal and compares them with the target's. We first exploit an RoI pooling layer to pool a 1024 × 14 × 14 region from the stem feature maps for each proposal. These regions are then passed through the remaining conv4_4 to conv5_3 layers of ResNet-50, followed by a global average pooling layer that summarizes each into a 2048-dimensional feature vector. On the one hand, since the pedestrian proposals inevitably contain some false alarms and misalignments, we again use a Softmax classifier and a linear regression to reject non-persons and refine the box locations. On the other hand, we project the features into an L2-normalized 256-dimensional subspace (id-feat) and use them to compute cosine similarities with the target person at inference time. During training, we supervise the id-feat with the OIM loss function; together with the other loss functions for detection, the whole net is jointly trained in a multi-task learning manner rather than with alternating optimizations.

10.6.3 Dataset
To evaluate our method, we collect and annotate a large-scale CUHK-SYSU Person Search Dataset, exploiting two data sources to diversify the scenes. On the one hand, we use hand-held cameras to shoot street snaps around cities; on the other, we collect movie snapshots that contain pedestrians, as they add variety in viewpoints, lighting, and backgrounds. In this section we present the basic statistics of our dataset and define the evaluation protocols and metrics.

10.6.4 Statistics
After collecting all 18,184 images, we first densely annotate all 96,143 pedestrian bounding boxes in these scenes, and then associate the people who appear across different images, resulting in 8432 labeled identities. The statistics of the two data sources are recorded. We do not annotate people who appear with half bodies or in unusual poses, such as sitting or crouching. Moreover, people who change clothes and decorations across different video frames are not associated in our dataset, since the person search problem requires recognizing identities mainly by their clothes and body shapes rather than their faces. We ensure that the background pedestrians contain no labeled identities, so that they can safely serve as negative samples for identification. Note that we also ignore background pedestrians whose heights are smaller than 50 pixels, as they would be hard to recognize even for human labelers. The height distributions of labeled
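The inference step of Sect. 10.6.2 — ranking gallery people by the distance between their L2-normalized 256-D features and the target's — reduces to sorting by cosine similarity; a sketch with assumed names:

```python
import numpy as np


def rank_gallery(target, gallery):
    """Return gallery indices sorted by cosine similarity to the target.

    `target` is one feature vector; `gallery` is an (n, d) array of
    candidate features, one row per detected person. Both are
    L2-normalized so the dot product equals the cosine similarity.
    """
    t = target / np.linalg.norm(target)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ t
    return list(np.argsort(-sims))    # best match first
```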
quantity of inputs to the deep learning neural network without losing performance, and it can also greatly increase the data-processing speed for deep learning in the IoT. Consequently, we propose a novel deep learning based traffic load prediction algorithm, first to forecast the future traffic load and network congestion. DLPOCA is then used to intelligently assign channels to each link in the SDN-IoT; combined with the deep learning based forecast, POCA can intelligently avoid potential congestion and quickly assign suitable channels to the links of the SDN-IoT. The simulation results show that our proposal significantly outperforms traditional channel-assignment algorithms.

10.8 CONCLUSION
There are two basic options in data preprocessing: to keep specific features, or to accept their loss. Uniform or non-uniform sampling can keep the physical features of the data, but it neglects the deeper relations within the data beyond the mean and standard deviation; transform-based preprocessing, by contrast, further explores the inter-correlations of the data beyond these first- and second-order statistics, even though its physical features are lost. To combine the benefits of the two categories, SVD-QR and LMSVD-QR algorithms are suggested for the preprocessing of deep learning in multilayer neural networks. SVD-QR is suited to a large-scale data pool, and LMSVD-QR to massive data input. The SVD and LMSVD perform linear transforms of the original data, while the QR step determines the sample index list of the data subset to be kept, so the original physical features of the data are preserved. Applying them automates handwritten digit recognition. Simulation results demonstrate that both approaches can vastly reduce the number of inputs for deep learning without losing performance; in particular, SVD-QR achieves the same performance (99.7% accuracy) with only 103 inputs, compared with the original data set with 400 inputs.

Developing a series of deep learning based methods is needed to make person recognition scale toward real-world data and applications. From the data perspective, on the one hand, the goal is to ease the challenge of insufficient supervised data: a deep learning framework should exploit semi-supervised data with noisy labels, which are cheap to collect. On the other hand, the task is to handle the problem of using the numerous existing small datasets, each with its own data bias; a joint single-task learning algorithm and a domain-guided dropout technique are developed to handle the domain assignment explicitly in a single model. From the application perspective, to ease the more delicate problem setting, a dedicated deep learning framework for simultaneous person detection and identification is needed, and a novel Online Instance Matching loss function makes learning identification features more effective. The explosive growth of sensing data and the quick-response requirements of the IoT have recently made high-speed transmission in the wireless IoT a critical issue. Assigning channels properly and directly in the wireless IoT is a fundamental prerequisite of high-speed transmission, but the traditional fixed channel-assignment algorithms are not suitable in the IoT because of its heavy traffic loads. Recently, the SDN-IoT has been proposed to improve transmission quality, and the deep learning method has been widely studied in high-performance SDN. Accordingly, a novel deep learning based traffic load prediction method was introduced to forecast the future traffic load and network congestion, and DLPOCA was introduced to intelligently assign channels to each link in the SDN-IoT, avoiding traffic congestion and quickly assigning suitable channels to the wireless links of the SDN-IoT. Extensive simulation results show that our proposal significantly outperforms the traditional channel-assignment algorithms.

REFERENCES
1. Special issue of Big Data research journal on "Big Data and neural networks", Big Data Research 11 (2018) iii–iv, https://doi.org/10.1016/s2214-5796(18)30058-3.
2. R. Kashyap, P. Gautam, Fast medical image segmentation using energy-based method, Biometrics: Concepts, Methodologies, Tools, and Applications 3 (1) (2017) 1017–1042.
3. S. Liang, Smart and fast data processing for deep learning in the Internet-of-Things: less is more, IEEE Internet of Things Journal 20 (10) (2018) 1–9, https://doi.org/10.1109/jiot.2018.2864579.
4. M. Rathore, A. Paul, A. Ahmad, G. Jeon, IoT-based Big Data, International Journal on Semantic Web and Information Systems 13 (1) (2017) 28–47, https://doi.org/10.4018/ijswis.2017010103.
5. P. Yildirim, D. Birant, T. Alpyildiz, Data mining and machine learning in the textile industry, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (1) (2017) e1228, https://doi.org/10.1002/widm.1228.
6. E. Konovalov, Equivalence of conventional and modified network of generalized neural elements, Modeling and Analysis of Information Systems 23 (5) (2016) 657–666, https://doi.org/10.18255/1818-1015-2016-5-657-666.
7. F. Tang, B. Mao, Z. Fadlullah, N. Kato, On a novel deep-learning-based intelligent partially overlapping channel assignment in SDN-IoT, IEEE Communications Magazine 56 (9) (2018) 80–86, https://doi.org/10.1109/mcom.2018.1701227.
8. K. Noel, Application of machine learning to systematic allocation strategies, SSRN Electronic Journal (2016), https://doi.org/10.2139/ssrn.2837664.
9. A. Jara, A. Olivieri, Y. Bocchi, M. Jung, W. Kastner, A. Skarmeta, Semantic Web of Things: an analysis of the application semantics for the IoT moving towards the IoT convergence, International Journal of Web and Grid Services 10 (2–3) (2014) 244, https://doi.org/10.1504/ijwgs.2014.060260.
10. L. Urquhart, T. Rodden, A legal turn in human–computer interaction? Towards regulation by design for the Internet of things, SSRN Electronic Journal (2016), https://doi.org/10.2139/ssrn.2746467.
11. Internet of things & creation of the fifth V of Big Data, International Journal of Science and Research (IJSR) 6 (1) (2017) 1363–1366, https://doi.org/10.21275/art20164394.
12. Z. Chen, B. Liu, Lifelong machine learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 10 (3) (2016) 1–145, https://doi.org/10.2200/s00737ed1v01y201610aim033.
13. D. Wang, C. McMahan, C. Gallagher, A general regression framework for group testing data, which incorporates pool dilution effects, Statistics in Medicine 34 (27) (2015) 3606–3621, https://doi.org/10.1002/sim.6578.
14. A. Hussain, E. Cambria, Semi-supervised learning for big social data analysis, Neurocomputing 275 (2018) 1662–1673, https://doi.org/10.1016/j.neucom.2017.10.010.
15. S. Ouchtati, M. Redjimi, M. Bedda, A set of features extraction methods for the recognition of the isolated handwritten digits, International Journal of Computer and Communication Engineering 3 (5) (2014) 349–355, https://doi.org/10.7763/ijcce.2014.v3.348.
16. C. Tofan, Optimization techniques of decision making – decision tree, Advances in Social Sciences Research Journal 1 (5) (2014) 142–148, https://doi.org/10.14738/assrj.15.437.
17. R. Kashyap, A. Piersson, Impact of Big Data on security, in: Handbook of Research on Network Forensics and Analysis Techniques, IGI Global, 2018, pp. 283–299.
18. R. Kashyap, V. Tiwari, Energy-based active contour method for image segmentation, International Journal of Electronic Healthcare 9 (2–3) (2017) 210–225.
19. M. Dzbor, A. Stutt, E. Motta, T. Collins, Representations for semantic learning webs: Semantic Web technology in learning support, Journal of Computer Assisted Learning 23 (1) (2007) 69–82, https://doi.org/10.1111/j.1365-2729.2007.00202.x.
20. L. Kim, DeepX: deep learning accelerator for restricted Boltzmann machine artificial neural networks, IEEE Transactions on Neural Networks and Learning Systems 29 (5) (2018) 1441–1453, https://doi.org/10.1109/tnnls.2017.2665555.
21. A. Neath, J. Cavanaugh, The Bayesian information criterion: background, derivation, and applications, Wiley Interdisciplinary Reviews: Computational Statistics 4 (2) (2011) 199–203, https://doi.org/10.1002/wics.199.
22. D. Schwab, S. Ray, Offline reinforcement learning with task hierarchies, Machine Learning 106 (9–10) (2017) 1569–1598, https://doi.org/10.1007/s10994-017-5650-8.
23. F. Tang, Z. Fadlullah, B. Mao, N. Kato, An intelligent traffic load prediction based adaptive channel assignment algorithm in SDN-IoT: a deep learning approach, IEEE Internet of Things Journal (2018) 1–14, https://doi.org/10.1109/jiot.2018.2838574.
24. M. Abdechiri, K. Faez, H. Amindavar, Visual object tracking with online weighted chaotic multiple instance learning, Neurocomputing 247 (2017) 16–30, https://doi.org/10.1016/j.neucom.2017.03.032.
25. J. Liu, B. Krishnamachari, S. Zhou, Z. Niu, DeepNap: data-driven base station sleeping operations through deep reinforcement learning, IEEE Internet of Things Journal (2018) 1, https://doi.org/10.1109/jiot.2018.2846694.
CHAPTER 11
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00018-X 179
Copyright © 2019 Elsevier Inc. All rights reserved.
functional requirements of business use cases. The functional domains of an analytic platform architecture are categorized into [3]: the provision data domain, responsible for ingestion of streaming data and for moving data between RDBMS, HDFS and connectors to unstructured sources; the data storage, distribution, and processing domain, responsible for stream processing, the analytics database and in-memory distributed compute; the data analysis domain, responsible for social media and text analytics, statistical methods, machine learning, and media and speech analytics; the embedded analytics domain, responsible for distributed flow control, event-driven business rules and complex event processing; the information delivery domain, responsible for visual analytics and real-time reporting; the enterprise information management domain, responsible for distributed compute data management; the development, deployment, and solution maintenance domain, responsible for design tools and analytic model management; the infrastructure delivery domain, responsible for distributed compute security, administration and monitoring; and, finally, the service delivery domain, responsible for analytics platforms/solutions.
In [4,5], the authors used weighted fuzzy logic to assign weights when training the data to extract sentiments from labeled tweets and achieved a good F-score, whereas in [6] the authors made a detailed comparison of predictive models and performed analysis on a time series dataset. In [7], the authors analyzed the PIMA diabetes dataset and predicted the levels of diabetes based on the insulin feature, whereas in [8] the authors used a gradient ascent algorithm to find the exact weights of the terms used in determining the sentiment of a tweet, and used a boosting approach to improve the accuracy of a linear classifier. In [9], the authors provided a novel way of performing prediction on a breast cancer dataset, compared the performance of three different feature selection algorithms, and showed that a genetic algorithm gives the best result in selecting among all available features, while the SVM algorithm gives the best result in predicting the level of certainty in breast cancer.

11.2.1 Challenges of Big Data Technologies
The application is built at the development stage; at deployment time, one can come across various challenges. Different methods are used for deploying the application in standalone phases, and big data technologies support both Mesos and YARN as cluster managers. Constructing the dependency configuration carefully also helps in deploying the application without exceptions at the cluster nodes.

11.2.1.1 Memory Issues
Big data technologies are ready to assist in transferring huge amounts of data, and addressing memory-related issues is critical for monitoring and measuring usage.
• Improper documentation can lead to failure – each and every step should be recorded for future purposes when recovering the system from a single point of failure.
• Frequent changes are dangerous – a version change after the initial setup can overrule the existing configuration.

11.2.1.2 Limitations of Apache Big Data Technologies
When processing huge amounts of data, big data technologies and deep learning are much less suitable for handling small files and duplicate management. The industry is therefore trying to shift towards Apache Flink for big data workloads. A combination of deep learning
CHAPTER 11 Challenges in Storing and Processing Big Data Using Hadoop and Spark 181
and its usability will be more frequent, since streaming of live data is divided into many batches at regular intervals, and this batching is what makes big data technologies resilient (it creates RDDs). Each RDD supports operations like join, map and reduce, which help in building the data into batches. In a real process, however, a micro-batch pattern will appear in streaming, and one of the drawbacks of big data technologies is that they are not precisely compatible with small files. When big data technologies and Hadoop are combined, HDFS has limitations on small files rather than large files, and small files end up stored gzipped in S3. Apache big data technologies also do not have their own file management; because of that they become dependent on other platforms, like Hadoop or cloud storage, which is one of the issues to consider. To support fast processing, big data technologies support memory-based transactions for processing huge amounts of data.
The main research gap between big data technologies and deep learning is that big data technologies support only a few MLlib algorithms while deep learning is quite advanced in algorithms; moreover, big data technologies have a very high latency and only support time-based rather than record-based processing. When data generation is very fast, deep learning is able to handle huge amounts of data, yet big data technologies are not capable of back-pressure handling. To address the above gaps and limitations, a buffer should be automatically released and used to store new data in the stream; alternatively, one can use Apache Flink together with deep learning.

11.2.2 Real Time Applications of Big Data Frameworks
• Hadoop
1. Analyze a patient's records, e.g., if he/she is suspected to have a heart attack: examine a series of observations or recorded test results, which are analyzed by big data methods;
2. Prevent hardware failure;
3. Understand the trends in selling a product.
• ZooKeeper
1. Uses the CXF implementation of DOSGi;
2. Used in real time for serving a registration repository;
3. Used to create resource locks in a real-time distributed system.
• Apache Storm
1. Supports real-time live streaming and processing;
2. Supports multiple programming languages;
3. Is fault tolerant and has low latency;
4. Use cases: NaviSite, Yahoo, Wego and Twitter.
• Apache Spark
1. Used for event detection, such as earthquakes, by analyzing Twitter streaming data;
2. Used in real time by the gaming industry for discovering patterns and processing;
3. Used as an e-commerce source in live streaming clustering algorithms;
4. Supports parallel processing.
• GraphX
1. It is a part of Spark used for distributed in-memory (RAM) graph processing;
2. Used to divide large processes on one machine framework; supports Neo4j and Apache GraphX.

11.3 HADOOP ARCHITECTURE
Apache Hadoop offers a scalable, flexible and reliable distributed computing big data framework for a cluster of systems with storage capacity and local computing power by leveraging commodity hardware. Hadoop follows a master–slave architecture, as shown in Fig. 11.2, in which the Name node holds metadata about all the data blocks in HDFS for the transformation and analysis of large datasets using the Hadoop MapReduce paradigm [10]. The three important Hadoop components that play a vital role in the Hadoop architecture are the Hadoop Distributed File System (HDFS) [11], Hadoop MapReduce and Yet Another Resource Negotiator (YARN).

11.4 MAPREDUCE ARCHITECTURE
MapReduce can handle large-scale data and works well on Write Once, Read Many (WORM) data [12] without mutexes. MapReduce operations are performed by the same physical processor, and all operations are carried out on local data (data locality); the runtime takes care of splitting and moving data. MapReduce consists of the following phases: the Map phase reads an assigned input split from HDFS and parses it into key/value pairs; in the Partition phase, every mapper determines the reducer that is to receive each of its outputs; the Shuffle phase prepares the Reduce task bucket; the Merge phase sorts all map outputs into a single run; and, finally, the Reduce phase edits the corresponding list of values and writes it to a file in HDFS, as shown in Fig. 11.3. A sample application developed using MapReduce is shown in Fig. 11.4, where the frequency of the words occurring in a given text is illustrated.
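The word-count application illustrated in Fig. 11.4 can be sketched in plain Python by imitating the Map, Shuffle/Merge and Reduce phases described above. This is a conceptual sketch of the data flow only, not Hadoop API code, and the function names are ours:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(split):
    # Map: parse the input split into (key, value) pairs
    return [(word.lower(), 1) for word in split.split()]

def shuffle_and_merge(pairs):
    # Shuffle/Merge: sort all map outputs so equal keys are adjacent,
    # then group the values belonging to each key into a single run
    pairs = sorted(pairs, key=itemgetter(0))
    return [(k, [v for _, v in grp]) for k, grp in groupby(pairs, key=itemgetter(0))]

def reduce_phase(grouped):
    # Reduce: aggregate the value list of each key and "write" the result
    return {key: sum(values) for key, values in grouped}

splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
pairs = [pair for s in splits for pair in map_phase(s)]
counts = reduce_phase(shuffle_and_merge(pairs))
print(counts)  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In real Hadoop each split would be processed by a separate mapper task and each key group by a reducer task; here the phases simply run one after another in a single process.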
The following list of datasets was used for Hadoop practice:
• Amazon. It's no secret that Amazon is among the market leaders when it comes to cloud. AWS is used on a large scale with Hadoop, and Amazon also provides a lot of datasets for Hadoop practice, which you can download.
• Wikipedia. Wikipedia also provides datasets for Hadoop practice, giving you refreshed and real data for use.
• Air traffic. Airport, airline and route data (6977 airports, 5888 airlines and 59,036 routes spanning the globe).
• UCI Machine Learning Repository. A collection of databases, domain theories, and data generators.
• LinkedData. You may find almost all categories of datasets here.

11.5 JOINS IN MAPREDUCE
A mapper's job during the Map stage is to "read" the data from the join tables and to "return" the "join key" and "join value" pair into an intermediate file [13]. In the Shuffle stage, this intermediate file is then sorted and merged. The reducer's job during the Reduce stage is to take this sorted result as input and complete the task of join, as shown in Fig. 11.5.
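The Map/Shuffle/Reduce join flow described above can likewise be imitated in plain Python. The tables, join keys and tags below are invented for illustration:

```python
from collections import defaultdict

customers = [("c1", "Alice"), ("c2", "Bob")]          # (join key, name)
transactions = [("c1", 250), ("c2", 40), ("c1", 10)]  # (join key, amount)

# Map stage: each mapper emits (join key, tagged join value) pairs
mapped = [(k, ("cust", name)) for k, name in customers]
mapped += [(k, ("txn", amount)) for k, amount in transactions]

# Shuffle stage: sort and merge the intermediate pairs by join key
buckets = defaultdict(list)
for key, tagged in sorted(mapped, key=lambda kv: kv[0]):
    buckets[key].append(tagged)

# Reduce stage: each reducer sees all values of one key and completes the join
joined = {}
for key, values in buckets.items():
    name = next(v for tag, v in values if tag == "cust")
    amounts = [v for tag, v in values if tag == "txn"]
    joined[key] = (name, len(amounts), sum(amounts))

print(joined)  # {'c1': ('Alice', 2, 260), 'c2': ('Bob', 1, 40)}
```

Tagging each record with its source table is what lets a single reducer distinguish customer rows from transaction rows for the same key, mirroring the "cust"/"tnxn" tags used later in Algorithm 1.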
As the name suggests, in the Reduce side join the reducer is responsible for performing the join operation. It is comparatively simpler and easier to implement than the Map side join, as the Sorting and Shuffling phase sends the values having identical keys to the same reducer and therefore, by default, the data is organized for us. Now, let us understand the Reduce side join in detail when the dataset in Fig. 11.6 is given as input.
The architecture of ZooKeeper is based on a simple client–server model [14]. The clients are the nodes which request service from the server, and the server is the node which serves the requests. ZooKeeper provides a high-availability file system: it has no files and directories, but a unified concept of a node, called a Znode (ephemeral or persistent), which is both a container of data (like a file) and a container of other znodes (like a directory) [15]. The operations performed on Znodes are: connecting to a host by specifying its hostname; creating a Znode in the persistent mode by specifying a group name as part of a path name; joining a new member by specifying the path and the group name; listing the children of a Znode using getChildren() with the path as argument; and deleting a member from the existing path.
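The Znode operations listed above can be mimicked with a small in-memory tree. The class below is a toy model of ours, not the real ZooKeeper client API; it only illustrates the path-based namespace and the getChildren()/delete semantics:

```python
class ZnodeTree:
    """Toy in-memory stand-in for a ZooKeeper namespace of Znodes."""

    def __init__(self):
        # the root Znode; every Znode holds data (like a file) and,
        # through paths, may contain other znodes (like a directory)
        self.nodes = {"/": {"data": b"", "ephemeral": False}}

    def _parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", ephemeral=False):
        # create a persistent or ephemeral Znode under an existing path
        if self._parent(path) not in self.nodes:
            raise KeyError("parent znode does not exist")
        self.nodes[path] = {"data": data, "ephemeral": ephemeral}

    def get_children(self, path):
        # list names of the direct children, as getChildren() does
        return sorted(p.rsplit("/", 1)[1] for p in self.nodes
                      if p != "/" and self._parent(p) == path)

    def delete(self, path):
        # delete a member from the existing path
        del self.nodes[path]

tree = ZnodeTree()
tree.create("/groups")                          # persistent group node
tree.create("/groups/member1", ephemeral=True)  # a member joins the group
tree.create("/groups/member2", ephemeral=True)
print(tree.get_children("/groups"))  # ['member1', 'member2']
tree.delete("/groups/member1")
print(tree.get_children("/groups"))  # ['member2']
```

In real ZooKeeper an ephemeral Znode disappears automatically when its client session ends, which is what makes the member nodes above suitable for group membership and lock recipes.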
Algorithm 1 Algorithm to perform Reduce side join.
1. Start
2. Define the static class CustsMapper extending Mapper with generics <Object, Text, Text, Text>
3. Override the method map with arguments (Object key, Text value, Context context)
4. Get the value into a string named record by using toString()
5. Create an array of strings named parts using split() with "," as delimiter
6. Define the static class TxnsMapper extending Mapper with generics <Object, Text, Text, Text>
7. Override the method map with arguments (Object key, Text value, Context context)
8. Get the value into a string named record by using toString()
9. Create an array of strings named parts using split() with "," as delimiter
10. Update the context with the instances Text(parts[2]) and Text("tnxn " + parts[3]) using the write operation
11. Define the static class ReduceJoinReducer extending Reducer with generics <Text, Text, Text, Text>
12. Override the method reduce with arguments (Text key, Iterable<Text> values, Context context)
13. For each value, update the count and total if (parts[0].equals("tnxn"))
14. Else if (parts[0].equals("cust")), update the name as parts[1]
15. Create a string str holding both count and total using format()
16. Update the context with the instances Text(name) and Text(str) using the write operation
17. Configure the job using the Job and Configuration classes
18. Stop

11.6 APACHE STORM
Apache Storm is a distributed real-time big data processing system designed to process vast amounts of data in a fault-tolerant and horizontally scalable manner with high ingestion rates [16]. It manages the distributed environment and cluster state via Apache ZooKeeper. It reads a raw stream of real-time data at one end, passes it through a sequence of small processing units and outputs the useful information at the other end. The components in Fig. 11.7 represent the core concepts of Apache Storm.
One of the main highlights of Apache Storm is that it is a fault-tolerant and fast "no Single Point of Failure" (SPOF) distributed application [17]. The important high-level components in each Supervisor node include: the topology, which runs distributedly on multiple worker processes on multiple worker nodes; the spout, which reads tuples off a messaging framework and emits them as a stream of messages, or may connect to the Twitter API and emit a stream of tweets; and the bolt, which is the smallest processing logic within a topology. The output of a bolt can be fed into another bolt as input in a topology.

11.7 APACHE SPARK ENVIRONMENT
Apache Spark is very suitable for data engineering, being able to handle large datasets without much thought about the infrastructure. It helps in data ingestion, processing, machine learning and accuracy, and it provides a framework for the construction of a distributed system. The best part of big data technologies is the speed of accessing and transferring data, which leads to an implementation of MapReduce that keeps data in memory rather than on disk. Big data technologies also provide many libraries for programming languages like Java, Scala and Python.

11.7.1 Use Cases of Big Data Technologies
• Analytics. Spark can be used for building real-time analytics on streaming data. It has the ability to transfer massive amounts of data from different sources; big data technologies support Kafka, ZeroMQ, HDFS, Flume and Twitter.
• Trending data. For an online stream, big data technologies are well suited to processing the data: trending data can be easily stored, and analytics can be done at runtime.
• IoT. The Internet-of-Things generates huge amounts of data from sensors installed at various places. The generated data are pushed for storage and processing, and big data technologies have been applied for processing and transferring these data at regular periods (every second, minute or hour).
• Machine learning. Spark can be used in offline processing and may use machine learning algorithms. ML can be deployed easily on datasets, as MLlib contains different algorithms; we can apply it on datasets to achieve a real-time ML system. We can also combine MLlib with Spark.
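The RDD operations named in this chapter (map, join, reduce) can be sketched without a Spark cluster by treating an RDD as a plain list of partitions; the helper names below are ours and only mirror the semantics of the corresponding Spark operations, they are not the PySpark API:

```python
from functools import reduce

# An "RDD" here is just a list of partitions, each a list of records.
rdd = [[1, 2, 3], [4, 5], [6]]

def rdd_map(rdd, f):
    # map: apply f to every record, partition by partition
    return [[f(x) for x in part] for part in rdd]

def rdd_reduce(rdd, f):
    # reduce: combine records within each partition,
    # then combine the per-partition partial results
    partials = [reduce(f, part) for part in rdd if part]
    return reduce(f, partials)

squared = rdd_map(rdd, lambda x: x * x)
total = rdd_reduce(squared, lambda a, b: a + b)
print(total)  # 1 + 4 + 9 + 16 + 25 + 36 = 91
```

The two-level reduce is the essential point: Spark aggregates inside each partition in parallel and only then merges the partials, which is why the combining function must be associative.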
In the programming context, a Spark context is created, which provides the means for loading and saving data files of different types [18]; an SQL context can also be created from the Spark context to implicitly convert RDDs into DataFrames. Using the Spark context, Spark Streaming converts a data stream into batches of X seconds called DStreams, which internally are sequences of RDDs. Later, Spark is used to process the RDDs, and the results of the RDD operations are returned in batches, as shown in Fig. 11.10. Spark Streaming batches are processed using the Spark engine, which returns a processed stream of batches that can be stored to a file system; 0.5 s is the minimum batch size and results in a one second end-to-end latency.

Algorithm 2 Algorithm to build a graph.
1. Start.
2. Import the data.
3. Build a graph using the structure of the vertices (or nodes) and the structure of the edges.
4. Perform some joins to ensure that the data items from the datasets are associated with each other.
5. Create a set of vertices and attach the metadata to each of them.
6. Create the edges from all of our individual rows by adding a dummy value of 1 to each.
7. Consider a default station for edges that don't point to any of the vertices.
8. Now the instance of the graph is created with vertices, edges and default details.
9. The instance can be used to access properties like numVertices and numEdges.
10. Stop.

Algorithm 3 TCP based stream processing using Spark.
1. Start
2. Import the Spark Streaming context from Apache.org
3. Create a configuration instance using sparkconf(), specifying the local host as master node
4. Create an instance of the streaming context with a batch size of 5 seconds
5. Create an instance of DStream that can connect to a hostname and port using sockettextstream()
6. Perform a transformation on the DStream
7. Use the MapReduce logic to organize the context
8. Start capturing the streaming context using start()
9. Wait for the computation to terminate
10. Run nc -lk port number, which activates netcat as a data server
11. Stop
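Algorithm 3 can be approximated in plain Python. The sketch below replaces the Spark streaming context and sockettextstream() with a local socket pair, and splits the received lines into fixed-size micro-batches (counted in lines rather than 5-second windows, an assumption made for the sketch) before applying word-count MapReduce logic per batch:

```python
import socket
from collections import Counter

# A connected socket pair stands in for the netcat data server and the receiver.
server, client = socket.socketpair()
server.sendall(b"spark streams data\nspark batches data\n")
server.close()  # closing the sending end signals end-of-stream

# Read the "stream" and split it into fixed-size micro-batches of lines.
lines = client.makefile("r").read().splitlines()
client.close()
BATCH_SIZE = 1  # one line per micro-batch instead of a 5-second window
batches = [lines[i:i + BATCH_SIZE] for i in range(0, len(lines), BATCH_SIZE)]

# Transformation + MapReduce logic applied to each micro-batch, DStream-style.
counts_per_batch = [Counter(w for line in batch for w in line.split())
                    for batch in batches]
print(counts_per_batch)
```

The essential idea carried over from Spark Streaming is that the unbounded stream is never processed as a whole: each micro-batch is a small, finite dataset to which ordinary batch logic is applied.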
11.10 FUTURE RESEARCH DIRECTIONS
In the future, we would like to address industry needs by proposing a new analytic platform that can fulfill all the real-time requirements with less manpower and infrastructure.

11.11 CONCLUSION
From this chapter, we would like to conclude that there are many and various challenges in the area of big data analytics. Researchers can save valuable time when deciding on a research topic from this area by understanding all the concepts discussed in this chapter. Big data frameworks are suitable for data engineering, being able to handle large datasets without much thought about the infrastructure. They help in data ingestion, processing, machine learning and accuracy, providing the framework for the construction of a distributed system. The best part of big data technologies is the speed of accessing and transferring data, which leads to an implementation of MapReduce that keeps data in memory rather than on disk. Big data technologies also provide many libraries for programming languages like Java, Scala and Python.
The limitation of the present research work is that experiments were performed on relatively small datasets. In the future, our aim is to consider real-time big data datasets with respect to volume.

REFERENCES
1. X.-W. Chen, X. Lin, Big data deep learning: challenges and perspectives, IEEE Access 2 (2014) 514–525.
2. P.K. Davis, Analytic Architecture for Capabilities-Based Planning, Mission-System Analysis, and Transformation, Tech. rep., Rand National Defense Research Inst., Santa Monica, CA, 2002.
3. H.D. Hunt, J.R. West, M.A. Gibbs Jr., B.M. Griglione, G.D.N. Hudson, A. Basilico, A.C. Johnson, C.G. Bergeon, C.J. Chapa, A. Agostinelli, et al., Similarity matching of products based on multiple classification schemes, US Patent 9,262,503 (Feb. 16, 2016).
4. S.M. Basha, Y. Zhenning, D.S. Rajput, N. Iyengar, D. Caytiles, Weighted fuzzy rule based sentiment prediction analysis on tweets, International Journal of Grid and Distributed Computing 10 (6) (2017) 41–54.
5. S.M. Basha, D.S. Rajput, Sentiment Analysis: Using Artificial Neural Fuzzy Inference System, IGI Global, 2018.
6. S.M. Basha, Y. Zhenning, D.S. Rajput, R.D. Caytiles, N.C.S. Iyengar, Comparative study on performance analysis of time series predictive models, International Journal of Grid and Distributed Computing 10 (8) (2017) 37–48.
7. S.M. Basha, H. Balaji, N.C.S. Iyengar, R.D. Caytiles, A soft computing approach to provide recommendation on PIMA diabetes, International Journal of Advanced Science and Technology 106 (2017) 19–32.
8. S.M. Basha, D.S. Rajput, K. Vishu Vandana, Impact of gradient ascent and boosting algorithm in classification, International Journal of Intelligent Engineering and Systems (IJIES) 11 (1) (2018) 41–49.
9. S.M. Basha, D.S. Rajput, N.C.S. Iyengar, A novel approach to perform analysis and prediction on breast cancer dataset using R, International Journal of Grid and Distributed Computing 11 (2) (2018) 41–54.
10. A.F. Gates, O. Natkovich, S. Chopra, P. Kamath, S.M. Narayanamurthy, C. Olston, B. Reed, S. Srinivasan, U. Srivastava, Building a high-level dataflow system on top of map-reduce: the Pig experience, Proceedings of the VLDB Endowment 2 (2) (2009) 1414–1425.
11. K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop distributed file system, in: Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, IEEE, 2010, pp. 1–10.
12. J. Shafer, S. Rixner, A.L. Cox, The Hadoop distributed filesystem: balancing portability and performance, in: Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on, IEEE, 2010, pp. 122–133.
13. H.-c. Yang, A. Dasdan, R.-L. Hsiao, D.S. Parker, Map-reduce-merge: simplified relational data processing on large clusters, in: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, ACM, 2007, pp. 1029–1040.
14. P. Hunt, M. Konar, F.P. Junqueira, B. Reed, ZooKeeper: wait-free coordination for internet-scale systems, in: USENIX Annual Technical Conference, vol. 8, Boston, MA, USA, 2010.
15. F. Junqueira, B. Reed, ZooKeeper: Distributed Process Coordination, O'Reilly Media, Inc., 2013.
16. R. Ranjan, Streaming big data processing in datacenter clouds, IEEE Cloud Computing 1 (1) (2014) 78–83.
17. A. Bahga, V. Madisetti, Internet of Things: A Hands-On Approach, VPT, 2014.
18. X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen, et al., MLlib: machine learning in Apache Spark, Journal of Machine Learning Research 17 (1) (2016) 1235–1241.
19. J.E. Gonzalez, R.S. Xin, A. Dave, D. Crankshaw, M.J. Franklin, I. Stoica, GraphX: graph processing in a distributed dataflow framework, in: OSDI, vol. 14, 2014, pp. 599–613.
20. S. Kulkarni, N. Bhagat, M. Fu, V. Kedigehalli, C. Kellogg, S. Mittal, J.M. Patel, K. Ramasamy, S. Taneja, Twitter Heron: stream processing at scale, in: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, 2015, pp. 239–250.
CHAPTER 12
An Efficient Biogeography-Based
Optimization Algorithm to Solve the
Location Routing Problem With
Intermediate Depots for Multiple
Perishable Products
ERFAN BABAEE TIRKOLAEE, ME • ALIREZA GOLI, ME • MANI BAKHSHI, ME •
ARUN KUMAR SANGAIAH, PHD
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00019-1 189
Copyright © 2019 Elsevier Inc. All rights reserved.
duction and sales of such goods plays an important role for manufacturers and vendors. The reason is that, if the distribution of perishable products is not effective, then:
(1) products would not be delivered to their destinations (grocery stores, restaurants, etc.) in a timely manner, and the quality or value (or both) of the products would plunge,
(2) inventory costs (e.g., holding costs) of the products would increase significantly, and
(3) market demand would decrease and customers might prefer other suppliers.
Another real feature of the problem is the consideration of multiple trips for vehicles. As one of the first researchers, Fleischmann [13] studied the implications of the VRP with multiple trips. Olivera and Viera [31] noticed that the single-trip VRP was not suitable for vehicles with small capacity or long services. Therefore, studies on the VRP with multiple trips attracted some attention [8,4,23]. The location routing problem (LRP) is an important logistics problem with many implications for full supply chains [27]. The LRP is a relatively novel problem considering the two main tasks of a logistic system simultaneously, the facility location problem (FLP) and the VRP [50].
The LRP is a kind of VRP in which the number and locations of depots must be determined simultaneously with the routes of the vehicles from depots to customers. The goal is to minimize the cost of depot location and the cost of distribution (satisfying customers' demands) [51]. The LRP has various applications, such as the distribution of foods [49], locating blood banks [32], the distribution of newspapers [25], garbage collection [9], etc.
Various research has applied different metaheuristic algorithms in this field, including simulated annealing (SA), biogeography-based optimization (BBO), the accelerated cuckoo optimization algorithm (ACOA), and so on [15,16]. The most important reason for using these methods is the solution complexity of the problems related to vehicle routing. Moreover, most real cases have a large number of customers and a large number of depots, and finding the optimal solutions is very difficult and sometimes impossible. To this end, metaheuristic algorithms are often proposed as effective tools.
A brief literature review of the research done on the VRP and LRP for perishable products distribution is provided as follows. Adenso-Diaz et al. [1] first studied the distribution of dairy products. The authors considered demand allocation, the visit frequency of each customer per week, and vehicle routes, and suggested a version of the VRPTW with the aim of distribution cost minimization. Hwang [22] studied a VRP to obtain an optimal scheme of food supplies and inventory allocation to serve starvation zones; the goal of the problem is to reduce damage and famine in place of total traveled distance or total time. Prindezis et al. [34] suggested a service provider application in food market centers, considering professions and information, to solve the VRP; in doing so, a tabu search (TS) algorithm was employed. They designed specialized software for the roads in Athens and implemented it in a unified delivery logistics problem for 690 companies in the central food market. They proposed a two-phase algorithm to solve the problem: in the first phase, a route generation algorithm was used, and in the second phase, a TS algorithm was utilized to obtain a solution of higher quality. Hsu et al. [21] proposed a stochastic process for delivering perishable foods and studied a stochastic VRPTW model for perishable food distribution to determine the optimal delivery routes, the optimal number of vehicles, the optimal departure time of vehicles from the distribution center, etc. Zeng et al. [52] proposed two techniques to solve a real-life soft drink distribution VRP whose main aim was to minimize the total number of required vehicles. They developed hybrid methodologies for real-world problems and could improve some of the best-found solutions in the literature. Osvald and Stirn [33] suggested a heuristic algorithm to solve a time-dependent VRPTW for fresh vegetable distribution, considering perishability as a critical factor in the total distribution cost. The problem was formulated as a VRPTW with time-dependent travel times, and the model considered the impact of perishability as part of the overall distribution cost.
Ramos et al. [36] developed a nonlinear mathematical model for the VRPTW considering the effect of road irregularities on perishing fresh fruits and vegetables. The main objective of their work was to obtain the optimal distribution routes for fresh fruits and vegetables considering different road classes with the least amount of logistics costs. They employed a genetic algorithm (GA) to solve the problem. Their results showed that the vehicle routing problem with time windows, considering road irregularities and different classes of toll roads, could significantly influence total delivery costs as compared to the traditional VRP models.
Govindan et al. [18] developed a multi-objective LRP with time windows for perishable products distribution. The objective is to specify the number and location of facilities and the amount of products in the echelons, and to reduce the incurred costs of greenhouse gas (GHG) emissions within the network.
Mirzaei and Seifi [29] proposed inventory routing problems (IRPs) for perishable products considering lost sales. They presented a linear model and solved it using CPLEX. Moreover, they implemented a hybrid algorithm to solve large-sized instances; the efficiency of their algorithm was verified against CPLEX. Recently, Azadeh et al. [3] developed a model of IRP with transshipment for a single perishable product distribution. In their problem, vehicle routing and inventory decisions were made concurrently over a planning horizon to cover the customers under a maximum level policy. They used a GA-based technique to solve the problem and generated a numerical example to illustrate the validity of the model and the efficiency of the proposed algorithm. Hiassat et al. [19] applied a GA to solve their proposed location inventory routing problem (LIRP) for the distribution of perishable products. Their GA achieved appropriate solutions within a short run time.

Tirkolaee et al. [42] offered a novel model for the robust multi-trip VRPTW with intermediate depots to determine the optimal routes in a single-echelon supply chain of perishable products. They solved their proposed MILP model using the CPLEX solver and demonstrated the validity of the model by generating and illustrating different problems. Qiu et al. [35] presented a production routing problem (PRP) for products with perishable inventory. They analyzed the optimal integrated decisions on the time and amount of delivering and selling products in different time periods, and applied an exact branch-and-cut algorithm to solve the proposed problem. In another recent work, Soysal et al. [39] proposed a green IRP for perishable products with horizontal collaboration. They considered carbon emissions, total routing and fuel consumption costs, inventory and waste costs, driving time, and uncertain demands.

In this research, a multi-trip LRP with time windows considering intermediate depots is addressed, in which vehicles may end their trips in a different intermediate depot. Therefore, we make the following contributions:
• Considering intermediate depots as key centers to provide fresh perishable products within a given time window, concerning their proximity to customers;
• Considering multiple trips and a maximum allowable traveling time and distance for vehicles, and providing the possibility of different departure and destination depots for vehicles in each trip;
• Integrating locational and routing decisions in perishable products distribution by proposing a novel mathematical model;
• Developing an efficient BBO algorithm to solve the large-sized problems;
• Finding the optimal values of the BBO parameters using the Taguchi design method.

The remaining sections of our work are organized as follows. The assumptions and mathematical model of the problem are described in Sect. 12.2. The proposed solution method and numerical results with sensitivity analysis are respectively presented in Sects. 12.3 and 12.4. Finally, the conclusions and outlook of the research are provided in Sect. 12.5. Moreover, Fig. 12.1 depicts the proposed methodology of the research.

12.2 MODEL DEVELOPMENT
This section discusses the assumptions and the proposed mathematical formulation of the problem as a developed model of the research done by Tirkolaee et al. [42].

The problem is defined on a complete undirected graph G = (NT, A), with NT = {1, 2, ..., nt} = NC ∪ ND, and A = {(i, j) | i, j ∈ NT, i ≠ j}, where NT, NC, and ND respectively denote the set of all nodes, the set of customer nodes, and the set of intermediate depot nodes. Here, NV denotes the set of all vehicles. Each vehicle has its set of trips, i.e., NP. The set of perishable products that must be delivered to customers is denoted by NR.

Consider the situation in which there is a supply chain network including a set of customers. We aim to find the optimal number and locations of intermediate depots, the optimal number of vehicles, and the optimal constructed routes within the supply chain. The objective function minimizes the total cost, including vehicle usage costs, the total traveled distance as converted to total transportation cost, earliness and tardiness costs of services, and establishment costs of intermediate depots. Since the supply chain network is specific to perishable products, time windows play an important role. The main assumptions are listed below:
• Multiple perishable products are considered.
• The demand of each customer for different products is fulfilled only by one intermediate depot and one vehicle.
• A fleet of heterogeneous vehicles is available at intermediate depots, each vehicle having a given capacity for each type of product.
• The supply chain network is an asymmetric, complete graph (there is a route between any two nodes).
• Vehicles have a maximum allowable capacity for their traveling distance.
• Vehicles have a maximum allowable usage time, including traveling times, loading times in intermediate depots, and unloading times at customers' nodes.
• Vehicles may have multiple trips.
• Traveling time, tij, and cost, dij, are the same for all vehicles, but vehicles have different capacities.
• No split delivery is allowed, and the capacity constraint of each vehicle must be satisfied during route construction.
• Each vehicle starts its trip from a given established intermediate depot and does not necessarily return to that intermediate depot, but it must end the trip at one of the established intermediate depots.
• For each customer, a soft time window is defined which includes an initial and a secondary interval. Delivery cannot be done outside of the initial interval [ei, li] (hard time window), but it can be done outside of the secondary interval [eei, lli] (soft time window) by incurring penalty costs.
• Intermediate depots have different capacities for different perishable products.
• According to their vehicle parking capacity, intermediate depots can dispatch a given number of vehicles.
• The demand parameter is deterministic.

Tables 12.1–12.4 define the sets, indices, parameters, non-decision variables and decision variables of the proposed model.

TABLE 12.1
Indices and sets of the model.
NC = {1, 2, ..., nc}   Set of customers
ND = {1, 2, ..., nd}   Set of intermediate depots
NT = {1, 2, ..., nt}   Set of customers and intermediate depots (total nodes of the graph)
NV = {1, 2, ..., nv}   Set of vehicles
NP = {1, 2, ..., np}   Set of trips
NR = {1, 2, ..., nr}   Set of perishable products
i, j, k, l             Indices of all nodes
v                      Index of vehicles
p                      Index of possible trips
r                      Index of perishable products
S                      An arbitrary subset of customers' nodes
M                      An arbitrarily large number
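The node and arc sets defined above can be made concrete with a short sketch; the set sizes below are illustrative, not taken from the chapter:

```python
# Sketch of the complete graph G = (NT, A) of Sect. 12.2, with
# NT = NC ∪ ND and A containing every ordered pair of distinct nodes.
nc, nd = 10, 4                        # illustrative: 10 customers, 4 depots
NC = set(range(1, nc + 1))            # customer nodes 1..10
ND = set(range(nc + 1, nc + nd + 1))  # depot nodes 11..14
NT = NC | ND                          # all nodes of the graph
A = {(i, j) for i in NT for j in NT if i != j}  # complete arc set

nt = len(NT)
# A complete graph on nt nodes has nt*(nt-1) ordered arcs.
assert len(A) == nt * (nt - 1)
print(len(A))  # 182
```

Such explicit set objects map directly onto the index sets NC, ND and NT over which the constraints of Sect. 12.2 are written.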
\sum_{i \in NC} x_{ik}^{vp} + \sum_{i \in NC} x_{li}^{vp} \le 2, \quad \forall k, l \in ND, \ \forall v \in NV, \ \forall p \in NP,  (12.5)

\sum_{j \in NC} \sum_{r \in NR} \sum_{v \in NV} y_{jrvk} \le M \sum_{j \in NC} \sum_{v \in NV} \sum_{p \in NP} x_{kj}^{vp}, \quad \forall k \in ND,  (12.6)

\sum_{i \in NC} x_{ki}^{vp} \le YY_k, \quad \forall k \in ND, \ \forall v \in NV, \ \forall p \in NP,  (12.7)

\sum_{i \in NC} \sum_{v \in NV} \sum_{p \in NP} x_{ki}^{vp} \ge 1 - M(1 - YY_k), \quad \forall k \in ND,  (12.8)

\sum_{i \in NC} x_{il}^{vp} \le YY_l, \quad \forall l \in ND, \ \forall v \in NV, \ \forall p \in NP,  (12.9)

\sum_{i \in NC} \sum_{v \in NV} \sum_{p \in NP} x_{il}^{vp} \ge 1 - M(1 - YY_l), \quad \forall l \in ND,  (12.10)

\sum_{i \in NT} x_{ij}^{vp} \le 1, \quad \forall j \in NC, \ \forall v \in NV, \ \forall r \in NR, \ \forall p \in NP,  (12.11)

o_i - o_j + M x_{ij}^{vp} \le M - 1, \quad \forall i, j \in NC, \ \forall v \in NV, \ \forall p \in NP,  (12.12)

\sum_{i \in S} \sum_{j \in S, \, i \ne j} x_{ij}^{vp} \le |S| - 1, \quad \forall S \subseteq NC, \ |S| \ge 2, \ \forall v \in NV, \ \forall p \in NP,  (12.13)

\sum_{j \in NC} \sum_{v \in NV} DE_{jr} \, y_{jrvk} \le QG_{kr}, \quad \forall k \in ND, \ \forall r \in NR,  (12.14)

\sum_{i \in NT} \sum_{j \in NC} DE_{jr} \, x_{ij}^{vp} \le VC_{vr}, \quad \forall v \in NV, \ \forall r \in NR, \ \forall p \in NP,  (12.15)

tt_j^p = \sum_{i \in NT} \sum_{v \in NV} (tt_i^p + t_{ij}) \, x_{ij}^{vp}, \quad \forall j \in NC, \ \forall p \in NP,  (12.16)

tt_j^p = 0, \quad \forall j \in ND, \ \forall p \in NP,  (12.17)

tt_j = \sum_{p \in NP} tt_j^p, \quad \forall j \in NC,  (12.18)

e_i \le tt_i \le l_i, \quad \forall i \in NC,  (12.19)

YE_i \ge ee_i - tt_i, \quad \forall i \in NC,  (12.20)

YL_i \ge tt_i - ll_i, \quad \forall i \in NC,  (12.21)

dd_j^p = \sum_{i \in NT} \sum_{v \in NV} (sd_i^p + d_{ij}) \, x_{ij}^{vp}, \quad \forall j \in NC, \ \forall p \in NP,  (12.22)

sd_i^p = \begin{cases} 0, & \text{if } i \in ND \text{ and } p = 1, \\ dd_i^p, & \text{if } i \in NC \text{ and } p \in NP, \end{cases}  (12.23)

ddd_j^v = \sum_{i \in NT} \sum_{p \in NP} (dd_i^p + d_{ij}) \, x_{ij}^{vp}, \quad \forall j \in ND, \ \forall v \in NV,  (12.24)

ddd_j^v \le DI_v, \quad \forall j \in ND, \ \forall v \in NV,  (12.25)

\sum_{j \in NC} x_{kj}^{vp} \le M F_{vk}, \quad \forall v \in NV, \ \forall k \in ND, \ \forall p \in NP,  (12.26)

LT_v = ul \sum_{j \in NC} \sum_{r \in NR} \sum_{k \in ND} DE_{jr} \, y_{jrvk}, \quad \forall v \in NV,  (12.27)

UT_v = uu \sum_{j \in NC} \sum_{r \in NR} \sum_{k \in ND} DE_{jr} \, y_{jrvk}, \quad \forall v \in NV,  (12.28)

LT_v + UT_v + \sum_{i \in NT} \sum_{j \in NT} \sum_{p \in NP} t_{ij} \, x_{ij}^{vp} \le T^{\max}, \quad \forall v \in NV,  (12.29)

\sum_{j \in NC} \sum_{k \in ND} x_{kj}^{vp} \ge \sum_{j \in NC} \sum_{k \in ND} x_{kj}^{v,p+1}, \quad \forall p = 1, 2, \dots, P - 1, \ \forall v \in NV,  (12.30)

x_{ij}^{vp}, YY_k, F_{vk}, y_{jrvk} \in \{0, 1\}, \ YE_i \ge 0, \ YL_i \ge 0, \ o_i \in \mathbb{Z}^+, \quad \forall k \in ND, \ \forall i, j \in NC, \ \forall v \in NV, \ \forall r \in NR, \ \forall p \in NP.  (12.31)

The objective function (12.1) minimizes the total cost, including the sum of the transportation cost and the other costs: vehicle usage costs in intermediate depots, earliness and tardiness penalty costs, and establishment costs of intermediate depots. Eq. (12.2) indicates that each customer is covered by only one vehicle and one intermediate depot. Eq. (12.3) ensures that each customer receives service only when a vehicle arrives at its corresponding node. Eq. (12.4) ensures the continuity of the vehicles' routes. Eq. (12.5) indicates the assignment of intermediate depots to end the trips (considering the least distance). Eq. (12.6) defines the relation of variable y to variable x with respect to the intermediate depot giving service to the customer. Eqs. (12.7) and (12.8) state that when intermediate depot k is established, a vehicle can start its travel from it. Eqs. (12.9) and (12.10) indicate that when an intermediate depot is established, a vehicle
can finish its trip at it. Eq. (12.11) ensures that each customer is visited at most once. Eqs. (12.12) and (12.13) eliminate the potential sub-tours (based on the Miller–Tucker–Zemlin formulation [26]). Eqs. (12.14) and (12.15) express that the customers' demand must not exceed the capacity of the covering intermediate depot and vehicle for each product r, respectively. Eqs. (12.16)–(12.19) guarantee that service is provided to customers within the initial interval. Eqs. (12.20) and (12.21) calculate the amount of earliness and tardiness for each customer. Eqs. (12.22)–(12.25) ensure that the maximum allowable distance capacity of vehicles is not violated. Eq. (12.26) indicates that a vehicle can serve the customers only when its usage cost is paid. Eqs. (12.27) and (12.28) compute the total loading and unloading times of vehicles, respectively. Eq. (12.29) reflects the usage time limitation of the vehicles. Eq. (12.30) guarantees the sequencing of the vehicle trips from p to p + 1 by each vehicle. Eq. (12.31) specifies the types of the decision variables.

12.2.1.1 Linearization of the Nonlinear Equations
Linearization of Eq. (12.16):

tt_j^p = \sum_{i \in NT} \sum_{v \in NV} (tt_i^p + t_{ij}) \, x_{ij}^{vp}, \quad \forall j \in NC, \ \forall p \in NP,  (12.16)

h_{ij}^{vp} \le tt_i^p, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.32)

h_{ij}^{vp} \le M x_{ij}^{vp}, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.33)

h_{ij}^{vp} \ge tt_i^p - M(1 - x_{ij}^{vp}), \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.34)

tt_j^p = \sum_{i \in NT} \sum_{v \in NV} \left( h_{ij}^{vp} + t_{ij} \, x_{ij}^{vp} \right), \quad \forall j \in NC, \ \forall p \in NP,  (12.35)

h_{ij}^{vp} \ge 0, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP.  (12.36)

Linearization of Eq. (12.22):

dd_j^p = \sum_{i \in NT} \sum_{v \in NV} (sd_i^p + d_{ij}) \, x_{ij}^{vp}, \quad \forall j \in NC, \ \forall p \in NP,  (12.22)

q_{ij}^{vp} \le sd_i^p, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.37)

q_{ij}^{vp} \le M x_{ij}^{vp}, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.38)

q_{ij}^{vp} \ge sd_i^p - M(1 - x_{ij}^{vp}), \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.39)

dd_j^p = \sum_{i \in NT} \sum_{v \in NV} \left( q_{ij}^{vp} + d_{ij} \, x_{ij}^{vp} \right), \quad \forall j \in NC, \ \forall p \in NP,  (12.40)

q_{ij}^{vp} \ge 0, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP.  (12.41)

Linearization of Eq. (12.24):

ddd_j^v = \sum_{i \in NT} \sum_{p \in NP} (dd_i^p + d_{ij}) \, x_{ij}^{vp}, \quad \forall j \in ND, \ \forall v \in NV,  (12.24)

w_{ij}^{vp} \le dd_i^p, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.42)

w_{ij}^{vp} \le M x_{ij}^{vp}, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.43)

w_{ij}^{vp} \ge dd_i^p - M(1 - x_{ij}^{vp}), \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP,  (12.44)

ddd_j^v = \sum_{i \in NT} \sum_{p \in NP} \left( w_{ij}^{vp} + d_{ij} \, x_{ij}^{vp} \right), \quad \forall j \in ND, \ \forall v \in NV,  (12.45)

w_{ij}^{vp} \ge 0, \quad \forall i, j \in NT, \ \forall v \in NV, \ \forall p \in NP.  (12.46)

12.2.2 An Illustration
Applying the above linearizations, the proposed model turns into an MILP. The CPLEX solver in the GAMS optimization package is used to validate the model. Fig. 12.2 depicts a schematic description of the problem. We see four potential locations for the establishment of intermediate depots and ten customers in the network. Two perishable products and one type of vehicle are considered in intermediate depots, with each vehicle being allowed to make three trips at most.

TABLE 12.5
Solution to the problem depicted in Fig. 12.2.
Vehicle 2 in intermediate depot 2:  Intermediate depot 2 – 1 – 2 – Intermediate depot 3 – 3 – 4 – Intermediate depot 2 – 10 – 9 – 8 – Intermediate depot 2
Vehicle 1 in intermediate depot 3:  Intermediate depot 3 – 5 – 6 – 7 – Intermediate depot 3

After solving the problem, two intermediate depots are established and the vehicles make the trips specified in Table 12.5. As is evident, vehicle 2 is utilized in intermediate depot 2 and has three trips, and vehicle 1 is utilized in intermediate depot 3 and has just one trip.
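Two modeling devices used in this section can be checked numerically: the Miller–Tucker–Zemlin ordering behind Eq. (12.12), and the big-M linearization pattern of Eqs. (12.37)–(12.39) that pins the auxiliary variable to the bilinear product. The sketch below uses illustrative node labels and values, none taken from the chapter:

```python
from itertools import product

def mtz_satisfiable(customers, arcs, M):
    """Brute-force search for order variables o_i satisfying
    Eq. (12.12), o_i - o_j + M*x_ij <= M - 1, on every used arc."""
    n = len(customers)
    for orders in product(range(1, n + 1), repeat=n):
        o = dict(zip(customers, orders))
        if all(o[i] - o[j] + M <= M - 1 for (i, j) in arcs):
            return True
    return False

def q_interval(sd, x, M):
    """Feasible interval for q under Eqs. (12.37)-(12.39):
    q <= sd, q <= M*x, q >= sd - M*(1 - x), q >= 0."""
    return max(0.0, sd - M * (1 - x)), min(sd, M * x)

M = 100
# A path 1 -> 2 -> 3 admits a consistent visiting order ...
assert mtz_satisfiable([1, 2, 3], [(1, 2), (2, 3)], M)
# ... but a closed subtour among customers does not, so Eq. (12.12)
# excludes it.
assert not mtz_satisfiable([1, 2], [(1, 2), (2, 1)], M)

# The big-M constraints pin q to sd when the arc is used (x = 1)
# and force q to 0 when it is not (x = 0).
assert q_interval(37.5, 1, 1e4) == (37.5, 37.5)
assert q_interval(37.5, 0, 1e4) == (0.0, 0.0)
```

The same interval argument applies verbatim to the h and w variables of Eqs. (12.32)–(12.36) and (12.42)–(12.46), which is why each product term can be replaced without changing the set of feasible routes.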
12.3 SOLUTION METHOD
In recent years, many researchers have designed various metaheuristic algorithms to solve practical optimization problems [14,3,6,17,46]. Two optimization methods, namely a BBO algorithm and the CPLEX solver, are employed to solve the MILP problem. As no benchmark is available in the literature for the proposed model, the two optimization methods have been adopted in this paper for validation purposes. The flow graph of the proposed approach is depicted in Fig. 12.3.

12.3.1 Introduction to the Biogeography Based Optimization Algorithm
Simon [38] analyzed biogeography, the science of studying the geographical distribution of species, and its corresponding mathematical development to solve optimization problems, which led to the introduction of a novel intelligent approach called biogeography based optimization (BBO). This approach has common characteristics with other biologically based optimization techniques such as GA and particle swarm optimization (PSO). BBO is a population-based evolutionary algorithm that is inspired by the migration of animals and birds between islands. Islands with appropriate conditions for species have high habitat suitability indices (HSIs). The features determining an HSI include factors such as rainfall, vegetation diversity and temperature, which are called suitability index variables (SIVs).
An island with a high HSI value has a low immigration rate because it is already filled with other species and cannot accommodate new ones. As islands with low HSI values contain small populations, their immigration rate is high. Because the suitability of a place is proportional to its biogeographical diversity, immigration of new species to a place with a high HSI value increases the HSI value. Similar to other evolutionary algorithms, such as GA with its mutation and crossover operators, in BBO the migration and mutation operators introduce appropriate changes in the population across generations. The BBO algorithm has been implemented to solve various optimization problems with high efficiency [53].

12.3.1.1 Migration Operator
A set of candidate solutions is represented by an array of integers. We can consider each integer in the solution array as an SIV (similar to a gene in GA). Moreover, assume that there are methods to evaluate the solutions. Ideal solutions have high HSI values (corresponding to habitats with many species) and weak solutions have low HSI values (corresponding to habitats with few species). The HSI value in BBO is similar to the fitness value in other population-based optimization algorithms.

Each habitat (solution) has an immigration rate λ and an emigration rate μ, which are applied to apportion the information between solutions stochastically. These rates are calculated as follows:

\lambda_i = I \left( 1 - \frac{k_i}{n} \right), \quad \forall i,  (12.47)

\mu_i = E \, \frac{k_i}{n}, \quad \forall i,  (12.48)

where I and E are respectively the maximum immigration rate and the maximum emigration rate, and k_i is the number of species in habitat i. Here, k_i takes a value between 1 and n, where n is the size of the population (1 and n stand for the best and worst solutions, respectively). A solution can be modified into other solutions with a given probability. When a solution is selected to be modified, the immigration rate λ is applied to stochastically determine whether each of its SIV values should be modified or not. If a given SIV value in a given solution S_i is selected to be modified, then the emigration rates μ of the other solutions are used to probabilistically decide which of them should migrate a randomly selected SIV to solution S_i.

As with other population-based optimization algorithms, some sort of elitism is typically incorporated in order to retain the best solutions in the population. This prevents the best solutions from being corrupted by immigration.

12.3.1.2 Mutation Operator
Cataclysmic events can drastically change the HSI of a natural habitat; hence, a habitat's HSI can change suddenly due to random events. This is modeled as mutation of an SIV in BBO. The mutation pattern tends to increase diversity among the population, in which the highly probable solutions tend to be dominant. This mutation method makes low-HSI solutions likely to mutate, giving them a chance to be improved. It also makes high-HSI solutions likely to mutate, which provides a chance to improve them even further. The probability of the number of species in the habitat is used to specify the mutation rate as follows:

m(s) = m_{\max} \left( \frac{1 - p_s}{p_{\max}} \right)  (12.49)
where m_max is the maximum mutation rate, which is set by the user, p_max is the maximum species count, and p_s is the probability of including s species within the habitat (for more details, see [38]). Note that an elitism approach is used to keep the characteristics of the habitat having the best solution in the BBO process, so that even if mutation takes down its HSI, it can be saved and recovered if required.

12.3.2 Solution Representation
To represent a solution, the following coding system is used. The coding system includes two types of string in each solution. The first string is a one-dimensional string of length ND with zero–one values. The value 1 in cell i means the establishment of intermediate depot i. For example, with 4 potential places for the establishment of intermediate depots, the following string establishes intermediate depots in places 1 and 4 (from left to right):

1 0 0 1

The second string has a length of Lmax + 1 and a width of nd (the number of intermediate depots). In this string, the visited nodes, the order of visits, and the type of the used vehicle are determined. In each row of this string, the first box represents the intermediate depot, and the other boxes represent demand nodes in order of visit. In the last box of each row, the type of the used vehicle is encoded. For example, if we consider four potential nodes for the establishment of intermediate depots, where we have intermediate depots in nodes 1 and 4, and mark demand nodes with 5 to 10 and vehicles with 1 and 2, the following string represents a sample solution of the problem (Lmax = 7):

1 5 8 4 9 4 0 2
4 6 7 10 1 0 0 1

We note that in the above solution, vehicle 2 starts from intermediate depot 1, visits nodes 5 and 8, and then goes to intermediate depot 4. In intermediate depot 4, it is loaded again, visits node 9, and finally returns to intermediate depot 4. Vehicle 1 starts from intermediate depot 4, visits demand nodes 6, 7, and 10, and finally goes to intermediate depot 1. It should be noted that this solution representation does not allow a vehicle to visit more than Lmax nodes. Moreover, the capacity and distance constraints of vehicles and the vehicles' usage times are checked for feasibility, and thus the search for the optimal solution is done in the feasible region.

12.3.3 Initial Solution Generation
In the initial solution generation phase, the first type of string, which is binary and represents the establishment of intermediate depots, is generated randomly. Each facility is established or not with equal probability.

After specification of the established intermediate depots, in order to generate the solution in the second type of string, the vehicles to be used are selected first. The objective here is to select vehicles with the most capacity and the least usage cost. Hence, a CQR index is defined as follows:

CQR_v = \frac{CV_v}{\sum_r VC_{vr}}  (12.50)

where CV_v is the usage cost of vehicle v in one planning period and VC_vr is the capacity of vehicle v for product r.

The CQR index is calculated for all vehicles, which are then sorted in ascending order of the index; in other words, the vehicle with the least CQR value has the highest priority for selection. In order to determine a route, a vehicle is selected, and then demand nodes are assigned to the selected vehicle randomly. In each assignment, the capacity, distance and usage time constraints of the vehicle are tested. If the remaining capacity of the vehicle is enough to assign another demand node, and the remaining distance capacity and the remaining usage time of the vehicle are enough to assign at least one more node, the assignment process continues, considering the possibility of multiple trips. Otherwise, one of the two following procedures is applied randomly:
• Returning the vehicle to an intermediate depot, replenishing, and continuing to assign demand nodes to it, or
• Returning the vehicle to an intermediate depot, selecting another vehicle according to the CQR index, and assigning the remaining nodes to it according to the procedure applied to the previous vehicle that has completed its tour.

Using the above process, we can expect all the generated solutions to be feasible. The only constraint that can be violated is the hard time window constraint. To prevent such a violation, an arbitrary penalty cost of 10^4 is embedded in the objective function. For example, if a trip finishes at tt = 90 and li = 85, then (90 − 85) × 10^4 is added to the objective function.

12.3.4 Immigration Phase
First, the variables λ and μ are calculated for the different solutions, and then, based on the main mechanism of the BBO
algorithm, the solutions of each area are combined using one-point crossover and the immigration rate λ. Similarly, for inter-area emigrations, solutions are combined using two-point crossover and the emigration rate μ. It can be concluded that elitism is implemented by setting λ = 0 for the Pn best habitats, where Pn is a user-selected elitism parameter specifying how many of the best solutions should be kept from one generation to the next.

The full factorial design needs 3^6 = 729 experiments for BBO, which is not economical in terms of cost and time. By searching among different Taguchi tables using the Minitab statistical software, the L27(3^6) array was chosen for our purpose. After testing the data given in Table 12.6, the average S/N rates for the 27 Taguchi trials, computed for the BBO algorithm, and the resulting optimal values of the BBO parameters are given in Table 12.7.
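The λ and μ rates used in the immigration phase above, together with the mutation rate, can be sketched as follows. The values of I, E, m_max and the population size are illustrative, and Eq. (12.49) is implemented exactly as printed in the chapter:

```python
# Sketch of the BBO rates: Eq. (12.47) lambda_i = I*(1 - k_i/n),
# Eq. (12.48) mu_i = E*k_i/n, and the mutation rate of Eq. (12.49).
def immigration_rate(k, n, I=1.0):
    return I * (1 - k / n)

def emigration_rate(k, n, E=1.0):
    return E * k / n

def mutation_rate(p_s, p_max, m_max=0.05):
    # Eq. (12.49) as printed: m(s) = m_max * (1 - p_s) / p_max
    return m_max * (1 - p_s) / p_max

n = 10  # illustrative population size
# A habitat with many species (k = n) has maximum emigration and zero
# immigration, matching the full-island intuition of Sect. 12.3.1.
assert emigration_rate(n, n) == 1.0
assert immigration_rate(n, n) == 0.0
# With I = E, the two rates of any habitat sum to the maximum rate.
assert all(abs(immigration_rate(k, n) + emigration_rate(k, n) - 1.0) < 1e-12
           for k in range(1, n + 1))
# Improbable species counts (low p_s) get mutation rates near the maximum,
# so unlikely habitats are the most likely to be mutated.
assert mutation_rate(0.01, 0.5) > mutation_rate(0.4, 0.5)
```

In a full implementation these rates would gate, per solution, whether each SIV is replaced during migration (using λ) and which donor habitat supplies it (roulette selection over the μ values).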
TABLE 12.8
Information corresponding to randomly generated problems.
Problem n Depots Customers Products Vehicle types Tmax
P1 5 3 2 2 2 480
P2 7 3 4 2 2 480
P3 9 4 5 3 3 480
P4 10 4 6 3 3 480
P5 13 5 8 3 4 480
P6 14 5 9 3 4 480
P7 15 5 10 3 5 480
P8 18 6 12 3 5 480
P9 21 6 15 4 6 480
P10 32 8 25 4 7 480
P11 38 8 30 5 7 480
P12 50 10 40 5 8 480
P13 65 15 50 8 10 480
P14 80 20 60 9 12 480
P15 100 30 70 10 15 480
TABLE 12.9
Computational results.
Problem CPLEX BBO
TC RT TC RT GAP (%)
P1 6032.2 0.2 6035 61.2 0.21
P2 12046.2 6.79 12050 97.4 0.99
P3 24074.9 198.73 24094.1 117.9 1.72
P4 12055.3 902.3 12106.4 142.7 1.21
P5 12071.4 1449.01 12371.3 168.5 2.50
P6 12079.8 3600 12990.5 193.1 0.39
P7 12761.1 3600 12897.2 207.6 1.15
P8 14672.8 3600 14995.1 267.16 0.98
P9 15128.5 3600 15297.2 304.7 2.51
P10 19796.3 3600 20679.8 325.2 0.93
P11 23498.1 3600 23134.2 346.1 0∗∗
P12 –∗ 3600 28476.6 349.8 0∗∗
P13 –∗ 3600 39421.1 366.1 0∗∗
P14 –∗ 3600 57108.5 382.8 0∗∗
P15 –∗ 3600 94252.3 415.9 0∗∗
AVE – 2570.468 – 249.744 0.839
*No solution found.
**BBO performs better in large sized problems.
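As a quick consistency check, the AVE run-time row of Table 12.9 can be re-derived from the per-problem run times; the numbers below are transcribed directly from the table:

```python
# Run times (RT, in the table's time unit) from Table 12.9: CPLEX hits
# its limit of 3600 on problems P6-P15, while BBO scales smoothly.
cplex_rt = [0.2, 6.79, 198.73, 902.3, 1449.01] + [3600] * 10
bbo_rt = [61.2, 97.4, 117.9, 142.7, 168.5, 193.1, 207.6, 267.16,
          304.7, 325.2, 346.1, 349.8, 366.1, 382.8, 415.9]

ave_cplex = sum(cplex_rt) / len(cplex_rt)
ave_bbo = sum(bbo_rt) / len(bbo_rt)
# Both averages match the AVE row of Table 12.9 (2570.468 and 249.744).
assert abs(ave_cplex - 2570.468) < 1e-2
assert abs(ave_bbo - 249.744) < 1e-3
print(f"CPLEX avg RT: {ave_cplex:.3f}, BBO avg RT: {ave_bbo:.3f}")
```

The roughly tenfold gap between the average run times is what the chapter summarizes as the efficiency advantage of BBO on medium and large instances.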
They can determine the optimal policies by studying the sensitivity analysis of key parameters such as demand, and provide the required resources to cover all the customers in a timely manner.

12.5 DISCUSSION, CONCLUDING REMARKS AND FUTURE RESEARCH DIRECTIONS
This research proposes a novel MILP model for the LRP in the distribution of multiple perishable products with multi-trip heterogeneous vehicles, in order to establish intermediate depots in the best locations for covering the customers' demands with respect to their requested time windows. As a main difference, it is assumed that vehicles may not return to the same intermediate depot from which they started their trips. This possibility helps to keep the products fresh. The objective is to minimize the total costs due to the traveled distance, vehicle usage costs, intermediate depots' establishment costs, and earliness and tardiness penalty costs. After reviewing the recent studies in terms of mathematical modeling and solution methods, the advantages of this research have been made clear. We implemented a novel metaheuristic, namely the BBO algorithm, to solve the large-sized instances effectively; many related works applied well-known metaheuristics like GA, PSO and SA. Furthermore, the Taguchi design method is implemented to adjust the algorithm parameters optimally. Several numerical examples are generated randomly in small, medium, and large sizes in order to validate the proposed mathematical model and evaluate the effectiveness of BBO. The validation of the mathematical model and algorithm is done using the CPLEX solver as an exact method. The comparison of the proposed algorithm with CPLEX shows a high efficiency for the proposed algorithm. Finally, a sensitivity analysis is performed on the demand parameter to show the behavior of the objective function against parameter changes. The main limitation of this research is finding a real case study to demonstrate its application to real-life issues.

For future studies, other metaheuristic algorithms and new strategies to generate better initial solutions may be explored and integrated with the proposed algorithm. Moreover, inventory management policies in intermediate depots can be considered in the model. In addition, the uncertain nature of the demand parameter may be investigated using well-known tools such as grey systems, fuzzy theory and robust optimization.

REFERENCES
1. B. Adenso-Diaz, M. Gonzalez, E. Garcia, A hierarchical approach to managing dairy routing, Interfaces 28 (2) (1998) 21–31.
2. M. Alinaghian, H. Amanipour, E.B. Tirkolaee, Enhancement of inventory management approaches in vehicle routing-cross docking problems, Journal of Supply Chain Management Systems 3 (3) (2014).
3. A. Azadeh, S. Elahi, M.H. Farahani, B. Nasirian, A genetic algorithm-Taguchi based approach to inventory routing problem of a single perishable product with transshipment, Computers & Industrial Engineering 104 (2017) 124–133.
4. N. Azi, M. Gendreau, J.Y. Potvin, An exact algorithm for a vehicle routing problem with time windows and multiple use of vehicles, European Journal of Operational Research 202 (3) (2010) 756–763.
5. E. Babaee Tirkolaee, M. Alinaghian, M. Bakhshi Sasi, M.M. Seyyed Esfahani, Solving a robust capacitated arc routing problem using a hybrid simulated annealing algorithm: a waste collection application, Journal of Industrial Engineering and Management Studies 3 (1) (2016) 61–76.
6. E. Babaee Tirkolaee, P. Abbasian, M. Soltani, S.A. Ghaffarian, Developing an applied algorithm for multi-trip vehicle routing problem with time windows in urban waste collection: a case study, Waste Management & Research 37 (1_suppl) (2019) 4–13.
7. K. Braekers, K. Ramaekers, I. Van Nieuwenhuyse, The vehicle routing problem: state of the art classification and review, Computers & Industrial Engineering 99 (2016) 300–313.
8. J. Brandao, A. Mercer, A tabu search algorithm for the multi-trip vehicle routing and scheduling problem, European Journal of Operational Research 100 (1997) 180–191.
9. R. Caballero, M. Gonzalez, F.M. Guerrero, J. Molina, C. Paralera, Solving a multi-objective location routing problem with a metaheuristic based on tabu search: application to a real case in Andalusia, European Journal of Operational Research 177 (3) (2007) 1751–1763.
10. I.M. Chao, B.L. Golden, E. Wasil, A new heuristic for the multi-depot vehicle routing problem that improves upon best-known solutions, American Journal of Mathematical and Management Sciences 13 (3–4) (1993) 371–406.
11. G.B. Dantzig, J.H. Ramser, The truck dispatching problem, Management Science 6 (1) (1959) 80–91.
12. R. Dondo, J. Cerdá, A reactive MILP approach to the multidepot heterogeneous fleet vehicle routing problem with time windows, International Transactions in Operational Research 13 (5) (2006) 441–459.
13. B. Fleischmann, The Vehicle Routing Problem with Multiple Use of Vehicles, Working paper, Fachbereich Wirtschaftswissenschaften, Universität Hamburg, 1990.
14. F.P. Goksal, I. Karaoglan, F. Altiparmak, A hybrid discrete particle swarm optimization for vehicle routing problem with simultaneous pick-up and delivery, Computers & Industrial Engineering 65 (1) (2013) 39–53.
15. A. Goli, A. Aazami, A. Jabbarzadeh, Accelerated cuckoo optimization algorithm for capacitated vehicle routing problem in competitive conditions, International Journal of Artificial Intelligence 16 (1) (2018) 88–112.
16. A. Goli, S.M.R. Davoodi, Coordination policy for production and delivery scheduling in the closed loop supply chain, Production Engineering (2018) 1–11.
17. A. Goli, E.B. Tirkolaee, B. Malmir, G.B. Bian, A.K. Sangaiah, A multi-objective invasive weed optimization algorithm for robust aggregate production planning under uncertain seasonal demand, Computing (2019) 1–31.
18. K. Govindan, A. Jafarian, R. Khodaverdi, K. Devika, Two-echelon multiple-vehicle location-routing problem with time windows for optimization of sustainable supply chain network of perishable food, International Journal of Production Economics 152 (2014) 9–28.
19. A. Hiassat, A. Diabat, I. Rahwan, A genetic algorithm approach for location–inventory routing problem with perishable products, Journal of Manufacturing Systems 42 (2017) 93–103.
20. W. Ho, G.T. Ho, P. Ji, H.C. Lau, A hybrid genetic algorithm for the multi-depot vehicle routing problem, Engineering Applications of Artificial Intelligence 21 (4) (2008) 548–557.
21. C.I. Hsu, S.F. Hung, H.C. Li, Vehicle routing problem with time-windows for perishable food delivery, Journal of Food Engineering 80 (2) (2007) 465–475.
22. H.S. Hwang, A food distribution model for famine relief, Computers & Industrial Engineering 37 (1–2) (1999) 335–338.
23. C.K.Y. Lin, R.C.W. Kwok, Multi-objective metaheuristics for a location-routing problem with multiple use of vehicles on real data and simulated data, Journal of the Operational Research Society 175 (2006) 1833–1849.
24. O. Jabali, T. Woensel, A.G. De Kok, Analysis of travel times and CO2 emissions in time-dependent vehicle routing, Production and Operations Management 21 (6) (2012) 1060–1074.
25. S.K. Jacobsen, O.B.G. Madsen, A comparative study of heuristics for a two level routing-location problem, European Journal of Operational Research 5 (6) (1980) 378–387.
26. I. Kara, G. Laporte, T. Bektas, A note on the lifted Miller–Tucker–Zemlin subtour elimination constraints for the capacitated vehicle routing problem, European Journal of Operational Research 158 (3) (2004) 793–795.
27. R.B. Lopes, S. Barreto, C. Ferreira, B.S. Santos, A decision-support tool for a capacitated location-routing problem, Decision Support Systems 46 (1) (2008) 366–375.
28. S.H. Mirmohammadi, E. Babaee Tirkolaee, A. Goli, S. Dehnavi-Arani, The periodic green vehicle routing problem with considering of time-dependent urban traffic and time windows, Iran University of Science and Technology 7 (1) (2017) 143–156.
29. S. Mirzaei, A. Seifi, Considering lost sale in inventory routing problems for perishable goods, Computers & Industrial Engineering 87 (2015) 213–227.
30. S.N. Nemade, M.T. Kolte, S. Nemade, Multi-user detection in DS-CDMA system using biogeography based optimization, Procedia Computer Science 49 (2015) 289–297.
31. A. Olivera, O. Viera, Adaptive memory programming for the vehicle routing problem with multiple trips, Computers & Operations Research 34 (2007) 28–47.
32. I. Or, W.P. Pierskalla, A transportation location-allocation model for regional blood banking, AIIE Transactions 11 (2) (1979) 86–95.
33. A. Osvald, L.Z. Stirn, A vehicle routing algorithm for the distribution of fresh vegetables and similar perishable food, Journal of Food Engineering 85 (2) (2008) 285–295.
34. N. Prindezis, C.T. Kiranoudis, D. Marinos-Kouris, A business-to-business fleet management service provider for central food market enterprises, Journal of Food Engineering 60 (2) (2003) 203–210.
CHAPTER 12 An Efficient Biography-Based Optimization Algorithm 205
35. Y. Qiu, J. Qiao, P.M. Pardalos, Optimal production, re- lection considering drivers and crew’s working time, Waste
plenishment, delivery, routing and inventory management Management 76 (2018) 138–146.
policies for products with perishable inventory, Omega 82 45. E.B. Tirkolaee, A.A.R. Hosseinabadi, M. Soltani, A.K. San-
(2019) 193–204. gaiah, J. Wang, A hybrid genetic algorithm for multi-trip
36. T.R.P. Ramos, M.I. Gomes-Salema, A.P. Barbosa-Povoa, A green capacitated arc routing problem in the scope of ur-
multi-product, multi-depot vehicle routing problem in a ban services, Sustainability 10 (5) (2018) 1366.
reverse logistics system: comparative study of an exact for- 46. E.B. Tirkolaee, A. Goli, M. Hematian, A.K. Sangaiah,
mulation and a heuristic algorithm, in: Livro de actas da T. Han, Multi-objective multi-mode resource constrained
14° Congresso da APDIO, IO 2009, 2009, pp. 195–202. project scheduling problem using Pareto-based algo-
37. K. Sawaki, Optimal policies in continuous time inventory rithms, Computing (2019) 1–24.
control models with limited supply, Computers & Mathe- 47. P. Toth, D. Vigo, The Vehicle Routing Problem, SIAM
matics with Applications 46 (7) (2003) 1139–1145. Monographs on Discrete Mathematics and Applications,
38. D. Simon, Biogeography-based optimization, IEEE Trans- SIAM, Philadelphia, PA, 2002.
actions on Evolutionary Computation 12 (6) (2008) 48. T. Vidal, T.G. Crainic, M. Gendreau, N. Lahrichi, W. Rei, A
702–713. hybrid genetic algorithm for multidepot and periodic vehi-
39. M. Soysal, J.M. Bloemhof-Ruwaard, R. Haijema, J.G. van cle routing problems, Operations Research 60 (3) (2012)
der Vorst, Modeling a green inventory routing problem for 611–624.
perishable products with horizontal collaboration, Com- 49. C. Watson-Gandy, P. Dohrn, Depot location with van sales-
puters & Operations Research 89 (2018) 168–182. men: a practical approach, Omega 1 (3) (1973) 321–329.
40. G. Taguchi, S. Chowdhury, Y. Wu, Taguchi’s Quality Engi- 50. V.F. Yu, S.W. Lin, W. Lee, C.J. Ting, A simulated anneal-
neering Handbook, Wiley and Sons, 2005. ing heuristic for the capacitated location routing prob-
41. L. Tansini, M.E. Urquhart, O. Viera, Comparing Assign- lem, Computers & Industrial Engineering 58 (2) (2010)
ment Algorithms for the Multi-depot VRP, Reportes Técni- 288–299.
cos 01-08, UR, FI – INCO 2001. 51. M.H.F. Zarandi, A. Hemmati, S. Davari, The multi-depot
42. E.B. Tirkolaee, A. Goli, M. Bakhsi, I. Mahdavi, A robust capacitated location-routing problem with fuzzy travel
multi-trip vehicle routing problem of perishable products times, Expert Systems with Applications 38 (8) (2011)
with intermediate depots and time windows, Numerical 10075–10084.
Algebra, Control and Optimization 7 (4) (2017) 417–433. 52. L. Zeng, H.L. Ong, K.M. Ng, S.B. Liu, Two composite meth-
43. E.B. Tirkolaee, M. Alinaghian, A.A.R. Hosseinabadi, M.B. ods for soft drink distribution problem, Advances in Engi-
Sasi, A.K. Sangaiah, An improved ant colony optimization neering Software 39 (5) (2008) 438–443.
for the multi-trip Capacitated Arc Routing Problem, Com- 53. Y.J. Zheng, H.F. Ling, X.L. Xu, S.Y. Chen, Emergency
puters & Electrical Engineering (2018), https://doi.org/10. scheduling of engineering rescue tasks in disaster relief op-
1016/j.compeleceng.2018.01.040, in press. erations and its application in China, International Trans-
44. E.B. Tirkolaee, I. Mahdavi, M.M.S. Esfahani, A robust peri- actions in Operational Research 22 (3) (2015) 503–518.
odic capacitated arc routing problem for urban waste col-
CHAPTER 13
Evolutionary Mapping Techniques for Systolic Computing System
13.1 INTRODUCTION

Computational requirements from current technology are demanding advancements in consumer products for accurate and fast decisions from a hardware–software platform [2]. Requirements can be satisfied by building faster devices such as VLSI/ASIC chips for given specifications. However, the upper limit on the speed at which a chip can be operated makes building an integrated circuit (IC) from scratch for a specific application unfeasible. Expensive cooling systems and the limitation on integration density with silicon are further reasons for not opting to develop a custom IC. An alternative method is to adopt pipelining and parallel processing techniques with multiple processors to perform the task. Of the available choices, the second method is more conservative and can achieve maximum performance through transformation of the data processing architecture.

According to Flynn's classification of computers [3], any architecture can be classified as one of four data processing architectures based on the number of instruction and data streams. Signal transmission strategies, array construction, methods of programming computational units and algorithmic adaptation are to be decided for a parallel architecture. When multiple instructions are carried out on a single data stream available in a processor, it is an MISD (multiple instruction stream, single data stream) processor. In systolic arrays, instead of following the process where a sequence of locally stored instructions is fetched and executed, raw and processed data move around the architecture, and the instructions of data processing are stagnant in the computational units. Systolic arrays are an example of MISD processors and were designed in 1978 [4].

Researchers currently work on developing an architecture that will be economically, logically and environmentally stable, so that the design can be designated as an optimum architecture. In fulfilling such a demand, designs are identified and tested using a trial-and-error method for satisfying parameters such as throughput, power, area, speed, cost, accuracy, and design time. Random methods are time consuming and produce suboptimal results. This calls for automated approaches which have the ability to generate several possible solutions efficiently and swiftly. This target can be achieved by using computer aided techniques for the exploration of the solution space. Systolic array mapping can be formulated mathematically as a constrained optimization problem. The best solution of design vectors is selected through rigorous scrutiny of solution candidates by evaluating an individual fitness function. The main intention is to minimize the total delay associated with each edge of the processing element and also to maximize the hardware utilization efficiency. Automated learning–searching algorithms such as EAs are chosen for designing systolic arrays, as they try to learn the best way to reach the optimal solution through bio-inspired mechanisms. In designing evolvable hardware that adapts to different run time configurations, evolutionary algorithms are preferred for providing minimum evolve time of a configuration [5]. Bio-inspired computing (BIC) is considered a major domain in computational intelligence, where the evolution strategy of species is mimicked to derive a mathematical model that reaches an optimum solution. Learning from nature started in the early 1950s and picked up pace from the 1990s. The concept of a species learning a coping mechanism for difficult situations has been efficiently adapted for computationally intensive tasks. Of all the algorithms based on species evolution and group dynamics, a recent development in bio-inspired evolutionary computation mathematically formulated the adjustment of internal organ configuration in a human body when an imbalance condition is forced [6]. This algorithm, termed allostatic optimization (AO), deals with the inherent feedback mechanism triggered when an instability is detected.

Evolutionary programming has been employed in this chapter to identify several optimal design vectors for systolic architecture. Swarm intelligence methods
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00020-8
Copyright © 2019 Elsevier Inc. All rights reserved.
have been applied in discrete form in order to carry out a comparative performance analysis and to identify the optimal solution. Genetic and memetic algorithms improve the genetic character of a population in the direction which improves the fitness. The characteristics and superiority of the obtained systolic architecture depend on the fundamental design vectors of a systolic array, such as the projection, processor and scheduling vectors, selected through iterative algorithms. Since parallelism and pipelining are the superior features of systolic arrays, the designed architecture is expected to offer maximum hardware utilization efficiency.

13.2 SYSTOLIC ARRAYS

A systolic system is an interconnection of processing elements that specifically compute and distribute data over the system [4]. The term "systolic" was selected to represent data traversal through the architecture resembling the function of the heart. In a systolic computing method, each processor frequently pumps data into and out of a cell, performing some brief computation, so that the data is made available at the local connection.

The basic structure of a systolic architecture is demonstrated in Fig. 13.1, which illustrates the ability to increase the computation power of processors when they are arranged in systolic fashion.

FIG. 13.1 Systolic architecture.

A systolic array is an interconnection of processors, in which the processors are located at the nodes of a bounded grid so that:
• Topologically, if there is a directed link from the processor at place I to the processor at place I + d for some d, then there is such a link for each I inside the grid.
• Computationally, if a processor accepts a data item on an input link at time t, then it produces a data item at time t + Δ on the corresponding output link, where Δ is a time length that is independent of the communication size, the adaptation of the link and the place of the processor [7].

Pipelining improves the performance of a machine by dividing a process into a number of stages with added storage units in the process line [8]. A pipeline uses the available computation time justifiably. Parallel processing utilizes many similar processors to deal with different data simultaneously to improve throughput. Implementation of arithmetic, logical or other types of operations through a pipelined architecture improves the performance of computing machines and gives efficient output with less execution time [9]. For applications with comparable computational duration and sampling rate, parallel architectures are chosen for tasks with operations proportional to an integral power of the number of points per data unit [10]. Novel architectures with high efficiency can be designed when both parallelization and pipelining are implemented as a hybrid design. Of the commonly available parallel MISD designs, systolic and wavefront arrays [11] are popular for their various advantages. Wavefront processors consist of an array of similar or dissimilar processors, each with its own local memory and connected in a nearest-neighbor topology [12]. Each processor usually performs a dedicated computation. Hybrids containing two or more different cells are possible. The cells fire asynchronously when all required inputs are available [13], which makes wavefront arrays suitable for computationally exploiting algorithms [14] that use an asynchronous data flow mechanism. Systolic arrays are most appropriate for computationally intensive applications with inherent massive parallelism, because they are capable of dealing with modular, concurrent, regular, synchronous, rhythmic processes that require repetitive and intensive computation. Compared to wavefront arrays, the systolic model suits special purpose and high performance computing devices very well [13]. It works under a global clock.

The main principle of operation is to fetch data from memory and use it efficiently for computation. Functional modules are designed for each systolic cell and data are allotted for unit computation [15]. Systolic arrays are suitable for designing high-level computation in hardware structures [16]. Simple, regular interconnections lead to better implementations and higher densities, similar to the design and implementation of gate arrays. A highly compact design requires both good performance and a low demand for supporting entities [17]. By replacing a single processing element with an array of processing elements or cells, a higher computation throughput can be accomplished without demanding more memory bandwidth. Systolic arrays have adaptability, modular structure, local interconnections, lesser bandwidth demands and compute-accelerated operations. Systolic arrays generally have a high rate of input/output, and they are well-matched for demanding parallel operations.

H.T. Kung et al. invented the systolic array, and many more applications were designed, such as QR decomposition [18], matrix triangulation, convolution [19], multipliers [20], programmable chips [21], fault tolerance applications [22], linear time algorithms, polynomial computation, warp processors, etc. The best known example of a systolic array is the iWarp processor of Intel, which uses a linear array processor connected by data buses [23]. The most recent systolic array design is Google's Tensor Processing Unit (TPU), known for its superior configuration and performance. TPUs have been deployed in Google's data centers for more than a year and have demonstrated the superior performance of machine learning through systolic arrays. Each computational cell is a matrix multiplication [24] unit based on systolic arrays [25]. Each element is a MAC unit, and a systolic array produces the highest density of MAC units compared to a full core design. Fig. 13.2 describes various implementations available for the TPU with a mention of the design with systolic arrays.

Systolic arrays are used in various applications, including language recognition [26], signal processing [27], relational database operations [28], matrix arithmetic [29–31], character string manipulation [32], neural networks [33] and image processing [34–39]. Systolic arrays have been used in the most advanced neurocomputers for machine learning [40]. Systolic arrays are designed based on the dependence graph (DG), which is transformed to a space–time representation. The transformation techniques rely on three vectors, namely the projection, scheduling, and processor vectors. From these three vectors, edge mapping is performed to construct the systolic architecture [41]. Given the vast space of possible solutions, there is a strong need for an automated and efficient approach to designing the exact systolic array for a given application.

13.3 EVOLUTIONARY ALGORITHMS

Design environments have expanded with the need for more automated processes in real-world optimization problems. Applications such as computer vision [42], robotics, big data analytics, and bioinformatics [43] require algorithms designed for high efficiency and robustness. For designing such optimization processes, the current trend is machine learning and search methodologies [44].

An evolutionary algorithm (EA) is an optimization algorithm that mimics biological mechanisms such as mutation, recombination, and natural selection to find an optimal design within specific constraints [45].

Evolutionary algorithms are a rapidly developing field of associative analysis, in which a collection of techniques and systems are learned for managing a large complicated problem. Many techniques are available under the class
of evolutionary algorithms that differ in the representation of the solution, implementation details and how the particular problem is applied. The general algorithm of an evolutionary procedure is given below:
• Select an initial population x0 = {x1_0, x2_0, ..., xN_0}, xi ∈ S, where S is the search space;
• Determine the value of the objective function f(x0) for each member of the population;
• Repeat for every iteration j until the termination condition is met:
  Perform selection;
  Perform crossover with a probability;
  Perform mutation with a probability;
  Determine the new population xi and fitness function fi;
  Replace the old members if the new members are better in fitness, else retain the same members and proceed with the iterations.

A general design of an evolutionary algorithm is explained in Fig. 13.3. Initial operand selection, followed by fitness evaluation and population reproduction, forms the basic process of an EA. The iteration continues until termination. Optionally, an EA can perform adaptation of the algorithm or a local search. By choosing various options, new evolutionary methods can be derived.

FIG. 13.3 General framework of evolutionary computation [46].

Evolvable hardware (EH) systems are configurable hardware systems which are able to adapt to different problems at run time. Unlike classical system design, where the designer decides or calculates the structure and configuration of the system based on the problem specifications, EH uses an evolutionary algorithm (EA) to tune its parameters or structure in order to find the optimal configuration for a certain problem according to a set of training samples. These training samples are representative examples of the problem that needs to be solved.

Evolutionary computing (EC) can be basically classified into four classes: evolutionary strategy (ES), evolutionary programming (EP), genetic algorithm (GA) and genetic programming (GP). Recent trends have included swarm intelligence (SI) with evolutionary computation due to similar methods of evaluation in the two classes [46]. Evolutionary strategies are specific techniques designed for solving optimization problems. Evolutionary programming attempts to develop artificial intelligence (AI) by predicting possible conditions of a defined situation from the experience learned from previous instances through machine learning (ML). Genetic algorithm is a well defined, evolving optimization method. Evolutionary computing utilizes a community oriented search with disruptive mechanisms such as crossover and procedures such as reproduction. Evolutionary procedures are well known for their ability to integrate theoretical and computational models, to apply to a wide range of domains, to provide parallel convergence, to engage in self-development, and to provide true global optimum solutions.

Hybrid evolutionary algorithms (HEA) are successful methodologies due to their robustness in noisy environments, their ability to handle huge data, and their capability to produce reliable results [47].

An exhaustive search algorithm, or heuristic [48], is a non-evolutionary algorithm which gives an approximate solution in less than polynomial time. Heuristics move from one point to another in the index space using some transition rules. The value of the objective function is calculated for each point, and the transition takes place so as to optimize the function. The heuristic used in this chapter is a bounded search optimization heuristic. The vectors are chosen by using constraints on the search (solution) space. The vectors which give the minimum cost function are optimal. This method of heuristics gives the vectors in a single iteration if the search space is of low order.

Genetic algorithm (GA) [49], the most widely used evolutionary procedure, has stood on the concept of natural selection since its development in 1975 by John Holland [50,51]. A probable solution of a genetically designed optimization problem is coded as a genetic
strand. There exists a one-to-one mapping between the result points and the genetic representations. The possible solutions are available as a set of populations that are allowed to randomly combine and modify until some termination condition, such as a maximum number of iterations or a satisfactory fitness function value, is reached. The three main operators are reproduction selection, crossover, and mutation. A wide variety of genetic algorithms exists, tailored to the needs of various applications [52,53]. Steady state GA is the commonly used method, where the offspring from crossover replaces the worst fit candidate only if it is better than the candidates already in the population. The main parameters used in the GA procedure are the population size, the number of generations, and the crossover and mutation rates.

Memetic algorithm (MA) [54] is designed based on the inspiration from Dawkins' notion of a meme. It belongs to the evolutionary computation class with an optional local search process [46]. MAs are similar to GAs, operating on a cluster of elements (memes). The first step is to select a group of memes (candidate solutions) and allow them to evolve towards the optimal solution by crossover and mutation along with the personal experience of the memes. By adding the local improvement factor along with information variation, MA converges faster compared to GA. The memetic algorithm improves the population-based global search method by adding a local search technique. Any advantageous information available in a local area can be used to guide the search to a better solution. The stopping condition can be the total number of iterations before reaching a target, the number of iterations for which the target value has been stable, or a satisfactory target value [45].

Shuffled frog leaping algorithm (SFL) combines the essence of the group based MAs and the social behavior-based PSO algorithms [61]. In the SFL, the population consists of a set of frogs (solutions) that is partitioned into subsets referred to as memeplexes, similar to memes in MA. Local search is performed by different societies of frogs that are considered as different memeplexes. Each memeplex includes individual frogs with unique ideas to reach the target (food). The ideas of frogs in a memeplex can be influenced by other frogs in the same group. The ideas can evolve and be passed to other memeplexes through a shuffling process. The local search and shuffling processes continue until defined convergence criteria are satisfied [62,63]. An initial population of P frogs is created randomly. For N-dimensional problems (N variables), frog i is represented as Xi = (xi1, xi2, ..., xiN). The frogs are arranged in an order based on their fitness. Similar to the roulette wheel selection of GA, the frogs are sorted into m groups, where each frog from the ordered list is allotted a group. The entire population is divided into m memeplexes, each containing q frogs [32]. Within each memeplex, the frogs with the best and the worst fitness are identified as Xb and Xw, respectively. Also, the frog with the global best fitness is identified as Xg. Then, a process similar to PSO is applied to improve only the frog with the worst fitness (not all frogs) in each cycle [45]. Accordingly, the position of the frog with the worst fitness is adjusted as follows:

Di = Rand() × (Xb − Xw),  (13.1)
Xw′ = Xw + Di,  (13.2)

where Rand() is a random number between 0 and 1, Dmax is the maximum allowed change in a frog's position, and Di varies within twice Dmax (−Dmax ≤ Di ≤ Dmax). When the fitness is improved, the worst frog is replaced. Otherwise, the calculations are repeated with respect to the global frog. The main parameters of SFL are the number of frogs P, the number of memeplexes, the number of generations for each memeplex before shuffling, the number of shuffling iterations, and the maximum step size.

13.4 SWARM INTELLIGENCE (SI)

Swarm intelligence (SI) is the bio-inspired collective behavior of organisms interacting among themselves and with their environment to achieve a target. SI shows how a group of similar objects can work together and produce amazing results in terms of creativity and efficiency. Stigmergy is the change in the behavioral pattern of a group member due to the influence of other group members. Stigmergy forms the basic principle behind developing swarm intelligence and its computational methods. The common algorithms that come under swarm intelligence are particle swarm optimization, ant colony optimization, bacterial foraging optimization, artificial bee colony optimization, pigeon inspired optimization, and many more.

Particle swarm optimization (PSO) [55] is a trajectory-evolving biomimetic algorithm which imitates a flock of birds (solutions) trying to reach the destination. The birds move by comparing their current position with that of the bird which is leading in the direction of the destination. When the position of a bird is trailing compared to the best positioned bird, it accelerates towards that bird with a velocity, hence updating its best position. The algorithm considers the personal and social experience of the birds for reaching the target. As the experience of individual birds and of birds as a flock
is utilized for optimizing the direction angle of reaching the target, the result is obtained swiftly. The particle (bird) is denoted by i. Three parameters of each particle i are monitored:
• Current position Xi = (Xi1, Xi2, ..., XiS);
• The best of the previous positions assumed by the particle, pi = (pi1, pi2, ..., piS);
• Velocity of flight Vi = (Vi1, Vi2, ..., ViS).
As the particles move, a new position Xi and a new velocity Vi are acquired by each particle with the goal of reaching the destination:

Vi = ω × Vi + c1 × Rand() × (Pi − Xi) + c2 × Rand() × (Pg − Xi),  (13.3)
Xi = Xi + Vi,  (13.4)

where Vmax ≥ Vi ≥ −Vmax.

The improvement in velocity as given by Eq. (13.3) is formulated using three terms: the first corresponds to the current velocity scaled by an inertia weight, which signifies the tendency of the particle to cling to the actual velocity; the second corresponds to the cognition or personal experience of the particle, comparing its attained position Xi to its own best position Pi, scaled by a constant c1; and the third corresponds to the social behavior of the particle, comparing its current position Xi to the global best position Pg, scaled by a constant c2. The constants c1 and c2 are the learning factors, usually set to 2 [45]. The velocity is allowed a maximum value Vmax in order to keep a hold on the range of results. The main parameters used in the PSO technique are the population size, the number of iterations (generations), the maximum change of a particle velocity Vmax and the inertia weight ω. Generally, PSO is used for unconstrained problems where the variables have no limits, similar to the sky limit reached by the birds. PSO is known for its simple implementation, derivative-free design, parallel processing, efficient global search and the few parameters to be monitored.

The hybrid PSO–GA algorithm has been a popular enhancement to EA. Crossover on the global best value, mutation on a stagnant best position (the personal best position of a particle) and an initial PSO population derived from GA are some hybridized solutions for HEA [56]. Hybrid algorithms are out of scope for this chapter and have been mentioned for the sake of completeness.

The biological behavior of ants [57] searching for food through the shortest path, with ants selecting the path initially followed by a predecessor, forms the inspiration for ant colony optimization (ACO) [58]. Ant colony optimization [59] is a population based optimization method. The ants, when following a path to find food, leave a chemical named pheromone to be detected by the following ants. If more ants follow the same path, then as an effect of positive feedback the concentration of pheromone increases, thereby indicating the most traveled path. In implementing ACO, each iteration consists of the same number of ants, but the representation, or the path chosen by the ants, varies among cycles. Each ant has S representations, and each representation includes a set of path options and pheromone concentrations. To ensure good solutions, the pheromone concentration associated with each path is altered in every iteration. Stagnation of the results at local optima is avoided by using a pheromone evaporation rate constant that reduces the concentration with time, as the chemical has a half-life of a few minutes [60]. The pheromone concentration increases with fitness. For minimization problems, the fitness is inversely proportional to the pheromone concentration. After updating the pheromone concentration, the next iteration starts by selecting the paths for the ants. The main parameters involved in ACO are the number of ants m, the number of iterations N, the exponents α and β, which control the importance of the pheromone concentration relative to a factor indicating the goodness of a path for the ant to select, the pheromone evaporation rate ρ, and the pheromone reward factor R, indicating the tendency to retain the pheromone concentration. The pheromone concentration associated with each possible route is given by τi,j:

τi,j(t) = ρ × τi,j(t − 1) + Δτi,j(t),  t = 1, 2, 3, ..., T,  (13.5)

where T is the number of iterations. The change in pheromone concentration is determined by the path chosen by ant k. An ant chooses a path with a specific probability decided by parameters like α, β and τ. Ant colony optimization can be applied to discrete optimization and graph problems.

The comparative design of evolutionary algorithms is pictorially depicted in Fig. 13.4. Each algorithm is unique in its representation, method of selection and evolutionary concept to reach the next generation. The following sections list the benefits of using evolutionary algorithms for systolic array design.
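As a companion sketch, the pheromone update of Eq. (13.5) — evaporation by ρ plus a fitness-dependent deposit Δτ — can be written as follows. The deposit rule (reward R divided by an ant's fitness, for minimization) is one common choice, assumed here for illustration rather than taken from the chapter:

```python
def update_pheromone(tau, ant_paths, fitnesses, rho=0.5, R=1.0):
    """Eq. (13.5): tau[i][j](t) = rho * tau[i][j](t-1) + delta_tau[i][j](t).
    For minimization, the deposit on each edge an ant used is taken
    inversely proportional to that ant's fitness (reward factor R)."""
    n = len(tau)
    for i in range(n):                 # evaporation on every edge
        for j in range(n):
            tau[i][j] *= rho
    for path, fit in zip(ant_paths, fitnesses):
        deposit = R / fit              # smaller fitness -> larger deposit
        for i, j in zip(path, path[1:]):
            tau[i][j] += deposit
    return tau
```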
The methodology maps an N-dimensional dependence graph to a lower-dimensional systolic architecture using a transformation through space–time mapping. A systolic architecture is designed by using linear mapping techniques on a regular dependence graph. An edge in the dependence graph represents precedence constraints in the signal flow graph direction, and any node in the DG represents the presence of an edge in the same direction at all nodes in the DG [64]. See Fig. 13.5.

The scheduling vector is

s = [s1, s2]^T.

These matrices have certain constraints on the space based on the projection vector, or iteration vector, d. Two nodes that are displaced by d are processed by the same processor:

d = [d1, d2]^T.

The index vector I represents any point in the search space S:

I = [i, j]^T.

The hardware utilization efficiency (HUE) is calculated from the scheduling matrix s and the iteration vector d. The most efficient architecture results in a utilization efficiency near 1:

HUE = 1 / |s^T d|,  (13.6)

subject to the mapping constraints

p^T d = 0,  (13.7)
s^T d ≠ 0.  (13.8)
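A quick way to screen candidate design vectors against Eqs. (13.6)–(13.8) is to check the constraints and compute the HUE directly. The sketch below is illustrative; the example vectors in the comment are hypothetical choices, not taken from the chapter:

```python
def dot(u, v):
    """Inner product u^T v for two-element design vectors."""
    return u[0] * v[0] + u[1] * v[1]

def valid_mapping(p, s, d):
    """Mapping constraints: p^T d = 0 (Eq. (13.7)) and s^T d != 0 (Eq. (13.8))."""
    return dot(p, d) == 0 and dot(s, d) != 0

def hue(s, d):
    """Hardware utilization efficiency, Eq. (13.6): HUE = 1 / |s^T d|."""
    return 1.0 / abs(dot(s, d))

# Hypothetical candidate: d = [1, 0]^T, p = [0, 1]^T, s = [1, 0]^T
# satisfies both constraints and gives HUE = 1 (full utilization).
```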
Texture strength can be derived from the easiness of defining a texture from its region. Textures which provide a high degree of visual satisfaction are considered strong. Classification of texture is possible when the basic patterns are of considerable size and there is sufficient difference in average intensities:

    p_str = [ Σ_{i=0}^{G} Σ_{j=0}^{G} (p_i + p_j)(i − j)^2 ] / [ ω + Σ_{i=0}^{G} s(i) ],
    with p_i ≠ 0, p_j ≠ 0.    (13.22)

Texture strength can be related with contrast and coarseness; it gives the boldness of texture patterns. The above mentioned five parameters of texture analysis are applied at different levels of image processing [69,70]. In many machine learning and image processing algorithms, assumptions are made so that the local regions have uniform intensities. The main purpose of such parameters is to sort image data into more readily interpretable information, which is used in a wide range of applications such as industrial inspection, image retrieval, medical imaging and remote sensing.

Texture analysis methods have been utilized in a variety of application domains such as materials inspection [71], biomedical signal analysis [72], tissue characterization [73], and texture segmentation.

13.7 RESULTS AND DISCUSSION
This section is organized as follows. The performance of EA is evaluated for an optimization problem and compared with its performance based on mean solution and percentage of success. As a proposed work, systolic array mapping of texture analysis is performed through evolutionary algorithms, and the detailed discussion of the results is presented in the later half of the section.

The summation term of the F8 function (Eq. (13.23)) has a parabolic shape, while the cosine function in the product term creates waves over the parabolic surface. These waves create local optima over the solution space [74]. The F8 function can be scaled to any number of variables N. The values of each variable are constrained to a range from −512 to 511. The global optimum (minimum) solution for this function is known to be zero when all N variables equal zero. See Table 13.1.

Twenty trial runs were performed for each problem. The performance of the different algorithms was compared using three criteria: (i) the percentage of success, as represented by the number of trials required for the objective function to reach its known target value; (ii) the average value of the solution obtained in all trials; (iii) the processing time to reach the optimum target value. The processing time, and not the number of generation cycles, was used to measure the speed of each algorithm, because the number of generations in each evolutionary cycle is different from one algorithm to another. See Table 13.2.

The GA performed poorly compared to the other algorithms; it reached the target in at most 50% of trials, and the percentage of success decreased with an increase in the number of variables. With an increase in the number of variables, the processing time also increased. GA was able to get solution accuracy closer to the optimum for the F8 problem. It is also evident that the mean solution is high for GA with the number of variables being more than 20, indicating the wandering nature of the algorithm. The memetic algorithm performed better than GA in terms of success rate and processing time. Variation from the optimum value was minimal among the trials. More variation in processing time has been noticed for PSO, as the social behavior influences the results, and the time for reaching the target is inappropriately high. The success rate of SFL was similar to GA and PSO; PSO and SFL have been found to outperform other algorithms in terms of solution quality mainly due to
TABLE 13.1
Parameters for evolutionary algorithms.
Genetic algorithm: crossover probability = 0.8; mutation probability = 0.08; population size = 200 to 500; stopping condition: no improvement in the objective function for 10 generations, or the target value is reached.
Memetic algorithm: population size = 100.
Particle swarm optimization: maximum velocity = 20; number of particles = 40; number of generations = 10,000; the inertia factor is a linear function, decreasing with the increasing number of generations.
Ant colony optimization: suited to discrete problems; 30 ants and 100 iterations; α = 0.5; β = 2.5; ρ = 0.4; R = 10.
Shuffled frog leaping algorithm: population of 200 frogs; 20 memeplexes; 10 iterations per memeplex.
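Eq. (13.23) itself falls outside this excerpt; assuming that F8 is the Griewank test function used in the comparison study [45] (a parabolic summation term minus a cosine product term, shifted so the minimum is zero, which matches the description above), a minimal sketch is:

```python
import math

def f8(x):
    # Assumed Griewank form of the F8 function: the summation term is a
    # parabola, and the cosine product superimposes waves over it that
    # create the local optima mentioned in the text.
    summation = sum(xi * xi for xi in x) / 4000.0
    product = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return 1.0 + summation - product

print(f8([0.0] * 10))  # prints 0.0: the global minimum when all N variables are zero
```

Scaling to any N is just a matter of passing a longer vector, which is how the N = 10 to N = 100 columns of Table 13.2 are generated in [45].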
TABLE 13.2
Results of evolutionary algorithms applied to F8 optimization problem [45].
Comparison criteria Algorithm N = 10 N = 20 N = 50 N = 100
Percentage of success GA 50 30 10 0
Percentage of success MA 90 100 100 100
Percentage of success PSO 30 80 100 100
Percentage of success SFL 50 7 90 100
Mean solution GA 0.06 0.097 0.161 0.432
Mean solution MA 0.014 0.013 0.011 0.011
Mean solution PSO 0.093 0.081 0.011 0.011
Mean solution SFL 0.08 0.063 0.049 0.019
the inertia factor. For MA, PSO and SFL, the difference between mean solution values reduces with increasing number of variables. A random variation in trend is noticed with SFL, where N = 20 results in a very low success rate. See Fig. 13.6.

13.7.2 Texture Analysis
Texture analysis through the grey tone difference matrix has been sampled out for implementation in systolic arrays. For a 300 × 300 image, the GTDM matrix is an intensity based column vector (GTDM) of length 255 corresponding to the number of grey levels. GTDM for texture analysis is an iterative process of dimension 4 and requires extensive procedures for obtaining the processor and scheduling matrices.

The GTDM matrix defines the grey tone differences of all pixels of a particular intensity value. Five parameters, namely coarseness, contrast, complexity, busyness and texture strength, can be derived from the matrix. Comparing the results in Table 13.3, coarseness is high in image (A) and lowest in image (C). Image (D) is high in contrast and can be clearly
218 Deep Learning and Parallel Computing Environment for Bioengineering Systems
FIG. 13.6 Comparison of evolutionary algorithms with different variables for F8 optimization.
TABLE 13.3
Parameters from GTDM of sample image.
Parameters Image (A) Image (B) Image (C) Image (D)
Coarseness 3.35 × 10−6 3.5 × 10−6 8.4 × 10−8 1.42 × 10−7
Contrast 8.95 × 10−3 6.52 × 10−3 1.3 × 10−2 1.48 × 10−2
Busyness 21.07 24.84 240 93.52
Complexity 1.80 × 107 4.9 × 109 4.9 × 107 4.7 × 109
Texture strength 7.68 × 10−2 7.83 × 10−2 6.3 × 10−2 5.53 × 10−2
Computation time (in s) 0.65 0.53 0.49 0.56
viewed from the sample image in Fig. 13.7. Busyness of texture is more in image (C), with very fine details in pixels between the structures of the building. The computational time for all the images was similar and close to half a second of run time. The complexity was more in images (B) and (D), resulting in a high value of the parameter. The texture strength had almost the same dynamic range for all the samples.

Mapping techniques such as exhaustive search techniques, particle swarm optimization and ant colony optimization are employed for systolic array mapping [75]. The mapping procedure is done in the MATLAB tool. The processor used for all the simulations had a clock speed of 1.9 × 10^9 Hz. The systolic array used for mapping in this work was arranged in two dimensions to increase the pace of the mapping decision. The projection vector d and dependence matrix D designed for GTDM systolic mapping are given below, and they were maintained throughout the proposed design:

    d = | 1 0 |
        | 0 1 |
        | 0 0 |
        | 0 0 |,

    D = | 0  0  0 -1  0  0 |
        | 0 -1 -1  0  0 -1 |
        | 1  0  0  1  1  0 |
        | 0  0  1  0  0  0 |.

The exhaustive search approach, when applied to texture analysis for implementing systolic arrays, uses the predefined projection vector d, dependence matrix D, constraints as given in Eqs. (13.7)–(13.12), and the cost function as mentioned in Eq. (13.13). The cost function is altered as it requires the number of processors in the x- and y-direction.

Particle swarm optimization implemented for mapping systolic arrays uses 5 particles (solutions) as the total set of solutions. The inertia weight ω is continuously updated using Eq. (13.24) as a means of including the reluctance of the particles to change the direction of their current movement:

    ω = 0.4 + 0.8 × (Number of iterations − Current iteration) / (Number of iterations − 1).    (13.24)

The processing time can be defined as the time taken for the entire procedure to run and end with the resulting numbers of processors and cycles. The average value is
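The inertia update of Eq. (13.24) can be sketched as a one-line function (the function name is assumed):

```python
def inertia_weight(current_iteration, number_of_iterations):
    # Eq. (13.24): omega decays linearly from 1.2 at the first iteration
    # to 0.4 at the last, reducing exploration as the swarm converges.
    return 0.4 + 0.8 * (number_of_iterations - current_iteration) / (number_of_iterations - 1)

print(inertia_weight(1, 10000), inertia_weight(10000, 10000))  # starts near 1.2, ends at 0.4
```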
FIG. 13.7 Sample images for grey tone difference matrix generation.
a measure of how much the output gets varied between the extremes around the optimal value. Since the iteration count is fixed, 50 trials are run for PSO. The initial population, position and velocity are random, and subsequent iterations get varied to improve the solution. The results reveal that the processing time is approximately constant, as there is no information exchange operation as opposed to GA and MA. The parameters for assessing systolic array implementation are:
• Processors in the x- and y-direction;
• Number of cycles for one iteration in the given arrangement of processors;
• Cost function as mentioned in Eq. (13.13);
• Number of iterations for termination of the program;
• Total number of cycles required for the array to complete processing of a sample image;
• Time taken for simulation of the algorithm;
• Average value of the solution denoting the traversal of the algorithm in reaching the final solution. If the average value varies from the cost function, it signifies the diverse exploring nature of the algorithm. See Table 13.4.

The results can be interpreted for the minimum number of processors and cycles. It can be derived that ACO produces the minimum number of processors and cycles. The average value of the solution is 3.98, and the final values of the processor space and scheduling matrices are given below:

    px = | 0 3 0 0 |      py = | 0 3 0 0 |
         | 0 4 0 0 |           | 0 2 0 0 |
         | 3 0 0 0 |           | 3 0 0 0 |
         | 1 2 0 0 |,          | 3 2 0 0 |,

    S = | 4 1 4 3 |
        | 3 2 4 2 |
        | 1 1 1 1 |
        | 4 4 4 4 |.

The edge mapping is formed from the derived matrices, resulting in a dependence matrix D. From the obtained
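Under the usual systolic design convention, edge mapping sends each dependence edge e (a column of D) to a processor displacement p^T e and a delay s^T e. The sketch below uses small illustrative vectors; these are assumed values for demonstration, not the chapter's derived matrices.

```python
def edge_map(vec, D):
    # Multiply a row vector by each column of D: vec^T e for every edge e.
    return [sum(vec[r] * D[r][c] for r in range(len(D))) for c in range(len(D[0]))]

p = [1, 0]                  # processor space vector (illustrative)
s = [1, 1]                  # scheduling vector (illustrative)
D = [[1, 0, 1],
     [0, 1, 1]]             # dependence matrix: one edge per column

print(edge_map(p, D))  # per-edge processor displacement -> [1, 0, 1]
print(edge_map(s, D))  # per-edge delay in cycles        -> [1, 1, 2]
```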
… using modified adaptive rood pattern search algorithm, Circuits, Systems, and Signal Processing (2018) 1–20.
3. M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers 100 (9) (1972) 948–960.
4. H. Kung, C.E. Leiserson, Systolic Arrays (for VLSI), Sparse Matrix Proceedings 1978, vol. 1, Society for Industrial and Applied Mathematics, 1979, pp. 256–282.
5. J. Mora, E. de la Torre, Accelerating the evolution of a systolic array-based evolvable hardware system, Microprocessors and Microsystems 56 (2018) 144–156.
6. V. Osuna-Enciso, E. Cuevas, D. Oliva, H. Sossa, M. Pérez-Cisneros, A bio-inspired evolutionary algorithm: allostatic optimisation, International Journal of Bio-Inspired Computation 8 (3) (2016) 154–169.
7. R. Davis, D. Thomas, Systolic array chip matches the pace of high-speed processing, Electronic Design 32 (22) (1984) 207.
8. A. Faraz, F.U.H. Zeya, M. Kaleem, A survey of paradigms for building and designing parallel computing machines, Computer Science & Engineering 5 (1) (2015) 1.
9. P. Kacsuk, M. Tudruj, Extending grade towards explicit process synchronization in parallel programs, Computers and Artificial Intelligence 17 (5) (1998) 507–516.
10. J. Speiser, H. Whitehouse, A review of signal processing with systolic arrays, in: Real-Time Signal Processing VI, vol. 431, International Society for Optics and Photonics, 1983, pp. 2–7.
11. S.-Y. Kung, B. Rao, et al., Wavefront array processor: language, architecture, and applications, IEEE Transactions on Computers 100 (11) (1982) 1054–1066.
12. P.A. Laplante, S.J. Ovaska, Real-Time Systems Design and Analysis: Tools for the Practitioner, John Wiley and Sons, 2011.
13. S.-Y. Kung, P.S. Lewis, S.-C. Lo, Performance analysis and optimization of VLSI dataflow arrays, Journal of Parallel and Distributed Computing 4 (6) (1987) 592–618.
14. P.S. Kumar, Z. David, Neural Networks and Systolic Array Design, World Scientific, 2002.
15. J. Fortes, K. Fu, B. Wah, Systematic approaches to the design of algorithmically specified systolic arrays, in: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'85, vol. 10, IEEE, 1985, pp. 300–303.
16. R.P. Brent, H. Kung, F.T. Luk, Some Linear-Time Algorithms for Systolic Arrays, Tech. rep., Cornell University, 1983.
17. I.E. Sutherland, C.A. Mead, Microelectronics and computer science, Scientific American 237 (3) (1977) 210–229.
18. C. Thiripurasundari, V. Sumathy, C. Thiruvengadam, An FPGA implementation of novel smart antenna algorithm in tracking systems for smart cities, Computers & Electrical Engineering 65 (2018) 59–66.
19. H. Kung, S. Song, A Systolic 2-D Convolution Chip, Tech. rep., Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science, 1981.
20. B.K. Meher, P.K. Meher, Analysis of systolic penalties and design of efficient digit-level systolic-like multiplier for binary extension fields, Circuits, Systems, and Signal Processing (2018) 1–17.
21. A.L. Fisher, H. Kung, L.M. Monier, Y. Dohi, Architecture of the PSC: a programmable systolic chip, ACM SIGARCH Computer Architecture News 11 (3) (1983) 48–53.
22. H. Kung, M.S. Lam, Fault-Tolerance and Two-Level Pipelining in VLSI Systolic Arrays, Tech. rep., Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science, 1983.
23. T. Gross, D.R. O'Hallaron, iWarp: Anatomy of a Parallel Computing System, MIT Press, 1998.
24. H.H.S. Sidhu, Design and implementation modified Booth algorithm and systolic multiplier using FPGA, International Journal of Engineering Research & Technology (IJERT) 2.
25. C.-P. Lu, AI, native supercomputing and the revival of Moore's law, APSIPA Transactions on Signal and Information Processing 6.
26. K.T. Johnson, A.R. Hurson, B. Shirazi, General-purpose systolic arrays, Computer 26 (11) (1993) 20–31.
27. R. Urquhart, D. Wood, Systolic matrix and vector multiplication methods for signal processing, in: IEE Proceedings F (Communications, Radar and Signal Processing), vol. 131, IET, 1984, pp. 623–631.
28. H. Kung, P.L. Lehman, Systolic (VLSI) arrays for relational database operations, in: Proceedings of the 1980 ACM SIGMOD International Conference on Management of Data, ACM, 1980, pp. 105–116.
29. W.M. Gentleman, H. Kung, Matrix triangularization by systolic arrays, in: Real-Time Signal Processing IV, vol. 298, International Society for Optics and Photonics, 1982, pp. 19–27.
30. S. Subathradevi, C. Vennila, Systolic array multiplier for augmenting data center networks communication link, Cluster Computing (2018) 1–11.
31. D.I. Moldovan, On the design of algorithms for VLSI systolic arrays, Proceedings of the IEEE 71 (1) (1983) 113–120.
32. R.J. Lipton, D. Lopresti, A systolic array for rapid string comparison, in: Proceedings of the Chapel Hill Conference on VLSI, 1985, pp. 363–376.
33. H. Yang, Y. Zhu, J. Liu, End-to-end learning of energy-constrained deep neural networks, arXiv preprint, arXiv:1806.04321.
34. H.-T. Kung, Special-purpose devices for signal and image processing: an opportunity in very large scale integration (VLSI), in: Real-Time Signal Processing III, vol. 241, International Society for Optics and Photonics, 1980, pp. 76–85.
35. A.L. Fisher, Systolic algorithms for running order statistics in signal and image processing, in: VLSI Systems and Computations, Springer, 1981, pp. 265–272.
36. H. Kung, J.A. Webb, Mapping image processing operations onto a linear systolic machine, Distributed Computing 1 (4) (1986) 246–257.
37. R. Mukherjee, P. Saha, I. Chakrabarti, P.K. Dutta, A.K. Ray, Fast adaptive motion estimation algorithm and its efficient VLSI system for high definition videos, Expert Systems with Applications 101 (2018) 159–175.
38. T. Komarek, P. Pirsch, Array architectures for block matching algorithms, IEEE Transactions on Circuits and Systems 36 (10) (1989) 1301–1308.
39. S. Divakara, S. Patilkulkarni, C.P. Raj, High speed modular systolic array-based DTCWT with parallel processing architecture for 2D image transformation on FPGA, International Journal of Wavelets, Multiresolution and Information Processing 15 (05) (2017) 1750047.
40. P. Jawandhiya, Hardware design for machine learning, International Journal of Artificial Intelligence and Applications (IJAIA) 9 (1) (2018).
41. K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 2007.
42. S. Kumar, E.A. Chauhan, A survey on image feature selection techniques, International Journal of Computer Science and Information Technologies (IJCSIT) 5 (5) (2014) 6449–6452.
43. M. Yoshida, T. Hinkley, S. Tsuda, Y.M. Abul-Haija, R.T. McBurney, V. Kulikov, J.S. Mathieson, S.G. Reyes, M.D. Castro, L. Cronin, Using evolutionary algorithms and machine learning to explore sequence space for the discovery of antimicrobial peptides, Chem 4 (3) (2018) 533–543.
44. N. Pillay, R. Qu, D. Srinivasan, B. Hammer, K. Sorensen, Automated design of machine learning and search algorithms [guest editorial], IEEE Computational Intelligence Magazine 13 (2) (2018) 16–17.
45. E. Elbeltagi, T. Hegazy, D. Grierson, Comparison among five evolutionary-based optimization algorithms, Advanced Engineering Informatics 19 (1) (2005) 43–53.
46. J. Zhang, Z.-h. Zhan, Y. Lin, N. Chen, Y.-j. Gong, J.-h. Zhong, H.S. Chung, Y. Li, Y.-h. Shi, Evolutionary computation meets machine learning: a survey, IEEE Computational Intelligence Magazine 6 (4) (2011) 68–75.
47. M.M. Drugan, Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms, Swarm and Evolutionary Computation 44 (2019) 228–246.
48. F. Glover, Heuristics for integer programming using surrogate constraints, Decision Sciences 8 (1) (1977) 156–166.
49. M. Pedemonte, F. Luna, E. Alba, Systolic genetic search, a systolic computing-based metaheuristic, Soft Computing 19 (7) (2015) 1779–1801.
50. L. Fogel, A. Owens, M. Walsh, Adaptation in Natural and Artificial Systems, 1975.
51. J.H. Holland, Genetic algorithms and adaptation, in: Adaptive Control of Ill-Defined Systems, Springer, 1984, pp. 317–333.
52. Q. Wang, Using genetic algorithms to optimise model parameters, Environmental Modelling & Software 12 (1) (1997) 27–34.
53. R. Tyagi, S.K. Gupta, A survey on scheduling algorithms for parallel and distributed systems, in: Silicon Photonics & High Performance Computing, Springer, 2018, pp. 51–64.
54. P. Garg, A comparison between memetic algorithm and genetic algorithm for the cryptanalysis of simplified data encryption standard algorithm, arXiv preprint, arXiv:1004.0574.
55. E. García-Gonzalo, J. Fernández-Martínez, A brief historical review of particle swarm optimization (PSO), Journal of Bioinformatics and Intelligent Control 1 (1) (2012) 3–16.
56. H. Garg, A hybrid PSO-GA algorithm for constrained optimization problems, Applied Mathematics and Computation 274 (2016) 292–305.
57. R. Alvarez, C. Rahmann, R. Palma-Behnke, P. Estévez, F. Valencia, Ant colony optimization algorithm for the multiyear transmission network expansion planning, in: 2018 IEEE Congress on Evolutionary Computation (CEC), IEEE, 2018, pp. 1–8.
58. J.M. Pasteels, J.-L. Deneubourg, S. Goss, Self-organization mechanisms in ant societies. I. Trail recruitment to newly discovered food sources, in: Jacques M. Pasteels, Jean-Louis Deneubourg (Eds.), From Individual to Collective Behavior in Social Insects: les Treilles Workshop, Birkhauser, 1987.
59. M. Dorigo, M. Birattari, T. Stutzle, Artificial ants as a computational intelligence technique, IEEE Computational Intelligence Magazine 1 (2006) 28–39.
60. M. Fera, F. Fruggiero, A. Lambiase, G. Martino, M.E. Nenni, Production scheduling approaches for operations management, in: Operations Management, InTech, 2013.
61. E. Afzalan, M. Taghikhani, M. Sedighizadeh, Optimal placement and sizing of DG in radial distribution networks using SFLA, International Journal of Energy Engineering 2 (3) (2012) 73–77.
62. M.M. Eusuff, K.E. Lansey, Optimization of water distribution network design using the shuffled frog leaping algorithm, Journal of Water Resources Planning and Management 129 (3) (2003) 210–225.
63. M. Eusuff, K. Lansey, F. Pasha, Shuffled frog-leaping algorithm: a memetic meta-heuristic for discrete optimization, Engineering Optimization 38 (2) (2006) 129–154.
64. B. Sundari, Design space exploration of deeply nested loop 2D filtering and 6 level FSBM algorithm mapped onto systolic array, VLSI Design 2012 (2012) 15.
65. L. Whitley, A. Howe, S. Rana, J. Watson, L. Barbulescu, Comparing heuristic search methods and genetic algorithms for warehouse scheduling, in: Systems, Man, and Cybernetics, 1998 IEEE International Conference on, vol. 3, IEEE, 1998, pp. 2430–2435.
66. N. Ling, M.A. Bayoumi, Systematic algorithm mapping for multidimensional systolic arrays, Journal of Parallel and Distributed Computing 7 (2) (1989) 368–382.
67. C.M. Fiduccia, R.M. Mattheyses, A linear-time heuristic for improving network partitions, in: Papers on Twenty-Five Years of Electronic Design Automation, ACM, 1988, pp. 241–247.
68. A. Rosenfeld, E.B. Troy, Visual Texture Analysis, Tech. rep., Maryland Univ., College Park (USA), Computer Science Center, 1970.
69. M. Tuceryan, A.K. Jain, Texture analysis, in: Handbook of Pattern Recognition and Computer Vision, World Scientific, 1993, pp. 235–276.
70. M. Amadasun, R. King, Textural features corresponding to textural properties, IEEE Transactions on Systems, Man and Cybernetics 19 (5) (1989) 1264–1274.
71. J.S. Weszka, A. Rosenfeld, An application of texture analysis to materials inspection, Pattern Recognition 8 (4) (1976) 195–200.
72. P.M. Szczypiński, A. Klepaczko, MaZda—a framework for biomedical image texture analysis and data exploration, in: Biomedical Texture Analysis, Elsevier, 2018, pp. 315–347.
73. R. Lerski, K. Straughan, L. Schad, D. Boyce, S. Blüml, I. Zuna, VIII. MR image texture analysis—an approach to tissue characterization, Magnetic Resonance Imaging 11 (6) (1993) 873–887.
74. D. Whitley, R. Beveridge, C. Graves, K. Mathias, Test driving three 1995 genetic algorithms: new test functions and geometric matching, Journal of Heuristics 1 (1) (1995) 77–104.
75. J.W. Haefner, Parallel computers and individual-based models: an overview, in: Individual-Based Models and Approaches in Ecology, Chapman and Hall/CRC, 2018, pp. 126–164.
76. M. Patel, P. McCabe, N. Ranganathan, SIBA: a VLSI systolic array chip for image processing, in: Pattern Recognition, 1992. Vol. IV. Conference D: Architectures for Vision and Pattern Recognition, Proceedings, 11th IAPR International Conference on, IEEE, 1992, pp. 15–18.
77. R.W. Means, H.J. Sklar, Systolic array image processing system, US Patent 5,138,695, Aug. 11, 1992.
78. F. Dressler, O.B. Akan, Bio-inspired networking: from theory to practice, IEEE Communications Magazine 48 (11) (2010) 176–183.
79. S. Thakoor, Bio-inspired engineering of exploration systems, Journal of Space Mission Architecture 2 (1) (2000) 49–79.
CHAPTER 14
14.1 INTRODUCTION
Autism spectral disorder (ASD) stands to be a promising field for research today, as there is no single standard diagnostic measure for ASD. The clinical trials to identify the autistic nature and the cognitive development are quite time consuming. This is because the analysis is based on social interaction, verbal and non-verbal communication and imitation of sameness, which involves a series of screenings over time. Neurodevelopment studies suggest that facial expressions and emotions stand as a key indicator in analyzing the state of a human's response. Thus facial expression is concentrated on for differentiating the neurodevelopmental disorders among ASD positive and normally developing children. Such ASD children will find it hard to identify the object influencing them, through their poor gaze factor, the emotion perceived and the emotion to respond.

Facial expression has a significant impact on the communication factor and is identified as a basic interest identification and involvement parameter that is to be notified. This form of communication through facial expression and emotional sequence has a faster convergence than any other non-visual communications. These emotions could be perceived in many different methods, and studies converge on the major emotions faced during communication, namely anger, disgust, sleepiness, happiness, neutrality, sadness and fear.

Psychological analysis and research suggest that typically developing (TD) individuals more rapidly detect emotional expressions than neutral expressions. However, children with high functioning autism are screened from such emotion detection, which leaves a clue as to the importance of facial expression and emotional identification in children with ASD.

With the advent of these facial expressions in real time applications, there arises a need for an automatic recognition mechanism. These mechanisms for automated recognition of facial expressions normally depend upon the movement of eyes and facial muscle movements, or depend upon the various means to create relationships among the different shapes of the face or on the variety of emotional characteristics. However, this information can be gained from the varied sequence of images which depicts the drive of emotions. Hence the system, which classifies the emotions from the images, includes a series of algorithms that combine the techniques of feature extraction and classification. One such algorithm for identifying a human face in the image space is the Viola–Jones algorithm, which possesses improved facial detection accuracy. The detected face can be fed as input to any N classifiers that classify the facial features and support the identification of a facial expression.

In recent decades, many artificial intelligence techniques such as deep neural networks (DNNs) could be involved to improve the learning factor of the machine in a very granular way. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are the most common DNN techniques that are applied to analyze features in a very efficient way among image and video input formats.

In particular, CNNs are a type of artificial neural network technique with a feed-forward network that extracts more features than many ad hoc extractors in their order of evolution. This is because the architecture of CNNs works on the basis of the activity of the neurons in the brain and requires a wide range of factors that are learned from previously fine labeled image data. The CNN technique, when combined with GPU processing, gives an eminent and quick way of analyzing the features. The CNN can be applied for training on the images, and thus the trained network could be later deployed in real-time video analysis for emotion detection. However, a major drawback with this CNN way of learning is that a large amount of data input needs to be fed into
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00021-X 225
Copyright © 2019 Elsevier Inc. All rights reserved.
the system so as to classify and label in an accurate classification strategy.

Although CNN shows improved facial expression identification, the algorithm works well only on image frames separated from a video input. The recurrent neural network (RNN) is one such algorithm which overcomes this challenge faced by CNN. The recurrent neural network is a classical sequential learning model that learns the features of an object in a live stream. The features are learned from a series of data by integrating previous inputs that are influenced by the internal state of the neural network. The chapter focuses on the analysis of CNN for image data, which leads to more sophisticated results when compared to simple machine learning mechanisms.

The chapter is organized as follows: Sect. 14.2 describes the state-of-the-art related to the work specified in this chapter. Section 14.3 elaborates on the methodology adopted for the implementation, and Sect. 14.4 provides the acquired meaningful insights that were made on facial expressions on-line with ASD children. The section also justifies the implementation results. Sections 14.5 and 14.6 present the conclusion and future work identified in this research work.

14.2 STATE-OF-THE-ART
The state-of-the-art in this chapter deals with four major discussions, namely (i) the motivation and driving force for the research, (ii) characteristics of autism, (iii) current screening methods, and (iv) existing computer interventions that could be incorporated in the screening mechanism. The study and survey do not give a comparison of existing deep learning techniques to identify the autism based insights. This is because, within the extent of possible study, the expression of a human being identified through deep learning and the ability of identification of expression by autistic children were dealt with. But it does not include the expression shown by an autistic child [16,20]. This seems to be a major research gap, and the chapter focuses on bridging the gap by identifying the facial expression of an autistic child through computer intervention.

Even though ASD is characterized by major disturbances in social communication, there also exist other psychiatric conditions that are associated with impairments in social skills, communication, high restriction and even repetitive behaviors [30,28]. These impairments lead to intellectual disability, specific language impairment, attention-deficit/hyperactivity disorder, anxiety disorders and disruptive behavior disorders [1]. Such compensatory abilities are caused through the neurodevelopment of the child, varied in the age scale. The sensory nerves react to the state of children and reflect the emotion expressed by them. Facial expressions are one of the primary signals used to detect the impaired communication abilities in a social environment among children with high functioning autism spectral disorder [2].

The behavioral characteristics and the diagnostic probability show differences among children of various genders with autism. But the sensory symptoms and the basic characteristics of an autistic child remain the same at the early identification level [3]. Hence the chapter deals with expression identification without considering the gender details and age. Also, it is important to analyze the expression and emotional behaviors in children post the clinical analysis period, as the subclinical levels of autism identification in the absence of compensatory ability identification over time may result in autism in the future [3,13,14,19].

Such early identification could be intervened through basic facial emotions that the child processes. On a very granular observation, these emotions could be identified by a mother when breast-feeding the baby. The children who are prone to autism do not show proper interest in viewing a human, and that results in the non-engaged facial expression [21,25,29]. Such facial expression gaps could be identified and must be analyzed in an efficient manner so as to boost up the clinical observations and reviews.

In an experiment [4], it was observed that the target object to the stimuli might influence the emotional behavior of the children, either ASD or TD. Such emotions are to be carefully notified to improve the efficiency of the screening results and impacts [5,15,17]. Thus the expression faced by the children should be analyzed with and without the object intervention. The chapter initially focuses on exploring such expressions of the children in a contactless environment, later pertaining to the assumption of facing a camera and human under courtesy. Such in-depth analysis through human–computer interaction could be established on application of deep learning techniques to result in more sophisticated results that support the screening technique.

The early screening method initiates the process by face detection and through feature extraction. Viola et al. proposed the Viola–Jones algorithm for face detection that belongs to the class of Haar classifiers of facial detection and feature identification [10]. The algorithm undergoes Haar cascade classification, whereas Jing-Wein Wang et al. suggested an algorithm for facial feature specification that categorized the face into
CHAPTER 14 Multimodal Deep Learning Based Expression Analysis of Children With ASD 227
tained sums. This dissimilarity value is then used to categorize subsections of an image.

For example, consider a database of images representing human faces. Suppose the major difference among all the images is found to be color variation in the region of the eyes and cheeks. Then the adjacent rectangular regions considered for Haar-feature selection are the regions corresponding to the eye and cheek regions.

The Viola–Jones algorithm uses three classes of features, as shown in Fig. 14.1. The two-rectangle feature is the difference between the sums of the pixels contained in two rectangular regions. The three-rectangle feature calculates the sum contained in two outside rectangles subtracted from the sum in a middle rectangle. Lastly, a four-rectangle feature computes the difference between diagonal pairs of rectangles [22,24].

The Haar-like features are easily computed using integral image values, as described in the next section.

14.3.1.2 Construction of an Integral Image
The input image from the dataset is transformed into an integral image, implying the summation of pixel values in a recognized rectangular piece of the image. The summation of the pixels at a location (x, y) is computed as

    ii(x, y) = Σ_{x′≤x, y′≤y} i(x′, y′)    (14.1)

where ii(x, y) is the integral transformation at location (x, y) and i(x′, y′) are the original pixel values.

The integral image value corresponds to one single location (say, (x1, y1)). The integral image at (x2, y2) corresponds to the summation of pixels from both (x1, y1) and (x2, y2). This implies that the summation of pixels at location (x, y) is the sum of the pixels above and to the left of (x, y). The conversion of the integral image continues until every individual rectangular block of the input image is processed. Thus, for a particular rectangular region WXYZ of the input image, the sum of pixels is computed from the four corner values as

    Σ_{(x,y)∈WXYZ} i(x, y) = ii(Z) + ii(W) − ii(X) − ii(Y)    (14.2)
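Eqs. (14.1)–(14.2) can be sketched in a few lines of NumPy; the function names here are illustrative and not from the chapter's implementation:

```python
import numpy as np

def integral_image(img):
    """Eq. (14.1): ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Eq. (14.2): sum over a rectangle from its four corner values,
    ii(Z) + ii(W) - ii(X) - ii(Y). Coordinates are inclusive."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```

Once the cumulative sums are built, the sum over any rectangle costs only four lookups, which is what makes evaluating many Haar features per window affordable.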
computes the weak classifier as

    q(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise    (14.3)

where f represents the value of the feature, θ is the value of the threshold, and p is the polarity, which indicates the direction of the inequality.

The weak classifiers are then further processed to achieve a strong classifier with a low false positive rate. The strong classifier is computed as

    H(x) = 1 if Σ_{t=1}^{T} α_t q_t(x) ≥ γ, and 0 otherwise    (14.4)

where α_t = log(1/β_t) and γ is chosen to ensure that all the positive training samples are classified correctly.

14.3.1.4 Cascade Classifier
Haar feature-based cascade classification is an effective machine learning based approach, in which a cascade function is trained using a sample that contains many positive and negative images. The outcome of the AdaBoost classifier is that the strong classifiers are divided into stages to form cascade classifiers. The term "cascade" means that the resulting classifier consists of a set of simpler classifiers which are applied to a region of interest until the candidate object is discarded or passed.

The cascade classifier splits the classification work into two stages: training and detection. The training stage does the work of gathering the samples, which are classified as positive and negative. The cascade classifier employs some supporting functions to generate a training dataset and to evaluate the prominence of the classifiers.

In order to train the cascade classifier, we need a set of positive and negative samples. In our work, we have incorporated the utility called opencv_createsamples to create the positive samples for opencv_traincascade. The output file of this function serves as an input to opencv_traincascade to train the detected face. The negative samples are collected from arbitrary images which do not include the objects to be detected.

Fig. 14.2 and Table 14.2 show the flow of the cascade classifier. Initially, the classifier was trained with a few positive and negative samples, which are arbitrary images of the same size, both sets being equally scaled in size. The classifier generates "1" if the region possibly identifies the face and "0" otherwise. The major goal of the cascade classifier is to find the face objects of interest at diverse sizes, making the classifier more proficient without altering the size of the input images.

TABLE 14.2
Cascade classification technique.
P – Set of positive samples
N – Set of negative samples
For each feature f:
    In each stage, use P & N to train the classifier with the selected features
    Step 1: Assign the weights for the features
    Step 2: Normalize the weights
    Step 3: Based on the output of Step 2, select the next best (weak) classifier
    Step 4: Update weights and evaluate the features for the selected criteria
    Step 5: If it passes, apply the second stage of features and continue the process. Else normalize the weights and repeat the steps.
End for
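The weak and strong classifiers of Eqs. (14.3)–(14.4) can be sketched directly; the values below are illustrative, not taken from the chapter's trained model:

```python
def weak_classifier(fx, p, theta):
    """Eq. (14.3): q(x, f, p, theta) = 1 if p*f(x) < p*theta, else 0.
    p in {+1, -1} flips the direction of the inequality."""
    return 1 if p * fx < p * theta else 0

def strong_classifier(weak_outputs, alphas, gamma):
    """Eq. (14.4): H(x) = 1 if sum_t alpha_t * q_t(x) >= gamma, else 0."""
    score = sum(a * q for a, q in zip(alphas, weak_outputs))
    return 1 if score >= gamma else 0

# Three weak votes weighted by alpha; gamma is the acceptance threshold.
qs = [weak_classifier(fx, 1, 0.5) for fx in (0.2, 0.7, 0.4)]  # -> [1, 0, 1]
assert strong_classifier(qs, [0.6, 0.3, 0.5], gamma=1.0) == 1
```

In the cascade, a window is passed to the next stage only if the stage's strong classifier outputs 1, so most negative windows are rejected cheaply by the earliest stages.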
230 Deep Learning and Parallel Computing Environment for Bioengineering Systems
TABLE 14.4
Landmark analysis technique.
For all detected face instances individually:
    Sketch the facial landmarks with the predictor class
    The landmarks are positioned along the x and y axes in x[] and y[]
    Compute xmean and ymean
    For every value in x[] and y[]:
        Compute the mean of both axes to determine the center of gravity using
            xcentral = [(x − xmean) for x in xlist]
            ycentral = [(y − ymean) for y in ylist]
    Compute the angle of each detected face as
        if xlist[26] == xlist[29]:
            anglenose = 0
        else:
            anglenose = arctan((y[26] − y[29]) / (x[26] − x[29])) · 180/π
        if anglenose < 0:
            anglenose += 90
        else:
            anglenose −= 90
    Compute the relative angle of each landmark (w, z) as
        arctan((z − ymean) / (w − xmean)) · 180/π − anglenose

In live analysis there is a chance of not detecting the face within some frames. Nevertheless, when this live detection is combined with appropriate GPU processing support, it avoids the process of prior facial detection, and the capture rate and analysis are relatively improved given high computing capability. This leads to a low failure rate in the process of object detection with live frames. Therefore, to enhance the analysis factor in the proposed work, we process the input image using the Viola–Jones Haar-cascade classifier to detect a face and then use the landmarks, through the OpenCV and Dlib Python libraries, to examine the detected face.

14.3.3 Expression Classifier
The proposed work classifies the detected facial expression by means of an SVM linear-kernel classifier. SVM is one of the most popular machine learning algorithms and falls into the category of supervised learning. SVM can be employed for solving problems related to classification and regression.

The working principle of SVM is the concept of decision planes, which separate the different sets of objects based on specific criteria. The SVM model is widely used in the classification of images, which motivated the use of SVM in the proposed work for classifying expressions and grouping them into their categories.

The facial landmark points are analyzed to classify expressions such as anger, disgust, fear, happiness,
sadness, surprise and neutrality. These expressions are evaluated in children both with ASD and TD.

The linear classifier classifies the facial expression and is of the form

    f(x) = wᵀx + b    (14.5)

where x corresponds to the feature vector, w corresponds to the weight vector, and b is the bias. Each image from ASD_POSITIVE and ASD_NEGATIVE is divided using the ratio 80:20 for training and testing.

14.3.4 Expression Identification Through a Convolution Neural Network (CNN)
The face detected using a Haar-cascade classifier and the emotion detected using Dlib library functions result in a better performance of expression recognition. To further improve this performance and the accuracy ratio, a deep learning algorithm is employed to learn in detail about the frames identified for analysis. The two major challenges faced during face detection using the Haar-cascade classifier are the larger search space and larger visual variations. To overcome this challenge, we employed a CNN to detect features in a frame in a fine granular manner.

In order to achieve better performance in the screening methodology, the CNN based facial expression identification is made to undergo facial detection using Haar-cascade classification to detect a face from any frame of input, as discussed in Sect. 14.3.1.

The CNN involves three basic layers, namely (a) a convolution layer, (b) a max pooling layer and (c) a fully connected network layer, as shown in Fig. 14.6. The CNN technique reads the input frame after frame, and each frame undergoes a series of convolution layers and a subsampling layer. These two layers of operation are termed the hidden layers, which could be scaled to attain maximum accuracy in feature extraction.
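A minimal sketch of Eq. (14.5) and the 80:20 split follows; a real SVM would learn w and b from the training portion via a linear-kernel solver, and the numbers here are illustrative:

```python
import numpy as np

def linear_decision(x, w, b):
    """Eq. (14.5): f(x) = w^T x + b; the sign gives the predicted class."""
    return np.dot(w, x) + b

def split_80_20(samples):
    """80:20 train/test split, as used for ASD_POSITIVE and ASD_NEGATIVE."""
    cut = int(0.8 * len(samples))
    return samples[:cut], samples[cut:]

train, test = split_80_20(list(range(10)))
assert len(train) == 8 and len(test) == 2
assert linear_decision(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1) == 0.1
```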
FIG. 14.6 CNN technique for expression identification in children with ASD.
The convolution neural network based landmark identification computes the points with x and y coordinates indicated as (lx, ly). In general, with 68 landmark points, the convolution layer will have neurons defined from (l1x, l1y) to (l68x, l68y), as shown in Fig. 14.8, where l is the landmark and 1 to 68 are the facial landmark points marked on the detected face. These landmarks are annotated and labeled for each emotion, varying in the order of anger, disgust, fear, happiness, sadness, neutrality and sleepiness in our implementation. Each point in the landmark takes four major vector entries: x-axis, y-axis, angular distance from the head central point, and angle of the landmark.

These detections are made from the combinations of neurons formed out of the convolution layer, as shown in Fig. 14.9.

TABLE 14.5
Basic convolution technique.
A convolutional layer accepts a volume of size W1 × H1 × D1.
It computes through 4 hyperparameters:
    Number of filters = K,
    Spatial extent = F,
    Stride = S,
    Amount of zero padding = P.
It produces a volume of size W2 × H2 × D2, where
    W2 = (W1 − F + 2P)/S + 1,
    H2 = (H1 − F + 2P)/S + 1 (i.e., width and height are computed equally by symmetry),
    D2 = K.
It introduces F × F × D1 weights per filter, for a total of (F × F × D1) × K weights and K bias terms.
The output will be of size W2 × H2 with depth D2 through the stride S.

FIG. 14.9 Landmark vectorization through combinational neurons.
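The output-size and parameter-count formulas of Table 14.5 can be checked with a short helper; the layer configuration below is illustrative, not the chapter's actual network:

```python
def conv_output(W1, H1, D1, K, F, S, P):
    """Output volume and parameter count per Table 14.5."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    weights = (F * F * D1) * K   # F x F x D1 weights per filter, K filters
    biases = K
    return (W2, H2, D2), weights + biases

# A 96x96 grayscale frame, 32 filters of spatial extent 5, stride 1, padding 2
# (hypothetical numbers for illustration):
shape, params = conv_output(96, 96, 1, K=32, F=5, S=1, P=2)
assert shape == (96, 96, 32)
assert params == 5 * 5 * 1 * 32 + 32
```

With P = (F − 1)/2 and S = 1 the spatial size is preserved, which is why "same" padding is a common default for stacked convolution layers.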
TABLE 14.7
Comparison of difference in probabilities across various facial expressions in ASD and TD.
Category  Facial expression  Anger     Disgust   Fear      Happiness  Sadness   Neutrality  Sleepiness
ASD       Anger              0         0.165923  0.105921  0.050293   0.117268  0.168714    0.068018
ASD       Disgust            0.16592   0         0.06      0.11563    0.04866   0.002791    0.09791
ASD       Fear               0.10592   0.060002  0         0.05563    0.011347  0.062793    0.0379
ASD       Happiness          0.05029   0.11563   0.055628  0          0.066975  0.118421    0.017725
ASD       Sadness            0.11727   0.048655  0.01135   0.06697    0         0.051446    0.04925
ASD       Neutrality         0.16871   0.002791  0.06279   0.11842    0.05145   0           0.1007
ASD       Sleepiness         0.06802   0.097905  0.037903  0.01772    0.04925   0.100696    0
TD        Anger              0         0.199906  0.061775  0.166527   0.092745  0.146316    0.011966
TD        Disgust            0.19991   0         0.13813   0.03338    0.10716   0.05359     0.18794
TD        Fear               0.06177   0.138131  0         0.104753   0.030971  0.084541    0.04981
TD        Happiness          0.16653   0.033378  0.10475   0          0.07378   0.02021     0.15456
TD        Sadness            0.09275   0.10716   0.03097   0.073782   0         0.053571    0.08078
TD        Neutrality         0.14632   0.05359   0.08454   0.020211   0.05357   0           0.13435
TD        Sleepiness         0.01197   0.18794   0.049809  0.154561   0.080779  0.13435     0
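One plausible reading of "neighboring reaction" is the expression with the largest difference value in a child's current row of Table 14.7. The sketch below transcribes the ASD "Anger" row from the table; the function and variable names are ours:

```python
# Difference-in-probability row for the ASD "Anger" expression,
# transcribed from Table 14.7 (the diagonal 0 entry is omitted).
asd_anger_row = {
    "disgust": 0.165923, "fear": 0.105921, "happiness": 0.050293,
    "sadness": 0.117268, "neutrality": 0.168714, "sleepiness": 0.068018,
}

def neighboring_reaction(row):
    """The expression with the largest difference value is taken as the
    most likely neighboring reaction the child may toggle to."""
    return max(row, key=row.get)

assert neighboring_reaction(asd_anger_row) == "neutrality"
```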
ommends the neighboring reaction that the children might express.

Table 14.7 also depicts that for each facial expression there could be some facial expression from the children that could be modestly differentiated. This inference aids in examining the performance of the facial expression analysis and thus endows suitable progression in detection. The insights made from these distraction probabilities are that (i) any child with a maximum probability emotion could toggle to the second maximum probability; (ii) the probabilities of various emotions differed for ASD positive and ASD negative children; and (iii) imperfect prediction or a gap in the emotional analysis could also be inferred through the expression analysis, as pictorially represented in Figs. 14.11 to 14.17 for Sadness, Neutral, Happy, Fear, Disgust, Anger and Sleep, respectively.

The facial expressions identified among the children who are prone to autistic behavior and normally developing children increased the probabilities in the most possessed expressions when compared to the SVM classifier implemented through Dlib. The various probabilities among the two techniques are annotated in Table 14.8, which shows the probability variations among ASD positive and ASD negative children through a simple SVM linear classifier and then on application of the CNN deep learning algorithm.

The probabilities show that the maximum probability obtained in the SVM classification of expression is influenced at a higher rate when CNN is applied. This CNN based classification also identifies and quantifies the most probable expression that ASD positive and ASD negative children express.

In Figs. 14.18 and 14.19, the bars indicate that the maximum probability of expression shown by an ASD child is predominant on the neutral and disgust expressions, whereas the happy and neutral expressions are predominant in typically developing children. This indicates that parents and caretakers have to be keen observers when a child is prone to a disgust expression with no reason and in fast sequences of repetition. These insights provide an early biomarker for screening of autism rather than admitting the child to be autistic in nature.

In Fig. 14.20, the left bar indicates the probabilistic value obtained in ASD positive children using the SVM classifier over the expressions of anger, disgust, fear, happiness, sadness, neutrality and sleepiness, while the right bar indicates the probabilistic value obtained when applying CNN. Fig. 14.21 indicates the analogous probabilistic values computed through SVM and CNN for ASD negative children. Figs. 14.20 and 14.21 also indicate that the distraction probabilities of ASD +ve and ASD −ve children are minimized upon application of CNN. This is because of the deeper analysis made in the facial landmark points through the CNN computation.
TABLE 14.8
Probability comparison of facial expression using SVM and CNN.
Facial expression   SVM Linear                    CNN
                    ASD positive  ASD negative    ASD positive  ASD negative
Anger               0.046345      0.046512        0.03821       0.05545
Disgust             0.209656      0.245092        0.23595       0.22614
Fear                0.152989      0.105871        0.16954       0.12547
Happiness           0.095634      0.212718        0.12832       0.24249
Sadness             0.16421       0.139531        0.01432       0.11568
Neutrality          0.214629      0.191544        0.28689       0.20115
Sleepiness          0.116538      0.058732        0.12677       0.03362
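The dominant expressions discussed in the text can be checked directly against Table 14.8. The dictionary below transcribes the CNN columns of the table; the variable and function names are ours:

```python
# Table 14.8 probabilities (CNN columns), transcribed from the chapter.
cnn = {
    "anger":      {"asd_pos": 0.03821, "asd_neg": 0.05545},
    "disgust":    {"asd_pos": 0.23595, "asd_neg": 0.22614},
    "fear":       {"asd_pos": 0.16954, "asd_neg": 0.12547},
    "happiness":  {"asd_pos": 0.12832, "asd_neg": 0.24249},
    "sadness":    {"asd_pos": 0.01432, "asd_neg": 0.11568},
    "neutrality": {"asd_pos": 0.28689, "asd_neg": 0.20115},
    "sleepiness": {"asd_pos": 0.12677, "asd_neg": 0.03362},
}

def dominant(group):
    """Expression with the maximum probability for the given group."""
    return max(cnn, key=lambda e: cnn[e][group])

# Matches the conclusion: neutrality dominates in ASD positive children,
# happiness in ASD negative children.
assert dominant("asd_pos") == "neutrality"
assert dominant("asd_neg") == "happiness"
```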
able expression, whereas ASD negative children show happiness as the major expression, followed by disgust and neutrality in that order.

The facial expression identification is then performed using a convolutional neural network that maximizes the accuracy of classification. The CNN method aims to identify the expression in a still image, separating the input into frames to learn in-depth features. The distraction probabilities of the various expressions are minimized by improving the appropriate facial expression results. The accuracy of the screening methodology through expression could be further improved by applying a recurrent neural network that identifies the expression in live video input. The deep learning techniques RNN and CNN together give better screening accuracy.

14.6 FUTURE WORK
The major challenge faced in facial expression detection is that, when using live video analysis, the computing capacity needs to be increased to GPU based analysis. Furthermore, the screening in a contactless environment seems to be a basic level of screening, as the analysis should also be made on an object's interference with the stimuli. Such a differentiated screening technique might enhance the identification factor of the neurodevelopmental disorder in autistic children. The expressions could be expanded to find emotions when applying time sequences. The entire process could be extended to work for real time live video analysis of the children using a recurrent neural network combined with the existing technique.

REFERENCES
1. John N. Constantino, Natasha Marrus, The Early Origins of Autism, Elsevier Inc., 2017.
2. Tanaya Guha, Zhaojun Yang, Ruth B. Grossman, Shrikanth S. Narayanan, A computational study of expressive facial dynamics in children with autism, IEEE Transactions on Affective Computing (March 2016).
3. Jorieke Duvekot, Jan van der Ende, Frank C. Verhulst, Geerte Slappendel, Emma van Daalen, Athanasios Maras, Kirstin Greaves-Lord, Factors influencing the probability of a diagnosis of autism spectrum disorder in girls versus boys, SAGE Journals 21 (6) (December 2016) 646–658.
4. M.J. Hollocks, A. Ozsivadjian, C.E. Matthews, P. Howlin, E. Simonoff, The relationship between attentional bias and anxiety in children and adolescents with autism spectrum disorders, Autism Research 6 (2013) 237–247.
5. Julian Arellano, Noah Makow, Pablo Hernandez, Facial Expression Recognition, CS 221 Final Project, Fall 2016.
6. G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (2006) 1527–1554.
7. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (2006) 504–507.
8. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 60 (2012) 1097–1105.
9. Qingchen Zhang, Laurence T. Yang, Zhikui Chen, Peng Li, A survey on deep learning for big data, Information Fusion 42 (2018) 146–157.
10. N.K. Bansode, P.K. Sinha, Facial feature extraction and textual description classification using SVM, in: International Conference on Computer Communication and Informatics, 2014, pp. 1–5.
11. Kaiqi Cen, Study of Viola–Jones Real Time Face Detector, 2016.
12. Lydia R. Whitaker, Andrew Simpson, Debi Roberson, Brief report: is impaired classification of subtle facial expressions in children with autism spectrum disorders related to atypical emotion category boundaries? Journal of Autism and Developmental Disorders (June 2017).
13. Lien Van Eylen, Bart Boets, Jean Steyaert, John Wagemans, Ilse Noens, Local and global visual processing in autism spectrum disorders: influence of task and sample characteristics and relation to symptom severity, Journal of Autism and Developmental Disorders (August 2015).
14. Jorieke Duvekot, Jan van der Ende, Frank C. Verhulst, Geerte Slappendel, Emma van Daalen, Athanasios Maras, Kirstin Greaves-Lord, Factors influencing the probability of a diagnosis of autism spectrum disorder in girls versus boys, SAGE Journals 21 (6) (December 2016) 646–658.
15. Neri L. Romero, A pilot study examining a computer-based intervention to improve recognition and understanding of emotions in young children with communication and social deficits, Research in Developmental Disabilities 65 (June 2017) 35–45.
16. Daniel Bone, Somer L. Bishop, Matthew P. Black, Matthew S. Goodwin, Catherine Lord, Shrikanth S. Narayanan, Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion, Journal of Child Psychology and Psychiatry 57 (August 2016) 927–937.
17. Yongning Song, Yuji Hakoda, Selective impairment of basic emotion recognition in people with autism: discrimination thresholds for recognition of facial expressions of varying intensities, Journal of Autism and Developmental Disorders (December 2017) 1–9.
18. Philipp Michel, Rana El Kaliouby, Real time facial expression recognition in video using support vector machines, in: ICMI'03, Vancouver, British Columbia, Canada, November 5–7, 2003.
19. N. Yirmiya, C. Kasari, M. Sigman, P. Mundy, Facial expressions of affect in autistic, mentally retarded and normal children, Journal of Child Psychology and Psychiatry 30 (5) (1989) 725–735.
Deep Learning and Parallel Computing Environment for Bioengineering Systems. https://doi.org/10.1016/B978-0-12-816718-2.00022-1 245
machine learning algorithms, where the expectation of the distribution is determined by summing over functions of the data points. GraphLab is another, broader framework used for asynchronous iterative algorithms with sparse dependencies [12].

The development of parallel machine learning algorithms to process large data sets in real world applications is inevitable. Recently, the IBM Haifa Lab and Watson Labs, which are machine learning based labs, produced tools for parallel machine learning [18]. The IBM Haifa Lab and Watson Labs work with IBM development and services arms, partners and clients to answer their needs, and collaborate with universities to promote industrial research. Continuously reinventing and refocusing themselves to stay at the forefront of the technology, most projects today fall under artificial intelligence, cloud data services, healthcare informatics, and image and video analytics, alongside mobile applications, security and quality [19]. The labs also focus on the healthcare domain. These tools are helpful in executing machine learning algorithms on multithreaded and multiprocessor machines [20].

15.1.3 Deep Learning Applications in Bioinformatics
Bioinformatics, or computational biology, is the science of interpreting biological data through computer science. Due to the vast development of protein sequencing, genomics, three-dimensional modeling of biomolecules and biological systems, etc. [21], a large amount of biological data is being generated. Complex analysis is needed to derive conclusions from this huge amount of biological data [22].

Good knowledge of molecular biology and computer science is required to approach the analysis of bioinformatics data. As the generation of data from genomic, proteomic and other bioinformatics applications has increased on a large scale, analyzing this data gains more attention [23]. Data mining techniques act as a base to analyze these datasets. The outcome of the analyses of large data should be sensible in terms of the structure inferred from the data [24].

Some of the applications under classification tasks are cancer cell classification, gene classification and classification of microarray data. Protein structure prediction, statistical modeling of protein–protein interaction, clustering of gene expression data, gene finding, protein function domain detection, function motif detection, protein function inference, disease diagnosis, disease prognosis, disease treatment optimization, protein and gene interaction network reconstruction, data cleansing, and protein subcellular location prediction, etc., are some more applications [25]. Therefore, the relationship between deep learning and bioinformatics is on a path of great growth and potential.

For example, microarray data is used to predict a patient's outcome. On the basis of patients' genotypic microarray data, their survival time and risk of tumor metastasis or recurrence can be estimated. An efficient algorithm which considers the correlative information in a comprehensive manner is highly desirable.

The rest of this chapter is organized as follows. In Sect. 15.2, the relation between deep learning and parallelization is discussed. In Sect. 15.3, the role of deep learning in bioinformatics is discussed. Section 15.4 presents the application of parallel deep learning algorithms to bioinformatics applications. Section 15.5 presents the implementation screenshots of the training process. Section 15.6 summarizes all the sections and gives future directions.

15.2 DEEP LEARNING AND PARALLEL PROCESSING
In this section, basic concepts of parallel processing algorithms and deep learning based parallel algorithms are discussed. The discussion will give an insight into integrating deep models and parallel algorithms.

15.2.1 Parallel Processing
Parallel processing is basically used to minimize the computation time of a monotonous process by splitting huge datasets into small meaningful parts to acquire proper outcomes from them. Web services, social media, speech processing, medical imaging, bioinformatics and many similar fields face the difficulty of analyzing the terabytes of data they collect daily. There are some problems for which the run-time complexity cannot be improved even with many processors [26]. In some cases, such as user experience with an interface, parallel thread handling creates frustration when the user is trying to access something and a parallel thread hovers to some other location. Parallel algorithms are called efficient when their run-time complexity divided by the number of processors is equal to the best run-time complexity in sequential processing. Therefore, some activities must be performed in a sequential manner only, and judging whether to make a task parallel or serial is a separate domain of problems [25].

To rectify this issue, we move to parallel processing, where more than one processor is used to process the instructions. This way the workload is divided between multiple processors. See Fig. 15.1.
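The split/process/combine pattern described above can be sketched with the standard library; note that in CPython, threads do not speed up pure-Python arithmetic (the GIL), so this only illustrates the structure of dividing a workload between workers, with hypothetical function names:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for the real per-chunk work (e.g., feature extraction)."""
    return sum(x * x for x in chunk)

def parallel_process(data, workers=4):
    """Split the dataset into small parts, process them concurrently,
    and combine the partial outcomes."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))

assert parallel_process(list(range(1000))) == sum(x * x for x in range(1000))
```

The efficiency criterion quoted above corresponds to this pattern achieving the sequential result in roughly (sequential time) / (number of workers), which holds only when the per-chunk work dominates the cost of splitting and combining.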
CHAPTER 15 Parallel Machine Learning and Deep Learning Approaches for Bioinformatics 247
The disadvantage of parallel computing is the dependency between processors, i.e., one processor waits for the result of another processor. Dual core, multicore, i3, i5, i7, etc., all denote the number of processors in the modern computing environment.

15.2.2 Scalability of Parallelization Methods
The different ways in which parallel processing can be performed over different machines, or on a single machine which has multiple cores, are known as a local training process. Once the data is localized in the machine, it takes care of the parallel processing within it [27]. The usage of multiple cores can be of two types:
• The lengthy way of loading multiple data in a single layer and applying a multicore processor.
• An alternative is to separate the data into batches and send each core a batch for processing.

Storing very large datasets on a local machine is not possible unless the memory increases proportionally with usage. To resolve this, the data is stored across many machines in a distributed manner. Here, either the model or the data can be distributed, which is discussed below. In data parallelism, data is distributed across multiple machines. When the data is large, or to attain faster processing, data parallelism can be used. If, on the other hand, the model is too big to fit in a single system, model parallelism can be used. When a model is placed into a single machine, one model demands the output of another model. This forward and backward propagation establishes communication between the models from different machines in a serial fashion [28].

15.2.3 Deep Learning Using Parallel Algorithms
The tendency of machine learning algorithms to solve complex problems with simple solutions has to be scaled to large-scale problems, too. By exploring machine learning with large-scale problems, the outcomes and patterns which we infer will definitely be beyond logical expectation. But the space and time constraints of the sequential mechanism have hampered this development. Platforms like Spark and Hadoop can be used for parallel tasks in hyperparameter and ensemble learning, where multiple models have to be analyzed [29].

15.2.3.1 Advantages of Using Parallelism in Deep Learning
• When it comes to deep learning, artificial neural networks (ANNs) are involved. An ANN takes a large set of data as input, learns the parameters and gets trained as a model. The training time taken is very high; this learning of many parameters takes very long computation time, on the order of days, where "q" denotes the number of cores in the processor. The VGGNet application takes about 10 hours for training even on an 8q machine. This is a computationally intensive process which takes a lot of time [30].
• The basic structure of deep neural networks is their distributed nature across the layers. Deep learning parallelization leads to improvements in training time from months to weeks, or even days.
• The importance here is acceleration, nothing else. You can run deep learning solutions on a single processor or machine provided you can tolerate the sluggishness [29]. Hence, the sure way of speeding things
up is to use hardware acceleration, just like in computer graphics, since both graphics and deep learning are inherently parallel in nature [13].

15.2.3.2 Parallelism Challenges in Deep Learning
Directly applying parallel algorithms to these problems is difficult. The scalable sequential algorithms are not able to rectify these issues. To overcome this challenge, a framework has been discussed which implements parallel machine learning algorithms on large distributed data, such as social networks, based on functional programming abstractions [31]. The algorithms can be implemented very easily by using functional combinators, which yield the best composition of aggregation, distributed and sequential processes. This system also avoids inversion of control in a synchronous parallel model. The cost of the parallel processing units (i.e., the GPU, or graphical processing unit) is yet another challenge to overcome.

15.3 DEEP LEARNING AND BIOINFORMATICS
Deep learning and bioinformatics go hand in hand with base applications like image processing, computer vision, medical images, DNA sequencing, RNA detection, gene structure prediction, drug discovery, resistance to antibiotics, agriculture, weather forecasting, forensics, bio-weapons, nutrition science, etc. Bioinformatics problems are still in their beginning stage of evolution after applying convolutional, deep belief and recurrent neural networks [24]. Fig. 15.2 illustrates various applications of bioinformatics. Molecular biology, genomics, etc., are no surprise as bioinformatics applications. But the evolution of bioinformatics using information technology, databases and the design of drugs using computer aided software are to be noted.

15.3.1 Bioinformatics Applications
The performance of all the machine learning and deep learning algorithms in bioinformatics applications is noticeable. This infers that deep learning is the most effective among the technologies applied in this field. Still, it requires the proper selection of the model for the problem, as well as of the parameters. See Fig. 15.3. Some of the areas in bioinformatics research are discussed next.

15.3.1.1 Sequence Analysis
Analysis of a sequence is a very basic operation in the field of computational biology. It is used to find similar biological sequences and varying parts during medical analysis and genome mapping. By analyzing a sequence, it can be aligned in a proper manner. Sequences which are searched often are stored in the database for regular access from the computer.

15.3.1.2 Genome Annotation
Dr. Owen White designed the first genome annotation software model in 1995. Genome annotation is nothing but the
CHAPTER 15 Parallel Machine Learning and Deep Learning Approaches for Bioinformatics 249
marking of genes and related biological features in a population increases, hence high level systems are re-
DNA sequence. quired to identify the sequence properly.
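The functional-combinator style of Sect. 15.2.3.2, composing a distributed map stage with a sequential aggregation stage, can be sketched with Python's built-ins. This is a generic illustration, not the framework of [31]; the G+C count is an arbitrary per-sequence statistic chosen for the example.

```python
# Generic map-reduce sketch in the functional-combinator style:
# a map stage over independent records followed by an associative
# aggregation stage, composed without inversion of control.
from functools import reduce

def count_gc(sequence):
    """Map step: a per-sequence statistic (here, the G+C count)."""
    return sum(1 for base in sequence if base in "GC")

def add(x, y):
    """Reduce step: an associative aggregation of partial results."""
    return x + y

sequences = ["ATGC", "GGCC", "ATAT"]
# The map stage is embarrassingly parallel; multiprocessing.Pool.map is a
# drop-in replacement for the built-in map when real workers are available.
partials = map(count_gc, sequences)
total = reduce(add, partials, 0)
print(total)  # 6
```

Because `add` is associative, the partial results can be combined in any order, which is what lets the map stage be distributed across processors without changing the final answer.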
…the pros and cons. The observing job of the researcher can be completely replaced by the computer-modeled system. Some applications are in the analysis of clinical images, the detection of overlapping DNA clones, etc.

15.3.1.9 Microarrays
In order to automatically collect large amounts of data, microarrays are very useful. Machine learning can assist the microarrays in the analysis process, which in turn helps to identify the patterns and networks between the genes. The expression of a gene in a genome is observed by the microarrays, which results in diagnosing cancer. Radial basis functions, deep learning, Bayesian networks, decision trees and random forests are the most often used methods for analyzing the observations.

15.3.1.10 Systems Biology
By observing the components in a biological system, the behaviors of the system can be inferred. DNA, RNA and metabolites are some of those components which need to be observed. Probabilistic models are developed for them and used in genetic algorithms, which comprise Markov models. Some applications are enzyme function prediction, high-throughput microarray data analysis, analysis of genome-wide association studies to better understand markers of multiple sclerosis, protein function prediction, and identification of the NCR-sensitivity of genes in yeast.

15.3.1.11 Text Mining
The need for text mining is very high in biological databases, articles, etc. Just observing a protein sequence does not infer all the details about its structure; there have to be some additional techniques to extract the potential contents from the metadata, to determine the subcellular localization of a protein, and to analyze DNA-expression arrays as well as large-scale protein and molecule interactions. Another application of text mining is the detection and visualization of distinct DNA regions, given sufficient reference data [24].

15.3.2 Advantages of Using Parallel Deep Learning in Bioinformatics Applications
Bioinformatics is combining biology with computer science for the following reasons:
• To identify the molecular reason for a disease;
• To explain the influencing factors of a disease at the gene level;
• To interpret and analyze data quicker by using machine learning, deep learning and mining techniques;
• To improve the result accuracy;
• To detect mutations with next-generation sequencing;
• To facilitate cancer sequencing.

15.3.3 Challenges in Using Parallel Deep Learning for Bioinformatics Applications
• Reducing the additional time taken for dividing the processes and combining the results;
• Making sure the performance of the parallelization process makes the additional time taken negligible;
• Identifying whether the parallelization process takes more time than a sequential process for a particular task, how to overcome this, and the mechanism needed to reduce the time overhead;
• Developing efficient bioinformatics algorithms and approaches for target identification and validation, and for lead identification and optimization, to improve drug discovery [32].

15.4 PARALLEL DEEP LEARNING IN BIOINFORMATICS APPLICATIONS WITH IMPLEMENTATION AND REAL TIME NUMERICAL EXAMPLE
Applying the parallel deep learning methodology to bioinformatics applications brings a greater challenge in implementation. The overheads, applicability problems and related issues are addressed in this section.
A general model has been designed in order to execute the bioinformatics application's data in parallel with deep learning algorithms. Fig. 15.4 and Fig. 15.5 depict the data flow of the parallel execution using deep learning. The data flow consists of the following steps:
1. The required bioinformatics data is collected by means of sensors or similar devices from the subject.
2. The collected data is preprocessed based on the type of application and the purpose of the user.
3. Once the preprocessed data is ready, the entire dataset is split into training and test data.
4. The traditional training and testing split of 70 : 30 is followed.
5. Now, the training data is fed to train the CNN model.
6. In order to perform parallel processing, the datasets have to be separated into equal halves; certain scheduling algorithms can be used for this.
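Steps 3, 4 and 6 of this data flow can be sketched in a few lines of plain Python. This is an illustration only: the function names are mine, and in the real pipeline each half would be fed to a CNN replica rather than merely counted.

```python
# Illustrative sketch of steps 3-6: a 70:30 train/test split followed by a
# second-level halving of the training data for two parallel workers.

def train_test_split(samples, train_fraction=0.7):
    """First-level separation: training vs. test data (never recombined)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def halve(samples):
    """Second-level separation: equal halves for two parallel processors."""
    mid = len(samples) // 2
    return samples[:mid], samples[mid:]

samples = list(range(10))            # stand-in for preprocessed records
train, test = train_test_split(samples)
left, right = halve(train)

print(len(train), len(test))         # 7 3
print(len(left), len(right))         # 3 4
```

Note that only the second-level split needs scheduling care: the train/test separation is made once and never merged back, while the worker halves must later be recombined into one set of results.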
• The fact that, once the dataset is ready, the data schema has to be framed in such a way that it is lossless and dependency preserving;
• The idea that the first level of dataset separation is the separation of data into training and test data. This level of separation need not be checked for the lossless and dependency-preserving properties, since we are not going to combine it again;
• Using the second level of dividing or scheduling the dataset as per the number of processors available for execution.
Splitting the deep learning algorithm among the processors can happen in two ways:
• The deep learning algorithm can be fixed as a baseline and the same pseudocode passed to all the processors;
• Each processor can have a different algorithm executing for the same purpose, provided there is no data dependency between the datasets.
Either the processor or the process has to be kept constant for the other to perform parallel processing; the former will act as the point of reference for the latter. By following this principle, any number of parameters can be manipulated and arranged to suit the parallel processing. But the main requirement is that the datasets should not have any data dependency among them.
Tackling data dependency issues is very tedious in parallel processing. Even though structured query languages can be used to manage the issue effectively, dynamic alterations to the parallel environment are difficult. Thus, separate efficient parallel algorithms are needed to address these dynamic alteration issues among parallel processors.

15.5 SAMPLE IMPLEMENTATION SCREENSHOTS TO VISUALIZE THE TRAINING PROCESS
In this section, screenshots of the implementation are shown step by step. In Fig. 15.6, the backend TensorFlow initialization and the GPU device release for the process computations are shown. Fig. 15.7 illustrates the convolutional neural network (CNN) model initialization. The model is stored as a checkpoint file and regained by the same net model for the next iterations. The CNN performs a pooling operation in the pooling layer after each convolution layer operation; the frequency of pooling and the output format of the average pooling layer are presented.
The ultimate aim of the parallel algorithms is to reduce the execution time required for a particular task.
Fig. 15.8 shows the similarity matrix calculated for the classification task. The higher the similarity value, the closer the samples are to the corresponding class, and they are mapped to their respective classes.
Fig. 15.9 shows the execution time for the training and testing of the samples. The execution time shows us how effectively the parallel algorithms played their roles in the deep learning techniques.
Fig. 15.10 shows the final classified samples for each class. With the help of these, we can easily categorize the samples and the classes to which they belong.
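The mapping described for Fig. 15.8, where each sample goes to the class with the highest similarity value, can be illustrated with invented numbers; a real system would derive the similarities from CNN features rather than hard-code them.

```python
# Hypothetical illustration of the similarity-matrix mapping of Fig. 15.8:
# each sample is assigned to the class with the highest similarity value.
# The numbers are invented for the example.

similarity = {                       # sample -> similarity score per class
    "sample1": {"classA": 0.91, "classB": 0.12},
    "sample2": {"classA": 0.33, "classB": 0.78},
    "sample3": {"classA": 0.55, "classB": 0.49},
}

assignments = {
    sample: max(scores, key=scores.get)   # argmax over class similarities
    for sample, scores in similarity.items()
}
print(assignments)
# {'sample1': 'classA', 'sample2': 'classB', 'sample3': 'classA'}
```

Because each sample's assignment depends only on its own row of the matrix, the argmax step is trivially data-parallel, which is what lets the classification stage benefit from the parallel setup described above.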
Nielsen, B. Petersen, et al., Netsurfp-2.0: improved prediction of protein structural features by integrated deep learning, Proteins: Structure, Function, and Bioinformatics (2019).
4. H. Kashyap, H.A. Ahmed, N. Hoque, S. Roy, D.K. Bhattacharyya, Big data analytics in bioinformatics: a machine learning perspective, arXiv preprint, arXiv:1506.05101.
5. D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo, G.-Z. Yang, Deep learning for health informatics, IEEE Journal of Biomedical and Health Informatics 21 (1) (2017) 4–21.
6. A. Akay, H. Hess, Deep learning: current and emerging applications in medicine and technology, IEEE Journal of Biomedical and Health Informatics (2019).
7. Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436.
8. J. Schmidhuber, Deep learning in neural networks: an overview, Neural Networks 61 (2015) 85–117.
9. L. Wei, R. Su, B. Wang, X. Li, Q. Zou, X. Gao, Integration of deep feature representations and handcrafted features to improve the prediction of n6-methyladenosine sites, Neurocomputing 324 (2019) 3–9.
10. F. Luo, M. Wang, Y. Liu, X.-M. Zhao, A. Li, DeepPhos: prediction of protein phosphorylation sites with deep learning, Bioinformatics (2019).
11. G.B. Goh, N.O. Hodas, A. Vishnu, Deep learning for computational chemistry, Journal of Computational Chemistry 38 (16) (2017) 1291–1307.
12. A. Karpathy, L. Fei-Fei, Deep visual-semantic alignments for generating image descriptions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3128–3137.
13. V. Hegde, S. Usmani, Parallel and Distributed Deep Learning, Tech. rep., Stanford University, June 2016, https://stanford.edu/~rezab/dao/projects_reports/hedge_usmani.pdf.
14. M. Staples, L. Chan, D. Si, K. Johnson, C. Whyte, R. Cao, Artificial intelligence for bioinformatics: applications in protein folding prediction, bioRxiv (2019) 561027.
15. A.A. Abdullah, S. Kanaya, Machine learning using H2O R package: an application in bioinformatics, in: Proceedings of the Third International Conference on Computing, Mathematics and Statistics (iCMS2017), Springer, 2019, pp. 375–381.
16. Q. Zou, Q. Liu, Advanced machine learning techniques for bioinformatics, 2018.
17. Y. Yuan, G. Xun, Q. Suo, K. Jia, A. Zhang, Wave2vec: deep representation learning for clinical temporal data, Neurocomputing 324 (2019) 31–42.
18. D. Quang, Y. Chen, X. Xie, Dann: a deep learning approach for annotating the pathogenicity of genetic variants, Bioinformatics 31 (5) (2014) 761–763.
19. C. Zhang, X. Sun, K. Dang, K. Li, X.-w. Guo, J. Chang, Z.-q. Yu, F.-y. Huang, Y.-s. Wu, Z. Liang, et al., Toward an expert level of lung cancer detection and classification using a deep convolutional neural network, The Oncologist (2019).
20. L. Wei, Y. Ding, R. Su, J. Tang, Q. Zou, Prediction of human protein subcellular localization using deep learning, Journal of Parallel and Distributed Computing 117 (2018) 212–217.
21. H. Fu, Y. Yang, X. Wang, H. Wang, Y. Xu, Deepubi: a deep learning framework for prediction of ubiquitination sites in proteins, BMC Bioinformatics 20 (1) (2019) 86.
22. K. Raza, Application of data mining in bioinformatics, arXiv preprint, arXiv:1205.1125.
23. V.I. Jurtz, A.R. Johansen, M. Nielsen, J.J. Almagro Armenteros, H. Nielsen, C.K. Sønderby, O. Winther, S.K. Sønderby, An introduction to deep learning on biological sequence data: examples and solutions, Bioinformatics 33 (22) (2017) 3685–3690.
24. S.Y. Rhee, J. Dickerson, D. Xu, Bioinformatics and its applications in plant biology, Annual Review of Plant Biology 57 (2006) 335–360.
25. S. Min, B. Lee, S. Yoon, Deep learning in bioinformatics, Briefings in Bioinformatics 18 (5) (2017) 851–869.
26. B. Alipanahi, A. Delong, M.T. Weirauch, B.J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nature Biotechnology 33 (8) (2015) 831.
27. H. Gelbart, A. Yarden, Learning genetics through an authentic research simulation in bioinformatics, Journal of Biological Education 40 (3) (2006) 107–112.
28. R. Khattree, D. Naik, Machine learning techniques for bioinformatics, in: Computational Methods in Biomedical Research, Chapman and Hall/CRC, 2007, pp. 57–88.
29. S. Salza, M. Renzetti, Performance modeling of parallel database systems, Informatica-Ljubljana 22 (1998) 127–140.
30. G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Magazine 29 (6) (2012) 82–97.
31. K.R. Foster, R. Koprowski, J.D. Skufca, Machine learning, medical diagnosis, and biomedical engineering research-commentary, Biomedical Engineering Online 13 (1) (2014) 94.
32. Y. Chen, Y. Li, R. Narayan, A. Subramanian, X. Xie, Gene expression inference with deep learning, Bioinformatics 32 (12) (2016) 1832–1839.