1996 IEEE COMMUNICATION SURVEYS & TUTORIALS, VOL. 16, NO. 4, FOURTH QUARTER 2014

Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Mohammad Abu Alsheikh, Shaowei Lin, Dusit Niyato, Member, IEEE, and Hwee-Pink Tan, Senior Member, IEEE

Manuscript received September 13, 2013; revised January 21, 2014; accepted April 3, 2014. Date of publication April 24, 2014; date of current version November 18, 2014. The associate editor coordinating the review of this paper and approving it for publication was E. Hossain.
M. A. Alsheikh is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 and also with the Sense and Sense-abilities Programme, Institute for Infocomm Research, Singapore 138632.
S. Lin and H.-P. Tan are with the Sense and Sense-abilities Programme, Institute for Infocomm Research, Singapore 138632.
D. Niyato is with the School of Computer Engineering, Nanyang Technological University, Singapore 639798.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/COMST.2014.2320099
1553-877X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: FH Koeln. Downloaded on September 11,2021 at 20:45:15 UTC from IEEE Xplore. Restrictions apply.

Abstract—Wireless sensor networks (WSNs) monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002–2013 of machine learning methods that were used to address common issues in WSNs. The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.

Index Terms—Wireless sensor networks, machine learning, data mining, security, localization, clustering, data aggregation, event detection, query processing, data integrity, fault detection, medium access control, compressive sensing.

I. INTRODUCTION

A wireless sensor network (WSN) is typically composed of multiple autonomous, tiny, low cost and low power sensor nodes. These nodes gather data about their environment and collaborate to forward sensed data to centralized backend units called base stations or sinks for further processing. The sensor nodes could be equipped with various types of sensors, such as thermal, acoustic, chemical, pressure, weather, and optical sensors. Because of this diversity, WSNs have tremendous potential for building powerful applications, each with its own individual characteristics and requirements. Developing efficient algorithms that are suitable for many different application scenarios is a challenging task. In particular, WSN designers have to address common issues related to data aggregation, data reliability, localization, node clustering, energy aware routing, event scheduling, fault detection and security.

Machine learning (ML) was introduced in the late 1950s as a technique for artificial intelligence (AI) [1]. Over time, its focus evolved and shifted more to algorithms which are computationally viable and robust. In the last decade, machine learning techniques have been used extensively for a wide range of tasks including classification, regression and density estimation in a variety of application areas such as bioinformatics, speech recognition, spam detection, computer vision, fraud detection and advertising networks. The algorithms and techniques used come from many diverse fields including statistics, mathematics, neuroscience, and computer science. The following two classical definitions capture the essence of machine learning:
1) The development of computer models for learning processes that provide solutions to the problem of knowledge acquisition and enhance the performance of developed systems [2].
2) The adoption of computational methods for improving machine performance by detecting and describing consistencies and patterns in training data [3].

Applying these definitions to WSNs, we see that the promise of machine learning lies in exploiting historical data to improve the performance of sensor networks on given tasks without the need for re-programming. More specifically, machine learning is important in WSN applications for the following main reasons:
1) Sensor networks usually monitor dynamic environments that change rapidly over time. For example, a node’s location may change due to soil erosion or sea turbulence. It is desirable to develop sensor networks that can adapt and operate efficiently in such environments.
2) WSNs may be used for collecting new knowledge about unreachable, dangerous locations [4] (e.g., volcano eruption and waste water monitoring) in exploratory applications. Due to the unexpected behavior patterns that may arise in such scenarios, system designers may develop solutions that initially may not operate as expected. System designers would rather have robust machine learning algorithms that are able to calibrate themselves to newly acquired knowledge.
3) WSNs are usually deployed in complicated environments where researchers cannot build accurate mathematical models to describe the system behavior. Meanwhile, some tasks in WSNs can be prescribed using simple mathematical models but may still need complex algorithms to solve them (e.g., the routing problem [5], [6]). Under similar circumstances, machine learning provides low-complexity estimates for the system model.
4) Sensor network designers often have access to large amounts of data but may be unable to extract important correlations in them. For example, in addition to ensuring

communication connectivity and energy sustainability, the WSN application often comes with minimum data coverage requirements that have to be fulfilled by limited sensor hardware resources [7]. Machine learning methods can then be used to discover important correlations in the sensor data and propose improved sensor deployment for maximum data coverage.
5) New uses and integrations of WSNs, such as in cyber-physical systems (CPS), machine-to-machine (M2M) communications, and Internet of things (IoT) technologies, have been introduced with a motivation of supporting more intelligent decision-making and autonomous control [8]. Here, machine learning is important to extract the different levels of abstractions needed to perform the AI tasks with limited human intervention [9].

However, there are a few drawbacks and limitations that should be considered when using machine learning techniques in wireless sensor networks. Some of these are:
1) As a resource-limited framework, a WSN spends a considerable percentage of its energy budget on predicting an accurate hypothesis and extracting the consensus relationship among data samples. Thus, the designers should consider the trade-off between the algorithm’s computational requirements and the learned model’s accuracy. Specifically, the higher the required accuracy, the higher the computational requirements, and the higher the energy consumption. Otherwise, the developed systems might rely on centralized, resource-capable computational units to perform the learning task.
2) Generally speaking, learning by example requires a large data set of samples to achieve the intended generalization capabilities (i.e., fairly small error bounds), and the algorithm’s designer will not have full control over the knowledge formulation process [10].

During the past decade, WSNs have seen increasingly intensive adoption of advanced machine learning techniques. In [11], a short survey of machine learning algorithms applied in WSNs for information processing and for improving network performance was presented. A related survey that discussed the applications of machine learning in wireless ad-hoc networks was published in [12]. The authors of [13] discussed applications of three popular machine learning algorithms (i.e., reinforcement learning, neural networks and decision trees) at all communication layers in the WSNs. In contrast, specialized surveys that touch on machine learning usage in specific WSN challenges have also been written. For instance, [14], [15] addressed the development of efficient outlier detection techniques so that proper actions can be taken, and some of these techniques are based on concepts from machine learning. Meanwhile, [16] discusses computational intelligence methods for tackling challenges in WSNs such as data aggregation and fusion, routing, task scheduling, optimal deployment and localization. Here, computational intelligence is a branch of machine learning that focuses on biologically-inspired approaches such as neural networks, fuzzy systems and evolutionary algorithms [17].

Generally, these early surveys concentrated on reinforcement learning, neural networks and decision trees which were popular due to their efficiency in both theory and practice. In this paper, we decided instead to include a wide variety of important up-to-date machine learning algorithms for a comparison of their strengths and weaknesses. In particular, we provide a comprehensive overview which groups these recent techniques roughly into supervised, unsupervised and reinforcement learning methods. Another distinction between our survey and earlier works is the way that machine learning techniques are presented. Our work discusses machine learning algorithms based on their target WSN challenges, so as to encourage the adoption of existing machine learning solutions in WSN applications. Lastly, we build on existing surveys and go beyond classifying and comparing previous efforts, by providing useful and practical guidelines for WSN researchers and engineers who are interested in exploring new machine learning paradigms for future research.

The rest of the paper is organized as follows:
• Section II introduces the reader to machine learning algorithms and themes that will be referred to in later sections. Simple examples will be given in the context of WSNs.
• In Section III, we review existing machine learning efforts to address functional issues in WSNs such as routing, localization, clustering, data aggregation, query processing and medium access control. Here, an issue is functional if it is essential to the basic operation of the wireless sensor network.
• Section IV investigates machine learning solutions in WSNs for fulfilling non-functional requirements, i.e. those which determine the quality or enhance the performance of functional behaviors. Examples of such requirements include security, quality of service (QoS) and data integrity. In this section, we also highlight some unique efforts in specialized WSN applications.
• Section V outlines major difficulties and open research problems for machine learning in WSNs.
• Finally, we conclude in Section VI and present a comparative guide with useful paradigms for furthering machine learning research in various WSN applications.

II. INTRODUCTION TO MACHINE LEARNING IN WIRELESS SENSOR NETWORKS

Usually, sensor network designers characterize machine learning as a collection of tools and algorithms that are used to create prediction models. However, machine learning experts recognize it as a rich field with very large themes and patterns. Understanding such themes will be beneficial to those who wish to apply machine learning to WSNs. Applied to numerous WSN applications, machine learning algorithms provide tremendous flexibility benefits. This section provides some of the theoretical concepts and strategies of adopting machine learning in the context of WSNs.

Existing machine learning algorithms can be categorized by the intended structure of the model. Most machine learning algorithms fall into the categories of supervised, unsupervised and reinforcement learning [18]. In the first category, machine


learning algorithms are provided with a labeled training data set. This set is used to build the system model representing the learned relation between the input, output and system parameters. In contrast to supervised learning, unsupervised learning algorithms are not provided with labels (i.e., there is no output vector). Basically, the goal of an unsupervised learning algorithm is to classify the sample sets into different groups (i.e., clusters) by investigating the similarity between the input samples. The third category includes reinforcement learning algorithms, in which the agent learns by interacting with its environment (i.e., online learning). Finally, some machine learning algorithms do not naturally fit into this classification since they share characteristics of both supervised and unsupervised learning methods. These hybrid algorithms (often termed semi-supervised learning) aim to inherit the strengths of these main categories, while minimizing their weaknesses [19].

This section is mainly to introduce the reader to the algorithms that will be referred to in later sections. Moreover, examples will be given to demonstrate the process of adopting machine learning in WSNs. In Sections III and IV, such details will be omitted. Interested readers may refer to [18], [20] and references therein, for thorough discussions of machine learning theory and its classical concepts.

A. Supervised Learning

In supervised learning, a labeled training set (i.e., predefined inputs and known outputs) is used to build the system model. This model is used to represent the learned relation between the input, output and system parameters. In this subsection, the major supervised learning algorithms are discussed in the context of WSNs. In fact, supervised learning algorithms are extensively used to solve several challenges in WSNs such as localization and object targeting (e.g., [21]–[23]), event detection and query processing (e.g., [24]–[27]), media access control (e.g., [28]–[30]), security and intrusion detection (e.g., [31]–[34]), and quality of service (QoS), data integrity and fault detection (e.g., [35]–[37]).

1) K-Nearest Neighbor (k-NN): This supervised learning algorithm classifies a data sample (called a query point) based on the labels (i.e., the output values) of the nearby data samples. For example, missing readings of a sensor node can be predicted using the average measurements of neighboring sensors within specific diameter limits. There are several functions to determine the nearest set of nodes. A simple method is to use the Euclidean distance between different sensors. K-nearest neighbor does not need high computational power, as the function is computed relative to local points (i.e., k-nearest points, where k is a small positive integer). This factor coupled with the correlated readings of neighboring nodes makes k-nearest neighbor a suitable distributed learning algorithm for WSNs. In [38], it has been shown that the k-NN algorithm may provide inaccurate results when analyzing problems with high-dimensional spaces (more than 10–15 dimensions) as the distance to different data samples becomes invariant (i.e., the distances to the nearest and farthest neighbors become almost the same). In WSNs, the most important application of the k-nearest neighbor algorithm is in the query processing subsystem (e.g., [24], [25]).

2) Decision Tree (DT): It is a classification method for predicting labels of data by iterating the input data through a learning tree [39]. During this process, the feature properties are compared relative to decision conditions to reach a specific category. The literature is very rich with solutions that use the DT algorithm to resolve different WSN design challenges. For example, DT provides a simple, but efficient method to identify link reliability in WSNs by identifying a few critical features such as loss rate, corruption rate, mean time to failure (MTTF) and mean time to restore (MTTR). However, DT works only with linearly separable data and the process of building optimal learning trees is NP-complete [40].

3) Neural Networks (NNs): This learning algorithm could be constructed by cascading chains of decision units (e.g., perceptrons or radial basis functions) used to recognize non-linear and complex functions [9]. In WSNs, using neural networks in a distributed manner is still not pervasive due to the high computational requirements for learning the network weights, as well as the high management overhead. However, in centralized solutions, neural networks can learn multiple outputs and decision boundaries at once [41], which makes them suitable for solving several network challenges using the same model.

We consider a sensor node localization problem (i.e., determining a node’s geographical position) as an application example of neural networks in WSNs. Node localization can be based on propagating angle and distance measurements of the received signals from anchor nodes [42]. Such measurements may include received signal strength indicator (RSSI), time of arrival (TOA), and time difference of arrival (TDOA) as illustrated in Fig. 1. After supervised training, the neural network generates an estimated node location as vector-valued coordinates in 3D space. Algorithms related to neural networks include self-organizing map (or Kohonen’s maps) and learning vector quantization (LVQ) (see [43] and references therein for an introduction to these methods). In addition to function estimation, one of the important applications of neural networks is for big data (high-dimensional and complex data set) tuning and dimensionality reduction [44].

Fig. 1. Illustration example of node localization in WSNs in 3D space using supervised neural networks.

4) Support Vector Machines (SVMs): It is a machine learning algorithm that learns to classify data points using labeled training samples [45]. For example, one approach for detecting malicious behavior of a node is by using SVM to investigate


temporal and spatial correlations of data. To illustrate, given the WSN’s observations as points in the feature space, SVM divides the space into parts. These parts are separated by margins that are as wide as possible (i.e., separation gaps), and new readings are classified based on which side of the gaps they fall on, as shown in Fig. 2. An SVM algorithm, which includes optimizing a quadratic function with linear constraints (that is, the problem of constructing a set of hyperplanes), provides an alternative method to the multi-layer neural network with non-convex and unconstrained optimization problem [39]. Potential applications of SVM in WSNs are security (e.g., [33], [34], [46]–[48]) and localization (e.g., [49]–[51]). For a detailed discussion of the SVM theory, please refer to [45].

Fig. 2. Example of non-linear support vector machines.

5) Bayesian Statistics: Unlike most machine learning algorithms, Bayesian inference requires a relatively small number of training samples [52]. Bayesian methods adapt probability distributions to efficiently learn uncertain concepts (e.g., θ) without over-fitting. The crux of the matter is to use the current knowledge (e.g., collected data abbreviated as D) to update prior beliefs into posterior beliefs p(θ|D) ∝ p(θ)p(D|θ), where p(θ|D) is the posterior probability of the parameter θ given the observation D, and p(D|θ) is the likelihood of the observation D given the parameter θ. One application of Bayesian inference in WSNs is assessing event consistency (θ) using incomplete data sets (D) by investigating prior knowledge about the environment. However, such a statistical knowledge requirement limits the wide adoption of Bayesian algorithms in WSNs. A related statistical learning algorithm is the Gaussian process regression (GPR) model [53].

B. Unsupervised Learning

Unsupervised learners are not provided with labels (i.e., there is no output vector). Basically, the goal of an unsupervised learning algorithm is to classify the sample set into different groups by investigating the similarity between them. As expected, this theme of learning algorithms is widely used in node clustering and data aggregation problems (e.g., [54]–[60]). Indeed, this wide adoption is due to data structures (i.e., no labeled data is available) and the desired outcome in such problems.

1) K-Means Clustering: The k-means algorithm [61] is used to group data into different classes (known as clusters). This unsupervised learning algorithm is widely used in the sensor node clustering problem due to its linear complexity and simple implementation. The k-means steps to resolve such a node clustering problem are (a) randomly choose k nodes to be the initial centroids for different clusters; (b) label each node with the closest centroid using a distance function; (c) re-compute the centroids using the current node memberships and (d) stop if the convergence condition is valid (e.g., a predefined threshold for the sum of distances between nodes and their respective centroids), otherwise go back to step (b).

2) Principal Component Analysis (PCA): It is a multivariate method for data compression and dimensionality reduction that aims to extract important information from data and present it as a set of new orthogonal variables called principal components [62]. As shown in Fig. 3, the principal components are ordered such that the first component corresponds to the highest-variance direction of the data, and so on for the other components. Hence, the least-variance components can be discarded as they contain the least information content. For example, PCA reduces the amount of transmitted data among sensor nodes by finding a small set of uncorrelated linear combinations of original readings. Furthermore, the PCA method simplifies the problem solving by considering only a few conditions in problems with a very large number of variables (i.e., tuning big data into tiny data representation) [63]. A thorough discussion of the PCA theory (e.g., the eigenvalue, eigenvector, and covariance matrix analysis) is given in [62].

Fig. 3. Simple 2D visualization of the principal component analysis algorithm. It is important to note that the potential of the PCA algorithm is high mainly when dealing with high-dimensional data [62].

C. Reinforcement Learning

Reinforcement learning enables an agent (e.g., a sensor node) to learn by interacting with its environment. The agent will learn to take the best actions that maximize its long-term rewards by using its own experience. The most well-known reinforcement learning technique is Q-learning [64]. As shown in Fig. 4, an agent regularly updates its achieved rewards based on the taken action at a given state. The future total reward (i.e., the Q-value) of performing an action $a_t$ at a given state $s_t$ is computed using

$$Q(s_{t+1}, a_{t+1}) = Q(s_t, a_t) + \gamma \left( r(s_t, a_t) - Q(s_t, a_t) \right) \qquad (1)$$

where $r(s_t, a_t)$ denotes the immediate reward of performing an action $a_t$ at a given state $s_t$, and γ is the learning rate


that determines how fast learning occurs (usually set to a value between 0 and 1). This algorithm can be easily implemented in a distributed architecture like WSNs, where each node seeks to choose actions that are expected to maximize its long-term rewards. It is important to note that Q-learning has been extensively and efficiently used in WSN routing problems (e.g., [65]–[68]).

Fig. 4. Visualization of the Q-learning method.

Fig. 5. Example of a sensor network routing problem using a graph along with each path routing cost, traditional spanning tree routing, and the generated sub-problems using machine learning that require only local communication to achieve optimal routing (i.e., require only single-hop neighborhood information exchange). (a) Original graph. (b) Traditional routing. (c) Simplified problems using machine learning.
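
As a rough illustration of the update in (1), the following Python sketch lets a single node keep one Q-value per candidate next hop and move it toward the observed reward. It follows the simplified form printed in (1), which has no discounted look-ahead term, and treats γ as the learning rate. The function names, the neighbor labels and the reward values are hypothetical and are not taken from the surveyed protocols.

```python
from typing import Dict


def q_update(q: float, reward: float, gamma: float) -> float:
    """One step of Eq. (1): move the Q-value toward the observed reward."""
    return q + gamma * (reward - q)


def learn_next_hop(rewards: Dict[str, float], gamma: float = 0.5,
                   rounds: int = 20) -> Dict[str, float]:
    """Try every neighbor in turn and keep one Q-value per neighbor."""
    q_values = {neighbor: 0.0 for neighbor in rewards}
    for _ in range(rounds):
        for neighbor, reward in rewards.items():
            q_values[neighbor] = q_update(q_values[neighbor], reward, gamma)
    return q_values


# Hypothetical per-hop rewards (assumption: higher means a better link).
link_rewards = {"node_a": 0.2, "node_b": 0.9, "node_c": 0.5}
q_table = learn_next_hop(link_rewards)

# The node forwards through the neighbor with the highest learned Q-value.
best_next_hop = max(q_table, key=q_table.get)
```

With a fixed reward per neighbor, each Q-value converges geometrically to that reward, so the node only needs its own local table to rank its next hops.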
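
For the unsupervised category, the k-means steps (a) to (d) listed in Section II-B can be sketched as follows. This is a minimal, centralized sketch under assumed inputs: nodes are 2D coordinates, the distance function is Euclidean, and the convergence test in step (d) stops once the sum of node-to-centroid distances improves by less than a tolerance. The function name k_means, its parameters and the sample coordinates are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_means(nodes: Sequence[Point], k: int, tol: float = 1e-6,
            max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    rng = random.Random(seed)
    # (a) randomly choose k nodes as the initial centroids
    centroids = rng.sample(list(nodes), k)
    previous_cost = math.inf
    labels: List[int] = []
    for _ in range(max_iter):
        # (b) label each node with the index of its closest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in nodes]
        # (c) re-compute each centroid from its current members
        for c in range(k):
            members = [p for p, label in zip(nodes, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
        # (d) stop once the sum of distances no longer improves by at least tol
        cost = sum(math.dist(p, centroids[label])
                   for p, label in zip(nodes, labels))
        if previous_cost - cost < tol:
            break
        previous_cost = cost
    return labels, centroids


# Two well-separated groups of hypothetical node positions.
sample_nodes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
labels, centroids = k_means(sample_nodes, k=2)
```

In a real deployment the centroids would typically correspond to cluster heads, and the distance function could account for link quality or residual energy rather than geometry alone.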
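
For the Bayesian update p(θ|D) ∝ p(θ)p(D|θ) in Section II-A, a small numerical sketch may help. It assumes θ is the unknown probability that a sensor reports an event, restricted to a coarse grid of candidate values, and that D is a short list of binary readings. The grid, the uniform prior and the readings are made-up illustration values.

```python
from typing import Dict, List


def posterior(prior: Dict[float, float], observations: List[int]) -> Dict[float, float]:
    """Return p(theta | D) on a discrete grid of theta values.

    Each observation is 1 (event reported) or 0 (no event reported).
    """
    unnormalized = {}
    for theta, p_theta in prior.items():
        likelihood = 1.0  # p(D | theta) for independent binary readings
        for x in observations:
            likelihood *= theta if x == 1 else (1.0 - theta)
        unnormalized[theta] = p_theta * likelihood
    evidence = sum(unnormalized.values())  # normalizing constant p(D)
    return {theta: value / evidence for theta, value in unnormalized.items()}


grid = [0.1, 0.3, 0.5, 0.7, 0.9]
uniform_prior = {theta: 1.0 / len(grid) for theta in grid}
readings = [1, 1, 0, 1]  # hypothetical, incomplete data set D

post = posterior(uniform_prior, readings)
most_likely_theta = max(post, key=post.get)
```

Even with four readings the posterior already concentrates around the grid value closest to the observed frequency, which is the small-sample behavior the text attributes to Bayesian inference.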
III. FUNCTIONAL CHALLENGES

In the design of WSNs, it is important to consider power and memory constraints of sensor nodes, topology changes, communication link failures, and decentralized management. Machine learning paradigms have been successfully adopted to address various functional challenges of wireless sensor networks such as energy aware and real-time routing, query processing and event detection, localization, node clustering and data aggregation.

A. Routing in WSNs

Designing a routing protocol for WSNs has to consider various design challenges such as energy consumption, fault tolerance, scalability, and data coverage [6]. Sensor nodes are provided with limited processing capabilities, small memory and low bandwidth. Traditionally, it is common to formulate a routing problem in wireless sensor networks as a graph G = (V, E), where V represents the set of all nodes, and E represents the set of bidirectional communication channels connecting the nodes. Using this model, the routing problem can be defined as the process of finding the minimum cost path starting at the source vertex, and reaching all destination vertices, by using the available graph edges. This path is actually a spanning tree T = (V, E) whose vertices include the source (i.e., a root node) and destinations (i.e., leaf nodes that do not have any child nodes). Solving such a tree with optimal data aggregation is found to be NP-hard, even when the full topology is known [5].

Machine learning allows a sensor network to learn from previous experiences, make optimal routing actions and adapt to the dynamic environment. The benefits can be summarized as follows:
• Able to learn the optimal routing paths that will result in energy saving and prolonging the lifetime of dynamically changing WSNs.
• Reduce the complexity of a typical routing problem by dividing it into simpler sub-routing problems. In each sub-problem, nodes formulate the graph structures by considering only their local neighbors, thus achieving low cost, efficient and real-time routing.
• Meet QoS requirements in routing problem using relatively simple computational methods and classifiers.

Fig. 5(a) and (b) illustrate a simple sensor network routing problem using a graph, and the traditional spanning tree routing algorithm, respectively. To find the optimal routing paths, the network nodes have to exchange their routing information with each other. On the other hand, Fig. 5(c) demonstrates how machine learning reduces the complexity of a typical routing problem by only considering neighboring nodes’ information that will be used to predict the full path quality. Each node will independently perform the routing procedures to decide which channels to assign, and the optimal transmission power. As we will discuss in this subsection, such a mechanism is proven to provide a near optimal routing decision with a very low computational complexity.

In this subsection, a wide range of machine learning-based routing protocols developed for WSNs are described. Table I provides a summary and comparison of these routing protocols. The column “Scalability” implies the solutions’ capability to route data in large scale networks.

1) Distributed Regression Framework: In [69], Guestrin et al. introduced a general framework for sensor data modeling. This distributed framework relies on the network nodes for fitting a global function to match their own measurement. The nodes are used to execute a kernel linear regression in the form of weighted components. Kernel functions map the training samples into some feature space to facilitate data manipulation (refer to [71], [72] for an introduction to kernel methods). The proposed framework exploits the fact that the readings of multiple sensors are highly correlated. This will minimize the communication overhead for detecting the structure of the sensor data. Collectively, these results serve as an important step in developing a distributed learning framework for wireless networks using linear regression methods. The main advantages of utilizing this algorithm are the good fitting results, and the small overhead of the learning phase. However, it cannot learn non-linear and complex functions.

2) Data Routing Using Self-Organizing Map (SOM): Barbancho et al. [70] introduced “Sensor Intelligence Routing” (SIR) by using SOM unsupervised learning to detect optimal routing paths as illustrated in Fig. 6. SIR introduces a slight modification on the Dijkstra’s algorithm to form the network backbone and shortest paths from a base station to every node


TABLE I
S UMMARY OF W IRELESS S ENSOR N ETWORK ROUTING P ROTOCOLS T HAT A DOPT M ACHINE L EARNING PARADIGMS

in the network. During route learning, the second layer neurons compete with each other to reserve high weights in the learning chain. Accordingly, the weights of the winning neuron and its neighboring neurons are updated to further match the input patterns. Clearly, the learning phase is a highly computational process due to the neural network generation task. As a result, it should be performed within a resourceful central station. However, the execution phase does not incur computational cost, and can be run on the network nodes. As a result, this hybrid technique (i.e., a combination of the Dijkstra’s algorithm and the SOM model) takes into account the QoS requirements (latency, throughput, packet error rate, and duty cycle) during the process of updating neurons’ weights. The main obstacles to applying such an algorithm are the complexity of the algorithm and the overhead of the learning phase whenever the network’s topology and settings change.

Fig. 6. SOM construction of the SIR algorithm, where the routing link is selected based on the multi-hop path QoS metrics (latency, throughput, error rate, and duty cycle) and the Dijkstra’s algorithm [70].

3) Routing Enhancement Using Reinforcement Learning (RL): In multicast routing, a node sends the same message to several receivers. Sun et al. [65] demonstrated the use of the Q-learning algorithm to enhance multicast routing in wireless ad hoc networks. Basically, the Q-MAP multicast routing algorithm is designed to guarantee reliable resource allocation. A mobile ad hoc network may consist of heterogeneous nodes, where different nodes have different capabilities. In addition, it is not feasible to maintain global, up-to-date knowledge about the whole network structure. The multicast routes are determined in two phases. The first phase is “Join Query Forward” that discovers an optimal route, as well as updates the Q-values (a prediction of future rewards) of the Q-learning algorithm. The second phase, called “Join Reply Backward”, creates the optimal path to allow multicast transmissions. Using Q-learning for multicast routing in mobile ad hoc networks can reduce the overhead for route searching. However, energy efficiency is the key requirement for WSNs, so Q-MAP needs to be modified for WSNs (e.g., considering hierarchical and geographic routing).

The Federal Communications Commission (FCC) has dedicated the frequency band from 3.1 to 10.6 GHz (7,500 MHz of spectrum) for the use of unlicensed ultra-wideband (UWB) communication [73]. UWB is a technique for transmitting bulky data over short distances using a wide spectrum of frequency bands with relatively low power. In [66], Dong et al. used a similar idea as [65] to enhance geographic routing in UWB equipped sensor networks. The “Reinforcement Learning based Geographic Routing” (RLGR) protocol considers the sensor node energy and delay as metrics for formulating the learning reward function. This hierarchical geographic routing uses the UWB technology for detecting the nodes’ locations, where only the cluster heads are equipped with UWB devices. Moreover, each node uses a simple look-up table to maintain the information about its neighbors (as location and energy of the neighbors are needed during network learning). This information is exchanged between nodes using short “hello” messages to learn the best routing actions. The main benefit of using reinforcement learning in routing is that it does not require information about the global network structure to achieve an acceptable routing solution.

In [68], Arroyo-Valles et al. introduced “Q-Probabilistic Routing” (Q-PR), an enhanced geographic routing algorithm for WSNs that learns from previous routing decisions (e.g., to select the routing path that has the highest delivery rate over the past period of time). This protocol differs from RLGR [66] in the QoS support. Depending on the importance of messages, expected delivery rate, and the power constraints, Q-PR determines the optimal routes using reinforcement learning and a Bayesian decision model. This algorithm discovers the next hop during the message routing time (i.e., an on-line operation). A Bayesian method is used to handle the decision of transmitting the packets to the set of candidate neighbor nodes, taking into account the data importance, nodes’ profiles, and expected transmission and reception energy.
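The Q-value update on which Q-learning-based routing schemes build can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not the Q-MAP, RLGR, or Q-PR implementation: the function name `q_update`, the node labels, the reward value, and the learning rate `alpha` and discount factor `gamma` are all hypothetical choices made for the example.

```python
from collections import defaultdict


def q_update(q, node, next_hop, reward, neighbors_of_next, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step for forwarding from `node` to `next_hop`.

    q maps (node, next_hop) pairs to Q-values, i.e., predictions of future reward.
    """
    # Best value the next hop can achieve through any of its own neighbors.
    best_future = max((q[(next_hop, n)] for n in neighbors_of_next), default=0.0)
    old = q[(node, next_hop)]
    # Standard Q-learning rule: move the estimate toward reward + discounted future value.
    q[(node, next_hop)] = old + alpha * (reward + gamma * best_future - old)
    return q[(node, next_hop)]


# Hypothetical example: node A forwards through B, which already knows a good route to the sink.
q = defaultdict(float)
q[("B", "sink")] = 1.0
value = q_update(q, "A", "B", reward=-0.1, neighbors_of_next=["sink", "C"])
```

Each node only needs the Q-values reported by its neighbors, which reflects the point made above that no global view of the network structure is required.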


Förster and Murphy [67] also introduced an enhancement to routing in WSNs using reinforcement learning. A novel technique for exchanging node local information as a feedback response to other nodes, named “Feedback Routing for Optimizing Multiple Sinks in WSN” (FROMS), is introduced. The main advantage of FROMS is to allow efficient routing from multiple sources to multiple sinks. The Q-values are initialized based on the hop counts to every node in the network. The hop counts can be collected using short “hello messages”, exchanged between the nodes at earlier stages of the network deployment. FROMS extends the basic mechanism of RLGR [66] by assuming that all nodes can directly communicate with their neighbors.

The key disadvantage of reinforcement learning-based routing algorithms is the limited recognition of future knowledge (i.e., inability to look ahead). Therefore, the algorithms are not suitable for highly dynamic environments as they require a long time to learn optimal routes.

B. Clustering and Data Aggregation

In large scale energy-constrained sensor networks, it is inefficient to transmit all data directly to the sink [74]. One efficient solution is to pass the data to a local aggregator (known as a cluster head) which aggregates data from all the sensors within its cluster and transmits it to the sink. This will typically result in energy savings. There are several works that have discussed the optimal selection of the cluster head (i.e., cluster head election process), such as in [75]–[77]. Taxonomy and comparison of classical clustering algorithms are presented in [78].

Fig. 7. Data aggregation example in a clustered architecture, where the nodes are marked as working, dead and cluster heads.

Fig. 7 represents the cluster-based data aggregation from sources to a base station in WSNs. In this case, there could be some faulty nodes which must be removed from the network. Such faulty nodes may generate incorrect readings that could negatively affect the accuracy of the overall operation of the network. Principally, ML techniques improve the operation of node clustering and data aggregation as follows:

• Machine learning is used to compress data locally at cluster heads by efficiently extracting similarity and dissimilarity (e.g., from faulty nodes) in different sensors’ readings.
• Machine learning algorithms are employed to efficiently elect the cluster head, where appropriate cluster head selection will significantly reduce energy consumption and enhance the network’s lifetime.

Table II compares data aggregation and node clustering solutions. The column “Balancing energy consumption” indicates whether the protocol distributes computationally intensive tasks across all nodes while considering the remaining energy information. The column “Topology aware” indicates the requirement for full network topology knowledge.

1) Large Scale Network Clustering Using Neural Network: Hongmei et al. [79] discussed the development of self-managed clusters using neural networks. This scheme targets the clustering problem in large scale networks with short transmission radii in which centralized algorithms may not work efficiently. However, for large transmission radii, the performance of this algorithm is close to that of centralized algorithms in terms of efficiency and quality of service.

2) Electing a Cluster Head Using Decision Trees: Ahmed et al. [80] applied a decision tree algorithm to solve the cluster head election problem. This approach uses several critical features while iterating the input vector through the decision tree such as distance to the cluster centroids, battery level, the degree of mobility, and the vulnerability indications. The simulation reveals that this scheme enhances the overall performance of cluster head selections when compared to the “Low Energy Adaptive Clustering Hierarchy” (LEACH) [87] algorithm.

3) Gaussian Process Models for Sensor Readings: A Gaussian process (GP) is a collection of random variables (stochastic variables) that is parameterized using mean and covariance functions. Ertin [81] presented a scheme for initializing probabilistic models of the readings based on Gaussian process regression. Comparatively, Kho et al. [82] also extended Gaussian process regression to adaptively sample sensor data depending on its importance. Focusing on energy consumption, [82] studied a trade-off between computational cost and solution optimality. Broadly speaking, Gaussian process models are preferable in problems with small training data sets (less than a few thousand samples) and for predicting smooth functions [53]. However, WSN designers must consider the high computational complexity of such methods when dealing with large scale networks.

4) Data Aggregation Using Self-Organizing Map (SOM): The SOM algorithm is an unsupervised, competitive learning method for mapping from high dimensional spaces to low dimensions. Lee et al. [54] proposed a novel network architecture called “Cluster-based self-Organizing Data Aggregation” (CODA). In this architecture, the nodes are able to classify the aggregated data using a self-organizing algorithm. The winning


TABLE II
COMPARISON OF DIFFERENT MACHINE LEARNING-BASED DATA AGGREGATION AND NODE CLUSTERING MECHANISMS

neuron $j^*$, that has a weight vector $w(t)$ closest to the input vector $x(t)$, is defined as

$$j^* = \arg\min_{j} \left\| x_j(t) - w_j(t) \right\|, \quad j = 1, \ldots, N \tag{2}$$

where $N$ represents the number of neurons in the second layer. Further, the winning node and its neighbors are updated as follows:

$$w_j(t+1) = w_j(t) + h(t)\left( x_j(t) - w_j(t) \right) \tag{3}$$

where $w(t)$ and $w(t+1)$ represent the values of a neuron at time $t$ and $t+1$, respectively. In addition, $h(t)$ is the Gaussian neighborhood function given as

$$h(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\left\| j^* - j \right\|^2}{2\sigma^2(t)} \right). \tag{4}$$

Using CODA for data aggregation will result in enhancing the quality of data, saving network energy, and reducing the network traffic.

5) Applying Learning Vector Quantization for Online Data Compression: While the above methods require a complete knowledge about the network topology, some algorithms may not have such a restriction. For example, Lin et al. [83] introduced a technique called “Adaptive Learning Vector Quantization” (ALVQ) to accurately retrieve compressed versions of readings from the sensor nodes. Using data correlation and historical patterns, ALVQ uses the LVQ learning algorithm to predict the code-book using past training samples. The ALVQ algorithm minimizes the required bandwidth during transmission, and enhances the accuracy of original reading recovery from the compressed data.

The crucial disadvantage of using LVQ for online data aggregation is that dead neurons, that are far away from the training samples, will never take part in the competition. Therefore, it is important to develop algorithms that are robust against outliers. By the same token, LVQ is suitable for representing a big data set by few vectors [43].

6) Data Aggregation Using Principal Component Analysis: We begin by introducing two important algorithms that are efficiently used in combination with principal component analysis (PCA) to enhance data aggregation in WSNs.

• Compressive sensing (CS) has been recently explored to replace the traditional scheme of “sample then compress” with “sample while compressing”. CS exploits the sparsity property of signals to recover the original signal from few random measurements. A simple introduction to CS is provided in [88].
• Expectation-maximization (EM) [89] is an iterative algorithm composed of two steps, i.e., an expectation (E) step and a maximization (M) step. During its E-step, EM formulates the cost function while fixing the current expectation of the system parameters. Subsequently, the M-step recomputes parameters that minimize the estimation error of the cost function.

Masiero et al. [55], [56] developed a method for estimating distributed observations using few collected samples from a WSN. This solution is based on the PCA technique to produce orthogonal components used by compressive sensing to reconstruct the original readings. Moreover, this method is independent of the routing protocol due to its ability to estimate data spatial and temporal correlations. Similarly, Rooshenas et al. [57] applied PCA to optimize the direct transmission of readings to a base station. PCA results in considerable traffic reduction by combining nodes’ collected data into fewer packets. This distributed technique is executed in intermediate nodes to combine all the incoming packets instead of forwarding them to destinations.

Equally important, Macua et al. [58] introduced distributed consensus-based methods for data compression using PCA and maximum likelihood of the observed data. These methods are “Consensus-based Distributed PCA” (CB-DPCA) which relies on exploring the eigenvectors of local covariance matrices, and “Consensus-based EM Distributed PCA” (CB-EM-DPCA). The latter uses a distributed EM algorithm. These methods adopt the consensus algorithm [90] to predict the probability distribution of the data, and hence calculate the global dominant


eigenvectors using only local communication parameters (i.e.,
single hop communications). CB-DPCA and CB-EM-DPCA
can be tuned to provide a trade-off between the achieved
approximation quality and the communication cost by adjusting
the consensus round parameter. For example, to increase the
algorithm accuracy, the number of consensus rounds should be
increased which will increase the computational requirements
of the algorithm.
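The effect of the consensus round parameter can be illustrated with a plain average-consensus iteration. The sketch below is not CB-DPCA or CB-EM-DPCA: it only averages scalar readings, and the 4-node line topology, the step size `step`, and the names `consensus_rounds` and `max_error` are assumptions chosen for illustration. It shows that, with single-hop exchanges only, more rounds bring every node closer to the global value at the price of more iterations.

```python
def consensus_rounds(values, neighbors, rounds, step=0.3):
    """Run `rounds` iterations of average consensus using only single-hop exchanges."""
    x = list(values)
    for _ in range(rounds):
        # Every node moves toward its neighbors' current values (synchronous update).
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


def max_error(x, target):
    """Largest deviation of any node's estimate from the target value."""
    return max(abs(v - target) for v in x)


# Hypothetical 4-node line topology: 0 - 1 - 2 - 3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
readings = [1.0, 2.0, 3.0, 6.0]
global_mean = sum(readings) / len(readings)

few = consensus_rounds(readings, neighbors, rounds=2)
many = consensus_rounds(readings, neighbors, rounds=50)
```

Comparing `max_error(few, global_mean)` with `max_error(many, global_mean)` makes the trade-off visible: the estimate after 50 rounds is much closer to the global mean than the estimate after 2 rounds.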
Recently, Fenxiong et al. [84] have tackled the problem of
data compression using PCA by transforming the data from a
high dimensional space to a lower one. The data is collected
over time, and then it is transmitted from each node to its corresponding cluster head. At the cluster head, the data matrix is compressed to eliminate the data redundancy. The data compression is achieved by ignoring principal components along which the data has the least variation.

The high computational requirement is the main issue of PCA-based data aggregation solutions. Other than increasing throughput, these solutions elegantly cope with the high dimensionality of collected data by keeping only important information (data dimensionality reduction).

7) Collaborative Data Processing Through k-Means Algorithm: Li et al. [60] addressed the fundamental concepts for distributed detection and tracking of a single target using sensor networks. “Collaborative Signal Processing” (CSP) is a framework for information gathering from the monitored environment. Additionally, this algorithm can track multiple targets using classification techniques such as SVM and k-nearest neighbors.

Classical surveillance systems have to collect massive data from surveillance cameras. Together with the requirement of highly complex computation and analysis process, this introduces the need for more practical methods. Therefore, Tseng et al. [59] proposed “Integrated Mobile Surveillance and Wireless Sensor System” (iMouse) which adopts powerful mobile sensors to enhance traditional surveillance systems. iMouse divides the monitored sites into a number of clusters using the k-means unsupervised learning algorithm. Each cluster will be repeatedly monitored by only one mobile sensor.

Although these ideas (using k-means for data processing) are appealing because of the straightforward implementations and low complexity, they are still sensitive to outliers and to initial seed selections.

8) Role-Free Clustering: In [85], Förster and Murphy introduced the WSN cluster formulation method called “Role-Free Clustering with Q-Learning for Wireless Sensor Networks” (CLIQUE). Instead of performing an election process, CLIQUE enables each node to investigate its ability to function as a cluster-head node. This is achieved through the use of the Q-learning algorithm in combination with some dynamic network parameters such as energy levels.

9) Decentralized Learning for Data Latency: Mihaylov et al. [86] addressed the problem of high data latency in random topology sensor networks using reinforcement learning. Each node executes the learning algorithm locally to optimize the data aggregation without the need for a central control station. Consequently, the efficiency of the whole network is enhanced with smaller learning transmission overhead. The approach saves the node energy budget during the data gathering process, and hence prolongs the network lifetime.

Fig. 8. Event detection and query processing enhancement using machine learning methods by assessing event validity and delimiting queried areas. System controller initiates query that is spread by the query processing unit to intended nodes. In contrast, events are detected by nodes to monitor specific signs within the monitored area.

C. Event Detection and Query Processing

Event detection and query processing are considered to be functional requirements of any large scale sensor network. This introduces the need for trustworthy event scheduling and detection with minimal human intervention. Monitoring in WSNs can be classified as: event-driven, continuous, or query-driven [6]. Fig. 8 illustrates event detection and query processing operations in WSNs. Fundamentally, machine learning offers solutions to restrict query areas and assess event validity for efficient event detection and query processing mechanisms. This adoption will result in the following benefits:

• Learning algorithms enable the development of efficient event detection mechanisms with limited requirements of storage and computing resources. Besides, they are able to assess the accuracy of such events using simple classifiers.
• Machine learning facilitates the development of effective query processing techniques for WSNs, that determine the search regions whenever a query is received without flooding the whole network.

The design of effective event detection and query processing solutions has recently received increased attention from the WSN research community. The simplest techniques rely on defining a strict threshold value for the sensed phenomenon and alarming the system manager of any violations. However, in most recent applications of WSNs, event and query processing units are often complicated and require more than a predefined threshold value. One such emerging technique is to use machine learning to develop advanced event detection and query processing solutions. Table III presents a comparison of functional aspects of different machine learning-based event detection and query processing solutions for WSNs.

1) Event Recognition Through Bayesian Algorithms: Krishnamachari and Iyengar [91] investigated the use of WSNs for detecting environmental phenomena in a distributed manner. Readings will be considered as faulty if their values exceed a specific threshold. This study employs decentralized Bayesian learning that detects up to 95 percent of the faults, and will result in recognizing the event region. It is important to note that Chen et al. [94] provided corrections to several


TABLE III
COMPARISON OF FUNCTIONAL ASPECTS OF DIFFERENT MACHINE LEARNING-BASED EVENT DETECTION AND QUERY PROCESSING SOLUTIONS FOR WSNS

errors related to the distributed Bayesian algorithms that have been derived in [91]. In summary, these corrections result in enhanced error and performance calculations for the distributed Bayesian algorithm proposed in [91].

Additionally, Zappi et al. [92] presented a real-time approach for activity recognition using WSNs that accurately detects body gesture and motion. Initially, the nodes, that are spread throughout the body, detect the organ motion using an accelerometer sensor with three axis measurements (positive, negative and null), where these measurements are used by a hidden Markov model (HMM) to predict the activity at each sensor. Sensor activation and selection rely on the sensor’s potential contributions in classifier accuracy (i.e., select the sensors that provide the most informative description of the gesture). To generate a final gesture decision, a naive Bayes classifier is used to combine the independent node predictions so as to maximize the posterior probability of the Bayes theorem. The architecture of the proposed system is shown in Fig. 9.

Fig. 9. Human activity recognition using the hidden Markov model and the naive Bayes classifier [92].

2) Forest Fire Detection Through Neural Network: WSNs have been actively used in fire detection and rescue systems (see [95] and references therein for requirements and challenges of such systems). Moreover, the use of WSNs for forest fire detection can achieve better performance than using satellite-based solutions while costing much less. Yu et al. [26] presented a real-time forest fire detection scheme based on a neural network method. Data processing will be distributed to cluster heads, and only important information will be aggregated to a final decision maker. Although the idea is creative and beneficial to the environment, the classification task and system core are hardly interpretable when introducing such systems to decision makers.

3) Query Processing Through k-Nearest Neighbors: K-nearest neighbor query is considered as a highly effective query processing technique in WSNs. For example, Winter et al. [24] developed an in-network query processing solution using the k-nearest neighbor algorithm, namely the “K-NN Boundary Tree” (KBT) algorithm. Each node that is aware of its location will determine its k-NN search region whenever a query is received from the application manager.

Correspondingly, Jayaraman et al. [25] extended the query processing design of [24]. “3D-KNN” is a query processing scheme for WSNs that adopts the k-nearest neighbor algorithm. This approach restricts the query region to bound at least k-nearest nodes deployed within a 3D space. In addition, signal-to-noise ratio (SNR) and distance measurements are used to refine the k-nearest neighbor.

The primary concerns of such k-NN-based algorithms for query processing are the requirement of a large memory footprint to store every collected sample and the high processing delay in large scale sensor networks.

4) Distributed Event Detection for Disaster Management Using Decision Tree: Bahrepour et al. [27] developed decision tree-based event detection and recognition for sensor network disaster prevention systems. The main application of this decentralized mechanism is fire detection in residential areas. Most noteworthy, the final event detection decision is made by using a simple vote from the highest reputation nodes.

5) Query Optimization Using Principal Component Analysis (PCA): Malik et al. [93] optimized traditional query processing in WSNs using data attributes and PCA, thus reducing the overhead of such a process. PCA has been used to dynamically detect important attributes (i.e., dominant principal components) among the whole correlated data set. Fig. 10 shows the workflow of the proposed algorithm in four fundamental steps. In Step 1, the structured query language (SQL) request, which contains the human intelligible attributes, is sent to the database management and optimization system. At the database management and optimization system, the original query is optimized where the high-variance components are extracted from historical data using the PCA algorithm (Step 2). Then, the optimized query is diffused to the wireless sensor network to extract the sensory data as shown in Steps 3 and 4, respectively. Later, the original attributes (i.e., human intelligible attributes) can be extracted from the optimized attributes by reversing the process of PCA.

As a result, this algorithm guarantees 25 percent improvement in energy saving of the network nodes while achieving 93 percent of accuracy rates. However, this enhancement is at


the cost of accuracy of the collected data (as some of the data components will be ignored). Therefore, this solution may not be ideal for the applications with high accuracy and precision requirements.

Fig. 10. Workflow of the query optimization and reduction system using PCA proposed in [93].

D. Localization and Objects Targeting

Localization is the process of determining the geographic coordinates of network’s nodes and components. Position awareness of sensor nodes is an important capability, since most sensor network operations are typically based on the location [96]. In most large scale systems, it is financially infeasible to use global positioning system (GPS) hardware in each node for this purpose. Moreover, GPS service may not be available in the observed environment (e.g., indoor). Relative location measurement is sufficient for certain uses. However, by using the absolute locations for a small group of nodes, relative locations can be transformed into absolute ones [97]. In order to enhance the performance of proximity based localization, additional measurements relying on distance, angle or a hybrid of them can be used. Distance measurements can be obtained by utilizing various techniques such as RSSI, TOA, and TDOA. Furthermore, angle of the received signal can be measured using compasses or special smart antennas [98]. A valuable introduction about the basics of different range-based localization techniques is provided in [42].

Sensor nodes may encounter changes in their location after deployment (e.g., due to movement). The benefits of using machine learning algorithms in sensor node localization process can be summarized as follows:

• Converting the relative locations of nodes to absolute ones using few anchor points. This will eliminate the need for range measurement hardware to obtain distance estimations.
• In surveillance and object targeting systems, machine learning can be used to divide the monitored sites into a number of clusters, where each cluster represents specific location indicator.

We begin by defining some terms that are widely used in WSN localization literature, as illustrated in Fig. 11.

• Unknown node is a node that cannot determine its current location.
• Beacon node (or anchor node) is any node that is able to recognize its location by using positioning hardware or from its manual placement. In most systems, the beacon node is used as a reference point to estimate the coordinates of other unknown nodes.
• Received signal strength indication (RSSI) is an indicator of the received signal strength, used to represent transmission performance or distance.

Fig. 11. Localization using few beacon nodes by utilizing machine learning algorithms and other signal strength indicators (reformulated from [96]).

Next, we discuss some seminal WSN localization techniques that use machine learning and summarize our reviews in Table IV.

1) Bayesian Node Localization: Morelande et al. [21] used a Bayesian algorithm to develop a localization scheme for WSNs using only few anchor points. This study focuses on the enhancement of progressive correction [109], which is a method for predicting samples from likelihoods to get closer to the posterior likelihood. The proposed algorithm is efficiently applicable for node localization in large scale systems (i.e., networks with a few thousands of nodes). The idea of using the Bayesian algorithm for localization is appealing as it can handle incomplete data sets by investigating prior knowledge and probabilities.

2) Robust Location-Aware Activity Recognition: Lu and Fu [22] addressed the problem of sensor and activity localization in smart homes. The activities of interest include using the phone, listening to the music, using the refrigerator, studying, etc. In such applications, designers need to comply with both human and environment constraints in a convenient and easily operated way. The proposed framework, named “Ambient Intelligence Compliant Object” (AICO), facilitates the human interaction with the home electric devices in a more intelligent manner (e.g., automatic power supply management). At its core, AICO uses multiple naive Bayes classifiers to determine the resident’s current location and evaluate the reliability of the system by detecting any malfunctioning sensors. Although this method provides a robust mechanism for localization, it is still application-dependent and the designers must predefine a set of


TABLE IV
SUMMARY OF LOCALIZATION ALGORITHMS IN WSNS THAT ADOPT MACHINE LEARNING CONCEPTS AND THEIR PRIME ADVANTAGES. THE COLUMN “APPLICATIONS” SPECIFIES THE TARGETED APPLICATION(S) OF THE PROPOSED SOLUTION (EITHER GENERAL-PURPOSE OR A SPECIFIC APPLICATION)

supported activities in advance. This is because the used learn- a mobile node localization scheme by employing SVM and
ing features are selected and evaluated manually depending connectivity information capabilities. In its initial step, the
on the activities and the domain of interest. To overcome this proposed method has to detect node movement using their
limitation in this centralized system, we recommend investi- radio frequency oscillation such as RSSI metric. For movement
gating unsupervised machine learning algorithms for automatic detection, SVM will be executed to provide the new location.
feature extraction such as the deep learning methods [9] and the Similar to [51], Tran and Nguyen [50] proposed “Localiza-
non-negative matrix factorization algorithm [110].
3) Localization Based on Neural Network: Shareef et al. [23] compared three localization schemes that are based on different types of neural networks. In particular, this study considers WSN localization using multi-layer perceptron (MLP), radial basis function (RBF), and recurrent neural networks (RNN). In summary, the RBF neural network results in the minimum error at the cost of high resource requirements. In contrast, MLP consumes the minimum computational and memory resources.
Likewise, Yun et al. [99] adopted a similar design, in which two classes of algorithms for sensor node localization using RSSI from anchor nodes are proposed. The first class utilizes a fuzzy logic system and a genetic algorithm. In the second class, a neural network is adopted to predict the sensor location by using RSSI measurements from all anchor nodes as an input vector. In the same way, Chagas et al. [100] applied neural networks for WSN localization with RSSI as an input to the learning network.
The main advantage of these NN-based localization algorithms is their ability to provide coordinates in the form of continuous-valued vectors (e.g., coordinates in 3D space). However, unlike statistical or Bayesian alternatives, a neural network is a non-probabilistic method. This fact limits the designers’ certainty about the precision of an unknown node’s predicted coordinates, and hence restricts their ability to manage the cost of localization errors.
4) Localization Using Support Vector Machine (SVM): The SVM technique has been widely used for node localization in WSNs, where attaching a self-positioning device to each sensor is infeasible. As an illustration, Yang et al. [51] developed a “Localization Based on Support Vector Machines” (LSVM) method for node localization in WSNs. To achieve its design goals, and given appropriate training data, LSVM adopts several decision metrics such as connectivity information and indicators. Even though LSVM offers distributed localization in a fast and effective manner, its performance is still sensitive to outliers in the training samples.
5) Localization Using Support Vector Regression (SVR): Limited resources and high data dimensionality impede the wide adoption of SVR learning in WSNs. Therefore, Kim et al. [49] developed a lightweight implementation of SVR that divides the original regression problem into several sub-problems. Basically, the algorithm starts by dividing the network into a set of sub-networks, so that each regression algorithm (i.e., each of SVR’s sub-predictors) has to process only a small amount of data. Then, the learned hypothesis models of the sub-predictors are combined using a customized ensemble combination technique. Thus, this solution offers low computational requirements and robustness against noisy data while still converging to the preferred solution.
6) Decision Tree-Based Localization: Based on decision tree learning, Merhi et al. [101] developed an acoustic target localization method for WSNs. Exact locations of targets are determined using the time difference of arrival (TDOA) metric in a spatial correlation decision tree. Also, this work proposed the design of the “Event Based MAC” (EB-MAC) protocol, which enables event-based localization and targeting in acoustic WSNs. The proposed framework was implemented using a MicaZ board that supports ZigBee 802.15.4 specifications for personal area networks.
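To make the TDOA metric used in the decision tree-based localization above more concrete, the following is a minimal sketch of how arrival-time differences constrain a target position. It is not the spatial correlation decision tree of [101]; it simply recovers a 2D position by brute-force grid search. The sensor layout, the 10 m × 10 m area, the speed of sound (343 m/s) and the function names `tdoa` and `locate` are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air; assumed value for illustration


def tdoa(source, sensors, ref=0, speed=SPEED_OF_SOUND):
    """Arrival-time difference of each sensor relative to sensors[ref]."""
    times = [math.dist(source, s) / speed for s in sensors]
    return [t - times[ref] for t in times]


def locate(measured, sensors, area=(0.0, 0.0, 10.0, 10.0), step=0.1):
    """Grid search for the point whose predicted TDOAs best match `measured`."""
    x_min, y_min, x_max, y_max = area
    nx = int(round((x_max - x_min) / step))
    ny = int(round((y_max - y_min) / step))
    best, best_err = None, float("inf")
    for i in range(nx + 1):
        for j in range(ny + 1):
            p = (x_min + i * step, y_min + j * step)
            predicted = tdoa(p, sensors)
            # Sum of squared differences between predicted and measured TDOAs.
            err = sum((a - b) ** 2 for a, b in zip(predicted, measured))
            if err < best_err:
                best, best_err = p, err
    return best


# Hypothetical deployment: four sensors at the corners of the area.
sensors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
target = (3.0, 7.0)
estimate = locate(tdoa(target, sensors), sensors)
print(estimate)
```

A grid search is used only because it is easy to verify; a deployed node would more plausibly use a closed-form or learned estimator, since evaluating every grid point is costly on constrained hardware.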


Using the GPS functionality to support localization in underwater wireless sensor network applications may not be feasible due to the limited propagation of the GPS signal through water [111]. Erdal et al. [102] developed a system for submarine detection in underwater surveillance systems, in which a randomly deployed node finds its location in 3D space based on beacon node coordinates. Each monitoring unit consists of a sensor that is attached by a cable to a surface buoy. Data are collected by the buoys and transmitted to the central processing unit. At the central unit, a decision tree classifier is used to recognize any submarines in the monitored sites.
7) Sensor Placements Through Gaussian Processes: Krause et al. [103] provided an optimized solution to sensor placement in applications with spatially correlated data such as temperature monitoring systems. One interesting feature of this solution is the development of a lazy learning scheme based on a Gaussian process model for the investigated phenomenon. Lazy learning algorithms store training samples and delay the major processing task until a classification request is received. Moreover, this solution aims to achieve robustness against node failures and model ambiguity when choosing optimal locations for sensors.
8) Spatial Gaussian Process Regression: Gu and Hu [104] developed a distributed protocol for collective node motion. This approach employs distributed Gaussian process regression (DGPR) to predict optimal locations for mobile nodes’ movements. Traditional Gaussian process regression (GPR) algorithms have a computational complexity of O(N^3), where N is the number of samples. However, this solution adopts a sparse Gaussian process regression algorithm to reduce such computational complexity. Each node executes the regression algorithm independently using only spatiotemporal information from local neighbors.
9) Localization Using Self-Organizing Map (SOM): Given some anchor positions, Paladina et al. [105] introduced a SOM-based positioning solution for WSNs consisting of thousands of nodes. The proposed scheme is executed in each node with a simple SOM algorithm that consists of a 3 × 3 input layer connected to the 2 neurons of the output layer. In particular, the input layer is formulated using the spatial coordinates of 8 anchor nodes surrounding the unknown node. After sufficient training, the output layer is used to represent the unknown node’s spatial coordinates in a 2D space. The main disadvantage of this scheme is that the nodes should be distributed uniformly and equally spaced throughout the monitored area.
Unlike traditional methods that require absolute locations of a few nodes to find the positions of the unknown nodes, Giorgetti et al. [106] introduced a localization algorithm that is based only on connectivity information and the SOM algorithm. The developed method is highly suitable for networks with limited resources, as it does not require a GPS-enabled device. However, since this is a centralized algorithm, each node transmits the information of its neighbors to the central processing unit to determine the adjacency matrix and hence the node’s location. Similarly, Hu and Lee [107] presented a scheme that provides node localization service in WSNs without the need for anchor nodes. The algorithm is based on SOM, and it operates efficiently for any number of nodes. The contribution of [107] over [106] is that the proposed algorithm distributes the computation tasks to all nodes in the network, which eliminates the need for a central unit and minimizes the transmission overhead of the algorithm.
10) Path Determination Using Reinforcement Learning: Li et al. [108] developed a reinforcement learning-based localization method for WSNs, called “Dynamic Path determination of Mobile Beacons” (DPMB), suitable for real-time management of the mobile beacons. The mobile beacon (MB), which is aware of its physical location during its movement, is used to determine the positions of a large number of sensor nodes. In brief, the states of the Q-learning algorithm represent the different positions of the MB, and the algorithm’s target is to cover all the sensors in the monitored area (i.e., all the sensors should hear a location update message from the MB at some stage). The entire operation runs in the mobile beacon, which saves the resources of the unknown nodes. However, as a centralized method, the entire system will fail if the mobile beacon malfunctions.

E. Medium Access Control (MAC)

In WSNs, a number of sensors cooperate to efficiently transfer data. Therefore, designing MAC protocols for WSNs poses challenges that differ from those of typical wireless networks, notably energy consumption and latency [112]. Also, the duty cycle (i.e., the fraction of time that a sensor node is active) of the node has to be controlled to conserve energy. Therefore, the MAC protocols have to be modified to support efficient data transmission and reception of the sensor nodes. A comprehensive survey of MAC protocols in WSNs is provided in [113].
Recently, machine learning methods have been used to enhance the performance of MAC protocols in WSNs. Specifically, this is achieved through the following points:
• Machine learning can be used to adaptively determine the duty cycle of a node using the transmission history of the network. In particular, the nodes that are able to predict when the other nodes’ transmissions will finish can sleep in the meantime and wake up (to transmit data) just when the channel is expected to be idle (i.e., when no other node is transmitting). For WSNs, many factors, such as energy consumption and latency, are more important than fairness when designing MAC protocols.
• Machine learning can be combined with MAC protocols to achieve secure data transmission. Such MAC layer security schemes are independent of the proposed application and are able to iteratively learn sporadic attack patterns.
Table V gives a brief comparison between MAC protocols reviewed in this subsection. The column “Synchronization” indicates whether the protocol assumes that time synchronization is achieved externally, and “Adaptivity to changes” indicates the ability to handle topology changes such as node failures.
1) Bayesian Statistical Model for MAC: Kim and Park [28] presented a contention-based MAC protocol for managing active and sleep times in WSNs. Instead of continuously sensing


the medium, this scheme utilizes a Bayesian statistical model to learn when the channel can be allocated, and hence saves network energy. Furthermore, in its basic design, this scheme is targeted at CSMA contention-based protocols such as “Sensor MAC” (S-MAC) [116] and “Timeout MAC” (T-MAC) [117].

TABLE V
COMPARISON OF MAC PROTOCOLS

2) Neural Network-Based MAC: Time division multiple access (TDMA)-based protocols employ periodic time frames to separate the medium access of different nodes. This process requires a central unit to broadcast a transmission schedule in case of topology changes. Shen and Wang [29] proposed a solution to broadcast the transmission schedule in TDMA using a fuzzy Hopfield neural network (FHNN) technique. Time slots are distributed among the nodes in a network while maximizing the cycle length, preventing any potential transmission collisions and reducing the processing time.
In the same way, Kulkarni and Venayagamoorthy [30] presented an innovative CSMA-based MAC solution that can prevent denial-of-service (DoS) attacks in WSNs. Denial-of-service is a type of attack that generates a huge amount of useless traffic (i.e., floods the network), thus preventing the delivery of useful data. In such cases, attackers exploit the limitations of WSNs such as limited bandwidth and buffering capabilities. The proposed solution is based on neural network learning to prevent flooding the network with untruthful data traffic by investigating the network properties and variations such as packet request rate and average packet waiting time. Consequently, the MAC layer will be blocked if the neural network output exceeds a predefined threshold level. More importantly, only nodes in the affected sites will be blocked, as this solution is designed to work in a distributed manner.
3) Duty Cycle Management Using Reinforcement Learning: Liu and Elhanany [114] employed a reinforcement learning technique to introduce RL-MAC, an adaptive MAC protocol for WSNs. Basically, RL-MAC reduces energy usage and increases throughput by optimizing the duty cycle of the network node. Similar to S-MAC [116] and T-MAC [117], RL-MAC synchronizes nodes’ transmissions on a common schedule in a frame-based structure. RL-MAC adaptively determines the slot length, duty cycle and transmission active time depending on the traffic load and the channel bandwidth.
Similarly, Chu et al. [112] integrated slotted ALOHA and Q-Learning algorithms to introduce a new MAC protocol for WSNs, called “ALOHA and Q-Learning based MAC with Informed Receiving” (ALOHA-QIR). ALOHA-QIR inherits the features of both ALOHA and Q-Learning to achieve the benefits of simple design, low resource requirements and low collision probability. During their transmission frames, nodes broadcast their future transmission allocation such that other nodes can sleep during the reserved frame. The Q-value map in each node represents the willingness for slot reservation, where the node with the higher Q-value attains the right of slot allocation and hence transmission of its own data. Fig. 12 demonstrates the steps of updating the Q-values over three frames of a node that is allowed to transmit a maximum of two packets in each frame. Initially, the Q-values are initialized to zero, i.e., Q(frame #0) = {0, 0, 0}. Upon successful transmission, the Q-value of each time slot is updated using the Q-learning update rule given by (1), where the learning rate is set to 0.1. The reward value is equal to +1 for a successful transmission, and it is −1 for a failed transmission. The nodes then decide to transmit data using the time slots with the maximum Q-values.

Fig. 12. Example of the Q-values of a node over three frames in a WSN that employs ALOHA-QIR to manage medium access [114].

Although the idea of using reinforcement learning for duty cycle management is appealing because of its distributed operation and small memory and computational resource requirements, it may result in high collision rates during the initial exploration phases.
4) Adaptive MAC Layer: In a variety of modern applications, such as healthcare and assisted living systems, WSNs are used to directly share the collected data with the users’ mobile phones. This introduces new design challenges that are related to the dynamic communication patterns and service requirements over time. Sha et al. [115] studied this problem at the MAC layer and proposed the “Self-Adapting MAC Layer” (SAML) design. SAML is composed of two main components: the “Reconfigurable MAC Architecture” (RMA), which switches between the different MAC protocols, and the MAC engine, which is used to learn the suitable MAC protocol for the current network conditions. The learning process is performed using the decision tree classifier as illustrated in Fig. 13. The learning features of the decision tree are: inter packet interval (IPI) and received signal strength indication (RSSI) statistical parameters (i.e., the mean and the variance), the application QoS requirements (reliability, energy usage, and latency), packet delivery rate (PDR), and the traffic pattern.
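Returning to the Q-value update described for ALOHA-QIR in 3) above, the following minimal sketch shows how per-slot Q-values could evolve over three frames. It assumes that rule (1), which lies outside this excerpt, takes the stateless form Q ← Q + α(r − Q); only the learning rate (0.1), the rewards (+1 and −1), the zero initialization and the two-packets-per-frame limit come from the text. The slot outcomes and the names `update_q`, `choose_slots` and `run_frame` are hypothetical, and the numbers are not meant to reproduce Fig. 12.

```python
ALPHA = 0.1  # learning rate stated in the text


def update_q(q, reward, alpha=ALPHA):
    """Assumed stateless Q-learning step: move q toward the observed reward."""
    return q + alpha * (reward - q)


def choose_slots(q_values, max_packets=2):
    """Pick the slots with the highest Q-values (ties go to the lower index)."""
    order = sorted(range(len(q_values)), key=lambda i: (-q_values[i], i))
    return sorted(order[:max_packets])


def run_frame(q_values, outcomes, max_packets=2):
    """Transmit in the preferred slots and update only those slots.

    outcomes[i] is True if a transmission in slot i would succeed.
    """
    q_values = list(q_values)
    for slot in choose_slots(q_values, max_packets):
        reward = 1.0 if outcomes[slot] else -1.0
        q_values[slot] = update_q(q_values[slot], reward)
    return q_values


# Hypothetical channel outcomes for three frames of three slots each.
frames = ([True, False, True], [True, False, True], [True, True, True])

q = [0.0, 0.0, 0.0]  # Q(frame #0) = {0, 0, 0}
for outcomes in frames:
    q = run_frame(q, outcomes)
    print([round(v, 3) for v in q])
```

In this run the slot that collides in the first frame drops below the untried slot, so the node moves its second packet to the other slot from the second frame onward, which is the behavior the Q-value map is meant to produce.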


The supported MAC protocols are Pure TDMA [118], Adaptive TDMA [119], Box-MAC [120], RI-MAC [121], and ZigBee [122]. Even though the SAML scheme provides an adaptive MAC solution in dynamic environments, it introduces a level of complexity and additional expense into the designed systems.

Fig. 13. Decision tree classifier used to select the optimal MAC algorithm in the SAML architecture [115].

IV. NON-FUNCTIONAL CHALLENGES

Non-functional requirements include specifications that are not related to the basic operational behavior of the system. For example, WSN designers may need to ensure that the proposed solution is always capable of providing up-to-date information about the monitored environment. This section provides a comprehensive review of recent machine learning advances that have been adopted to achieve non-functional requirements in WSNs such as security, quality of service, and data integrity. Moreover, this section also highlights some unique efforts in specialized WSN applications. Such studies could point researchers to a variety of WSN applications that can be improved using machine learning techniques.

A. Security and Anomaly Intrusion Detection

The major challenge in implementing security techniques in WSNs is the resource constraints [14]. Moreover, some attack methods aim to produce unexpected, mistaken knowledge by introducing misleading observations to the network. Fig. 14 presents the general concept of anomaly detection in a phenomenon monitoring sensor system using machine learning clustering and classification algorithms. In this example, machine learning techniques classify the data into two correct reading regions. Since most observations lie in these two regions, the points that are inconsistent with these regions (e.g., from an attack) are considered anomalies.

Fig. 14. Example of anomaly detection in phenomena monitoring sensor system using machine learning clustering and classification techniques (data set in Euclidean space).

Machine learning algorithms have been employed to detect outlying and misleading measurements. Simultaneously, several attacks could be detected by analyzing well-known malicious activities and vulnerabilities. Basically, WSN security enhancements by adopting machine learning techniques will result in the following benefits:
• Save nodes’ energy and significantly extend WSN lifetime by preventing the transmission of outlying, misleading data.
• Enhance network reliability by eliminating faulty and malicious readings. Likewise, avoid the discovery of unexpected knowledge that would be converted into important, and often critical, actions.
• Online learning and prevention (without human intervention) of malicious attacks and vulnerabilities.
In this subsection, we explore various machine learning-based algorithms addressing the security issue in WSNs. Table VI summarizes the reviewed methods in this subsection. The column “Predicting missing data” indicates the ability of the proposed solution to provide predictions for any missing sensor readings.
1) Outlier Detection Using Bayesian Belief Network: Janakiram et al. [31] used Bayesian belief networks (BBNs) to develop an outlier detection scheme. Given that the majority of a node’s neighbors will have similar readings (i.e., temporal and spatial correlations), it is reasonable to use this phenomenon to build conditional dependencies among nodes’ readings. BBNs infer the conditional relationships among the observations to discover any potential outliers in the collected data. Furthermore, this method can be used to evaluate missing values.
2) Outlier Detection Using k-Nearest Neighbors: Branch et al. [32] developed an in-network outlier detection method in WSNs using k-nearest neighbors. Moreover, any missing node readings are replaced by the average value of the k-nearest nodes. However, such a non-parametric, k-NN-based algorithm requires large memory to store every collected reading from the monitored environment.
3) Detecting Selective Forwarding Attacks Using Support Vector Machine (SVM): In black hole attacks, malicious nodes send misleading “Route Reply” (RREP) messages whenever they receive “Route Request” (RREQ) messages, indicating that routes to the destinations are found. Accordingly, source nodes will stop the process of route discovery,


and will ignore other RREP messages. Therefore, malicious nodes will drop all network messages, while the source nodes assume that their packets were delivered to the destination. Kaplantzis et al. [33] presented a packet dropping attack prevention technique based on a one-class support vector machine classifier. The proposed scheme is capable of detecting black hole attacks and selective forwarding attacks. Basically, routing information, bandwidth and hop count are used to determine the malicious nodes in the network.

TABLE VI
SUMMARY OF WIRELESS SENSOR NETWORK OUTLIER DETECTION TECHNIQUES THAT ADOPT MACHINE LEARNING PARADIGMS

4) Outlier Detection Using Support Vector Machine (SVM): By using a quarter-sphere centered at the origin, the drawback of high computational requirements of traditional SVM could be alleviated. For instance, Rajasegarar et al. [34] introduced a one-class quarter-sphere SVM anomaly recognition technique. The motivation of this distributed scheme is to distinguish anomalies in data while minimizing communication overhead. In [46], Yang et al. tackled the design of an online outlier detection method using quarter-sphere SVM. The unsupervised learning method investigates the local data to reduce the computational complexity of traditional SVM-based outlier detection algorithms. This outlier detector is similar to the method introduced in [34].
An artificial immunity algorithm is a computationally intelligent algorithm for problem solving inspired by biological immunity systems [124]. Biological immunity systems automatically generate the immune body (antibody) against the antigen (e.g., a virus) through cell fission. In [47], Chen et al. extended the basic idea of using SVM for detecting intrusion by combining it with an immunity algorithm. In summary, an immune algorithm was introduced as a preprocessing step for the sensor data, which is then used by SVM to detect intruders. Furthermore, Zhang et al. [48] also investigated the temporal and spatial correlations of the collected readings using a one-class SVM learning algorithm to develop an outlier detection method. This study adopts an ellipsoidal one-class SVM that can be solved using linear optimization instead of the quadratic optimization problem in traditional SVM methods.
The main advantages of these SVM-based methods are their good performance (efficient learning) and ability to learn non-linear and complex problems. However, they still scale poorly to large data sets due to their high computational and memory requirements [45].
5) Analyzing Attacks With Self-Organizing Map (SOM): Avram et al. [123] addressed the issue of detecting network attacks in wireless ad hoc networks using self-organizing map unsupervised learning. The weights are learned through statistical analysis of the input data vectors. The main issue of this scheme is the complexity in determining input weights. Moreover, SOM-based algorithms are not suitable for detecting attacks in very large and complex data sets (i.e., large scale sensor networks).

B. Quality of Service, Data Integrity and Fault Detection

Quality of service (QoS) guarantees high-priority delivery of real-time events and data. In the context of WSNs, there are potential multi-hop transmissions of data to the end user, in addition to distributing queries from a system controller to the network nodes [125]. WSNs suffer from energy and bandwidth constraints that limit the quantity of information to be transmitted from a source to destination nodes. Furthermore, data aggregation and dissemination in WSNs can be faulty and unreliable [4]. These issues, coupled with random network topologies, introduce an important challenge for designing reliable algorithms for such networks. The state of the art and general QoS requirements in WSNs have been reviewed in [126].
In the following, we review the latest efforts of using machine learning techniques to achieve specific QoS and data integrity constraints. In brief, this adoption results in the following advantages:
• Different machine learning classifiers are used to recognize different types of streams, thus eliminating the need for flow-aware management techniques.
• The requirements for QoS guarantee, data integrity and fault detection depend on the network service and application. Machine learning methods are able to handle much of this while ensuring efficient resource utilization, mainly bandwidth and power utilization.
Table VII summarizes the methods that are reviewed in this subsection. The column “Characteristics” indicates features or qualities belonging to each study.
1) QoS Estimation Using Neural Network: Recently, there is growing interest in estimating and improving the


performance of WSNs. For example, Snow et al. [35] introduced a method to estimate a sensor network dependability metric using a neural network method. Dependability is a metric that represents availability, reliability, maintainability, and survivability of a sensor network. Several attributes are used to estimate such a metric, including mean time between failure (MTBF) and mean time to repair (MTTR).

TABLE VII
SUMMARY OF QUALITY OF SERVICE, DATA INTEGRITY AND FAULT DETECTION SOLUTIONS

Moustapha and Selmic [36] introduced a dynamic fault detection model for WSNs. This model captures the nodes’ dynamic behavior and their effects on other nodes. In addition, neural network learning, which is trained using the back-propagation method, was used for node identification and fault detection (a similar idea as in [35]). This study results in an effective nonlinear sensor model that suits applications with fault detection requirements.
2) MetricMap (Link Quality Estimation Framework): Link quality measurement tools may provide inaccurate and unstable readings across different environments due to different conditions such as signal variations and interference [132]. As a result, Wang et al. [37] presented MetricMap, a link quality estimation framework using supervised learning methods. MetricMap enhances the MintRoute [133] protocol by adopting online and offline learning methods, such as decision tree learners, to derive link quality indicators. This framework uses several local features to build the classification tree, such as the received signal strength indicator (RSSI), transmission buffer size, channel load, and the forward and backward probabilities. The forward probability p_f(l) is defined as the ratio of the received to the total transmitted packets over the link l, whereas the backward probability p_b(l) is calculated over the reverse path. Local features are preferred over global ones as they can be found without costly communications with far away nodes. Experiments reveal that up to a threefold improvement in data delivery rate over the basic MintRoute method can be achieved.
3) Assessing Accuracy and Reliability of Sensor Nodes Using Multi-Output Gaussian Processes: Osborne et al. [127] presented a real-time algorithm to determine a set of nodes that are capable of handling information processing tasks such as assessing the accuracy of collected sensor readings and predicting the missing readings. This algorithm provides a probabilistic Gaussian process based iterative implementation that is trained to re-use previous experience (i.e., the historical data) and maintain a reasonable training data size. Yet, the posterior distribution of an observed environmental variable x (e.g., sea surface temperature) is calculated using the generalized multivariate Gaussian distribution given by

$$p(x \mid \mu, K, I) \triangleq \frac{1}{\sqrt{\det 2\pi K}} \exp\left(-\frac{1}{2}(x-\mu)^{T} K^{-1} (x-\mu)\right) \qquad (5)$$

where μ and K are the prior mean and covariance of the variable x, respectively. Further, I denotes the historical data (a sequence of time-stamped samples) that is updated online to consider the new sequentially collected observations.
4) QoS Provisioning Using Reinforcement Learning: Ouferhat and Mellouk [128] introduced a QoS task scheduler for adaptive multimedia sensor networks based on the Q-learning technique. This scheduler significantly enhances the network throughput by reducing the transmission delay. Comparatively, Seah et al. [129] considered coverage as a QoS metric in WSNs that represents how efficiently the area of interest will be observed. A Q-learning method was used to develop a distributed learner that is able to find weakly monitored sites. These sites can be resolved in future re-deployment stages.
It is important to note that energy harvesting has not been considered in the above QoS mechanisms. Conversely, Hsu et al. [130] introduced a QoS-aware power management scheme for WSNs with energy harvesting capabilities, namely “Reinforcement Learning based QoS-aware Power Management” (RLPM). This scheme is able to adapt to the dynamic levels of nodes’ energy in systems with energy harvesting capabilities. QoS-aware RLPM employs reinforcement learning to attain QoS awareness and to manage nodes’ duty cycle under the energy restriction. Furthermore, Liang et al. [131] designed “Multi-agent Reinforcement Learning based multi-hop mesh Cooperative Communication” (MRL-CC) to be a structure modeling tool for QoS provisioning in WSNs. Basically, MRL-CC is adopted to reliably assess the data in a cooperative manner. Moreover, MRL-CC might be used to examine the impact of traffic load and node mobility on the whole network performance.

C. Miscellaneous Applications

This subsection presents miscellaneous and unique research efforts that are not discussed previously.
1) Resource Management Through Reinforcement Learning: Shah and Kumar [134] presented the “Distributed Independent Reinforcement Learning” (DIRL) algorithm that utilizes local information and application constraints to optimize various


Fig. 15. Example of task management using the DIRL middleware algorithm:
Object tracking application [134].

tasks over time while minimizing energy consumption of the


network. Each sensor node learns the minimum required re- Fig. 16. Hierarchical clustering of network’s nodes based on data spatial and
sources to perform its scheduled tasks, and maximizes its temporal correlations in a temperature monitoring system.
future rewards by finding optimal parameters of the intended
application. As a typical case, consider an object recognition for measuring air pollution levels using inexpensive gas sensor
application, as shown in Fig. 15, which consists of five funda- nodes, while eliminating the effects of temperature and humid-
mental tasks: (a) aggregate two or more readings into a single ity on sensor readings. This solution detects the air quality and
reading, (b) transmit a message to the next hop, (c) receive gas concentration using neural networks implemented using
incoming messages, (d) sample and take readings, and (e) put JavaScript (JS). As a result, the solution is able to distribute
the radio into sleep mode. These tasks must be executed in some processing between web server and end user computers (i.e., a
priority to maximize the network lifetime, where the network does not have such knowledge of priority because there is no static schedule for the events. For example, a node does not know when the object is going to move near it so that it can start taking samples. Here, the DIRL algorithm can be used to generate the required knowledge of priority by using the Q-learning algorithm after specifying the set of reward and price values for each task.

2) Decision Tree-Based Animal Behavior Classification: WSNs have been applied in many applications such as environmental and habitat monitoring [135]. As an example, Nadimi et al. [136] employed a decision tree to accurately classify the behavior of a herd of animals (active or inactive) using parameters such as the pitch angle of the neck and movement velocity. The advantages of the proposed solution for animal behavior classification are its simple implementation and low complexity, due to the use of a few critical features.

3) Clock Synchronization Using Self-Organizing Map: Clock synchronization between sensor nodes is an important process, since most operations of the nodes must be consistent with each other. Moreover, the design of such methods for WSNs has to consider the limited resources. For example, Paladina et al. [137] proposed to use SOM to ensure reliable clock synchronization for large scale networks. Nodes predict a near-optimal estimate of the current time without having a central timing device and with limited storage and computing resources. However, this method assumes a uniform deployment of the nodes over the monitored area, as well as the same transmission power for all nodes.

4) Air Quality Monitoring Using Neural Networks: Postolache et al. [138] proposed a neural networks-based method [...] combination of client and server side scripts).

5) Intelligent Lighting Control Using Neural Networks: Gao et al. [139] introduced a new standard for lighting control in smart buildings using a neural network algorithm. A radial basis function (RBF) neural network is used to extract a new mathematical expression, called the “Illuminance Matrix” (I-matrix), to measure the degree of illuminance in the lighted area. Fundamentally, in the field of lighting control, converting the collected data from the photosensors to a form that is suitable for digital signal processing is a crucial issue and can highly affect the performance of the developed system. The article shows that using the I-matrix scheme can achieve about 60% more accuracy compared to the standard methods.

V. FUTURE APPLICATIONS OF MACHINE LEARNING IN WIRELESS SENSOR NETWORKS

Although machine learning techniques have been applied to many applications in WSNs, many issues are still open and need further research efforts.

A. Compressive Sensing and Sparse Coding

In practice, a large number of sensor measurements are usually required to maintain a desired detection accuracy. This introduces several challenges to network designers such as network management and communication issues. Given that 80 percent of the nodes’ energy is consumed while sending and receiving data [140], data compression and dimensionality reduction techniques can be used to reduce transmission and hence prolong the network lifetime.
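As a rough illustration of how dimensionality reduction cuts transmission, the following Python sketch shows only the encoding side of compressive sensing: a node projects n readings onto m < n pseudo-random ±1 vectors and sends the m results. The function names, the ±1 measurement matrix, the seed shared between node and sink, and the sample readings are all assumptions made for illustration; recovering the readings at the sink needs a sparse solver and is not shown.

```python
import random


def measurement_matrix(m, n, seed):
    """Build an m x n matrix of pseudo-random +1/-1 entries.

    The seed is assumed to be known to both node and sink, so the sink
    can regenerate the same matrix without it ever being transmitted.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress(readings, m, seed=0):
    """Return m linear measurements y = Phi * x of the n readings."""
    phi = measurement_matrix(m, len(readings), seed)
    return [sum(p * x for p, x in zip(row, readings)) for row in phi]


# Hypothetical sparse signal: 64 samples, only two of them non-zero.
readings = [0.0] * 64
readings[5] = 21.5
readings[40] = -3.0

y = compress(readings, m=16)
print(len(readings), "->", len(y))  # prints: 64 -> 16
```

Because every matrix entry is +1 or -1, the encoding amounts to additions and subtractions, which is why the costly part of compressive sensing sits in the reconstruction rather than on the node.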


TABLE VIII
SUMMARY OF PUBLICATIONS RESOLVING VARIOUS WSN CHALLENGES BY THE ADOPTION OF MACHINE LEARNING TECHNIQUES

Traditional data compression techniques may result in extra energy consumption due to their high computational and memory requirements. In [141], Barr and Asanović studied the tradeoff between energy consumption in data transmission and compression. This study places the efficiency threshold of data compression in WSNs at roughly 485–1267 ADD instructions per bit of data reduction.

Even though compressive sensing can be recast as a linear program, it is still not applicable for on-node compression. As a result, it is important to apply and extend the basic concept of compressive sensing to meet the resource constraints of WSNs. For more on the theoretical performance of decentralized compressive sensing, please refer to [142]–[144]. Examples of similar emerging techniques include independent component analysis, dictionary learning, non-negative matrix factorization and singular value decomposition.

B. Distributed and Adaptive Machine Learning Techniques for WSNs

Distributed machine learning techniques suit limited resource devices such as WSNs. Compared to centralized learning algorithms, distributed learning methods require less computational power and a smaller memory footprint (i.e., they do not need to consider the whole network information). The decentralized learning techniques enable the nodes to rapidly adapt their future behavior and predictions in tune with the current environment conditions. For such reasons, distributed and adaptive learning algorithms are adequate for in-network processing of data while avoiding exhausting the nodes with high computational tasks [145]. Examples of recent online learning algorithms include “Adaptive Regularization of Weights” (AROW) [146], “Improved Ellipsoid Method for Online Learning” (IELLIP) [147] and “Soft Confidence-Weighted” (SCW) [148]. Kotecha et al. [149] studied some distributed classification algorithms for WSNs.

C. Resource Management Using Machine Learning

Energy saving is a crucial issue in developing efficient WSN algorithms and techniques. This design goal can be achieved using two main techniques, namely, by enhancing communication related protocols (e.g., routing and MAC protocol design) and by detecting nonfunctional and energy wasteful activities. The first technique includes physical, MAC and networking layer protocols. As discussed in this survey, this technique has been widely studied and enhanced using machine learning algorithms. The second technique focuses on decreasing the energy consumed by minor and nonfunctional requirements. For example, sensor nodes consume energy when overlistening to other nodes’ transmissions [150]. Such operations unnecessarily increase the active time of the nodes (i.e., increase the nodes’ duty cycle). Nodes that are equipped with machine learning techniques will be able to optimize their resource management and power allocation operations under those circumstances.

D. Detecting Data Spatial and Temporal Correlations Using Hierarchical Clustering

Hierarchical clustering is an unsupervised learning algorithm that aims to build a hierarchy of clusters. Basically, hierarchical clustering algorithms generate a decomposition of the set of objects, which could be a set of sensor nodes in WSNs. Broadly speaking, hierarchical clustering can provide an emerging clustering technique in WSNs using clustering criteria such as spatial and temporal correlations of readings. Fig. 16 illustrates such a hierarchically clustered network based on spatial and temporal correlations of readings in a temperature monitoring system. In this example, Cluster C is formed by combining Clusters A and B, and so on for the rest of the clusters in the network.
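The bottom-up construction of such a hierarchy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any specific published algorithm: nodes are grouped only by how close their current temperature readings are, and the function name, the `max_gap` stopping threshold, and the sample readings are hypothetical. A fuller version would fold node positions and reading history into the distance.

```python
def hierarchical_clusters(readings, max_gap):
    """Agglomerative clustering of nodes by reading similarity.

    readings: dict mapping node id -> latest temperature reading.
    max_gap:  stop merging once the closest pair of clusters differs
              by more than this many degrees (hypothetical threshold).
    Returns (clusters, merges); merges records the hierarchy bottom-up.
    """
    clusters = [[node] for node in sorted(readings)]
    merges = []

    def mean(cluster):
        return sum(readings[n] for n in cluster) / len(cluster)

    while len(clusters) > 1:
        # Find the two clusters whose mean readings are closest.
        gap, i, j = min(
            (abs(mean(a) - mean(b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if gap > max_gap:
            break
        # Merge cluster j into cluster i and remember the pair.
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


readings = {"n1": 20.0, "n2": 20.5, "n3": 25.0, "n4": 25.25, "n5": 30.0}
clusters, merges = hierarchical_clusters(readings, max_gap=1.0)
print(clusters)  # [['n1', 'n2'], ['n3', 'n4'], ['n5']]
```

Each entry in `merges` corresponds to one step of the hierarchy, in the same way that Cluster C is formed from Clusters A and B in the example above.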


The study of data correlation based on the hierarchical clustering method will provide simple methods for energy saving. In such formations, only one node from each cluster is activated at a time to cover and monitor the whole cluster area. Typical methods of hierarchical clustering include “Balanced Iterative Reducing and Clustering using Hierarchies” (BIRCH) [151] and “Clustering Using Representatives” (CURE) [152].

VI. CONCLUSION

Wireless sensor networks are different from traditional networks in various aspects, thereby necessitating protocols and tools that address unique challenges and limitations. As a consequence, wireless sensor networks require innovative solutions for energy aware and real-time routing, security, scheduling, localization, node clustering, data aggregation, fault detection and data integrity. Machine learning provides a collection of techniques to enhance the ability of wireless sensor networks to adapt to the dynamic behavior of their surrounding environment. Table VIII summarizes studies that have adopted machine learning methods to address these challenges from distinct research areas.

From the discussion so far, it is clear that many design challenges in wireless sensor networks have been resolved using several machine learning methods. In this paper, an extensive literature review over the period 2002–2013 on such studies was presented. In summary, adopting machine learning algorithms in wireless sensor networks has to consider the limited resources of the network, as well as the diversity of learning themes and patterns that will suit the problem at hand. Moreover, numerous issues are still open and need further research efforts, such as developing lightweight and distributed message passing techniques, online learning algorithms, hierarchical clustering patterns and adopting machine learning in the resource management problems of wireless sensor networks.

REFERENCES

[1] T. O. Ayodele, “Introduction to machine learning,” in New Advances in Machine Learning. Rijeka, Croatia: InTech, 2010.
[2] A. H. Duffy, “The ‘what’ and ‘how’ of learning in design,” IEEE Expert, vol. 12, no. 3, pp. 71–76, May/Jun. 1997.
[3] P. Langley and H. A. Simon, “Applications of machine learning and rule induction,” Commun. ACM, vol. 38, no. 11, pp. 54–64, Nov. 1995.
[4] L. Paradis and Q. Han, “A survey of fault management in wireless sensor networks,” J. Netw. Syst. Manage., vol. 15, no. 2, pp. 171–190, Jun. 2007.
[5] B. Krishnamachari, D. Estrin, and S. Wicker, “The impact of data aggregation in wireless sensor networks,” in Proc. 22nd Int. Conf. Distrib. Comput. Syst. Workshops, 2002, pp. 575–578.
[6] J. Al-Karaki and A. Kamal, “Routing techniques in wireless sensor networks: A survey,” IEEE Wireless Commun., vol. 11, no. 6, pp. 6–28, Dec. 2004.
[7] K. Romer and F. Mattern, “The design space of wireless sensor networks,” IEEE Wireless Commun., vol. 11, no. 6, pp. 54–61, Dec. 2004.
[8] J. Wan, M. Chen, F. Xia, L. Di, and K. Zhou, “From machine-to-machine communications towards cyber-physical systems,” Comput. Sci. Inf. Syst., vol. 10, no. 3, pp. 1105–1128, Jun. 2013.
[9] Y. Bengio, “Learning deep architectures for AI,” Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, Jan. 2009.
[10] A. G. Hoffmann, “General limitations on machine learning,” in Proc. 9th European Conf. Artif. Intell., 1990, pp. 345–347.
[11] M. Di and E. M. Joo, “A survey of machine learning in wireless sensor networks from networking and application perspectives,” in Proc. 6th Int. Conf. Inf., Commun. Signal Process., 2007, pp. 1–5.
[12] A. Förster, “Machine learning techniques applied to wireless ad-hoc networks: Guide and survey,” in Proc. 3rd Int. Conf. Intell. Sensors, Sensor Netw. Inf., 2007, pp. 365–370.
[13] A. Förster and M. Amy L, Machine Learning Across the WSN Layers. Rijeka, Croatia: InTech, 2011.
[14] Y. Zhang, N. Meratnia, and P. Havinga, “Outlier detection techniques for wireless sensor networks: A survey,” IEEE Commun. Surveys Tuts., vol. 12, no. 2, pp. 159–170, Apr. 2010.
[15] V. J. Hodge and J. Austin, “A survey of outlier detection methodologies,” Artif. Intell. Rev., vol. 22, no. 2, pp. 85–126, Aug. 2004.
[16] R. Kulkarni, A. Förster, and G. Venayagamoorthy, “Computational intelligence in wireless sensor networks: A survey,” IEEE Commun. Surveys Tuts., vol. 13, no. 1, pp. 68–96, May 2011.
[17] S. Das, A. Abraham, and B. K. Panigrahi, Computational Intelligence: Foundations, Perspectives, and Recent Trends. Hoboken, NJ, USA: Wiley, 2010, pp. 1–37.
[18] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data, AMLBook, 2012.
[19] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, vol. 2. Cambridge, MA, USA: MIT Press, 2006.
[20] S. Kulkarni, G. Lugosi, and S. Venkatesh, “Learning pattern classification—A survey,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2178–2206, Oct. 1998.
[21] M. Morelande, B. Moran, and M. Brazil, “Bayesian node localisation in wireless sensor networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2008, pp. 2545–2548.
[22] C.-H. Lu and L.-C. Fu, “Robust location-aware activity recognition using wireless sensor network in an attentive home,” IEEE Trans. Autom. Sci. Eng., vol. 6, no. 4, pp. 598–609, Oct. 2009.
[23] A. Shareef, Y. Zhu, and M. Musavi, “Localization using neural networks in wireless sensor networks,” in Proc. 1st Int. Conf. Mobile Wireless Middleware, Oper. Syst., Appl., 2008, pp. 1–7.
[24] J. Winter, Y. Xu, and W.-C. Lee, “Energy efficient processing of k nearest neighbor queries in location-aware sensor networks,” in Proc. 2nd Int. Conf. Mobile Ubiquitous Syst., Netw. Serv., 2005, pp. 281–292.
[25] P. P. Jayaraman, A. Zaslavsky, and J. Delsing, “Intelligent processing of k-nearest neighbors queries using mobile data collectors in a location aware 3D wireless sensor network,” in Trends in Applied Intelligent Systems. Berlin, Germany: Springer-Verlag, 2010, pp. 260–270.
[26] L. Yu, N. Wang, and X. Meng, “Real-time forest fire detection with wireless sensor networks,” in Proc. Int. Conf. Wireless Commun., Netw. Mobile Comput., 2005, vol. 2, pp. 1214–1217.
[27] M. Bahrepour, N. Meratnia, M. Poel, Z. Taghikhaki, and P. J. Havinga, “Distributed event detection in wireless sensor networks for disaster management,” in Proc. 2nd Int. Conf. Intell. Netw. Collab. Syst., 2010, pp. 507–512.
[28] M. Kim and M.-G. Park, “Bayesian statistical modeling of system energy saving effectiveness for MAC protocols of wireless sensor networks,” in Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, vol. 209. Berlin, Germany: Springer-Verlag, 2009, ser. Studies in Computational Intelligence, pp. 233–245.
[29] Y.-J. Shen and M.-S. Wang, “Broadcast scheduling in wireless sensor networks using fuzzy Hopfield neural network,” Exp. Syst. Appl., vol. 34, no. 2, pp. 900–907, Feb. 2008.
[30] R. V. Kulkarni and G. K. Venayagamoorthy, “Neural network based secure media access control protocol for wireless sensor networks,” in Proc. IJCNN, 2009, pp. 3437–3444.
[31] D. Janakiram, V. Adi Mallikarjuna Reddy, and A. Phani Kumar, “Outlier detection in wireless sensor networks using Bayesian belief networks,” in Proc. 1st Int. Conf. Commun. Syst. Softw. Middleware, 2006, pp. 1–6.
[32] J. W. Branch, C. Giannella, B. Szymanski, R. Wolff, and H. Kargupta, “In-network outlier detection in wireless sensor networks,” Knowl. Inf. Syst., vol. 34, no. 1, pp. 23–54, Jan. 2013.
[33] S. Kaplantzis, A. Shilton, N. Mani, and Y. Sekercioglu, “Detecting selective forwarding attacks in wireless sensor networks using support vector machines,” in Proc. 3rd Int. Conf. Intell. Sensors, Sensor Netw. Inf., 2007, pp. 335–340.
[34] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. Bezdek, “Quarter sphere based distributed anomaly detection in wireless sensor networks,” in Proc. IEEE Int. Conf. Commun., 2007, pp. 3864–3869.
[35] A. Snow, P. Rastogi, and G. Weckman, “Assessing dependability of wireless networks using neural networks,” in Proc. IEEE Military Commun. Conf., 2005, vol. 5, pp. 2809–2815.
[36] A. Moustapha and R. Selmic, “Wireless sensor network modeling using modified recurrent neural networks: Application to fault detection,” IEEE Trans. Instrum. Meas., vol. 57, no. 5, pp. 981–988, May 2008.


[37] Y. Wang, M. Martonosi, and L.-S. Peh, “Predicting link quality using supervised learning in wireless sensor networks,” ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 11, no. 3, pp. 71–83, Jul. 2007.
[38] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is ‘nearest neighbor’ meaningful?” in Database Theory. Berlin, Germany: Springer-Verlag, 1999, pp. 217–235.
[39] T. O. Ayodele, “Types of machine learning algorithms,” in New Advances in Machine Learning. Rijeka, Croatia: InTech, 2010.
[40] S. R. Safavian and D. Landgrebe, “A survey of decision tree classifier methodology,” IEEE Trans. Syst., Man Cybern., vol. 21, no. 3, pp. 660–674, May/Jun. 1991.
[41] R. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, Apr. 1987.
[42] W. Dargie and C. Poellabauer, Localization. Hoboken, NJ, USA: Wiley, 2010, pp. 249–266.
[43] T. Kohonen, Self-Organizing Maps, vol. 30. Berlin, Germany: Springer-Verlag, 2001, ser. Springer Series in Information Sciences.
[44] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.
[45] I. Steinwart and A. Christmann, Support Vector Machines. New York, NY, USA: Springer-Verlag, 2008.
[46] Z. Yang, N. Meratnia, and P. Havinga, “An online outlier detection technique for wireless sensor networks using unsupervised quarter-sphere support vector machine,” in Proc. Int. Conf. Intell. Sensors, Sensor Netw. Inf. Process., 2008, pp. 151–156.
[47] Y. Chen, Y. Qin, Y. Xiang, J. Zhong, and X. Jiao, “Intrusion detection system based on immune algorithm and support vector machine in wireless sensor network,” in Information and Automation, vol. 86. Berlin, Germany: Springer-Verlag, 2011, ser. Communications in Computer and Information Science, pp. 372–376.
[48] Y. Zhang, N. Meratnia, and P. J. Havinga, “Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine,” Ad Hoc Netw., vol. 11, no. 3, pp. 1062–1074, May 2013.
[49] W. Kim, J. Park, and H. Kim, “Target localization using ensemble support vector regression in wireless sensor networks,” in Proc. Wireless Commun. Netw. Conf., 2010, pp. 1–5.
[50] D. Tran and T. Nguyen, “Localization in wireless sensor networks based on support vector machines,” IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 7, pp. 981–994, Jul. 2008.
[51] B. Yang, J. Yang, J. Xu, and D. Yang, “Area localization algorithm for mobile nodes in wireless sensor networks based on support vector machines,” in Mobile Ad-Hoc and Sensor Networks. Berlin, Germany: Springer-Verlag, 2007, pp. 561–571.
[52] G. E. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, vol. 40. Hoboken, NJ, USA: Wiley, 2011.
[53] C. E. Rasmussen, “Gaussian processes for machine learning,” in Adaptive Computation and Machine Learning. Cambridge, MA, USA: MIT Press, 2006, Citeseer.
[54] S. Lee and T. Chung, “Data aggregation for wireless sensor networks using self-organizing map,” in Artificial Intelligence and Simulation, vol. 3397. Berlin, Germany: Springer-Verlag, 2005, ser. Lecture Notes in Computer Science, pp. 508–517.
[55] R. Masiero et al., “Data acquisition through joint compressive sensing and principal component analysis,” in Proc. IEEE Global Telecommun. Conf., 2009, pp. 1–6.
[56] R. Masiero, G. Quer, M. Rossi, and M. Zorzi, “A Bayesian analysis of compressive sensing data recovery in wireless sensor networks,” in Proc. Int. Conf. Ultra Modern Telecommun. Workshops, 2009, pp. 1–6.
[57] A. Rooshenas, H. Rabiee, A. Movaghar, and M. Naderi, “Reducing the data transmission in wireless sensor networks using the principal component analysis,” in Proc. 6th Int. Conf. Intell. Sensors, Sensor Netw. Inf. Process., 2010, pp. 133–138.
[58] S. Macua, P. Belanovic, and S. Zazo, “Consensus-based distributed principal component analysis in wireless sensor networks,” in Proc. IEEE 11th Int. Workshop Signal Process. Adv. Wireless Commun., 2010, pp. 1–5.
[59] Y.-C. Tseng, Y.-C. Wang, K.-Y. Cheng, and Y.-Y. Hsieh, “iMouse: An integrated mobile surveillance and wireless sensor system,” Computer, vol. 40, no. 6, pp. 60–66, Jun. 2007.
[60] D. Li, K. Wong, Y. H. Hu, and A. Sayeed, “Detection, classification, and tracking of targets,” IEEE Signal Process. Mag., vol. 19, no. 2, pp. 17–29, Mar. 2002.
[61] T. Kanungo et al., “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, Jul. 2002.
[62] I. T. Jolliffe, Principal Component Analysis. New York, NY, USA: Springer-Verlag, 2002.
[63] D. Feldman et al., “Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering,” in Proc. SODA, 2013, pp. 1434–1453.
[64] C. Watkins and P. Dayan, “Q-learning,” Mach. Learn., vol. 8, no. 3/4, pp. 279–292, May 1992.
[65] R. Sun, S. Tatsumi, and G. Zhao, “Q-MAP: A novel multicast routing method in wireless ad hoc networks with multiagent reinforcement learning,” in Proc. IEEE Region 10 Conf. Comput., Commun., Control Power Eng., 2002, vol. 1, pp. 667–670.
[66] S. Dong, P. Agrawal, and K. Sivalingam, “Reinforcement learning based geographic routing protocol for UWB wireless sensor network,” in Proc. IEEE Global Telecommun. Conf., 2007, pp. 652–656.
[67] A. Förster and A. Murphy, “FROMS: Feedback routing for optimizing multiple sinks in WSN with reinforcement learning,” in Proc. 3rd Int. Conf. Intell. Sensors, Sensor Netw. Inf., 2007, pp. 371–376.
[68] R. Arroyo-Valles, R. Alaiz-Rodriguez, A. Guerrero-Curieses, and J. Cid-Sueiro, “Q-probabilistic routing in wireless sensor networks,” in Proc. 3rd Int. Conf. Intell. Sensors, Sensor Netw. Inf., 2007, pp. 1–6.
[69] C. Guestrin, P. Bodik, R. Thibaux, M. Paskin, and S. Madden, “Distributed regression: An efficient framework for modeling sensor network data,” in Proc. 3rd Int. Symp. Inf. Process. Sensor Netw., 2004, pp. 1–10.
[70] J. Barbancho, C. León, F. Molina, and A. Barbancho, “A new QoS routing algorithm based on self-organizing maps for wireless sensor networks,” Telecommun. Syst., vol. 36, no. 1–3, pp. 73–83, Nov. 2007.
[71] B. Schölkopf and A. J. Smola, Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2001.
[72] J. Kivinen, A. Smola, and R. Williamson, “Online learning with kernels,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2165–2176, Aug. 2004.
[73] G. Aiello and G. Rogerson, “Ultra-wideband wireless systems,” IEEE Microw. Mag., vol. 4, no. 2, pp. 36–47, Jun. 2003.
[74] R. Rajagopalan and P. Varshney, “Data-aggregation techniques in sensor networks: A survey,” IEEE Commun. Surveys Tuts., vol. 8, no. 4, pp. 48–63, 2006.
[75] G. Crosby, N. Pissinou, and J. Gadze, “A framework for trust-based cluster head election in wireless sensor networks,” in Proc. 2nd IEEE Workshop Dependability Security Sensor Netw. Syst., 2006, pp. 10–22.
[76] J.-M. Kim, S.-H. Park, Y.-J. Han, and T.-M. Chung, “CHEF: Cluster head election mechanism using fuzzy logic in wireless sensor networks,” in Proc. 10th Int. Conf. Adv. Commun. Technol., 2008, vol. 1, pp. 654–659.
[77] S. Soro and W. Heinzelman, “Prolonging the lifetime of wireless sensor networks via unequal clustering,” in Proc. 19th IEEE Int. Parallel Distrib. Process. Symp., 2005, pp. 4–8.
[78] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Comput. Commun., vol. 30, no. 14/15, pp. 2826–2841, Oct. 2007.
[79] H. He, Z. Zhu, and E. Makinen, “A neural network model to minimize the connected dominating set for self-configuration of wireless sensor networks,” IEEE Trans. Neural Netw., vol. 20, no. 6, pp. 973–982, Jun. 2009.
[80] G. Ahmed, N. M. Khan, Z. Khalid, and R. Ramer, “Cluster head selection using decision trees for wireless sensor networks,” in Proc. Int. Conf. Intell. Sensors, Sensor Netw. Inf. Process., 2008, pp. 173–178.
[81] E. Ertin, “Gaussian process models for censored sensor readings,” in Proc. IEEE/SP 14th Workshop Stat. Signal Process., 2007, pp. 665–669.
[82] J. Kho, A. Rogers, and N. R. Jennings, “Decentralized control of adaptive sampling in wireless sensor networks,” ACM Trans. Sensor Netw., vol. 5, no. 3, pp. 19:1–19:35, May 2009.
[83] S. Lin, V. Kalogeraki, D. Gunopulos, and S. Lonardi, “Online information compression in sensor networks,” in Proc. IEEE Int. Conf. Commun., 2006, vol. 7, pp. 3371–3376.
[84] C. Fenxiong, L. Mingming, W. Dianhong, and T. Bo, “Data compression through principal component analysis over wireless sensor networks,” J. Comput. Inf. Syst., vol. 9, no. 5, pp. 1809–1816, 2013.
[85] A. Förster and A. Murphy, “CLIQUE: Role-free clustering with q-learning for wireless sensor networks,” in Proc. 29th IEEE Int. Conf. Distrib. Comput. Syst., 2009, pp. 441–449.
[86] M. Mihaylov, K. Tuyls, and A. Nowe, “Decentralized learning in wireless sensor networks,” in Adaptive and Learning Agents, vol. 5924. Berlin, Germany: Springer-Verlag, 2010, ser. Lecture Notes in Computer Science, pp. 60–73.
[87] W. B. Heinzelman, “Application-specific protocol architectures for wireless networks,” Ph.D. dissertation, MIT, Cambridge, MA, USA, 2000.


[88] M. Duarte and Y. Eldar, “Structured compressed sensing: From theory to applications,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4053–4085, Sep. 2011.
[89] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Stat. Soc. Ser. B, Methodol., vol. 39, no. 1, pp. 1–38, 1977.
[90] M. H. DeGroot, “Reaching a consensus,” J. Amer. Stat. Assoc., vol. 69, no. 345, pp. 118–121, Mar. 1974.
[91] B. Krishnamachari and S. Iyengar, “Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks,” IEEE Trans. Comput., vol. 53, no. 3, pp. 241–250, Mar. 2004.
[92] P. Zappi et al., “Activity recognition from on-body sensors: Accuracy-power trade-off by dynamic sensor selection,” in Wireless Sensor Networks. Berlin, Germany: Springer-Verlag, 2008, pp. 17–33.
[93] H. Malik, A. Malik, and C. Roy, “A methodology to optimize query in wireless sensor networks using historical data,” J. Ambient Intell. Humanized Comput., vol. 2, no. 3, pp. 227–238, Sep. 2011.
[94] Q. Chen, K.-Y. Lam, and P. Fan, “Comments on ‘Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks’,” IEEE Trans. Comput., vol. 54, no. 9, pp. 1182–1183, Sep. 2005.
[95] K. Sha, W. Shi, and O. Watkins, “Using wireless sensor networks for fire rescue applications: Requirements and challenges,” in Proc. IEEE Int. Conf. Electro/Inf. Technol., 2006, pp. 239–244.
[96] H. Liu, H. Darabi, P. Banerjee, and J. Liu, “Survey of wireless indoor positioning techniques and systems,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, pp. 1067–1080, Nov. 2007.
[97] J. Wang, R. Ghosh, and S. Das, “A survey on sensor localization,” J. Control Theory Appl., vol. 8, no. 1, pp. 2–11, Jan. 2010.
[98] A. Nasipuri and K. Li, “A directionality based location discovery scheme for wireless sensor networks,” in Proc. 1st ACM Int. Workshop Wireless Sensor Netw. Appl., 2002, pp. 105–111.
[99] S. Yun, J. Lee, W. Chung, E. Kim, and S. Kim, “A soft computing approach to localization in wireless sensor networks,” Exp. Syst. Appl., vol. 36, no. 4, pp. 7552–7561, May 2009.
[100] S. Chagas, J. Martins, and L. de Oliveira, “An approach to localization scheme of wireless sensor networks based on artificial neural networks and genetic algorithms,” in Proc. IEEE 10th Int. Conf. New Circuits Syst., 2012, pp. 137–140.
[101] Z. Merhi, M. Elgamel, and M. Bayoumi, “A lightweight collaborative fault tolerant target localization system for wireless sensor networks,” IEEE Trans. Mobile Comput., vol. 8, no. 12, pp. 1690–1704, Dec. 2009.
[102] E. Cayirci, H. Tezcan, Y. Dogan, and V. Coskun, “Wireless sensor networks for underwater surveillance systems,” Ad Hoc Netw., vol. 4, no. 4, pp. 431–446, Jul. 2006.
[103] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies,” J. Mach. Learn. Res., vol. 9, pp. 235–284, Jun. 2008.
[104] D. Gu and H. Hu, “Spatial Gaussian process regression with mobile sensor networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 8, pp. 1279–1290, Aug. 2012.
[105] L. Paladina, M. Paone, G. Iellamo, and A. Puliafito, “Self organizing maps for distributed localization in wireless sensor networks,” in Proc. 12th IEEE Symp. Comput. Commun., 2007, p. 1113.
[106] G. Giorgetti, S. K. S. Gupta, and G. Manes, “Wireless localization using self-organizing maps,” in Proc. 6th Int. Conf. IPSN, 2007, pp. 293–302.
[107] J. Hu and G. Lee, “Distributed localization of wireless sensor networks using self-organizing maps,” in Proc. IEEE Int. Conf. Multisens. Fusion Integr. Intell. Syst., 2008, pp. 284–289.
[108] S. Li, X. Kong, and D. Lowe, “Dynamic path determination of mobile beacons employing reinforcement learning for wireless sensor localization,” in Proc. 26th Int. Conf. Adv. Inf. Netw. Appl. Workshops, 2012, pp. 760–765.
[109] C. Musso, N. Oudjane, and F. Le Gland, “Improving regularised particle filters,” in Sequential Monte Carlo Methods in Practice. New York, NY, USA: Springer-Verlag, 2001, pp. 247–271.
[110] Y.-X. Wang and Y.-J. Zhang, “Non-negative matrix factorization: A comprehensive review,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 6, pp. 1336–1353, Jun. 2013.
[111] H.-P. Tan, R. Diamant, W. K. Seah, and M. Waldmeyer, “A survey of techniques and challenges in underwater localization,” Ocean Eng., vol. 38, no. 14/15, pp. 1663–1676, Oct. 2011.
[112] Y. Chu, P. Mitchell, and D. Grace, “ALOHA and q-learning based medium access control for wireless sensor networks,” in Proc. Int. Symp. Wireless Commun. Syst., 2012, pp. 511–515.
[113] A. Bachir, M. Dohler, T. Watteyne, and K. K. Leung, “MAC essentials for wireless sensor networks,” IEEE Commun. Surveys Tuts., vol. 12, no. 2, pp. 222–248, Apr. 2010.
[114] Z. Liu and I. Elhanany, “RL-MAC: A reinforcement learning based MAC protocol for wireless sensor networks,” Int. J. Sensor Netw., vol. 1, no. 3/4, pp. 117–124, Sep. 2006.
[115] M. Sha et al., “Self-adapting MAC layer for wireless sensor networks,” Washington Univ. St. Louis, St. Louis, MO, USA, Tech. Rep. WUCSE-2013-75, 2013.
[116] W. Ye, J. Heidemann, and D. Estrin, “An energy-efficient MAC protocol for wireless sensor networks,” in Proc. 21st Annu. Joint Conf. IEEE Comput. Commun. Soc., 2002, vol. 3, pp. 1567–1576.
[117] T. van Dam and K. Langendoen, “An adaptive energy-efficient MAC protocol for wireless sensor networks,” in Proc. 1st Int. Conf. Embedded Netw. SenSys, 2003, pp. 171–180.
[118] K. Klues, G. Hackmann, O. Chipara, and C. Lu, “A component-based architecture for power-efficient media access control in wireless sensor networks,” in Proc. 5th Int. Conf. Embedded Netw. Sensor Syst., 2007, pp. 59–72.
[119] C. Doerr et al., “MultiMAC—An adaptive MAC framework for dynamic radio networking,” in Proc. IEEE Int. Symp. New Frontiers Dyn. Spectr. Access Netw., 2005, pp. 548–555.
[120] D. Moss and P. Levis, “BoX-MACs: Exploiting physical and link layer boundaries in low-power networking,” Comput. Syst. Lab. Stanford Univ., Stanford, CA, USA, 2008.
[121] Y. Sun, O. Gurewitz, and D. B. Johnson, “RI-MAC: A receiver-initiated asynchronous duty cycle MAC protocol for dynamic traffic loads in wireless sensor networks,” in Proc. 6th ACM Conf. Embedded Netw. Sensor Syst., 2008, pp. 1–14.
[122] Z. Alliance, “Zigbee-2007 specification,” 2007. [Online]. Available: http://www.zigbee.org/Specifications/ZigBee/Overview.aspx
[123] T. Avram, S. Oh, and S. Hariri, “Analyzing attacks in wireless ad hoc network with self-organizing maps,” in Proc. 5th Annu. Conf. Commun. Netw. Serv. Res., 2007, pp. 166–175.
[124] L. N. De Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach. London, U.K.: Springer-Verlag, 2002.
[125] G. J. Pottie and A. Pandya, Quality of Service in Wireless Sensor Networks. Hoboken, NJ, USA: Wiley, 2008, pp. 401–435.
[126] D. Chen and P. K. Varshney, “QoS support in wireless sensor networks: A survey,” in Proc. Int. Conf. Wireless Netw., 2004, pp. 227–233.
[127] M. A. Osborne, S. J. Roberts, A. Rogers, S. D. Ramchurn, and N. R. Jennings, “Towards real-time information processing of sensor network data using computationally efficient multi-output Gaussian processes,” in Proc. 7th Int. Conf. Inf. Process. Sensor Netw., 2008, pp. 109–120.
[128] N. Ouferhat and A. Mellouk, “A QoS scheduler packets for wireless sensor networks,” in Proc. IEEE/ACS Int. Conf. Comput. Syst. Appl., 2007, pp. 211–216.
[129] M. Seah, C.-K. Tham, V. Srinivasan, and A. Xin, “Achieving coverage through distributed reinforcement learning in wireless sensor networks,” in Proc. 3rd Int. Conf. Intell. Sensors, Sensor Netw. Inf., 2007, pp. 425–430.
[130] R. Hsu, C.-T. Liu, K.-C. Wang, and W.-M. Lee, “QoS-aware power management for energy harvesting wireless sensor network utilizing reinforcement learning,” in Proc. Int. Conf. Comput. Sci. Eng., 2009, vol. 2, pp. 537–542.
[131] X. Liang, M. Chen, Y. Xiao, I. Balasingham, and V. C. M. Leung, “A novel cooperative communication protocol for QoS provisioning in wireless sensor networks,” in Proc. 5th Int. Conf. Testbeds Res. Infrastr. Develop. Netw. Comm. Workshops, 2009, pp. 1–6.
[132] N. Baccour et al., “Radio link quality estimation in wireless sensor networks: A survey,” ACM Trans. Sensor Netw., vol. 8, no. 4, p. 34, Sep. 2012.
[133] A. Woo, T. Tong, and D. Culler, “Taming the underlying challenges of reliable multihop routing in sensor networks,” in Proc. 1st Int. Conf. Embedded Netw. SenSys, 2003, pp. 14–27.
[134] K. Shah and M. Kumar, “Distributed Independent Reinforcement Learning (DIRL) approach to resource management in wireless sensor networks,” in Proc. IEEE Int. Conf. Mobile Adhoc Sensor Syst., 2007, pp. 1–9.
[135] A. Mainwaring, D. Culler, J. Polastre, R. Szewczyk, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proc. 1st ACM Int. Workshop Wireless Sensor Netw. Appl., 2002, pp. 88–97.
[136] E. Nadimi, H. T. Søgaard, and T. Bak, “Zigbee-based wireless sensor networks for classifying the behaviour of a herd of animals using classification trees,” Biosyst. Eng., vol. 100, no. 2, pp. 167–176, Jun. 2008.


[137] L. Paladina, A. Biundo, M. Scarpa, and A. Puliafito, “Self organizing maps for synchronization in wireless sensor networks,” in Proc. New Technol., Mobility Security, 2008, pp. 1–6.
[138] O. Postolache, J. Pereira, and P. Girao, “Smart sensors network for air quality monitoring applications,” IEEE Trans. Instrum. Meas., vol. 58, no. 9, pp. 3253–3262, Apr. 2009.
[139] Y. Gao, Y. Lin, and Y. Sun, “A wireless sensor network based on the novel concept of an I-matrix to achieve high-precision lighting control,” Building Environ., vol. 70, pp. 223–231, Dec. 2013.
[140] N. Kimura and S. Latifi, “A survey on data compression in wireless sensor networks,” in Proc. Int. Conf. Inf. Technol., Coding Comput., 2005, vol. 2, pp. 8–13.
[141] K. C. Barr and K. Asanović, “Energy-aware lossless data compression,” ACM Trans. Comput. Syst., vol. 24, no. 3, pp. 250–291, Aug. 2006.
[142] J. Haupt, W. Bajwa, M. Rabbat, and R. Nowak, “Compressed sensing for networked data,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 92–101, Mar. 2008.
[143] J. Luo, L. Xiang, and C. Rosenberg, “Does compressed sensing improve the throughput of wireless sensor networks?” in Proc. IEEE Int. Conf. Commun., 2010, pp. 1–6.
[144] S. Feizi, M. Medard, and M. Effros, “Compressive sensing over networks,” in Proc. 48th Annu. Allerton Conf. Commu., Control, Comput., 2010, pp. 1129–1136.
[145] J. B. Predd, S. Kulkarni, and H. V. Poor, “Distributed learning in wireless sensor networks,” IEEE Signal Process. Mag., vol. 23, no. 4, pp. 56–69, Jul. 2006.
[146] K. Crammer, A. Kulesza, and M. Dredze, “Adaptive regularization of weight vectors,” Mach. Learn., vol. 91, no. 2, pp. 155–187, May 2013.
[147] L. Yang, R. Jin, and J. Ye, “Online learning by ellipsoid method,” in Proc. 26th Annu. ICML, 2009, pp. 1153–1160.
[148] J. Wang, P. Zhao, and S. C. Hoi, “Exact soft confidence-weighted learning,” in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 121–128.
[149] J. H. Kotecha, V. Ramachandran, and A. Sayeed, “Distributed multitarget classification in wireless sensor networks,” IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 703–713, Apr. 2005.
[150] X. Jiang et al., “An architecture for energy management in wireless sensor networks,” SIGBED Rev., vol. 4, no. 3, pp. 31–36, Jul. 2007.
[151] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases,” ACM SIGMOD Rec., vol. 25, no. 2, pp. 103–114, Jun. 1996.
[152] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 1998, pp. 73–84.

Mohammad Abu Alsheikh received the B.S. degree in computer systems engineering from Birzeit University, Birzeit, Palestine, in 2011. Between 2010 and 2012, he was a Software Engineer for industrial projects and solutions. He is currently working toward the Ph.D. degree in the School of Computer Engineering, Nanyang Technological University, Singapore. His current research interests include exploring new trends and uses of machine learning to enhance wireless sensor networks’ operations and protocols.

Shaowei Lin was born in Singapore, in 1981. He received the B.Sc. degree (with honors) in mathematics from Stanford University, Stanford, CA, USA, in 2005. He then worked on MIMO communications with the Agency for Science, Technology and Research (A*STAR) Institute for Infocomm Research (I2R), Singapore. From 2006 to 2011, he studied algebraic statistics under Bernd Sturmfels at the University of California, Berkeley, CA, USA, and received the Ph.D. degree in mathematics. His thesis was titled “Algebraic Methods for Evaluating Integrals in Bayesian Statistics.” After that, he completed a one-year collaboration with the artificial intelligence laboratory headed by Andrew Ng at Stanford University to explore mathematical challenges in deep learning. He is currently a Scientist with the Sense and Sense-abilities Programme, A*STAR I2R, Singapore, where he uses machine learning principles to design algorithms and protocols for wireless sensor networks. His research interests include algebraic geometry, asymptotic theory, singular learning theory, deep learning, and neural networks.

Dusit Niyato (M’08) is currently an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. He received the B.E. degree from King Mongkut’s Institute of Technology Ladkrabang (KMITL), Bangkok, Thailand, in 1999. He received the Ph.D. degree in electrical and computer engineering from the University of Manitoba, Winnipeg, MB, Canada, in 2008. His research interests are in the area of radio resource management in cognitive radio networks and energy harvesting for wireless communication.

Hwee-Pink Tan (S’00–M’04–SM’14) is currently a Senior Scientist with the Institute for Infocomm Research (I2R), A*STAR, and is also the SERC Programme Manager for the A*STAR Sense and Sense-abilities Programme, where he leads a team of 30 full-time research scientists and engineers. He received the Ph.D. degree from the Technion, Israel Institute of Technology, Haifa, Israel, in August 2004. In December 2004, he was a recipient of the A*STAR International Postdoctoral Fellowship. From December 2004 to June 2006, he was a Postdoctoral Researcher with EURANDOM, Eindhoven University of Technology, Eindhoven, The Netherlands. Between July 2006 and March 2008, he was a Research Fellow with The Telecommunications Research Centre (CTVR), Trinity College Dublin, Ireland. His research has focused on the design, modeling, and performance evaluation of networking protocols for wireless networks, and his current research interests include underwater acoustic sensor networks, wireless sensor networks powered by ambient energy harvesting, and large-scale and heterogeneous sensor networks. He has been a Principal Investigator for several industry-projects in the aforementioned research areas. He has published more than 80 papers and has served on the TPC of numerous conferences and reviewer of papers for many key journals and conferences in the area of wireless networks. In recognition of his contributions toward I2R, he was also awarded the I2R Good Team Player Award in 2010, the Excellent Team Player Award in 2011, and the I2R Role Model Award in 2012 and 2013. In 2014, in recognition of his contributions to A*STAR, he was also awarded the Most Inspiring Mentor Award, the TALENT award, and the Borderless Award.
