
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE/ACM TRANSACTIONS ON NETWORKING 1

Privacy- and Integrity-Preserving Range Queries in Sensor Networks
Fei Chen and Alex X. Liu

Abstract—The architecture of two-tiered sensor networks, where storage nodes serve as an intermediate tier between sensors and a sink for storing data and processing queries, has been widely adopted because of the benefits of power and storage saving for sensors as well as the efficiency of query processing. However, the importance of storage nodes also makes them attractive to attackers. In this paper, we propose SafeQ, a protocol that prevents attackers from gaining information from both sensor collected data and sink issued queries. SafeQ also allows a sink to detect compromised storage nodes when they misbehave. To preserve privacy, SafeQ uses a novel technique to encode both data and queries such that a storage node can correctly process encoded queries over encoded data without knowing their values. To preserve integrity, we propose two schemes, one using Merkle hash trees and another using a new data structure called neighborhood chains, to generate integrity verification information so that a sink can use this information to verify whether the result of a query contains exactly the data items that satisfy the query. To improve performance, we propose an optimization technique using Bloom filters to reduce the communication cost between sensors and storage nodes.

Index Terms—Integrity, privacy, range queries, sensor networks.

Manuscript received July 08, 2010; revised May 25, 2011 and November 30, 2011; accepted January 17, 2012; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor D. Agrawal. This work was supported in part by the National Science Foundation under Grants CNS-0716407, CNS-0916044, and CNS-0845513. The preliminary version of this paper, titled “SafeQ: Secure and Efficient Query Processing in Sensor Networks,” was published in the Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), San Diego, CA, March 15–19, 2010.
F. Chen was with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA. He is now with VMware, Inc., Palo Alto, CA 94304 USA.
A. X. Liu is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: alexliu@cse.msu.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNET.2012.2188540
1063-6692/$31.00 © 2012 IEEE

I. INTRODUCTION

WIRELESS sensor networks (WSNs) have been widely deployed for various applications, such as environment sensing, building safety monitoring, and earthquake prediction. In this paper, we consider a two-tiered sensor network architecture in which storage nodes gather data from nearby sensors and answer queries from the sink of the network. The storage nodes serve as an intermediate tier between the sensors and the sink for storing data and processing queries. Storage nodes bring three main benefits to sensor networks. First, sensors save power by sending all collected data to their closest storage node instead of sending them to the sink through long routes. Second, sensors can be memory-limited because data are mainly stored on storage nodes. Third, query processing becomes more efficient because the sink only communicates with storage nodes for queries. The inclusion of storage nodes in sensor networks was first introduced in [2] and has been widely adopted [3]–[7]. Several products of storage nodes, such as StarGate [8] and RISE [9], are commercially available.

However, the inclusion of storage nodes also brings significant security challenges. Because storage nodes store data received from sensors and play an important role in answering queries, they are more vulnerable to being compromised, especially in a hostile environment. A compromised storage node imposes significant threats to a sensor network. First, the attacker may obtain sensitive data that has been, or will be, stored in the storage node. Second, the compromised storage node may return forged data for a query. Third, this storage node may not include all data items that satisfy the query.

Therefore, we want to design a protocol that prevents attackers from gaining information from both sensor collected data and sink issued queries, which typically can be modeled as range queries, and allows the sink to detect compromised storage nodes when they misbehave. For privacy, compromising a storage node should not allow the attacker to obtain the sensitive information that has been, and will be, stored in the node, as well as the queries that the storage node has received, and will receive. Note that we treat the queries from the sink as confidential because such queries may leak critical information about query issuers’ interests, which need to be protected especially in military applications. For integrity, the sink needs to detect whether a query result from a storage node includes forged data items or does not include all the data that satisfy the query. There are two key challenges in solving the privacy- and integrity-preserving range query problem. First, a storage node needs to correctly process encoded queries over encoded data without knowing their actual values. Second, a sink needs to verify that the result of a query contains all the data items that satisfy the query and does not contain any forged data.

Although important, the privacy- and integrity-preserving range query problem has been underinvestigated. The prior art solution to this problem was proposed by Sheng and Li in their recent seminal work [7]. We call it the “S&L scheme.” This scheme has two main drawbacks: 1) it allows attackers to obtain a reasonable estimate of both sensor collected data and sink issued queries; and 2) the power consumption and storage space for both sensors and storage nodes grow exponentially with the number of dimensions of collected data. In this paper, we propose SafeQ, a novel privacy- and integrity-preserving range query protocol for two-tiered sensor networks. The ideas of SafeQ are fundamentally different from the S&L scheme. To preserve privacy, SafeQ uses a novel technique to encode



both data and queries such that a storage node can correctly process encoded queries over encoded data without knowing their actual values. To preserve integrity, we propose two schemes, one using Merkle hash trees and another using a new data structure called neighborhood chains, to generate integrity verification information such that a sink can use this information to verify whether the result of a query contains exactly the data items that satisfy the query. We also propose an optimization technique using Bloom filters to significantly reduce the communication cost between sensors and storage nodes. Furthermore, we propose a solution to adapt SafeQ for event-driven sensor networks, where a sensor submits data to its nearby storage node only when a certain event happens and the event may occur infrequently.

SafeQ outperforms the state-of-the-art S&L scheme [7] in two aspects. First, SafeQ provides significantly better security and privacy. While prior art allows a compromised storage node to obtain a reasonable estimate of the value of sensor collected data and sink issued queries, SafeQ makes such estimation very difficult. Second, SafeQ delivers orders of magnitude better performance on both power consumption and storage space for multidimensional data. Such data are the most common in practice, because most sensors are equipped with multiple sensing modules such as temperature, humidity, and pressure.

We performed a side-by-side comparison with prior art over a large real-world data set from Intel Lab [10]. Our results show that the power and space savings of SafeQ over prior art grow exponentially with the number of dimensions. For power consumption, for three-dimensional data, SafeQ consumes 184.9 times less power for sensors and 76.8 times less power for storage nodes. For space consumption on storage nodes, for three-dimensional data, SafeQ uses 182.4 times less space. Our experimental results agree with the analysis that the power and space consumption in the S&L scheme grow exponentially with the number of dimensions, whereas those in SafeQ grow linearly with the number of dimensions times the number of data items.

II. RELATED WORK

A. Privacy and Integrity Preserving in WSNs

Privacy- and integrity-preserving range queries in WSNs have drawn attention recently [7], [11], [12]. Sheng and Li proposed a scheme to preserve the privacy and integrity of range queries in sensor networks [7]. This scheme uses the bucket-partitioning idea proposed by Hacigumus et al. in [13] for database privacy. The basic idea is to divide the domain of data values into multiple buckets, the size of which is computed based on the distribution of data values and the location of sensors. In each time-slot, a sensor collects data items from the environment, places them into buckets, encrypts them together in each bucket, and then sends each encrypted bucket along with its bucket ID to a nearby storage node. For each bucket that has no data items, the sensor sends an encoding number, which can be used by the sink to verify that the bucket is empty, to a nearby storage node. When the sink wants to perform a range query, it finds the smallest set of bucket IDs that contains the range in the query, then sends the set as the query to storage nodes. Upon receiving the bucket IDs, the storage node returns the corresponding encrypted data in all those buckets. The sink can then decrypt the encrypted buckets and verify the integrity using encoding numbers. The S&L scheme only considered one-dimensional data in [7], and it can be extended to handle multidimensional data by dividing the domain of each dimension into multiple buckets.

The S&L scheme has two main drawbacks inherited from the bucket-partitioning technique. First, as pointed out in [14], the bucket-partitioning technique allows compromised storage nodes to obtain a reasonable estimate of the actual value of both data items and queries. In SafeQ, such estimations are very difficult. Second, for multidimensional data, the power consumption of both sensors and storage nodes, as well as the space consumption of storage nodes, increases exponentially with the number of dimensions due to the exponential increase of the number of buckets. In SafeQ, power and space consumption increases linearly with the number of dimensions times the number of data items.

Shi et al. proposed an optimized version of S&L’s integrity-preserving scheme aiming to reduce the communication cost between sensors and storage nodes [11], [12]. The basic idea of their optimization is that each sensor uses a bit map to represent which buckets have data and broadcasts its bit map to the nearby sensors. Each sensor attaches the bit maps received from others to its own data items and encrypts them together. The sink verifies query result integrity for a sensor by examining the bit maps from its nearby sensors. In our experiments, we did not choose the solutions in [11] and [12] for side-by-side comparison for two reasons. First, the techniques used in [11] and [12] are similar to the S&L scheme except for the optimization for integrity verification. The way they extend the S&L scheme to handle multidimensional data is to divide the domain of each dimension into multiple buckets. They therefore inherit the S&L scheme’s weakness of allowing compromised storage nodes to estimate the values of data items and queries. Second, their optimization technique allows a compromised sensor to easily compromise the integrity verification functionality of the network by sending falsified bit maps to sensors and storage nodes. In contrast, in the S&L scheme and in our schemes, a compromised sensor cannot jeopardize the querying and verification of data collected by other sensors.

B. Privacy Preserving in Databases

Database privacy has been studied in prior work [13]–[17]. Hacigumus et al. first proposed the bucket-partitioning idea for querying encrypted data in the database-as-service model (DAS), where sensitive data are outsourced to an untrusted server [13]. Agrawal et al. further used the bucket-partitioning idea to investigate range queries on numerical data [15]. Hore et al. explored the optimal partitioning of buckets [14]. However, these schemes have the same two drawbacks discussed above. Boneh and Waters proposed a public-key system for supporting conjunctive, subset, and range queries on encrypted data [18]. Although this seems possible in theory, Boneh and Waters’s scheme cannot be used to solve our privacy problem because it is too expensive for sensor networks. It would require a sensor to perform $d|T|$ encryptions for each data submission, where $d$ is the number of dimensions and $|T|$ is the domain size (i.e., the number of all possible values) of each dimension. Here, $|T|$ could be large, and each encryption is expensive due to the use of public key cryptography.
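The bucket-partitioning idea shared by the S&L scheme [7] and the database work above [13], [15] can be illustrated with a minimal Python sketch of its data-side and query-side logic. Items are grouped by bucket, a range query becomes the smallest set of bucket IDs covering it, and the storage node returns whole buckets. The equal-width boundaries in BOUNDS and all function names are hypothetical. Real schemes size buckets from the data distribution (and, in [7], from sensor locations) and encrypt bucket contents, and this sketch leaves both out.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries: bucket k covers [BOUNDS[k], BOUNDS[k + 1]).
# Real S&L buckets are sized from the data distribution; these are examples only.
BOUNDS = [0, 10, 20, 30, 40, 50]


def bucket_of(value: int) -> int:
    """Return the ID of the bucket that contains value."""
    if not BOUNDS[0] <= value < BOUNDS[-1]:
        raise ValueError("value outside the data domain")
    return bisect_right(BOUNDS, value) - 1


def partition(items):
    """Sensor side: group data items by bucket ID (encryption omitted)."""
    buckets = {}
    for v in items:
        buckets.setdefault(bucket_of(v), []).append(v)
    return buckets


def covering_buckets(a: int, b: int) -> list:
    """Sink side: smallest set of bucket IDs whose union contains [a, b]."""
    return list(range(bucket_of(a), bucket_of(b) + 1))


def answer(buckets, ids):
    """Storage-node side: return every item in the requested buckets."""
    return sorted(v for i in ids for v in buckets.get(i, []))


if __name__ == "__main__":
    stored = partition([3, 12, 15, 27, 44])
    ids = covering_buckets(11, 16)
    print(ids)                  # [1]
    print(answer(stored, ids))  # [12, 15]
```

A query for [11, 13] would return 15 as well, so the sink must filter after decrypting. The requested bucket IDs also tell the storage node which coarse interval the query falls in. This is roughly the kind of leakage that lets a compromised storage node estimate values, as noted in [14].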

Fig. 1. Architecture of two-tiered sensor networks.

TABLE I
SUMMARY OF NOTATION

C. Integrity Preserving in Databases

Database integrity has also been explored in prior work [19]–[24], independent of the privacy issues. This work focuses on verifying the completeness of the result of relational database queries. Merkle hash trees have been used for the authentication of data elements [25], and they were used for verifying the integrity of database queries in [19] and [20]. Pang et al. [21] and Narasimha and Tsudik [22] proposed similar schemes for verifying the integrity of relational database query results using signature aggregation and chaining. For each tuple in a database, Pang et al. computed the signature of the tuple by signing the concatenation of the digests of the tuple itself as well as the tuple’s left and right neighbors [21]. Narasimha and Tsudik computed the signature by signing the concatenation of the digests of the tuple and its left neighbors along each dimension [22]. Although our neighborhood chaining technique seems similar to the above signature aggregation and chaining technique, it is much more efficient and better suited to sensor networks. First, our technique concatenates a data item with its left neighbor without computing their digests. Second, our technique does not compute signatures, which require the use of computationally expensive public key cryptography.
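SafeQ's first integrity scheme (Section V-A) also builds on Merkle hash trees [25], so a minimal Python sketch of the construction may help. Each nonterminal node is the hash of the concatenation of its children's values. A node without a sibling is promoted unchanged until it finds one. SHA-256 from hashlib stands in for the MD5/SHA-1 examples the text gives. Leaves are arbitrary byte strings; in SafeQ these would be the encrypted data items. Both choices are assumptions of this sketch, which also omits proof generation and verification.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function (SHA-256 assumed; the text names MD5 and SHA-1)."""
    return hashlib.sha256(data).digest()


def merkle_root(items: list) -> bytes:
    """Root of a Merkle hash tree over a non-empty list of byte strings.

    Terminal nodes are h(item); each nonterminal node is h(left || right).
    A node with no sibling at its level is promoted unchanged, so the tree
    also works when len(items) is not a power of 2.
    """
    if not items:
        raise ValueError("need at least one data item")
    level = [h(x) for x in items]
    while len(level) > 1:
        parents = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            parents.append(level[-1])  # promote the unpaired node unchanged
        level = parents
    return level[0]


if __name__ == "__main__":
    # Hypothetical stand-ins for three encrypted data items.
    c = [b"c1", b"c2", b"c3"]
    root = merkle_root(c)
    assert root == h(h(h(c[0]) + h(c[1])) + h(c[2]))
    # Altering any item changes the root (barring a hash collision).
    assert merkle_root([b"c1", b"cX", b"c3"]) != root
    print(root.hex())
```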
Chen et al. proposed canonical range trees (CRTs) to store the counting information for multidimensional data such that this counting information can be used for integrity verification without leaking boundary information [24]. However, protecting boundary information is unnecessary in our context because the sink can access all data collected by sensors. Therefore, paying the price for protecting boundary information is unnecessary. Chen’s solution requires each sensor to compute and send an encrypted multidimensional CRT to a storage node, with an overhead that depends on the number of data items $n$. Therefore, it incurs too much communication cost between sensors and storage nodes.

D. Secure File Systems on Untrusted Servers

Secure file systems on untrusted servers have been studied in prior work (e.g., [26] and [27]), which aims to design a system where users can store their files on an untrusted server and the server cannot read the content of the files. These solutions cannot solve our secure range query problem because, in such work, the untrusted server is not able to process queries over the files. In contrast, processing queries in a privacy-preserving manner at storage nodes is our main design goal for SafeQ.

III. MODELS AND PROBLEM STATEMENT

A. System Model

We consider two-tiered sensor networks as illustrated in Fig. 1. A two-tiered sensor network consists of three types of nodes: sensors, storage nodes, and a sink. Sensors are inexpensive sensing devices with limited storage and computing power. They are often massively distributed in a field for collecting physical or environmental data, e.g., temperature. Storage nodes are powerful wireless devices that are equipped with much more storage capacity and computing power than sensors. Each sensor periodically sends collected data to its nearby storage node. The sink is the point of contact for users of the sensor network. Each time the sink receives a question from a user, it first translates the question into multiple queries and then disseminates the queries to the corresponding storage nodes, which process the queries based on their data and return the query results to the sink. The sink unifies the query results from multiple storage nodes into the final answer and sends it back to the user.

We assume that sensors and storage nodes are loosely synchronized with the sink. With loose synchronization, we divide time into fixed-duration intervals, and every sensor collects data once per time interval. From a starting time that all sensors and the sink agree upon, every $n$ time intervals form a time-slot. From the same starting time, after a sensor collects data $n$ times, it sends a message that contains a 3-tuple $(i, t, \{d_1, \ldots, d_n\})$, where $i$ is the sensor ID and $t$ is the sequence number of the time-slot in which the $n$ data items are collected by sensor $s_i$. We address privacy- and integrity-preserving range queries for event-driven sensor networks, where a sensor only submits data to a nearby storage node when a certain event happens, in Section IX. We further assume that the queries from the sink are range queries. A range query “finding all the data items collected at time-slot $t$ in the range $[a, b]$” is denoted as $\{t, [a, b]\}$. Note that the queries in most sensor network applications can be easily modeled as range queries. Table I shows the notation used in this paper.

B. Threat Model

For a two-tiered sensor network, we assume that the sensors and the sink are trusted, but the storage nodes are not. In a hostile environment, both sensors and storage nodes can be compromised. If a sensor is compromised, the data it subsequently collects will be known to the attacker, and the compromised sensor may send forged data to its closest storage node. It is extremely difficult to prevent such attacks without the use of tamper-proof hardware. However, the data from one sensor constitute a small fraction of the collected data of the whole sensor network. Therefore, we mainly focus on the scenario where a storage node is compromised. Compromising a storage node can cause much greater damage to the sensor network than compromising a sensor. After a storage node is compromised, the

large quantity of data stored on the node will be known to the attacker, and upon receiving a query from the sink, the compromised storage node may return a falsified result formed by including forged data or excluding legitimate data. Therefore, attackers are more motivated to compromise storage nodes.

C. Problem Statement

The fundamental problem for a two-tiered sensor network is the following: How can we design the storage scheme and the query protocol in a privacy- and integrity-preserving manner? A satisfactory solution to this problem should meet the following two requirements.
1) Data and query privacy: Data privacy means that a storage node cannot know the actual values of sensor collected data. This ensures that an attacker cannot understand the data stored on a compromised storage node. Query privacy means that a storage node cannot know the actual value of sink issued queries. This ensures that an attacker cannot understand, or deduce useful information from, the queries that a compromised storage node receives.
2) Data integrity: If a query result that a storage node sends to the sink includes forged data or excludes legitimate data, the query result is guaranteed to be detected by the sink as invalid.
Besides these two hard requirements, a desirable solution should have low power and space consumption because these wireless devices have limited resources.

IV. PRIVACY FOR ONE-DIMENSIONAL DATA

To preserve privacy, it seems natural to have sensors encrypt data and the sink encrypt queries. However, the key challenge is how a storage node processes encrypted queries over encrypted data.

The idea of our solution for preserving privacy is illustrated in Fig. 2. We assume that each sensor $s_i$ in a network shares a secret key $k_i$ with the sink. For the $n$ data items $d_1, \ldots, d_n$ that a sensor $s_i$ collects in time-slot $t$, $s_i$ first encrypts the data items using key $k_i$, the results of which are represented as $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$. Then, $s_i$ applies a “magic” function $f_1$ to the $n$ data items and obtains $f_1(d_1, \ldots, d_n)$. The message that the sensor sends to its closest storage node includes both the encrypted data and the associative information $f_1(d_1, \ldots, d_n)$. When the sink wants to perform query $\{t, [a, b]\}$ on a storage node, the sink applies another “magic” function $f_2$ on the range $[a, b]$ and sends $\{t, f_2([a, b])\}$ to the storage node. The storage node processes the query $\{t, f_2([a, b])\}$ over encrypted data $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$ collected at time-slot $t$ using another “magic” function $f_3$. The three “magic” functions $f_1$, $f_2$, and $f_3$ satisfy the following three conditions.
1) A data item $d_j$ ($1 \le j \le n$) is in range $[a, b]$ if and only if $f_3(j, f_1(d_1, \ldots, d_n), f_2([a, b]))$ is true. This condition allows the storage node to decide whether $(d_j)_{k_i}$ should be included in the query result.
2) Given $f_1(d_1, \ldots, d_n)$ and $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$, it is computationally infeasible for the storage node to compute $d_1, \ldots, d_n$. This condition guarantees data privacy.
3) Given $f_2([a, b])$, it is computationally infeasible for the storage node to compute $[a, b]$. This condition guarantees query privacy.

Fig. 2. Idea of SafeQ for preserving privacy.

A. Prefix Membership Verification

The building block of our privacy-preserving scheme is the prefix membership verification scheme first introduced in [28] and later formalized in [29]. The idea of this scheme is to convert the verification of whether a number is in a range to several verifications of whether two numbers are equal. A prefix $\{0,1\}^k\{*\}^{w-k}$ with $k$ leading 0’s and 1’s followed by $w-k$ *’s is called a $k$-prefix. For example, 1*** is a 1-prefix, and it denotes the range $[1000, 1111]$. If a value $x$ matches a $k$-prefix (i.e., $x$ is in the range denoted by the prefix), the first $k$ bits of $x$ and the $k$-prefix are the same. For example, if $x \in$ 1*** (i.e., $x \in [1000, 1111]$), then the first bit of $x$ must be 1. Given a binary number $b_1 b_2 \cdots b_w$ of $w$ bits, the prefix family of this number is defined as the set of $w+1$ prefixes $\{b_1 b_2 \cdots b_w, b_1 b_2 \cdots b_{w-1}*, \ldots, b_1 * \cdots *, * \cdots *\}$, where the $i$th prefix is $b_1 b_2 \cdots b_{w-i+1} * \cdots *$. The prefix family of $x$ is denoted as $F(x)$. For example, the prefix family of number 12 is $F(12) = F(1100) = \{1100, 110*, 11**, 1***, ****\}$. Prefix membership verification is based on the fact that for any number $x$ and prefix $P$, $x \in P$ if and only if $P \in F(x)$.

To verify whether a number $a$ is in a range $[d_1, d_2]$, we first convert the range $[d_1, d_2]$ to a minimum set of prefixes, denoted $S([d_1, d_2])$, such that the union of the prefixes is equal to $[d_1, d_2]$. For example, $S([11, 15]) = \{1011, 11**\}$. Given a range $[d_1, d_2]$, where $d_1$ and $d_2$ are two numbers of $w$ bits, the number of prefixes in $S([d_1, d_2])$ is at most $2w - 2$ [30]. Second, we compute the prefix family $F(a)$ for number $a$. Thus, $a \in [d_1, d_2]$ if and only if $F(a) \cap S([d_1, d_2]) \neq \emptyset$.

To verify whether $a \in [d_1, d_2]$ using only the operations of verifying whether two numbers are equal, we convert each prefix to a corresponding unique number using a prefix numericalization function. A prefix numericalization function $\mathcal{N}$ needs to satisfy the following two properties: 1) for any prefix $P$, $\mathcal{N}(P)$ is a binary string; 2) for any two prefixes $P_1$ and $P_2$, $P_1 = P_2$ if and only if $\mathcal{N}(P_1) = \mathcal{N}(P_2)$. There are many ways to do prefix numericalization. We use the prefix numericalization scheme defined in [31]. Given a prefix $b_1 b_2 \cdots b_k * \cdots *$ of $w$ bits, we first insert 1 after $b_k$. The bit 1 represents a separator between $b_1 b_2 \cdots b_k$ and $* \cdots *$. Second, we replace every * by 0. Note that if there is no * in a prefix, we add 1 at the end of this prefix. For example, 11** is converted to 11100. Given a set of prefixes $S$, we use $\mathcal{N}(S)$ to denote the resulting set of numericalized prefixes. Therefore, $a \in [d_1, d_2]$ if and only if $\mathcal{N}(F(a)) \cap \mathcal{N}(S([d_1, d_2])) \neq \emptyset$. Fig. 3 illustrates the process of verifying $12 \in [11, 15]$.

B. Submission Protocol

The submission protocol concerns how a sensor sends its data to a storage node. Let $d_1, \ldots, d_n$ be $n$ data items that sensor $s_i$ collects at a time-slot. Each item $d_j$ ($1 \le j \le n$) is in the range $(d_0, d_{n+1})$, where $d_0$ and $d_{n+1}$ denote the lower and upper bounds, respectively, for all possible data items that a sensor may collect. The values of $d_0$ and $d_{n+1}$ are known

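to both sensors and the sink. After collecting $n$ data items, $s_i$ performs the following steps.
1) Sort the data items in ascending order. For simplicity, we assume $d_0 < d_1 < d_2 < \cdots < d_n < d_{n+1}$. If some data items have the same value, we simply represent them as one data item annotated with the number of such data items.
2) Convert the $n+1$ ranges $[d_0, d_1], [d_1, d_2], \ldots, [d_n, d_{n+1}]$ to their corresponding prefix representation, i.e., compute $S([d_0, d_1]), S([d_1, d_2]), \ldots, S([d_n, d_{n+1}])$.
3) Numericalize all prefixes. That is, compute $\mathcal{N}(S([d_0, d_1])), \ldots, \mathcal{N}(S([d_n, d_{n+1}]))$.
4) Compute the keyed Hash Message Authentication Code (HMAC) of each numericalized prefix using key $g$, which is known to all sensors and the sink. Examples of HMAC implementations include HMAC-MD5 and HMAC-SHA1 [32]–[34]. An HMAC function using key $g$, denoted $\mathcal{H}_g$, satisfies the one-wayness property (i.e., given $\mathcal{H}_g(x)$, it is computationally infeasible to compute $x$ and $g$) and the collision resistance property (i.e., it is computationally infeasible to find two distinct numbers $x$ and $y$ such that $\mathcal{H}_g(x) = \mathcal{H}_g(y)$). Given a set of numbers $S$, we use $\mathcal{H}_g(S)$ to denote the resulting set of numbers after applying function $\mathcal{H}_g$ to every number in $S$. In summary, this step computes $\mathcal{H}_g(\mathcal{N}(S([d_0, d_1]))), \ldots, \mathcal{H}_g(\mathcal{N}(S([d_n, d_{n+1}])))$.
5) Encrypt every data item with key $k_i$, i.e., compute $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$.
6) Sensor $s_i$ sends the encrypted data along with $\mathcal{H}_g(\mathcal{N}(S([d_0, d_1]))), \ldots, \mathcal{H}_g(\mathcal{N}(S([d_n, d_{n+1}])))$ to its closest storage node.

The above steps show that the aforementioned “magic” function $f_1$ is defined as follows:

$$f_1(d_1, \ldots, d_n) = \big(\mathcal{H}_g(\mathcal{N}(S([d_0, d_1]))), \ldots, \mathcal{H}_g(\mathcal{N}(S([d_n, d_{n+1}])))\big)$$

Due to the one-wayness and collision resistance properties of the HMAC function, given $f_1(d_1, \ldots, d_n)$ and the encrypted data items $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$, the storage node cannot compute the value of any data item.

Fig. 3. Prefix membership verification.

C. Query Protocol

The query protocol concerns how the sink sends a range query to a storage node. When the sink wants to perform query $\{t, [a, b]\}$ on a storage node, it performs the following four steps. Note that any range query $[a, b]$ satisfies the condition $d_0 < a \le b < d_{n+1}$.
1) Compute prefix families $F(a)$ and $F(b)$.
2) Numericalize all prefixes, i.e., compute $\mathcal{N}(F(a))$ and $\mathcal{N}(F(b))$.
3) Apply $\mathcal{H}_g$ to each numericalized prefix, i.e., compute $\mathcal{H}_g(\mathcal{N}(F(a)))$ and $\mathcal{H}_g(\mathcal{N}(F(b)))$.
4) Send $\{t, \mathcal{H}_g(\mathcal{N}(F(a))), \mathcal{H}_g(\mathcal{N}(F(b)))\}$ as a query to the storage node.

The above steps show that the aforementioned “magic” function $f_2$ is defined as follows:

$$f_2([a, b]) = \big(\mathcal{H}_g(\mathcal{N}(F(a))), \mathcal{H}_g(\mathcal{N}(F(b)))\big)$$

Because of the one-wayness and collision resistance properties of the HMAC function, the storage node cannot compute $a$ and $b$ from the query that it receives.

D. Query Processing

Upon receiving query $\{t, \mathcal{H}_g(\mathcal{N}(F(a))), \mathcal{H}_g(\mathcal{N}(F(b)))\}$, the storage node processes this query on the $n$ data items $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$ received from each nearby sensor $s_i$ at time-slot $t$ based on the following theorem.

Theorem 4.1: Given $n$ numbers sorted in ascending order $d_1 < \cdots < d_n$, where $d_j \in (d_0, d_{n+1})$ ($1 \le j \le n$), and a range $[a, b]$ ($d_0 < a \le b < d_{n+1}$), $d_j, \ldots, d_{j'} \in [a, b]$ if and only if there exist $j$ and $j'$ ($1 \le j \le j'+1 \le n+1$) such that the following two conditions hold:
1) $\mathcal{H}_g(\mathcal{N}(F(a))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j-1}, d_j]))) \neq \emptyset$;
2) $\mathcal{H}_g(\mathcal{N}(F(b))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j'}, d_{j'+1}]))) \neq \emptyset$.

Proof: Note that $d_j, \ldots, d_{j'} \in [a, b]$ if and only if there exist $j$ and $j'$ ($1 \le j \le j'+1 \le n+1$) such that $a \in [d_{j-1}, d_j]$ and $b \in [d_{j'}, d_{j'+1}]$. Furthermore, $a \in [d_{j-1}, d_j]$ if and only if $\mathcal{H}_g(\mathcal{N}(F(a))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j-1}, d_j]))) \neq \emptyset$, and $b \in [d_{j'}, d_{j'+1}]$ if and only if $\mathcal{H}_g(\mathcal{N}(F(b))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j'}, d_{j'+1}]))) \neq \emptyset$.

Based on Theorem 4.1, the storage node searches for the smallest $j$ and the largest $j'$ such that $\mathcal{H}_g(\mathcal{N}(F(a))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j-1}, d_j]))) \neq \emptyset$ and $\mathcal{H}_g(\mathcal{N}(F(b))) \cap \mathcal{H}_g(\mathcal{N}(S([d_{j'}, d_{j'+1}]))) \neq \emptyset$. Because adjacent ranges share an endpoint, an endpoint equal to some $d_j$ matches two ranges, and choosing the smallest $j$ and the largest $j'$ ensures that such a boundary item is counted as inside the query range. If $j \le j'$, the data items $(d_j)_{k_i}, \ldots, (d_{j'})_{k_i}$ are in the range $[a, b]$; if $j > j'$, no data item is in the range $[a, b]$.

In fact, there is another privacy-preserving scheme. First, sensor $s_i$ converts each data value $d_j$ to a prefix family $F(d_j)$ and then applies the numericalization and hash functions, obtaining $\mathcal{H}_g(\mathcal{N}(F(d_j)))$. Second, the sink converts a given range query $[a, b]$ to a set of prefixes $S([a, b])$ and then applies the numericalization and hash functions, obtaining $\mathcal{H}_g(\mathcal{N}(S([a, b])))$. Finally, the storage node checks whether $\mathcal{H}_g(\mathcal{N}(F(d_j)))$ has a common element with $\mathcal{H}_g(\mathcal{N}(S([a, b])))$. However, this privacy-preserving scheme is not compatible with the integrity-preserving scheme that we will discuss in Section V. The reason is that this scheme does not allow storage nodes to identify the positions of $a$ and $b$ (from the range query $[a, b]$) among $d_1, \ldots, d_n$ if no data item satisfies the query, while our integrity-preserving scheme requires storage nodes to know such information in order to compute integrity verification objects.

Fig. 4. Merkle hash tree for eight data items.

Fig. 5. Data integrity verification.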
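The one-dimensional privacy scheme of Section IV can be exercised end to end in a short Python sketch. The sensor builds $f_1$ from the ranges $[d_{j-1}, d_j]$. The sink builds $f_2$ from $F(a)$ and $F(b)$. The storage node then runs the smallest-$j$/largest-$j'$ search based on Theorem 4.1, using only intersections of HMAC values. This is a sketch under stated assumptions:
- values are $w$-bit unsigned integers with $w = 8$;
- $\mathcal{H}_g$ is HMAC-SHA256 from Python's hmac module, whereas the text cites HMAC-MD5/HMAC-SHA1;
- numericalized prefixes are hashed as ASCII bit strings;
- the key G and all function names are hypothetical;
- the encryption $(d_j)_{k_i}$ is omitted, so the storage node returns item indices rather than ciphertexts.

```python
import hashlib
import hmac

W = 8                   # assumed bit width of every data value and query endpoint
G = b"hypothetical-g"   # hypothetical HMAC key g shared by sensors and the sink


def bits(x: int) -> str:
    """W-bit binary string of x."""
    return format(x, f"0{W}b")


def prefix_family(x: int) -> set:
    """F(x): the W + 1 prefixes of x, from x itself down to all *'s."""
    b = bits(x)
    return {b[:k] + "*" * (W - k) for k in range(W + 1)}


def range_prefixes(lo: int, hi: int) -> set:
    """S([lo, hi]): minimum set of prefixes whose union is [lo, hi]."""
    out = set()
    while lo <= hi:
        k = 0  # grow the aligned block [lo, lo + 2^k - 1] while it stays inside [lo, hi]
        while k < W and lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) - 1 <= hi:
            k += 1
        out.add(bits(lo)[: W - k] + "*" * k)
        lo += 1 << k
    return out


def numericalize(p: str) -> str:
    """N(P): insert a 1 after the specified bits, then replace each * by 0."""
    k = len(p.rstrip("*"))
    return p[:k] + "1" + "0" * (W - k)


def hg(prefixes: set) -> set:
    """H_g(N(S)): HMAC-SHA256 (assumed) of every numericalized prefix."""
    return {hmac.new(G, numericalize(p).encode(), hashlib.sha256).hexdigest()
            for p in prefixes}


def f1(items: list, d0: int, dn1: int) -> list:
    """Sensor: H_g(N(S([d_j, d_{j+1}]))) for j = 0..n (duplicates collapsed)."""
    d = [d0] + sorted(set(items)) + [dn1]
    return [hg(range_prefixes(d[j], d[j + 1])) for j in range(len(d) - 1)]


def f2(a: int, b: int) -> tuple:
    """Sink: (H_g(N(F(a))), H_g(N(F(b)))) for a query [a, b]."""
    return hg(prefix_family(a)), hg(prefix_family(b))


def f3(gaps: list, query: tuple) -> range:
    """Storage node: 1-based indices j..j' of the items in [a, b] (may be empty)."""
    ha, hb = query
    j = min(g + 1 for g, s in enumerate(gaps) if ha & s)  # smallest j: a in [d_{j-1}, d_j]
    jp = max(g for g, s in enumerate(gaps) if hb & s)     # largest j': b in [d_{j'}, d_{j'+1}]
    return range(j, jp + 1)


if __name__ == "__main__":
    gaps = f1([12, 40, 41, 200], d0=0, dn1=255)
    print(list(f3(gaps, f2(11, 41))))  # [1, 2, 3] -> items 12, 40, 41
    print(list(f3(gaps, f2(13, 39))))  # []        -> no item in range
```

The range_prefixes loop is the usual greedy cover by aligned power-of-two blocks. For a range such as $[1, 2^w - 2]$ it yields exactly the $2w - 2$ prefixes cited from [30]. A query endpoint equal to a stored value (for example, $[12, 12]$) matches two adjacent ranges. Taking the smallest $j$ and the largest $j'$ is what keeps that item in the result.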


V. INTEGRITY FOR ONE-DIMENSIONAL DATA

The meaning of data integrity is twofold in this context. In the result that a storage node sends to the sink in response to a query, first, the storage node cannot include any data item that does not satisfy the query; second, the storage node cannot exclude any data item that satisfies the query. To allow the sink to verify the integrity of a query result, the query response from a storage node to the sink consists of two parts: 1) the query result $QR$, which includes all the encrypted data items that satisfy the query; 2) the verification object $VO$, which includes information for the sink to verify the integrity of $QR$. To achieve this purpose, we propose two schemes based on two different techniques: Merkle hash trees and neighborhood chains.

A. Integrity Scheme Using Merkle Hash Trees

Our first integrity-preserving mechanism is based on Merkle hash trees [25]. Each time a sensor sends data items to storage nodes, it constructs a Merkle hash tree for the data items. Fig. 4 shows a Merkle hash tree constructed for eight data items. Suppose sensor $s_i$ wants to send $n$ encrypted data items $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$ to a storage node. Sensor $s_i$ first builds a Merkle hash tree for the $n$ data items, which is a complete binary tree. The terminal nodes are $H_1, \ldots, H_n$, where $H_j = h((d_j)_{k_i})$ for every $1 \le j \le n$. Function $h$ is a one-way hash function such as MD5 [33] or SHA-1 [34]. The value of each nonterminal node $v$, whose children are $v_l$ and $v_r$, is the hash of the concatenation of $v_l$’s value and $v_r$’s value. For example, in Fig. 4, the nonterminal node whose children are $H_1$ and $H_2$ has value $h(H_1 | H_2)$, where | denotes concatenation. Note that if the number of data items $n$ is not a power of 2, interim hash values that do not have a sibling value to which they may be concatenated are promoted, without any change, up the tree until a sibling is found.

We first discuss what a sensor needs to send to its nearest storage node along with its data items. Each time sensor $s_i$ wants to send encrypted data items to a storage node, it first computes a Merkle hash tree over the encrypted data items, and then sends the root value along with the encrypted data items to a storage node. Note that among all the nodes in the Merkle hash tree, only the root is sent from sensor $s_i$ to the storage node because the storage node can compute all other nodes in the Merkle hash tree by itself.

Next, we discuss what a storage node needs to send to the sink along with a query result, i.e., what should be included in a verification object. For the storage node that is near to sensor $s_i$, each time it receives a query $\{t, [a, b]\}$ from the sink, it first finds the data items that are in the range $[a, b]$. Second, it computes the Merkle hash tree (except the root) from the $n$ data items. Third, it sends the query result $QR$ and the verification object $VO$ to the sink. Consider $n$ data items $d_1, \ldots, d_n$ in a storage node, where $d_1 < \cdots < d_n$, and a range $[a, b]$, where $d_{j-1} < a \le d_j$ and $d_{j'} \le b < d_{j'+1}$ ($1 \le j \le j'+1 \le n+1$). Let the query result be $QR = \{(d_j)_{k_i}, \ldots, (d_{j'})_{k_i}\}$. The storage node should include $(d_{j-1})_{k_i}$ and $(d_{j'+1})_{k_i}$ in the verification object $VO$, because these two items bound the query result and thereby ensure that it includes all data items that satisfy the query. We call $(d_{j-1})_{k_i}$ the left bound of the query result, and $(d_{j'+1})_{k_i}$ the right bound of the query result. Note that the left bound $(d_{j-1})_{k_i}$ and the right bound $(d_{j'+1})_{k_i}$ may not exist. If $j = 1$, the left bound does not exist; if $j' = n$, the right bound does not exist. The verification object $VO$ includes zero to two encrypted data items and the proof nodes in the Merkle hash tree that are needed for the sink to verify the integrity of the query result. Taking the example in Fig. 5, suppose a storage node has received eight data items
found. Note that the resulting Merkle hash tree will not be bal- that
anced. For the example Merkle hash tree in Fig. 4, if we remove sensor collected at time , and the sink wants to perform the
the nodes , and let , the query on the storage node. Using Theorem 4.1, the
resulting unbalanced tree is the Merkle hash tree for five data storage node finds that the query result includes ,
items. and , which satisfy the query. Along with the query
The Merkle hash tree used in our solution has two spe- result (i.e., the three data items), the storage node also sends
cial properties that allow the sink to verify query result , and , which are marked gray in
integrity. First, the value of the root is computed using a Fig. 5, to the sink as the verification object.
keyed HMAC function, where the key is , the key shared Next, we discuss how the sink uses Merkle hash trees to verify
between sensor and the sink. For example, in Fig. 4, query result integrity. Upon receiving a query result
. Using a keyed HMAC function and its verification object, the sink com-
gives us the property that only sensor and the sink can com- putes the root value of the Merkle hash tree and then verifies the
pute the root value. Second, the terminal nodes are arranged in integrity of the query result. Query result integrity is preserved
an ascending order based on the value of each data item . if and only if the following four conditions hold.
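The sensor-side construction, the storage node's verification object, and the sink-side root recomputation can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: SHA-1 serves as the one-way hash, HMAC-SHA1 keys the root, the `enc`/`dec` helpers are a hypothetical stand-in for real encryption, and the storage node locates the query boundaries from plaintext values, whereas SafeQ locates them with its privacy-preserving scheme. The proof format (sibling-subtree hashes listed left to right, plus the item count and the position of the first revealed leaf), the HMAC over a lone leaf when $n = 1$, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac


def h(data: bytes) -> bytes:
    """One-way hash for leaves and interior nodes (SHA-1; the text also allows MD5)."""
    return hashlib.sha1(data).digest()


def split_point(n: int) -> int:
    """Largest power of two below n; splitting there reproduces the promotion rule."""
    k = 1
    while 2 * k < n:
        k *= 2
    return k


def combine(left: bytes, right: bytes, key=None) -> bytes:
    """Interior node h(left|right); at the root, HMAC with the sensor-sink key."""
    if key is not None:
        return hmac.new(key, left + right, hashlib.sha1).digest()
    return h(left + right)


def root_of(leaves, key=None) -> bytes:
    """Value of the (sub)tree over `leaves`; pass `key` only for the overall root."""
    if len(leaves) == 1:
        return hmac.new(key, leaves[0], hashlib.sha1).digest() if key is not None else leaves[0]
    k = split_point(len(leaves))
    return combine(root_of(leaves[:k]), root_of(leaves[k:]), key)


def prove(leaves, lo, hi):
    """Storage node: hashes of maximal subtrees lying outside leaves[lo:hi], left to right."""
    def walk(start, end):
        if end <= lo or start >= hi:
            return [root_of(leaves[start:end])]
        if lo <= start and end <= hi:
            return []
        k = split_point(end - start)
        return walk(start, start + k) + walk(start + k, end)
    return walk(0, len(leaves))


def recompute(n, lo, revealed, proof, key):
    """Sink: rebuild the keyed root from the revealed leaf hashes and the proof hashes."""
    hi, it = lo + len(revealed), iter(proof)
    def walk(start, end, root_key=None):
        if end <= lo or start >= hi:
            return next(it)                          # StopIteration if the proof is short
        if lo <= start and end <= hi:
            return root_of(revealed[start - lo:end - lo], root_key)
        k = split_point(end - start)
        return combine(walk(start, start + k), walk(start + k, end), root_key)
    root = walk(0, n, key)
    return root if next(it, None) is None else None  # reject unused proof hashes


def storage_answer(values, cts, a, b):
    """QR plus left/right bounds as one contiguous leaf range, with its proof."""
    n = len(values)
    first = next((j for j in range(n) if values[j] >= a), n)
    last = max((j for j in range(n) if values[j] <= b), default=-1)
    lo, hi = max(first - 1, 0), min(last + 2, n)
    return n, lo, cts[lo:hi], prove([h(c) for c in cts], lo, hi)


def sink_verify(n, lo, revealed, proof, root, key, a, b, decrypt):
    """Return the verified query result, or None if any check fails."""
    vals = [decrypt(c) for c in revealed]
    below = [v for v in vals if v < a]
    qr = [v for v in vals if a <= v <= b]
    above = [v for v in vals if v > b]
    if vals != below + qr + above or len(below) > 1 or len(above) > 1:
        return None                                  # QR must sit between the two bounds
    if (not below and lo != 0) or (not above and lo + len(vals) != n):
        return None                                  # a missing bound must be a tree edge
    try:
        got = recompute(n, lo, [h(c) for c in revealed], proof, key)
    except StopIteration:
        return None
    return qr if got is not None and hmac.compare_digest(got, root) else None


KEY = b"k-shared-by-s_i-and-sink"                    # hypothetical shared key

def enc(v: int) -> bytes:                            # stand-in for (d)_{k_i}; NOT encryption
    return b"ct:%d" % v

def dec(c: bytes) -> int:
    return int(c[3:])

values = [1, 3, 5, 7, 9, 11, 13, 15]                 # sorted, as the scheme requires
cts = [enc(v) for v in values]
root = root_of([h(c) for c in cts], KEY)             # sensor -> storage node
n, lo, rev, proof = storage_answer(values, cts, 4, 10)
print(sink_verify(n, lo, rev, proof, root, KEY, 4, 10, dec))   # [5, 7, 9]
```

Dropping any revealed item (for example, the one with value 7) changes the shape of the recomputation, so the proof no longer fits and `sink_verify` returns `None`.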

The four conditions are as follows.
1) The data items in the query result do satisfy the query.
2) If the left bound exists, verify that it is smaller than $a$ and is the nearest left neighbor of the smallest item of the query result in the Merkle hash tree; otherwise, verify that the smallest item of the query result is the leftmost encrypted data item in the Merkle hash tree.
3) If the right bound exists, verify that it is larger than $b$ and is the nearest right neighbor of the largest item of the query result in the Merkle hash tree; otherwise, verify that the largest item of the query result is the rightmost encrypted data item in the Merkle hash tree.
4) The computed root value is the same as the root value included in $VO$.

Note that sorting data items is critical in our scheme for ensuring the integrity of query results. Without this property, it is difficult for a storage node to prove query result integrity without sending all data items to the sink.

B. Integrity Scheme Using Neighborhood Chains

We first present a new data structure called neighborhood chains and then discuss its use in integrity verification. Given $n$ data items $d_1, \ldots, d_n$, where $d_0 < d_1 < \cdots < d_n < d_{n+1}$, we call the list of items encrypted using key $k_i$, $(d_1 \mid d_0)_{k_i}, (d_2 \mid d_1)_{k_i}, \ldots, (d_{n+1} \mid d_n)_{k_i}$, the neighborhood chain for the $n$ data items. Here "$\mid$" denotes concatenation. For any item $(d_j \mid d_{j-1})_{k_i}$ in the chain, we call $d_j$ the value of the item, and $(d_{j+1} \mid d_j)_{k_i}$ the right neighbor of the item. Fig. 6 shows the neighborhood chain for the five data items 1, 3, 5, 7, and 9.

Fig. 6. Example neighborhood chain.

Preserving query result integrity using neighborhood chaining works as follows. After collecting data items $d_1, \ldots, d_n$, sensor $s_i$ sends the corresponding neighborhood chain $(d_1 \mid d_0)_{k_i}, \ldots, (d_{n+1} \mid d_n)_{k_i}$, instead of $(d_1)_{k_i}, \ldots, (d_n)_{k_i}$, to a storage node. Given a range query $[a, b]$, the storage node computes $QR$ as usual. The corresponding verification object $VO$ only consists of the right neighbor of the largest data item in $QR$; note that $VO$ always consists of one item for any query. If $QR = \{(d_{j_1} \mid d_{j_1-1})_{k_i}, \ldots, (d_{j_2} \mid d_{j_2-1})_{k_i}\}$, then $VO = \{(d_{j_2+1} \mid d_{j_2})_{k_i}\}$; if $QR = \emptyset$, suppose $d_{j-1} < a \le b < d_j$, then $VO = \{(d_j \mid d_{j-1})_{k_i}\}$. For example, for a range query over the data items in Fig. 6, the storage node returns the chain items whose values fall in the range as $QR$ and the right neighbor of the largest of them as $VO$.

After the sink receives $QR$ and $VO$, it verifies the integrity of $QR$ as follows. First, the sink verifies that every item in $QR$ satisfies the query. Second, the sink verifies that the storage node has not excluded any item that satisfies the query. Let $QR = \{(d_{j_1} \mid d_{j_1-1})_{k_i}, \ldots, (d_{j_2} \mid d_{j_2-1})_{k_i}\}$ be the correct query result and $QR'$ be the query result from the storage node. We consider the following four cases.
1) If there exists $j_1 < j < j_2$ such that $(d_j \mid d_{j-1})_{k_i} \notin QR'$, the sink can detect this error because the items in $QR'$ do not form a neighborhood chain.
2) If $(d_{j_1} \mid d_{j_1-1})_{k_i} \notin QR'$, the sink can detect this error because it knows the existence of $d_{j_1}$ from $(d_{j_1+1} \mid d_{j_1})_{k_i}$ and $d_{j_1}$ satisfies the query.
3) If $(d_{j_2} \mid d_{j_2-1})_{k_i} \notin QR'$, the sink can detect this error because it knows the existence of $d_{j_2}$ from the item $(d_{j_2+1} \mid d_{j_2})_{k_i}$ in $VO$ and $d_{j_2}$ satisfies the query.
4) If $QR = \emptyset$, the sink can verify this fact because the item $(d_j \mid d_{j-1})_{k_i}$ in $VO$ should satisfy the property $d_{j-1} < a \le b < d_j$.

VI. QUERIES OVER MULTIDIMENSIONAL DATA

Sensor-collected data and sink-issued queries are typically multidimensional, as most sensors are equipped with multiple sensing modules such as temperature, humidity, and pressure. A $z$-dimensional data item $D$ is a $z$-tuple $(d^1, \ldots, d^z)$, where each $d^l$ is the value for the $l$th dimension (i.e., attribute). A $z$-dimensional range query consists of $z$ subqueries $[a^1, b^1], \ldots, [a^z, b^z]$, where each subquery $[a^l, b^l]$ is a range over the $l$th dimension.

A. Privacy for Multidimensional Data

We extend our privacy-preserving techniques for one-dimensional data to multidimensional data as follows. Let $D_1, \ldots, D_n$ denote the $n$ $z$-dimensional data items that a sensor $s_i$ collects at time-slot $t$, where $D_j = (d_j^1, \ldots, d_j^z)$. First, $s_i$ encrypts these data with its secret key $k_i$. Second, for each dimension $l$, $s_i$ applies the "magic" function to the values $d_1^l, \ldots, d_n^l$ and obtains their encoding. At last, $s_i$ sends the encrypted data items and the $z$ encodings to a nearby storage node. For example, if sensor $s_i$ collects five two-dimensional data items (1, 11), (3, 5), (6, 8), (7, 1), and (9, 4) at time-slot $t$, it sends the encrypted data items as well as the encodings of the first-dimension values and of the second-dimension values to a nearby storage node. When the sink wants to perform query $([a^1, b^1], \ldots, [a^z, b^z])$ on a storage node, the sink applies the "magic" function on each subquery $[a^l, b^l]$ and sends the results to the storage node. The storage node then applies the "magic" function to find the query result $QR_l$ for each subquery $[a^l, b^l]$. Here, the three "magic" functions are the same as the "magic" functions defined in Section IV. Finally, the storage node computes $QR_1 \cap \cdots \cap QR_z$ as the query result. Considering the above example, given a range query ([2, 6], [3, 8]), the query result for the subquery [2, 6] is the encrypted data items of (3, 5) and (6, 8), and the query result for the subquery [3, 8] is the encrypted data items of (9, 4), (3, 5), and (6, 8). Therefore, the query result is the encrypted data items of (3, 5) and (6, 8).

B. Integrity for Multidimensional Data

We next present two integrity-preserving schemes for multidimensional data: one builds a Merkle hash tree for each dimension, and the other builds a multidimensional neighborhood chain.

1) Integrity Scheme Using Merkle Hash Trees: To preserve integrity, sensor $s_i$ first computes $z$ Merkle hash trees over the $n$ encrypted data items along $z$ dimensions. In the $l$th Merkle hash tree, the data items are sorted according to the values of the $l$th attribute. Second, $s_i$ sends the $z$ root values to a storage node along with the encrypted data items. For a storage node that is near sensor $s_i$, each time it receives a query $([a^1, b^1], \ldots, [a^z, b^z])$, it first finds the query result $QR_l$ for each range $[a^l, b^l]$. Second, it chooses the query result $QR_j$ that contains the smallest number of encrypted data items

among $QR_1, \ldots, QR_z$. Third, it computes the Merkle hash tree in which the data items are sorted according to the $j$th attribute. Finally, it sends $QR_j$ and the corresponding verification object $VO_j$ to the sink. For example, suppose a sensor $s_i$ collects four two-dimensional data items in a time-slot. Sensor $s_i$ computes a Merkle hash tree along each dimension; Fig. 7 shows the two Merkle hash trees. Given a two-dimensional range query, the storage node can find the query result $QR_1$ based on the first attribute and $QR_2$ based on the second attribute. Since $QR_1$ only contains one encrypted data item, the storage node sends to the sink the query result $QR_1$ and the corresponding verification object $VO_1$.

Fig. 7. Merkle hash trees for two-dimensional data.

Note that the query result of a multidimensional range query may contain data items that do not satisfy the query. After decryption, the sink can easily prune the query result by discarding such data items.

2) Integrity Scheme Using Neighborhood Chains: The basic idea is that for each of the $z$ values in a data item, we find its nearest left neighbor along each dimension and embed this information when we encrypt the item. Such neighborhood information is used by the sink for integrity verification.

We first present multidimensional neighborhood chains and then discuss their use in integrity verification. Let $D_1, \ldots, D_n$, where $D_j = (d_j^1, \ldots, d_j^z)$ for each $1 \le j \le n$, denote $n$ $z$-dimensional data items. We use $d_0^l$ and $d_{n+1}^l$ to denote the lower bound and the upper bound of any data item along dimension $l$. We call $D_0 = (d_0^1, \ldots, d_0^z)$ and $D_{n+1} = (d_{n+1}^1, \ldots, d_{n+1}^z)$ the lower bound and upper bound of the data items. For each dimension $l$, we can sort the values of the $n$ data items along the $l$th dimension together with $d_0^l$ and $d_{n+1}^l$ in an ascending order. For ease of presentation, we assume that these values are distinct along every dimension. In this sorted list, we call the value immediately preceding $d_j^l$ the left neighboring value of $d_j^l$, and we use $L_l(D_j)$ to denote the left neighboring value of $D_j$ along dimension $l$. A multidimensional neighborhood chain for $D_1, \ldots, D_n$ is constructed by encrypting every item $D_j$ together with its left neighboring values as $(D_j \mid (L_1(D_j), \ldots, L_z(D_j)))_{k_i}$, which is denoted as $M(D_j)$. We call $D_j$ the value of $M(D_j)$. Note that when multiple data items have the same value along the $l$th dimension, we annotate the left neighboring value with the number of such items. The list of $n + 1$ items encrypted with key $k_i$, $M(D_1), \ldots, M(D_{n+1})$, forms a multidimensional neighborhood chain. The nice property of a multidimensional neighborhood chain is that all data items form a neighborhood chain along every dimension. This property allows the sink to verify the integrity of query results.

Considering five example two-dimensional data items (1, 11), (3, 5), (6, 8), (7, 1), (9, 4) with lower bound (0, 0) and upper bound (15, 15), the corresponding multidimensional neighborhood chain encrypted with key $k_i$ is $((1, 11) \mid (0, 8))_{k_i}$, $((3, 5) \mid (1, 4))_{k_i}$, $((6, 8) \mid (3, 5))_{k_i}$, $((7, 1) \mid (6, 0))_{k_i}$, $((9, 4) \mid (7, 1))_{k_i}$, and $((15, 15) \mid (9, 11))_{k_i}$. Fig. 8 illustrates this chain, where each black point denotes an item, the two gray points denote the lower and upper bounds, solid arrows illustrate the chain along the first dimension, and dashed arrows illustrate the chain along the second dimension.

Fig. 8. Two-dimensional neighborhood chain.

Next, we discuss the operations carried out on sensors, storage nodes, and the sink using multidimensional chaining.

Sensors: After collecting $z$-dimensional data items $D_1, \ldots, D_n$ at time-slot $t$, sensor $s_i$ computes the multidimensional chain for the items and sends it to a storage node.

Storage Nodes: Given a $z$-dimensional query $([a^1, b^1], \ldots, [a^z, b^z])$, a storage node first computes $QR_1, \ldots, QR_z$, where $QR_l$ contains the chain items whose values along dimension $l$ lie in $[a^l, b^l]$. Second, it computes $QR = QR_s$ and $VO = \{R_s\}$, where $QR_s$ is the smallest set among $QR_1, \ldots, QR_z$ (i.e., $|QR_s| \le |QR_l|$ for any $1 \le l \le z$) and $R_s$ is the right bounding item of the range $[a^s, b^s]$. Given a multidimensional chain $M(D_1), \ldots, M(D_{n+1})$ and a subquery $[a^l, b^l]$ along dimension $l$, the right bounding item of $[a^l, b^l]$ is the item $M(D_j)$ where $L_l(D_j) \le b^l < d_j^l$. Fig. 8 shows a query ([2, 6], [3, 8]): the subquery along the first dimension yields the smaller result, so the query result consists of the chain items of (3, 5) and (6, 8), and the verification object consists of the chain item of (7, 1).

Sink: Upon receiving $QR$ and $VO$, the sink verifies the integrity of $QR$ as follows. First, it verifies that every item in $QR$ satisfies the query along the chosen dimension. Second, it verifies that the storage node has not excluded any item that satisfies the query based on the following three properties.
1) The items in $QR \cup VO$ should form a chain along one dimension, say dimension $l$. Thus, if the storage node excludes an item whose value in the $l$th dimension is in the middle of this chain, this chaining property would be violated.
2) The item in $QR$ that has the smallest value along the $l$th dimension, say $M(D_j)$, satisfies the condition $L_l(D_j) < a^l$. Thus, if the storage node excludes the item whose value on the $l$th dimension is the beginning of the chain, this property would be violated.
3) There exists only one item in $VO$, and it is the right bounding item of $[a^l, b^l]$. Thus, if the storage node excludes the item whose value on the $l$th dimension is the end of the chain, this property would be violated.
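The multidimensional neighborhood chain and the three sink-side properties can be sketched in Python as follows; with a single dimension, the same code reduces to the one-dimensional chain of Section V-B. This is an illustrative sketch, not the authors' implementation. Assumptions: chain items are kept as plaintext tuples (in SafeQ each would be encrypted with $k_i$, and the storage node would locate per-dimension results through the privacy-preserving scheme), values are distinct along each dimension, and the storage node tells the sink which dimension `s` it chose. All function names are hypothetical. The example uses the five data items and the query ([2, 6], [3, 8]) from the text.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, ...]
Item = Tuple[Point, Point]   # (value D_j, left neighboring values L_1..L_z)


def build_chain(points: List[Point], lower: Point, upper: Point) -> List[Item]:
    """Sensor: pair every item, and the upper bound, with its left neighbors per dimension.
    Assumes distinct values along each dimension; SafeQ would encrypt each item with k_i."""
    items = list(points) + [upper]
    chain = []
    for d in items:
        lefts = []
        for dim in range(len(lower)):
            column = sorted([lower[dim]] + [p[dim] for p in items])
            lefts.append(column[column.index(d[dim]) - 1])
        chain.append((d, tuple(lefts)))
    return chain


def storage_answer(chain: List[Item], query: Sequence[Tuple[int, int]]):
    """Storage node: smallest per-dimension result QR_s plus its right bounding item."""
    results = [[it for it in chain if a <= it[0][dim] <= b]
               for dim, (a, b) in enumerate(query)]
    s = min(range(len(query)), key=lambda dim: len(results[dim]))
    b_s = query[s][1]
    vo = [it for it in chain if it[1][s] <= b_s < it[0][s]]
    return s, results[s], vo


def sink_verify(s: int, qr: List[Item], vo: List[Item],
                query: Sequence[Tuple[int, int]]) -> Optional[List[Point]]:
    """Sink: check the three chain properties along dimension s, then prune."""
    a, b = query[s]
    if len(vo) != 1 or not (vo[0][1][s] <= b < vo[0][0][s]):
        return None                                  # property 3: one right bounding item
    if any(not (a <= v[s] <= b) for v, _ in qr):
        return None                                  # QR must satisfy subquery s
    items = sorted(qr, key=lambda it: it[0][s]) + vo
    if not (items[0][1][s] < a):
        return None                                  # property 2: chain starts below a
    if any(cur[1][s] != prev[0][s] for prev, cur in zip(items, items[1:])):
        return None                                  # property 1: no gap in the chain
    return [v for v, _ in qr                         # prune on the other dimensions
            if all(lo <= v[dim] <= hi for dim, (lo, hi) in enumerate(query))]


pts = [(1, 11), (3, 5), (6, 8), (7, 1), (9, 4)]
query = [(2, 6), (3, 8)]
chain = build_chain(pts, lower=(0, 0), upper=(15, 15))
s, qr, vo = storage_answer(chain, query)
print(s, vo)                           # 0 [((7, 1), (6, 0))]
print(sink_verify(s, qr, vo, query))   # [(3, 5), (6, 8)]
```

Removing either chain item from `qr` makes `sink_verify` return `None`: dropping (6, 8) breaks the chain, and dropping (3, 5) leaves a first item whose left neighbor (3) is not below $a^1 = 2$.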

VII. SAFEQ OPTIMIZATION

We present an optimization technique based on Bloom filters [35] to reduce the communication cost between sensors and storage nodes. This cost can be significant for two reasons. First, in each submission, a sensor needs to convert each range $[d_j, d_{j+1}]$, where $d_j$ and $d_{j+1}$ are two $w$-bit numbers, to $2w - 2$ prefix numbers in the worst case. Second, the sensor applies HMAC to each prefix number, which results in a 128-bit string if we choose HMAC-MD5 or a 160-bit string if we choose HMAC-SHA1. Reducing communication cost for sensors is important because of power consumption.

Our basic idea is to use a Bloom filter to represent the hashed prefix numbers of all the ranges $[d_0, d_1], [d_1, d_2], \ldots, [d_n, d_{n+1}]$. Thus, a sensor only needs to send the Bloom filter instead of the hashes to a storage node. The number of bits needed to represent the Bloom filter is much smaller than that needed to represent the hashes. Next, we discuss the operations that sensors and storage nodes need to perform when using this optimization technique.

Sensors: Let $B$ be a bit array of size $M$ representing the Bloom filter for these hashed prefix numbers, which a sensor computes after collecting data items $d_1, \ldots, d_n$, assuming $d_0 < d_1 \le \cdots \le d_n < d_{n+1}$. Let $A$ be an array of $M$ pointers. For every $0 \le j \le n$ and for every number $v$ among the hashed prefix numbers of $[d_j, d_{j+1}]$, the sensor applies $k$ hash functions $h_1, \ldots, h_k$ on $v$, where each hash function hashes to an integer in the range $[0, M - 1]$, and then, for each $h_r$, sets $B[h_r(v)]$ to be 1 and appends the index $j$ to the list that $A[h_r(v)]$ points to. In each submission, the sensor sends $B$ and $A$ to its closest storage node. Fig. 9 shows an example of the two arrays, where "-" denotes an empty pointer.

Fig. 9. Example Bloom filter.

Logically, $A$ is an array of $M$ pointers, each pointing to a list of indices from 0 to $n$. To reduce the space used for storing pointers, we implement $A$ as a concatenation of all these lists separated by delimiters; for example, the array $A$ in Fig. 9 can be stored as one such delimited list.

Storage Nodes: Recall that $a \in [d_j, d_{j+1}]$ if and only if the hashed prefix numbers of $a$ and those of $[d_j, d_{j+1}]$ have a number in common. If $a \in [d_j, d_{j+1}]$, there exists at least one number $v$ among the hashed prefix numbers of $a$ such that the following two conditions hold: 1) for every $1 \le r \le k$, $B[h_r(v)]$ is 1; 2) for every $1 \le r \le k$, index $j$ is included in the list to which $A[h_r(v)]$ points. For example, to verify whether $a$ falls in a given range using the Bloom filter in Fig. 9, a storage node can apply the three hash functions to each hashed prefix number of $a$; if one such number satisfies the above two conditions for that range's index, the storage node concludes that $a$ falls in that range.

Although $QR$ may, with this optimization technique, contain data items that do not satisfy the query, they can be easily pruned by the sink after decryption. Given a query $[a, b]$, a storage node using this technique may find multiple ranges that contain $a$ and multiple ranges that contain $b$ due to false positives of Bloom filters. In this case, the storage node uses the first range that contains $a$ and the last range that contains $b$ to compute the query result.

Bloom filters introduce false positives in the result of a query, i.e., data items that do not satisfy the query. We can control the false positive rate by adjusting Bloom filter parameters. For simplicity, we assume that each of the $n + 1$ sets of hashed prefix numbers contains the same number of values. Under this assumption, the upper bound of the average false positive rate is shown in (1), whose derivation is given in Section VIII.

(1)

The total number of bits required to represent the $n + 1$ sets without Bloom filters, and an upper bound on the number required with Bloom filters, are calculated in Section VIII. Comparing the two, our optimization technique reduces the communication cost if

(2)

Fig. 10 shows that the upper bound of the false positive rate decreases as the number of data items increases or the number of hash functions increases. Based on (1) and (2), we choose Bloom filter parameters that achieve a reduction in the communication cost of sensors together with a small false positive rate. Note that such parameters fail to exist only under a condition that is unlikely to happen.

Fig. 10. Simulation result of false positive rate.
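The sensor and storage-node operations on the two arrays can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: opaque byte strings stand in for the HMAC'd prefix numbers, salted SHA-256 plays the role of the $k$ hash functions, the sizes ($k = 3$, $M = 64$) are arbitrary, and all names, including the delimiter format in `serialize`, are hypothetical. `locate` mirrors the rule of using the first range that matches $a$ and the last range that matches $b$.

```python
import hashlib
from typing import List, Set


def positions(v: bytes, k: int, m: int) -> List[int]:
    """k bit positions in [0, m-1]; salted SHA-256 stands in for h_1..h_k (assumption)."""
    return [int.from_bytes(hashlib.sha256(bytes([r]) + v).digest()[:8], "big") % m
            for r in range(k)]


def build(sets: List[List[bytes]], k: int, m: int):
    """Sensor: bit array B and pointer array A; sets[j] holds the hashed
    prefix numbers of the range [d_j, d_{j+1}]."""
    B = [0] * m
    A: List[List[int]] = [[] for _ in range(m)]
    for j, numbers in enumerate(sets):
        for v in numbers:
            for p in positions(v, k, m):
                B[p] = 1
                if not A[p] or A[p][-1] != j:    # record index j once per position
                    A[p].append(j)
    return B, A


def lookup(B: List[int], A: List[List[int]], v: bytes, k: int) -> Set[int]:
    """Storage node: every index j satisfying both conditions for number v."""
    ps = positions(v, k, len(B))
    if not all(B[p] for p in ps):              # condition 1: all k bits are set
        return set()
    common = set(A[ps[0]])
    for p in ps[1:]:
        common &= set(A[p])                    # condition 2: j appears in all k lists
    return common


def locate(B, A, family: List[bytes], k: int, first: bool):
    """First range matching a (first=True) or last range matching b (first=False)."""
    hits = set().union(*(lookup(B, A, v, k) for v in family))
    return (min(hits) if first else max(hits)) if hits else None


def serialize(A: List[List[int]], sep: str = "|") -> str:
    """A stored as one delimited list instead of m separate pointers."""
    return sep.join(",".join(map(str, lst)) for lst in A)


sets = [[b"p0-a", b"p0-b"], [b"p1-a"], [b"p2-a", b"p2-b", b"p2-c"]]   # hypothetical
B, A = build(sets, k=3, m=64)
print(lookup(B, A, b"p1-a", 3))   # a set containing 1, plus any Bloom false positives
```

Because a Bloom filter has no false negatives, every inserted number always reports its own index; extra indices can only come from false positives, which is why the storage node may return a superset that the sink later prunes.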

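VIII. ANALYSIS OF SAFEQ OPTIMIZATION

Consider the $n + 1$ sets of hashed prefix numbers, one for each range $[d_j, d_{j+1}]$ with $0 \le j \le n$, that a sensor needs to represent in a Bloom filter, and let $[a, b]$ be the range query over the data items $d_1, \ldots, d_n$. Let $w$ denote the bit length of each $d_j$, and let $w'$ denote the bit length of the numbers after hashing in each set. Let $k$ be the number of hash functions in the Bloom filter.

Given the two arrays $B$ (the bit array) and $A$ (the pointer array) representing the data in these sets, for any number $x$ of $w'$ bits, a storage node searches the corresponding index for $x$ by applying the $k$ hash functions to $x$ and checking whether two conditions hold: 1) for every hash function $h_r$, $B[h_r(x)] = 1$; 2) for every $h_r$, the index is included in the list to which $A[h_r(x)]$ points. Let $I(x)$ denote the index that the storage node finds for $x$: if the index exists (i.e., the above conditions hold), $I(x)$ is that index; otherwise, no index is found for $x$.

Based on the analysis of Bloom filters [35], the probabilities of these two conditions can be computed, and combining them gives

(3)

As this probability is the same for any $x$, let $p$ denote it. According to our discussion in Section IV-A, each of the two sets of hashed prefix numbers, for $a$ and for $b$, includes $w + 1$ numbers of $w'$ bits. For $a$, there exists a range $[d_{j_a}, d_{j_a+1}]$ such that $a \in [d_{j_a}, d_{j_a+1}]$. Therefore, there exists one number $x_0$ among the hashed prefix numbers of $a$ such that $I(x_0) = j_a$. Let $x_1, \ldots, x_w$ denote the rest of these numbers, and let $I_{\min}$ denote the minimum index found among them. Without loss of generality, we assume that $x_1$ attains the minimum index. The probability that $I_{\min} < j_a$ can then be computed in terms of $p$.

Similarly, for $b$, there exists a range $[d_{j_b}, d_{j_b+1}]$ such that $b \in [d_{j_b}, d_{j_b+1}]$. Therefore, there exists one number $y_0$ among the hashed prefix numbers of $b$ such that $I(y_0) = j_b$. Let $y_1, \ldots, y_w$ denote the rest of these numbers, and let $I_{\max}$ denote the maximum index found among them. Without loss of generality, we assume that $y_1$ attains the maximum index; the probability that $I_{\max} > j_b$ follows in the same way.

Given a query $[a, b]$, if the storage node finds $I_{\min} < j_a$ or $I_{\max} > j_b$, the query result has false positives. Therefore, the average false positive rate can be computed as follows:

(4)

Bounding the terms of (4), we derive (1). Typically, we choose the number of hash functions $k$ that minimizes the false positive probability of the Bloom filter. Thus, (1) becomes

(5)

Next, we discuss under what condition our optimization technique reduces the communication cost between sensors and storage nodes. To represent the data in the $n + 1$ sets, we compare the total number of bits required without Bloom filters with the maximum number of bits required with Bloom filters. Note that the number of bits for representing array $B$ is $M$, and the number of bits for representing array $A$ is bounded as well. Therefore, we derive (2):

(6)

In case $k$ takes the value chosen above, (2) becomes

(7)

IX. QUERIES IN EVENT-DRIVEN NETWORKS

So far, we have assumed that at each time-slot, a sensor sends to a storage node the data that it collected at that time-slot. However, this assumption does not hold for event-driven networks, where a sensor only reports data to a storage node when a certain event happens. If we directly apply our solution here, then the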

sink cannot verify whether a sensor collected data at a time-slot. The case that a sensor did not submit any data at time-slot $t$ and the case that the storage node discards all the data that the sensor collected at time-slot $t$ are not distinguishable for the sink.

We address the above challenge by having sensors report their idle periods to storage nodes each time they submit data after an idle period or when the idle period is longer than a threshold. Storage nodes can use such idle periods reported by sensors to prove to the sink that a sensor did not submit any data at any time-slot in that idle period. Next, we discuss the operations carried out on sensors, storage nodes, and the sink.

Sensors: An idle period for a sensor $s_i$ is a time-slot interval $[t_1, t_2]$, which indicates that the sensor has no data to submit from $t_1$ to $t_2$, including $t_1$ and $t_2$. Let $T$ be the threshold of a sensor being idle without reporting to a storage node. Suppose the last time that sensor $s_i$ submitted data or reported an idle period is time-slot $t'$. At any time-slot $t > t'$, $s_i$ acts based on three cases.
1) $t = t' + 1$: In this case, if $s_i$ has data to submit, then it just submits the data; otherwise, it takes no action.
2) $t' + 1 < t < t' + T$: In this case, if $s_i$ has data to submit, then it submits the data along with the encrypted idle period $[t' + 1, t - 1]_{k_i}$; otherwise, it takes no action. We call $[t' + 1, t - 1]_{k_i}$ an idle proof.
3) $t = t' + T$: In this case, if $s_i$ has data to submit, then it submits the data along with the idle proof $[t' + 1, t - 1]_{k_i}$; otherwise, it submits the idle proof $[t' + 1, t]_{k_i}$.

Fig. 11 illustrates idle periods for sensor $s_i$, where each unit in the time axis is a time-slot, a gray unit denotes that $s_i$ has data to submit, and a blank unit denotes that $s_i$ has no data to submit. The figure marks one time-slot at which, according to case 2, $s_i$ submits data along with an idle proof, and another at which, according to case 3, $s_i$ submits only an idle proof.

Fig. 11. Example idle periods and data submissions.

Storage Nodes: When a storage node receives a query from the sink for the data of $s_i$ at time-slot $t$, it first checks whether $s_i$ has submitted data at time-slot $t$. If $s_i$ has, then the storage node sends the query result as discussed in Section IV. Otherwise, the storage node checks whether $s_i$ has submitted an idle proof for an idle period containing time-slot $t$. If so, it sends the idle proof to the sink as $VO$. Otherwise, it replies to the sink that it does not yet have an idle proof containing time-slot $t$, and it forwards the right idle proof to the sink once that proof is received. The maximum number of time-slots that the sink may need to wait for the right idle proof is $T$. Here, $T$ is a system parameter that trades off efficiency against the amount of time that the sink may have to wait for verifying data integrity: smaller $T$ favors the sink for integrity verification, and larger $T$ favors sensors for power saving because of less communication cost.

Sink: Changes on the sink side are minimal. In the case that $VO$ lacks the idle proof for verifying the integrity of $QR$, the sink defers the verification for at most $T$ time-slots, during which benign storage nodes are guaranteed to send the needed idle proof.

X. COMPLEXITY AND SECURITY ANALYSIS

A. Complexity Analysis

Assume that a sensor collects $n$ $z$-dimensional data items in a time-slot, each attribute of a data item is a $w$-bit number, and the HMAC result of each numericalized prefix is a $w'$-bit number. The computation cost, communication cost, and storage space of SafeQ are described in Table II. Note that the communication cost denotes the number of bytes sent for each submission or query, and the storage space denotes the number of bytes stored in a storage node for each submission. Furthermore, whether sensor nodes report to storage nodes periodically or upon events has no impact on these costs for a single transmission of data items.

TABLE II
COMPLEXITY ANALYSIS OF SAFEQ

B. Privacy Analysis

In a SafeQ-protected two-tiered sensor network, compromising a storage node does not allow the attacker to obtain the actual values of sensor-collected data and sink-issued queries. The correctness of this claim is based on the fact that the hash functions and encryption algorithms used in SafeQ are secure. In the submission protocol, a storage node only receives encrypted data items and the secure hash values of prefixes converted from the data items. Without knowing the keys used in the encryption and secure hashing, it is computationally infeasible to compute the actual values of sensor-collected data and the corresponding prefixes. In the query protocol, a storage node only receives the secure hash values of prefixes converted from a range query. Without knowing the key used in the secure hashing, it is computationally infeasible to compute the actual values of sink-issued queries.

Next, we analyze information leakage if the HMAC function does not satisfy the one-wayness property. More formally, given the HMAC value of a numericalized prefix $x$, suppose that a storage node can compute $x$ in a certain number of steps. Recall that a sensor sends one HMAC hash per numericalized prefix. To reveal a data item $d$, the storage node needs to reveal all the numericalized prefixes associated with $d$. Thus, to reveal $n$ data items, the storage node would have to repeat this computation for every numericalized prefix involved.

Note that if a storage node and a sensor are both compromised, the storage node may reveal the sensor-collected data and sink-issued queries by employing brute-force attacks. In this case, the storage node knows the shared secret key for the HMAC function. Due to the one-wayness property of HMAC, the storage node cannot reveal $x$ directly from its HMAC value and the key. However, it can compute the HMAC results of the numericalized prefixes for all possible values in

the data domain in a brute-force manner, and then compare the results with the received data and queries. Based on the comparison, the storage node can reveal the sensor-collected data and sink-issued queries. However, in practice, this computational cost could be prohibitive for a large data domain.
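The brute-force attack just described can be made concrete with a short Python sketch: it computes the prefix family of a $w$-bit number, numericalizes each prefix, applies HMAC-MD5 (the hash used in the experiments), and then recovers the number by enumerating the whole domain with the leaked key. The numericalization rule (keep the specified bits, append a 1, and pad with 0s to $w + 1$ bits) is an assumption based on the prefix-membership literature rather than a restatement of Section IV, and the key, the toy width `W = 8`, and the value 173 are hypothetical.

```python
import hashlib
import hmac
from typing import List, Optional, Set


def prefix_family(x: int, w: int) -> List[str]:
    """F(x): the w + 1 prefixes of the w-bit number x, from x itself up to all '*'."""
    bits = format(x, f"0{w}b")
    return [bits[:i] + "*" * (w - i) for i in range(w, -1, -1)]


def numericalize(prefix: str) -> int:
    """Assumed rule: keep the specified bits, append a 1, pad with 0s to w + 1 bits."""
    fixed = prefix.rstrip("*")
    return int(fixed + "1" + "0" * (len(prefix) - len(fixed)), 2)


def hmac_set(x: int, w: int, key: bytes) -> Set[bytes]:
    """HMAC-MD5 of each numericalized prefix of x, as a storage node would see them."""
    width = (w + 8) // 8                       # bytes needed for a (w + 1)-bit number
    return {hmac.new(key, numericalize(p).to_bytes(width, "big"), hashlib.md5).digest()
            for p in prefix_family(x, w)}


def brute_force(observed: Set[bytes], w: int, key: bytes) -> Optional[int]:
    """With the key leaked, try every w-bit value: up to 2**w * (w + 1) HMACs."""
    for v in range(2 ** w):
        if hmac_set(v, w, key) == observed:
            return v
    return None


KEY = b"leaked-shared-key"    # hypothetical key obtained from a compromised sensor
W = 8                          # toy domain width; each extra bit doubles the scan
observed = hmac_set(173, W, KEY)
print(brute_force(observed, W, KEY))   # 173
```

The scan is trivial for an 8-bit domain, but its cost of up to $2^w (w + 1)$ HMAC evaluations grows exponentially with $w$, which is the sense in which the attack becomes prohibitive for a large data domain. Without the key, the same enumeration finds no match.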

C. Integrity Analysis

For our scheme using Merkle hash trees, the integrity guarantee (a storage node cannot include forged data in, or exclude legitimate data from, a query result without being detected) rests on the property that any change of leaf nodes in a Merkle hash tree will change the root value. Recall that the leaf nodes in a Merkle hash tree are sorted according to their values. In a query response, the left bound of the query result (if it exists), the query result, and the right bound of the query result (if it exists) must be consecutive leaf nodes in the Merkle hash tree. If the storage node includes forged data in the query result or excludes a legitimate data item from the query result, the root value computed at the sink will be different from the root value computed at the corresponding sensor.

For our scheme using neighborhood chains, the guarantee rests on the following three properties that $QR$ and $VO$ should satisfy for a query. First, the items in $QR \cup VO$ form a chain; excluding any item in the middle or changing any item violates the chaining property. Second, the first item in $QR$ contains the value of its left neighbor, which should be out of the range query on the smaller end. Third, the item in $VO$, which is the right neighbor of the last item in $QR$, has a value that should be out of the range query on the larger end.

XI. EXPERIMENTAL RESULTS

A. Evaluation Setup

We implemented both SafeQ and the S&L scheme using TOSSIM [36], a widely used wireless sensor network simulator. We measured the efficiency of SafeQ and the S&L scheme on one-, two-, and three-dimensional data. For better comparison, we conducted our experiments on the same data set that S&L used in their experiment [7]. The data set was chosen from a large real data set from Intel Lab [10], and it consists of the temperature, humidity, and voltage data collected by 44 nodes during March 1–10, 2004. Each data attribute follows a Gaussian distribution. Note that S&L only conducted experiments on the temperature data, while we experimented with both SafeQ and S&L schemes on one-dimensional data (of temperature), two-dimensional data (of temperature and humidity), and three-dimensional data (of temperature, humidity, and voltage). As in [7], we equally divided the 44 nodes into four groups and deployed a storage node for each group. Fig. 12 shows the network topology. The locations of sensors can be found in [10].

Fig. 12. Network topology in the experiment.

In implementing SafeQ, we used HMAC-MD5 [32] with 128-bit keys as the hash function for hashing prefix numbers. We used the DES encryption algorithm in implementing both SafeQ and the S&L scheme. In implementing our Bloom filter optimization technique, we chose the number of hash functions to be 4, which guarantees that the false positive rate induced by the Bloom filter is less than 1%. In implementing the S&L scheme, we used the parameter values that correspond to the minimum false positives of query results in their experiments for computing optimal bucket partitions as in [7], and we used HMAC-MD5 with 128-bit keys as the hash function for computing encoding numbers. For multidimensional data, we used their optimal bucket partition algorithm to partition multidimensional data along each dimension. In our experiments, we used time-slot sizes ranging from 10 to 80 min. For each time-slot, we generated 1000 random range queries in the form of $([a_1, b_1], [a_2, b_2], [a_3, b_3])$, where $a_1, b_1$ are two random values of temperature, $a_2, b_2$ are two random values of humidity, and $a_3, b_3$ are two random values of voltage.

B. Evaluation Results

The experimental results from our side-by-side comparison show that SafeQ significantly outperforms the S&L scheme for multidimensional data in terms of power and space consumption. For the two integrity-preserving schemes, the neighborhood-chaining technique is better than the Merkle hash tree technique in terms of both power and space consumption. We include the Merkle hash-tree-based scheme because Merkle hash trees are the typical approach to achieving integrity. We use SafeQ-MHT+ and SafeQ-MHT to denote our schemes using Merkle hash trees with and without Bloom filters, respectively, and we use SafeQ-NC+ and SafeQ-NC to denote our schemes using neighborhood chains with and without Bloom filters, respectively.

Fig. 13(a)–(c) shows the average power consumption of sensors for three-, two-, and one-dimensional data, respectively, versus different sizes of time-slots. Fig. 14(a)–(c) shows the average power consumption of storage nodes for three-, two-, and one-dimensional data, respectively, versus different sizes of time-slots. We observe that the power consumption of both sensors and storage nodes grows linearly with the number of data items, which confirms our complexity analysis in Section X-A. Note that the number of collected data items is directly proportional to the size of time-slots. For power consumption, in comparison with the S&L scheme, our experimental results show that for three-dimensional data, SafeQ-NC+ consumes 184.9 times less power for sensors and 76.8 times less power for storage nodes; SafeQ-MHT+ consumes 171.4 times less power for sensors and 46.9 times less power for storage nodes; SafeQ-NC consumes 59.2 times less power for sensors and 76.8 times less power for storage nodes; and SafeQ-MHT consumes 57.9 times less power for sensors and 46.9 times less power for storage nodes. For two-dimensional data, SafeQ-NC+ consumes 10.3 times less power for sensors and 9.0 times less power for storage nodes; SafeQ-MHT+ consumes 9.5 times less power for sensors and 5.4 times less power for storage nodes; SafeQ-NC consumes
2.7 times less power for sensors and 9.0 times less power for storage nodes; and SafeQ-MHT consumes 2.6 times less power for sensors and 5.4 times less power for storage nodes. Our experimental results conform with the theoretical analysis that the power consumption in the S&L scheme grows exponentially with the number of dimensions, whereas in SafeQ it grows linearly with the number of dimensions times the number of data items.

Figs. 13(d) and 14(d) show the average power consumption for a 10-min slot for a sensor and a storage node, respectively, versus the number of dimensions of the data. We observe that the average power consumption of both sensors and storage nodes grows almost linearly with the number of dimensions of the data, which also confirms our complexity analysis in Section X-A.

Our experimental results also show that SafeQ is comparable to the S&L scheme for one-dimensional data in terms of power and space consumption. For power consumption, SafeQ-NC+ consumes about the same power for sensors and 0.7 times less power for storage nodes; SafeQ-MHT+ consumes about the same power for sensors and 0.3 times less power for storage nodes; SafeQ-NC consumes 1.0 times more power for sensors and 0.7 times less power for storage nodes; and SafeQ-MHT consumes 1.0 times more power for sensors and 0.3 times less power for storage nodes. For space consumption on storage nodes, SafeQ-NC+ and SafeQ-MHT+ consume about the same space, and SafeQ-NC and SafeQ-MHT consume about 1.0 times more space.

Fig. 15(a)–(c) shows the average space consumption of storage nodes for three-, two-, and one-dimensional data, respectively. For space consumption on storage nodes, in comparison to the S&L scheme, our experimental results show that for three-dimensional data, SafeQ-NC+ consumes 182.4 times less space; SafeQ-MHT+ consumes 169.1 times less space; SafeQ-NC consumes 58.5 times less space; and SafeQ-MHT consumes 57.2 times less space. For two-dimensional data, SafeQ-NC+ consumes 10.2 times less space; SafeQ-MHT+ consumes 9.4 times less space; SafeQ-NC consumes 2.7 times less space; and SafeQ-MHT consumes 2.6 times less space. The results conform with the theoretical analysis that the space consumption in the S&L scheme grows exponentially with the number of dimensions, whereas in SafeQ it grows linearly with the number of dimensions times the number of data items.

Fig. 15(d) shows the average space consumption of storage nodes for each data item versus the number of dimensions of the data item. For each three-dimensional data item, S&L consumes about 10^4 bytes or more, while SafeQ-NC+ and SafeQ-MHT+ consume only 40 bytes.

Fig. 13. Average power consumption per submission for a sensor. (a) Three-dimensional data. (b) Two-dimensional data. (c) One-dimensional data. (d) For 10 min.

Fig. 14. Average power consumption per query response for a storage node. (a) Three-dimensional data. (b) Two-dimensional data. (c) One-dimensional data. (d) For 10 min.

Fig. 15. Average space consumption for a storage node. (a) Three-dimensional data. (b) Two-dimensional data. (c) One-dimensional data. (d) Each data item.
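The setup above hashes prefix numbers with HMAC-MD5 under 128-bit keys. The minimal sketch below illustrates, under stated assumptions, why such hashes let a storage node test range membership without seeing plaintext: a value x lies in [a, b] exactly when one of the prefixes of x's w-bit form equals a prefix in the minimum prefix cover of [a, b], so comparing keyed hashes of the two prefix sets gives the same answer. The 8-bit width, the prefix-to-number encoding (keep the bits, append a 1, pad with 0s), and every function name here are assumptions chosen for illustration; the sketch leaves out SafeQ's encryption, Merkle hash trees, and neighborhood chaining.

```python
import hashlib
import hmac
import os

W = 8  # assumed bit width of a reading, kept small for illustration


def prefix_family(x: int, w: int = W) -> list[str]:
    """All w + 1 prefixes of x's w-bit binary form, from the full value down to all '*'."""
    bits = format(x, f"0{w}b")
    return [bits[: w - i] + "*" * i for i in range(w + 1)]


def range_to_prefixes(a: int, b: int, w: int = W) -> list[str]:
    """Minimum set of prefixes whose union is exactly the integer range [a, b]."""
    out = []
    while a <= b:
        # Grow the block size 2**k while the block stays aligned at a and inside [a, b].
        k = 0
        while k < w and a % (1 << (k + 1)) == 0 and a + (1 << (k + 1)) - 1 <= b:
            k += 1
        stem = format(a >> k, f"0{w - k}b") if k < w else ""
        out.append(stem + "*" * k)
        a += 1 << k
    return out


def numericalize(prefix: str) -> bytes:
    """Hypothetical encoding: '01**' -> '01100' (keep bits, append 1, pad with 0s)."""
    stem = prefix.rstrip("*")
    return (stem + "1" + "0" * (len(prefix) - len(stem))).encode()


def keyed_hashes(prefixes: list[str], key: bytes) -> set[str]:
    """HMAC-MD5 digest of every numericalized prefix, as hex strings."""
    return {hmac.new(key, numericalize(p), hashlib.md5).hexdigest() for p in prefixes}


key = os.urandom(16)  # 128-bit key, assumed to be shared by sensors and the sink
query = keyed_hashes(range_to_prefixes(30, 45), key)  # what the sink would send
for reading in (37, 50):
    stored = keyed_hashes(prefix_family(reading), key)  # what a sensor would upload
    print(reading, bool(stored & query))
```

Running it prints `37 True` and `50 False`. Because the storage node only intersects opaque digests, it learns whether a reading matched the query, not the reading or the range bounds themselves.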

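The setup above fixes the Bloom filter at four hash functions and states that this keeps the false positive rate under 1%. Under the standard approximation p = (1 - e^(-kn/m))^k for an m-bit filter holding n elements, that guarantee also depends on the bits-per-element ratio m/n, which is not stated alongside that choice. The minimal sketch below assumes that textbook approximation (not the paper's own analysis) and computes the rate for k = 4 together with the smallest m/n that keeps it at or below 1%; the function names are illustrative.

```python
import math


def false_positive_rate(k: int, bits_per_element: float) -> float:
    """Approximate Bloom filter false positive rate (1 - e^(-k n/m))^k, given m/n."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


def min_bits_per_element(k: int, target: float) -> float:
    """Smallest m/n at which the approximate rate with k hash functions equals target."""
    # Invert (1 - e^(-k/r))^k = target to get r = -k / ln(1 - target^(1/k)).
    return -k / math.log(1.0 - target ** (1.0 / k))


k = 4  # number of hash functions, as in the experimental setup
r = min_bits_per_element(k, 0.01)
print(f"k={k}: need m/n >= {r:.2f} bits per element for a rate <= 1%")
for bits in (8, 10, 12, 16):
    print(f"m/n={bits:>2}: rate ~ {false_positive_rate(k, bits):.4f}")
```

For k = 4 the threshold works out to roughly 10.5 bits per element; because the rate falls as m/n grows, any filter sized at or above that ratio stays under 1% under this approximation, while one sized at 10 bits per element would already exceed it.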
XII. CONCLUSION

We make three key contributions in this paper. First, we propose SafeQ, a novel and efficient protocol for handling range queries in two-tiered sensor networks in a privacy- and integrity-preserving fashion. SafeQ uses the techniques of prefix membership verification, Merkle hash trees, and neighborhood chaining. In terms of security, SafeQ significantly strengthens the security of two-tiered sensor networks. Unlike prior art, SafeQ prevents a compromised storage node from obtaining a reasonable estimate of the actual values of sensor-collected data items and sink-issued queries. In terms of efficiency, our results show that SafeQ significantly outperforms prior art for multidimensional data in terms of both power consumption and storage space. Second, we propose an optimization technique using Bloom filters to significantly reduce the communication cost between sensors and storage nodes. Third, we propose a solution to adapt SafeQ for event-driven sensor networks.

REFERENCES

[1] F. Chen and A. X. Liu, “SafeQ: Secure and efficient query processing in sensor networks,” in Proc. IEEE INFOCOM, 2010, pp. 1–9.
[2] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, “Data-centric storage in sensornets with GHT, a geographic hash table,” Mobile Netw. Appl., vol. 8, no. 4, pp. 427–442, 2003.
[3] P. Desnoyers, D. Ganesan, H. Li, and P. Shenoy, “Presto: A predictive storage architecture for sensor networks,” in Proc. HotOS, 2005, p. 23.
[4] D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos, and W. A. Najjar, “Microhash: An efficient index structure for flash-based sensor devices,” in Proc. FAST, 2005, pp. 31–44.
[5] B. Sheng, Q. Li, and W. Mao, “Data storage placement in sensor networks,” in Proc. ACM MobiHoc, 2006, pp. 344–355.
[6] B. Sheng, C. C. Tan, Q. Li, and W. Mao, “An approximation algorithm for data storage placement in sensor networks,” in Proc. WASA, 2007, pp. 71–78.
[7] B. Sheng and Q. Li, “Verifiable privacy-preserving range query in two-tiered sensor networks,” in Proc. IEEE INFOCOM, 2008, pp. 46–50.
[8] Xbow, “Stargate gateway (spb400),” 2011 [Online]. Available: http://www.xbow.com
[9] W. A. Najjar, A. Banerjee, and A. Mitra, “RISE: More powerful, energy efficient, gigabyte scale storage high performance sensors,” 2005 [Online]. Available: http://www.cs.ucr.edu/~rise
[10] S. Madden, “Intel lab data,” 2004 [Online]. Available: http://berkeley.intel-research.net/labdata
[11] J. Shi, R. Zhang, and Y. Zhang, “Secure range queries in tiered sensor networks,” in Proc. IEEE INFOCOM, 2009, pp. 945–953.
[12] R. Zhang, J. Shi, and Y. Zhang, “Secure multidimensional range queries in sensor networks,” in Proc. ACM MobiHoc, 2009, pp. 197–206.
[13] H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra, “Executing SQL over encrypted data in the database-service-provider model,” in Proc. ACM SIGMOD, 2002, pp. 216–227.
[14] B. Hore, S. Mehrotra, and G. Tsudik, “A privacy-preserving index for range queries,” in Proc. VLDB, 2004, pp. 720–731.
[15] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, “Order preserving encryption for numeric data,” in Proc. ACM SIGMOD, 2004, pp. 563–574.
[16] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” in Proc. IEEE S&P, 2000, pp. 44–55.
[17] P. Golle, J. Staddon, and B. Waters, “Secure conjunctive keyword search over encrypted data,” in Proc. ACNS, 2004, pp. 31–45.
[18] D. Boneh and B. Waters, “Conjunctive, subset, and range queries on encrypted data,” in Proc. TCC, 2007, pp. 535–554.
[19] P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine, “Authentic data publication over the internet,” J. Comput. Security, vol. 11, no. 3, pp. 291–314, 2003.
[20] H. Pang and K.-L. Tan, “Authenticating query results in edge computing,” in Proc. ICDE, 2004, p. 560.
[21] H. Pang, A. Jain, K. Ramamritham, and K.-L. Tan, “Verifying completeness of relational query results in data publishing,” in Proc. ACM SIGMOD, 2005, pp. 407–418.
[22] M. Narasimha and G. Tsudik, “Authentication of outsourced databases using signature aggregation and chaining,” in Proc. DASFAA, 2006, pp. 420–436.
[23] W. Cheng, H. Pang, and K.-L. Tan, “Authenticating multi-dimensional query results in data publishing,” in Proc. DBSec, 2006, pp. 60–73.
[24] H. Chen, X. Man, W. Hsu, N. Li, and Q. Wang, “Access control friendly query verification for outsourced data publishing,” in Proc. ESORICS, 2008, pp. 177–191.
[25] R. Merkle, “Protocols for public key cryptosystems,” in Proc. IEEE S&P, 1980, pp. 122–134.
[26] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, “Sirius: Securing remote untrusted storage,” in Proc. NDSS, 2003, pp. 131–145.
[27] M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, “Plutus: Scalable secure file sharing on untrusted storage,” in Proc. FAST, 2003, pp. 29–42.
[28] J. Cheng, H. Yang, S. H. Wong, and S. Lu, “Design and implementation of cross-domain cooperative firewall,” in Proc. IEEE ICNP, 2007, pp. 284–293.
[29] A. X. Liu and F. Chen, “Collaborative enforcement of firewall policies in virtual private networks,” in Proc. ACM PODC, 2008, pp. 95–104.
[30] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Netw., vol. 15, no. 2, pp. 24–32, Mar.–Apr. 2001.
[31] Y.-K. Chang, “Fast binary and multiway prefix searches for packet forwarding,” Comput. Netw., vol. 51, no. 3, pp. 588–605, 2007.
[32] H. Krawczyk, M. Bellare, and R. Canetti, “HMAC: Keyed-hashing for message authentication,” RFC 2104, 1997.
[33] R. Rivest, “The MD5 message-digest algorithm,” RFC 1321, 1992.
[34] D. Eastlake and P. Jones, “US secure hash algorithm 1 (SHA1),” RFC 3174, 2001.
[35] B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Commun. ACM, vol. 13, no. 7, pp. 422–426, 1970.
[36] P. Levis, “Simulating TinyOS networks,” 2003 [Online]. Available: http://www.cs.berkeley.edu/~pal/research/tossim.html

Fei Chen received the B.S. and M.S. degrees in automation from Tsinghua University, Beijing, China, in 2005 and 2007, respectively, and the Ph.D. degree in computer science and engineering from Michigan State University, East Lansing, in 2011.
He is currently a Member of Technical Staff with VMware, Inc., Palo Alto, CA. His research interests focus on networking, privacy, and security.

Alex X. Liu received the Ph.D. degree in computer science from the University of Texas at Austin in 2006.
He is currently an Assistant Professor with the Department of Computer Science and Engineering, Michigan State University (MSU), East Lansing. His research interests focus on networking, security, and dependable systems.
Dr. Liu received the IEEE and IFIP William C. Carter Award in 2004 and an NSF CAREER Award in 2009. He received the MSU College of Engineering Withrow Distinguished Scholar Award in 2011.
