Paperv5 Acm Format
$\sum_{i=1}^{k} x_i$, of the individual values sent by its $k$ children, but also the sum of their squares: $V = \sum_{i=1}^{k} x_i^2$. Eventually, the sink obtains two values: the sum of the actual samples, which it can use to compute the mean, and the sum of the squares, which it can use to compute the variance:
$$Var = E(x^2) - E(x)^2, \quad \text{where} \quad E(x^2) = \Big(\sum_{i=1}^{n} x_i^2\Big)/n \quad \text{and} \quad E(x) = \Big(\sum_{i=1}^{n} x_i\Big)/n$$
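As a minimal sketch of the computation above (plaintext only, no encryption), the sink can recover both statistics from the two aggregates. The function name and the sample values are illustrative assumptions, not part of the original scheme.

```python
def mean_and_variance(sum_x, sum_x2, n):
    """Recover mean and variance from the sum and the sum of squares."""
    mean = sum_x / n                 # E(x)
    mean_of_squares = sum_x2 / n     # E(x^2)
    return mean, mean_of_squares - mean ** 2   # Var = E(x^2) - E(x)^2

# Hypothetical readings from four sensors.
samples = [2, 4, 4, 6]
mean, var = mean_and_variance(sum(samples), sum(v * v for v in samples), len(samples))
```

Only two numbers travel up the tree regardless of how many sensors contribute, which is what makes additive aggregation attractive here.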
3. GOALS AND SECURITY MODEL
To provide data privacy, our goal is to prevent an attacker from gaining any information about sensor data besides what can be inferred from measurements made directly by the attacker. In this work, we define the privacy goal by the standard notion of semantic security.
An attacker is assumed to be global, i.e., able to monitor any location in the network or
even the entire WSN. Furthermore, we assume the attacker is able to read the internal state
of some sensors. The attacker is also supposed to be able to corrupt a subset of sensor nodes.
We assume the attacker can launch chosen plaintext attacks only. That is, the attacker is
able to obtain the ciphertext of any plaintext of his choice. In the real situation, this means
the attacker could manipulate the sensing environment and obtain the desired ciphertext by
eavesdropping.
In light of our requirement for end-to-end privacy between the sensors and the sink, additive
aggregation, although otherwise simple, becomes problematic. This is largely because popular
block and stream ciphers, such as AES [NIST 2001] or RC5 [Rivest 1995], are not additively
homomorphic. In other words, the summation of encrypted values does not allow for the
retrieval of the sum of the plaintext values.
To minimize trust assumptions, we assume that each of the $n$ sensors shares a distinct long-term key with the sink, called the encryption key. This key is originally derived, using a pseudo-random function (PRF), from the master secret, which is known only to the sink. We denote the sink's master secret as $K$ and the long-term sensor/sink shared key as $ek_i$, where the subscript $0 < i \le n$ uniquely identifies a particular sensor. This way, the sink only needs to store a single master secret and all long-term keys can be recomputed as needed.
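The key derivation described above can be sketched as follows. This is only an illustration: the text does not name a specific PRF, so HMAC-SHA256 is used here as an assumed stand-in, and the function name, the 4-byte encoding of the node id, and the placeholder master secret are all hypothetical.

```python
import hashlib
import hmac

def derive_encryption_key(master_secret: bytes, node_id: int) -> bytes:
    """ek_i = PRF_K(i), with HMAC-SHA256 standing in for the PRF (assumption)."""
    return hmac.new(master_secret, node_id.to_bytes(4, "big"), hashlib.sha256).digest()

# Placeholder master secret; a real one must be chosen uniformly at random.
master = b"\x00" * 32
ek_1 = derive_encryption_key(master, 1)
ek_2 = derive_encryption_key(master, 2)
```

Because the derivation is deterministic, the sink can recompute any $ek_i$ on demand instead of storing $n$ keys.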
As opposed to encryption, authentication schemes that allow for aggregation seem to be very difficult, and perhaps impossible, to design. It should be noted that the problem of aggregate authentication considered in this paper is different from the problem considered in aggregate signatures [Boneh et al. 2003]; more precisely, the latter should be called aggregatable signatures instead. In aggregate authentication, it is the messages themselves that are aggregated, and hence the original messages are not available for verification, whereas in aggregate signatures, the signatures for different messages are aggregated and all the signed messages have to be distinct and available to the verification algorithm in order to verify the validity of an aggregate signature. Consequently, it is fair to say that there could be no secure aggregate authentication scheme (one that is existentially unforgeable against chosen message attacks) in the literature. As explained in [Wagner 2004], other techniques are likely needed to verify the plausibility of the resulting aggregate and to increase the aggregation resiliency.
3 We assume that an aggregating node has its own measurement to contribute; thus k additions are needed.
ACM Transactions on Sensor Networks, Vol. V, No. N, Month 20YY.
In WSNs, providing end-to-end aggregate authentication seems to be difficult since messages lose their entropy through aggregation, making it difficult to verify the validity of a given aggregate. It is still possible, however, to prevent unauthorized nodes from injecting fake packets into the network. That is, groupwise message authentication can be achieved, in which only nodes knowing a common group key can contribute to an aggregate and produce valid authentication tags that would pass a prescribed verification test at the sink. Note that such a scheme would be vulnerable to compromised nodes. We give an end-to-end message authentication scheme providing such access control, assuming outsider-only attacks.
4. ADDITIVELY AGGREGATE ENCRYPTION
Encrypted data aggregation, or aggregate encryption, is sometimes called concealed data aggregation (CDA), a term coined by Westhoff et al. [Westhoff et al. 2006]. Appendix A gives an abstract description of CDA showing the desired functionalities.
In this section, we describe the notion of homomorphic encryption and provide an example. Our notion is a generalized version of the widely used one for homomorphic encryption: we allow the homomorphism to be under different keys, whereas the homomorphism in common notions is usually under the same key. We then proceed to present our additively homomorphic encryption scheme, whose security analysis is given in Section 6 and Appendix B. The encryption technique is very well-suited for privacy-preserving additive aggregation. For the sake of clarity, in Section 4.2, we will first describe a basic scheme assuming the encryption keys are randomly picked in each session (which is the same scheme as given in our earlier work [Castelluccia et al. 2005]); the header part is also excluded from the discussion. Then we will give a concrete construction in which the session keys and the encryption keys are derived using a pseudorandom function family. The concrete construction can be proved to be semantically secure in the CDA model [Chan and Castelluccia 2007], the details of which are given in Appendix A. Compared to our earlier work [Castelluccia et al. 2005], this paper provides the details of a concrete construction using a pseudorandom function in Section 4.3, with the security requirements on the components used specified.
Our scheme can be considered as a practical, tailored modification of the Vernam cipher [Vernam 1926], the well-known one-time pad, to allow plaintext addition to be done in the ciphertext domain. Basically, there are two modifications. First, the exclusive-OR operation is replaced by an addition operation. By choosing a proper modulus, multiplicative aggregation is also possible.^4 Second, instead of uniformly picking a key at random from the key space, the key is generated by a certain deterministic algorithm (with an unknown seed) such as a pseudorandom function [Goldreich 2001]; this modification is actually the same as that in a stream cipher. As a result, the information-theoretic security (which requires the key to be at least as long as the plaintext) in the Vernam cipher is replaced with a security guarantee in the computational-complexity theoretic setting in our construction.
4 Our construction can achieve either additive or multiplicative aggregation but not both at the same time. Besides, multiplicative aggregation seems to bear no advantage, as the size of a multiplicative aggregate is the same as the sum of the sizes of its inputs.
4.1 Homomorphic Encryption
A homomorphic encryption scheme allows arithmetic operations to be performed on ciphertexts. One example is a multiplicatively homomorphic scheme, whereby the multiplication of two ciphertexts followed by a decryption operation yields the same result as the multiplication of the two corresponding plaintext values. Homomorphic encryption schemes are especially useful in scenarios where someone who does not have decryption keys needs to perform arithmetic operations on a set of ciphertexts. A more formal description of homomorphic encryption schemes is as follows.
Let $Enc()$ denote a probabilistic encryption scheme. Let $M$ be the message space and $C$ the ciphertext space such that $M$ is a group under operation $\oplus$. $Enc()$ is a $\oplus$-homomorphic encryption scheme if, for any instance $Enc()$ of the encryption scheme, given $c_1 = Enc_{k_1}(m_1)$ and $c_2 = Enc_{k_2}(m_2)$ for some $m_1, m_2 \in M$, there exists an efficient algorithm which can generate from $c_1$ and $c_2$ a valid ciphertext $c_3 \in C$ for some key $k_3$ such that
$$c_3 = Enc_{k_3}(m_1 \oplus m_2)$$
In other words, decrypting $c_3$ with $k_3$ would yield $m_1 \oplus m_2$. In this paper, we mainly consider additive homomorphism, i.e., $\oplus$ is the $+$ operation. We do not restrict $k_1, k_2, k_3$ to be the same, although they are usually equal in common homomorphic encryption schemes. Since $k_3$ could be different from $k_1, k_2$, some identifying information, denoted by hdr, needs to be attached to a ciphertext to indicate which keys are required to decrypt the ciphertext.
A good example is the RSA cryptosystem [Rivest et al. 1978], which is multiplicatively homomorphic under a single key. The RSA encryption function is $Enc(m) = m^e = c \pmod{n}$ and the corresponding decryption function is $Dec(c) = c^d = m \pmod{n}$, where $n$ is a product of two suitably large primes ($p$ and $q$), and $e$ and $d$ are the encryption and decryption exponents, respectively, such that $e \cdot d = 1 \pmod{(p-1)(q-1)}$.
Given two RSA ciphertexts $c_1$ and $c_2$, corresponding to respective plaintexts $m_1$ and $m_2$, it is easy to see that $c_1 c_2 \equiv m_1^e m_2^e \equiv (m_1 m_2)^e \pmod{n}$. Hence, one can easily compute the multiplication of the ciphertexts ($c_1 c_2$) to obtain the ciphertext corresponding to the plaintext $m = m_1 m_2 \pmod{n}$. Note that $c_1$, $c_2$ and the resulting ciphertext after multiplication are all under the same decryption key $d$, and thus no hdr is needed.
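The multiplicative property can be checked numerically with textbook RSA. The sketch below uses tiny, insecure toy parameters chosen purely for illustration (they are assumptions, not values from the paper), and omits padding, which real RSA deployments require.

```python
# Toy parameters only; real RSA needs large random primes and padding.
p, q = 61, 53
n = p * q                    # modulus
phi = (p - 1) * (q - 1)
e = 17                       # encryption exponent, coprime to phi
d = pow(e, -1, phi)          # decryption exponent (modular inverse, Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 11
c3 = (enc(m1) * enc(m2)) % n   # multiply ciphertexts only
product = dec(c3)              # decrypts to m1 * m2 mod n
```

Note that the party computing `c3` never needs `d`, which is the point of a homomorphic scheme.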
4.2 Basic Encryption Scheme using Random Keys
We now introduce a simple additively homomorphic encryption technique. The main idea of
our scheme is to replace the xor (Exclusive-OR) operation typically found in stream ciphers
with modular addition (+). For the sake of clarity, the inclusion of hdr (the information to
identify decryption keys) and pseudorandom functions is deferred to the discussion in Section
4.3. The basic scheme is as follows.
Basic Additively Homomorphic Encryption Scheme
Encryption:
(1) Represent message $m$ as an integer $m \in [0, M-1]$, where $M$ is the modulus of the arithmetic.
(2) Let $k$ be a randomly generated keystream, where $k \in [0, M-1]$.
(3) Compute $c = Enc_k(m) = m + k \bmod M$.
Decryption:
(1) $Dec_k(c) = c - k \bmod M$.
Addition of Ciphertexts:
(1) Let $c_1 = Enc_{k_1}(m_1)$ and $c_2 = Enc_{k_2}(m_2)$.
(2) The aggregated ciphertext is: $c_l = c_1 + c_2 \bmod M = Enc_k(m_1 + m_2)$, where $k = k_1 + k_2 \bmod M$.
The correctness of the aggregation is assured if $M$ is sufficiently large. The explanation is as follows: if $c_1 = m_1 + k_1 \bmod M$ and $c_2 = m_2 + k_2 \bmod M$, then
$$c_l = c_1 + c_2 \bmod M = (m_1 + m_2) + (k_1 + k_2) \bmod M = Enc_{k_1 + k_2}(m_1 + m_2).$$
For $k = k_1 + k_2$,
$$Dec_k(c_l) = c_l - k \bmod M = (m_1 + m_2) + (k_1 + k_2) - (k_1 + k_2) \bmod M = m_1 + m_2 \bmod M.$$
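The basic scheme and its correctness argument can be sketched in a few lines. The modulus value and the function names are illustrative assumptions; keys are drawn with Python's `secrets` module as a stand-in for the "randomly generated keystream".

```python
import secrets

M = 2 ** 16  # example modulus; assumed larger than any plaintext sum used below

def keygen():
    return secrets.randbelow(M)        # k in [0, M-1]

def enc(m, k):
    return (m + k) % M                 # c = m + k mod M

def dec(c, k):
    return (c - k) % M                 # m = c - k mod M

def add(c1, c2):
    return (c1 + c2) % M               # ciphertext aggregation

m1, m2 = 1200, 345
k1, k2 = keygen(), keygen()
c_l = add(enc(m1, k1), enc(m2, k2))
recovered = dec(c_l, (k1 + k2) % M)    # decrypt under k = k1 + k2 mod M
```

If `m1 + m2` were allowed to reach `M`, `recovered` would wrap around, which is why the modulus must exceed the largest possible aggregate.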
We assume that $0 \le m < M$. Note that if $n$ different ciphertexts $c_i$ are added together, then $M$ must be larger than $\sum_{i=1}^{n} m_i$; otherwise, correctness is not provided. In fact, if $\sum_{i=1}^{n} m_i$ is larger than $M$, decryption will result in a value $m$ [...]

[...] be a pseudorandom function family where $F_\lambda = \{f_s : \{0,1\}^\lambda \to \{0,1\}^\lambda\}_{s \in \{0,1\}^\lambda}$.
Most provably secure pseudorandom functions, such as [Naor et al. 2002], are based on the hardness of certain number-theoretic problems. However, these constructions are usually computationally expensive for sensor nodes. Instead, key derivation in practice is usually based on functions with conjectured or assumed pseudorandomness; that is, pseudorandomness or unpredictability is inherently assumed in the construction rather than proven to follow from the hardness of some computational problem. One typical example is the use of cryptographic hash functions for key derivation, as in [Perrig et al. 2001]. Even HMAC [Bellare et al. 1996] and OMAC [Iwata and Kurosawa 2003] are constructed based on assumed pseudorandomness, with the former assuming the underlying hash function in the construction has a certain pseudorandomness property and the latter assuming the block cipher in use is a pseudorandom permutation.
The additive aggregate encryption scheme proposed in this paper does not restrict which type of pseudorandom function should be used. A conjectured pseudorandom function can be used for the sake of efficiency. The security guarantee provided by the proposed construction holds as long as the underlying pseudorandom function has the widely defined property of pseudorandomness or indistinguishability. If this indistinguishability property no longer holds or is broken for the pseudorandom function in use, we can simply replace it with a better pseudorandom function (one whose indistinguishability property has not been broken) for the proposed aggregate encryption to remain secure. It should be emphasized that this indistinguishability or pseudorandomness property is also an inherent requirement on a hash function used as a key derivation function [Perrig et al. 2001; Bellare et al. 1996], as is also done in the IPSec standard. That is, if a given hash is not suitable for the proposed aggregate encryption scheme due to its weakness, the same weakness would also undermine the security foundation of these key derivation functions.
4.3.2 Length-matching Hash Function. The length-matching hash function $h : \{0,1\}^\lambda \to \{0,1\}^l$ matches the length of the output of the pseudorandom function $f$ to the modulus size of $M$; that is, $M$ is assumed to be $l$ bits long. The purpose of $h$ is to shorten a long bit-string rather than to produce a fingerprint of a message; hence, unlike cryptographic hash functions, $h$ is not required to be collision resistant. The only requirement on $h$ is that $\{t \leftarrow \{0,1\}^\lambda : h(t)\}$ has a uniform distribution over $\{0,1\}^l$. That is, by uniformly picking an input from the domain of $h$, the resulting output distribution is uniform over the range of $h$.
This requirement is fairly loose, and many compression maps from $\{0,1\}^\lambda$ to $\{0,1\}^l$ satisfy it. For instance, $h$ can be implemented by truncating the output of the pseudorandom function and taking the least significant $l$ bits as output. The sufficiency of this requirement on $h$ is based on the assumption that an ideal pseudorandom function is used. For such a function, without knowledge of the seed key, it is unpredictable whether an output bit is 0 or 1 for all inputs. In practice, key derivation is usually based on conjectured pseudorandom functions with unproven pseudorandomness; for example, a collision resistant hash function is commonly used for deriving secret keys from a seed [Perrig et al. 2001; Bellare et al. 1996]. Hence, it might be the case that, for some inputs to these conjectured pseudorandom functions, there is a higher chance (greater than $\frac{1}{2}$) of predicting some output bit successfully. To tolerate the imperfection of conjectured pseudorandom functions, if $l \mid \lambda$, a better construction could be as follows: split the output of the pseudorandom function into smaller strings of length $l$, take the exclusive-OR of all these strings, and use the result as the output of $h$.
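Both candidate constructions of $h$ mentioned above can be sketched as follows. The function names are hypothetical, the PRF output is modeled as a byte string, and the folding variant assumes that $l$ divides the output length in bits.

```python
def h_truncate(prf_output: bytes, l_bits: int) -> int:
    """Keep the least significant l bits of the PRF output."""
    return int.from_bytes(prf_output, "big") & ((1 << l_bits) - 1)

def h_xor_fold(prf_output: bytes, l_bits: int) -> int:
    """Split the output into l-bit strings and XOR them together."""
    total_bits = len(prf_output) * 8
    if total_bits % l_bits != 0:
        raise ValueError("l must divide the PRF output length in bits")
    value = int.from_bytes(prf_output, "big")
    mask = (1 << l_bits) - 1
    folded = 0
    for shift in range(0, total_bits, l_bits):
        folded ^= (value >> shift) & mask   # XOR in the next l-bit chunk
    return folded

# Hypothetical 32-bit PRF output, folded down to l = 16 bits.
sample = bytes([0x12, 0x34, 0xAB, 0xCD])
truncated = h_truncate(sample, 16)
folded = h_xor_fold(sample, 16)
```

With folding, every input bit influences the result, so a bias in a few output bits of a conjectured PRF is diluted rather than passed through directly.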
Assume there is a sink and n nodes in the system. In the following description, f is a
pseudorandom function for key stream generation and h is a length-matching hash function.
The details of the proposed aggregate encryption scheme are as follows.
Additively Homomorphic Encryption Scheme using a Pseudorandom Function Family
Assume the modulus is $M$.
Key Generation:
(1) Randomly pick $K \in \{0,1\}^\lambda$ [...]
[...] $\sum_{i \in hdr} h(f_{ek_i}(r))$) mod $M$ (where $K = \sum_{i \in hdr} h(f_{ek_i}(r))$), and output the plaintext aggregate $x$.
Addition of Ciphertexts:
(1) Given two CDA ciphertexts $(hdr_i, c_i)$ and $(hdr_j, c_j)$, compute $c_l = (c_i + c_j) \bmod M$.
(2) Set $hdr_l = hdr_i \cup hdr_j$.
(3) Output $(hdr_l, c_l)$.
The keystream for a node is now generated from its secret key $ek_i$ and a unique message ID or nonce $r$. No randomness in the nonce is needed. This secret key is pre-computed and shared between the node and the sink, while the nonce can either be included in the query from the sink or derived from the time period in which the node is sending its values (assuming some form of synchronization).
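A minimal end-to-end sketch of the PRF-based scheme follows. Several details are assumptions made for illustration: HMAC-SHA256 stands in for the pseudorandom function $f$, $h$ is truncation to $l = 32$ bits, the encryption step is taken to be $c = x + h(f_{ek_i}(r)) \bmod M$ by analogy with the basic scheme, and the keys, nonce, and readings are placeholders.

```python
import hashlib
import hmac

L_BITS = 32
M = 1 << L_BITS  # modulus assumed to be l bits long

def keystream(ek: bytes, nonce: bytes) -> int:
    """h(f_ek(r)): HMAC-SHA256 as f, least significant l bits as h."""
    out = hmac.new(ek, nonce, hashlib.sha256).digest()
    return int.from_bytes(out, "big") % M

def encrypt(node_id, ek, nonce, x):
    # Assumed encryption step: c = x + h(f_ek(r)) mod M with hdr = {i}.
    return ({node_id}, (x + keystream(ek, nonce)) % M)

def aggregate(ct_i, ct_j):
    (hdr_i, c_i), (hdr_j, c_j) = ct_i, ct_j
    return (hdr_i | hdr_j, (c_i + c_j) % M)

def decrypt(keys, nonce, ct):
    hdr, c = ct
    K = sum(keystream(keys[i], nonce) for i in hdr) % M
    return (c - K) % M

keys = {1: b"key-one", 2: b"key-two", 3: b"key-three"}  # placeholder keys
nonce = b"epoch-7"
readings = {1: 20, 2: 22, 3: 19}
total = None
for i, x in readings.items():
    ct = encrypt(i, keys[i], nonce, x)
    total = ct if total is None else aggregate(total, ct)
plain_sum = decrypt(keys, nonce, total)
```

The header travels with the ciphertext so that the sink knows which keystreams to regenerate and subtract.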
5. AGGREGATION OF ENCRYPTED DATA
As previously noted, efficient aggregation in WSNs becomes very challenging when end-to-end privacy of data is required. One solution is to disregard aggregation altogether in favor of privacy, i.e., for sensor nodes to forward to their parents their own encrypted measurements, as well as measurements received from their children. The sink, upon receiving as many data packets as there are responding sensors, proceeds to decrypt all ciphertexts and sum them up in order to compute the desired statistical measurements. We term this approach No-Agg. This approach has two obvious disadvantages. First, because all packets are forwarded towards the sink, a lot of bandwidth (and hence power) is consumed. Second, as illustrated later in Section 7.2, there is an extreme imbalance between sensors in terms of the amount of data communicated. Sensors closer to the sink send and receive up to several orders of magnitude more bits than those on the periphery of the spanning tree.
A second approach, which does not achieve end-to-end privacy but does aggregate data, is a hop-by-hop (HBH) encryption method, which is also used for comparison between aggregation methods in [Girao et al. 2004]. In HBH, all nodes create pair-wise keys with their parents and children during a bootstrapping phase. When answering a query, nodes decrypt any packets sent to them and aggregate this data together with their own before re-encrypting the aggregated result and forwarding it to their parent. This approach is obviously more bandwidth efficient than No-Agg, as no packet is sent twice. However, there is a cost associated with the decryption and encryption performed at every non-leaf node in the WSN, which increases their energy consumption (see [Girao et al. 2004]). More importantly, from a privacy perspective, the HBH scheme leaves nodes vulnerable to attacks because their aggregated data will appear in plaintext (i.e., no end-to-end privacy). Nodes closer to the sink in particular become attractive targets for an attacker, as their aggregated values represent a large portion of the data in the WSN.
We instead propose an end-to-end privacy preserving aggregation approach (denoted as AGG) in which each sensor encrypts its sensed data using the encryption scheme presented in Section 4.3. Since this scheme is additively homomorphic, values can be added (aggregated) as they are forwarded towards the sink. The sink can then retrieve from the aggregate it receives the sum of the samples and derive certain statistical data. AGG retains the positive qualities of both the No-Agg (end-to-end privacy) and HBH (energy efficient) solutions. Note that a piece of identifying information is needed for each ciphertext in No-Agg to allow the sink to decide which key to use for decrypting a particular ciphertext in the list of received ciphertexts. This identifying information has roughly the same size as hdr in AGG.
5.1 Computing Statistical Data
In this section, we show how the new additively homomorphic encryption scheme can be used to aggregate encrypted data such that the sink can still compute the average and variance. Since multiple moduli may be used for different instances of the aggregate encryption scheme in the following discussion, the modulus used for encryption and decryption will be explicitly specified in the notation for clarity. For example, $Enc_k(x; M)$ means encrypting $x$ using key $k$ with public parameter $M$ (the modulus).
5.1.1 Computing the Average. When using our scheme, each sensor encrypts its data $x_i$ to obtain $c_{x_i} = Enc_{k_i}(x_i; M)$. $M$ needs to be chosen large enough to prevent an overflow, so it is set as $M = n \cdot t$, where $t$ is the range of possible measurement values and $n$ is the number of sensor nodes. Each ciphertext $c_{x_i}$ is therefore $\log(M) = \log(t) + \log(n)$ bits long.
The sensor then forwards $c_{x_i}$ along with the key identifying information $hdr_{x_i}$ to its parent, which aggregates all the $c_{x_j}$'s of its $k$ children by simply adding them up (this addition is performed modulo $M$). The resulting value is then forwarded. The sink ends up with the value $C_x = \sum_{i=1}^{n} c_{x_i} \bmod M$, associated with hdr, which indicates the key set $\{k_1, ..., k_i, ..., k_n\}$. It can then compute $S_x = Dec_K(C_x; M) = C_x - K \bmod M$, where $K = \sum_{i=1}^{n} k_i$, and derive the average as follows: $Avg = S_x / n$.
5.1.2 Computing the Variance. As mentioned previously, our scheme can also be used to derive the variance of the measured data. Two moduli will be used, $M$ for the sum of values and $M'$ [...] $(y_i; M')$ [...] $M' = n \cdot t^2$.
Each ciphertext $c_{y_i}$ is therefore $\log(M')$ [...] $\sum_{i=1}^{n} c_{y_i} \bmod M'$. $C_x$ is used to compute the average $Av$. $C_y$ is used to compute the variance as follows: the sink computes $V_x = Dec_{K'}(C_y; M') = C_y - K' \bmod M'$, where $K' = \sum_{i=1}^{n} k'_i$. The variance is then equal to $V_x/n - Av^2$.
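The average and variance computations can be sketched together. For brevity, this example assumes random per-node keys as in the basic scheme of Section 4.2 rather than PRF-derived ones, takes $y_i = x_i^2$ as the value encrypted under the second modulus, and uses made-up readings; `M2` plays the role of $M'$.

```python
import secrets

n, t = 4, 100             # number of sensors and measurement range (example values)
M = n * t                 # modulus for the sum of values
M2 = n * t * t            # modulus for the sum of squares (M' in the text)

xs = [12, 15, 11, 18]     # example readings, each in [0, t)
k = [secrets.randbelow(M) for _ in xs]     # keys k_i for x_i
k2 = [secrets.randbelow(M2) for _ in xs]   # keys k'_i for y_i = x_i^2

# In-network aggregation: only modular additions of ciphertexts.
C_x = sum((x + ki) % M for x, ki in zip(xs, k)) % M
C_y = sum((x * x + ki) % M2 for x, ki in zip(xs, k2)) % M2

# Sink: subtract the key sums, then derive the statistics.
S_x = (C_x - sum(k)) % M
V_x = (C_y - sum(k2)) % M2
avg = S_x / n
variance = V_x / n - avg ** 2
```

The second modulus is larger because each squared reading can be as large as roughly $t^2$, so the sum of $n$ of them needs up to $n \cdot t^2$ of headroom.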
5.2 Robustness
Fig. 1. Multi-level WSN model with nodes of degree k
An important consequence of using our proposed encryption scheme for aggregation in WSNs is that the sink node needs to be aware of the encryptors' ids so that it can regenerate the correct keystream for decryption purposes.
Because WSNs are not always reliable, it cannot be expected that all nodes reply to all requests. Therefore, there needs to be a mechanism for communicating the ids of the non-responding nodes to the sink. The simplest approach, and the one we used in our evaluation, is for the sensors to append their respective node ids to their messages.^5
6. SECURITY ANALYSIS
We use the CDA security model in [Chan and Castelluccia 2007] to analyze the concrete construction in Section 4. For completeness, the security model is given in Appendix A. As usual, the adversary is assumed to be probabilistic polynomial time (PPT) in the security model. In the model, the adversary can choose to compromise a subset of nodes and obtain all the secret information of these nodes. With oracle access, he can also obtain from any of the uncompromised nodes the ciphertext of any plaintext he chooses. The security goal is that the adversary cannot extract in polynomial time any information about the plaintext from a given ciphertext. This is the well-known notion of semantic security [Goldwasser and Micali 1984]. Formally defined, the security model is described as a game in Appendix A.
5 Depending on the number of nodes that respond to a query, it could be more efficient to communicate the ids of nodes that successfully reported values.
The concrete construction in Section 4.3 can be shown to achieve semantic security, or indistinguishability against chosen plaintext attacks (IND-CPA), an equivalent notion of semantic security [Goldwasser and Micali 1984], if the underlying key generation function is from a pseudorandom function family. The security can be summarized by the following theorem.
Theorem 1. The concrete construction is semantically secure against any collusion with at most $(n-1)$ compromised nodes (where $n$ is the total number of nodes), assuming $F_\lambda = \{f_s : \{0,1\}^\lambda \to \{0,1\}^\lambda\}_{s \in \{0,1\}^\lambda}$ is a pseudorandom function and $h : \{0,1\}^\lambda \to \{0,1\}^l$ satisfies the requirement that $\{t \leftarrow \{0,1\}^\lambda$ [...]

[...] $k'_i, k)$. They can be generated from three independent master keys using a pseudorandom function as in the basic scheme. The sink keeps all three master keys. $k_i$ and $k'_i$ correspond to the encryption key $ek_i$ in the basic scheme. Each node should receive a distinct pair $(k_i, k'_i)$ while getting a common group key $k$.
Encryption + Checksum Computation:
Let $M$ be the modulus of the arithmetic. For an arbitrary reporting epoch $r$:
(1) Each node $i$ generates the session keys $(k_i^{(r)}, k_i'^{(r)}, k^{(r)})$ from its secret keys $(k_i, k'_i, k)$ using a pseudorandom function and the length-matching hash function as in the basic scheme (i.e., $k_i^{(r)} = h(f_{k_i}(N_r))$, $k_i'^{(r)} = h(f_{k'_i}(N_r))$, and $k^{(r)} = h(f_k(N_r))$, where $f()$ is the pseudorandom function used, $h()$ is the length-matching hash function and $N_r$ is the nonce used for epoch $r$).
(2) For a plaintext message $m_i \in [0, M-1]$, encrypt $m_i$ using $k_i^{(r)}$ to obtain the ciphertext $x_i = m_i + k_i^{(r)} \bmod M$.
(3) Compute the checksum: $y_i = m_i \cdot k^{(r)} + k_i'^{(r)} \bmod M$.
(4) The ciphertext and checksum are: $(hdr, x_i, y_i)$ where $hdr = \{i\}$.
Decryption + Verification:
(1) Given a ciphertext $(hdr, x, y)$, generate the session keys $(k_i^{(r)}, k_i'^{(r)}, k^{(r)})$ for each $i \in hdr$.
(2) Compute $m = x - \sum_{i \in hdr} k_i^{(r)} \bmod M$. $m$ is the decrypted plaintext.
(3) Check $y \stackrel{?}{=} \sum_{i \in hdr} k_i'^{(r)} + k^{(r)} \cdot m \bmod M$. If yes, set $b = 1$; otherwise, set $b = 0$.
(4) Return $(m, b)$. Note that $b = 0$ indicates a verification failure.
Addition of Ciphertexts: Given two ciphertexts $(hdr_i, x_i, y_i)$ and $(hdr_j, x_j, y_j)$,
(1) Compute $hdr_l = hdr_i \cup hdr_j$.
(2) Compute $x_l = x_i + x_j \bmod M$.
(3) Compute $y_l = y_i + y_j \bmod M$.
(4) The aggregated ciphertext is: $(hdr_l, x_l, y_l)$.
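The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: HMAC-SHA256 stands in for $f$, reduction modulo $M$ stands in for the length-matching hash $h$, the modulus is an arbitrary example prime, and all keys and readings are placeholders.

```python
import hashlib
import hmac

M = (1 << 61) - 1  # example modulus (a prime, chosen here for illustration)

def session_key(key: bytes, nonce: bytes) -> int:
    """h(f_key(N_r)), with HMAC-SHA256 as f and reduction mod M as h (assumptions)."""
    return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % M

def encrypt(i, k_i, k_i_prime, k_group, nonce, m):
    x = (m + session_key(k_i, nonce)) % M                                     # ciphertext
    y = (m * session_key(k_group, nonce) + session_key(k_i_prime, nonce)) % M  # checksum
    return ({i}, x, y)

def aggregate(a, b):
    return (a[0] | b[0], (a[1] + b[1]) % M, (a[2] + b[2]) % M)

def decrypt_and_verify(node_keys, k_group, nonce, ct):
    hdr, x, y = ct
    K1 = sum(session_key(node_keys[i][0], nonce) for i in hdr) % M
    K2 = sum(session_key(node_keys[i][1], nonce) for i in hdr) % M
    m = (x - K1) % M
    ok = y == (K2 + session_key(k_group, nonce) * m) % M
    return m, int(ok)

node_keys = {1: (b"k1", b"k1'"), 2: (b"k2", b"k2'")}  # placeholder (k_i, k'_i) pairs
k_group = b"group-key"
nonce = b"epoch-1"
ct = aggregate(encrypt(1, *node_keys[1], k_group, nonce, 30),
               encrypt(2, *node_keys[2], k_group, nonce, 12))
m, b = decrypt_and_verify(node_keys, k_group, nonce, ct)

# An outsider who shifts x without knowing the group key breaks the checksum.
tampered = (ct[0], (ct[1] + 1) % M, ct[2])
m_bad, b_bad = decrypt_and_verify(node_keys, k_group, nonce, tampered)
```

The tampered pair decrypts to a different aggregate, but the checksum no longer matches because adjusting `y` consistently would require the group session key.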
The final aggregated ciphertext $(hdr, x, y)$ received at the sink can be expressed as two equations:
$$x = K_1^{(r)} + m, \qquad y = K_2^{(r)} + K^{(r)} \cdot m \tag{1}$$
where $m$ is the final aggregate of the plaintext data and $K_1^{(r)}$, $K_2^{(r)}$, $K^{(r)}$ are two sums of node keys and the common group key (for epoch $r$), given by the following expressions:
$$K_1^{(r)} = \sum_{i \in hdr} k_i^{(r)}, \qquad K_2^{(r)} = \sum_{i \in hdr} k_i'^{(r)}, \qquad K^{(r)} = k^{(r)}.$$
Equation (1) can be viewed as a set of constraint equations (for a particular hdr) that a correct pair $(x, y)$ should satisfy. For each epoch, hdr is part of the input to the verification process to define the coefficients $K_1^{(r)}, K_2^{(r)}, K^{(r)}$ of the constraint equations in (1); hdr uniquely specifies a subset of nodes whose data are supposed to have been incorporated in $(x, y)$.
If $(x, y)$ has not been tampered with, the plaintext aggregate $m$ extracted from the first constraint equation in (1) should satisfy the second constraint equation in (1); $m$ is a correct aggregate of the data contributed by the nodes in hdr when they all act honestly. The goal of an external adversary is thus to find a different valid pair $(x', y')$ [...] $x' = K_1^{(r)} + m'$, $y' = K_2^{(r)} + K^{(r)} m'$ for some $m' \neq m$ and $m'$ [...] $(x', y')$ for the given hdr should be negligibly small. The proposed protocol guarantees with high probability that, for an epoch $r$, any pair $(x, y)$ which passes the verification test for a given hdr has to allow the recovery of a correct aggregate whose contributions can only come from nodes in hdr with knowledge of $K^{(r)}$ (with exactly one contribution from each node in hdr).
In any epoch, by passively observing transmissions from honest nodes in a network, an adversary without knowledge of $K^{(r)}$ can still create multiple tuples of the form $(hdr, x, y)$, each with a distinct hdr, to pass the verification test of Equation (1). This can be achieved by simply aggregating valid ciphertext-checksum pairs eavesdropped from the transmissions of the honest nodes. However, it should be noted that, for each hdr, there is at most one such tuple, and the corresponding pair $(x, y)$ is indeed a correct ciphertext-checksum pair for hdr in the sense that this pair, upon verification, can recover an aggregate $m$ whose contributions originate only from the honest nodes specified in hdr, that is, $m = \sum_{i \in hdr} m_i$, where $m_i$ is the measurement of node $i$. In other words, in the set $\mathcal{C}$ of ciphertext-checksum pairs obtained by combining eavesdropped pairs through the aggregation functionality, if a pair $(x, y) \in \mathcal{C}$ passes the verification equations in (1) for hdr, any pair $(x', y')$ [...] $(x', y') \neq (x, y)$ for some $m' \neq m$ to pass the verification test of the aggregate authentication scheme for the same hdr is negligible for any external PPT (probabilistic polynomial-time) adversary without knowing $K$, assuming the encryption keys and the group key are generated by a pseudorandom function based on different seed keys.
Proof: Assume the pseudorandom function has some indistinguishability property as usual. We prove by contradiction, showing that a PPT adversary which can forge a valid pair $(x', y')$ can also break the indistinguishability property of the underlying pseudorandom function. We show the reduction^14 in two steps: first, we show that a forging algorithm to find $(x', y')$ can be used as a sub-routine to solve a newly defined problem called Under-determined Equation Set with Pseudorandom Unknowns (UESPU); then we show that the UESPU problem is computationally hard if the underlying pseudorandom function has the usual indistinguishability property. The UESPU problem is defined as follows:
Under-determined Equation Set with Pseudorandom Unknowns (UESPU) Problem. Suppose $K_1, K_2, K$ are independent random seeds. Let $K_1^{(r)}$, $K_2^{(r)}$ and $K^{(r)}$ denote the hashed outputs of a pseudorandom function $f$ at input $r$ corresponding to seed keys $K_1$, $K_2$ and $K$.^15 Given a 3-tuple $(m, x, y)$ where $x = K_1^{(r)} + m$ and $y = K_2^{(r)} + K^{(r)} m$, find $(K_1^{(r)}, K_2^{(r)}, K^{(r)})$ while allowed to evaluate the pseudorandom function at any input $r' \neq r$.^16
Without loss of generality, in the UESPU problem, each of $K_1^{(r)}$, $K_2^{(r)}$ and $K^{(r)}$ is treated as a single hashed output of $f$. In the proposed aggregate authentication, they are the sums of hashed outputs of $f$. If they are represented as the sums of hashed outputs of $f$ instead, the modified problem would remain hard if $f$ is a pseudorandom function.
Solving the UESPU problem using a forger of $(x', y')$.
Suppose there exists a PPT adversary $\mathcal{A}$ which can forge a valid pair $(x', y')$ at an epoch with nonce $r$ with non-negligible probability $p_f$. Using $\mathcal{A}$ as a subroutine, we can construct another algorithm $\mathcal{A}'$ to find $(K_1^{(r)}, K_2^{(r)}, K^{(r)})$ from $(m, x, y)$ with probability $p_f$ in any instance of the
14 The reduction of the problem of breaking the indistinguishability of the pseudorandom function to the problem of forging a valid $(x', y')$ pair.
15 That is, $K_1^{(r)} = h(f_{K_1}(r))$, $K_2^{(r)} = h(f_{K_2}(r))$, and $K^{(r)} = h(f_K(r))$, where $h$ is the length-matching hash function.
16 The UESPU problem is typically hard if $f$ is a pseudorandom function. More formally defined, given that $l$ is the key length of the pseudorandom function $f$ and $h$ is a length-matching hash function, the following probability is negligible in $l$ for any PPT algorithm $A$:
$$\Pr\left[\begin{array}{l} K_1 \leftarrow \{0,1\}^l;\ K_2 \leftarrow \{0,1\}^l;\ K \leftarrow \{0,1\}^l;\ r \leftarrow \{0,1\}^l; \\ K_1^{(r)} = h(f_{K_1}(r));\ K_2^{(r)} = h(f_{K_2}(r));\ K^{(r)} = h(f_K(r)); \\ m \leftarrow Z_M;\ x = K_1^{(r)} + m;\ y = K_2^{(r)} + K^{(r)} m \end{array} : A^f(m, x, y) = (K_1^{(r)}, K_2^{(r)}, K^{(r)}) \right]$$
[...] $\neq r$ by passing the queries to its challenger.
The construction of $\mathcal{A}'$ [...] $(x', y') \neq (x, y)$, we can determine $K_1^{(r)}, K_2^{(r)}, K^{(r)}$ from the resulting set of equations. The explanation is as follows:
Note that
$$x = K_1^{(r)} + m, \qquad y = K_2^{(r)} + K^{(r)} m.$$
So we have two equations and three unknowns. If $(x', y')$ [...]
$$x' = K_1^{(r)} + m', \qquad y' = K_2^{(r)} + K^{(r)} m'$$
[...] $m' \neq m$.
The pair $(x', y')$ [...]. Since $(x', y') \neq (x, y)$ and $m' \neq m$, it can be assured that the four equations are independent. Hence, there are four independent equations and four unknowns in total, and it should be easy to solve for $K_1^{(r)}$, $K_2^{(r)}$, $K^{(r)}$ (a contradiction to the UESPU assumption). The probability of solving the problem in the UESPU assumption is hence $p_f$.
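The solving step can be made explicit as a short worked derivation. It assumes, beyond what the text states, that all arithmetic is modulo $M$ and that $m' - m$ is invertible modulo $M$ (for instance, when $M$ is prime); the four unknowns are $K_1^{(r)}$, $K_2^{(r)}$, $K^{(r)}$ and $m'$.

```latex
\begin{align*}
x' - x &= m' - m
  &&\Rightarrow\quad m' = m + (x' - x) \pmod{M},\\
y' - y &= K^{(r)}\,(m' - m)
  &&\Rightarrow\quad K^{(r)} = (y' - y)\,(m' - m)^{-1} \pmod{M},\\
K_1^{(r)} &= x - m \pmod{M},
  &&\phantom{\Rightarrow}\quad K_2^{(r)} = y - K^{(r)}\, m \pmod{M}.
\end{align*}
```

Subtracting the honest pair from the forged pair eliminates $K_1^{(r)}$ and $K_2^{(r)}$, which is why a single forgery is enough to expose the group session key.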
Suppose there are n reporting nodes. The communication transcripts can be simulated easily by randomly picking $(n - 1)$ pairs of ciphertext-checksum $(x_i, y_i)$ and subtracting them from $(x, y)$ to obtain the n-th pair. Since $\mathcal{A}$ does not have any knowledge about the node keys, real pairs of $(x_i, y_i)$ should look random to $\mathcal{A}$. Hence, $\mathcal{A}$ could not distinguish its view in the simulation from that in the real attack. On the other hand, it could be concluded that knowing $(x_i, y_i)$ without knowing the node keys would not help in creating a valid forgery. In the above discussion, we treat $K_1^{(r)}$, $K_2^{(r)}$, $K^{(r)}$ as a single output of a pseudorandom function for the sake of clarity and easy comprehension; more precisely, in the aggregate authentication scheme, each one of them is the sum of outputs of a pseudorandom function seeded with distinct keys (one from each sensor node). Nonetheless, the above arguments and conclusion apply to both cases.
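The transcript simulation described above can be sketched as follows. This is a minimal illustration, assuming that "subtracting" means component-wise subtraction modulo M; the function name `simulate_transcript` is hypothetical.

```python
import secrets


def simulate_transcript(x, y, n, M):
    """Return n (x_i, y_i) pairs whose component-wise sums equal (x, y) mod M.
    The first n-1 pairs are uniformly random; the n-th pair is fixed by them."""
    pairs = [(secrets.randbelow(M), secrets.randbelow(M)) for _ in range(n - 1)]
    sum_x = sum(p[0] for p in pairs) % M
    sum_y = sum(p[1] for p in pairs) % M
    pairs.append(((x - sum_x) % M, (y - sum_y) % M))
    return pairs
```

No node key is used anywhere in the simulation, which is the point of the argument: the simulator needs only the aggregate pair $(x, y)$.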
A distinguisher for the pseudorandom function using an algorithm which solves
the UESPU problem.
The UESPU problem is hard if $K_1^{(r)}$, $K_2^{(r)}$, $K^{(r)}$ are generated by a pseudorandom function. Obviously, m and x uniquely determine $K_1^{(r)}$. But the equation $y = K_2^{(r)} + K^{(r)} m$ has two unknowns, which cannot be uniquely determined. It can be shown that if there exists an algorithm $\mathcal{A}'$ that solves for $K_2^{(r)}$ and $K^{(r)}$ from m and y in polynomial time, then the indistinguishability property of the underlying pseudorandom function is broken.
The idea is as follows: assume the seed key for generating $K^{(r)}$ is unknown but the seed key for generating $K_2^{(r)}$ is known. That is, we can generate $K_2^{(r')}$ for any $r'$. When a challenge $K^{(r)}$ is received, we have to determine whether it is randomly picked from a uniform distribution or generated by the pseudorandom function with an unknown seed key. We generate $K_2^{(r)}$ from the known seed key. Then we pass $y = K_2^{(r)} + K^{(r)} m$ to $\mathcal{A}'$. If the value of $K_2^{(r)}$ returned by $\mathcal{A}'$ does not match the generated $K_2^{(r)}$, we reply that $K^{(r)}$ is randomly picked; otherwise, we reply that it is generated from the pseudorandom function. Any queries made by $\mathcal{A}'$ could be answered by sending queries to the challenger and running the pseudorandom function with the known key.

ACM Transactions on Sensor Networks, Vol. V, No. N, Month 20YY.
8.3 Additional Overheads
The aggregate authentication extension leads to additional costs in both communication and computation. For the communication cost, the length of each ciphertext is now increased by $|M|$ (where M is the modulus of the arithmetic in use). This is the size of the added checksum. For the computation cost, the notations of Section 7.3 are used. The additional computation costs needed for checksum generation and verification are summarized as follows. In the calculation of the verification cost, the cost of a comparison operation in mod M is assumed to be similar to the cost of an addition operation in mod M.
Operation                Additional Computation Costs
Checksum Generation      2 t_prf + 2 t_h + t_add + t_multi
Checksum Verification    (2L + 1) t_prf + (L + 1) t_h + (L + 1) t_add + t_multi

Table IV. Additional computation costs of the extension of aggregate authentication (assuming L is the number of nodes contributing to an aggregate).
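As a quick way to evaluate the formulas of Table IV for a given deployment, the following sketch computes both costs from unit operation costs. The function name `checksum_costs` is hypothetical, and the unit costs are whatever values the reader measures on the target platform; none are taken from the paper.

```python
def checksum_costs(L, t_prf, t_h, t_add, t_multi):
    """Evaluate the two rows of Table IV.
    L is the number of nodes contributing to an aggregate; the t_* values
    are per-operation costs in any consistent unit (e.g. clock cycles)."""
    generation = 2 * t_prf + 2 * t_h + t_add + t_multi
    verification = (2 * L + 1) * t_prf + (L + 1) * t_h + (L + 1) * t_add + t_multi
    return generation, verification
```

Generation cost is independent of L, whereas verification at the sink grows linearly with the number of contributing nodes.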
9. RELATED WORK
The problem of aggregating encrypted data in WSNs was partially explored in [Girao et al. 2004]. In that paper, the authors propose to use an additively and multiplicatively homomorphic encryption scheme to allow aggregation of encrypted data. While this work is very interesting, it has several important limitations. Firstly, it is not clear how secure the encryption scheme really is. Secondly, as acknowledged by the authors, the encryption and aggregation operations are very expensive and therefore require quite powerful sensors. Finally, in the proposed scheme, the encryption expands the packet size significantly. Given all these drawbacks, it is questionable whether aggregation is still beneficial. In contrast, our encryption scheme is proven to be secure and is very efficient. Encryption and aggregation only require a small number of single-precision additions. Furthermore, our encryption scheme only expands packet sizes by a small number of bits. As a result, it is well adapted to WSNs consisting of very resource-constrained sensors.
In [Hu and Evans 2003], Hu and Evans propose a protocol to securely aggregate data. The paper presents a way to aggregate MACs (message authentication codes) of individual packets such that the sink can eventually detect non-authorized inputs. This problem is actually complementary to the problem of aggregating encrypted data that we are considering in this paper. The proposed solution introduces significant bandwidth overhead per packet. Furthermore, it requires the sink to broadcast n keys, where n is the number of nodes in the network, at each sampling period. This makes the proposed scheme impractical.
Although not related to data privacy, in [Przydatek et al. 2003] Przydatek et al. present an efficient mechanism for detecting forged aggregation values (min, max, median, average and count). In their setting, a trusted outside user can query the WSN. The authors then look into how to reduce the trust placed in the sink node (base station) while ensuring correctness of the query response. Another work by Wagner [Wagner 2004] examines security of aggregation in WSNs, describing attacks against existing aggregation schemes before providing a framework in which to evaluate such a scheme's security.
10. CONCLUSION
This paper proposes a new homomorphic encryption scheme that allows intermediate sensors (aggregators) to aggregate the encrypted data of their children without having to decrypt them. As a result, even if an aggregator gets compromised, the attacker won't be able to eavesdrop on the data and aggregate, resulting in much stronger privacy than an aggregation scheme relying on hop-by-hop encryption.
We show that if the key streams used in our scheme are derived using a pseudorandom
function, our scheme can achieve semantic security against any collusion of size less than the
total number of nodes.
We evaluate the performance of our scheme. We show, as expected, that our scheme is slightly less bandwidth-efficient than the hop-by-hop aggregation scheme described previously. However, it provides a much stronger level of security. The privacy protection provided by our scheme is in fact comparable to the privacy protection provided by a scheme that would use end-to-end encryption and no aggregation (i.e., the aggregation is performed at the base station). We show that our scheme is not only much more bandwidth-efficient than such an approach, but it also distributes the communication load more evenly amongst the network nodes, resulting in an extended longevity of the WSN.
Finally, we extend our scheme to provide end-to-end aggregate authentication. Without knowledge of a group key, an external attacker has negligible probability of tampering with the aggregate without being detected in the extension.
In conclusion, we give efficient, provably secure solutions to provide end-to-end privacy and authenticity (with reasonably good security assurance) for WSNs while en-route aggregation is supported. The presented scheme only supports mean and variance computation. However, we showed in [Castelluccia and Soriente 2008] that our construction could be used as a building block for other aggregation schemes to support more functions (such as median, mode, range, ...).
REFERENCES
Bellare, M., Canetti, R., and Krawczyk, H. 1996. Keying hash functions for message authentication. In Advances in Cryptology - CRYPTO 1996, Springer-Verlag LNCS vol. 1109. 1-15.
Boneh, D., Gentry, C., Lynn, B., and Shacham, H. 2003. Aggregate and verifiably encrypted signatures from bilinear maps. In Advances in Cryptology - EUROCRYPT 2003, Springer-Verlag LNCS vol. 2656. 416-432.
Castelluccia, C., Mykletun, E., and Tsudik, G. 2005. Efficient aggregation of encrypted data in wireless sensor networks. In the Proceedings of MobiQuitous'05. 1-9.
Castelluccia, C. and Soriente, C. 2008. ABBA: Secure aggregation in WSNs - a bins and balls approach. 6th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt).
Chan, A. C.-F. and Castelluccia, C. 2007. On the privacy of concealed data aggregation. In ESORICS 2007, Springer-Verlag LNCS vol. 4734. 390-405.
Chan, A. C.-F. and Castelluccia, C. 2008. On the (im)possibility of aggregate message authentication codes. ePrint Archive, Report 2008-. http://.
Chan, H., Perrig, A., and Song, D. 2006. Secure hierarchical in-network aggregation in sensor networks. In ACM Conference on Computer and Communication Security (CCS '06). 278-287.
Eschenauer, L. and Gligor, V. D. 2000. A key management scheme for distributed sensor networks. ACM CCS, 41-47.
Girao, J., Westhoff, D., and Schneider, M. 2004. CDA: Concealed data aggregation in wireless sensor networks. ACM WiSe 2004.
Goldreich, O. 2001. Foundations of Cryptography: Part 1. Cambridge University Press.
Goldwasser, S. and Micali, S. 1984. Probabilistic encryption. Journal of Computer and System Sciences 28, 2, 270-299.
Goldwasser, S., Micali, S., and Rivest, R. 1988. A secure signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing 17, 2, 281-308.
Hu, L. and Evans, D. 2003. Secure aggregation for wireless networks. Workshop on Security and Assurance in Ad hoc Networks.
Iwata, T. and Kurosawa, K. 2003. OMAC: One-key CBC MAC. In Fast Software Encryption (FSE 2003), Springer-Verlag LNCS vol. 2887. 129-153.
Karlof, C., Sastry, N., and Wagner, D. 2004. Tinysec: a link layer security architecture for wireless sensor networks. Embedded Networked Sensor Systems, 162-175.
Karlof, C. and Wagner, D. 2003. Secure routing in wireless sensor networks: Attacks and countermeasures. Workshop on Sensor Network Protocols and Applications.
Katz, J. and Yung, M. 2006. Characterization of security notions for probabilistic private-key encryption. Journal of Cryptology 19, 1, 67-95.
Madden, S. R., Franklin, M. J., Hellerstein, J. M., and Hong, W. 2002. TAG: a Tiny AGgregation service for ad-hoc sensor networks. Fifth Annual Symposium on Operating Systems Design and Implementation, 131-146.
Naor, M., Reingold, O., and Rosen, A. 2002. Pseudorandom functions and factoring. SIAM Journal on Computing 31, 5, 1383-1404.
Naor, M. and Yung, M. 1990. Public-key cryptosystems provably secure against chosen-ciphertext attacks. In ACM Symposium on Theory of Computing (STOC 1990). 427-437.
NIST. 2001. Advanced encryption standard. NIST (National Institute of Standards and Technology) FIPS PUB 197.
Perrig, A., Stankovic, J., and Wagner, D. 2004. Security in wireless sensor networks. Communications of the ACM 47, 53-57.
Perrig, A., Szewczyk, R., Wen, V., Culler, D., and Tygar, D. 2001. SPINS: Security protocols for sensor networks. In the Proceedings of ACM MOBICOM 2001. 189-199.
Przydatek, B., Song, D., and Perrig, A. 2003. SIA: Secure information aggregation in sensor networks. ACM SENSYS, 255-265.
Rivest, R. L. 1995. The RC5 encryption algorithm. Dr. Dobb's Journal 1008.
Rivest, R. L., Shamir, A., and Adleman, L. M. 1978. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM 21, 120-126.
Vernam, G. S. 1926. Cipher printing telegraph systems for secret wire and radio telegraphic communications. Journal of the American Institute of Electrical Engineers 45, 105-115. See also US patent #1,310,719.
Wagner, D. 2004. Resilient aggregation in sensor networks. Workshop on Security of Ad Hoc and Sensor Networks.
Westhoff, D., Girao, J., and Acharya, M. 2006. Concealed data aggregation for reverse multicast traffic in sensor networks: Encryption, key distribution, and routing adaption. IEEE Transactions on Mobile Computing 5, 10, 1417-1431.
Wood, A. D. and Stankovic, J. A. 2002. Denial of service in sensor networks. IEEE Computer 35, 54-62.
Yang, Y., Wang, X., Zhu, S., and Cao, G. 2006. SDAP: A secure hop-by-hop data aggregation protocol for sensor networks. In the Proceedings of ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc) 2006.
Zhu, S., Setia, S., Jajodia, S., and Ning, P. 2004. An interleaved hop-by-hop authentication scheme for filtering false data in sensor networks. IEEE Symposium on Security and Privacy.
Appendix A: Semantic Security of Concealed Data Aggregation (CDA) [Chan and Castelluccia 2007]
Notation
We follow the notations for algorithms and probabilistic experiments that originate in [Goldwasser et al. 1988]. A detailed exposition can be found there. We denote by $z \leftarrow A(x, y, \ldots)$ the experiment of running probabilistic algorithm A on inputs $x, y, \ldots$, generating output z. We denote by $\{A(x, y, \ldots)\}$ the probability distribution induced by the output of A. The notations $x \leftarrow T$ and $x \xleftarrow{R} T$ are equivalent and mean randomly picking a sample x from the probability distribution T; if no probability function is specified for T, we assume x is uniformly picked from the sample space. We denote by $\mathbb{N}$ the set of non-negative integers. As usual, PPT denotes probabilistic polynomial time. An empty set is always denoted by $\emptyset$.
CDA Syntax
A typical CDA scheme includes a sink R and a set U of n source nodes (which are usually sensor nodes) where $U = \{s_i : 1 \le i \le n\}$. Denote the set of source identities by ID; in the simplest case, ID = [1, n]. In the following discussion, $hdr \subseteq ID$ is a header indicating the source nodes contributing to an encrypted aggregate. A source node i has the encryption key $ek_i$ while the sink keeps the decryption key dk from which all $ek_i$'s can be computed. Given a security parameter $\lambda$, a CDA scheme consists of the following polynomial time algorithms.
Key Generation (KG). Let $KG(1^\lambda, n) \rightarrow (dk, ek_1, ek_2, \ldots, ek_n)$ be a probabilistic algorithm. Then, $ek_i$ (with $1 \le i \le n$) is the encryption key assigned to source node $s_i$ and dk is the corresponding decryption key given to the sink R.
Encryption (E). $E_{ek_i}(m_i) \rightarrow (hdr_i, c_i)$ is a probabilistic encryption algorithm taking a plaintext $m_i$ and an encryption key $ek_i$ as input to generate a ciphertext $c_i$ and a header $hdr_i \subset ID$. Here $hdr_i$ indicates the identity of the source node performing the encryption; if the identity is i, then $hdr_i = \{i\}$. Sometimes the encryption function is denoted by $E_{ek_i}(m_i; r)$ to explicitly show by a string r the random coins used in the encryption process.
Decryption (D). Given an encrypted aggregate c and its header $hdr \subseteq ID$ (which indicates the source nodes included in the aggregation), $D_{dk}(hdr, c) \rightarrow m / \perp$ is a deterministic algorithm which takes the decryption key dk, hdr and c as inputs and returns the plaintext aggregate m or possibly $\perp$ if c is an invalid ciphertext.
Aggregation (Agg). With a specified aggregation function f such as additive aggregation considered in this paper, $Agg_f(hdr_i, hdr_j, c_i, c_j) \rightarrow (hdr_l, c_l)$ aggregates two encrypted aggregates $c_i$ and $c_j$ with headers $hdr_i$ and $hdr_j$ respectively (where $hdr_i \cap hdr_j = \emptyset$) to create a combined aggregate $c_l$ and a new header $hdr_l = hdr_i \cup hdr_j$. Suppose $c_i$ and $c_j$ are the ciphertexts for plaintext aggregates $m_i$ and $m_j$ respectively. The output $c_l$ is the ciphertext for the aggregate $f(m_i, m_j)$, namely, $D_{dk}(hdr_l, c_l) \rightarrow f(m_i, m_j)$. This paper considers $f(m_i, m_j) = m_i + m_j \bmod M$. Note that the aggregation algorithm does not need the decryption key dk or any of the encryption keys $ek_i$ as input; it is a public algorithm.
It is intentional to include the description of the header hdr in the security model to make it
as general as possible (to cover schemes requiring headers in their operations). hdr is needed
in some schemes to identify the set of decryption keys required to decrypt a certain ciphertext.
Nonetheless, generating headers or including headers as input to algorithms should not be
treated as a requirement in the actual construction or implementation of CDA algorithms. For
constructions which do not need headers, all hdrs can simply be treated as the empty set in
the security model.
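The four algorithms of the CDA syntax can be sketched for additive aggregation as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: HMAC-SHA256 stands in for the pseudorandom function, its output is reduced modulo M in place of the length-matching hash h, the modulus `M = 2**32` and the 4-byte encoding of node identities are hypothetical choices, and the nonce r is assumed to be known to the sink at decryption time.

```python
import hashlib
import hmac

M = 2**32  # hypothetical modulus; the text only requires arithmetic mod M


def prf(key: bytes, data: bytes) -> int:
    """HMAC-SHA256 as a stand-in PRF, reduced into Z_M (an assumption)."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % M


def derive_ek(dk: bytes, i: int) -> bytes:
    """Derive ek_i from the sink's key dk (identity encoding is hypothetical)."""
    return hmac.new(dk, i.to_bytes(4, "big"), hashlib.sha256).digest()


def keygen(dk: bytes, n: int) -> dict:
    """KG: return the encryption keys ek_1..ek_n, all computable from dk."""
    return {i: derive_ek(dk, i) for i in range(1, n + 1)}


def encrypt(i: int, ek: bytes, m: int, r: bytes):
    """E: header {i} and ciphertext m + f_ek(r) mod M."""
    return frozenset({i}), (m + prf(ek, r)) % M


def aggregate(hdr_i, c_i, hdr_j, c_j):
    """Agg: public operation, needs no key; headers must be disjoint."""
    if hdr_i & hdr_j:
        raise ValueError("headers must be disjoint")
    return hdr_i | hdr_j, (c_i + c_j) % M


def decrypt(dk: bytes, hdr, c: int, r: bytes) -> int:
    """D: subtract the key stream of every node listed in the header."""
    pad = sum(prf(derive_ek(dk, i), r) for i in hdr)
    return (c - pad) % M
```

Here the header is what tells the sink which key streams to remove, which is the role the text assigns to hdr in schemes that need it.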
The Notion of Semantic Security
Only one type of oracle query (adversary interaction with the system) is allowed in the security model, namely, the encryption oracle $O_E$. The details are as follows:
Encryption Oracle $O_E(i, m, r)$. For fixed encryption and decryption keys, on input an encryption query $\langle i, m, r \rangle$, the encryption oracle retrieves $s_i$'s encryption key $ek_i$ and runs the encryption algorithm on m and replies with the ciphertext $E_{ek_i}(m)$ and its header hdr. The random coins or nonce r is part of the query input to $O_E$.
The encryption oracle is needed in the security model since the encryption algorithm uses private keys.
To define security (more specifically, indistinguishability) against chosen plaintext attacks (IND-CPA), we use the following game played between a challenger and an adversary, assuming there is a set U of n source nodes. If no PPT adversary, even in collusion with at most t compromised nodes, can win the game with non-negligible advantage (as defined below), we say the CDA scheme is t-secure. The adversary is allowed to freely choose parameters n and t.
Definition 3. A CDA scheme is t-secure (indistinguishable) against adaptive chosen plaintext attacks if the advantage of winning the following game is negligible in the security parameter $\lambda$ for all PPT adversaries.
Collusion Choice. The adversary chooses to corrupt t source nodes. Denote the set of these t corrupted nodes and the set of their identities by $S'$ and $I'$ respectively.
Setup. The challenger runs the key generation algorithm KG to generate a decryption key dk and n encryption keys $\{ek_i : 1 \le i \le n\}$, and gives the subset of t encryption keys $\{ek_j : s_j \in S'\}$ to the adversary but keeps the decryption key dk and the other $(n - t)$ encryption keys $\{ek_j : s_j \in U \setminus S'\}$.
Query 1. The adversary can issue to the challenger one type of query:
- Encryption Query $\langle i_j, m_j, r_j \rangle$. The challenger responds with $E_{ek_{i_j}}(m_j)$ using random coins $r_j$. The adversary is allowed to choose and submit his choices of random coins for encryption queries.
Challenge. Once the adversary decides that the first query phase is over, it selects a subset S of d source nodes (whose identities are in the set I) such that $|S \setminus S'|$ [...] $\{0, 1\}$ for b.
Result. The adversary wins the game if $b' = b$. The advantage of the adversary is defined as $\left|\Pr[b' = b] - \frac{1}{2}\right|$.
Note that in CDA what the adversary is interested in is the information about the final aggregate. Consequently, in the above game, the adversary is asked to distinguish between the ciphertexts of two different aggregates $x_0$ and $x_1$ as the challenge, rather than to distinguish the two sets of plaintexts $M_0$ and $M_1$. Allowing the adversary to choose the two sets $M_0$, $M_1$ is to give him more flexibility in launching attacks.
Appendix B: Proof of Theorem 1
Proof: For the sake of clarity, we first prove the security of a version without using the hash function h. Then we show why the proof also works for the hashed version. The reduction is based on the indistinguishability property of a pseudorandom function, which is stated as follows:
Indistinguishability Property of a Pseudorandom Function.
Assume f is taken from a pseudorandom function family. Then for a fixed input argument x and an unknown, randomly picked key K, the following two distributions are computationally indistinguishable provided that polynomially many evaluations of $f_K(\cdot)$ have been queried:
\[ \{y = f_K(x) : y\}, \qquad \{y \leftarrow \{0,1\}^\lambda : y\}. \]
That is, the output $f_K(x)$ is computationally indistinguishable from a randomly picked number from $\{0,1\}^\lambda$ to any PPT distinguisher who has knowledge of the input argument x and a set of polynomially many 2-tuples $(x_i, f_K(x_i))$ where $x_i \neq x$. More formally, for any PPT distinguisher T,
\[ \left| \Pr[y = f_K(x) : T(x, y) = 1] - \Pr[y \leftarrow \{0,1\}^\lambda : T(x, y) = 1] \right| \]
is negligible in $\lambda$.
Algorithm $D'$
Setup. Allow the adversary D to choose any $n - 1$ sources to corrupt. Randomly pick $n - 1$ encryption keys $ek_i \xleftarrow{R} \{0,1\}^\lambda$ [...] is being challenged with.
Query. Upon receiving an encryption query $\langle i_j, m_j, r_j \rangle$ with nonce $r_j$, return $c_j = (f_{ek_{i_j}}(r_j) + m_j) \bmod M$ if $i_j \neq n$. Otherwise, pass $r_j$ to query the pseudorandom function to get back $f_K(r_j)$ and reply with $c_j = (f_K(r_j) + m_j) \bmod M$.
Challenge. In the challenge phase, receive from D two sets of plaintext messages $M_0 = \{m_{01}, m_{02}, \ldots, m_{0n}\}$ and $M_1 = \{m_{11}, m_{12}, \ldots, m_{1n}\}$.
Randomly pick a number w and output it to the pseudorandom function challenger to ask for a challenge. Note that w is the nonce used for CDA encryption in the challenge for D. The pseudorandom function challenger flips a coin $b \in \{0, 1\}$ and returns $t_b$, which is $f_K(w)$ when $b = 0$ and randomly picked from $\{0,1\}^\lambda$ when $b = 1$. Randomly flip a coin $d \in \{0, 1\}$ and return to D the challenge
\[ c_d = \sum_{i=1}^{n} m_{di} + \sum_{i=1}^{n-1} f_{ek_i}(w) + t_b. \]
Guess. D returns its guess $b'$. Return $b''$, which is 0 when $b' = d$ and 1 otherwise.
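Obviously, if D is PPT, then $D'$ is also PPT. Denoting $\sum_{i=1}^{n} m_{di} + \sum_{i=1}^{n-1} f_{ek_i}(w)$ by $X_d$, the challenge passed to D can be expressed as $c_d = X_d + t_b$. When $b = 0$, $t_b = f_K(w)$; when $b = 1$, $t_b$ is a randomly picked number from $\{0,1\}^\lambda$. The probability of success for $D'$ to distinguish between $f_K(w)$ and a random number is:
\begin{align*}
\mathrm{Pr}^{PRF}_{D'}[\mathrm{Success}] &= \Pr[b'' = b] \\
&= \tfrac{1}{2}\bigl(\Pr[b'' = 0 \mid b = 0] + \Pr[b'' = 1 \mid b = 1]\bigr) \\
&= \tfrac{1}{4}\bigl(\Pr[b'' = 0 \mid b = 0, d = 0] + \Pr[b'' = 0 \mid b = 0, d = 1] \\
&\qquad + \Pr[b'' = 1 \mid b = 1, d = 0] + \Pr[b'' = 1 \mid b = 1, d = 1]\bigr) \\
&= \tfrac{1}{4}\bigl(\Pr[D(t_0 + X_0) = 0] + \Pr[D(t_0 + X_1) = 1] \\
&\qquad + \Pr[D(t_1 + X_0) = 0] + \Pr[D(t_1 + X_1) = 1]\bigr) \\
&= \tfrac{1}{4}\bigl(\Pr[D(t_0 + X_0) = 0] + \Pr[D(t_0 + X_1) = 1] \\
&\qquad + 1 - \Pr[D(t_1 + X_0) = 1] + \Pr[D(t_1 + X_1) = 1]\bigr) \\
&= \tfrac{1}{4}\bigl(2\,\mathrm{Pr}^{CMT}_{D}[\mathrm{Success}] + 1 - (\Pr[D(t_1 + X_0) = 1] - \Pr[D(t_1 + X_1) = 1])\bigr).
\end{align*}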
Note that $t_0 + X_0$ and $t_0 + X_1$ are valid ciphertexts for the two challenge plaintext sets $M_0$ and $M_1$ respectively. In the last step, we make use of the fact that the probability of success for D to break the semantic security of the scheme is given by:
\[ \mathrm{Pr}^{CMT}_{D}[\mathrm{Success}] = \tfrac{1}{2}\Pr[D(t_0 + X_0) = 0] + \tfrac{1}{2}\Pr[D(t_0 + X_1) = 1]. \]
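Rearranging terms, we have
\begin{align*}
4\,\mathrm{Pr}^{PRF}_{D'}[\mathrm{Success}] + \Pr[D(t_1 + X_0) = 1] - \Pr[D(t_1 + X_1) = 1] &= 2\,\mathrm{Pr}^{CMT}_{D}[\mathrm{Success}] + 1 \\
4\bigl(\mathrm{Pr}^{PRF}_{D'}[\mathrm{Success}] - \tfrac{1}{2}\bigr) + \Pr[D(t_1 + X_0) = 1] - \Pr[D(t_1 + X_1) = 1] &= 2\bigl(\mathrm{Pr}^{CMT}_{D}[\mathrm{Success}] - \tfrac{1}{2}\bigr).
\end{align*}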
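Taking absolute values on both sides and substituting $Adv^{PRF}_{D'} = \left|\mathrm{Pr}^{PRF}_{D'}[\mathrm{Success}] - \tfrac{1}{2}\right|$ and $Adv^{CMT}_{D} = \left|\mathrm{Pr}^{CMT}_{D}[\mathrm{Success}] - \tfrac{1}{2}\right|$, we have
\[ 2\,Adv^{PRF}_{D'} + \tfrac{1}{2}\left|\Pr[D(t_1 + X_0) = 1] - \Pr[D(t_1 + X_1) = 1]\right| \ge Adv^{CMT}_{D}. \]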
Since $t_1$ is a randomly picked number, $t_1 + X_0$ and $t_1 + X_1$ are identically distributed. That is, for any PPT algorithm D, $\Pr[D(t_1 + X_0) = 1] = \Pr[D(t_1 + X_1) = 1]$. Hence,
\[ 2\,Adv^{PRF}_{D'}(\lambda) \ge Adv^{CMT}_{D}(\lambda). \]
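The step that $t_1 + X_0$ and $t_1 + X_1$ are identically distributed can be checked exhaustively on a toy modulus. The sketch below assumes the addition is performed modulo M and that $t_1$ is uniform over $\mathbb{Z}_M$; the function name `padded_distribution` and the toy values are hypothetical.

```python
from collections import Counter


def padded_distribution(X: int, M: int) -> Counter:
    """Distribution of (t + X) mod M when t ranges uniformly over Z_M."""
    return Counter((t + X) % M for t in range(M))
```

For any two values of X the resulting counters coincide, since adding a constant only permutes the residues modulo M. This is the one-time-pad argument the proof relies on.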
Note also that:
\[ \left|\Pr[x \leftarrow \{0,1\}^\lambda; y = f_K(x) : D'(y) = 1] - \Pr[y \leftarrow \{0,1\}^\lambda : D'(y) = 1]\right| > 2\,Adv^{PRF}_{D'}(\lambda). \]
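If $Adv^{CMT}_{D}$ is non-negligible in $\lambda$, then so is $Adv^{PRF}_{D'}$. As a result, if D can break the semantic security of the scheme with non-negligible advantage, $D'$ [...] $\Pr[x \leftarrow \{0,1\}^\lambda; y = f_K(x) : D'(y) = 1] - \Pr[y \leftarrow \{0,1\}^\lambda : D'(y) = 1]$ [...]
\[ \{K \leftarrow \{0,1\}^\lambda; x \leftarrow \{0,1\}^\lambda : f_{f_K(n)}(x)\} \quad [...] \quad \{K \leftarrow \{0,1\}^\lambda; x \leftarrow \{0,1\}^\lambda : f_K(x)\}. \]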
The argument is as follows: Assume f is a pseudorandom function. That is, $A = \{K \leftarrow \{0,1\}^\lambda : f_K(n)\}$ is indistinguishable from $B = \{K \leftarrow \{0,1\}^\lambda : K\}$. If there is a PPT algorithm D that can distinguish between $X = \{K \leftarrow \{0,1\}^\lambda; x \leftarrow \{0,1\}^\lambda : f_{f_K(n)}(x)\}$ and $Y = \{K \leftarrow \{0,1\}^\lambda; x \leftarrow \{0,1\}^\lambda : f_K(x)\}$, we can use D to distinguish between A and B. The idea is that when we receive a challenge s which could be from A or B, we send $f_s(x)$ as a challenge for D. If s belongs to A, $f_s(x)$ belongs to X, and if s belongs to B, $f_s(x)$ belongs to Y. We could thus distinguish A from B (a contradiction).
Security of the Hashed Version.
Only a few modifications to the security proof above are needed in order to prove the security of the hashed variant.
First, in the algorithm $D'$, all ciphertexts are now generated using the hashed values of the pseudorandom function outputs or replies from the challenger of $D'$. Accordingly, denote $\sum_{i=1}^{n} m_{di} + \sum_{i=1}^{n-1} h(f_{ek_i}(w))$ by $X_d$. Of course, the modulus size would be l instead of $\lambda$.
Second, the challenge passed to D would be $c_d = X_d + h(t_b)$. Then the derivation for the advantage expressions is essentially the same as that for the non-hashed scheme.
Third, the security proof of the non-hashed scheme relies on the fact that $\{t_1 \leftarrow \{0,1\}^\lambda : t_1 + X_0\}$ and $\{t_1 \leftarrow \{0,1\}^\lambda : t_1 + X_1\}$ are identical distributions. In contrast, to prove the security of the hashed scheme, we need the following distributions to be identical:
\[ \{t_1 \leftarrow \{0,1\}^\lambda : h(t_1) + X_0\}, \qquad \{t_1 \leftarrow \{0,1\}^\lambda : h(t_1) + X_1\}. \]
If h fulfills the requirement mentioned above, then $\{t_1 \leftarrow \{0,1\}^\lambda : h(t_1)\}$ is the uniform distribution over $\{0,1\}^l$. Consequently, the above two distributions are identical. This concludes the proof that the hashed scheme is semantically secure.
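The same check can be repeated for the hashed pad on toy parameters. The sketch below uses truncation to the low l bits as a stand-in for a length-matching hash; this choice of h, the bit lengths `LAMBDA_BITS` and `L_BITS`, and the function names are all hypothetical and serve only to illustrate the requirement that h maps the uniform distribution over $\{0,1\}^\lambda$ to the uniform distribution over $\{0,1\}^l$.

```python
from collections import Counter

LAMBDA_BITS = 8  # toy PRF output length (hypothetical)
L_BITS = 4       # toy modulus size l (hypothetical)


def h(t: int) -> int:
    """Toy length-matching hash: keep the low l bits of a lambda-bit input."""
    return t % (1 << L_BITS)


def hashed_pad_distribution(X: int) -> Counter:
    """Distribution of (h(t) + X) mod 2^l for t uniform over lambda-bit values."""
    M = 1 << L_BITS
    return Counter((h(t) + X) % M for t in range(1 << LAMBDA_BITS))
```

Each residue appears equally often regardless of X, so the two hashed distributions in the proof coincide for this toy h.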