Identification and Lossy Reconstruction in Noisy Databases

Ertem Tuncel, Deniz Gündüz

Abstract—A noisy database system is studied in which noisy versions of the underlying feature vectors are observed in both the enrollment and the query phases. The noisy observations are compressed before being stored in the database, and the user wishes both to identify the correct entry corresponding to the noisy query vector and to reconstruct the original feature vector within a desired distortion requirement. A fundamental capacity/storage/distortion tradeoff is identified for this system in the form of single-letter information theoretic expressions. The relation of this problem to the classical Wyner-Ziv rate-distortion problem is shown, where the noisy query vector acts as the correlated side information in the lossy reconstruction of the feature vector.

I. INTRODUCTION

Biometric data, such as fingerprints, behavioral patterns, and iris scans, are replacing classical identification documents for increased security. However, efficient use of such data for sensitive security applications requires the storage of extensive digital biometric data in a large database and fast search algorithms for reliable identification of queries within the database. On top of the storage constraints and search speed requirements, another difficulty arises from the noisy observation of the underlying biometric feature in both the enrollment and identification stages. This might be due either to random noise in the scanning device, as in the case of fingerprint or iris scanning, or to temporal changes in the expression of the underlying feature, as in the case of behavioral patterns such as keystroke dynamics.

The first attempts to identify the fundamental performance limits of biometric databases were made in [1] and [2], where the maximum exponential rate of entries that can be reliably identified in a database is characterized. The main assumption is that the components of the underlying feature vectors are independent and identically distributed (i.i.d.) with a known distribution, while both the enrollment vectors and the queries are noisy versions of the feature vectors observed through two different discrete memoryless channels, which might model two different measurement devices or measurement conditions. Defining the highest possible identification rate as the capacity of the system, [2] provides a single-letter expression for the capacity, characterized by the mutual information between the noisy enrollment and the noisy query distributions.

However, to improve the efficiency of the identification process in the storage device, it may be desirable to store only a compressed version of the observed feature vectors rather than storing the whole noisy observation. This compression at the enrollment stage introduces a tradeoff between the identification capacity and the compression rate, which is characterized in [3] and [4].

Here we consider another dimension of the biometric database system. In the identification stage, we require not only reliable identification of the database entry corresponding to the noisy query, but also a lossy reconstruction of the underlying feature vector. Note that the noisy query vector serves as side information for the reconstruction in the identification stage. In a sense, this problem combines and generalizes the capacity/storage tradeoff problem in biometric databases studied in [3] and [4] with the classical Wyner-Ziv rate-distortion problem in [7]. We provide a single-letter information theoretic expression for the set of achievable capacity/storage/distortion tradeoffs for this biometric identification system.

The rest of the paper is organized as follows. We introduce the system model and the necessary definitions in Section II. The main result of the paper is presented in Section III, and Sections IV and V are dedicated to its proof. In Section VI we study a binary symmetric feature vector and identify the capacity/storage/distortion tradeoff assuming noiseless observation in the enrollment phase and an erasure channel in the query phase. Section VII concludes the paper.

II. SYSTEM MODEL

We assume that the feature vectors $\{X^n(m)\}_{m=1}^{M}$ are generated independently with the identical distribution

$$P[X^n(m) = x^n] = \prod_{i=1}^{n} P_X(x_i)$$

over the finite feature alphabet $\mathcal{X}$.

The database is formed by an enrollment phase, in which the noisy version of the feature vector of an individual is observed and recorded in the database. We denote the observed noisy feature vectors by $Y^n(m)$, $m \in \mathcal{M} = \{1, \ldots, M\}$, which are assumed to be the output of a discrete memoryless channel (DMC) characterized by $P_{Y|X}$, where $\mathcal{Y}$ is the finite alphabet of the noisy observations.
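To make the setup concrete, the following minimal Python sketch simulates the enrollment and query observations of this system model for a toy binary instantiation. The alphabet sizes, channel matrices, number of entries, and blocklength are all illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instantiation (all parameter values are illustrative assumptions):
# binary feature alphabet, M database entries, blocklength n.
M, n = 8, 1000
P_X = np.array([0.5, 0.5])              # feature distribution P_X on {0, 1}
P_Y_given_X = np.array([[0.95, 0.05],   # enrollment DMC P_{Y|X}
                        [0.05, 0.95]])
P_Z_given_X = np.array([[0.90, 0.10],   # query DMC P_{Z|X}
                        [0.10, 0.90]])

def dmc(x, P):
    """Pass the sequence x symbol-by-symbol through a DMC with rows P[s]."""
    return np.array([rng.choice(len(P[s]), p=P[s]) for s in x])

# i.i.d. feature vectors X^n(1), ..., X^n(M)
X = rng.choice(2, size=(M, n), p=P_X)

# Enrollment observations Y^n(m): these (or a compressed version of them)
# are what the database actually stores.
Y = np.array([dmc(X[m], P_Y_given_X) for m in range(M)])

# Query phase: a uniformly chosen entry W is observed through P_{Z|X}.
W = int(rng.integers(M))
Z = dmc(X[W], P_Z_given_X)
print(f"queried entry W = {W}, first query symbols Z^n: {Z[:10]}")
```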
rate region, and (8) and (9) characterize the marginal rate region. Equivalence of the two regions then follows from the fact that the minimum in

$$-I(U;Z) \;=\; \min_{U:\, I(U;Y) = 0} \big[\, I(U;Y) - I(U;Z) \,\big]$$

is trivially achieved by a dummy $U$.

Another special case of this setup is obtained if we ignore the identification requirement of the user, i.e., by letting $R^i = 0$. It is not hard to see that the model then reduces to the classical Wyner-Ziv problem of lossy source compression in the presence of receiver side information, with the slight difference that the receiver wants to reconstruct the $X^n$ vector rather than the $Y^n$ vector that is available at the encoder. We obtain the following rate-distortion region.

Corollary 2: A compression-distortion pair $(R^c, D)$ is achievable if and only if there exists a random variable $U \in \mathcal{U}$ with joint distribution $p_{UYXZ}$ such that $U - Y - X - Z$ forms a Markov chain and

$$R^c \geq I(U;Y|Z)$$

$$D \geq E[d(X, \hat{X})]$$

with $|\mathcal{U}| \leq |\mathcal{Y}| + 1$.
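Since $U - Y - Z$ is a Markov chain, $I(U;Y|Z) = I(U;Y) - I(U;Z)$, which is the identity connecting Corollary 2 to the rate-transfer form of the region used in Section IV. The following Python sketch is a numerical check of this identity; the alphabet sizes and the randomly drawn distributions are arbitrary assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

# Random Markov chain U - Y - Z on small alphabets (sizes are arbitrary).
nU, nY, nZ = 3, 4, 3
p_y = normalize(rng.random(nY), 0)
p_u_given_y = normalize(rng.random((nY, nU)), 1)
p_z_given_y = normalize(rng.random((nY, nZ)), 1)

# Joint p(u, y, z) = p(y) p(u|y) p(z|y): U and Z are conditionally
# independent given Y, i.e., U - Y - Z holds by construction.
p = p_y[None, :, None] * p_u_given_y.T[:, :, None] * p_z_given_y[None, :, :]

def mi(pxy):
    """Mutual information (bits) of a joint pmf given as a 2-D array."""
    px = pxy.sum(1, keepdims=True)
    py = pxy.sum(0, keepdims=True)
    mask = pxy > 0
    return (pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum()

I_UY = mi(p.sum(2))                       # I(U;Y)
I_UZ = mi(p.sum(1))                       # I(U;Z)
p_z = p.sum((0, 1))
I_UY_given_Z = sum(p_z[z] * mi(p[:, :, z] / p_z[z]) for z in range(nZ))

print(f"I(U;Y|Z)        = {I_UY_given_Z:.6f}")
print(f"I(U;Y) - I(U;Z) = {I_UY - I_UZ:.6f}")   # equal under the Markov chain
```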
IV. ACHIEVABILITY

We will first prove the achievability of an $(R^c, R^i, D)$ tuple for which there exist a random variable $U \in \mathcal{U}$ and a function $\phi : \mathcal{U} \times \mathcal{Z} \to \hat{\mathcal{X}}$ satisfying $U - Y - X - Z$ and

$$R^i + \bar{R} \leq I(U;Z) \tag{11}$$

$$R^c + \bar{R} \geq I(U;Y) \tag{12}$$

$$D \geq E[d(X, \phi(U,Z))] \tag{13}$$

for some $\bar{R} \geq 0$. As mentioned above, this auxiliary rate $\bar{R}$ will play the role of rate transferred from the "second-stage" rate $R^c$ to the "first-stage" rate $R^i$ of a fictitious source coder.

Fix $p_{U|Y}$ and the function $\phi$ that satisfy the conditions above. We first generate a codebook of size $2^{n(R^c + \bar{R})}$ that consists of i.i.d. codewords $U^n$. We index the codewords as $U^n(j,k)$ for $j = 1, \ldots, 2^{nR^c}$ and $k = 1, \ldots, 2^{n\bar{R}}$.

Enrollment: For any $y^n \in \mathcal{Y}^n$, we define the enrollment function $f(y^n)$ as the smallest index $j$ for which $(y^n, U^n(j,k)) \in T^n_{[YU]}$ for some $k = 1, \ldots, 2^{n\bar{R}}$. We set $f(y^n) = 1$ if no such index can be found. Thus, one can think of the collections of all codewords $U^n(j,k)$ for $k = 1, \ldots, 2^{n\bar{R}}$ as "bins," and of $f(y^n)$ as a source coder which records only the bin index.

Identification: For any noisy observation $z^n \in \mathcal{Z}^n$ and the given compression indices $j(1), \ldots, j(2^{nR^i})$ of the database entries, the identifier looks for a database entry $m \in \{1, \ldots, 2^{nR^i}\}$ such that $(z^n, U^n(j(m),k)) \in T^n_{[ZU]}$ for some $k = 1, \ldots, 2^{n\bar{R}}$.¹ We define the identification function $\hat{w} = g(j(1), \ldots, j(2^{nR^i}), z^n)$ as the smallest such $m$, and set $g(j(1), \ldots, j(2^{nR^i}), z^n) = 1$ if no such $m$ is found.

¹ For a probability distribution $P_X$, we denote by $T^n_{[X]}$ the set of all strongly typical sequences. For more detail on strong typicality, see [5].

So far, the only randomness mentioned above is that of the codebook $U^n(j,k)$. We pause here to emphasize that this randomization is for the purpose of creating an ensemble of codebooks over which we compute the average probability of error and the average distortion. On the other hand, the database is filled with random entries as well, but this randomness is inherent to the problem and is independent of the codebook generation.

Now for $m = 1, \ldots, 2^{nR^i}$, define $J(m) = f(Y^n(m))$ and $K(m)$ as the smallest $k$ found in the process of enrolling $Y^n(m)$. If no $(j,k)$ was found, also set $K(m) = 1$. Although $K(m)$ is not recorded, it is useful to define it for analysis purposes. Finally, let $\hat{W} = g(J, Z^n)$.

Reconstruction: For any noisy observation $z^n \in \mathcal{Z}^n$ and a given compression index $j \in \mathcal{L}$, the reconstruction function $h(j, z^n)$ is defined as follows. Find the smallest $k$ such that $(z^n, U^n(j,k)) \in T^n_{[ZU]}$, and output $\phi(U_i(j,k), z_i)$ for the $i$th component of $h(j, z^n)$. If no such $k$ is found, then output a random vector from the reconstruction alphabet.

Probability of error: We define the following events:

$$E_0(m) = \big\{ (Y^n(m), Z^n) \notin T^n_{[YZ]} \big\}$$

$$E_1(m) = \big\{ (Y^n(m), U^n(J(m), K(m))) \notin T^n_{[YU]} \big\}$$

$$E_2(m,k) = \big\{ (Z^n, U^n(J(m), k)) \notin T^n_{[ZU]} \big\}.$$

The average probability of error for the identification process can then be bounded as

$$\begin{aligned} \Pr\{\hat{W} \neq W \mid W = w\} \leq\; & \Pr\{E_0(w)\} + \Pr\{E_1(w) \mid E_0(w)^c\} \\ & + \Pr\{E_2(w, K(w)) \mid E_1(w)^c\} + \sum_{m \neq w} \sum_{k} \Pr\{E_2(m,k)^c\}. \end{aligned} \tag{14}$$

It is straightforward to show that $\Pr\{E_0(w)\} \to 0$. We can also show using standard arguments that $\Pr\{E_1(w) \mid E_0(w)^c\}$ vanishes with increasing $n$ if

$$R^c + \bar{R} > I(U;Y).$$

That $\Pr\{E_2(w, K(w)) \mid E_1(w)^c\}$ also vanishes with increasing $n$ follows from the Markov lemma [6]. In fact, with high probability,

$$(Z^n, X^n(w), Y^n(w), U^n(J(w), K(w))) \in T^n_{[ZXYU]} \tag{15}$$

which will be useful in the distortion analysis. Finally,

$$\sum_{m \neq w} \sum_{k} \Pr\{E_2(m,k)^c\} \leq 2^{n(R^i + \bar{R})}\, 2^{-nI(U;Z)},$$

the right-hand side of which vanishes for large enough $n$ if

$$R^i + \bar{R} < I(U;Z).$$

Next, we consider the average distortion incurred by the reconstruction. A crucial observation at this point is that, with high probability, (15) holds.
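The following Python sketch mirrors the structure of this random-coding scheme at toy scale. It is a sketch under loud simplifying assumptions: tiny blocklength and codebook sizes, a crude empirical-type test in place of strong typicality, the enrollment channel left implicit (the stored $Y^n(m)$ are drawn uniformly), and index 0 used where the paper's convention sets the index to 1. At this blocklength the typicality tests are unreliable; the analysis above guarantees vanishing error only as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy-scale parameters (illustrative assumptions).
n = 16
nJ, nK, nM = 32, 8, 8          # bins (2^{nR^c}), codewords per bin (2^{n Rbar}), entries
alpha, beta, delta = 0.1, 0.15, 0.12

# Binary model with U - Y - Z: U = Y thru BSC(alpha), Z = Y thru BSC(beta).
p_yu = 0.5 * np.array([[1 - alpha, alpha],
                       [alpha, 1 - alpha]])          # joint p(y, u)
p_u = p_yu.sum(axis=0)                               # marginal of U (uniform)
p_zu = np.zeros((2, 2))                              # joint p(z, u)
for y in (0, 1):
    for z in (0, 1):
        for u in (0, 1):
            p_zu[z, u] += 0.5 * ((1 - beta) if z == y else beta) \
                              * ((1 - alpha) if u == y else alpha)

def typical(a, b, p_ab):
    """Crude stand-in for strong typicality: every cell of the empirical
    type of (a^n, b^n) is within delta of p_ab."""
    c = np.zeros((2, 2))
    for s, t in zip(a, b):
        c[s, t] += 1
    return np.abs(c / len(a) - p_ab).max() <= delta

# Random codebook U^n(j, k) with i.i.d. entries ~ p_U.
U = rng.choice(2, size=(nJ, nK, n), p=p_u)

def f(y):
    """Enrollment: smallest j such that (y^n, U^n(j,k)) is typical for
    some k; only the bin index j is stored in the database."""
    for j in range(nJ):
        if any(typical(y, U[j, k], p_yu) for k in range(nK)):
            return j
    return 0

def g(z, bins):
    """Identification: smallest entry m whose stored bin contains a
    codeword jointly typical with the query z^n."""
    for m, j in enumerate(bins):
        if any(typical(z, U[j, k], p_zu) for k in range(nK)):
            return m
    return 0

Y = rng.choice(2, size=(nM, n))                       # enrollment observations
bins = [f(Y[m]) for m in range(nM)]                   # compressed database
w = int(rng.integers(nM))                             # queried entry W
z = np.where(rng.random(n) < beta, 1 - Y[w], Y[w])    # query channel output
print("true entry:", w, " identified entry:", g(z, bins))
```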
V. CONVERSE

Suppose that the tuple $(R^c, R^i, D)$ is achievable, i.e., for any $\epsilon > 0$ there exist deterministic functions $f$, $g$, and $h$ such that (2)-(5) are satisfied. We have

$$\log M = H(W) = H(W \mid J, Z^n) + I(W; J, Z^n) \leq H(W \mid \hat{W}) + I(W; J, Z^n) \tag{16}$$

$$\leq 1 + P_e^n \log M + I(W; J, Z^n) \tag{17}$$

where (16) follows since $\hat{W}$ is a deterministic function of $J$ and $Z^n$, and (17) follows from Fano's inequality. From here we can obtain

$$(1 - \epsilon) \log M - 1 \leq I(W; J, Z^n) = I(W; Z^n \mid J) \tag{18}$$

$$= H(Z^n \mid J) - H(Z^n \mid W, J) \leq H(Z^n) - H(Z^n \mid W, J) = H(Z^n) - H(Z^n \mid J(W)) \tag{19}$$

where (18) follows since $W$ is independent of the database entries, and hence of $J$, and (19) follows since $Z^n$ is independent of $J(m)$ for $m \neq W$.

Turning to the compression rate,

$$\begin{aligned} &= I(J(W); Y^n(W) \mid Z^n) \qquad (22) \\ &= \sum_{i=1}^{n} I(J(W); Y_i(W) \mid Z^n, Y^{i-1}(W)) \\ &= \sum_{i=1}^{n} \big[ H(Y_i(W) \mid Z^n, Y^{i-1}(W)) - H(Y_i(W) \mid J(W), Z^n, Y^{i-1}(W)) \big] \\ &\geq \sum_{i=1}^{n} \big[ H(Y_i(W) \mid Z_i) - H(Y_i(W) \mid J(W), Z^n) \big] \\ &= \sum_{i=1}^{n} \big[ H(Y_i(W) \mid Z_i) - H(Y_i(W) \mid U_i, Z_i) \big] \\ &= \sum_{i=1}^{n} I(Y_i(W); U_i \mid Z_i) \end{aligned}$$

where (22) follows from the fact that $Z^n - Y^n(W) - J(W)$ forms a Markov chain. Thus,

$$R^c - R^i + \epsilon R^i + \epsilon + \frac{1}{n} \;\geq\; \frac{1}{n} \sum_{i=1}^{n} I(Y_i(W); U_i \mid Z_i). \tag{23}$$
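Fano's inequality, used in step (17), is easy to sanity-check numerically. The sketch below is only an illustration (the joint distribution of $(W, \hat{W})$ is drawn at random and is not part of the proof): it verifies $H(W \mid \hat{W}) \leq 1 + P_e \log_2 M$.

```python
import numpy as np

rng = np.random.default_rng(3)

M = 8
# Random joint pmf of (W, W_hat) on {0, ..., M-1}^2 (arbitrary choice).
p = rng.random((M, M))
p /= p.sum()

Pe = 1 - np.trace(p)                      # probability of error Pr{W != W_hat}

# Conditional entropy H(W | W_hat) in bits.
p_what = p.sum(0)
H_cond = -sum(p[w, v] * np.log2(p[w, v] / p_what[v])
              for w in range(M) for v in range(M) if p[w, v] > 0)

print(f"H(W|W_hat)      = {H_cond:.4f} bits")
print(f"1 + Pe*log2(M)  = {1 + Pe * np.log2(M):.4f} bits")  # Fano upper bound
```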
As for the distortion constraint, first observe that

$$\begin{aligned} E[d(X^n(W), h(J(W), Z^n))] =\; & (1 - P_e^n)\, E[d(X^n(W), h(J(\hat{W}), Z^n)) \mid \hat{W} = W] \\ & + P_e^n\, E[d(X^n(W), h(J(W), Z^n)) \mid \hat{W} \neq W] \\ \leq\; & (1 - P_e^n)(D + \epsilon) + P_e^n d_{\max} \\ \leq\; & D + \epsilon(1 + d_{\max}). \end{aligned}$$

Thus, denoting by $h_i$ the $i$th component of $h$, we have

$$D + \epsilon(1 + d_{\max}) \;\geq\; \frac{1}{n} \sum_{i=1}^{n} E[d(X_i(W), h_i(J(W), Z^n))] \;=\; \frac{1}{n} \sum_{i=1}^{n} E[d(X_i(W), h_i(U_i, Z_i))]. \tag{24}$$

From (21), (23), (24), and the convexity of $\mathcal{R}^*$, $\mathcal{R} \subseteq \mathcal{R}^*$ follows.

VI. AN EXAMPLE

Consider binary feature vectors with $P_X(x) = \frac{1}{2}$ for $x \in \mathcal{X} = \{0, 1\}$. Let the enrollment channel $P_{Y|X}$ be noiseless (thus $Y = X$), and let $P_{Z|X}$ be a symmetric erasure channel with $\mathcal{Z} = \{0, ?, 1\}$ and erasure probability $\epsilon$. Also let $\hat{\mathcal{X}} = \mathcal{X}$, and let $d(\cdot, \cdot)$ be the Hamming distortion measure.

Due to space limitations, we provide here only a sketch of the full analysis. It is not difficult to show that, although the alphabet size could potentially be as high as $|\mathcal{Y}| + 2$ in general, in this case $|\mathcal{U}| = |\mathcal{Y}| = 2$ suffices to characterize the whole region, with $P_{U|Y}$ being a binary symmetric channel with crossover probability $\alpha \leq \frac{1}{2}$. With this choice,

$$I(Y;U) = 1 - H(\alpha) \tag{25}$$

$$I(Z;U) = (1 - \epsilon)[1 - H(\alpha)] \tag{26}$$

$$E[d(X, \phi(U,Z))] = \epsilon \alpha \tag{27}$$

with

$$\phi(u,z) = \begin{cases} z & z \neq\, ? \\ u & z =\, ? \end{cases}$$

It then follows that $(R^c, R^i, D) \in \mathcal{R}$ if and only if

$$R^c \geq R^c(R^i, D)$$

where

$$R^c(R^i, D) = \begin{cases} R^i + \epsilon\, \psi_\epsilon(D) & 0 \leq R^i \leq (1-\epsilon)\, \psi_\epsilon(D) \\[3pt] \dfrac{R^i}{1-\epsilon} & (1-\epsilon)\, \psi_\epsilon(D) \leq R^i \leq 1 - \epsilon \end{cases} \tag{28}$$

for $0 \leq D \leq \frac{\epsilon}{2}$, with

$$\psi_\epsilon(D) = 1 - H\!\left(\frac{D}{\epsilon}\right).$$

Note that $D > \frac{\epsilon}{2}$ and $R^i > 1 - \epsilon$ need not be considered. In (28), the expression for the range $(1-\epsilon)\,\psi_\epsilon(D) \leq R^i \leq 1 - \epsilon$ follows from (25)-(27), whereas the expression in the range $0 \leq R^i \leq (1-\epsilon)\,\psi_\epsilon(D)$ is obtained through rate transfer.

For the maximum distortion $D = \frac{\epsilon}{2}$, the problem reduces to the one studied in [4], and

$$R^c(R^i, \tfrac{\epsilon}{2}) = \frac{R^i}{1-\epsilon}.$$

On the other hand, if the recognition rate $R^i$ is set to zero, we obtain

$$R^c(0, D) = \epsilon\, \psi_\epsilon(D)$$

which coincides with the ordinary Wyner-Ziv rate-distortion function for erasure side information.
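The closed-form tradeoff (28) is straightforward to evaluate. The following Python sketch computes $R^c(R^i, D)$ and checks two of the stated properties: continuity of the two branches at the breakpoint $R^i = (1-\epsilon)\psi_\epsilon(D)$, and the reduction to the Wyner-Ziv erasure rate $\epsilon\,\psi_\epsilon(D)$ at $R^i = 0$. The numerical values of $\epsilon$ and $D$ are arbitrary illustrative choices.

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def psi(D, eps):
    """psi_eps(D) = 1 - H(D / eps), defined for 0 <= D <= eps / 2."""
    return 1 - Hb(D / eps)

def Rc(Ri, D, eps):
    """Minimum compression rate R^c(R^i, D) from (28)."""
    assert 0 <= D <= eps / 2 and 0 <= Ri <= 1 - eps
    if Ri <= (1 - eps) * psi(D, eps):
        return Ri + eps * psi(D, eps)         # distortion-limited branch
    return Ri / (1 - eps)                     # identification-limited branch

eps, D = 0.3, 0.05                            # arbitrary illustrative values

# R^c(0, D) equals the Wyner-Ziv rate for erasure side information.
print(f"Rc(0, D) = {Rc(0.0, D, eps):.4f}  vs  eps*psi = {eps * psi(D, eps):.4f}")

# The two branches of (28) agree at the breakpoint Ri = (1 - eps) * psi.
b = (1 - eps) * psi(D, eps)
print(f"branch 1 at breakpoint: {b + eps * psi(D, eps):.4f}")
print(f"branch 2 at breakpoint: {b / (1 - eps):.4f}")
```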
VII. CONCLUSIONS

We have studied a noisy database system in which both the enrollment and the query vectors are noisy versions of the underlying feature vectors. The noisy enrollment vectors are compressed before being stored in the database to reduce the storage requirement and increase the search speed. The user of the database wishes not only to identify the correct entry corresponding to a noisy query vector, but also to reconstruct the original feature vector of the queried entry within a desired distortion requirement. We have identified a fundamental capacity/storage/distortion tradeoff and characterized the set of achievable compression rate, identification rate, and distortion tuples in a single-letter form. This problem combines and generalizes the previously studied capacity/storage tradeoff in databases and the Wyner-Ziv rate-distortion function for lossy source compression in the presence of decoder side information. As an example, we have studied the case of binary symmetric feature vectors with a noiseless enrollment channel and an erasure query channel, and evaluated the capacity/storage/distortion tradeoff for this special scenario.

REFERENCES

[1] J. A. O'Sullivan and N. A. Schmid, "Large deviations performance analysis for biometrics recognition," Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
[2] F. Willems, T. Kalker, J. Goseling, and J.-P. Linnartz, "On the capacity of a biometrical identification system," Proc. IEEE Int'l Symp. Information Theory, Yokohama, Japan, July 2003.
[3] M. B. Westover and J. A. O'Sullivan, "Achievable rates for pattern recognition," IEEE Trans. Information Theory, vol. 54, no. 1, pp. 299-320, Jan. 2008.
[4] E. Tuncel, "Capacity/storage tradeoff in high-dimensional identification systems," IEEE Trans. Information Theory, vol. 55, no. 5, pp. 2097-2106, May 2009.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, New York: Academic, 1981.
[6] T. Berger, "Multiterminal source coding," Lectures presented at the CISM Summer School on the Information Theory Approach to Communications, July 1977.
[7] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Information Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[8] E. Tuncel, "The rate transfer argument in two-stage scenarios: When does it matter?" Proc. IEEE Int'l Symp. Information Theory, Seoul, South Korea, July 2009.