
Pattern Recognition 42 (2009) 2169–2180

journal homepage: www.elsevier.com/locate/pr

doi:10.1016/j.patcog.2008.11.030

Faster retrieval with a two-pass dynamic-time-warping lower bound


Daniel Lemire∗
LICEF, Université du Québec à Montréal (UQAM), 100 Sherbrooke West, Montreal (Quebec), Canada H2X 3P2
∗ Tel.: +1 514 987 3000×2835; fax: +1 514 843 2160. E-mail address: [email protected].

ARTICLE INFO

Article history:
Received 21 April 2008
Received in revised form 7 October 2008
Accepted 20 November 2008

Keywords:
Time series
Very large databases
Indexing
Classification

ABSTRACT

The dynamic time warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a tighter lower bound (LB_Improved). We find that LB_Improved-based search is faster. As an example, our approach is 2–3 times faster over random-walk and shape time series.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Dynamic time warping (DTW) was initially introduced to recognize spoken words [1], but it has since been applied to a wide range of information retrieval and database problems: handwriting recognition [2,3], signature recognition [4,5], image de-interlacing [6], appearance matching for security purposes [7], whale vocalization classification [8], query by humming [9,10], classification of motor activities [11], face localization [12], chromosome classification [13], shape retrieval [14,15], and so on. Unlike the Euclidean distance, DTW optimally aligns or "warps" the data points of two time series (see Fig. 1).

When the distance between two time series forms a metric, such as the Euclidean distance or the Hamming distance, several indexing or search techniques have been proposed [16–20]. However, even assuming that we have a metric, Weber et al. have shown that the performance of any indexing scheme degrades to that of a sequential scan when there are more than a few dimensions [21]. Otherwise—when the distance is not a metric or the number of dimensions is too large—we use bounding techniques such as the generic multimedia object indexing (GEMINI) [22]. We quickly discard (most) false positives by computing a lower bound.

Ratanamahatana and Keogh [23] argue that their lower bound (LB_Keogh) cannot be improved upon. To make their point, they report that LB_Keogh allows them to prune out over 90% of all DTW computations on several data sets.

We are able to improve upon LB_Keogh as follows. The first step of our two-pass approach is LB_Keogh itself. If this first lower bound is sufficient to discard the candidate, then the computation terminates and the next candidate is considered. Otherwise, we process the time series a second time to increase the lower bound (see Fig. 5). If this second lower bound is large enough, the candidate is pruned; otherwise we compute the full DTW. We show experimentally that the two-pass approach can be several times faster.

The paper is organized as follows. In Section 4, we define the DTW in a generic manner as the minimization of the lp norm (DTWp). Among other things, we show that if x and y are separated by a constant (x ≤ c ≤ y or x ≥ c ≥ y) then the DTW1 is the l1 norm (see Proposition 1). In Section 5, we compute generic lower bounds on the DTW and their approximation errors using warping envelopes. In Section 6, we show how to compute the warping envelopes quickly. The next two sections introduce LB_Keogh and LB_Improved, respectively. Section 9 presents the application of these lower bounds to multidimensional indexing, whereas the last section presents an experimental comparison.

2. Conventions

Time series are arrays of values measured at certain times. For simplicity, we assume a regular sampling rate so that time series are generic arrays of floating-point values. Time series have length n and are indexed from 1 to n. The lp norm of x is ‖x‖p = (Σi |xi|^p)^(1/p) for any integer 0 < p < ∞, and ‖x‖∞ = maxi |xi|. The lp distance between x and y is ‖x − y‖p and it satisfies the triangle inequality ‖x − z‖p ≤ ‖x − y‖p + ‖y − z‖p for 1 ≤ p ≤ ∞. The distance between a point x and a set or region S is d(x, S) = min_{y∈S} d(x, y). Other conventions are summarized in Table 1.
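To make these conventions concrete, here is a minimal C++ sketch of the lp norm and lp distance; the names lp_norm and lp_distance are ours, for illustration only, and we assume a finite p ≥ 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// l_p norm of x: (sum_i |x_i|^p)^(1/p), for a finite p >= 1.
double lp_norm(const std::vector<double>& x, double p) {
  double sum = 0.0;
  for (double v : x) sum += std::pow(std::fabs(v), p);
  return std::pow(sum, 1.0 / p);
}

// l_p distance between two equal-length time series x and y.
double lp_distance(const std::vector<double>& x, const std::vector<double>& y,
                   double p) {
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    sum += std::pow(std::fabs(x[i] - y[i]), p);
  return std::pow(sum, 1.0 / p);
}
```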

Fig. 1. Dynamic time warping example.

Table 1
Frequently used conventions.

n: Length of a time series
‖x‖p: lp norm
DTWp: Monotonic DTW
NDTWp: Non-monotonic DTW
w: DTW locality constraint
U(x), L(x): Warping envelope (see Section 5)
H(x, y): Projection of x on y (see Eq. (1))

3. Related works

Besides DTW, several similarity metrics have been proposed, including the directed and general Hausdorff distance, Pearson's correlation, nonlinear elastic matching distance [24], edit distance with real penalty (ERP) [25], Needleman–Wunsch similarity [26], Smith–Waterman similarity [27], and SimilB [28].

Boundary-based lower-bound functions sometimes outperform LB_Keogh [29]. We can also quantize [30] the time series.

Sakurai et al. [31] have shown that retrieval under the DTW can be faster by mixing progressively finer resolutions and by applying early abandoning [32] to the dynamic programming computation.
4. Dynamic time warping

A many-to-many matching between the data points in time series x and the data points in time series y matches every data point xi in x with at least one data point yj in y, and every data point in y with at least one data point in x. The set of matches (i, j) forms a warping path Γ. We define the DTW as the minimization of the lp norm of the differences {xi − yj}(i,j)∈Γ over all warping paths. A warping path is minimal if there is no proper subset Γ′ of Γ forming a warping path: for simplicity, we require all warping paths to be minimal.

In computing the DTW distance, we commonly require the warping to remain local. For time series x and y, we align values xi and yj only if |i − j| ≤ w for some locality constraint w ≥ 0 [1]. When w = 0, the DTW becomes the lp distance, whereas when w ≥ n, the DTW has no locality constraint. The value of the DTW diminishes monotonically as w increases. (We do not consider other forms of locality constraints such as the Itakura parallelogram [33].)

Other than locality, DTW can be monotonic: if we align value xi with value yj, then we cannot align value xi+1 with a value appearing before yj (yj′ for j′ < j).

We note the DTW distance between x and y using the lp norm as DTWp(x, y) when it is monotonic and as NDTWp(x, y) when monotonicity is not required.

By dynamic programming, the monotonic DTW requires O(wn) time. A typical value of w is n/10 [23] so that the DTW is in O(n²). To compute the DTW, we use the following recursive formula. Given an array x, we write the suffix starting at position i as x(i) = xi, xi+1, ..., xn. The symbol ⊕ is the exclusive or. Write qi,j = DTWp(x(i), y(j))^p so that DTWp(x, y) = (q1,1)^(1/p); then

$$
q_{i,j} = \begin{cases}
0 & \text{if } |x^{(i)}| = |y^{(j)}| = 0,\\
\infty & \text{if } (|x^{(i)}| = 0) \oplus (|y^{(j)}| = 0) \text{ or } |i - j| > w,\\
|x_i - y_j|^p + \min(q_{i+1,j},\, q_{i,j+1},\, q_{i+1,j+1}) & \text{otherwise.}
\end{cases}
$$

For p = ∞, we rewrite the preceding recursive formula with qi,j = DTW∞(x(i), y(j)), and qi,j = max(|xi − yj|, min(qi+1,j, qi,j+1, qi+1,j+1)) when |x(i)| ≠ 0, |y(j)| ≠ 0, and |i − j| ≤ w.

We can compute NDTW1 without locality constraint in O(n log n) time [34]: if the values of the time series are already sorted, the computation is in O(n) time.

We can express the solution of the DTW problem as an alignment of the two initial time series (such as x = 0, 1, 1, 0 and y = 0, 1, 0, 0) where some of the values are repeated (such as x′ = 0, 1, 1, 0, 0 and y′ = 0, 1, 1, 0, 0). If we allow non-monotonicity (NDTW), then values can also be inverted.

The non-monotonic DTW is no larger than the monotonic DTW, which is itself no larger than the lp norm: NDTWp(x, y) ≤ DTWp(x, y) ≤ ‖x − y‖p for all 0 < p ≤ ∞.

The DTW1 has the property that if the time series are value-separated, then the DTW is the l1 norm, as the next proposition shows. In Figs. 3 and 4, we present value-separated functions: their DTW1 is the area between the curves.

Proposition 1. If x and y are such that either x ≤ c ≤ y or x ≥ c ≥ y for some constant c, then DTW1(x, y) = NDTW1(x, y) = ‖x − y‖1.

Proof. Assume x ≤ c ≤ y. Consider the two aligned (and extended) time series x′, y′ such that NDTW1(x, y) = ‖x′ − y′‖1. We have that x′ ≤ c ≤ y′ and NDTW1(x, y) = ‖x′ − y′‖1 = Σi |x′i − y′i| = Σi |x′i − c| + |c − y′i| = ‖x′ − c‖1 + ‖c − y′‖1 ≥ ‖x − c‖1 + ‖c − y‖1 = ‖x − y‖1. Since we also have NDTW1(x, y) ≤ DTW1(x, y) ≤ ‖x − y‖1, the equality follows. □

Proposition 1 does not hold for p > 1: DTW2((0, 0, 1, 0), (2, 3, 2, 2)) = √17, whereas ‖(0, 0, 1, 0) − (2, 3, 2, 2)‖2 = √18.
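For concreteness, the recursion above can be implemented by dynamic programming over prefixes. The following C++ sketch is ours, not the paper's implementation: it fills a full (n + 1) × (m + 1) table for clarity, whereas a banded O(nw)-memory layout is the practical variant.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Dynamic programming for DTW_p with locality constraint w. q[i][j] holds
// the best warping cost (raised to the power p) between the prefixes
// x[0..i) and y[0..j); entries outside the band |i - j| <= w stay infinite,
// matching the recursion in the text.
double dtw_p(const std::vector<double>& x, const std::vector<double>& y,
             std::size_t w, double p) {
  const std::size_t n = x.size(), m = y.size();
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<std::vector<double>> q(n + 1, std::vector<double>(m + 1, inf));
  q[0][0] = 0.0;  // two empty prefixes align at no cost
  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      if (i > j + w || j > i + w) continue;  // outside the locality band
      const double cost = std::pow(std::fabs(x[i - 1] - y[j - 1]), p);
      q[i][j] = cost + std::min({q[i - 1][j], q[i][j - 1], q[i - 1][j - 1]});
    }
  }
  return std::pow(q[n][m], 1.0 / p);  // the recursion computes DTW_p^p
}
```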
5. Computing lower bounds on the DTW

Given a time series x, define U(x)i = maxk {xk : |k − i| ≤ w} and L(x)i = mink {xk : |k − i| ≤ w} for i = 1, ..., n. The pair U(x) and L(x) forms the warping envelope of x (see Fig. 2). We leave the locality constraint w implicit.

Fig. 2. Warping envelope example.

The theorem of this section has an elementary proof requiring only the following technical lemma.

Lemma 1. If b ∈ [a, c] then (c − a)^p ≥ (c − b)^p + (b − a)^p for 1 ≤ p < ∞.

Proof. For p = 1, (c − b) + (b − a) = (c − a). For p > 1, by deriving (c − b)^p + (b − a)^p with respect to b, we can show that it is minimized when b = (c + a)/2 and maximized when b ∈ {a, c}. The maximal value is (c − a)^p. Hence the result. □

The following theorem introduces a generic result that we use to derive two lower bounds for the DTW, including the original Keogh–Ratanamahatana result [35]. Indeed, this new result not only implies the lower bound LB_Keogh, but it also provides a lower bound on the error made by LB_Keogh, thus allowing a tighter lower bound (LB_Improved).

Theorem 1. Given two equal-length time series x and y and 1 ≤ p < ∞, then for any time series h satisfying xi ≥ hi ≥ U(y)i or xi ≤ hi ≤ L(y)i or hi = xi for all indexes i, we have

DTWp(x, y)^p ≥ NDTWp(x, y)^p ≥ ‖x − h‖p^p + NDTWp(h, y)^p.

For p = ∞, a similar result is true: DTW∞(x, y) ≥ NDTW∞(x, y) ≥ max(‖x − h‖∞, NDTW∞(h, y)).

Proof. Suppose that 1 ≤ p < ∞. Let Γ be a warping path such that NDTWp(x, y)^p = Σ(i,j)∈Γ |xi − yj|^p. By the constraint on h and Lemma 1, we have that |xi − yj|^p ≥ |xi − hi|^p + |hi − yj|^p for any (i, j) ∈ Γ since hi ∈ [min(xi, yj), max(xi, yj)]. Hence, we have that NDTWp(x, y)^p ≥ Σ(i,j)∈Γ |xi − hi|^p + |hi − yj|^p ≥ ‖x − h‖p^p + Σ(i,j)∈Γ |hi − yj|^p. This proves the result since Σ(i,j)∈Γ |hi − yj|^p ≥ NDTWp(h, y)^p. For p = ∞, we have

NDTW∞(x, y) = max(i,j)∈Γ |xi − yj| ≥ max(i,j)∈Γ max(|xi − hi|, |hi − yj|) ≥ max(‖x − h‖∞, NDTW∞(h, y)),

concluding the proof. □

While Theorem 1 defines a lower bound (‖x − h‖p), the next proposition shows that this lower bound must be a tight approximation as long as h is close to y in the lp norm.

Proposition 2. Given two equal-length time series x and y, and 1 ≤ p ≤ ∞ with h as in Theorem 1, we have that ‖x − h‖p approximates both DTWp(x, y) and NDTWp(x, y) within ‖h − y‖p.

Proof. By the triangle inequality over lp, we have ‖x − h‖p + ‖h − y‖p ≥ ‖x − y‖p. Since ‖x − y‖p ≥ DTWp(x, y), we have ‖x − h‖p + ‖h − y‖p ≥ DTWp(x, y), and hence ‖h − y‖p ≥ DTWp(x, y) − ‖x − h‖p. This proves the result since, by Theorem 1, we have that DTWp(x, y) ≥ NDTWp(x, y) ≥ ‖x − h‖p. □

This bound on the approximation error is reasonably tight. If x and y are separated by a constant, then DTW1(x, y) = ‖x − y‖1 by Proposition 1 and ‖x − y‖1 = Σi |xi − yi| = Σi |xi − hi| + |hi − yi| = ‖x − h‖1 + ‖h − y‖1. Hence, the approximation error is exactly ‖h − y‖1 in such instances.

6. Warping envelopes

The computation of the warping envelope U(x), L(x) requires O(nw) time using the naive approach of repeatedly computing the maximum and the minimum over windows. Instead, we compute the envelope with at most 3n comparisons between data-point values [36] using Algorithm 1.

Algorithm 1. Streaming algorithm to compute the warping envelope using no more than 3n comparisons.

input a time series y indexed from 1 to n
input some DTW locality constraint w
return warping envelope U, L (two time series of length n)
u, l ← empty double-ended queues; we append to the "back"
append 1 to u and l
for i in {2, ..., n} do
  if i ≥ w + 1 then
    Ui−w ← yfront(u), Li−w ← yfront(l)
  if yi > yi−1 then
    pop u from back
    while yi > yback(u) do
      pop u from back
  else
    pop l from back
    while yi < yback(l) do
      pop l from back
  append i to u and l
  if i = 2w + 1 + front(u) then
    pop u from front
  else if i = 2w + 1 + front(l) then
    pop l from front
for i in {n + 1, ..., n + w} do
  Ui−w ← yfront(u), Li−w ← yfront(l)
  if i − front(u) ≥ 2w + 1 then
    pop u from front
  if i − front(l) ≥ 2w + 1 then
    pop l from front
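The following C++ sketch is one possible rendering of Algorithm 1 using std::deque. It keeps the same monotonic-queue idea, but favors clarity over the exact 3n-comparison bookkeeping of the algorithm; each index still enters and leaves each queue at most once, so the running time is O(n).

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Streaming computation of the warping envelope U, L of y for locality
// constraint w: monotonic double-ended queues hold the indexes of candidate
// maxima (u) and minima (l) of the sliding window [j - w, j + w].
void envelope(const std::vector<double>& y, std::size_t w,
              std::vector<double>& U, std::vector<double>& L) {
  const std::size_t n = y.size();
  U.assign(n, 0.0);
  L.assign(n, 0.0);
  std::deque<std::size_t> u, l;  // y[u] non-increasing, y[l] non-decreasing
  for (std::size_t i = 0; i < n + w; ++i) {
    if (i < n) {  // push index i, evicting dominated candidates from the back
      while (!u.empty() && y[i] >= y[u.back()]) u.pop_back();
      while (!l.empty() && y[i] <= y[l.back()]) l.pop_back();
      u.push_back(i);
      l.push_back(i);
    }
    if (i >= w) {  // the window of position j = i - w is now complete
      const std::size_t j = i - w;
      while (u.front() + w < j) u.pop_front();  // drop indexes before j - w
      while (l.front() + w < j) l.pop_front();
      U[j] = y[u.front()];
      L[j] = y[l.front()];
    }
  }
}
```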

7. LB_Keogh

Let H(x, y) be the projection of x on y defined as

$$
H(x, y)_i = \begin{cases}
U(y)_i & \text{if } x_i \ge U(y)_i,\\
L(y)_i & \text{if } x_i \le L(y)_i,\\
x_i & \text{otherwise}
\end{cases} \qquad (1)
$$

for i = 1, 2, ..., n. We have that H(x, y) is in the envelope of y. By Theorem 1 and setting h = H(x, y), we have that NDTWp(x, y)^p ≥ ‖x − H(x, y)‖p^p + NDTWp(H(x, y), y)^p for 1 ≤ p < ∞. Write LB_Keoghp(x, y) = ‖x − H(x, y)‖p (see Fig. 3); then LB_Keoghp(x, y) is a lower bound to NDTWp(x, y) and thus to DTWp(x, y). The following corollary follows from Theorem 1 and Proposition 2.

Fig. 3. LB_Keogh example: the area of the marked region is LB_Keogh1(x, y).

Corollary 1. Given two equal-length time series x and y and 1 ≤ p ≤ ∞, then

• LB_Keoghp(x, y) is a lower bound to the DTW: DTWp(x, y) ≥ NDTWp(x, y) ≥ LB_Keoghp(x, y);
• the accuracy of LB_Keogh is bounded by the distance to the envelope: DTWp(x, y) − LB_Keoghp(x, y) ≤ ‖(max{U(y)i − yi, yi − L(y)i})i‖p

for all x.

Algorithm 2 shows how LB_Keogh can be used to find a nearest neighbor in a time-series database. We used DTW1 for all implementations (see Appendix C). The computation of the envelope of the query time series is done once (see line 4). The lower bound is computed in lines 7–12. If the lower bound is sufficiently large, the DTW is not computed (see line 13). Ignoring the computation of the full DTW, at most (2N + 3)n comparisons between data points are required to process a database containing N time series.

Algorithm 2. LB_Keogh-based nearest-neighbor algorithm.

1: input a time series y indexed from 1 to n
2: input a set S of candidate time series
3: return the nearest neighbor B to y in S under DTW1
4: U, L ← envelope(y)
5: b ← ∞ {b stores minx∈S DTW1(x, y)}
6: for candidate x in S do
7:   α ← 0 {α stores the lower bound}
8:   for i ∈ {1, 2, ..., n} do
9:     if xi > Ui then
10:      α ← α + xi − Ui
11:    else if xi < Li then
12:      α ← α + Li − xi
13:  if α < b then
14:    t ← DTW1(x, y) {We compute the full DTW.}
15:    if t < b then
16:      b ← t
17:      B ← x
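A minimal C++ sketch of the inner loop of Algorithm 2 (lines 7–12) follows; lb_keogh is our name, and the envelope U, L of the query y is assumed to be precomputed.

```cpp
#include <cstddef>
#include <vector>

// LB_Keogh_1(x, y) given the precomputed envelope U, L of the query y:
// sum the distances by which x leaves the envelope. The result is a
// lower bound on DTW_1(x, y), so a candidate can be skipped as soon as
// this value reaches the best distance found so far.
double lb_keogh(const std::vector<double>& x, const std::vector<double>& U,
                const std::vector<double>& L) {
  double bound = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] > U[i])
      bound += x[i] - U[i];  // above the upper envelope
    else if (x[i] < L[i])
      bound += L[i] - x[i];  // below the lower envelope
  }
  return bound;
}
```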
7: copy x to x {x will store the projection of x on y}
8. LB_Improved

In the previous section, we saw that NDTWp(x, y)^p ≥ LB_Keoghp(x, y)^p + NDTWp(H(x, y), y)^p for 1 ≤ p < ∞. In turn, we have NDTWp(H(x, y), y) ≥ LB_Keoghp(y, H(x, y)). Hence, write

LB_Improvedp(x, y)^p = LB_Keoghp(x, y)^p + LB_Keoghp(y, H(x, y))^p

for 1 ≤ p < ∞. By definition, we have LB_Improvedp(x, y) ≥ LB_Keoghp(x, y). Intuitively, whereas LB_Keoghp(x, y) measures the distance between x and the envelope of y, LB_Keoghp(y, H(x, y)) measures the distance between y and the envelope of the projection of x on y (see Fig. 4). The next corollary shows that LB_Improved is a lower bound to the DTW.

Fig. 4. LB_Improved example: the area of the marked region is LB_Improved1(x, y).

Corollary 2. Given two equal-length time series x and y and 1 ≤ p < ∞, then LB_Improvedp(x, y) is a lower bound to the DTW: DTWp(x, y) ≥ NDTWp(x, y) ≥ LB_Improvedp(x, y).

Proof. Recall that LB_Keoghp(x, y) = ‖x − H(x, y)‖p. First apply Theorem 1: DTWp(x, y)^p ≥ NDTWp(x, y)^p ≥ LB_Keoghp(x, y)^p + NDTWp(H(x, y), y)^p. Apply Theorem 1 once more: NDTWp(y, H(x, y))^p ≥ LB_Keoghp(y, H(x, y))^p. By substitution, we get DTWp(x, y)^p ≥ NDTWp(x, y)^p ≥ LB_Keoghp(x, y)^p + LB_Keoghp(y, H(x, y))^p, thus proving the result. □

Algorithm 3 shows how to apply LB_Improved as a two-step process (see Fig. 5). Initially, for each candidate x, we compute the lower bound LB_Keogh1(x, y) (see lines 8–15). If this lower bound is sufficiently large, the candidate is discarded (see line 16); otherwise we add LB_Keogh1(y, H(x, y)) to LB_Keogh1(x, y), in effect computing LB_Improved1(x, y) (see lines 17–22). If this larger lower bound is sufficiently large, the candidate is finally discarded (see line 23). Otherwise, we compute the full DTW. If ρ is the fraction of candidates pruned by LB_Keogh, at most (2N + 3)n + 5(1 − ρ)Nn comparisons between data points are required to process a database containing N time series.

Algorithm 3. LB_Improved-based nearest-neighbor algorithm.

1: input a time series y indexed from 1 to n
2: input a set S of candidate time series
3: return the nearest neighbor B to y in S under DTW1
4: U, L ← envelope(y)
5: b ← ∞ {b stores minx∈S DTW1(x, y)}
6: for candidate x in S do
7:   copy x to x′ {x′ will store the projection of x on y}
8:   α ← 0 {α stores the lower bound}
9:   for i ∈ {1, 2, ..., n} do
10:    if xi > Ui then
11:      α ← α + xi − Ui
12:      x′i ← Ui
13:    else if xi < Li then
14:      α ← α + Li − xi
15:      x′i ← Li
16:  if α < b then
17:    U′, L′ ← envelope(x′)
18:    for i ∈ {1, 2, ..., n} do
19:      if yi > U′i then
20:        α ← α + yi − U′i
21:      else if yi < L′i then
22:        α ← α + L′i − yi
23:    if α < b then
24:      t ← DTW1(x, y) {We compute the full DTW.}
25:      if t < b then
26:        b ← t
27:        B ← x
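The two passes of Algorithm 3 can be sketched in C++ as follows; lb_improved is our name, envelope() refers to the sketch given after Algorithm 1, and best_so_far plays the role of b.

```cpp
#include <cstddef>
#include <vector>

// Declared in the envelope sketch of Section 6.
void envelope(const std::vector<double>& y, std::size_t w,
              std::vector<double>& U, std::vector<double>& L);

// The two passes of Algorithm 3. The first pass accumulates
// LB_Keogh_1(x, y) while building the projection xp = H(x, y); the second
// pass runs only when the first bound fails to prune and adds
// LB_Keogh_1(y, H(x, y)) computed from the envelope of the projection.
double lb_improved(const std::vector<double>& x, const std::vector<double>& y,
                   const std::vector<double>& U, const std::vector<double>& L,
                   std::size_t w, double best_so_far) {
  std::vector<double> xp(x);  // will hold H(x, y)
  double bound = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {  // pass 1: LB_Keogh_1(x, y)
    if (x[i] > U[i]) {
      bound += x[i] - U[i];
      xp[i] = U[i];
    } else if (x[i] < L[i]) {
      bound += L[i] - x[i];
      xp[i] = L[i];
    }
  }
  if (bound >= best_so_far) return bound;  // pruned by the first pass
  std::vector<double> Up, Lp;
  envelope(xp, w, Up, Lp);  // pass 2: add LB_Keogh_1(y, H(x, y))
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (y[i] > Up[i])
      bound += y[i] - Up[i];
    else if (y[i] < Lp[i])
      bound += Lp[i] - y[i];
  }
  return bound;
}
```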

Fig. 5. Computation of LB_Improved as in Algorithm 3. (a) We begin with y and its envelope L(y), U(y). (b) We compare candidate x with the envelope L(y), U(y). (c) The difference is LB_Keogh(x, y). (d) We compute x′, the projection of x on the envelope L(y), U(y). (e) We compute the envelope of x′. (f) The difference between y and the envelope L(x′), U(x′) is added to LB_Keogh to compute LB_Improved.

9. Using a multidimensional indexing structure

The running time of Algorithms 2 and 3 may be improved if we use a multidimensional index such as an R∗-tree [37]. Unfortunately, the performance of such an index diminishes quickly as the number of dimensions increases [21]. To solve this problem, several dimensionality reduction techniques are possible, such as piecewise linear segmentation [38–40]. Following Zhu and Shasha [10], we project time series and their envelopes on a d-dimensional space using piecewise sums: Pd(x) = (Σi∈Cj xi)j where C1, C2, ..., Cd is a disjoint cover of {1, 2, ..., n}. Unlike Zhu and Shasha, we do not require the intervals to have equal length. The l1 distance between Pd(x) and the minimum bounding hyperrectangle containing Pd(L(y)) and Pd(U(y)) is a lower bound to DTW1(x, y):

$$
\mathrm{DTW}_1(x, y) \ge \mathrm{LB\_Keogh}_1(x, y) = \sum_{i=1}^{n} d(x_i, [L(y)_i, U(y)_i]) \ge \sum_{j=1}^{d} d(P_d(x)_j, [P_d(L(y))_j, P_d(U(y))_j]).
$$

For our experiments, we chose the cover Cj = [1 + (j − 1)⌈n/d⌉, j⌈n/d⌉] for j = 1, ..., d − 1 and Cd = [1 + (d − 1)⌈n/d⌉, n].

We can summarize the Zhu–Shasha R∗-tree algorithm as follows:

(1) for each time series x in the database, add Pd(x) to the R∗-tree;
(2) given a query time series y, compute its envelope E = ⟨Pd(L(y)), Pd(U(y))⟩;
(3) starting with b = ∞, iterate over all candidates Pd(x) at an l1 distance less than b from the envelope E using the R∗-tree; once a candidate is found, update b with DTW1(x, y) and repeat until you have exhausted all candidates.

This algorithm is correct because the distance between E and Pd(x) is a lower bound to DTW1(x, y). However, dimensionality reduction diminishes the pruning power of LB_Keogh: d(E, Pd(x)) ≤ LB_Keogh1(x, y). Hence, we propose a new algorithm (R∗-TREE + LB_KEOGH) where, instead of immediately updating b with DTW1(x, y), we first compute the LB_Keogh lower bound between x and y. Only when it is less than b do we compute the full DTW. Finally, as a third algorithm (R∗-TREE + LB_IMPROVED), we first compute LB_Keogh, and if it is less than b, then we compute LB_Improved, and only when it is also lower than b do we compute the DTW, as in Algorithm 3. R∗-TREE + LB_IMPROVED has maximal pruning power, whereas Zhu–Shasha R∗-tree has the least pruning power of the three alternatives.
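As a sketch of the piecewise-sum projection, the following C++ function computes Pd(x) under our reading of the cover above (intervals of length ⌈n/d⌉, with the last interval absorbing the remainder); piecewise_sum is our name.

```cpp
#include <cstddef>
#include <vector>

// Piecewise-sum projection P_d: coordinate j is the sum of x over the
// interval C_j. Intervals have length ceil(n / d); the last interval
// runs to n, absorbing the remainder.
std::vector<double> piecewise_sum(const std::vector<double>& x, std::size_t d) {
  const std::size_t n = x.size();
  const std::size_t len = (n + d - 1) / d;  // ceil(n / d)
  std::vector<double> p(d, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t j = i / len;
    if (j >= d) j = d - 1;  // clamp into the last interval
    p[j] += x[i];
  }
  return p;
}
```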

10. Comparing Zhu–Shasha R∗-tree, LB_Keogh, and LB_Improved

In this section, we benchmark the algorithms Zhu–Shasha R∗-tree, R∗-TREE + LB_KEOGH, and R∗-TREE + LB_IMPROVED. We know that the LB_Improved approach has at least the pruning power of the other methods, but does more pruning translate into faster nearest-neighbor retrieval under the DTW distance?

We implemented the algorithms in C++ using an external-memory R∗-tree. The time series are stored on disk in a binary flat file. We used the GNU GCC 4.0.2 compiler on an Apple Mac Pro, having two Intel Xeon dual-core processors running at 2.66 GHz with 2 GiB of RAM. No thrashing was observed. We measured the wall-clock total time. In all experiments, we benchmark nearest-neighbor retrieval under the DTW1. By default, the locality constraint w is set at 10% (w = n/10). To ensure reproducibility, our source code is freely available [41], including the script used to generate synthetic data sets. We compute the full DTW using an O(nw)-time dynamic programming algorithm.

The R∗-tree was implemented using the Spatial Index library [42]. In informal tests, we found that a projection on an eight-dimensional space, as described by Zhu and Shasha, gave good results: substantially larger (d > 10) or smaller (d < 6) settings gave poorer performance. We used a 4096-byte page size and a 10-entry internal memory buffer.

For R∗-TREE + LB_KEOGH and R∗-TREE + LB_IMPROVED, we experimented with early abandoning [32] to cancel the computation of the lower bound as soon as the error is too large. While it often improved retrieval time slightly for both LB_Keogh and LB_Improved, the difference was always small (less than ≈ 1%). One explanation is that the candidates produced by the Zhu–Shasha R∗-tree are rarely poor enough to warrant efficient early abandoning.

We do not report our benchmarking results over the simple Algorithms 2 and 3. In almost all cases, the R∗-tree equivalent—R∗-TREE + LB_KEOGH or R∗-TREE + LB_IMPROVED—was at least slightly better and sometimes several times faster.

10.1. Synthetic data sets

We tested our algorithms using the cylinder–bell–funnel [43] and control charts [44] data sets, as well as over two databases of random walks. We generated 256-sample and 1000-sample random-walk time series using the formula xi = xi−1 + N(0, 1) and x1 = 0.
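Such random walks can be generated with a short C++ sketch (random_walk is our name, for illustration only):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Random-walk generator: x_1 = 0 and x_i = x_{i-1} + N(0, 1).
std::vector<double> random_walk(std::size_t n, std::mt19937& gen) {
  std::normal_distribution<double> gaussian(0.0, 1.0);
  std::vector<double> x(n, 0.0);
  for (std::size_t i = 1; i < n; ++i) x[i] = x[i - 1] + gaussian(gen);
  return x;
}
```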

Fig. 6. Nearest-neighbor retrieval for the 256-sample random-walk data set. (a) Average retrieval time. (b) Pruning power.

Fig. 7. Nearest-neighbor retrieval for the cylinder–bell–funnel data set. (a) Average retrieval time. (b) Pruning power.

Fig. 8. Nearest-neighbor retrieval for the control charts data set. (a) Average retrieval time. (b) Pruning power.

Fig. 9. Nearest-neighbor retrieval for the 1000-sample random-walk data set. (a) Average retrieval time. (b) Pruning power.

For each data set, we generated a database of 50 000 time series by adding randomly chosen items. Figs. 6–9 show the average timings and pruning ratios, averaged over 20 queries based on randomly chosen time series, as we consider larger and larger fractions of the database. LB_Improved prunes between 2 and 4 times more candidates than LB_Keogh. R∗-TREE + LB_IMPROVED is faster than Zhu–Shasha R∗-tree by a factor between 0 and 6.

We saw almost no performance gain over Zhu–Shasha R∗-tree with simple time series such as the cylinder–bell–funnel or the control charts data sets. However, in these cases, even LB_Improved has modest pruning powers of 40% and 15%. Low pruning means that the computational cost is dominated by the cost of the full DTW.

10.2. Shape data sets

We also considered a large collection of time series derived from shapes [45,46]. The first data set is made of heterogeneous shapes, which resulted in 5844 1024-sample time series. The second data set is an arrow-head data set with 15 000 251-sample time series. We extracted 50 time series from each data set, and we present the average nearest-neighbor retrieval times and pruning power as we consider various fractions of each database (see Figs. 10 and 11). The results are similar: LB_Improved has twice the pruning power of LB_Keogh, and R∗-TREE + LB_IMPROVED is twice as fast as R∗-TREE + LB_KEOGH and over 3 times faster than the Zhu–Shasha R∗-tree.

Fig. 10. Nearest-neighbor retrieval for the heterogeneous shape data set. (a) Average retrieval time. (b) Pruning power.

Fig. 11. Nearest-neighbor retrieval for the arrow-head shape data set. (a) Average retrieval time. (b) Pruning power.

Fig. 12. Average nearest-neighbor retrieval time for the 256-sample random-walk data set. (a) w = 5%. (b) w = 20%.

Fig. 13. Average nearest-neighbor retrieval time for the arrow-head shape data set. (a) w = 5%. (b) w = 20%.
10.3. Locality constraint

The locality constraint has an effect on retrieval times: a large value of w makes the problem more difficult and reduces the pruning power of all methods. In Figs. 12 and 13, we present the retrieval times for w = 5% and 20%. The benefits of R∗-TREE + LB_IMPROVED remain, though they are less significant for small locality constraints. Nevertheless, even in this case, R∗-TREE + LB_IMPROVED can still be 3 times faster than Zhu–Shasha R∗-tree. For all our data sets and for all values of w ∈ {5%, 10%, 20%}, R∗-TREE + LB_IMPROVED was always at least as fast as the Zhu–Shasha R∗-tree algorithm alone.

11. Conclusion

We have shown that a two-pass pruning technique can improve the retrieval speed by 3 times or more in several time-series databases. In our implementation, LB_Improved required slightly more computation than LB_Keogh, but its added pruning power was enough to make the overall computation several times faster. Moreover, we showed that pruning candidates left from the Zhu–Shasha R∗-tree with the full LB_Keogh alone—without dimensionality reduction—was enough to significantly boost the speed and pruning power. On some synthetic data sets, neither LB_Keogh nor LB_Improved was able to prune enough candidates, making all algorithms comparable in speed.

Acknowledgments

The author is supported by NSERC Grant 261437 and FQRNT Grant 112381.

Appendix A. Some properties of DTW

The DTW distance can be counterintuitive. As an example, if x, y, z are three time series such that x ≤ y ≤ z pointwise, then it does not follow that DTWp(x, z) ≥ DTWp(z, y). Indeed, choose x = 7, 0, 1, 0, y = 7, 0, 5, 0, and z = 7, 7, 7, 0; then DTW∞(z, y) = 5 and DTW∞(z, x) = 1. Hence, we review some of the mathematical properties of the DTW.

The warping path aligns xi from time series x and yj from time series y if (i, j) ∈ Γ. The next proposition is a general constraint on warping paths.

Proposition 3. Consider any two time series x and y. For any minimal warping path, if xi is aligned with yj, then either xi is aligned only with yj or yj is aligned only with xi. Therefore, the length of a minimal warping path is at most 2n − 2 when n > 1.

Proof. Suppose that the result is not true. Then there are xk, xi and yl, yj such that xk and xi are aligned with yj, and yl and yj are aligned with xi. We can delete (k, j) from the warping path and still have a warping path. A contradiction.

Next, we show that a minimal warping path is no longer than 2n − 2. Let n1 be the number of points in x aligned with only one point in y, and let n2 be the number of points in y aligned with only one point in x. The cardinality of a minimal warping path is bounded by n1 + n2. If n1 = n or n2 = n, then n1 = n2 = n and the warping path has cardinality n, which is no larger than 2n − 2 for n > 1. Otherwise, n1 ≤ n − 1 and n2 ≤ n − 1, and n1 + n2 ≤ 2n − 2. □

The next lemma shows that the DTW becomes the lp distance when either x or y is constant.

Lemma 2. For any 0 < p ≤ ∞, if y = c is a constant, then NDTWp(x, y) = DTWp(x, y) = ‖x − y‖p.

When p = ∞, a stronger result is true: if y = x + c for some constant c, then NDTW∞(x, y) = DTW∞(x, y) = ‖x − y‖∞. Indeed, NDTW∞(x, y) ≥ |max(y) − max(x)| = |c| = ‖x − y‖∞, while NDTW∞(x, y) ≤ ‖x − y‖∞ always holds, which shows the result. This same result is not true for p < ∞: for x = 0, 1, 2 and y = 1, 2, 3, we have ‖x − y‖p = 3^(1/p), whereas DTWp(x, y) = 2^(1/p). However, the DTW is translation invariant: DTWp(x, z) = DTWp(x + b, z + b) and NDTWp(x, z) = NDTWp(x + b, z + b) for any scalar b and 0 < p ≤ ∞.

In classical analysis, we have that n^(1/p−1/q)‖x‖q ≥ ‖x‖p [47] for 1 ≤ p < q ≤ ∞. A similar result is true for the DTW, and it allows us to conclude that DTWp(x, y) and NDTWp(x, y) decrease monotonically as p increases.

Proposition 4. For 1 ≤ p < q ≤ ∞, we have that (2n − 2)^(1/p−1/q) DTWq(x, y) ≥ DTWp(x, y), where n is the length of x and y. The result also holds for the non-monotonic DTW.

Proof. Assume n > 1. The argument is the same for the monotonic or non-monotonic DTW. Given x, y, consider the two aligned (and extended) time series x′, y′ such that DTWq(x, y) = ‖x′ − y′‖q. Let nx′ be the length of x′ and ny′ be the length of y′. As a consequence of Proposition 3, we have nx′ = ny′ ≤ 2n − 2. From classical analysis, we have nx′^(1/p−1/q)‖x′ − y′‖q ≥ ‖x′ − y′‖p, hence (2n − 2)^(1/p−1/q)‖x′ − y′‖q ≥ ‖x′ − y′‖p or (2n − 2)^(1/p−1/q) DTWq(x, y) ≥ ‖x′ − y′‖p. Since x′, y′ represent a valid warping path of x, y, then ‖x′ − y′‖p ≥ DTWp(x, y), which concludes the proof. □

Appendix B. The triangle inequality

The DTW is commonly used as a similarity measure: x and y are similar if DTWp(x, y) is small. Similarity measures often define equivalence relations: A ∼ A for all A (reflexivity), A ∼ B ⇒ B ∼ A (symmetry), and A ∼ B ∧ B ∼ C ⇒ A ∼ C (transitivity).

The DTW is reflexive and symmetric, but it is not transitive. Indeed, consider the following time series:

X = 0, 0, ..., 0, 0 (2m + 1 zeros),
Y = 0, 0, ..., 0, ε, 0, ..., 0, 0 (ε preceded and followed by m zeros),
Z = 0, ε, ε, ..., ε, ε, 0 (2m − 1 copies of ε).

We have that NDTWp(X, Y) = DTWp(X, Y) = |ε|, NDTWp(Y, Z) = DTWp(Y, Z) = 0, and NDTWp(X, Z) = DTWp(X, Z) = (2m − 1)^(1/p)|ε| for 1 ≤ p < ∞ and w = m − 1. Hence, for ε small and n ≫ 1/ε, we have that X ∼ Y and Y ∼ Z, but X ≁ Z. This example proves the following lemma.

Lemma 3. For 1 ≤ p < ∞ and w > 0, neither DTWp nor NDTWp satisfies a triangle inequality of the form d(x, y) + d(y, z) ≥ cd(x, z) where c is independent of the length of the time series and of the locality constraint.

This theoretical result is somewhat at odds with practical experience. Casacuberta et al. found no triangle inequality violation in about 15 million triplets of voice recordings [48].
To determine whether we could expect violations of the triangle inequality in practice, we ran the following experiment. We used three types of 100-sample time series: white-noise time series defined by xi = N(0, 1) where N is the normal distribution, random-walk time series defined by xi = xi−1 + N(0, 1) and x1 = 0, and the cylinder–bell–funnel time series proposed by Saito [43]. For each type, we generated 100 000 triples of time series x, y, z and we computed the histogram of the function

$$
C(x, y, z) = \frac{\mathrm{DTW}_p(x, z)}{\mathrm{DTW}_p(x, y) + \mathrm{DTW}_p(y, z)}
$$

for p = 1 and 2. The DTW is computed without time constraints. Over the white-noise and cylinder–bell–funnel time series, we failed to find a single violation of the triangle inequality: a triple x, y, z for which C(x, y, z) > 1. However, for the random-walk time series, we found that 20% and 15% of the triples violated the triangle inequality for DTW1 and DTW2.
DTWp (X, Z)
shows. DTWp (X, Y) + DTWp (Y, Z) = .
min(2w + 1, n)1/p
Theorem 2. Given any three same-length time series x, y, z and A consequence of this theorem is that DTW∞ satisfies the traditional
1  p  ∞, we have triangle inequality.

DTWp (x, z)
DTWp (x, y) + DTWp (y, z)  Corollary 3. The triangle inequality d(x, y) + d(y, z)  d(x, z) holds for
min(2w + 1, n)1/p DTW∞ and NDTW∞ .

where w is the locality constraint. The result also holds for the non-
Hence the DTW∞ is a pseudometric: it is a metric over equiva-
monotonic DTW.
lence classes defined by x ∼ y if and only if DTW∞ (x, y) = 0. When
no locality constraint is enforced (w  n), DTW∞ is equivalent to the
Proof. Let  and  be minimal warping paths between x and y discrete Fréchet distance [49].
and between y and z. Let  = {(i, j, k)|(i, j) ∈  and (j, k) ∈  }. Iter-
ate through the tuples (i, j, k) in  and construct the same-length
time series x , y , z from xi , yj , and zk . By the locality constraint Appendix C. Which is the best distance measure?
any match (i, j) ∈  corresponds to at most min(2w + 1, n) tuples
of the form (i, j, ·) ∈  , and similarly for any match (j, k) ∈  . The DTW can be seen as the minimization of the lp distance un-
p 
Assume 1  p < ∞. We have that x − y p = (i,j,k)∈ |xi − yj |p  der warping. Which p should we choose? Legrand et al. reported best

Fig. C.1. Classification accuracy versus the number of instances of each class in four data sets. (a) Cylinder–bell–funnel. (b) Control charts. (c) Waveform. (d) Wave+noise.

Appendix C. Which is the best distance measure?

The DTW can be seen as the minimization of the lp distance under warping. Which p should we choose? Legrand et al. reported best results for chromosome classification using DTW1 [13] as opposed to using DTW2. However, they did not quantify the benefits of DTW1. Morse and Patel reported similar results with both DTW1 and DTW2 [50].

While they do not consider the DTW, Aggarwal et al. [51] argue that out of the usual lp norms, only the l1 norm, and to a lesser extent the l2 norm, express a qualitatively meaningful distance when there are numerous dimensions. They even report on classification-accuracy experiments where fractional lp distances such as l0.1 and l0.5 fare better. François et al. [52] made the theoretical result more precise, showing that under uniformity assumptions, lesser values of p are always better.

To compare DTW1, DTW2, DTW4 and DTW∞, we considered four different synthetic time-series data sets: cylinder–bell–funnel [43], control charts [44], waveform [53], and wave+noise [54]. The time series in each data set have lengths 128, 60, 21, and 40, respectively. The control charts data set has six classes of time series, whereas the other three data sets have three classes each. For each data set, we generated various databases having a different number of instances per class: between 1 and 9 inclusively for cylinder–bell–funnel and control charts, and between 1 and 99 for waveform and wave+noise. For a given data set and a given number of instances, 50 different databases were generated. For each database, we generated 500 new instances chosen from a random class and we found a nearest neighbor in the database using DTWp for p = 1, 2, 4, ∞ and using a time constraint of w = n/10. When the instance is of the same class as the nearest neighbor, we considered that the classification was a success.

The average classification accuracies for the four data sets, and for various numbers of instances per class, are given in Fig. C.1. The average is taken over 25 000 classification tests (50 × 500), over 50 different databases.

Only when there are one or two instances of each class is DTW∞ competitive. Otherwise, the accuracy of DTW∞-based classification does not improve as we add more instances of each class. For the waveform data set, DTW1 and DTW2 have comparable accuracies. For the other three data sets, DTW1 has a better nearest-neighbor classification accuracy than DTW2. Classification with DTW4 has almost always a lower accuracy than either DTW1 or DTW2.

Based on these results, DTW1 is a good choice to classify time series, whereas DTW2 is a close second.
References

[1] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing 26 (1) (1978) 43–49.
[2] C. Bahlmann, H. Burkhardt, The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (3) (2004) 299–310.
[3] R. Niels, L. Vuurpijl, Using dynamic time warping for intuitive handwriting recognition, in: IGS2005, 2005, pp. 217–221.
[4] M. Faundez-Zanuy, On-line signature recognition based on VQ-DTW, Pattern Recognition 40 (3) (2007) 981–992.
[5] W. Chang, J. Shin, Modified dynamic time warping for stroke-based on-line signature verification, in: ICDAR 2007, 2007, pp. 724–728.
[6] A. Almog, A. Levi, A.M. Bruckstein, Spatial de-interlacing using dynamic time warping, in: ICIP 2005, vol. 2, 2005, pp. 1010–1013.
[7] A. Kale, N. Cuntoor, B. Yegnanarayana, A.N. Rajagopalan, R. Chellappa, Gait-based human identification using appearance matching, in: Optical and Digital Techniques for Information Security, Springer, Berlin, 2004, pp. 271–295.
[8] J.C. Brown, A. Hodgins-Davis, P.J.O. Miller, Classification of vocalizations of killer whales using dynamic time warping, Journal of the Acoustical Society of America 119 (3) (2006) 34–40.
[9] J.-S.R. Jang, H.-R. Lee, A general framework of progressive filtering and its application to query by singing/humming, IEEE Transactions on Audio, Speech, and Language Processing 16 (2) (2008) 350–358.
[10] Y. Zhu, D. Shasha, Warping indexes with envelope transforms for query by humming, in: SIGMOD'03, 2003, pp. 181–192.
[11] R. Muscillo, S. Conforto, M. Schmid, P. Caselli, T. D'Alessio, Classification of motor activities through derivative dynamic time warping applied on accelerometer data, in: EMBS 2007, 2007, pp. 4930–4933.
[12] L.E.M. Lopez, R.P. Elias, J.V. Tavira, Face localization in color images using dynamic time warping and integral projections, in: IJCNN 2007, 2007, pp. 892–896.
[13] B. Legrand, C.S. Chang, S.H. Ong, S.Y. Neo, N. Palanisamy, Chromosome classification using dynamic time warping, Pattern Recognition Letters 29 (3) (2007) 215–222.
[14] I. Bartolini, P. Ciaccia, M. Patella, WARP: accurate retrieval of shapes using phase of Fourier descriptors and time warping distance, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (1) (2005) 142–147.
[15] A. Marzal, V. Palazon, G. Peris, Contour-based shape retrieval using dynamic time warping, in: Lecture Notes in Computer Science, vol. 4177, Springer, Berlin, 2006, p. 190.
[16] E. Chávez, G. Navarro, R. Baeza-Yates, J.L. Marroquín, Searching in metric spaces, ACM Computing Surveys 33 (3) (2001) 273–321.
[17] K. Fredriksson, Engineering efficient metric indexes, Pattern Recognition Letters 28 (1) (2007) 75–84.
[18] G.R. Hjaltason, H. Samet, Index-driven similarity search in metric spaces (survey article), ACM Transactions on Database Systems 28 (4) (2003) 517–580.
[19] L. Micó, J. Oncina, R.C. Carrasco, A fast branch & bound nearest neighbour classifier in metric spaces, Pattern Recognition Letters 17 (7) (1996) 731–739.
[20] J.Z.C. Lai, Y.-C. Liaw, J. Liu, Fast k-nearest-neighbor search based on projection and triangular inequality, Pattern Recognition 40 (2) (2007) 351–359.
[21] R. Weber, H.-J. Schek, S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in: VLDB '98, 1998, pp. 194–205.
[22] C. Faloutsos, Searching Multimedia Databases by Content, Kluwer Academic Publishers, Dordrecht, 1996.
[23] C.A. Ratanamahatana, E. Keogh, Three myths about dynamic time warping data mining, in: SDM'05, 2005.
[24] R.C. Veltkamp, Shape matching: similarity measures and algorithms, in: Shape Modeling and Applications, 2001, pp. 188–197.
[25] L. Chen, R. Ng, On the marriage of lp-norms and edit distance, in: VLDB'04, 2004, pp. 1040–1049.
[26] S.B. Needleman, C.D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology 48 (3) (1970) 443–453.
[27] T.F. Smith, M.S. Waterman, Identification of common molecular subsequences, Journal of Molecular Biology 147 (1981) 195–197.
[28] A.-O. Boudraa, J.-C. Cexus, M. Groussat, P. Brunagel, An energy-based similarity measure for time series, EURASIP Journal on Advances in Signal Processing 2008 (1) (2008) 1–9.
[29] M. Zhou, M.H. Wong, Boundary-based lower-bound functions for dynamic time warping and their indexing, in: ICDE 2007, 2007, pp. 1307–1311.
[30] I.F. Vega-López, B. Moon, Quantizing time series for efficient similarity search under time warping, in: ACST'06, ACTA Press, Anaheim, CA, USA, 2006, pp. 334–339.
[31] Y. Sakurai, M. Yoshikawa, C. Faloutsos, FTW: fast similarity search under the time warping distance, in: PODS '05, 2005, pp. 326–337.
[32] L. Wei, E. Keogh, H.V. Herle, A. Mafra-Neto, Atomic wedgie: efficient query filtering for streaming time series, in: ICDM '05, 2005, pp. 490–497.
[33] F. Itakura, Minimum prediction residual principle applied to speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing 23 (1) (1975) 67–72.
[34] J. Colannino, M. Damian, F. Hurtado, S. Langerman, H. Meijer, S. Ramaswami, D. Souvaine, G. Toussaint, Efficient many-to-many point matching in one dimension, Graphs and Combinatorics 23 (1) (2007) 169–178.
[35] E. Keogh, C.A. Ratanamahatana, Exact indexing of dynamic time warping, Knowledge and Information Systems 7 (3) (2005) 358–386.
[36] D. Lemire, Streaming maximum–minimum filter using no more than three comparisons per element, Nordic Journal of Computing 13 (4) (2006) 328–339.
[37] N. Beckmann, H. Kriegel, R. Schneider, B. Seeger, The R∗-tree: an efficient and robust access method for points and rectangles, in: SIGMOD '90, 1990, pp. 322–331.
[38] H. Xiao, X.-F. Feng, Y.-F. Hu, A new segmented time warping distance for data mining in time series database, in: Machine Learning and Cybernetics 2004, vol. 2, 2004, pp. 1277–1281.
[39] Y. Shou, N. Mamoulis, D.W. Cheung, Fast and exact warping of time series using adaptive segmental approximations, Machine Learning 58 (2–3) (2005) 231–267.
[40] X.L. Dong, C.K. Gu, Z.O. Wang, A local segmented dynamic time warping distance measure algorithm for time series data mining, in: International Conference on Machine Learning and Cybernetics 2006, 2006, pp. 1247–1252.
[41] D. Lemire, Fast nearest-neighbor retrieval under the dynamic time warping, 2008, online: http://code.google.com/p/lbimproved/.
[42] M. Hadjieleftheriou, Spatial Index Library, 2008, online: http://research.att.com/~marioh/spatialindex/.
[43] N. Saito, Local feature extraction and its applications using a library of bases, Ph.D. Thesis, Yale University, New Haven, CT, USA, 1994.
[44] D.T. Pham, A.B. Chan, Control chart pattern recognition using a new type of self-organizing neural network, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 212 (2) (1998) 115–127.
[45] E. Keogh, L. Wei, X. Xi, S.H. Lee, M. Vlachos, LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures, in: VLDB 2006, 2006, pp. 882–893.
[46] E. Keogh, Shape matching, papers and data sets, 2007, online: http://www.cs.ucr.edu/~eamonn/shape/shape.htm.
[47] G.B. Folland, Real Analysis. Modern Techniques and their Applications, Wiley, New York, 1984.
[48] F. Casacuberta, E. Vidal, H. Rulot, On the metric properties of dynamic time warping, IEEE Transactions on Acoustics, Speech, and Signal Processing 35 (11) (1987) 1631–1633.
[49] T. Eiter, H. Mannila, Computing discrete Fréchet distance, Technical Report CD-TR 94/64, Christian Doppler Laboratory for Expert Systems, 1994.
[50] M.D. Morse, J.M. Patel, An efficient and accurate method for evaluating time series similarity, in: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, 2007, pp. 569–580.
[51] C.C. Aggarwal, A. Hinneburg, D.A. Keim, On the surprising behavior of distance metrics in high dimensional spaces, in: ICDT'01, 2001, pp. 420–434.
[52] D. François, V. Wertz, M. Verleysen, The concentration of fractional distances, IEEE Transactions on Knowledge and Data Engineering 19 (7) (2007) 873–886.
[53] L. Breiman, Classification and Regression Trees, Chapman & Hall/CRC, Boca Raton, FL, 1998.
[54] C.A. Gonzalez, J.J.R. Diez, Time series classification by boosting interval based literals, Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial 11 (2000) 2–11.

About the Author—DANIEL LEMIRE received a B.Sc. and an M.Sc. in Mathematics from the University of Toronto in 1994 and 1995. He received his Ph.D. in Engineering Mathematics from the École Polytechnique and the Université de Montréal in 1998. He completed a post-doctoral fellowship at the Institut de génie biomédical and worked as a consultant in industry. From 2002 to 2004, he was a research officer at the National Research Council of Canada (NRC). He is now a professor at the Université du Québec à Montréal (UQAM), where he teaches Computer Science. His research interests include data warehousing, OLAP, and time series.
