0% found this document useful (0 votes)

6 views

gsaca

This master thesis by Uwe Baier presents a new linear-time algorithm for suffix array construction called GSACA, which is non-recursive. Although it does not outperform existing state-of-the-art algorithms, GSACA introduces innovative concepts that could inspire future research. The document includes a comprehensive overview of string processing, suffix arrays, and related work in the field.

Uploaded by

juxeiier

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views

gsaca

Uploaded by

juxeiier

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 63

Ulm University | 89069 Ulm | Germany Faculty of Engineering,

Computer Science and

Psychology
Institute of Theoretical Computer
Science

Linear-time Suffix Sorting

A new approach for suffix array construction
Master Thesis at Ulm University

Author:
Uwe Baier
[email protected]

Reviewers:
Prof. Dr. Enno Ohlebusch
Prof. Dr. Uwe Schöning

Supervisor:
Prof. Dr. Enno Ohlebusch

2015
“Linear-time Suffix Sorting– A new approach for suffix array construction”
Version from November 11, 2015

Acknowledgements:
To my supervisor, Enno Ohlebusch, as well as to my correctors, Matthias Gerber, Annika
Maier and Carolin Baier.

© 2015 Uwe Baier

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International
License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/
4.0/.
Abstract
This thesis presents a new approach for linear-time suffix sorting. It introduces a new
sorting principle that can be used to build the first non-recursive linear-time suffix
array construction algorithm named GSACA. Although GSACA cannot hold up with
the performance of state of the art suffix array construction algorithms, the algorithm
introduces a lot of new ideas for suffix array construction, and therefore can be seen as
a starting point for future research topics.

iii
Contents

1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Preliminaries 3
2.1 String Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 The Suffix Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Basic Suffix Array Construction Techniques . . . . . . . . . . . . . . . . 5

3 Algorithmic Idea 7
3.1 Basic Sorting Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 An Introducing Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4 The Algorithm 23
4.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5 Implementation 41

6 Performance Analyses 47

7 Conclusion 51

Bibliography 53

v
Chapter 1
Introduction
The suffix array plays an important role in string processing and data compression.
It lists the lexicographically sorted suffixes of a given text in increasing order. First
described by Manber and Myers in 1990 [20] as an alternative to suffix trees [10], the
suffix array nowadays is used in a wide range of applications. To name a few of the
most popular ones:
• The Burrows–Wheeler Transformation [3] of a text can easily be obtained using
the suffix array. Main applications of the BWT include lossless data compression
in tools like bzip2 [30], as well as full text indexing, a powerful method to prepare
a text for fast pattern localization and a lot more operations [6].
• Another popular lossless data compression method, Lempel–Ziv 77 [32], makes
use of the suffix array for fast construction [18]. It is used in data compression
tools such as gzip [9], and further research showed how to build a compressed text
index based on Lempel–Ziv 77 [17].
• Along with the suffix array, Abouelhoda et al. introduced the Enhanced Suffix
Array [1], one more powerful text index, which is able to remove the need for the
space–intensive use of suffix trees in a lot of common string processing operations.
As one can see, the suffix array is very useful in a lot of applications. Unfortunately,
constructing a suffix array from a given text turns out to be a computational hard task.
Although linear–time algorithms exist, some super–linear algorithms for suffix array
construction work faster in practice.
According to a survey paper of Puglisi et al. [28], suffix array construction algorithms
(SACAs) should fulfill all of the following requirements:
• Minimal asymptotic runtime: linear–time complexity
• Fast runtime in practice, tested on real world data
• Minimal space requirements, space usage for the suffix array and the text itself in
an optimal way
Although the paper was published in 2007, no SACA is able to meet all of those re-
quirements so far, so there’s still a need for an ’optimal’ SACA.
As research went on, there was an increasing interest in parallel suffix array construc-
tion, as well as suffix array construction using external memory. Surprisingly, both of
these areas can be combined, and perform quite better than approaches dealing only
with a single case [19]. One key to this result was the use of a fast sequential SACA, so
conventional SACAs could improve this technique further on.

1
Chapter 1 Introduction

1.1 Overview
As we’ve seen already, an ’optimal’ SACA would help a lot of areas in string process-
ing and data compression. My contribution to this theme will be a new linear–time
SACA, which—as first linear–time SACA—computes the suffix array without the use
of recursion.
This thesis will be organized as follows: Chapter 2 takes care of basic fundamentals of
string processing and suffix array construction. In Chapter 3, the basic idea of the new
algorithm will be explained, before Chapter 4 shows the algorithm along with proofs
for correctness and linear runtime. Chapter 5 discusses implementation details and
further optimisations, followed by performance analyses in Chapter 6. Chapter 7 finally
summarizes all results and gives a lookout on future research topics.

1.2 Related Work

As already noted, in 1990 Manber and Myers presented the suffix array along with
algorithms for its construction [20]. The algorithms needed super–linear time, but
were a space–saving alternative to already existing linear–time algorithms for suffix tree
construction [31, 21]. Since suffix trees require a lot of space, we will ignore algorithms
for their construction and focus on linear–time SACAs instead.1
Later on, in 2003, four new linear–time SACAs were contemporary introduced by
Kim et al. [14], Kärkkäinen and Sanders [12], Ko and Aluru [16] and Hon et al. [11].
In 2005, Joong Chae Na introduced one more linear–time SACA [25]. Amongst all of
these algorithms two stood out: the well–known and elegant Distance Cover or Skew
Algorithm by Kärkkäinen and Sanders [12, 13], and the KA Algorithm by Ko and Aluru
[16] which was able to achieve good results in several benchmarks [24].
The next big step in suffix sorting history was made by Nong et al. in 2009. They
invented two new algorithms using the induced sorting principle [27]. One of those
algorithms, called SA-IS [26], was able to outperform most of the super–linear algo-
rithms that were known to ’work good in practice’ [24], while guaranteeing asymptotic
linear runtime and almost optimal space requirements. As a consequence, SA-IS is
the currently fastest known linear–time SACA that is able to fulfill almost all of the
requirements noted by Puglisi et al. [28], and stays the ’candidate to beat’.

1 SACAs assuming a constant–size alphabet for achieving linear–time are ignored here, too.

2
Chapter 2
Preliminaries
In this chapter, we will discuss basic definitions in string processing and some funda-
mentals of suffix array construction. First of all, let’s begin with the term ’string’ and
its components.

2.1 String Definitions

A string typically consists of characters of a certain alphabet that can be compared to
each other.

Definition 2.1.1. An alphabet Σ is a finite set of totally ordered elements (characters).

The size of Σ is denoted by σ = |Σ|.

Let’s say we got an alphabet Σ = {a, b, c}. Normally, one would order the elements
of this alphabet in a manner like a < b < c, but an other order of those elements could
be possible too. However, to keep it simple, most of the time we will use the lowercase
basic modern Latin alphabet, and, if not mentioned otherwise, use an ’intuitive’ order
over this alphabet (a < b < · · · < z). Now let’s have a look at the definition of the term
string.

Definition 2.1.2. A string S is a finite sequence of characters over an alphabet Σ. The

length of a string S, denoted by |S|, is the number of characters in the sequence. The
empty string with length 0 is denoted by ε.

To give an example, mississippi is a string of length 11 over the alphabet Σ =

{i, m, p, s}. Next, the terms substring and suffix shall be defined.

Definition 2.1.3. Let S be a string of length n, and let i and j be integers with
1 ≤ i, j ≤ n. S[i] denotes the i-th character of the string S. S[i..j] denotes the substring
of S starting at the i-th and ending at the j-th position. We write S[i..j + 1) analogous
to S[i..j], and state S[i..j] = ε for i > j. Furthermore, Si denotes the i-th suffix S[i..n].

To give some examples for this definitions, consider the example string S = mississippi.

• S[1] = m

• S[2..1] = ε

• S[2..5] = S[2..6) = issi

• S[1..11] = S1 = mississippi

3
Chapter 2 Preliminaries

• S[5..11] = S5 = issippi
To conclude our basic definitions, we introduce a lexicographic order, a way to compare
and order strings.
Definition 2.1.4. Let Σ be an alphabet, S be a string of length n over alphabet Σ
and T be a string of length m over alphabet Σ. We write S <lex T and say that S is
lexicographically smaller than T , if one of the following conditions holds:
(i) There exists an i (1 ≤ i ≤ min{n, m}) with S[i] < T [i] and S[1..i) = T [1..i).
(ii) S is a proper prefix of T , i.e. n < m and S[1..n] = T [1..n].
Also, we define S =lex T if S = T , and write S >lex T analogous to T <lex S, S ≤lex T
if S <lex T or S =lex T , and S ≥lex T analogous to T ≤lex S.
By now, the tools for introducing the suffix array are complete, but let’s have a
look at some examples before moving on. We again consider our example string S =
mississippi (the ’deciding character’ from condition (i) of Definition 2.1.4 is high-
lighted):
• S2 = ississippi <lex mississippi = S1
<

• S5 = issippi <lex ississippi = S2

• S[2..5] = issi <lex ississippi = S2

2.2 The Suffix Array

The suffix array of a string S of length n is an array containing starting positions from
the suffixes S1 , · · · , Sn sorted by their lexicographic order. In other words, the suffix
array lists all suffixes of a string S in lexicographic order.
In order to make things easier, we introduce a special sentinel character $, which is
appended to the end of a string.
Definition 2.2.1. Let Σ be an alphabet containing a sentinel character $ with $ < c
for all c ∈ Σ \ {$}. Let S be a string of length n, in which the sentinel character $
appears exactly once at the last position (we call such a string a nullterminated string).
The suffix array SA of S is a permutation of integers in range [1 . . . n] satisfying
SSA[1] <lex SSA[2] <lex · · · <lex SSA[n] . The inverse suffix array ISA is the inverse per-
mutation of SA.
Figure 2.1 shows an example of a suffix array. Clearly, the sentinel character is
not needed to define the suffix array. As one can guess, when removing the sentinel
character, the order of suffixes stays identical.
The advantage of the sentinel character is that we only need condition (i) of Definition
2.1.4 to compare two suffixes: Since the sentinel character is occurring only once at the
end of the string, it has a different position in each suffix, i.e. if S[i+k] = $, S[j +k] 6= $
for any j 6= i. This has the consequence that no suffix Si can be a proper prefix of another
suffix Sj , so the first condition of Definition 2.1.4 is sufficient to compare two suffixes.

4
2.3 Basic Suffix Array Construction Techniques

i SA[i] ISA[i] SSA[i]

1 12 6 $
2 11 5 i$
3 8 12 ippi$
4 5 10 issippi$
5 2 4 ississippi$
6 1 11 mississippi$
7 10 9 pi$
8 9 3 ppi$
9 7 8 sippi$
10 4 7 sissippi$
11 6 2 ssippi$
12 3 1 ssissippi$

Figure 2.1: Suffix array and inverse suffix array of the string S = mississippi$.

2.3 Basic Suffix Array Construction Techniques

After introducing the suffix array, a short survey about basic suffix array construction
techniques should be given here. A more detailed survey can be found in the paper of
Puglisi et al. [28].
First, let’s clarify why suffix array construction is computationally hard. A simple
approach for suffix array construction could be to store all start positions of suffixes in
an array, and then using quicksort to sort them. Using this approach, each comparison
of two suffixes requires O(n) time, so the overall time expands to O(n2 log n), quite a
bad asymptotic runtime.
Better approaches use the relation between suffixes of the same string, and are per-
formed with three main techniques:
1. Prefix Doubling. Initially, suffixes get sorted by their first character, having a
’sorted context’ of one character. Then, by using the current lexicographic rank
of the suffix immediately following the own context, a more fine grained order of
suffixes can be obtained, implying context doubling / prefix doubling. Thus, after
repeating this step log n times, all suffixes are ordered. Each step requires O(n)
work, so the total time is O(n log n).
2. Recursive. Suffixes get typed by some criteria in a first step. Then, a new sequence
is built and recursively gets sorted, so it introduces the order of suffixes of a certain
type. The order of those special typed suffixes then can be used to induce the
order of all suffixes. Typically, the size of the newly created sequence is smaller
than 23 n, and each step except recursion can be done in O(n) time, thus implying
a total time of O(n).
3. Induced Copying. The idea of typing suffixes is the same as in recursive algorithms,
but instead of using recursion, efficient string sorting techniques are used to obtain
the order of special typed suffixes. Then, as in recursive algorithms, this order is
used to introduce the order of all suffixes. Despite super–linear worst case time
those algorithms perform well in practice.

5
Chapter 2 Preliminaries

Today, all state of the art linear–time SACAs make use of recursion, achieving quite
nice results in practice. It would be interesting nonetheless if a non–recursive algorithm
with the capability of linear–time can be designed—at least from a theoretical point of
view. This issue will be addressed by the next chapter, along with a new technique for
suffix sorting.

6
Chapter 3
Algorithmic Idea
After introducing the suffix array and discussing some basic construction techniques,
next goal will be to present a new sorting principle along with an algorithm to construct
a suffix array in linear time. To introduce the sorting principle, a definition for the next
lexicographically smaller suffix is needed first.

Definition 3.0.1. Let S be a nullterminated string of length n, and let i be an integer

in range [1, n). Then, by bi we denote the position of the next lexicographically smaller
suffix of Si , i.e. bi := min{ j ∈ [i + 1 . . . n] | Sj <lex Si }. Also, we define n
b := n + 1 for
the last suffix of S.1

An example of suffixes and their next lexicographically smaller suffixes can be found
in Figure 3.1.

i SA[i] [
SA[i] SSA[i] S[SA[i]..SA[i]
[)

1 14 15 $ $
2 3 14 aindraining$ aindraining
3 8 14 aining$ aining
4 6 8 draining$ dr
5 13 14 g$ g
6 1 3 graindraining$ gr
7 4 6 indraining$ in
8 11 13 ing$ in
9 9 11 ining$ in
10 5 6 ndraining$ n
11 12 13 ng$ n
12 10 11 ning$ n
13 2 3 raindraining$ r
14 7 8 raining$ r

Figure 3.1: Suffix array and next lexicographically smaller suffixes of

S = graindraining$.

In addition to the definition of next lexicographically smaller suffixes, suffix groups

will be introduced next. Roughly spoken, a suffix group is a set of suffixes sharing some
prefix, but the definition will explain this more in detail.
1 Onecan think of this as follows: if we define an imaginary and empty last suffix Sn+1 := ε, then
Sn+1 is a proper prefix of Sn , so Sn+1 is the next smaller suffix of Sn .

7
Chapter 3 Algorithmic Idea

Definition 3.0.2. Let S be a nullterminated string of length n, and let u be any

substring of S. A suffix group G with group prefix u is defined as a set G ⊆ [1 . . . n] such
that u is a prefix of all suffixes in G, i.e. i ∈ G ⇒ u is a prefix of Si .
Furthermore, let G1 , . . . , Gm be a partition of [1 . . . n] into suffix groups, such that
the group prefixes of G1 , . . . , Gm differ. For any i, j ∈ [1 . . . m], we write Gi < Gj if the
group prefix of Gi is lexicographically smaller than the group prefix of Gj . Within such
a partition, group(i) identifies the suffix group to which the index i belongs.
An example for suffix groups can be found in Figure 3.2. When comparing suffix
groups as in Definition 3.0.2, we will use the terms lower ordered (or just lower) respec-
tively higher ordered (or just higher) instead of smaller or larger: because suffix groups
are sets, the terms smaller or larger usually refer to set sizes, not to lexicographic com-
parison between group prefixes. Now, within this definitions, it is possible to give a first
idea of the algorithm, as we will see next.

3.1 Basic Sorting Principle

In a first phase, suffixes will be sorted as if each suffix Si consists only of the string
S[i..bi). If S[i..bi) = S[j..b
j) holds for two suffixes Si and Sj , they are placed in the same
group, otherwise in different groups, so S[i..bi) denotes the group prefix of i’s group. The
groups will be ordered according to their group prefixes, as Definition 3.0.2 states. An
example for such a suffix grouping can be found in Figure 3.2.

group suffixes {14} {3} {8} {6} {13} {1} {11, 9, 4} {12, 10, 5} {7, 2}
group prefix
$

r
aindraining
aining

Figure 3.2: Suffix grouping of S = graindraining$ after Phase 1 of Algorithm 1.

Groups are positioned from left lowest ordered to right highest ordered.

Then, in a second phase, this information will be used to sort suffixes finally. Think
about a suffix Si with S[ bi ] = $. Because of the sorting in the first phase it is clear that all
suffixes in lower ordered groups are lexicographically smaller. On the other hand, since
$ is the lexicographically smallest suffix, it is clear that Si must be the lexicographically
smallest suffix in its current group. Now, define sr to be the number of suffixes placed in
lower groups than group(i), sr := |{ j ∈ [1 . . . n] | group(j) < group(i) }|, then Si is the
lexicographically sr + 1-th smallest suffix, and we can set SA[sr + 1] ← i. Additionally,
if we remove Si from its group and put it in a new group placed immediately before i’s
old group, the group order of Definition 3.0.2 stays consistent, and the same procedure
can be repeated for the next minimal element of Si ’s old group.
Using this idea, the second phase will proceed like the following: first, set SA[1] ← n
because of the definition of the sentinel character. Then, iterate the suffix array from 1
to n in increasing order. Within the i-th iteration, compute all suffixes Sj with b j = SA[i],
and execute the procedure described above for all of them. Some exemplar iterations
can be found in Figure 3.3.

8
3.1 Basic Sorting Principle

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
SA[i] 14 − − − − − − − − − − − − −
group suffix { 14 3 8 6 13 1 11 9 4 12 10 5 7 2
15 }{ 14 }{ 14 }{ 8 }{ 14 }{ 3 }{ 13 , 11 , 6 }{ 13 , 11 , 6 }{ 8 , 3 }
\
suffix

SA[i] 14 3 8 − 13 − − − − − − − − −
group suffix { 14 3 8 6 13 1 11 9 4 12 10 5 7 2
15 }{ 14 }{ 14 }{ 8 }{ 14 }{ 3 }{ 13 , 11 , 6 }{ 13 , 11 , 6 }{ 8 , 3 }
\
suffix

SA[i] 14 3 8 − 13 1 − − − − − − 2 −
group suffix { 14 3 8 6 13 1 11 9 4 12 10 5 2 7
15 }{ 14 }{ 14 }{ 8 }{ 14 }{ 3 }{ 13 , 11 , 6 }{ 13 , 11 , 6 }{ 3 }{ 8 }
\
suffix

SA[i] 14 3 8 6 13 1 − − − − − − 2 7
group suffix { 14 3 8 6 13 1 11 9 4 12 10 5 2 7
15 }{ 14 }{ 14 }{ 8 }{ 14 }{ 3 }{ 13 , 11 , 6 }{ 13 , 11 , 6 }{ 3 }{ 8 }
\
suffix

SA[i] 14 3 8 6 13 1 4 − − 5 − − 2 7
group suffix { 14 3 8 6 13 1 4 11 9 5 12 10 2 7
15 }{ 14 }{ 14 }{ 8 }{ 14 }{ 3 }{ 6 }{ 13 , 11 }{ 6 }{ 13 , 11 }{ 3 }{ 8 }
\
suffix
..
.
SA[i] 14 3 8 6 13 1 4 11 9 5 12 10 2 7

Figure 3.3: Phase 2 steps of Algorithm 1, applied to the string S = graindraining$.

Suffixes of groups are listed along with their next lexicographically smaller
suffixes, displayed as fractions.

Now finally we can give a first principle how suffixes can be sorted:

Algorithm 1 Suffix array construction for a given nullterminated string S of length n.

Phase 1: sort suffixes into groups
1: order all suffixes of S into groups: Let Si and Sj be two suffixes.
Then, group(i) = group(j) if and only if S[i..bi) = S[j..b j).
2: order the suffix groups by their group prefixes: Let G1 and G2 be two groups,
i ∈ G1 , j ∈ G2 . Then, G1 < G2 if and only if S[i..bi) <lex S[j..b
j).

Phase 2: construct suffix array from groups

3: SA[1] ← n
4: for i = 1 up to n do
5: for all suffixes Sj with b j = SA[i] do
6: let sr be the number of suffixes placed in lower groups,
i.e. sr := |{ s ∈ [1 . . . n] | group(s) < group(j) }|.
7: SA[sr + 1] ← j
8: remove j from its current group and put it in a new group
placed as immediate predecessor of j’s old group.
9: end for
10: end for

9
Chapter 3 Algorithmic Idea

Next, some more details of Algorithm 1 shall be explained. In line 2, a group G1 is

mentioned to be lower ordered to a group G2 if the group prefix of G1 is lexicographically
smaller than that of G2 . If the group prefixes of both groups differ in at least one
character, then certainly all suffixes of G1 are lexicographically smaller than that of
G2 . Now, as Definition 2.1.4 allows, consider the case when a group prefix is a proper
prefix of another’s group prefix. Within this case, it is not clear whether suffixes of G1
are lexicographically smaller than those of G2 . To ensure correctness in this cases, a
theorem will be expressed now.

Theorem 3.1.1. Let S be a nullterminated string of length n, and let i and j be two
integers in range [1 . . . n]. If S[i..bi) is a proper prefix of S[j..b
j), then it follows that
Si <lex Sj .

Proof. Let k := bi − i. Because S[i..bi) is a proper prefix of S[j..b j), it is clear that
k < j − j and S[i..i + k) = S[j..j + k). By Definition 3.0.1 of next lexicographically
b
smaller suffixes, it is clear that Si >lex Si+k and Sj <lex Sj+k . Now, let u be a number
such that S[i + u] > S[i + k + u] and S[i..i + u) = S[i + k..i + k + u), and let v be a
number such that S[j + v] < S[j + k + v] and S[j..j + v) = S[j + k..j + k + v). We
assume that u < v; the cases u = v and u > v can be handled analogously. Since
S[i..i + u) = S[i + k..i + k + u), the suffix Si starts with repetitions of S[i..i + k) until
the u + k-th character, i.e.

Si = S[i + (0 mod k)]S[i + (1 mod k)] · · · S[i + (u + k − 1 mod k)] · · ·

Also, since S[j..j + v) = S[j + k..j + k + v), the suffix Sj starts with repetitions of
S[j..j + k) until the v + k-th character. Because u < v, the suffix Sj has the form

Sj = S[j + (0 mod k)]S[j + (1 mod k)] · · · S[j + (u + k mod k)] · · ·

Because Si and Sj share the same prefix among the first k characters, S[i..i + k + u) =
S[j..j + k + u) must hold. Now finally, we got S[j + k + u] = S[j + u] = S[i + u] >
S[bi + u] = S[i + k + u], so Si <lex Sj must hold.

Another question is if Algorithm 1 fills the suffix array entirely. More detailed, it has
to be shown that before the i-th iteration of Phase 2 the lexicographically i-th suffix is
placed into the suffix array. It is clear that the first suffix (SA[1]) is correctly placed into
the suffix array in line 3 because of the definition of the sentinel character. Now, let Sj
be the lexicographically i-th smallest suffix. Since Sbj <lex Sj holds, by induction, the
suffix Sbj must already have been handled in one of the i − 1 previous iterations. Within
this iteration, Sj is put in place into the suffix array in lines 5 to 9, and therefore Sj is
available before the i-th iteration.
So far, we’ve seen the basic sorting principle along with some thoughts for correctness,
but there are still a lot of issues remaining: How can Phase 1 be implemented? What
asymptotic time and space is required? How will groups be organized? But instead of
answering those questions and presenting a more precise algorithm directly, the next
section shows a larger running example of the final algorithm first, to get better insight
onto the performed steps and to bridge the gap between the basic sorting principle and
the final technical result.

10
3.2 An Introducing Example

3.2 An Introducing Example

In this section, a running example of the final algorithm is shown. Although the al-
gorithm is not known yet, the example illustrates the relationship between the sorting
principle and the final algorithm. Feel free to skip this section, the algorithm is listed
in the next chapter; I, however, recommend to read the example, since it will give an
idea about what’s happening behind the scenes.
To stay consistent, the running example will construct the suffix array for the string
S = graindraining$. Remember that the sorting principle in Algorithm 1 consists of
two phases: In a first phase, suffixes are sorted into groups by comparing prefixes that
reach until the next lexicographic smaller suffix. Then, in a second phase, this group
information is used to fill the suffix array entirely. Both of these phases will performed
here too, but a lot more in detail.

Phase 1: sort suffixes into groups

First task is to sort suffixes into groups, as explained above. To achieve this goal, a
technique quite similar to prefix doubling (see Section 2.3) will be used. We start with
initial groups that are sorted by the first character of their suffixes, i.e. S[i] = S[j] ⇔
group(i) = group(j) and S[i] < S[j] ⇔ group(i) < group(j), see Figure 3.4.

groups { 14 }{ 3 , 8 }{ 6 }{ 1 , 13 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.4: Initial groups of the string S = graindraining$. All suffixes sharing the
same first character are placed in the same group, groups are ordered by
their group prefix character from left to right. Also, links from the group
with prefix i to its suffixes are displayed.

Step 1
We start by processing suffixes of the highest group.

groups { 14 }{ 3 , 8 }{ 6 }{ 1 , 13 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

11
Chapter 3 Algorithmic Idea

Now, for each suffix of this group, we search for the first previous suffix that is placed
in a lower group. More detailed, if Si is a suffix of the current group, we search for
prev(i) := max{ j ∈ [1..i − 1] | group(j) < group(i) }. Additionally, pointers from
suffixes to their first previous suffixes (prev pointers) get stored.

groups { 14 }{ 3 , 8 }{ 6 }{ 1 , 13 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Next, each of those previous suffixes gets rearranged to new groups, placed immedi-
ately after their old groups. One can think of this as follows: Currently, we are working
on suffixes of the highest group. Since their previous suffixes are followed by them,
those previous suffixes are lexicographically larger than suffixes of the same group not
followed by the processed suffixes. On the other hand, since the next higher group
of such previous suffixes contains suffixes that are lexicographically larger, we have to
place them in new groups immediately after their old groups.

groups { 14 }{ 3 , 8 }{ 6 }{ 1 , 13 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Applied to our example, we get the following rearrangements: The suffix S1 is placed
in a new group, between the groups {13} and {4, 9, 11}. The suffix S6 also is placed in
a new group, but since the old group of S6 is empty after removing S6 , the new group
has the same position as the old one.
groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 2
In the next step, we process the next lower group.
groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

12
3.2 An Introducing Example

As in Step 1, for each suffix of the processed group we search for the first previous suffix
in a lower group.

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Again, those previous suffixes get rearranged in new groups, placed immediately after
their old groups. In our example, all previous suffixes were placed in the same group
before the rearrangement, so they’ll be placed in the same group after the rearrangement.
Additionally, since their old group becomes empty after the rearrangement, the new
group is identical to the old one.

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 3
Again, we work on suffixes of the next lower group, and search for their first previous
suffixes in lower groups. In our example, this is the first time that a previous suffix is
not the immediate neighbor of a processed suffix, see S11 and its previous suffix S8 .2

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

A new situation appears: The suffixes S3 and S8 are placed in the same group, but S3
is followed by one suffix of the processed group, while S8 is followed by two suffixes of
the processed group. To handle this case, we can proceed as follows: If only the nearest
suffixes of the processed group are used to rearrange both S3 and S8 , then both will be
placed into the same new group. After performing this step, we use the second suffix to
rearrange S8 , so it gets placed into a new group, and therefore lands in a group higher
than that of S3 . Consequently, S8 must be rearranged to an own group that is placed
higher than that of S3 .
2 Notethat all suffixes on the way from S11 to S8 already carry prev pointers, so they can be used to
speedup the search. This technique also is known as pointer jumping.

13
Chapter 3 Algorithmic Idea

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 4

In the next step, the group of S1 is handled. Since S1 has no previous suffix in a lower
group, no action is performed.

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 5

The next group to handle is that of suffix S13 . Its previous suffix is S8 3 , but since S8 is
the only suffix in its group, the rearrangement has no effect.

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 6

The next lower group that gets handled now is that of suffix S6 . As in Step 5, the
previous suffix of S6 has no further suffixes in its group, so the rearrangement has no
effect.

3 The pointer jumping technique mentioned in Step 3 now saves time: if we jump from S12 to S11 ,
and further to S8 , only two operations are needed instead of four.

14
3.2 An Introducing Example

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 7
The next step has no effect, since the previous suffix of S8 already is placed in a exclusive
group, as in Step 6.

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Although two more steps will be performed (groups {3} and {14}), they will not be
shown here, because no further prev pointers get computed and so no group rearrange-
ments take place.

Intermediate Result
After completing Phase 1, let’s have a look at the intermediate result in Figure 3.5. As
one can see, the groups are identical to those of the beginning of this chapter, see Figure
3.2 on page 8. The question to be answered now is why those groups are identical.

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.5: Groups and prev pointers from the example string S = graindraining$
after Phase 1.

To give a response we first need to characterize the presented algorithmic behaviour.

Clearly, the algorithm behaves in a greedy manner, since groups were processed from
highest to lowest. A hidden aspect of the algorithm is that it also works with dynamic
programming. To explain this point, let’s go back to Step 1. Before the group rearrange-
ments, all suffixes were sorted in groups by their first character. Let’s shortly denote
this character as group context.

15
Chapter 3 Algorithmic Idea

context

r
$
groups { 14 }{ 3 , 8 }{ 6 }{ 1 , 13 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.6: Step 1 of Phase 1, before the rearrangements take place.

Now, every time a rearrangement is performed, the group context of the processed
group implicitly gets appended to that of the rearranged suffixes, thus forming new
groups. Additionally, since rearrangements place suffixes between their old and their
next higher groups, the order of the groups stays consistent with the lexicographic order
of their contexts.
dr

gr
context
a

r
$

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.7: Step 1 of Phase 1, after the rearrangements took place. The context r of
the processed group was implicitly appended to the rearranged suffixes.

This also explains the somewhat strange behaviour in Step 3: if a suffix Si of a group
is followed by one group context of the currently processed group, and another suffix
Sj of the same group is followed by two repeated group contexts, the overall context of
Si is a proper prefix of the context of Sj . Thus, the context of Si is lexicographically
smaller than that of Sj , so consequently, group(i) occurs before group(j), see Figures
3.8 and 3.9.
dr

context
a

r
$

groups { 14 }{ 3 , 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.8: Step 3 of Phase 1, before the rearrangements.

16
3.2 An Introducing Example

ainin
ain

in
context

r
$
groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Figure 3.9: Step 3 of Phase 1, after the rearrangements. Since the new context of S8 is
lexicographically larger than that of S3 , group(3) < group(8).

This aspect of dynamic programming, or, in string context, prefix doubling, together
with the greedy behaviour of the algorithm, ensures that groups are ordered as required
by the sorting principle.
Now, after having handled the first phase, let’s move on to the second phase to see
how the suffix array can be constructed from the current information.

Phase 2: construct suffix array from groups

After having sorted suffixes into groups, it is time to construct the suffix array. There-
fore, first recall Phase 2 of the basic sorting principle, see Algorithm Excerpt 1.

Algorithm Excerpt 1 Phase 2 of the basic sorting principle in Algorithm 1, page 9.

The main problem in this phase is to find, for a given suffix Si , all suffixes whose next
lexicographically smaller suffix equals Si . More detailed, for a given suffix Si , the set
j = i } has to be computed, see line 5 of Algorithm Excerpt 1. As we
{ j ∈ [1 . . . n] | b
shall see, the prev pointers computed in Phase 1 will handle this task for us, but let’s
see this step by step.

Step 1
We start with the algorithm after line 3, because the first suffix array entry trivially is
correct. So, our current suffix is S14 .

17
Chapter 3 Algorithmic Idea

SA[i] 14 − − − − − − − − − − − − −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

The first suffix we are going to visit is the preceding suffix of S14 , S13 . Trivially, the
next lexicographically smaller suffix of S13 is S14 . As described in the algorithm on lines
6 to 8, we remove S13 from its group and put it in a new group placed as immediate
predecessor of its old group, since it is followed by the lexicographically smallest suffix
and therefore is the minimal element of its group. Also, S13 is placed in the suffix array.

SA[i] 14 − − − 13 − − − − − − − − −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Next, we are going to follow the prev pointer from S13 to S8 , repeating the same steps
as above. Recall that the prev pointer by definition points to the first previous suffix
placed in a lower group. Consequently, each suffix Sj between S8 and S13 is placed in a
group equal to or higher than that of S13 . This fact and Theorem 3.1.1 of page 10 imply
that for each such suffix Sj >lex S13 holds (a proof will follow later). Thus, b j ≤ 13, so
those suffixes must be handled in a later step.
On the other hand, since group(8) < group(13), the next lexicographically smaller
suffix of S8 must be behind that of S13 , i.e. b
8≥c 13, so clearly, Sb8 = S14 . Thus, placing
S8 into the suffix array is correct.

SA[i] 14 − 8 − 13 − − − − − − − − −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

18
3.2 An Introducing Example

For the same reason as mentioned above, we again follow the prev pointer from S8
to S3 , executing lines 6 to 8 of the algorithm. Since S3 has no further prev pointer, the
step now is complete.

SA[i] 14 3 8 − 13 − − − − − − − − −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 2
In the next step, we process the lexicographically 2-th smallest suffix S3 . Its preceding
suffix is S2 , so we repeat the behaviour from above for S2 . To clarify the correctness of
this behaviour: Assume Sb2 6= S3 , so b2 > 3. By using the definition of next lexicograph-
ically smaller suffixes, S3 >lex S2 must hold. As Phase 2 processes suffixes in increasing
order, S2 should have been placed in the suffix array already, but as one can verify, S2
is not placed in the suffix array after step 1, so Sb2 = S3 .

SA[i] 14 3 8 − 13 − − − − − − − 2 −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 , 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Next, we follow the prev pointer from S2 to S1 , and place it into the suffix array. As
S1 has no further prev pointer to follow, the step is complete.

19
Chapter 3 Algorithmic Idea

SA[i] 14 3 8 − 13 1 − − − − − − 2 −

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Step 3

In this step, the suffix S8 is processed. As before, we execute lines 6 to 8 of Algorithm

Excerpt 1 with its preceding suffix S7 .

SA[i] 14 3 8 − 13 1 − − − − − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Next, we follow the prev pointer from S7 to S6 , repeating the procedure.

SA[i] 14 3 8 6 13 1 − − − − − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Now, when following the next prev pointer, we reach suffix S3 , which is already
contained in the suffix array. Thus, no action needs to be performed for S3 . Additionally,
since S3 already is placed in the suffix array, any possible prev pointer from S3 to a
previous suffix was handled already, so we do not need to proceed.

20
3.2 An Introducing Example

SA[i] 14 3 8 6 13 1 − − − − − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

Another reason why no further suffix must be processed can be thought of as follows:
suppose there exists another suffix which needs to be processed, i.e. there exists an
i < 3 with bi = 8. Then, by definition of next lexicographically smaller suffixes, S3 >lex
Si >lex Sbi must hold. Since bi = 8, b3 ≤ 8. If b3 < 8, using the definition of the next
lexicographically smaller suffix applied to S[i..bi), Sb3 >lex S8 must hold. If b 3 = 8,
Sb3 =lex S8 , so by combining both cases, Sb3 ≥lex S8 must hold. This means that the
suffix S3 can not have been placed into the suffix array before the current step. This
leads us to a contradiction, and clearly shows that no such i can exist.

Step 4
As next step, the suffix S6 gets processed. As before, we start by handling its preceding
suffix S5 .

SA[i] 14 3 8 6 13 1 − − − 5 − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 , 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

We follow the prev pointer to S4 and repeat the procedure.

SA[i] 14 3 8 6 13 1 4 − − 5 − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 , 9 , 11 }{ 5 }{ 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

21
Chapter 3 Algorithmic Idea

Now, the next prev pointer links from S4 to S3 , but S3 is already placed in the suffix
array, so we can stop here.

SA[i] 14 3 8 6 13 1 4 − − 5 − − 2 7

groups { 14 }{ 3 }{ 8 }{ 6 }{ 13 }{ 1 }{ 4 }{ 9 , 11 }{ 5 }{ 10 , 12 }{ 2 }{ 7 }

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

By now, I would like to come to an end with the running example. The steps 5 to 14
continue quite similarly: start at the processed suffix, and jump to its preceding suffix.
Then, if the suffix is not contained in the suffix array, execute lines 6 to 8 of Algorithm
Excerpt 1 with the suffix, follow the prev pointer if one exists and repeat the procedure.
The reader might perform the steps himself/herself as a little exercise.
What we’ve seen so far is a sorting principle along with an running example illustrating
an efficient implementation of the principle. However, an illustration cannot replace a
concrete algorithm. The next chapter will finally show such an algorithm, along with a
correctness proof and an analysis of the asymptotic runtime. Note that the main points
were explained within this section already, so the following algorithm will not cover too
much new details.

22
Chapter 4
The Algorithm
Without any more preliminaries, Algorithm 2 shows how the suffix array can be con-
structed using the new sorting principle (due to space limitations, the algorithm is split
on two pages).
As described in Chapter 3, the algorithm consists of two phases. Note that the groups
used in the algorithm are not the same as in the basic sorting principle (Algorithm 1 on
page 9). Within this algorithm, the groups are build incrementally, in the same manner
as described in the introducing example from the previous chapter. Nonetheless, the
group definition from page 8 can be applied, so we’re still able to compare groups by their
representative group prefixes. The only difference is that group prefixes consist of an
implicit context, as mentioned in the intermediate result after Phase 1 of the introducing
example, Section 3.2. A formal definition for implicit contexts will be presented later,
first let’s have a look at the algorithm.

Algorithm 2 Suffix array construction of a given nullterminated string S of length n.

Phase 1: sort suffixes into groups
1: order all suffixes of S into groups according to their first character:
Let Si and Sj be two suffixes. Then, group(i) = group(j) ⇔ S[i] = S[j].
2: order the suffix groups: Let G1 be a suffix group with group prefix character u,
G2 be a suffix group with group prefix character v. Then, G1 < G2 if u < v.
3: for each group G in descending group order do
4: for each i ∈ G do
5: prev(i) ← max({ j ∈ [1 . . . i] | group(j) < group(i) } ∪ {0})
6: end for
7: let P be the set of previous suffixes from G,
P := { j ∈ [1 . . . n] | prev(i) = j for any i ∈ G }.
8: split P into k subsets P1 , . . . , Pk such that a subset Pl contains
suffixes whose number of prev pointers from G pointing to them
is equal to l, i.e. i ∈ Pl ⇔ |{ j ∈ G | prev(j) = i }| = l.
9: for l = k down to 1 do
10: split Pl into m subsets Pl1 , . . . , Plm such that suffixes
of same group are gathered in the same subset.
11: for q = 1 up to m do
12: remove suffixes of Plq from their group and put them into a new
group placed as immediate successor of their old group.
13: end for
14: end for
15: end for

23
Chapter 4 The Algorithm

Phase 2: construct suffix array from groups

16: SA[1] ← n
17: for i = 1 up to n do
18: j ← SA[i] − 1
19: while j 6= 0 do
20: let sr be the number of suffixes placed in lower groups,
i.e. sr := |{ s ∈ [1 . . . n] | group(s) < group(j) }|.
21: if SA[sr + 1] 6= nil then
22: break
23: end if
24: SA[sr + 1] ← j
25: remove j from its current group and put it in a new group
placed as immediate predecessor of j’s old group.
26: j ← prev(j)
27: end while
28: end for

4.1 Correctness
We first want to be sure that Algorithm 2 works correctly. A first step towards this
goal will be to show that the suffix groups computed by Phase 1 partition suffixes in
the same manner as in the first phase of the basic sorting algorithm, page 9.
The strategy will be like the following: first, we define prev pointer chains. This
definition then can be used to define the implicit context of any suffix, as discussed at
the beginning of this chapter. Afterwards, a lemma about context extensions will be
expressed, showing that context extensions of the algorithm form an ’avalanche effect’
that sorts suffixes as desired. Finally, this lemma can be used to proof the correct group
structure after Phase 1.

Definition 4.1.1. Let S be a nullterminated string of length n. For any i ∈ [1 . . . n],

during a certain point in time in the execution of Algorithm 2 applied to S, π k (i) denotes
the k-th element of the prev pointer chain starting at index i:

i
 , if k = 0
π k (i) := prev(π k−1 (i)) , if k > 0, π k−1 (i) 6= 0 and prev(π k−1 (i)) 6= nil
, otherwise

0


Furthermore, Π(i) denotes the prev pointer chain set of i, Π(i) := { π k (i) | k ∈ N0 }.

Definition 4.1.2. Let S be a nullterminated string of length n. For any i ∈ [1 . . . n),

during a certain point in time in the execution of Algorithm 2 applied to S, ic denotes
the end of the implicit context of i, ic := min{ j ∈ [i + 1 . . . n] | i ∈ Π(j − 1) }. Also, we
define nc := n + 1, and call S[i..ic ) the implicit context of a suffix Si .

Simpler spoken, the implicit context of a suffix Si is a prefix of Si that extends to the
rightmost position that contains i in its prev pointer chain. Note that this definition

24
4.1 Correctness

exactly matches the more intuitive imagination of contexts from the introducing example
on page 15—except that no formal definition was presented at that point.
These definitions, however, are enough to express an invariant describing the avalanche
effect when processing groups in Phase 1 of Algorithm 2.

Lemma 4.1.1. Let S be a nullterminated string of length n. When applying Algorithm

2 to S, the following propositions hold before every processing of a group G in Phase 1,
line 3:

(i) group(i) = group(j) ⇔ S[i..ic ) =lex S[j..jc ) ∀i, j ∈ [1 . . . n]

(ii) group(i) < group(j) ⇔ S[i..ic ) <lex S[j..jc ) ∀i, j ∈ [1 . . . n]

(iii) For all unprocessed suffixes Si ( i ∈ [1 . . . n) and group(i) ≤ G )

group(ic ) ≤ G and G < group(j) ∀i < j < ic

(iv) For all processed suffixes Si ( i ∈ [1 . . . n) and group(i) > G )

group(ic ) ≤ group(i) and group(i) < group(j) ∀i < j < ic

Proof. The proof is done by induction on the processed groups (lines 3 to 15).

Induction Base: Within lines 1 and 2 of the algorithm, suffixes are sorted and grouped
by their first character. Since no prev pointers exist before the first iteration, the
contexts of all suffixes consist only of their first characters, so propositions (i) and (ii)
are satisfied. Because the first group G to be processed is the highest group, and every
context has length 1, group(ic ) ≤ G holds for all i ∈ [1, n), so (iii) and (iv) are also
fulfilled.

Induction Step: Let G be the currently processed group, and let Ge be the group to
be processed next. We will show that propositions (i) to (iv) are satisfied when the
processing of G has been finished.
Consider the point in time when the group G is processed. First, the algorithm
computes prev pointers and the set P within lines 4 to 7. The first observation to be
pointed out is the following:

p ∈ P ⇔ pc ∈ G and group(p) < G (4.1)

If group(p) ≥ G, the prev pointers computed in line 5 would ignore p, thus p 6∈

P. Next, assume that group(p) < G and pc 6∈ G. Since group(p) < G, p must be
an unprocessed suffix. In this case, as proposition (iii) states, group(j) > G for all
p < j < pc , so any index i ∈ G cannot be placed between p and pc . Additionally,
combining proposition (iii) and precondition pc 6∈ G, group(pc ) < G must hold, so any
index i ∈ G must be placed before p or after pc . In the first case (i < p), the prev
pointer computation of i will ignore p, thus prev(i) 6= p. In the second case (pc < i),
the prev pointer computation of i stops at index pc the latest because of group(pc ) < G,
so prev(i) 6= p. So clearly, p 6∈ P if group(p) < G and pc 6∈ G.
For the backward direction, let p be an index with group(p) < G and pc ∈ G. By using
proposition (iii) for p before the processing of group G, it is clear that group(j) > G for
all j with p < j < pc . Thus, line 5 computes a prev pointer to p, so clearly, p ∈ P.

25
Chapter 4 The Algorithm

Now, consider the point in time after the prev pointer computation. Since the algo-
rithm has computed new prev pointers, the contexts of all suffixes in P are extended,
see Definition 4.1.2. For any p ∈ P, let i be the rightmost index such that prev(i) = p
and i ∈ G. Because i is the rightmost index, using proposition (iii), ic cannot be placed
in group G, otherwise a prev pointer from ic to p would exist, so i wouldn’t have been
the rightmost index. As a consequence, group(ic ) < G holds. Also, by definition of
prev pointers, group(p) < G must hold. By the manner in which rearrangements are
performed (lines 9 to 14), this statement holds even after the rearrangements. Sum-
ming everything up, after the processing of group G, for every p ∈ P, group(p) < G,
group(pc ) < G, and, because of proposition (iii) applied to all appended contexts of p,
group(j) ≥ G for all p < j < pc . Recall that Phase 1 processes groups in descending
group order. Since Ge is the immediate predecessor of G, Ge < G, and thus, proposition
(iii) holds for all p ∈ P after the group G has been processed.
So far, we’ve seen that all unprocessed suffixes Si with ic ∈ G from (4.1) fulfill
proposition (iii). Next, we’ll have a look at unprocessed suffixes Si with group(i) < G
and ic 6∈ G. The contexts of those suffixes are not extended during the iteration. Thus,
after the processing of group G, ic is not changed. Since ic 6∈ G, group(ic ) < G must
hold. Because Ge is the immediate predecessor of G, group(ic ) ≤ Ge and group(i) ≤ Ge
holds. Also, by using proposition (iii) at the time before G was processed, group(j) > Ge
holds for all i < j < ic . Thus, proposition (iii) is satisfied for all of those suffixes, too.
Next, let’s care about already processed suffixes. Because the next processed group Ge
is a predecessor of G, and the algorithm does not change contexts of already processed
suffixes, proposition (iv) is still correct after processing group G. The remaining part
is to show that the current group G fulfills proposition (iv) after it has been processed.
Therefore, let i ∈ G be any currently processed suffix. Since group(i) = G > G, e propo-
sition (iv) has to be shown. When i is processed, by proposition (iii), group(ic ) ≤ G =
group(i) holds. Also, for all j with i < j < ic , group(j) > G = group(i) is satisfied, so
proposition (iv) is fulfilled for i.
Summing up the current results, propositions (iii) and (iv) are correct after processing
of the group G. The next task is to show that suffixes with the same context belong to
the same group, as well as to show that groups are ordered by their group prefixes1 , as
propositions (i) and (ii) state.
More detailed, it has to be shown that the suffixes in a group created in line 12 share
the same context, as well as to ensure the correct placement of the new groups. By our
previous arguments we already know that the algorithm performs context extensions
only for suffixes contained in the set P. As a consequence, the contexts of all other
suffixes remain unchanged, so it’s sufficient to show that the new groups are correctly
set up.
Therefore, let u be the group prefix of group G, p ∈ P be an index and pc be the end
of the context of p before the extension. After the prev pointer computation in lines 4
to 7, the algorithm splits P into subsets P1 , . . . , Pk , line 8. After the split, the following
holds:
p ∈ Pl ⇔ p has new context S[p..pc )ul (4.2)
The reason for this is the following: since l equals the number of prev pointers pointing
1 See Definition 3.0.2 on page 8. In this context, group prefixes are assumed to be the implicit contexts
of the suffixes in the group.

26
4.1 Correctness

from G to p, the context of p is extended by exactly l contexts of G, thus the new context
of p is S[p..pc )ul .
Next, the algorithm processes the sets P1 , . . . , Pk in descending order, splits suffixes
of each subset into smaller subsets such that suffixes belonging to the same group are
gathered together in the same subset, and creates new groups, lines 9 to 14. This has
the consequence that new groups consist of suffixes with the same extended context, so
proposition (i) holds after the iteration.
The last remaining part is to show that the order of the new groups is correct,
proposition (ii). Let p ∈ Pl be an index, pc be the end of the context of p before
the context extensions, and u be the group prefix of G. After the context extensions,
S[p..pc ) <lex S[p..pc )ul , so the new group for p must be placed higher than its old group,
as line 12 of the algorithm performs.
Now, let i be the index of a suffix of the immediate successor of p’s old group. If
i was placed in the same group as p before the iteration, i’s context must have been
extended within this iteration. Then, since the subsets P1 , . . . , Pk are processed in
decreasing order, we know that S[i..ic ) = S[p..pc )ul̃ for some ˜l > l, so the context
of p is lexicographically smaller than that of i, and the group placement is correct.
If i was not placed in the same group as p before the iteration, we know that the
group of i is higher ordered than that of p. Using proposition (ii) before the current
iteration, S[p..pc ) <lex S[i..ic ) holds. Then, if S[p..pc ) is no proper prefix of S[i..ic ),
S[p..pc )ul <lex S[i..ic ) holds, thus the new group of p must be placed lower than the
group of i.
If S[p..pc ) is a proper prefix of S[i..ic ), consider the case when S[p..pc )ul is no proper
prefix of S[i..ic ); otherwise S[p..pc )ul <lex S[i..ic ) holds already. Now, let ˜l be a number
such that S[p..pc )ul̃ is a proper prefix of S[i..ic ), but S[p..pc )ul̃+1 is no proper prefix of
S[i..ic ). Also, define j := i + pc − p + ˜l|u|. Using propositions (iii) and (iv), we know that
G < group(j) holds. Thus, u <lex S[j..jc ) must hold because of proposition (ii). Also,
since u 6= S[j..j + |u|) holds by precondition, u cannot be a proper prefix of S[j..jc ). As
we’ve seen in statement (4.2), contexts are extended by full contexts of other groups,
so jc < ic must hold. Thus, there must exist a k with k < ic − i and k < pc − p + l|u|
such that S[p..p + k) = S[i..i + k) and S[p..p + k] < S[i..i + k]. This clearly shows that
S[p..pc )ul <lex S[i..ic ), so the new group of p must be placed before the group of i.
Summing everything up, the group placements of new groups in line 12 of the algo-
rithm ensure a correct new group order, so proposition (ii) holds after the iteration.
Now, after having proved the correctness of the avalanche effect, we can make sure
that Phase 1 of Algorithm 2 delivers an identical group division as that of the basic
sorting principle.
Theorem 4.1.2. Let S be a nullterminated string of length n. When applying Algorithm
2 to S, the following propositions hold after Phase 1 is completed:
• group(i) = group(j) ⇔ S[i..bi) =lex S[j..b
j) ∀i, j ∈ [1 . . . n]
• group(i) < group(j) ⇔ S[i..bi) <lex S[j..b
j) ∀i, j ∈ [1 . . . n]
Proof. Consider the point in time when the last group ($) of S is processed. Since this
group is the lowest one, it will not change any further groups when being processed,
and since $ occurs only once at the end of S, the group order is correct after the start
of the algorithm.

27
Chapter 4 The Algorithm

We show that ic = bi for all i ∈ [1 . . . n] before the last group was processed, by which
in combination with Lemma 4.1.1 (propositions (i) and (ii)) the theorem automatically
follows. For i = n the theorem already is correct, as shown above.
First, assume bi < ic for any i ∈ [1 . . . n). By using proposition (iv) of Lemma 4.1.1,
it follows that group(ic ) ≤ group(i) < group(bi). Then, using proposition (ii) of Lemma
4.1.1, S[i..ic ) <lex S[bi..bic ) must hold. Because group(bi) > group(ic ), bic ≤ ic must hold;
otherwise, proposition (iv) of Lemma 4.1.1 would be harmed for bi. This means that
S[i..ic ) cannot be a proper prefix of S[bi..bic ). Since S[i..ic ) <lex S[bi..bic ), a k < bic − bi
with S[i..i + k) = S[bi..bi + k) and S[i + k] < S[bi + k] must exist. This further implies
Si <lex Sbi what leads to contradiction against Definition 3.0.1 of next lexicographically
smaller suffixes.
So far, we know that ic ≤ bi holds for all i ∈ [1 . . . n). The next claim to be shown is
that ic = bi for all i ∈ [1 . . . n). The proof is done by induction on the distance bi − i of a
suffix Si .

Induction Base: Let i ∈ [1 . . . n) be an index with bi − i = 1.

Since i < ic ≤ bi holds, clearly, ic = bi.

Induction Step: Let i ∈ [1 . . . n) be an index with bi − i > 1.

Assume that ic 6= bi, so ic < bi. Using Definition 3.0.1 of next lexicographically smaller
suffixes, bi <lex i <lex ic holds. Using the same definition, ic >lex bi implies ibc ≤ bi.
Now we know that i < ic < ibc ≤ bi. Since the distance between ic and ibc is smaller
than that of i and bi, we can apply the induction hypothesis, and obtain icc = ibc .
By our assumptions, i <lex ic holds, so S[i..ic ) ≤lex S[ic ..icc ) follows. First, consider
S[i..ic ) <lex S[ic ..icc ): Using proposition (ii) of Lemma 4.1.1, we obtain group(i) <
group(ic ). But, as (iv) of Lemma 4.1.1 states, group(i) ≥ group(ic ) should hold, con-
tradiction.
Next, consider S[i..ic ) =lex S[ic ..icc ). Using icc = ibc and the definition of next lexico-
graphically smaller suffixes, Si = S[i..ic )Sic >lex S[i..ic )Sibc = Sic holds. Summarizing,
Si <lex Sic and Si >lex Sic should hold, contradiction.
So clearly, our assumption was wrong, and ic = bi must hold.
As expected, Phase 1 of Algorithm 2 sorts suffixes in groups just as desired. The
remaining issue now is to prove that the whole algorithm works correctly, what will be
addressed next.
Theorem 4.1.3. Let S be a nullterminated string of length n. Then, Algorithm 2
applied to the string S computes the suffix array SA of S.
Proof. Per induction on the iterations of the outer loop in Phase 2 (lines 17 to 28). The
induction hypothesis states that before the i-th iteration
(i) SA[i] = j for the lexicographically i-th smallest suffix Sj .
(ii) For any j ∈ [1 . . . n), Sbj <lex SSA[i] ⇔ Sj is correctly placed in the suffix array.2

(iii) The group order is consistent:

group(j) < group(k) ⇒ Sj <lex Sk ∀j, k ∈ [1 . . . n].
2 SA[k] = j ⇔ Sj is the lexicographically k-th smallest suffix.

28
4.1 Correctness

(iv) Correctly placed suffixes belong to their own group:

If SA[k] = j for any j, k ∈ [1 . . . n], then | group(j)| = 1.

Induction Base: Before the first iteration, line 16 sets SA[1] = n. Because of the
definition of the sentinel character $ this is correct, so (i) is fulfilled. Since Sn is the
lexicographically smallest suffix, there exists no suffix Sj with Sbj <lex Sn . Initially, the
suffix array is filled with nil’s except for the first position, thus (ii) is correct, too. Also,
because of the suffix grouping after Phase 1 (Theorem 4.1.2) and Theorem 3.1.1 of page
10, (iii) and (iv) hold.

Induction Step: Consider Algorithm 2 in the i-th iteration. We will first show (ii),
(iii) and (iv), and then use this result to show (i).
So first, we need to show that after the i-th iteration all suffixes Sj with Sbj ≤lex SSA[i]
are correctly placed in the suffix array. By induction hypothesis we know that all suffixes
Sj with Sbj <lex SA[i] are placed correct already, so it’s sufficient to show that all suffixes
Sj with b j = SA[i] are placed correctly during the i-th iteration.
Within the i-th iteration, the algorithm iterates over indices SA[i−1], prev(SA[i − 1]),
prev(prev(SA[i−1])), . . ., until index 0 or an index j with SA[|{ s ∈ [1 . . . n] | group(s) <
group(j) }| + 1] 6= nil is reached, lines 17 to 28. First, let’s discuss the second loop
termination criteria. Let sr := |{ s ∈ [1 . . . n] | group(s) < group(j) }|. Then the
observation is the following:

j is placed into the suffix array already ⇔ SA[sr + 1] 6= nil (4.3)

If j has been placed in the suffix array already, using hypothesis (iii) and (iv), sr is
the number of lexicographically smaller suffixes of Sj . Because of the correct placement
of j, SA[sr + 1] = j must hold, so especially SA[sr + 1] 6= nil holds. Now, consider
SA[sr+1] 6= nil. By hypothesis (iv), | group(SA[sr+1])| = 1 holds. Consequently, |{ s ∈
[1 . . . n] | group(s) < group(SA[sr + 1]) }| = |{ s ∈ [1 . . . n] | group(s) < group(j) }|, so
group(SA[sr + 1]) = group(j). Since SA[sr + 1] belongs to its own group, SA[sr + 1] = j,
so j is placed into the suffix array already.
Using observation (4.3) and Definition 4.1.1 of prev pointer chains from page 24, the
behaviour of the algorithm in the i-th iteration can be described as following: Let k be
the smallest number such that π k (SA[i] − 1) is not contained in the suffix array. Then,
the algorithm iterates the set J := { π l (SA[i] − 1) | l ∈ [0 . . . k) }, places each index
j ∈ J to the suffix array, and creates a new group for each index. Thus, to show that
all suffixes Sj with b j = SA[i] are placed correctly to the suffix array, we need to show
that M := { s ∈ [1 . . . n] | sb = SA[i] } = J , as well as showing that each placement is
correct.
We start with the first part, namely we show that j ∈ J ⇔ j ∈ M. For the forward
direction, the proof is done per induction over the prev pointer chain of SA[i]−1: As base
case, consider j := SA[i] − 1 is not placed into the suffix array. Then, using hypothesis
(i) in the i-th iteration, we know that Sj >lex SSA[i] . Consequently, using the definition
of next lexicographically smaller suffixes, bj = SA[i].
For the induction step, for some l > 0, let j := π l (SA[i] − 1) and k := π l−1 (SA[i] − 1)
be indices such that j is not contained in the suffix array, and b k = SA[i] holds. Since
prev(k) = j, j = min{ s ∈ [1 . . . k) | group(s) < group(k) }, see line 5 of the algorithm,

29
Chapter 4 The Algorithm

as well as the argumentation in Lemma 4.1.1. Now, using the prev pointer definition
and the consistent group order (hypothesis (iii)), group(j) < group(q) implies Sj <lex Sq
for all j < q ≤ k. Also, using the definition of next lexicographically smaller suffixes
for Sk , Sj <lex Sk <lex Sq for all k < q < b k. Consequently, b j ≥b k = SA[i] must hold.
Since j is not contained in the suffix array, using hypothesis (i) in the i-th iteration,
Sj >lex SSA[i] , so clearly, b
j = SA[i].
For the backward direction, let j be an index such that j 6∈ J . If j ≥ SA[i], clearly
j 6= SA[i] holds by the definition of next lexicographically smaller suffixes. Next, consider
b
the case that j is placed between an index k and its prev pointer, i.e. prev(k) < j < k,
and b k = SA[i] holds. Using the definition of prev pointers, group(j) ≥ group(k) holds.
Now, using proposition (iv) of Lemma 4.1.1 after Phase 1, jc ≤ k must hold; if jc > k,
group(j) < group(k) must hold, contradiction. Since the end of the implicit context
equals the next lexicographic smaller suffix (see Theorem 4.1.2), b j = jc ≤ k < SA[i]
holds, so bj 6= SA[i].
For the last missing case, let Sj be a suffix already contained in the suffix array, such that
prev(k) = j for any k ∈ J . We need to show that qb 6= SA[i] for any q ∈ [1 . . . j]. If q = j,
j is contained in the suffix array already. Using hypothesis (ii), Sbj <lex SSA[i] holds, so
j 6= SA[i]. Now, assume for any q with 1 ≤ q < j that qb = SA[i]. Since q < j < SA[i],
b
using the definition of next lexicographically smaller suffixes for q, Sq <lex Sj must
hold. Using the same definition, SSA[i] = Sqb <lex Sq <lex Sj holds. Since j < qb holds
by precondition, this means that b j ≤ SA[i] and Sbj ≥lex SSA[i] hold. Thus, by hypothesis
(ii), the suffix Sj cannot be contained in the suffix array, contradiction.
Both directions show that j ∈ J ⇔ j ∈ M. The missing part is to ensure correct
placement of any j ∈ J into the suffix array. Therefore, consider j to be an index
with b j = SA[i]. Because of the consistent group order (hypothesis (iv)), all suffixes
placed in lower groups are lexicographically than Sj . Now, assume that another suffix
Sk <lex Sj with group(k) = group(j) exists. Since j and k belong to the same group,
using Lemma 4.1.2, S[j..b j) = S[k..bk) must hold. Thus, Sbk <lex Sbj must hold. In this
case, using hypothesis (ii), k must have been placed in the suffix array already. Since k
is placed in the suffix array already, it must belong to its own group, as hypothesis (iv)
states. Consequently, group(k) 6= group(j), so no such suffix can exist, and Sj must be
the lexicographically minimal element of its group. Thus, line 20 computes the number
of lexicographically smaller suffixes of Sj , and line 24 places j at the correct position
into the suffix array. Furthermore, since line 25 places j into its own group, hypothesis
(iv) is correct after the i-th iteration. Finally, since Sj is the lexicographically minimal
suffix of it’s old group, the placement of the new group as immediate predecessor of j’s
old group ensures a consistent group order, so hypothesis (iii) is correct after the i-th
iteration.
So far, we proved hypotheses (ii), (iii) and (iv). Now, it remains to show that after
the i-th iteration the lexicographically i + 1-th smallest suffix Sj is placed in SA, i.e.
SA[i + 1] = j. Since Sj is the lexicographically i + 1-th lowest suffix in S, Sbj ≤lex SSA[i]
must hold. Now, using hypothesis (i) which is proved already, we know that all suffixes
Sk with Sbk = Sbj are correctly placed in the suffix array. As Sj belongs to those suffixes,
SA[i + 1] = j must hold, thus hypothesis (i) is shown.
Since all suffixes are placed correctly to the suffix array, and entries are not changed
(line 21), Algorithm 2 computes the correct and entire suffix array SA of S.

30
4.2 Runtime

To be honest, the correctness proof of Algorithm 2 is quite hard. On the other hand,
the algorithm itself is relatively easy to understand—in my opinion at least. It seems a
bit as if the complexity degree of efficient suffix array construction must been split up
between an algorithm and its proof, forming a balance of complexity between all suffix
array construction algorithms of same asymptotic runtime, but that’s just a marginal
note I was thinking of while proving correctness.
However, we’ve seen the algorithm and its correctness, but one part is missing: asymp-
totic runtime. So, within the next section, the time complexity of the algorithm will be
discussed.

4.2 Runtime
After proving correctness of the algorithm, next issue will be to prove the asymptotic
linear runtime of the algorithm, together with linear space consumption. Since the
algorithm was described in a rather rough way yet, we need to get a bit more technical,
but keeping everything as simple as possible. A more appropriate implementation for
real world usage can be found within the next chapter.
First, recall Phase 1 of Algorithm 2. Main tasks were to build initial groups, iterate
them in descending group order, compute previous smaller suffixes and rearranging
them. In order to explain all necessary steps, Algorithm Excerpt 2 shows the performed
instructions during Phase 1.

Algorithm Excerpt 2 Phase 1 of Algorithm 2, page 23.

Phase 1: sort suffixes into groups
1: order all suffixes of S into groups according to their first character:
Let Si and Sj be two suffixes. Then, group(i) = group(j) ⇔ S[i] = S[j].
2: order the suffix groups: Let G1 be a suffix group with group prefix character u,
G2 be a suffix group with group prefix character v. Then, G1 < G2 if u < v.
3: for each group G in descending group order do
4: for each i ∈ G do
5: prev(i) ← max({ j ∈ [1 . . . i − 1] | group(j) < group(i) } ∪ {0})
6: end for
7: let P be the set of previous suffixes from G,
P := { j ∈ [1 . . . n] | prev(i) = j for any i ∈ G }.
8: split P into k subsets P1 , . . . , Pk such that a subset Pl contains
suffixes whose number of prev pointers from G pointing to them
is equal to l, i.e. i ∈ Pl ⇔ |{ j ∈ G | prev(j) = i }| = l.
9: for l = k down to 1 do
10: split Pl into m subsets Pl1 , . . . , Plm such that suffixes
of same group are gathered in the same subset.
11: for q = 1 up to m do
12: remove suffixes of Plq from their group and put them into a new
group placed as immediate successor of their old group.
13: end for
14: end for
15: end for

31
Chapter 4 The Algorithm

First thing that has to be done is to explain a working set of needed data structures.
Five arrays of size n will be used for this:

• SA contains suffix starting positions, ordered according to the current group order.

• ISA is the inverse permutation of SA, to be able to detect the position of a certain
suffix in SA.

• GSIZE contains the sizes of all groups. Group sizes are ordered according to the
group order, so GSIZE has same order as SA. GSIZE contains the size of each group
only once at the beginning of the group, followed by zeros until the beginning of
the next group.

• GLINK stores pointers from suffixes to their groups. All entries point at the be-
ginning of a group, at the same position where GSIZE contains the size of the
group.

• PREV is used to store prev pointers computed during Phase 1, see line 5 of Al-
gorithm Excerpt 2. All entries initially are set to nil, to detect if a prev pointer
already exists.

An example of the data structure setup can be found in Figure 4.1. The initial setup
of those arrays can be performed in linear time by using a technique called bucket sort
and further iterations using a character count table. Thus, lines 1 and 2 require O(n)
time.

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
S[i] g r a i n d r a i n i n g $

GSIZE[i] 1 2 0 1 2 0 3 0 0 3 0 0 2 0
SA[i] 14 3 8 6 1 13 4 9 11 5 10 12 2 7

GLINK[i] 5 13 2 7 10 4 13 2 7 10 7 10 5 1
ISA[i] 5 13 2 7 10 4 14 3 8 11 9 12 6 1

Figure 4.1: Initial data structure setup after line 2 of Phase 1, applied to the string
S = graindraining$. Prev pointers are not listed since all entries initially
are set to nil, some (but not all) GLINK pointers are displayed for better
illustration.

The first problem that has to be solved is to process the groups in descending group
order, line 3. Assume we got two variables gs and ge, pointing to start and end of a
group. To get to the next lower group, we can set ge ← gs − 1 and gs ← GLINK[SA[gs −
1]]. Doing group iteration in this way, we trivially need O(n) steps to process all groups
in descending group order. Also, the suffixes of the processed group can be accessed by
iterating SA[gs..ge]. Initially, gs can be set to n + 1, to start with the highest group.
Next thing to be handled is the computation of prev pointers, line 5 of Algorithm
Excerpt 2. As already mentioned, we use a technique called pointer jumping for this
purpose: From an index s, we start with its previous index s − 1. If s − 1 belongs to

32
4.2 Runtime

a lower group, we’re done already. If s − 1 belongs to a higher group, its prev pointer
must have been computed already. Since s − 1 is placed in a higher group, all suffixes
between s − 1 and PREV[s − 1] belong to higher groups than that of s, so we can use
PREV[s − 1] to jump above all of those suffixes. After jumping, the procedure will be
repeated, until an index with a lower or equal group is reached, see Algorithm 3.
Let us shortly ignore the special case when Algorithm 3 returns a suffix placed in the
same group as s, we will handle this later on. If every call of function prev-equal
would stop within the first iteration of its inner loop, overall prev pointer computation
would require O(n) work, since it is executed exactly once per suffix. So clearly, the
question is how much additional iterations are performed by pointer jumping.

Algorithm 3 Computation of a prev pointer for the index s where ge indicates the end
of the group of s. Note that the returned index can belong to the same group as s.
1: function prev-equal(s, ge)
2: p←s−1
3: while p > 0 do
4: if ISA[p] ≤ ge then . group(p) ≤ group(s)
5: return p
6: end if
7: p ← PREV[p] . PREV[p] must exist already
8: end while
9: return 0
10: end function

To answer the question, we first will show that prev pointers cannot cross: Let s, s̃ be
two integers in range [1 . . . n] with PREV[s] < PREV[s̃] < s < s̃. Applying the definition
of prev pointers (line 5 of Algorithm Excerpt 2) to s̃, group(PREV[s̃]) < group(s̃) ≤
group(s) must hold. Also, by applying the definition to s, group(PREV[s̃]) ≥ group(s)
holds, contradiction, prev pointers cannot cross.
After the computation of a prev pointer PREV[s] in Algorithm 3, all used pointers
from the pointer jumping technique are overlaid by the new pointer, see Figure 4.2 for
an example. Assume that the pointers used for pointer jumping will be used again, in
a later step. Clearly, all prev pointers for indices between s and PREV[s] already are
computed, because of the descending group order processing. Since a prev pointer is
computed only once per suffix, and Phase 1 processes groups in descending group order,
an index s̃ with PREV[s] < PREV[s̃] < s < s̃ must exist; otherwise, the indices between
PREV[s] and s cannot be accessed. But prev pointers cannot cross, so no pointer used
by the pointer jumping technique is used more than once. Since at most n prev pointers
are computed, at most n pointers can be used by pointer jumping, so the overall prev
pointer computation requires O(n) work.
Now, let’s come back to the special case when Algorithm 3 returns an index that
belongs to the same group as its input index, i.e. group(s) = group(prev-equal(s, ge))
for some s ∈ [1 . . . n]. To solve this case, we need to build some additional instructions:
Let s be an index that belongs to the currently processed group. If the prev pointer
of s is computed already, we proceed with the next index of the currently processed
group. Otherwise, we compute p ← prev-equal(s, ge). If p belongs to a lower ordered
group, we set PREV[s] ← p; if p belongs to the same group, s is added to a list L, and

33
Chapter 4 The Algorithm

GLINK[i] ··· 3 7 10 7 10 5 ···

PREV[i]
(1) Computation of a prev pointer using pointer jumping.

GLINK[i] ··· 3 7 10 7 10 5 ···

PREV[i]

(2) All pointers used by pointer jumping are overlaid by the

new prev pointer.

Figure 4.2: Prev pointer computation using pointer jumping.

the procedure is repeated for s ← p. At some point, the prev pointer for s is correctly
computed. All remaining indices l ∈ L however still require a prev pointer. Since
the prev pointer of s is correct already, caused by function prev-equal, group(j) ≥
group(l) for all s ≤ j < l holds for all l ∈ L. The indices of L belong to the same group
as s, so we set PREV[l] ← PREV[s] for all l ∈ L, resulting in correct prev pointers for
all indices of L. Thus, Algorithm 4 shows how prev pointers of the currently processed
group can be computed.

Algorithm 4 Prev pointer computation of the processed group, lines 4 to 6 of Algorithm

Excerpt 2. Indices of the current group are contained in SA[gs..ge].
1: for i ← gs up to ge do
2: s ← SA[i]
3: initialize an empty list L
4: while PREV[s] = nil do
5: p ←prev-equal(s, ge)
6: if p = 0 or ISA[p] < gs then . p points to a suffix in a lower group
7: PREV[s] ← p
8: else . p points to a suffix in same group
9: add s to list L
10: s←p
11: end if
12: end while
13: for each l ∈ L do . copy prev pointer from s to all stored indices
14: PREV[l] ← PREV[s]
15: end for
16: end for

This extra computation requires at most O(|G|) extra time (where G is the processed
group), and at most O(n) space for the list. Therefore, the prev pointer computation
still is doable in O(n) time, since each group is processed exactly once.
After handling prev pointer computation, next task is to compute the set P of previous
suffixes, and split it into subsets P1 , . . . , Pk , such that each subset Pl consists of suffixes
Si with exactly l indices of G pointing onto i, i.e. i ∈ Pl ⇔ |{ s ∈ G | prev(s) = i}| = l,
see lines 7 to 8 of Algorithm Excerpt 2. An extra array PC of size n is used for this

34
4.2 Runtime

purpose. Initially, all entries of PC are set to zero. Then, after computing prev pointers
in Algorithm 4, for all s ∈ G, PC[PREV[s]] is incremented. During this loop, the set
P can be computed by checking if PC[PREV[s]] = 0 holds: if true, PREV[s] is visited
the first time and can be appended to a list; otherwise, PREV[s] is contained in the list
already. Also, after the loop, for each p ∈ P, PC[p] contains the count of prev pointers
of G pointing onto p, so for the second split, p belongs to the set PPC[p] .
The second split from P into subsets P1 , . . . , Pk can be performed like the following:
While P is not empty, for each p ∈ P, decrement PC[p]. If PC[p] = 0 within the l-th
outer iteration, remove p from P, and add it to the set Pl . This way, the sets P1 , . . . , Pk
are computed in increasing order, and all entries of PC are set to zero, so it can be
reused for the next split operation. The subsets Pl Pl are stored in the SA-interval of
the processed group: define pls(l) := ge + 1 − i=0 |Pi |. Then the set P1 is stored to
SA[pls(1).. pls(0) − 1], the set P2 is stored to SA[pls(2).. pls(1) − 1], and so on. This way,
empty subsets are ignored, and the sets are stored in decreasing order. Additionally, to
know the size of each set, we set GSIZE[pls(l)] ← |Pl | for each non-empty set. Because
the indices of G aren’t used for further instructions, and |P| ≤ |G| holds, no side effects
occur during the storage. Algorithm 5 shows the code for splitting previous suffixes.

Algorithm 5 Computation of set P and subsets P1 , . . . , Pk , lines 7 to 8 of Algorithm

Excerpt 2. The sets P1 , . . . , Pk are stored into the suffix array interval of the group,
from left to right in decreasing order.
1: initialize an empty list P
2: for i ← gs up to ge do . compute set P and count prev pointers
3: p ← PREV[SA[i]]
4: if p > 0 then
5: if PC[p] = 0 then . p was not counted yet
6: add p to P
7: end if
8: PC[p] ← PC[p] + 1 . count pointers from current group to p
9: end if
10: end for
11: pls ← ge + 1 . start of set Pl in the suffix array
12: ple ← ge . end of set Pl in the suffix array
13: while P is not empty do . within the l-th iteration, the set Pl is computed
14: for each p ∈ P do
15: PC[p] ← PC[p] − 1
16: if PC[p] = 0 then
17: remove p from P
18: pls ← pls − 1
19: SA[pls] ← p . store p to set Pl
20: end if
21: end for
22: if pls ≤ ple then . set Pl contains elements
23: GSIZE[pls] ← ple − pls + 1 . store size of Pl
24: ple ← pls − 1 . prepare ple for next set
25: end if
26: end while

35
Chapter 4 The Algorithm

The first loop of Algorithm 5 requires O(|G|) time, since all indices of G are iterated.
Also, the second loop requires O(|G|) time: its inner loop overall requires as much
iterations as prev pointers from G exist, so consequently, at most O(|G|) iterations are
performed.
So, let’s move on to the final step in Phase 1, the rearrangements from lines 9 to 14 in
Algorithm Excerpt 2. By the previous splits, we’re able to iterate subsets Pl in descend-
ing order. It actually turns out that the additional split of a subset Pl from line 10 is not
required to rearrange suffixes: First, for all p ∈ Pl , decrement GSIZE[GLINK[p]], and ex-
change SA[ISA[p]] with the rightmost suffix of its group, SA[GLINK[p]+GSIZE[GLINK[p]]].
Also, update the new ISA values, so ISA stays the correct inverse permutation of SA.
This way, suffixes in Pl are removed from their groups, because they’ll be placed at the
back of their groups, and the sizes of the groups don’t cover them any more. Remaining
task is to set up GLINK and GSIZE, so the new groups are correctly captured. First, for
all p ∈ Pl , set GLINK[p] ← GLINK[p] + GSIZE[GLINK[p]]. Because of the decrements of
GSIZE in the previous loop, GLINK now correctly points to the group beginnings of the
new groups. Last step is to increment GSIZE[GLINK[p]] for all p ∈ Pl , so the sizes of the
new groups are set up correctly. Algorithm 6 shows the full code for rearrangements,
an example can be found in Figure 4.3.

Algorithm 6 Suffix rearrangements of Phase 1, lines 9 to 14 of Algorithm Excerpt 2.

Variables pls and ple are the same as in Algorithm 5, and initially contain the same
values as contained after the termination of Algorithm 5.
1: while pls ≤ ge do . iterate subsets in descending order
2: ple ← pls + GSIZE[pls] . SA[pls..ple] contains all suffixes of set Pl
3: for i ← pls up to ple do . decrement group sizes of previous suffixes
4: p ← SA[i]
5: sr ← GLINK[p]
6: GSIZE[sr] ← GSIZE[sr] − 1
7: sr ← sr + GSIZE[sr]
8: s ← SA[sr] . move p to back of its group by exchange with last suffix
9: pr ← ISA[p]
10: SA[pr] ← s ISA[s] ← pr
11: SA[sr] ← p ISA[p] ← sr
12: end for
13: for i ← pls up to ple do . set new GLINK for rearranged suffixes
14: p ← SA[i]
15: pgs ← GLINK[p]
16: pgs ← pgs + GSIZE[pgs]
17: GLINK[p] ← pgs
18: end for
19: for i ← pls up to ple do . set GSIZE for new groups
20: p ← SA[i]
21: pgs ← GLINK[p]
22: GSIZE[pgs] ← GSIZE[pgs] + 1
23: end for
24: pls ← ple + 1
25: end while

36
4.2 Runtime

i 1 2 3 4 5 6 7 8
−1 −1
GSIZE[i] 1 2 0 1 2 0 3 0 ···
SA[i] 14 3 8 6 1 13 4 9 ···

GLINK[i] 5 13 2 7 10 4 13 2 ···
(1) for each p ∈ Pi , move p to the back of its group and
decrement group size.

i 1 2 3 4 5 6 7 8

GSIZE[i] 1 2 0 0 1 0 3 0 ···
+ +
GLINK[i] 5 13 2 7 10 4 13 2 ···
(2) for each p ∈ Pi , add gsize of old group to GLINK[p].

i 1 2 3 4 5 6 7 8
+1 +1
GSIZE[i] 1 2 0 0 1 0 3 0 ···

GLINK[i] 6 13 2 7 10 4 13 2 ···
(3) for each p ∈ Pi , increment group size of new group.

i 1 2 3 4 5 6 7 8

GSIZE[i] 1 2 0 1 1 1 3 0 ···
SA[i] 14 3 8 6 13 1 4 9 ···

GLINK[i] 6 13 2 7 10 4 13 2 ···
(4) Result after the rearrangements. All suffixes of set Pl
were rearranged in new groups, placed as immediate
successors of their old groups.

Figure 4.3: Suffix rearrangements for the set Pl = {1, 6}. Items of Pl are marked bold.

Since each subset Pl is iterated three times, Algorithm 6 requires O(|P|) = O(|G|)
asymptotic time.
Now, we’re almost done with Phase 1. The last bit I want to mention here is a
preparation step for Phase 2. Recall that gs and ge were the indices of the current
group’s start and end in SA. After the rearrangements took place, SA[ge] will be set to
gs. This last index of the group later will perform as a group counter. Also, to ensure
each suffix has access to the group counter, we set ISA[s] ← ge for all indices s of the
processed group. This step has to be performed before the computation of the set P in
Algorithm 5, since Algorithm 5 overwrites the set indices with prev pointers. Note that
this preparations don’t cause side effects: Each group is processed only once, so SA can
be overwritten without problems. Also, after the preparation, gs ≤ ISA[s] ≤ ge holds

37
Chapter 4 The Algorithm

for all indices s of the processed group, so we can still determine if a suffix belongs to
an already processed group.
Before Phase 2 will be discussed, let’s first recall the performed steps of Phase 2, see
Algorithm Excerpt 3. The only instructions not directly implementable are those from
lines 20 to 25.

Algorithm Excerpt 3 Phase 2 of Algorithm 2, page 24.

Phase 2: construct suffix array from groups
16: SA[1] ← n
17: for i = 1 up to n do
18: j ← SA[i] − 1
19: while j 6= 0 do
20: let sr be the number of suffixes placed in lower groups,
i.e. sr := |{ s ∈ [1 . . . n] | group(s) < group(j) }|.
21: if SA[sr + 1] 6= nil then
22: break
23: end if
24: SA[sr + 1] ← j
25: remove j from its current group and put it in a new group
placed as immediate predecessor of j’s old group.
26: j ← prev(j)
27: end while
28: end for

Let’s start with the first point, checking if a suffix is contained in SA already, line
21. Since SA was modified during Phase 1, we cannot check if any SA - entry is nil.
To solve this problem, we will use ISA: whenever an index s is placed into SA, we set
ISA[s] ← 0. Then, to check if a suffix is placed into the suffix array already, we only
need to compare its ISA-value with zero.
Next, we’ll have a look at suffix rank computation and suffix placement in lines 20
and 25 of Algorithm Excerpt 3. Recall that each group was prepared in Phase 1, so for
each suffix Sj , ISA[j] points to the end of its group in SA. Also, for every group, the end
of a group contains a counter initially set to the start of the group in SA. To compute
sr in line 20, we follow this counter, i.e. sr ← SA[ISA[j]]. Then, to move a suffix to the
front of its group, we set SA[sr] ← j. To ’remove’ the suffix Sj from its old group, it’s
sufficient to increment the group counter SA[ISA[j]]. Note that the group counter must
be incremented before the placement of j in SA, since both positions can be equal. A
full example for the placement of a suffix can be found in Figure 4.4, Algorithm 7 shows
the full implementation of Phase 2.
Using this implementation of Phase 2, every described operation is supported in
constant time. The overall time for Phase 2 is O(n) plus the number of iterations of the
inner loop. By the correctness of the algorithm (Theorem 4.1.3 from page 28) we know
that within the i-th iteration of the outer loop, the inner loop will process all suffixes j
with bj = SA[i]. Since each suffix has exactly one next lexicographically smaller suffix,
the inner loop iterates n − 1 suffixes, thus Phase 2 requires O(n) time.

38
4.2 Runtime

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14
+1
SA[i] 14 3 8 4 13 6 − − 7 − − 10 − 13

ISA[i] 6 14 0 9 12 4 14 0 9 12 9 12 0 0
(1) let sr ← SA[ISA[j]], then increment SA[ISA[j]].

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14

SA[i] 14 3 8 4 13 6 − − 7 − − 10 2 14

ISA[i] 6 0 0 9 12 4 14 0 9 12 9 12 0 0
(2) set SA[sr] to j, and ISA[j] to zero.

Figure 4.4: Suffix placement for j = 2. Suffix items are marked bold, empty positions
in SA are placed over arrows indicating the start of their groups.

Algorithm 7 Implementation of Phase 2, see Algorithm Excerpt 3.

1: SA[1] ← n
2: for i = 1 up to n do
3: j ← SA[i] − 1
4: while j 6= 0 do
5: ge ← ISA[j]
6: if ge = 0 then
7: break
8: end if
9: sr ← SA[ge]
10: SA[ge] ← SA[ge] + 1
11: SA[sr] ← j
12: ISA[j] ← 0
13: j ← PREV[j]
14: end while
15: end for

Now finally, we’re able to show that the algorithm works in asymptotic optimal run-
time and linear space.

Theorem 4.2.1. Let S be a nullterminated string of length n. Then, Algorithm 2

applied to the string S computes the suffix array SA of S in O(n) time, with asymptotic
linear word space.

Proof. The initial group setup can be performed in O(n). Also, computation of prev
pointers requires O(n) time. All remaining operations in the outer loop of Phase 1 re-
quire O(|G|) time, where G is the current processed group. Since each group is processed
only once, overall time complexity for Phase 1 is O(n). Phase 2 also can be performed

39
Chapter 4 The Algorithm

in O(n) time, thus the overall time complexity of Algorithm 2 is O(n). Each additional
used array or list has length of at most n, so overall space complexity is O(n) words.

We’ve seen that the algorithm can be implemented with optimal asymptotic runtime
and linear space. This is quite a nice result, but for real world applications, the algorithm
needs to be tuned to consume as less space as possible and run as fast as possible. The
next chapter will show some of such optimisations, resulting in a competitive linear–time
suffix array construction algorithm.

40
Chapter 5
Implementation
After having presented the algorithm, this chapter’s purpose is to describe an imple-
mentation suitable for real world usage. Within Section 4.2, a possible implementation
was described already, but it has several downsides: A lot of additional data structures
are required, and some algorithmic solutions seem a bit awkward. To fix these issues,
the implementation of Section 4.2 will be modified.
First, the required data structure framework is described. It consists of four arrays
of size n, exactly in the same manner as described in Section 4.2, page 32:

• SA contains suffix starting positions, ordered according to the current group order.

• ISA is the inverse permutation of SA, to be able to detect the position of a certain
suffix in SA.

• GLINK stores pointers from suffixes to their groups. All entries point at the be-
ginning of a group, at the same position where GSIZE contains the size of the
group.

In contrast to the implementation of Section 4.2, this will be all data structures we
need.
Before we proceed on, let’s shortly recapitulate the steps performed within the first
phase of the algorithm. Pseudocode can be found in Algorithm Excerpt 2 on page 31,
it won’t be listed here again. Initially, suffixes were placed in groups according to their
first character, groups themselves are sorted according to the rank of their representative
character. This initial step can be done using bucket sort. Then, a loop processes groups
in descending order. For each processed group G, prev pointers are computed, the set
P of previous suffixes is computed, and suffixes of P are split into subsets P1 , . . . , Pk
according to the number of prev pointers of G pointing to them.1 The last step is to
process the subsets P1 , . . . , Pk in descending order, and rearranging suffixes of a subset
Pl by placing all indices p ∈ Pl that belong to the same group into a new group, as
immediate successor of the old group.
Processing groups in descending order works in the same way as in Section 4.2: Let
gs be the start of a group. To get to the next group, we set ge to gs − 1 and then set
1A subset Pl consists of indices p ∈ P, such that exactly l prev pointers from G are pointing to p.

41
Chapter 5 Implementation

gs ← GLINK[ge]. Now, the suffixes of the new group are contained in SA[gs..ge], and
can be processed.
Next, let’s move on to prev pointer computation. The first thing to be mentioned is
the storage of prev pointers: no separate array is declared for this purpose. To solve
this issue, prev pointers will be stored in the GLINK array: Since the GLINK array is
used only for suffixes of groups that have not been processed yet, and rearrangements
of suffixes only take place in unprocessed groups, no side effects occur. The only thing
we need to be aware of is to use ISA to check if a group has been processed already: if
ISA[s] > ge, suffix s belongs to a group that has been processed already.
Before proceeding, another optimisation shall be discussed. Consider the case when
two prev pointers of suffixes from same group point to the same position, i.e. two indices
i, j ∈ G with i < j exist such that prev(i) = prev(j), see Figure 5.1. In this case, the
left prev pointer prev(i) can be removed: since prev pointers do not cross, prev(i) is
overlaid by the right prev pointer prev(j), thus it won’t be used in the pointer jumping
technique used for prev pointer computation. Additionally, in the second phase, the
right prev pointer prev(j) will be visited first2 , so the suffix Sprev(j) will already be
contained in the suffix array when prev(i) is used. Since the algorithm stops in such
cases, it doesn’t matter whether the left prev pointer exists or not.

S[i] ··· a i n i n ··· S[i] ··· a i n i n ···

PREV[i] PREV[i]

(1) Two prev pointers from same group point (2) The left prev pointer is overlaid by the
onto the same suffix. right one, and can be removed.

Figure 5.1: Prev pointer overlay by two pointers of same group.

Using this optimisation, prev pointer computation and the split of suffixes can be
handled easier: First, for each suffix in the current group, compute prev-equal (see
page 33). If the previous suffix belongs to the same group, mark it. Then, filter out
all marked suffixes, and repeat the following procedure for all remaining suffix starting
positions s: let p be the prev pointer of s, p = PREV[s]. if p belongs to the same group
as s, update the prev pointer of s by using the prev pointer of its previous suffix Sp
(PREV[s] ← PREV[p]), and set PREV[p] to zero. Otherwise, add p to a list, and remove s
from the remaining suffix starting positions. After the iteration of all remaining suffixes,
store the computed list to a list of lists. By repeating this behaviour as long as suffixes
remain, all prev pointers get computed correctly. Also, the generated lists correspond
to the subsets P1 , . . . , Pk . In essence, generating lists is not necessary, since the suffixes
can be placed into the index storage of the currently processed group, namely SA[gs..ge]:
Each subset Pl is stored to an interval SA[ps..pe] with gs ≤ ps ≤ pe ≤ ge, in the same
manner as described in the previous chapter (Algorithm 5 on page 35). Algorithm 8
shows this prev pointer computation combined with the split of previous suffixes.

2 Contexts of both suffixes Si and Sj are the same. Since both prev pointers point to the same position,
the context of the left suffix Si ends at the position of the right suffix Sj , i.e. ic = j. Since group G
is processed at the moment, ic won’t be changed during the algorithm, so bi = ic holds by Theorem
4.1.2. This clearly shows that bi = j, so the second phase will process Sj before Si .

42
Algorithm 8 Prev pointer computation and split in the first phase of the algorithm,
variables gs and ge contain start and end bounds of the current group. Every occurrence
of PREV must be replaced with GLINK, but replacement is omitted for better readability.
1: GSIZE[gs] ← 0 . make place for markings
2: for i ← ge down to gs do . compute prev pointers
3: p ← prev-equal(SA[i], ge) . Algorithm 3 on page 33
4: if p > 0 and ISA[p] ≥ gs then . p belongs to current group
5: GSIZE[ISA[p]] ← 1 . mark p
6: end if
7: PREV[SA[i]] ← p . store pointer
8: end for

9: pe ← gs
10: for i ← gs up to ge do . move unmarked suffixes to front
11: if GSIZE[i] 6= 1 then
12: SA[pe] ← SA[i]
13: pe ← pe + 1
14: end if
15: ISA[SA[i]] ← ge . preparation for the second phase
16: end for

17: ps ← gs
18: pe ← pe + 1
19: l←0
20: repeat . compute final pointers and split suffixes
21: i ← pe − 1
22: tmp ← pe
23: while i ≥ ps do
24: p ← PREV[SA[i]]
25: if p > 0 then
26: if ISA[p] < gs then . p is in an other group
27: pe ← pe − 1
28: SA[i] ← SA[pe]
29: SA[pe] ← p . push pointer to back
30: else . p is in current group
31: PREV[SA[i]] ← PREV[p] . copy pointer
32: PREV[p] ← 0 . clear pointer of p, won’t be used any more
33: end if
34: i←i−1
35: else . p points to nothing, remove it
36: SA[i] ← SA[ps]
37: ps ← ps + 1
38: end if
39: end while
40: if pe < tmp then . at least one prev pointer was pushed to back
41: GSIZE[pe] ← tmp − pe . store number of pointers
42: l ←l+1 . and update number of subsets
43: end if
44: until ps = pe

43
Chapter 5 Implementation

The rearrangements can be performed in exactly the same way as described in Section
4.2. Algorithm 9 shows a modified version which works with the modified split lists of
Algorithm 8.

Algorithm 9 Suffix rearrangements in the first phase of the algorithm. Variables gs

and ge contain start and end bounds of the current group, ps and pe contain the start
position of the first suffix set, directly after the execution of Algorithm 8.
1: for k = 1 up to l do . iterate subsets P1 , . . . , Pk in descending order
2: ps ← pe
3: pe ← pe + GSIZE[pe] . get bounds for set

4: for i ← pe − 1 down to ps do . decrement group counts of previous suffixes

5: p ← SA[i]
6: sr ← GLINK[p]
7: GSIZE[sr] ← GSIZE[sr] − 1
8: sr ← sr + GSIZE[sr]
9: s ← SA[sr] . move p to back of its group by exchange with last suffix
10: pr ← ISA[p]
11: SA[pr] ← s ISA[s] ← pr
12: SA[sr] ← p ISA[p] ← sr
13: end for

14: for i ← pe − 1 down to ps do . set new GLINK for rearranged suffixes

15: p ← SA[i]
16: pgs ← GLINK[p]
17: pgs ← pgs + GSIZE[pgs]
18: GLINK[p] ← pgs
19: end for

20: for i ← pe − 1 down to ps do . set GSIZE for new groups

21: p ← SA[i]
22: pgs ← GLINK[p]
23: GSIZE[pgs] ← GSIZE[pgs] + 1
24: end for
25: end for
26: SA[ge] ← gs . Preparation for Phase 2

The last step of the first phase is to prepare the current group for processing in the
second phase, by setting ISA[s] ← ge for all suffixes s of the group, and SA[ge] ← gs.
Both steps are performed within the Algorithms 8 and 9. The second phase can be
implemented in exactly the same way as described in Section 4.2, Algorithm 6. The
only thing to be considered is that prev pointers are contained in the GLINK array, not
in a separate array.
Using these tricks, the algorithm can be implemented using 17n bytes of space: n
bytes for the text itself, 4n bytes for SA, and additional 12n bytes for ISA, GLINK and
GSIZE, assuming that an integer requires 4 bytes. An implementation in C requires
about 200 lines of code for the full algorithm, and can be found on github [2].

44
An additional space optimisation can be applied to the GSIZE array: GSIZE stores
group sizes (and markings) whose distance is greater than or equal to their value. More
detailed, let GSIZE[i] = k and GSIZE[j] = l with i < j be any two numbers in GSIZE.
Then the algorithm always ensures that i+k ≤ j holds. This permits the use of variable-
length number storage in GSIZE, e.g. by using Elias Gamma Coding [5], and reduces
the space requirements of GSIZE from 4n bytes to 2n bits.
Summing everything up, the algorithm can be implemented using 13.25n bytes of
space. The implementation available on github [2] does not contain the variable-length
number storage (basically because additional operations for fetching variable-length
numbers would slow down the algorithm), thus it requires 17n bytes of space.
Beside space consumption, performance is a big point of interest on any algorithm.
We already know that the proposed algorithm runs in asymptotic linear time, but
theoretical runtime and practical performance can vary considerably. To fill this gap,
the issue of the next chapter will be to measure practical performance, thus we will see
whether the algorithm is suitable for real world usage.

45
Chapter 6
Performance Analyses
After the discussion about the algorithms functional principle, it is time to check if it
is suitable for real world usage by comparing its performance with other suffix array
construction algorithms.
A list of competing SACAs can be found in Table 6.1. It includes most common
linear-time SACAs, such as the SA-IS by Nong et al., the algorithm by Ko and Aluru
and the DC3 a.k.a. Skew Algorithm by Kärkkäinen and Sanders. Also, a very fast
SACA named divsufsort is included, to compare results with the current state of the
art.
Since the SACA described in this thesis makes heavy use of groups, and also works
in a greedy manner, I called it GSACA—quite an imposing name for a suffix array
construction algorithm, but results will tell whether the promising name satisfies its
expectations.

Algorithm Resource Runtime Extra Working Space

divsufsort [22, Yuta Mori] O(n log n) O(1)
SAIS [23, Yuta Mori] O(n) O(log n) + max 2n
KA [15, Pang Ko] O(n) O(log n) + 4.16n
DC3 [29, Peter Sanders] O(n) O(log n) + max 24n
GSACA1 [2, This Thesis] O(n) O(1) + 12n

Table 6.1: Used algorithms in benchmarks. Extra Working Space contains memory
requirements for function calls (O(1) for non–recursive, O(log n) for recursive
algorithms), as well as space required in addition to the 5n for the text and
the suffix array. All space requirements are measured in bytes, where an
integer is assumed to require 4 bytes of space.

Now, a word to the test data. It is a selection of three text corpuses, namely the
Silesia Corpus [4] with small–sized files (< 40 MB), the Pizza & Chili Corpus [7] with
medium–sized files and the Repetitive Corpus [8] with highly repetitive files. The text
selection contains texts with quite different types: normal texts, source codes, database
data, html/xml documents as well as dna and protein sequences. This text variety
gives a good overview over SACA performance in different contexts, and should lead to
representative results in suffix array construction.
1 8.25nbytes of extra working space are possible using the variable-length storage described in Chapter
5, but additional operations would slow down the algorithm.

47
Chapter 6 Performance Analyses

All experiments were conducted on a 64 bit Ubuntu 14.04.3 LTS (Kernel 3.13) system
equipped with two ten-core Intel Xeon processors E5-2680v2 with 2.8 GHz and 128GB
of RAM. The benchmark itself was compiled using g++ (version 4.8.4) and the -O3
option. Construction speeds2 were measured using C++ built–in timers, while cache
miss rates3 were measured using perf_events4 . Results (average of 10 runs) can be
found in Tables 6.2, 6.3 and 6.4.

File Measurement divsufsort SAIS KA DC3 GSACA

webster speed2 12.1 MB/s 12.4 MB/s 5.6 MB/s 1.9 MB/s 3.4 MB/s

n= 39.5 MB, σ = 98 cache misses3 39.6 % 46.2 % 32.6 % 68.3 % 68.2 %

dickens speed2 16.9 MB/s 17.6 MB/s 8.2 MB/s 3.5 MB/s 5.1 MB/s
3
n= 9.7 MB, σ = 100 cache misses 6.5 % 6.4 % 7.4 % 28.3 % 49.8 %
nci speed2 18.7 MB/s 21.5 MB/s 10.6 MB/s 3.3 MB/s 5.0 MB/s

n= 32.0 MB, σ = 63 cache misses3 33.5 % 45.5 % 32.6 % 59.6 % 65.5 %

Table 6.2: Benchmark results of the Silesia Corpus.

File Measurement divsufsort SAIS KA DC3 GSACA

sources.200MB speed2 11.5 MB/s 10.3 MB/s 4.2 MB/s 1.1 MB/s 3.2 MB/s

n = 200.0 MB, σ = 231 cache misses3 55.4 % 69.7 % 52.6 % 83.2 % 74.0 %
dblp.xml.200MB speed2 10.9 MB/s 10.3 MB/s 4.1 MB/s 1.3 MB/s 3.3 MB/s
3
n = 200.0 MB, σ = 97 cache misses 47.4 % 76.0 % 59.2 % 90.0 % 79.5 %
dna.200MB speed2 8.2 MB/s 6.8 MB/s 3.5 MB/s 1.1 MB/s 2.9 MB/s

n = 200.0 MB, σ = 17 cache misses3 42.9 % 77.0 % 54.8 % 82.8 % 82.0 %

proteins.200MB speed2 7.2 MB/s 5.9 MB/s 2.7 MB/s 0.9 MB/s 2.7 MB/s

n = 200.0 MB, σ = 26 cache misses3 47.4 % 74.4 % 55.0 % 88.2 % 80.3 %

english.200MB speed2 8.1 MB/s 7.2 MB/s 2.9 MB/s 0.9 MB/s 2.7 MB/s
3
n = 200.0 MB, σ = 226 cache misses 54.4 % 77.1 % 54.2 % 86.4 % 79.2 %

Table 6.3: Benchmark results of the Pizza & Chili Corpus.

File Measurement divsufsort SAIS KA DC3 GSACA

world leaders speed2 24.1 MB/s 22.3 MB/s 9.7 MB/s 2.8 MB/s 4.3 MB/s

n= 44.8 MB, σ = 90 cache misses3 28.1 % 37.6 % 31.7 % 59.0 % 68.5 %

para speed2 8.4 MB/s 11.0 MB/s 3.1 MB/s 1.2 MB/s 3.0 MB/s

n = 409.4 MB, σ = 6 cache misses3 44.8 % 83.5 % 62.4 % 88.1 % 83.3 %

influenza speed2 10.5 MB/s 13.6 MB/s 5.4 MB/s 1.7 MB/s 3.7 MB/s
3
n = 147.6 MB, σ = 16 cache misses 39.2 % 78.5 % 54.8 % 83.0 % 78.9 %
coreutils speed2 10.3 MB/s 13.0 MB/s 4.4 MB/s 1.3 MB/s 3.4 MB/s

n = 195.8 MB, σ = 237 cache misses3 55.1 % 73.0 % 51.7 % 82.3 % 72.6 %
Escherichia Coli speed2 9.4 MB/s 11.1 MB/s 4.1 MB/s 1.4 MB/s 3.0 MB/s

n = 107.5 MB, σ = 16 cache misses3 42.5 % 70.5 % 47.8 % 77.8 % 81.1 %

Table 6.4: Benchmark results of the Repetitive Corpus.

2 Construction speed: size of input/time to construct SA, in MB/s.

3 Cache miss rate: number of cache misses/number of cache accesses of last–level cache, in %.
4 http://web.eece.maine.edu/~vweaver/projects/perf_events/, last visited 10/2015.

48
The results clearly show that divsufsort and SAIS are playing in their own league:
Construction speed is about 2 to 3 times faster than that of KA, DC3 and GSACA.
Additionally, as Table 6.1 shows, the working space required by both of these algorithms
is significantly smaller than that of KA, DC3 and GSACA.
Among the latter three algorithms, KA performs best, followed by GSACA. The DC3
algorithm performs worst of all algorithms, but this is owed to a very simple implemen-
tation requiring only about 50 lines of code. There might exist better implementations,
but I wasn’t able to find one running in linear time. Therefore, the bad results of DC3
shouldn’t be overrated.
Let’s have a closer look at the algorithm presented in this thesis, GSACA. To be
honest, it performs quite poorly compared to the fastest algorithms presented here: it
requires a lot of extra working space, and is about 2 to 5 times slower than SAIS or
divsufsort. On the other hand, such results are not really surprising: the algorithm uses
a lot of dependent non–parallelizable memory accesses, affecting quite different memory
locations. As a direct conclusion, GSACA has the highest average cache miss rates of
all algorithms except for the DC3 algorithm. Also note that even for very small–sized
files (dickens from the Silesia Corpus), the cache miss rates produced by GSACA are
orders of magnitude higher than that of other algorithms.
Cache miss rates of course are not the only reason for bad performance—as one can
see, SAIS performs well despite high cache miss rates on bigger files—but high rates
in consumption with a big working–constant lead to long construction times. So truly,
compared to the state of the art, GSACA will not be used in real world applications at
the current stage—the competitors clearly outperform GSACA.

49
Chapter 7
Conclusion
This thesis has presented a new approach for linear-time suffix array construction. A
new suffix sorting principle was introduced, leading to the first non-recursive linear-time
suffix array construction algorithm, GSACA.
Unfortunately, GSACA cannot hold up with current state of the art suffix array
construction algorithms: Construction speed is about 2 to 5 times slower than that of
faster algorithms, and the extra working space consumption is quite large.
As a result, the algorithm presented in this thesis is interesting only from a theoretical
point of view, since it is the first non-recursive linear-time SACA. Nonetheless, the new
sorting principle may be used to design better linear-time algorithms. To give some
ideas at least, GSACA deals a lot with previous smaller and next smaller values, what
hints to a stack-based computation for next lexicographically smaller suffixes. The
group belonging of a suffix may be computed ’on the fly’, during the computation of
next lexicographically smaller suffixes. Finally, another representation of suffix groups
may leads to a cache-friendly implementation of the second phase, resulting in a much
faster algorithm. However, these ideas are suggestions, no specific description of a better
algorithm.
Summarizing, the results of this thesis are quite promising: Compared to the devel-
opmental history of the currently fastest linear-time suffix array construction algorithm
SA-IS, development of GSACA is in its infancy. Thus, there’s a lot of room for improve-
ments, so it’s worth using GSACA as a starting point for future research topics.

51
Bibliography
[1] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing Suffix Trees with En-
hanced Suffix Arrays. Journal of Discrete Algorithms, 2(1):53–86, 2004.

[2] U. Baier. GSACA Algorithm. https://github.com/waYne1337/gsaca. last vis-

ited 10/2015.

[3] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm.

Technical Report 124, Digital Equipment Corporation, 1994.

[4] S. Deorowicz. Silesia Corpus. http://sun.aei.polsl.pl/~sdeor/index.php?

page=silesia. last visited 10/2015.

[5] P. Elias. Universal codeword sets and representations of the integers. IEEE Trans-
actions on Information Theory, 21(2):194–203, 1975.

[6] P. Ferragina and G. Manzini. Opportunistic Data Structures with Applications. In

Proceedings of the 41th Annual Symposium on Foundations of Computer Science,
FOCS ’00, pages 390–398, 2000.

[7] P. Ferragina and G. Navarro. Pizza & Chili Corpus. http://pizzachili.dcc.

uchile.cl/texts.html. last visited 10/2015.

[8] P. Ferragina and G. Navarro. Repetitive Corpus. http://pizzachili.dcc.

uchile.cl/repcorpus.html. last visited 10/2015.

[9] J.-L. Gailly and M. Adler. http://www.gzip.org/. last visited 7/2015.

[10] R. Grossi and G. F. Italiano. Suffix trees and their applications in string algorithms.
In Proceedings of the 1st South American Workshop on String Processing, pages 57–
76, 1993.

[11] W.-K. Hon, K. Sadakane, and W.-K. Sung. Breaking a Time-and-Space Barrier in
Constructing Full-Text Indices. In Proceedings of the 44th Annual IEEE Symposium
on Foundations of Computer Science, FOCS ’03, pages 251–260, 2003.

[12] J. Kärkkäinen and P. Sanders. Simple Linear Work Suffix Array Construction.
In Proceedings of the 30th International Conference on Automata, Languages and
Programming, ICALP ’03, pages 943–955, 2003.

[13] J. Kärkkäinen, P. Sanders, and S. Burkhardt. Linear Work Suffix Array Construc-
tion. Journal of the ACM, 53(6):918–936, 2006.

[14] D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time Construction of Suffix

Arrays. In Proceedings of the 14th Annual Conference on Combinatorial Pattern
Matching, CPM ’03, pages 186–199, 2003.

53
Bibliography

[15] P. Ko. Ko–Aluru Algorithm. https://sites.google.com/site/yuta256/KA.

tar.bz2. last visited 10/2015.

[16] P. Ko and S. Aluru. Space Efficient Linear Time Construction of Suffix Arrays.
In Proceedings of the 14th Annual Conference on Combinatorial Pattern Matching,
CPM ’03, pages 200–210, 2003.
[17] S. Kreft and G. Navarro. On Compressing and Indexing Repetitive Sequences.
Theoretical Computer Science, 483:115–133, 2013.

[18] J. Kärkkäinen, D. Kempa, and S. J. Puglisi. Linear time Lempel-Ziv factorization:

Simple, fast, small. In Proceedings of the 24th Annual Symposium on Combinatorial
Pattern Matching, CPM ’13, pages 189–200, 2013.
[19] J. Kärkkäinen, D. Kempa, and S. J. Puglisi. Parallel External Memory Suffix
Sorting. In Proceedings of the 26th Annual Symposium on Combinatorial Pattern
Matching, CPM ’15, pages 329–342, 2015.
[20] U. Manber and G. Myers. Suffix Arrays: A New Method for On-line String Searches.
In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms,
SODA ’90, pages 319–327, 1990.

[21] E. M. McCreight. A Space-Economical Suffix Tree Construction Algorithm. Journal

of the ACM, 23(2):262–272, 1976.
[22] Y. Mori. libdivsufsort. https://github.com/y-256/libdivsufsort. last visited
10/2015.
[23] Y. Mori. sais–lite–2.4.1. https://sites.google.com/site/yuta256/sais. last
visited 10/2015.
[24] Y. Mori. Suffix Array Construction Benchmark. https://code.google.com/p/
libdivsufsort/wiki/SACA_Benchmarks, 2008. last visited 7/2015.
[25] J. C. Na. Linear-Time Construction of Compressed Suffix Arrays Using O(N Log
N)-bit Working Space for Large Alphabets. In Proceedings of the 16th Annual
Conference on Combinatorial Pattern Matching, CPM ’05, pages 57–67, 2005.
[26] G. Nong, S. Zhang, and W. H. Chan. Linear Suffix Array Construction by Almost
Pure Induced-Sorting. In Proceedings of the 2009 Data Compression Conference,
DCC ’09, pages 193–202, 2009.

[27] G. Nong, S. Zhang, and W. H. Chan. Two Efficient Algorithms for Linear Time
Suffix Array Construction. IEEE Transactions on Computers, 60(10):1471–1484,
2011.
[28] S. J. Puglisi, W. F. Smyth, and A. H. Turpin. A Taxonomy of Suffix Array Con-
struction Algorithms. ACM Computational Survey, 39(2), 2007.

[29] P. Sanders. DC3 Algorithm. http://people.mpi-inf.mpg.de/~sanders/

programs/suffix/. last visited 10/2015.
[30] J. Seward. http://www.bzip.org/. last visited 7/2015.

54
[31] P. Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th Annual
Symposium on Switching and Automata Theory, SWAT ’73, pages 1–11, 1973.

[32] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE
Transactions on Information Theory, 23:337–343, 1977.

55
Name: Uwe Baier Matriculation Number: 721798

Declaration

I declare that I have developed and written the enclosed Master Thesis completely
by myself, and have not used sources or means without declaration in the text. Any
thoughts from others or literal quotations are clearly marked. The Master Thesis was
not used in the same or in a similar version to achieve an academic grading or is being
published elsewhere.

Ulm, the . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Uwe Baier

Microsoft Tree Questions
No ratings yet
Microsoft Tree Questions
29 pages
LIPIcs.CPM.2016.23
No ratings yet
LIPIcs.CPM.2016.23
12 pages
Linear Suffix Array Construction by Almost Pure Induced-Sorting
No ratings yet
Linear Suffix Array Construction by Almost Pure Induced-Sorting
10 pages
Suffix Array
No ratings yet
Suffix Array
71 pages
Better External Memory Suffix Array Construction-05
No ratings yet
Better External Memory Suffix Array Construction-05
14 pages
Chapter 3 Part 2
No ratings yet
Chapter 3 Part 2
22 pages
Suffix Arrays
No ratings yet
Suffix Arrays
20 pages
Generic Non-recursive Suffix Array Construction
No ratings yet
Generic Non-recursive Suffix Array Construction
42 pages
FM 072
No ratings yet
FM 072
20 pages
Seminar 2
No ratings yet
Seminar 2
20 pages
Better External Memory Suffix Array Construction: Roman Dementiev, Juha K Arkk Ainen, Jens Mehnert, Peter Sanders
No ratings yet
Better External Memory Suffix Array Construction: Roman Dementiev, Juha K Arkk Ainen, Jens Mehnert, Peter Sanders
12 pages
Lecture04_SuffixArray
No ratings yet
Lecture04_SuffixArray
5 pages
Suffix Array Tutorial
No ratings yet
Suffix Array Tutorial
17 pages
Simple Linear Work Su X Array Construction: Abstract. A Su X Array Represents The Su Xes of A String in Sorted
No ratings yet
Simple Linear Work Su X Array Construction: Abstract. A Su X Array Represents The Su Xes of A String in Sorted
13 pages
2412.10160v1
No ratings yet
2412.10160v1
15 pages
Suffix Trees and Their Applications in String Algo
No ratings yet
Suffix Trees and Their Applications in String Algo
21 pages
10 String Algorithms
No ratings yet
10 String Algorithms
36 pages
Suffix Arrays: Justin Zhang 24 May 2017
No ratings yet
Suffix Arrays: Justin Zhang 24 May 2017
5 pages
Suffix Tree and Suffix Array Techniques For Pattern Analysis in Strings
No ratings yet
Suffix Tree and Suffix Array Techniques For Pattern Analysis in Strings
78 pages
suffix
No ratings yet
suffix
29 pages
Foundations of Sequence Analysis
No ratings yet
Foundations of Sequence Analysis
161 pages
Module 06. String Algorithms Lecture 3-6
No ratings yet
Module 06. String Algorithms Lecture 3-6
48 pages
Programming-Assignment-3
No ratings yet
Programming-Assignment-3
17 pages
Longest Common Substring
No ratings yet
Longest Common Substring
33 pages
Current Challenges in Textual Databases: Gonzalo Navarro
No ratings yet
Current Challenges in Textual Databases: Gonzalo Navarro
44 pages
Week 4
No ratings yet
Week 4
18 pages
12_strings.v3
No ratings yet
12_strings.v3
111 pages
12 - Strings Matching
No ratings yet
12 - Strings Matching
111 pages
Lecture 01
No ratings yet
Lecture 01
28 pages
193
No ratings yet
193
16 pages
Suffix Trees and Suffix Arrays
No ratings yet
Suffix Trees and Suffix Arrays
33 pages
Tries
No ratings yet
Tries
17 pages
Suffix Trees, Suffix Arrays, and Their Applications
No ratings yet
Suffix Trees, Suffix Arrays, and Their Applications
29 pages
6 Suffix-Tree
No ratings yet
6 Suffix-Tree
20 pages
Suffix Tree
No ratings yet
Suffix Tree
6 pages
Lecture4 - Indexing and Searching I
No ratings yet
Lecture4 - Indexing and Searching I
56 pages
Tutorial Suffix Tree
No ratings yet
Tutorial Suffix Tree
16 pages
Suffix Trees in Detail
No ratings yet
Suffix Trees in Detail
23 pages
10 5445ir1000085031
No ratings yet
10 5445ir1000085031
396 pages
10 TSP Exam Sol
No ratings yet
10 TSP Exam Sol
8 pages
Module 2-Data Structures and Algorithms For Retrieval-Cat1
No ratings yet
Module 2-Data Structures and Algorithms For Retrieval-Cat1
133 pages
09 SuffixTrees
No ratings yet
09 SuffixTrees
21 pages
Chapter 2- String Processing
No ratings yet
Chapter 2- String Processing
26 pages
02-3-sais
No ratings yet
02-3-sais
64 pages
Lec 6-String Processing
100% (1)
Lec 6-String Processing
25 pages
Naive String Matching
No ratings yet
Naive String Matching
2 pages
Suf Tree
No ratings yet
Suf Tree
6 pages
9 Suffix Trees: Tttta
No ratings yet
9 Suffix Trees: Tttta
9 pages
Chapter 3 - String Processing
No ratings yet
Chapter 3 - String Processing
28 pages
2. String Processing
No ratings yet
2. String Processing
19 pages
51 Stringsorts
No ratings yet
51 Stringsorts
69 pages
Application of Tries
No ratings yet
Application of Tries
38 pages
Exact String Matchin
No ratings yet
Exact String Matchin
7 pages
On-Line Construction of Suffix Trees
No ratings yet
On-Line Construction of Suffix Trees
18 pages
11 Data Structures and Algorithms - Narasimha Karumanchi
No ratings yet
11 Data Structures and Algorithms - Narasimha Karumanchi
12 pages
String Basics - JetBrains Academy - Learn Programming by Building Your Own Apps
No ratings yet
String Basics - JetBrains Academy - Learn Programming by Building Your Own Apps
5 pages
Unit 3
No ratings yet
Unit 3
34 pages
DFA
No ratings yet
DFA
33 pages
Graph Layout Support for Model-Driven Engineering
From Everand
Graph Layout Support for Model-Driven Engineering
Miro Spönemann
No ratings yet
Mastering Algorithms and Data Structures
From Everand
Mastering Algorithms and Data Structures
Manish Soni
No ratings yet
Concepts of Combinatorial Optimization
From Everand
Concepts of Combinatorial Optimization
Vangelis Th. Paschos
No ratings yet
python-cheat-sheet
No ratings yet
python-cheat-sheet
1 page
Eloquent JavaScript
No ratings yet
Eloquent JavaScript
18 pages
Hmi Fire Fighting Robot
No ratings yet
Hmi Fire Fighting Robot
72 pages
General Coding Skills Evaluation Framework CodeSignal Skills Evaluation Lab Short
No ratings yet
General Coding Skills Evaluation Framework CodeSignal Skills Evaluation Lab Short
9 pages
[FREE PDF sample] Java Programs to Accompany Programming Logic and Design 7th Edition Jo Ann Smith ebooks
100% (3)
[FREE PDF sample] Java Programs to Accompany Programming Logic and Design 7th Edition Jo Ann Smith ebooks
72 pages
Microsoft Engineer SDE2 Handbook
No ratings yet
Microsoft Engineer SDE2 Handbook
59 pages
JAVA Course Notes
No ratings yet
JAVA Course Notes
67 pages
PPS unit-III PDF
No ratings yet
PPS unit-III PDF
161 pages
C-LAB-MANUAL
No ratings yet
C-LAB-MANUAL
72 pages
Icc Javascript
No ratings yet
Icc Javascript
153 pages
Daily Code DSA
No ratings yet
Daily Code DSA
20 pages
Java Complete Collection Framework
No ratings yet
Java Complete Collection Framework
28 pages
Java Program
No ratings yet
Java Program
36 pages
Drawing ROBOT Doc
No ratings yet
Drawing ROBOT Doc
68 pages
Ye 3
No ratings yet
Ye 3
9 pages
BCA II Semester
No ratings yet
BCA II Semester
19 pages
DS in 7 Hours
No ratings yet
DS in 7 Hours
346 pages
Practice R Progamming Lab Manual 2024-2025
No ratings yet
Practice R Progamming Lab Manual 2024-2025
75 pages
Tybsc Cs 367 Operating System Practical Slips Sem VI
No ratings yet
Tybsc Cs 367 Operating System Practical Slips Sem VI
31 pages
Comprog PreFi Reviewer
No ratings yet
Comprog PreFi Reviewer
10 pages
(Ebook) Concepts of Computer and C Programming by M.K. Sharma Dr. Krishan Kumar Goyal ISBN 9789380386409, 9380386400 - Read the ebook online or download it to own the complete version
100% (3)
(Ebook) Concepts of Computer and C Programming by M.K. Sharma Dr. Krishan Kumar Goyal ISBN 9789380386409, 9380386400 - Read the ebook online or download it to own the complete version
71 pages
Fortran 3days
No ratings yet
Fortran 3days
308 pages
Computer Science Syllabus RU
No ratings yet
Computer Science Syllabus RU
55 pages
InterviewSeries
No ratings yet
InterviewSeries
70 pages
Telemac Guide For Programming v6p0
No ratings yet
Telemac Guide For Programming v6p0
143 pages
Delphi Dictionary
No ratings yet
Delphi Dictionary
7 pages
First Year Syllabus Ggsipu 2021 Onwards
No ratings yet
First Year Syllabus Ggsipu 2021 Onwards
43 pages
Artificial Intelligence Unit 1 &2 Notes
No ratings yet
Artificial Intelligence Unit 1 &2 Notes
82 pages
MECA 241 PS Week 5 Exercises Q
No ratings yet
MECA 241 PS Week 5 Exercises Q
4 pages