Suffix Array | Set 2 (nLogn Algorithm)
Last Updated :
20 Dec, 2023
Given a string, the task is to construct a suffix array for the given string.
A suffix array is a sorted array of all suffixes of a given string. The definition is similar to that of a Suffix Tree, which is a compressed trie of all suffixes of the given text.
Examples:
Input: str = "banana"
Output: {5, 3, 1, 0, 4, 2}
Explanation:
Suffixes by index        Suffixes sorted alphabetically
-----------------        ------------------------------
0 banana                 5 a
1 anana                  3 ana
2 nana          --->     1 anana
3 ana                    0 banana
4 na                     4 na
5 a                      2 nana
So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}
Input: str = "geeksforgeeks"
Output: {9, 1, 10, 2, 5, 8, 0, 11, 3, 6, 7, 12, 4}
Explanation:
Suffixes by index        Suffixes sorted alphabetically
-----------------        ------------------------------
 0 geeksforgeeks          9 eeks
 1 eeksforgeeks           1 eeksforgeeks
 2 eksforgeeks           10 eks
 3 ksforgeeks             2 eksforgeeks
 4 sforgeeks              5 forgeeks
 5 forgeeks               8 geeks
 6 orgeeks      --->      0 geeksforgeeks
 7 rgeeks                11 ks
 8 geeks                  3 ksforgeeks
 9 eeks                   6 orgeeks
10 eks                    7 rgeeks
11 ks                    12 s
12 s                      4 sforgeeks
So the suffix array for "geeksforgeeks" is {9, 1, 10, 2, 5, 8, 0, 11, 3, 6, 7, 12, 4}
Naive Approach: The naive algorithm for suffix array construction considers all suffixes and sorts them with an O(n Log n) comparison sort, keeping track of the original indexes while sorting. Comparing two suffixes can itself take O(n) time, which is where the extra factor of n comes from.
Time complexity: O(n^2 Log n), where n is the number of characters in the input string.
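The naive approach described above can be sketched in a few lines of Python. This is only an illustration; the function name `naive_suffix_array` is hypothetical and not part of the implementations later in this article.

```python
def naive_suffix_array(text):
    """Sort suffix start positions by comparing the suffixes themselves."""
    # Each sort key is a full suffix, so one comparison can cost O(n)
    # and the whole sort can cost O(n^2 log n) in the worst case.
    return sorted(range(len(text)), key=lambda i: text[i:])


print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Sorting the start positions rather than the suffix strings is what keeps the original indexes available after the sort.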
Optimized Approach: In this post, an O(n Log n) algorithm for suffix array construction is discussed. For simplicity, let us first look at an O(n Log n Log n) algorithm.
The idea is to use the fact that strings that are to be sorted are suffixes of a single string.
- We first sort all suffixes according to the first character, then according to the first 2 characters, then the first 4 characters, and so on, while the number of characters to be considered is smaller than 2n.
- The important point is that if we have sorted the suffixes according to their first 2^i characters, then we can sort them according to their first 2^(i+1) characters in O(n Log n) time using an O(n Log n) sorting algorithm like Merge Sort.
- This is possible as two suffixes can be compared in O(1) time (we need to compare only two values, see the below example and code).
The sort function is called O(Logn) times (Note that we increase the number of characters to be considered in powers of 2). Therefore overall time complexity becomes O(nLognLogn).
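The O(1) comparison mentioned above can be sketched as follows. The helper name `compare_by_rank_pairs` and its parameters are assumptions made for illustration. The sample ranks are the ones the "banana" walkthrough arrives at after sorting by the first two characters, listed by suffix index.

```python
def compare_by_rank_pairs(rank, i, j, h):
    """Compare suffixes i and j on their first 2*h characters.

    rank[p] is assumed to be the rank of suffix p by its first h characters.
    Returns a negative, zero, or positive number, like a classic comparator.
    """
    n = len(rank)
    # The second half of each 2*h-character prefix is the first h characters
    # of another suffix, so its rank is already known. -1 means "past the end".
    pair_i = (rank[i], rank[i + h] if i + h < n else -1)
    pair_j = (rank[j], rank[j + h] if j + h < n else -1)
    return (pair_i > pair_j) - (pair_i < pair_j)


# Ranks of the suffixes of "banana" by their first 2 characters, by index.
rank_by_two = [2, 1, 3, 1, 3, 0]
print(compare_by_rank_pairs(rank_by_two, 3, 1, 2))  # -1: "ana" sorts before "anana"
```

Only two integers per suffix are inspected, no matter how long the compared prefixes are.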
Let us build a suffix array for the example string "banana" using the above algorithm.
Sort according to the first two characters
Assign a rank to every suffix using the ASCII value of its first character. A simple way to assign the rank is to use "str[i] - 'a'" for the i-th suffix of str[].
Index Suffix Rank
0 banana 1
1 anana 0
2 nana 13
3 ana 0
4 na 13
5 a 0
For every character, we also store the rank of the next adjacent character, i.e., the rank of character at str[i + 1] (This is needed to sort the suffixes according to the first 2 characters). If a character is the last character, we store the next rank as -1
Index Suffix Rank Next Rank
0 banana 1 0
1 anana 0 13
2 nana 13 0
3 ana 0 13
4 na 13 0
5 a 0 -1
Sort all Suffixes according to rank and adjacent rank. Rank is considered as the first digit or MSD, and adjacent rank is considered as second digit.
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 0 13
3 ana 0 13
0 banana 1 0
2 nana 13 0
4 na 13 0
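As a rough sketch of this first round, the rank pairs can be built and sorted as tuples. The variable names are illustrative only, and lowercase input is assumed.

```python
text = "banana"
n = len(text)

# (rank, next rank, index) for every suffix; -1 marks "no next character".
triples = []
for i in range(n):
    rank = ord(text[i]) - ord("a")
    next_rank = ord(text[i + 1]) - ord("a") if i + 1 < n else -1
    triples.append((rank, next_rank, i))

# Tuples compare element by element, so rank acts as the most
# significant digit and next rank as the second digit.
triples.sort()
for rank, next_rank, i in triples:
    print(i, text[i:], rank, next_rank)
```

The index is kept as the last tuple element, so equal rank pairs stay in index order, which matches the table above.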
Sort according to the first four characters
Assign new ranks to all suffixes. To assign new ranks, we consider the sorted suffixes one by one. Assign 0 as the new rank of the first suffix. For each remaining suffix, compare its rank pair (rank, next rank) with the rank pair of the suffix just before it in sorted order. If the two pairs are the same, assign the same new rank as the previous suffix. Otherwise, assign the new rank of the previous suffix plus one.
Index Suffix Rank
5 a 0 [Assign 0 to first]
1 anana 1 (0, 13) is different from previous
3 ana 1 (0, 13) is same as previous
0 banana 2 (1, 0) is different from previous
2 nana 3 (13, 0) is different from previous
4 na 3 (13, 0) is same as previous
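The re-ranking rule can be sketched on its own. The function name `rerank` is hypothetical, and the input is assumed to be the list of (rank, next rank) pairs already in sorted order.

```python
def rerank(sorted_pairs):
    """Turn sorted (rank, next_rank) pairs into dense new ranks starting at 0."""
    if not sorted_pairs:
        return []
    new_ranks = [0]
    for prev, cur in zip(sorted_pairs, sorted_pairs[1:]):
        # Equal pairs share a rank; a different pair gets the next rank.
        new_ranks.append(new_ranks[-1] + (0 if cur == prev else 1))
    return new_ranks


# Rank pairs of "banana" after sorting by the first two characters.
pairs = [(0, -1), (0, 13), (0, 13), (1, 0), (13, 0), (13, 0)]
print(rerank(pairs))  # [0, 1, 1, 2, 3, 3]
```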
For every suffix starting at index i, also store the new rank of the suffix starting at index i + 2 as its next rank. If there is no suffix at index i + 2, store -1 as the next rank.
Index Suffix Rank Next Rank
5 a 0 -1
1 anana 1 1
3 ana 1 0
0 banana 2 3
2 nana 3 3
4 na 3 -1
Sort all Suffixes according to rank and next rank.
Index Suffix Rank Next Rank
5 a 0 -1
3 ana 1 0
1 anana 1 1
0 banana 2 3
4 na 3 -1
2 nana 3 3
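This second round can be sketched starting from the state reached after the first round. The names `order`, `rank`, `half` and `pair` are illustrative assumptions, not identifiers from the implementations that follow.

```python
# State after ranking "banana" by its first two characters:
# order lists suffix indexes in sorted order, rank[i] is the rank of suffix i.
order = [5, 1, 3, 0, 2, 4]
rank = [2, 1, 3, 1, 3, 0]
n, half = len(rank), 2  # half = k / 2 for k = 4


def pair(i):
    # Rank of the first two characters, then rank of the following two.
    return (rank[i], rank[i + half] if i + half < n else -1)


order.sort(key=pair)
print(order)  # [5, 3, 1, 0, 4, 2]
```

For "banana" every rank pair is now distinct, so this order is already the final suffix array. The full algorithm still runs its remaining rounds, which leave the order unchanged.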
C++
// C++ program for building suffix array of a given text
#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>
using namespace std;

// Structure to store information of a suffix
struct suffix
{
    int index;   // To store original index
    int rank[2]; // To store rank and next rank pair
};

// A comparison function used by sort() to compare two suffixes.
// Compares two rank pairs, returns true if the first pair is smaller
bool cmp(const suffix &a, const suffix &b)
{
    if (a.rank[0] != b.rank[0])
        return a.rank[0] < b.rank[0];
    return a.rank[1] < b.rank[1];
}

// This is the main function that takes a string 'txt' of size n as an
// argument, builds and returns the suffix array for the given string.
// The caller owns the returned array and must release it with delete[]
int *buildSuffixArray(char *txt, int n)
{
    // Store suffixes and their indexes in an array of structures.
    // The structure is needed to sort the suffixes alphabetically
    // and maintain their old indexes while sorting.
    // std::vector is used because variable-length arrays are not standard C++
    vector<suffix> suffixes(n);
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = txt[i] - 'a';
        suffixes[i].rank[1] = ((i + 1) < n) ? (txt[i + 1] - 'a') : -1;
    }

    // Sort the suffixes using the comparison function
    // defined above.
    sort(suffixes.begin(), suffixes.end(), cmp);

    // At this point, all suffixes are sorted according to first
    // 2 characters. Let us sort suffixes according to first 4
    // characters, then first 8 and so on

    // This array is needed to get the index in suffixes[]
    // from original index. This mapping is needed to get
    // next suffix.
    vector<int> ind(n);

    for (int k = 4; k < 2 * n; k = k * 2)
    {
        // Assigning rank and index values to first suffix
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;

        // Assigning rank to suffixes
        for (int i = 1; i < n; i++)
        {
            // If first rank and next ranks are same as that of previous
            // suffix in array, assign the same new rank to this suffix
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i - 1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else // Otherwise increment rank and assign
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }

        // Assign next rank to every suffix
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k / 2;
            suffixes[i].rank[1] = (nextindex < n) ?
                                  suffixes[ind[nextindex]].rank[0] : -1;
        }

        // Sort the suffixes according to first k characters
        sort(suffixes.begin(), suffixes.end(), cmp);
    }

    // Store indexes of all sorted suffixes in the suffix array
    int *suffixArr = new int[n];
    for (int i = 0; i < n; i++)
        suffixArr[i] = suffixes[i].index;

    // Return the suffix array
    return suffixArr;
}

// A utility function to print an array of given size
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << endl;
}

// Driver program to test above functions
int main()
{
    char txt[] = "banana";
    int n = strlen(txt);
    int *suffixArr = buildSuffixArray(txt, n);
    cout << "Following is suffix array for " << txt << endl;
    printArr(suffixArr, n);
    delete[] suffixArr; // Release the array allocated by buildSuffixArray
    return 0;
}
Java
// Java program for building suffix array of a given text
import java.util.*;
class GFG
{
// Class to store information of a suffix
public static class Suffix implements Comparable<Suffix>
{
int index;
int rank;
int next;
public Suffix(int ind, int r, int nr)
{
index = ind;
rank = r;
next = nr;
}
// A comparison function used by Arrays.sort()
// to compare two suffixes.
// Compares two rank pairs; returns a negative value
// if this pair is smaller
public int compareTo(Suffix s)
{
if (rank != s.rank) return Integer.compare(rank, s.rank);
return Integer.compare(next, s.next);
}
}
// This is the main function that takes a string 'txt'
// of size n as an argument, builds and return the
// suffix array for the given string
public static int[] suffixArray(String s)
{
int n = s.length();
Suffix[] su = new Suffix[n];
// Store suffixes and their indexes in
// an array of classes. The class is needed
// to sort the suffixes alphabetically and
// maintain their old indexes while sorting
for (int i = 0; i < n; i++)
{
su[i] = new Suffix(i, s.charAt(i) - '$', 0);
}
for (int i = 0; i < n; i++)
su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
// Sort the suffixes using the comparison function
// defined above.
Arrays.sort(su);
// At this point, all suffixes are sorted
// according to first 2 characters.
// Let us sort suffixes according to first 4
// characters, then first 8 and so on
int[] ind = new int[n];
// This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (int length = 4; length < 2 * n; length <<= 1)
{
// Assigning rank and index values to first suffix
int rank = 0, prev = su[0].rank;
su[0].rank = rank;
ind[su[0].index] = 0;
for (int i = 1; i < n; i++)
{
// If first rank and next ranks are same as
// that of previous suffix in array,
// assign the same new rank to this suffix
if (su[i].rank == prev &&
su[i].next == su[i - 1].next)
{
prev = su[i].rank;
su[i].rank = rank;
}
else
{
// Otherwise increment rank and assign
prev = su[i].rank;
su[i].rank = ++rank;
}
ind[su[i].index] = i;
}
// Assign next rank to every suffix
for (int i = 0; i < n; i++)
{
int nextP = su[i].index + length / 2;
su[i].next = nextP < n ?
su[ind[nextP]].rank : -1;
}
// Sort the suffixes according
// to first 'length' characters
Arrays.sort(su);
}
// Store indexes of all sorted
// suffixes in the suffix array
int[] suf = new int[n];
for (int i = 0; i < n; i++)
suf[i] = su[i].index;
// Return the suffix array
return suf;
}
static void printArr(int arr[], int n)
{
for (int i = 0; i < n; i++)
System.out.print(arr[i] + " ");
System.out.println();
}
// Driver Code
public static void main(String[] args)
{
String txt = "banana";
int n = txt.length();
int[] suff_arr = suffixArray(txt);
System.out.println("Following is suffix array for banana:");
printArr(suff_arr, n);
}
}
// This code is contributed by AmanKumarSingh
Python3
# Python3 program for building suffix
# array of a given text

# Class to store information of a suffix
class suffix:

    def __init__(self):
        self.index = 0
        self.rank = [0, 0]

# This is the main function that takes a
# string 'txt' of size n as an argument,
# builds and return the suffix array for
# the given string
def buildSuffixArray(txt, n):

    # A structure to store suffixes
    # and their indexes
    suffixes = [suffix() for _ in range(n)]

    # Store suffixes and their indexes in
    # an array of structures. The structure
    # is needed to sort the suffixes alphabetically
    # and maintain their old indexes while sorting
    for i in range(n):
        suffixes[i].index = i
        suffixes[i].rank[0] = (ord(txt[i]) -
                               ord("a"))
        suffixes[i].rank[1] = (ord(txt[i + 1]) -
                               ord("a")) if ((i + 1) < n) else -1

    # Sort the suffixes according to the rank
    # and next rank
    suffixes = sorted(
        suffixes, key = lambda x: (
            x.rank[0], x.rank[1]))

    # At this point, all suffixes are sorted
    # according to first 2 characters. Let
    # us sort suffixes according to first 4
    # characters, then first 8 and so on

    # This array is needed to get the
    # index in suffixes[] from original
    # index. This mapping is needed to get
    # next suffix.
    ind = [0] * n
    k = 4
    while (k < 2 * n):

        # Assigning rank and index
        # values to first suffix
        rank = 0
        prev_rank = suffixes[0].rank[0]
        suffixes[0].rank[0] = rank
        ind[suffixes[0].index] = 0

        # Assigning rank to suffixes
        for i in range(1, n):

            # If first rank and next ranks are
            # same as that of previous suffix in
            # array, assign the same new rank to
            # this suffix
            if (suffixes[i].rank[0] == prev_rank and
                    suffixes[i].rank[1] == suffixes[i - 1].rank[1]):
                prev_rank = suffixes[i].rank[0]
                suffixes[i].rank[0] = rank

            # Otherwise increment rank and assign
            else:
                prev_rank = suffixes[i].rank[0]
                rank += 1
                suffixes[i].rank[0] = rank
            ind[suffixes[i].index] = i

        # Assign next rank to every suffix
        for i in range(n):
            nextindex = suffixes[i].index + k // 2
            suffixes[i].rank[1] = suffixes[ind[nextindex]].rank[0] \
                if (nextindex < n) else -1

        # Sort the suffixes according to
        # first k characters
        suffixes = sorted(
            suffixes, key = lambda x: (
                x.rank[0], x.rank[1]))

        k *= 2

    # Store indexes of all sorted
    # suffixes in the suffix array
    suffixArr = [0] * n
    for i in range(n):
        suffixArr[i] = suffixes[i].index

    # Return the suffix array
    return suffixArr

# A utility function to print an array
# of given size
def printArr(arr, n):
    for i in range(n):
        print(arr[i], end = " ")
    print()

# Driver code
if __name__ == "__main__":
    txt = "banana"
    n = len(txt)
    suffixArr = buildSuffixArray(txt, n)
    print("Following is suffix array for", txt)
    printArr(suffixArr, n)

# This code is contributed by debrc
C#
// C# program for building suffix array of a given text
using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
// Structure to store information of a suffix
class suffix
{
public int index; // To store original index
public int[] rank = new int[2]; // To store ranks and next rank pair
public suffix(int i, int rank0, int rank1){
index = i;
rank[0] = rank0;
rank[1] = rank1;
}
}
class compare : IComparer {
// Orders two suffixes by rank, then by next rank
public int Compare(object x, object y)
{
suffix a = (suffix)x;
suffix b = (suffix)y;
if(a.rank[0] != b.rank[0]){
return a.rank[0] - b.rank[0];
}
return a.rank[1] - b.rank[1];
}
}
class HelloWorld {
public static void swap(int[] s, int a, int b){
int temp = s[a];
s[a] = s[b];
s[b] = temp;
}
// This is the main function that takes a string 'txt' of size n as an
// argument, builds and return the suffix array for the given string
public static int[] buildSuffixArray(char[] txt, int n)
{
// A structure to store suffixes and their indexes
suffix[] suffixes = new suffix[n];
// Store suffixes and their indexes in an array of structures.
// The structure is needed to sort the suffixes alphabetically
// and maintain their old indexes while sorting
for (int i = 0; i < n; i++)
{
int rank0 = (int)txt[i] - (int)'a';
int rank1 = ((i+1) < n) ? (int)txt[i+1] - (int)'a': -1;
suffixes[i] = new suffix(i, rank0, rank1);
}
// Sort the suffixes using the comparison function
// defined above.
IComparer cmp = new compare();
Array.Sort(suffixes, cmp);
// At this point, all suffixes are sorted according to first
// 2 characters. Let us sort suffixes according to first 4
// characters, then first 8 and so on
int[] ind = new int[n]; // This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (int k = 4; k < 2*n; k = k*2)
{
// Assigning rank and index values to first suffix
int rank = 0;
int prev_rank = suffixes[0].rank[0];
suffixes[0].rank[0] = rank;
ind[suffixes[0].index] = 0;
// Assigning rank to suffixes
for (int i = 1; i < n; i++)
{
// If first rank and next ranks are same as that of previous
// suffix in array, assign the same new rank to this suffix
if (suffixes[i].rank[0] == prev_rank &&
suffixes[i].rank[1] == suffixes[i-1].rank[1])
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = rank;
}
else // Otherwise increment rank and assign
{
prev_rank = suffixes[i].rank[0];
suffixes[i].rank[0] = ++rank;
}
ind[suffixes[i].index] = i;
}
// Assign next rank to every suffix
for (int i = 0; i < n; i++)
{
int nextindex = suffixes[i].index + k/2;
suffixes[i].rank[1] = (nextindex < n)? suffixes[ind[nextindex]].rank[0]: -1;
}
// Sort the suffixes according to first k characters
Array.Sort(suffixes, cmp);
}
// Store indexes of all sorted suffixes in the suffix array
int[] suffixArr = new int[n];
for (int i = 0; i < n; i++){
suffixArr[i] = suffixes[i].index;
}
// Return the suffix array
return suffixArr;
}
// A utility function to print an array of given size
public static void printArr(int[] arr, int n)
{
for (int i = 0; i < n; i++){
Console.Write(arr[i] + " ");
}
}
static void Main() {
char[] txt = {'b', 'a', 'n', 'a', 'n', 'a'};
int n = txt.Length;
int[] suffixArr = buildSuffixArray(txt, n);
Console.WriteLine("Following is suffix array for " + new string(txt));
printArr(suffixArr, n);
}
}
// The code is contributed by Nidhi goel.
JavaScript
<script>
// Javascript program for building suffix array of a given text
// Class to store information of a suffix
class Suffix
{
constructor(ind,r,nr)
{
this.index = ind;
this.rank = r;
this.next = nr;
}
}
// This is the main function that takes a string 'txt'
// of size n as an argument, builds and return the
// suffix array for the given string
function suffixArray(s)
{
let n = s.length;
let su = new Array(n);
// Store suffixes and their indexes in
// an array of classes. The class is needed
// to sort the suffixes alphabetically and
// maintain their old indexes while sorting
for (let i = 0; i < n; i++)
{
su[i] = new Suffix(i, s[i].charCodeAt(0) - '$'.charCodeAt(0), 0);
}
for (let i = 0; i < n; i++)
su[i].next = (i + 1 < n ? su[i + 1].rank : -1);
// Sort the suffixes using the comparison function
// defined above.
su.sort(function(a,b){
if(a.rank!=b.rank)
return a.rank-b.rank;
else
return a.next-b.next;
});
// At this point, all suffixes are sorted
// according to first 2 characters.
// Let us sort suffixes according to first 4
// characters, then first 8 and so on
let ind = new Array(n);
// This array is needed to get the index in suffixes[]
// from original index. This mapping is needed to get
// next suffix.
for (let length = 4; length < 2 * n; length <<= 1)
{
// Assigning rank and index values to first suffix
let rank = 0, prev = su[0].rank;
su[0].rank = rank;
ind[su[0].index] = 0;
for (let i = 1; i < n; i++)
{
// If first rank and next ranks are same as
// that of previous suffix in array,
// assign the same new rank to this suffix
if (su[i].rank == prev &&
su[i].next == su[i - 1].next)
{
prev = su[i].rank;
su[i].rank = rank;
}
else
{
// Otherwise increment rank and assign
prev = su[i].rank;
su[i].rank = ++rank;
}
ind[su[i].index] = i;
}
// Assign next rank to every suffix
for (let i = 0; i < n; i++)
{
let nextP = su[i].index + length / 2;
su[i].next = nextP < n ?
su[ind[nextP]].rank : -1;
}
// Sort the suffixes according
// to first 'length' characters
su.sort(function(a,b){
if(a.rank!=b.rank)
return a.rank-b.rank;
else
return a.next-b.next;
});
}
// Store indexes of all sorted
// suffixes in the suffix array
let suf = new Array(n);
for (let i = 0; i < n; i++)
suf[i] = su[i].index;
// Return the suffix array
return suf;
}
function printArr(arr,n)
{
for (let i = 0; i < n; i++)
document.write(arr[i] + " ");
document.write();
}
// Driver Code
let txt = "banana";
let n = txt.length;
let suff_arr = suffixArray(txt);
document.write("Following is suffix array for banana:<br>");
printArr(suff_arr, n);
// This code is contributed by patel2127
</script>
Output
Following is suffix array for banana
5 3 1 0 4 2
Note that the above algorithm uses standard sort function and therefore time complexity is O(n Log(n) Log(n)). We can use Radix Sort here to reduce the time complexity to O(n Log n).
Auxiliary Space: O(n)
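The radix sort idea noted above could look roughly like the sketch below. It is not the implementation used in this article: the function names `suffix_array_radix`, `second` and `counting_sort` are hypothetical. The sketch assumes ranks are kept dense in the range 0 to n - 1, so each round can sort the rank pairs with two stable counting-sort passes in O(n) time. Only the initial sort by first character uses a comparison sort.

```python
def suffix_array_radix(text):
    """Prefix doubling with a two-pass counting (radix) sort per round."""
    n = len(text)
    if n == 0:
        return []
    # Round 0: order and rank suffixes by their first character.
    order = sorted(range(n), key=lambda i: text[i])
    rank = [0] * n
    for a, b in zip(order, order[1:]):
        rank[b] = rank[a] + (text[a] != text[b])

    def counting_sort(items, key, buckets):
        # Stable counting sort of items by an integer key in [0, buckets).
        start = [0] * (buckets + 1)
        for i in items:
            start[key(i) + 1] += 1
        for b in range(buckets):
            start[b + 1] += start[b]
        out = [0] * len(items)
        for i in items:
            out[start[key(i)]] = i
            start[key(i)] += 1
        return out

    h = 1
    # Stop early once every suffix has a distinct rank.
    while h < n and rank[order[-1]] < n - 1:
        def second(i):
            # Rank of the h characters after the first h; 0 means "past the end".
            return rank[i + h] + 1 if i + h < n else 0

        # LSD radix sort: second key first, then a stable pass on the first key.
        order = counting_sort(range(n), second, n + 1)
        order = counting_sort(order, lambda i: rank[i], n)

        # Re-rank: equal (first, second) pairs share a rank.
        new_rank = [0] * n
        for a, b in zip(order, order[1:]):
            differs = (rank[a], second(a)) != (rank[b], second(b))
            new_rank[b] = new_rank[a] + differs
        rank = new_rank
        h *= 2
    return order


print(suffix_array_radix("banana"))  # [5, 3, 1, 0, 4, 2]
```

With O(Log n) rounds of O(n) work each, this variant matches the O(n Log n) bound mentioned in the note, at the cost of a few extra O(n) arrays.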