54.string 2notes

Uploaded by SUBHASIS SAMANTA
Copyright © All Rights Reserved
String II

Manacher’s Algorithm (Longest palindromic substring)

Problem statement:

Given a string, find the longest substring which is a palindrome.

Approach:

We can solve this problem in multiple ways. It is easy to solve it by brute force (expanding around every possible center), but that requires O(N^2) time. To do better, we have Manacher's Algorithm, which solves the problem in linear time, O(N), and is much more efficient than the brute-force approach. The main idea behind this algorithm is to avoid unnecessary computation and recomputation by storing useful data from previous computations. Manacher's algorithm is often considered hard to understand, so here we will discuss it in detail.

Algorithm:

Let's take the string "acacacd" to understand the algorithm more clearly. First of all, let's understand the difference between palindromes of odd length and even length. An odd-length palindrome has a center character, but an even-length palindrome doesn't. To get rid of that difference, we insert a # between every pair of characters (and at both ends), because this doesn't affect whether a substring is a palindrome, and it gives every even-length substring a center (a #). Following this, the above string becomes "#a#c#a#c#a#c#d#".

● We will take 3 variables i, c, r, and an array arr of length 2*n+1, initialized with 0.
○ i = current index (position in the modified string).
○ c = center of the palindrome found so far that reaches furthest to the right.
○ r = rightmost position covered by that palindrome (r = c + arr[c]).
○ arr[i] = length of the longest palindrome centered at i (the LPS length at i).
● To avoid recomputation: if the palindrome centered at c reaches r and c < i < r, then i has a mirror position 2*c - i inside that palindrome, whose LPS length is already known. So arr[i] is at least min(arr[2*c - i], r - i), and we only need to compare the characters beyond that length. The LPS values computed before c thus let us skip some of the comparisons after c.

● At i = 0, there is no palindrome of nonzero length (there is no character on the left side to compare), so the LPS length is 0.
● At i = 1, the LPS is "a", so the LPS length is 1.
● At i = 2, there is no palindrome of nonzero length (the left and right characters a and c don't match), so the LPS length is 0.
● At i = 3, the LPS is "aca", so the LPS length is 3.
● At i = 4, there is no palindrome of nonzero length (the left and right characters c and a don't match), so the LPS length is 0.
● At i = 5, the LPS is "acaca", so the LPS length is 5.

And so on.

We can store all these palindromic lengths in an array, say arr. Then the modified string S and the LPS length array arr look like this:

String (S)      #  a  #  c  #  a  #  c  #  a  #  c  #  d  #
Position (i)    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
LPS (arr[i])    0  1  0  3  0  5  0  5  0  3  0  1  0  1  0

In LPS Array arr:

● LPS length value at odd positions (the actual character positions) will
be odd and greater than or equal to 1 (1 will come from the center
character itself if nothing else matches in left and right side of it)
● LPS length value at even positions (the positions between two
characters, extreme left and right positions) will be even and greater
than or equal to 0 (0 will come when there is no match in left and right
side)

Position and index for the string are two different things here. For a given
string S of length N, indexes will be from 0 to N-1 (total N indexes) and
positions will be from 0 to 2*N (total 2*N+1 positions).

The LPS length value can be interpreted in two ways, one in terms of positions and the other in terms of indexes. An LPS value d at position i (arr[i] = d) tells us that:

● The substring of the modified string from position i-d to i+d is a palindrome (in terms of position); it spans 2*d+1 positions, d of which are original characters.
● The substring of the original string from index (i-d)/2 to [(i+d)/2 – 1] is a palindrome of length d (in terms of index).

Code:

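#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Inserts '#' before every character and at the end: "abc" -> "#a#b#c#".
string modify(const string& s){
    string newString = "#";
    for (char ch : s) {
        newString += ch;
        newString += '#';
    }
    return newString;
}

// arr[i] = radius of the longest palindrome centered at position i of the
// modified string = length of that palindrome in the original string.
vector<int> manchers(const string& s){
    int n = s.size();
    vector<int> arr(n, 0);

    int center = 0, r = 0;            // palindrome at center reaches furthest right, up to r
    for (int i = 1; i < n - 1; i++) {
        int other = 2 * center - i;   // mirror of i around center
        if (i < r) arr[i] = min(arr[other], r - i);
        // Expand beyond the known radius; positions 0 and n-1 are valid '#'s.
        while (i - arr[i] - 1 >= 0 && i + arr[i] + 1 < n &&
               s[i + arr[i] + 1] == s[i - arr[i] - 1])
            arr[i]++;
        if (i + arr[i] > r) {
            center = i;
            r = i + arr[i];
        }
    }
    return arr;
}

void solve(){
    string text;
    cin >> text;
    text = modify(text);
    vector<int> arr = manchers(text);
    // The largest radius is the length of the longest palindromic substring.
    cout << *max_element(arr.begin(), arr.end()) << "\n";
}

int main(){
    solve();
    return 0;
}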

Time Complexity: O(N), where N is the length of the string.


Space Complexity: O(N), where N is the length of the string.

Longest common prefix (LCP using suffix array)

Problem statement:

Given a string, build its suffix array and find the longest common prefix (LCP) of every pair of adjacent suffixes in it.

Introduction to suffix arrays:

A suffix array lists the starting indexes of all the suffixes of a string, sorted in alphabetical order of the suffixes. Let's say the string is "banana".

Index  Suffix                            Index  Suffix
0      banana                            5      a
1      anana     Sort the suffixes       3      ana
2      nana      ---------------->       1      anana
3      ana       alphabetically          0      banana
4      na                                4      na
5      a                                 2      nana

The suffix array for “banana” : suffix[] = {5, 3, 1, 0, 4, 2}.

The LCP array is an array of size n (like the suffix array). A value lcp[i] is the length of the longest common prefix of the suffixes starting at suffix[i] and suffix[i+1]. lcp[n-1] is not defined, as there is no suffix after suffix[n-1]; it is stored as 0.

Example:

str[0..n-1] = "banana", suffix[] = {5, 3, 1, 0, 4, 2} , lcp[] = {1, 3, 0, 0, 2, 0}.

Suffixes represented by suffix array in order are:

{"a", "ana", "anana", "banana", "na", "nana"}

lcp[0] = Longest Common Prefix of "a" and "ana" =1

lcp[1] = Longest Common Prefix of "ana" and "anana" = 3

lcp[2] = Longest Common Prefix of "anana" and "banana" = 0

lcp[3] = Longest Common Prefix of "banana" and "na" = 0

lcp[4] = Longest Common Prefix of "na" and "nana" = 2

lcp[5] = 0, since "nana" is the last suffix and has no next suffix (undefined, stored as 0)

Approach:

We first compute the LCP of the first suffix in the text, which is "banana". We need the next suffix in the suffix array to compute the LCP (remember that lcp[i] is defined as the longest common prefix of suffix[i] and suffix[i+1]). To find the next suffix in suffixArr[], we use invSuff[], the inverse of the suffix array: invSuff[i] is the position of the suffix starting at index i within suffixArr[]. The next suffix is "na". Since there is no common prefix between "banana" and "na", the LCP value for "banana" is 0; "banana" is at index 3 in the suffix array, so we fill lcp[3] as 0.

Next, we compute the LCP of the second suffix which is “anana“. The next suffix of
“anana” in the suffix array is “banana”. Since there is no common prefix, the value
of LCP for “anana” is 0 and it is at index 2 in the suffix array, so we fill lcp[2] as 0.

Next, we compute the LCP of the third suffix which is “nana“. Since there is no next
suffix, the value of LCP for “nana” is not defined. We fill lcp[5] as 0.

The next suffix in the text is “ana”. The next suffix of “ana” in the suffix array is
“anana”. Since there is a common prefix of length 3, the value of LCP for “ana” is 3.
We fill lcp[1] as 3.

Now we compute the LCP for the next suffix in the text, which is "na". This is where Kasai's algorithm uses its trick: the LCP value must be at least 2, because the previous LCP value was 3 and "na" is "ana" with its first character removed. The next suffix of "na" in the suffix array is "nana", and since there is no character after "na", the final LCP value is 2. We fill lcp[4] as 2.

The next suffix in the text is "a". Its LCP value must be at least 1 because the previous value was 2. Since there is no character after "a", the final LCP value is 1. We fill lcp[0] as 1.

Algorithm:
Compute the LCP array as a byproduct of the suffix array.

● Use an already constructed suffix array to compute the LCP values.
● Take the suffixes in text order (starting at index 0, 1, 2, ...) and find the position of each one in the sorted suffix array (using invSuff[]).
● Match this suffix with the suffix right below it in the suffix array and store the length of their common prefix in a variable, K.
● Move to the next suffix in text order and do the same thing, but instead of matching all the characters from the start, skip the first K-1 characters (they are known to match) and start comparing from there.
● Repeat the above steps for all the suffixes. Each K is stored in the LCP array, and the maximum value of K achieved is the longest common prefix shared by any two suffixes.

Code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// A suffix with the ranks of its first and second halves (of the current length).
struct suffix
{
    int index;
    int rank[2];
};

// Orders suffixes by (rank[0], rank[1]).
bool cmp(const suffix& a, const suffix& b)
{
    return (a.rank[0] == b.rank[0]) ? (a.rank[1] < b.rank[1])
                                    : (a.rank[0] < b.rank[0]);
}

// Prefix doubling: after the round for length k, suffixes are sorted by their first k characters.
vector<int> buildSuffixArray(const string& txt, int n)
{
    vector<suffix> suffixes(n);

    // Rank every suffix by its first two characters (-1 means "past the end").
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = (unsigned char)txt[i];
        suffixes[i].rank[1] = ((i + 1) < n) ? (unsigned char)txt[i + 1] : -1;
    }
    sort(suffixes.begin(), suffixes.end(), cmp);

    vector<int> ind(n);  // ind[i] = position of suffix i in suffixes[]
    for (int k = 4; k < 2 * n; k = k * 2)
    {
        // Re-rank: suffixes with an equal (rank[0], rank[1]) pair share a rank.
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
        for (int i = 1; i < n; i++)
        {
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i - 1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }

        // Second key: rank of the suffix that starts k/2 characters later.
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k / 2;
            suffixes[i].rank[1] = (nextindex < n) ? suffixes[ind[nextindex]].rank[0] : -1;
        }
        sort(suffixes.begin(), suffixes.end(), cmp);
    }

    vector<int> suffixArr;
    for (int i = 0; i < n; i++)
        suffixArr.push_back(suffixes[i].index);
    return suffixArr;
}

// Kasai's algorithm: lcp[r] = LCP of the suffixes suffixArr[r] and suffixArr[r+1].
vector<int> kasai(const string& txt, const vector<int>& suffixArr)
{
    int n = suffixArr.size();
    vector<int> lcp(n, 0);
    vector<int> invSuff(n, 0);  // invSuff[i] = position of suffix i in suffixArr
    for (int i = 0; i < n; i++)
        invSuff[suffixArr[i]] = i;

    int k = 0;
    for (int i = 0; i < n; i++)  // visit suffixes in text order
    {
        if (invSuff[i] == n - 1) // last in sorted order: lcp undefined, stays 0
        {
            k = 0;
            continue;
        }
        int j = suffixArr[invSuff[i] + 1];  // next suffix in sorted order
        while (i + k < n && j + k < n && txt[i + k] == txt[j + k])
            k++;
        lcp[invSuff[i]] = k;
        if (k > 0)
            k--;                 // next suffix drops one character, keeps k-1 matches
    }
    return lcp;
}

void solve(){
    string s;
    cin >> s;
    int n = s.length();
    vector<int> suffixArr = buildSuffixArray(s, n);
    vector<int> lcp = kasai(s, suffixArr);

    cout << "suffix array ";
    for (auto i : suffixArr) cout << i << " ";
    cout << "\n";
    cout << "lcp array ";
    for (auto i : lcp) cout << i << " ";
    cout << "\n";
}

int main(){
    solve();
}

Time Complexity:

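O(N*logN*logN), where N is the length of the string. Building the suffix array takes O(logN) doubling rounds, each dominated by an O(N*logN) sort; Kasai's algorithm then computes the LCP array in O(N).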

Space Complexity:

O(N), where N is the length of the string.

Live Problems:

Palindromes and Indexes:

Problem statement:

You are given a string S consisting of lowercase characters only, an index ‘i’ and
length ‘len’. Your task is to find the count of all palindromic substrings in the string
‘S’ which start at index ‘i’ and have a length of at least ‘len’.

A string is called palindromic if it reads the same backward as forward. For example, "aba" is a palindrome but "abaab" is not a palindrome.

Solution:

We will use the idea of Manacher's Algorithm to compute the length of the longest palindrome centered at each position. We transform the string by adding ‘#’ between every pair of characters (and at both ends) so that every palindrome now has odd length, and the original length can be found by taking floor(L/2) of the new length L.

For every center of the transformed string, we check whether the longest palindrome centered there covers our index ‘i’, and whether the palindrome that starts at ‘i’ around this center has a length of at least ‘LEN’. If yes, we increment the answer by 1. Each center yields at most one palindrome starting at ‘i’, so no substring is counted twice.

Algorithm:

● palindromesAtIndex(‘S’, ‘i’, ‘LEN’) takes a string, index, and length as input and returns the number of palindromic substrings starting at index ‘i’ with length at least ‘LEN’.
● We first transform the string using the transform function.
● We then calculate Manacher's array P for this new string.
● In the transformed string, character ‘i’ sits at position 2*i + 1, and a palindrome that starts at ‘i’ with length L is centered at position 2*i + L. So we start the center pointer j at 2*i + ‘LEN’.
● Check whether the palindrome centered at j covers index ‘i’, i.e. j - P[j] <= 2*i + 1. Its length j - 2*i is then at least ‘LEN’.
● If yes, increment the answer.
● Move j to the next position, up to the end of the transformed string.
● Return the final answer.
● transform(S) (modify() in the code below) takes a string S as input and returns the string with ‘#’ after every character and before the first character.
● manacher(S) (manchers() in the code below) takes a string S as input and returns Manacher's array of the string S, where index i of the array stores the radius of the longest palindrome centered at i (which equals its length in the original string).

Code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Inserts '#' before every character and at the end: "abc" -> "#a#b#c#".
string modify(const string& s){
    string newString = "#";
    for (char ch : s) {
        newString += ch;
        newString += '#';
    }
    return newString;
}

// arr[i] = radius of the longest palindrome centered at position i.
vector<int> manchers(const string& s){
    int n = s.size();
    vector<int> arr(n, 0);
    int center = 0, r = 0;
    for (int i = 1; i < n - 1; i++) {
        int other = 2 * center - i;   // mirror of i around center
        if (i < r) arr[i] = min(arr[other], r - i);
        while (i - arr[i] - 1 >= 0 && i + arr[i] + 1 < n &&
               s[i + arr[i] + 1] == s[i - arr[i] - 1])
            arr[i]++;
        if (i + arr[i] > r) {
            center = i;
            r = i + arr[i];
        }
    }
    return arr;
}

void solve(){
    string text;
    cin >> text;
    int ind, len;
    cin >> ind >> len;
    ind--;                            // the input index is 1-based

    text = modify(text);
    vector<int> arr = manchers(text);

    int newInd = 2 * ind + 1;         // position of S[ind] in the modified string
    int ans = 0;
    // Center i belongs to the palindrome that starts at ind with length i - 2*ind >= len.
    for (int i = 2 * ind + len; i < (int)text.size(); i++) {
        if (i - arr[i] <= newInd) {   // the palindrome centered at i reaches back to ind
            ans++;
        }
    }
    cout << ans << "\n";
}

int main(){
    int t;
    cin >> t;
    while (t--) {
        solve();
    }
    return 0;
}

Time Complexity:

O(N), where N is the length of the string in the input.

We spend O(N) time transforming the string and O(N) time in Manacher's algorithm. The final loop checks each center of the transformed string at most once. Hence the time complexity becomes O(N + N + N) = O(N).

Space Complexity:

O(N), Where N is the length of the string in the input.

Since we require O(N) extra space for the transformed string and Manacher's array.

Distinct Substrings:

Problem Statement:

Given a string 'S', you are supposed to return the number of distinct substrings (including the empty substring) of the given string.

Solution:

There are multiple ways to solve this problem, such as using a hashmap, a trie, or a suffix array. We are going to use the suffix array, as we have just learned it. The main idea is to construct the LCP array and, from that array, count the substrings that repeat (those that appear as a prefix of more than one suffix). After this, we simply subtract the number of repeated substrings from the total number of non-empty substrings, (n*(n+1))/2, and add 1 for the empty substring to get the answer.

Algorithm:

● First of all, we will build a suffix array (you can refer to the above notes for that).
● After building the suffix array, we will construct the LCP (longest common prefix) array, which gives the length of the common prefix of every pair of adjacent suffixes in sorted order.
● Every substring is a prefix of some suffix. In sorted order, the first lcp[i] prefixes of suffix[i+1] have already appeared as prefixes of suffix[i], so suffix[i+1] contributes exactly lcp[i] repeated substrings.
● Using the above information, the total number of repeated substrings is the sum of the LCP array.
● In the end, the number of distinct non-empty substrings is (n*(n+1))/2 - sum of LCP array, and adding 1 for the empty substring gives the answer.

Code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct suffix
{
    int index, rank[2];
};

// Orders suffixes by (rank[0], rank[1]).
bool cmp(const suffix& a, const suffix& b)
{
    return (a.rank[0] == b.rank[0]) ? (a.rank[1] < b.rank[1])
                                    : (a.rank[0] < b.rank[0]);
}

vector<int> buildSuffixArray(const string& txt, int n)
{
    vector<suffix> suffixes(n);
    // Initial ranks: first two characters of each suffix (-1 = past the end).
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = (unsigned char)txt[i];
        suffixes[i].rank[1] = ((i + 1) < n) ? (unsigned char)txt[i + 1] : -1;
    }
    sort(suffixes.begin(), suffixes.end(), cmp);

    vector<int> ind(n);  // ind[i] = position of suffix i in suffixes[], used to find the next half's rank
    for (int k = 4; k < 2 * n; k = k * 2)
    {
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
        for (int i = 1; i < n; i++)
        {
            // Same (rank[0], rank[1]) pair as the previous suffix: same new rank.
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i - 1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else // Otherwise increment rank and assign it
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }
        // Second key: rank of the suffix starting k/2 characters later.
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k / 2;
            suffixes[i].rank[1] = (nextindex < n) ? suffixes[ind[nextindex]].rank[0] : -1;
        }
        sort(suffixes.begin(), suffixes.end(), cmp);
    }
    vector<int> suffixArr;
    for (int i = 0; i < n; i++)
        suffixArr.push_back(suffixes[i].index);
    return suffixArr;
}

vector<int> kasai(const string& txt, const vector<int>& suffixArr)
{
    int n = suffixArr.size();
    vector<int> lcp(n, 0);
    vector<int> invSuff(n, 0);  // invSuff[i] = position of suffix i in suffixArr
    for (int i = 0; i < n; i++)
        invSuff[suffixArr[i]] = i;
    int k = 0;
    for (int i = 0; i < n; i++)
    {
        if (invSuff[i] == n - 1)
        {
            k = 0;
            continue;
        }
        int j = suffixArr[invSuff[i] + 1];
        while (i + k < n && j + k < n && txt[i + k] == txt[j + k])
            k++;
        lcp[invSuff[i]] = k; // lcp for the present suffix.
        if (k > 0)
            k--;
    }
    return lcp;
}

void solve(){
    string s;
    cin >> s;
    int n = s.length();
    vector<int> suffixArr = buildSuffixArray(s, n);
    vector<int> lcp = kasai(s, suffixArr);
    long long result = 1LL * n * (n + 1) / 2;  // all non-empty substrings
    for (auto i : lcp) result -= i;            // remove repeated ones
    result += 1;                               // the empty substring
    cout << result << "\n";
}

int main(){
    int t;
    cin >> t;
    while (t--) {
        solve();
    }
}

Time Complexity:

O(N*logN*logN), where N is the length of the string in the input.

We spend O(N*logN*logN) time to construct the suffix array by prefix doubling (O(logN) rounds, each with an O(N*logN) sort). After that, Kasai's algorithm constructs the LCP array in O(N) time, and summing the LCP array takes another O(N) time. So the overall time complexity will be

O(N*logN*logN) + O(2N) = O(N*logN*logN)

Space Complexity:

O(N), Where N is the length of the string in the input.

Since we require O(N) extra space to construct and store the suffix array and the LCP array.
