54.string 2notes
Problem statement:
Given a string S, find the length of its longest palindromic substring.
Approach:
This problem can be solved in several ways. A brute-force approach (for example,
expanding around every possible center) is easy to write but takes O(N^2) time.
Manacher's Algorithm solves the same problem in linear time, O(N), which is far
more efficient. The main idea behind the algorithm is to avoid recomputation by
storing and reusing the results of earlier comparisons. Manacher's algorithm is
often considered hard to understand, so we discuss it in detail here.
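As a point of comparison, the quadratic baseline can be sketched as below. This is only one possible brute-force variant (expanding around each center), and the helper name longestPalindromeLength is an assumption made for illustration, not something defined in these notes.

```cpp
#include <algorithm>
#include <string>

// Quadratic baseline: try every center and expand while both ends match.
// longestPalindromeLength is a hypothetical helper name.
int longestPalindromeLength(const std::string& s) {
    const int n = static_cast<int>(s.size());
    int best = 0;
    for (int center = 0; center < n; ++center) {
        // even == 0: odd-length palindromes centered on s[center]
        // even == 1: even-length palindromes centered between
        //            s[center] and s[center + 1]
        for (int even = 0; even < 2; ++even) {
            int lo = center;
            int hi = center + even;
            while (lo >= 0 && hi < n && s[lo] == s[hi]) {
                --lo;
                ++hi;
            }
            // After the loop, s[lo + 1 .. hi - 1] is the palindrome found.
            best = std::max(best, hi - lo - 1);
        }
    }
    return best;
}
```

Calling longestPalindromeLength("acacacd") walks over 7 centers twice and expands from each, which is where the O(N^2) bound comes from.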
Algorithm:
Let's take the string "acacacd" to understand the algorithm more clearly. First of
all, note the difference between palindromes of odd and even length: an odd-length
palindrome has a center character, but an even-length palindrome does not. To
handle both cases in the same way, we insert a '#' before, between, and after the
characters of the string. This does not change which substrings are palindromes,
and it gives every even-length palindrome a center (a '#'). The example string
becomes "#a#c#a#c#a#c#d#".
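The transformation can be sketched in a few lines. The function name insertSeparators is hypothetical, and the sketch assumes that '#' does not occur in the input string.

```cpp
#include <cassert>
#include <string>

// Put '#' before, between and after the characters of s.
// Assumption: '#' itself never appears in s.
std::string insertSeparators(const std::string& s) {
    assert(s.find('#') == std::string::npos);
    std::string t = "#";
    for (char ch : s) {
        t += ch;
        t += '#';
    }
    return t;  // length is 2 * s.size() + 1
}
```

For a string of length n the result always has 2*n+1 characters, which is why the LPS array has that length.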
● We will use 3 variables i, c, r, and an array arr of length 2*n+1 initialized
with 0.
○ i = current position.
○ c = center of the palindrome found so far that reaches farthest to the right.
○ r = right boundary of that palindrome.
○ arr[i] = length of the longest palindrome centered at position i.
● To avoid recomputation we use symmetry. If a palindrome centered at c extends
to r and i < r, then position i has a mirror position 2*c - i on the left of c
whose LPS value is already known. The palindrome at i is at least
min(arr[2*c - i], r - i) long, so the comparisons at i can start from that
length instead of from 0.
● At i = 0, there is no palindrome (no character on the left side to compare), so
the LPS length is 0.
● At i = 1, the LPS is a, so the LPS length is 1.
● At i = 2, there is no palindrome (the left and right characters a and c don't
match), so the LPS length is 0.
● At i = 3, the LPS is aca, so the LPS length is 3.
● At i = 4, there is no palindrome (the left and right characters c and a don't
match), so the LPS length is 0.
● At i = 5, the LPS is acaca, so the LPS length is 5.
And so on.
We can store all these palindrome lengths in an array, say arr. The string S and
the LPS length array arr then look like this:

String (S)     #  a  #  c  #  a  #  c  #  a  #  c  #  d  #
Position (i)   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
LPS length     0  1  0  3  0  5  0  5  0  3  0  1  0  1  0
● LPS length value at odd positions (the actual character positions) will
be odd and greater than or equal to 1 (1 will come from the center
character itself if nothing else matches in left and right side of it)
● LPS length value at even positions (the positions between two
characters, extreme left and right positions) will be even and greater
than or equal to 0 (0 will come when there is no match in left and right
side)
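To cross-check LPS values by hand, a plain expansion over the '#'-separated string is enough. The sketch below is not Manacher's algorithm (it reuses nothing and is quadratic in the worst case); the name lpsByExpansion is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// arr[i] = number of matching character pairs around position i of the
// '#'-separated string t, which equals the palindrome length in the
// original string. Every position is expanded from scratch.
std::vector<int> lpsByExpansion(const std::string& t) {
    const int n = static_cast<int>(t.size());
    std::vector<int> arr(n, 0);
    for (int i = 0; i < n; ++i) {
        while (i - arr[i] - 1 >= 0 && i + arr[i] + 1 < n &&
               t[i - arr[i] - 1] == t[i + arr[i] + 1]) {
            ++arr[i];
        }
    }
    return arr;
}
```

Running it on "#a#c#a#c#a#c#d#" gives odd values at the odd positions and even values at the even positions, matching the two observations above.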
Position and index for the string are two different things here. For a given
string S of length N, indexes will be from 0 to N-1 (total N indexes) and
positions will be from 0 to 2*N (total 2*N+1 positions).
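An LPS length value can be interpreted in two ways, one in terms of positions and
one in terms of indexes. An LPS value d at position i (arr[i] = d) tells us that:
● The substring from position i-d to position i+d is a palindrome (in terms of
positions).
● The substring of S from index (i-d)/2 to index (i+d)/2 - 1 is a palindrome of
length d (in terms of indexes).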
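A minimal sketch of the index interpretation, assuming positions refer to the '#'-separated string and that the palindrome with arr[i] = d starts at index (i - d)/2 of S and has d characters. The helper name palindromeAtPosition is hypothetical.

```cpp
#include <cassert>
#include <string>

// Recover the palindrome of the original string s that corresponds to an
// LPS value d stored at position i of the '#'-separated string.
std::string palindromeAtPosition(const std::string& s, int i, int d) {
    assert(d >= 0 && i - d >= 0);
    assert((i + d) / 2 <= static_cast<int>(s.size()));
    const int start = (i - d) / 2;  // first original index covered
    return s.substr(start, d);      // indexes start .. (i + d) / 2 - 1
}
```

For the example string, palindromeAtPosition("acacacd", 5, 5) covers indexes 0 to 4.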
Code:
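#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Insert '#' before, between and after the characters: "ab" -> "#a#b#".
string modify(const string& s){
    string newString;
    for(char ch : s){
        newString+=(char)('#');
        newString+=ch;
    }
    newString+=(char)('#');
    return newString;
}

// arr[i] = radius of the longest palindrome centered at position i of the
// modified string, which equals its length in the original string.
vector<int> manchers(const string& s){
    int n = s.size();
    vector<int> arr(n,0);
    int center=0,r=0;
    for(int i=1;i<n-1;i++){
        int other = 2*center-i;   // mirror of i with respect to center
        if(i<r) arr[i] = min(arr[other],r-i);
        // expand around i while the characters on both sides match
        while(i-arr[i]-1>=0 && i+arr[i]+1<n && s[i+arr[i]+1] ==
              s[i-arr[i]-1]) arr[i]++;
        // this palindrome reaches further right, so it becomes the reference
        if(i+arr[i]>r){
            center=i;
            r=i+arr[i];
        }
    }
    return arr;
}

void solve(){
    string text;
    cin>>text;
    text = modify(text);
    vector<int>arr = manchers(text);
    // the largest value is the length of the longest palindromic substring
    cout<<*max_element(arr.begin(),arr.end())<<"\n";
}

int main(){
    solve();
    return 0;
}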
Problem statement:
Compute the longest common prefix (LCP) of adjacent suffixes in the sorted array
of all the suffixes of a string.
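A suffix array is a sorted array of all the suffixes of a string, each represented
by its starting index. Let's say the string is "banana".

Suffixes        Sorted suffixes
0 banana        5 a
1 anana         3 ana
2 nana          1 anana
3 ana           0 banana
4 na            4 na
5 a             2 nana

So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}.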
The LCP array is an array of size n (like the suffix array). A value lcp[i] is the
length of the longest common prefix of the suffixes at suffix[i] and suffix[i+1].
lcp[n-1] is not defined, as there is no suffix after suffix[n-1]; we store 0 there.
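The definition can be checked directly with a naive sketch: sort the suffix start indexes by comparing the suffixes themselves, then compare each suffix with its successor character by character. Both function names below are hypothetical, and this direct version is meant for small inputs only.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Sort suffix start indexes by comparing the suffixes directly.
std::vector<int> naiveSuffixArray(const std::string& txt) {
    std::vector<int> sa(txt.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return txt.compare(a, std::string::npos,
                           txt, b, std::string::npos) < 0;
    });
    return sa;
}

// lcp[i] = common prefix length of the suffixes at sa[i] and sa[i + 1];
// the last entry has no successor and stays 0.
std::vector<int> naiveLcp(const std::string& txt, const std::vector<int>& sa) {
    const int n = static_cast<int>(sa.size());
    std::vector<int> lcp(n, 0);
    for (int i = 0; i + 1 < n; ++i) {
        const int a = sa[i];
        const int b = sa[i + 1];
        int k = 0;
        while (a + k < n && b + k < n && txt[a + k] == txt[b + k]) ++k;
        lcp[i] = k;
    }
    return lcp;
}
```

Each comparison here can cost O(N), which is the cost Kasai's algorithm avoids by reusing the previous LCP value.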
Example:
For txt = "banana", suffix[] = {5, 3, 1, 0, 4, 2} and lcp[] = {1, 3, 0, 0, 2, 0}.
Approach:
We first compute the LCP of the first suffix in the text, which is "banana". We need
the next suffix in the suffix array to compute the LCP (remember that lcp[i] is
defined as the longest common prefix of suffix[i] and suffix[i+1]). To find where a
suffix sits in suffixArr[], and therefore which suffix follows it, we use invSuff[],
the inverse of the suffix array. The next suffix is "na". Since there is no common
prefix between "banana" and "na", the value of LCP for "banana" is 0. It is at
index 3 in the suffix array, so we fill lcp[3] as 0.
Next, we compute the LCP of the second suffix which is “anana“. The next suffix of
“anana” in the suffix array is “banana”. Since there is no common prefix, the value
of LCP for “anana” is 0 and it is at index 2 in the suffix array, so we fill lcp[2] as 0.
Next, we compute the LCP of the third suffix which is “nana“. Since there is no next
suffix, the value of LCP for “nana” is not defined. We fill lcp[5] as 0.
The next suffix in the text is “ana”. The next suffix of “ana” in the suffix array is
“anana”. Since there is a common prefix of length 3, the value of LCP for “ana” is 3.
We fill lcp[1] as 3.
Now we compute the LCP for the next suffix in the text, which is "na". This is where
Kasai's algorithm uses its trick: the LCP value must be at least 2 because the
previous LCP value was 3. Since there is no character after "na", the final value
of LCP is 2. We fill lcp[4] as 2.
Next suffix in the text is “a“. LCP value must be at least 1 because the previous
value was 2. Since there is no character after “a”, the final value of LCP is 1. We fill
lcp[0] as 1.
Algorithm:
The LCP array is computed from an already constructed suffix array.
● Use the suffix array to build invSuff[], which gives the index of every suffix
in the sorted suffix array.
● Take the suffixes in text order (starting at index 0, 1, 2, ...) and look up
the index of each one in the sorted suffix array.
● Match the suffix with the suffix that follows it in the suffix array and store
the length of the common prefix in a variable K. K is the LCP value for that
index.
● Move to the next suffix in the text and do the same thing, but instead of
matching all the characters from the start, skip the first K-1 characters
(they are guaranteed to match) and continue comparing from there.
● Repeat the above steps for all the suffixes and return the LCP array at the
end. Its maximum value is the longest common prefix shared by any two suffixes.
Code:
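#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
struct suffix
{
    int index;   // start of the suffix in the text
    int rank[2]; // ranks of its first and second halves
};
// Order suffixes by (rank[0], rank[1]).
bool cmp(const suffix& a, const suffix& b){
    if (a.rank[0] != b.rank[0]) return a.rank[0] < b.rank[0];
    return a.rank[1] < b.rank[1];
}
vector<int> buildSuffixArray(const string& txt, int n){
    vector<suffix> suffixes(n);
    // Initial ranks come from the first two characters of each suffix.
    for (int i = 0; i < n; i++){
        suffixes[i].index = i;
        suffixes[i].rank[0] = (int)txt[i];
        suffixes[i].rank[1] = (i + 1 < n) ? (int)txt[i + 1]: -1;
    }
    sort(suffixes.begin(), suffixes.end(), cmp);
    vector<int> ind(n);
    // Sort by the first k characters, doubling k in every round.
    for (int k = 4; k < 2*n; k *= 2){
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
        for (int i = 1; i < n; i++){
            // Same rank pair as the previous suffix means the same new rank.
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i-1].rank[1]){
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }
        // Second rank: rank of the suffix that starts k/2 characters later.
        for (int i = 0; i < n; i++){
            int nextindex = suffixes[i].index + k/2;
            suffixes[i].rank[1] = (nextindex < n) ? suffixes[ind[nextindex]].rank[0]: -1;
        }
        sort(suffixes.begin(), suffixes.end(), cmp);
    }
    vector<int>suffixArr;
    for (int i = 0; i < n; i++)
        suffixArr.push_back(suffixes[i].index);
    return suffixArr;
}
// Kasai's algorithm: lcp[i] = LCP of the suffixes suffixArr[i] and suffixArr[i+1].
vector<int> kasai(const string& txt, const vector<int>& suffixArr){
    int n = suffixArr.size();
    vector<int> lcp(n, 0), invSuff(n, 0);
    for (int i = 0; i < n; i++) invSuff[suffixArr[i]] = i;
    int k = 0;
    for (int i = 0; i < n; i++){
        // The last suffix in sorted order has no successor.
        if (invSuff[i] == n-1){ k = 0; continue; }
        int j = suffixArr[invSuff[i]+1];
        while (i+k<n && j+k<n && txt[i+k]==txt[j+k]) k++;
        lcp[invSuff[i]] = k;
        if (k>0) k--;
    }
    return lcp;
}
void solve(){
    string s;
    cin>>s;
    int n = s.length();
    vector<int> suffixArr = buildSuffixArray(s, n);
    vector<int> lcp = kasai(s, suffixArr);
    // Printing the LCP array is an assumed output format.
    for (int x : lcp) cout<<x<<" ";
    cout<<"\n";
}
int main(){
    solve();
}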
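Time Complexity:
Building the suffix array takes O(N*logN*logN) time, since the suffixes are sorted
O(logN) times and each sort takes O(N*logN). Kasai's algorithm then computes the
LCP array in O(N) time.
Space Complexity:
O(N) extra space for the suffix array, the inverse suffix array and the LCP array.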
Live Problems:
Problem statement:
You are given a string S consisting of lowercase characters only, an index ‘i’ and
length ‘len’. Your task is to find the count of all palindromic substrings in the string
‘S’ which start at index ‘i’ and have a length of at least ‘len’.
Solution:
We will use the idea of Manacher's Algorithm to compute the length of the longest
palindrome centered at each position. We transform the string by adding '#' between
every character so that every palindrome now has odd length; the original length
can be recovered as floor(L/2), where L is the length in the transformed string.
For every position in the transformed string, we take the palindrome centered there
and check whether it, or a shorter palindrome with the same center, starts at our
index 'i' and has a length of at least 'len'. If yes, we increment the answer by 1.
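For checking a Manacher-based solution on small inputs, a direct count is a handy reference. The sketch below is a brute-force check, not the linear-time method described above; the function names are hypothetical and the start index is assumed to be 0-based.

```cpp
#include <algorithm>
#include <string>

// True if s[lo..hi] (inclusive, 0-based) reads the same in both directions.
bool isPalindrome(const std::string& s, int lo, int hi) {
    while (lo < hi) {
        if (s[lo] != s[hi]) return false;
        ++lo;
        --hi;
    }
    return true;
}

// Count palindromic substrings that start at `start` (0-based, an
// assumption of this sketch) and have at least minLen characters.
int countPalindromesFrom(const std::string& s, int start, int minLen) {
    const int n = static_cast<int>(s.size());
    int count = 0;
    for (int len = std::max(minLen, 1); start + len <= n; ++len) {
        if (isPalindrome(s, start, start + len - 1)) ++count;
    }
    return count;
}
```

Each query costs O(N^2) here, so it is only useful as a reference against which the O(N) answer can be compared.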
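Algorithm:
● Transform S by inserting '#' before, between and after its characters.
● Run Manacher's algorithm on the transformed string to get arr, where arr[c] is
the length of the longest palindrome centered at position c.
● A palindrome of length L that starts at index i (0-based) is centered at
position c = 2*i + L. For every L from max(len, 1) upwards, while 2*i + L is a
valid position, increment the answer if arr[2*i + L] >= L.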
Code:
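#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Insert '#' before, between and after the characters: "ab" -> "#a#b#".
string modify(const string& s){
    string newString;
    for(char ch : s){
        newString+=(char)('#');
        newString+=ch;
    }
    newString+=(char)('#');
    return newString;
}

// arr[i] = radius of the longest palindrome centered at position i of the
// modified string, which equals its length in the original string.
vector<int> manchers(const string& s){
    int n = s.size();
    vector<int> arr(n,0);
    int center=0,r=0;
    for(int i=1;i<n-1;i++){
        int other = 2*center-i;   // mirror of i with respect to center
        if(i<r) arr[i] = min(arr[other],r-i);
        // expand around i while the characters on both sides match
        while(i-arr[i]-1>=0 && i+arr[i]+1<n && s[i+arr[i]+1] ==
              s[i-arr[i]-1]) arr[i]++;
        if(i+arr[i]>r){
            center=i;
            r=i+arr[i];
        }
    }
    return arr;
}

void solve(){
    string text;
    cin>>text;
    int ind,len;
    cin>>ind>>len;
    text = modify(text);
    int tn = text.size();
    int ans=0;
    ind--;   // the input index is treated as 1-based
    vector<int>arr = manchers(text);
    // Counting step inferred from the Solution text: a palindrome of
    // length L that starts at ind is centered at position 2*ind+L.
    for(int L=max(len,1); 2*ind+L<tn; L++){
        if(arr[2*ind+L]>=L) ans++;
    }
    cout<<ans<<"\n";
    return;
}

int main(){
    int t;
    cin>>t;
    while(t--){
        solve();
    }
    return 0;
}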
Time Complexity:
We spend O(N) time transforming the string and O(N) time running Manacher's
algorithm. After that, each position of the array is checked at most once. Hence
the time complexity is O(N + N + N) = O(N).
Space Complexity:
O(N), since we require O(N) extra space for the transformed string and Manacher's
array.
Distinct Substrings:
Problem Statement:
Given a string 'S', you are supposed to return the number of distinct
substrings(including empty substring) of the given string.
Solution:
There are multiple ways to solve this problem, for example using a hash map, a trie,
or a suffix array. We are going to use the suffix array, as we have just learned it.
The main idea is to construct the LCP array and use it to count the substrings that
are repeated, that is, counted more than once across the suffixes. We then subtract
that count from the total number of substrings.
Algorithm:
● First of all, we build the suffix array (you can refer to the notes above for
that).
● After building the suffix array, we construct the LCP (longest common prefix)
array, which gives the length of the common prefix of every pair of adjacent
suffixes in the suffix array.
● Every prefix of a suffix is a substring, so the length of the common prefix of
two adjacent suffixes equals the number of substrings they share as prefixes.
● Using this, the total number of repeated substrings equals the sum of the LCP
array.
● In the end, we subtract the sum of the LCP array from the total number of
non-empty substrings: (n*(n+1))/2 - sum of LCP array. If the empty substring must
also be counted, as in the problem statement, add 1 to this value.
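The formula can be cross-checked against a brute-force count on small strings. Both helpers below are hypothetical names introduced for illustration; the brute-force version stores every substring in a set and is far too slow for large inputs. Both count non-empty substrings only.

```cpp
#include <set>
#include <string>
#include <vector>

// Brute force: collect every non-empty substring in a set.
long long countDistinctBruteForce(const std::string& s) {
    std::set<std::string> seen;
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (std::size_t len = 1; i + len <= s.size(); ++len) {
            seen.insert(s.substr(i, len));
        }
    }
    return static_cast<long long>(seen.size());
}

// Formula from the steps above: n*(n+1)/2 minus the sum of the LCP array.
long long countDistinctFromLcp(long long n, const std::vector<int>& lcp) {
    long long result = n * (n + 1) / 2;
    for (int value : lcp) result -= value;
    return result;
}
```

For "banana" the LCP array is {1, 3, 0, 0, 2, 0}, so the formula gives 21 - 6 = 15, the same value the brute-force count produces.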
Code:
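#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
struct suffix
{
    int index,rank[2];   // start of the suffix and the ranks of its two halves
};
// Order suffixes by (rank[0], rank[1]).
bool cmp(const suffix& a, const suffix& b){
    if (a.rank[0] != b.rank[0]) return a.rank[0] < b.rank[0];
    return a.rank[1] < b.rank[1];
}
vector<int> buildSuffixArray(const string& txt, int n){
    vector<suffix> suffixes(n);
    // Initial ranks come from the first two characters of each suffix.
    for (int i = 0; i < n; i++){
        suffixes[i].index = i;
        suffixes[i].rank[0] = (int)txt[i];
        suffixes[i].rank[1] = (i + 1 < n) ? (int)txt[i + 1]: -1;
    }
    sort(suffixes.begin(), suffixes.end(), cmp);
    vector<int> ind(n);
    // Sort by the first k characters, doubling k in every round.
    for (int k = 4; k < 2*n; k *= 2){
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;
        for (int i = 1; i < n; i++)
        {
            // Same rank pair as the previous suffix means the same new rank.
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i-1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }
        // Second rank: rank of the suffix that starts k/2 characters later.
        for (int i = 0; i < n; i++){
            int nextindex = suffixes[i].index + k/2;
            suffixes[i].rank[1] = (nextindex < n) ?
                suffixes[ind[nextindex]].rank[0]: -1;
        }
        sort(suffixes.begin(), suffixes.end(), cmp);
    }
    vector<int>suffixArr;
    for (int i = 0; i < n; i++)
        suffixArr.push_back(suffixes[i].index);
    return suffixArr;
}
// Kasai's algorithm: lcp[i] = LCP of the suffixes suffixArr[i] and suffixArr[i+1].
vector<int> kasai(const string& txt, const vector<int>& suffixArr){
    int n = suffixArr.size();
    vector<int> lcp(n, 0), invSuff(n, 0);
    for (int i = 0; i < n; i++)
        invSuff[suffixArr[i]] = i;
    int k = 0;
    for (int i=0; i<n; i++)
    {
        // The last suffix in sorted order has no successor.
        if (invSuff[i] == n-1)
        {
            k = 0;
            continue;
        }
        int j = suffixArr[invSuff[i]+1];
        while (i+k<n && j+k<n && txt[i+k]==txt[j+k])
            k++;
        lcp[invSuff[i]] = k;
        if (k>0)
            k--;
    }
    return lcp;
}
void solve(){
    string s;
    cin>>s;
    int n = s.length();
    vector<int> suffixArr = buildSuffixArray(s, n);
    vector<int> lcp = kasai(s, suffixArr);
    // Non-empty substrings only; add 1 to also count the empty substring.
    long long result = (long long)n*(n+1)/2;
    for(auto i : lcp) result-=i;
    cout<<result<<"\n";
}
int main(){
    int t;
    cin>>t;
    while(t--){
        solve();
    }
}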
Time Complexity:
We spend O(N*logN*logN) time constructing the suffix array, because the suffixes
are sorted O(logN) times and each sort takes O(N*logN). After that we construct the
LCP array with Kasai's algorithm and calculate its sum, which takes O(2N) time. So
the overall time complexity is O(N*logN*logN).
Space Complexity:
O(N), since we require O(N) extra space to construct and store the suffix array and
the LCP array.