Longest substring with K unique characters using Binary Search
Last Updated: 25 Apr, 2023
Given a string str and an integer K, the task is to print the length of the longest substring that has exactly K unique characters. If several substrings share that maximum length, any one of them may serve as the required substring. Print -1 if no such substring exists.
Examples:
Input: str = "aabacbebebe", K = 3
Output: 7
Explanation: "cbebebe" is the required substring.
Input: str = "aabc", K = 4
Output: -1
Explanation: The string has only 3 unique characters, so no substring can have 4.
Approach: An approach to this problem has been discussed in another article; this article describes a binary search based approach. Binary search is applied to the length of the substring, using the test "is there a substring of this length with at most K unique characters?". For a candidate length len, check every window of size len. If some window has at most K unique characters, len is feasible, so try to maximize the size by searching from len up to the maximum possible length, i.e. the size of the input string. If no window qualifies, search the smaller lengths.
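The search over lengths can be sketched on its own, separately from the window check. The snippet below is a minimal illustration, not the article's implementation: `largest_feasible` and `has_window_with_at_most_k` are hypothetical names, and the predicate is a deliberately naive stand-in (it rebuilds a set for every window) used only to exercise the search.

```python
from typing import Callable


def largest_feasible(n: int, feasible: Callable[[int], bool]) -> int:
    """Return the largest length in [0, n] for which feasible(length) is True.

    Assumes feasible is monotone: True up to some threshold, False after it.
    Returns -1 if no length is feasible.
    """
    lo, hi = -1, n + 1  # lo: sentinel below every length, hi: sentinel above
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if feasible(mid):
            lo = mid  # mid works, look for something longer
        else:
            hi = mid  # mid fails, look for something shorter
    return lo


def has_window_with_at_most_k(s: str, length: int, k: int) -> bool:
    """Naive check (assumption: for illustration only): try every window."""
    return any(len(set(s[i:i + length])) <= k
               for i in range(len(s) - length + 1))


s = "aabacbebebe"
print(largest_feasible(len(s), lambda m: has_window_with_at_most_k(s, m, 3)))  # 7
```

Keeping the predicate as a parameter makes it easy to swap the naive check for a sliding-window one without touching the search loop.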
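Before searching, all the characters of the string are inserted into a set. If the size of the set is less than K, no substring can have K unique characters and the answer is -1. Otherwise the largest feasible length found by the binary search is the answer. The longest substring with at most K unique characters then has exactly K of them: a window with fewer than K unique characters that is shorter than the whole string can always be extended by one character without exceeding K, and the whole string is known to contain at least K.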
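The role of the set check can be illustrated with a small brute-force sketch. The helper names `longest_at_most_k` and `solve_by_brute_force` are hypothetical and the code is quadratic or worse, so it is meant only for checking small inputs. Once the whole string is known to hold at least K unique characters, the longest substring with at most K unique characters is expected to hold exactly K, which the inner assertion verifies.

```python
def longest_at_most_k(s: str, k: int) -> str:
    """Brute force: a longest substring of s with at most k distinct characters."""
    best = ""
    for i in range(len(s)):
        # Only consider end positions that would beat the current best.
        for j in range(i + len(best) + 1, len(s) + 1):
            if len(set(s[i:j])) > k:
                break  # extending further can only add distinct characters
            best = s[i:j]
    return best


def solve_by_brute_force(s: str, k: int) -> int:
    # Pre-check with a set: fewer than k distinct characters overall means -1.
    if len(set(s)) < k:
        return -1
    best = longest_at_most_k(s, k)
    # With the pre-check passed, "at most k" and "exactly k" coincide.
    assert len(set(best)) == k
    return len(best)


print(solve_by_brute_force("aabacbebebe", 3))  # 7
print(solve_by_brute_force("aabc", 4))         # -1
```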
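Binary search is applicable here because feasibility is monotonic in the length: if some substring of length len has at most K unique characters, then so does every shorter prefix of that substring, so every length below len is also feasible. Whenever len is feasible and we want to maximize it, the search domain therefore shrinks to the range from len to n.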
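The monotonic behaviour can be observed directly by evaluating the test for every length. This is a small sketch with a hypothetical helper `window_exists` that uses a naive set-per-window check, chosen for brevity rather than speed.

```python
def window_exists(s: str, length: int, k: int) -> bool:
    """True if some window of the given length has at most k distinct characters."""
    return any(len(set(s[i:i + length])) <= k
               for i in range(len(s) - length + 1))


s, k = "aabacbebebe", 3
flags = [window_exists(s, length, k) for length in range(len(s) + 1)]

# Lengths 0..7 are feasible and lengths 8..11 are not: a run of True followed
# by a run of False, which is the shape binary search needs.
print(flags)
```

The last True in the list sits at index 7, matching the expected output of the first example.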
Below is the implementation of the above approach:
C++
// C++ implementation of the approach
#include <bits/stdc++.h>
using namespace std;

// Function that returns true if there
// is a substring of length len
// with <= k unique characters
bool isValidLen(const string& s, int len, int k)
{
    // Size of the string
    int n = s.size();

    // Map to store the characters
    // and their frequency
    unordered_map<char, int> mp;
    int right = 0;

    // Update the map for the
    // first substring
    while (right < len) {
        mp[s[right]]++;
        right++;
    }
    if ((int)mp.size() <= k)
        return true;

    // Check for the rest of the substrings
    while (right < n) {

        // Add the new character
        mp[s[right]]++;

        // Remove the first character
        // of the previous window
        mp[s[right - len]]--;

        // Update the map
        if (mp[s[right - len]] == 0)
            mp.erase(s[right - len]);
        if ((int)mp.size() <= k)
            return true;
        right++;
    }
    return (int)mp.size() <= k;
}

// Function to return the length of the
// longest substring which has K
// unique characters
int maxLenSubStr(const string& s, int k)
{
    // Check if the complete string
    // contains K unique characters
    set<char> uni;
    for (auto x : s)
        uni.insert(x);
    if ((int)uni.size() < k)
        return -1;

    // Size of the string
    int n = s.size();

    // Apply binary search
    int lo = -1, hi = n + 1;
    while (hi - lo > 1) {
        int mid = (lo + hi) >> 1;
        if (isValidLen(s, mid, k))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

// Driver code
int main()
{
    string s = "aabacbebebe";
    int k = 3;
    cout << maxLenSubStr(s, k);
    return 0;
}
Java
// Java implementation of the approach
import java.util.*;

class GFG
{
    // Function that returns true if there
    // is a substring of length len
    // with <= k unique characters
    static boolean isValidLen(String s, int len, int k)
    {
        // Size of the string
        int n = s.length();

        // Map to store the characters
        // and their frequency
        Map<Character, Integer> mp = new HashMap<>();
        int right = 0;

        // Update the map for the
        // first substring
        while (right < len)
        {
            mp.merge(s.charAt(right), 1, Integer::sum);
            right++;
        }
        if (mp.size() <= k)
            return true;

        // Check for the rest of the substrings
        while (right < n)
        {
            // Add the new character
            mp.merge(s.charAt(right), 1, Integer::sum);

            // Remove the first character
            // of the previous window
            char left = s.charAt(right - len);
            mp.put(left, mp.get(left) - 1);

            // Update the map
            if (mp.get(left) == 0)
                mp.remove(left);
            if (mp.size() <= k)
                return true;
            right++;
        }
        return mp.size() <= k;
    }

    // Function to return the length of the
    // longest substring which has K
    // unique characters
    static int maxLenSubStr(String s, int k)
    {
        // Check if the complete string
        // contains K unique characters
        Set<Character> uni = new HashSet<>();
        for (char x : s.toCharArray())
            uni.add(x);
        if (uni.size() < k)
            return -1;

        // Size of the string
        int n = s.length();

        // Apply binary search
        int lo = -1, hi = n + 1;
        while (hi - lo > 1)
        {
            int mid = (lo + hi) >> 1;
            if (isValidLen(s, mid, k))
                lo = mid;
            else
                hi = mid;
        }
        return lo;
    }

    // Driver code
    public static void main(String[] args)
    {
        String s = "aabacbebebe";
        int k = 3;
        System.out.print(maxLenSubStr(s, k));
    }
}
// This code is contributed by Rajput-Ji
Python3
# Python3 implementation of the approach

# Function that returns True if there
# is a substring of length lenn
# with <= k unique characters
def isValidLen(s, lenn, k):

    # Size of the string
    n = len(s)

    # Map to store the characters
    # and their frequency
    mp = dict()
    right = 0

    # Update the map for the
    # first substring
    while (right < lenn):
        mp[s[right]] = mp.get(s[right], 0) + 1
        right += 1

    if (len(mp) <= k):
        return True

    # Check for the rest of the substrings
    while (right < n):

        # Add the new character
        mp[s[right]] = mp.get(s[right], 0) + 1

        # Remove the first character
        # of the previous window
        mp[s[right - lenn]] -= 1

        # Update the map
        if (mp[s[right - lenn]] == 0):
            del mp[s[right - lenn]]
        if (len(mp) <= k):
            return True
        right += 1

    return len(mp) <= k

# Function to return the length of the
# longest substring which has K
# unique characters
def maxLenSubStr(s, k):

    # Check if the complete string
    # contains K unique characters
    uni = set(s)
    if (len(uni) < k):
        return -1

    # Size of the string
    n = len(s)

    # Apply binary search
    lo = -1
    hi = n + 1
    while (hi - lo > 1):
        mid = (lo + hi) >> 1
        if (isValidLen(s, mid, k)):
            lo = mid
        else:
            hi = mid

    return lo

# Driver code
s = "aabacbebebe"
k = 3
print(maxLenSubStr(s, k))
# This code is contributed by Mohit Kumar
C#
// C# implementation of the approach
using System;
using System.Collections.Generic;

class GFG
{
    // Function that returns true if there
    // is a substring of length len
    // with <= k unique characters
    static bool isValidLen(String s, int len, int k)
    {
        // Size of the string
        int n = s.Length;

        // Map to store the characters
        // and their frequency
        Dictionary<char, int> mp = new Dictionary<char, int>();
        int right = 0;

        // Update the map for the
        // first substring
        while (right < len)
        {
            if (mp.ContainsKey(s[right]))
                mp[s[right]] = mp[s[right]] + 1;
            else
                mp.Add(s[right], 1);
            right++;
        }
        if (mp.Count <= k)
            return true;

        // Check for the rest of the substrings
        while (right < n)
        {
            // Add the new character
            if (mp.ContainsKey(s[right]))
                mp[s[right]] = mp[s[right]] + 1;
            else
                mp.Add(s[right], 1);

            // Remove the first character
            // of the previous window
            if (mp.ContainsKey(s[right - len]))
                mp[s[right - len]] = mp[s[right - len]] - 1;

            // Update the map
            if (mp[s[right - len]] == 0)
                mp.Remove(s[right - len]);
            if (mp.Count <= k)
                return true;
            right++;
        }
        return mp.Count <= k;
    }

    // Function to return the length of the
    // longest substring which has K
    // unique characters
    static int maxLenSubStr(String s, int k)
    {
        // Check if the complete string
        // contains K unique characters
        HashSet<char> uni = new HashSet<char>();
        foreach (char x in s)
            uni.Add(x);
        if (uni.Count < k)
            return -1;

        // Size of the string
        int n = s.Length;

        // Apply binary search
        int lo = -1, hi = n + 1;
        while (hi - lo > 1)
        {
            int mid = (lo + hi) >> 1;
            if (isValidLen(s, mid, k))
                lo = mid;
            else
                hi = mid;
        }
        return lo;
    }

    // Driver code
    public static void Main(String[] args)
    {
        String s = "aabacbebebe";
        int k = 3;
        Console.Write(maxLenSubStr(s, k));
    }
}
// This code is contributed by Rajput-Ji
JavaScript
<script>
// Javascript implementation of the approach

// Function that returns true if there
// is a substring of length len
// with <= k unique characters
function isValidLen(s, len, k)
{
    // Size of the string
    var n = s.length;

    // Map to store the characters
    // and their frequency
    var mp = new Map();
    var right = 0;

    // Update the map for the
    // first substring
    while (right < len) {
        if (mp.has(s[right]))
            mp.set(s[right], mp.get(s[right]) + 1);
        else
            mp.set(s[right], 1);
        right++;
    }
    if (mp.size <= k)
        return true;

    // Check for the rest of the substrings
    while (right < n) {

        // Add the new character
        if (mp.has(s[right]))
            mp.set(s[right], mp.get(s[right]) + 1);
        else
            mp.set(s[right], 1);

        // Remove the first character
        // of the previous window
        if (mp.has(s[right - len]))
            mp.set(s[right - len], mp.get(s[right - len]) - 1);

        // Update the map
        if (mp.has(s[right - len]) && mp.get(s[right - len]) == 0)
            mp.delete(s[right - len]);
        if (mp.size <= k)
            return true;
        right++;
    }
    return mp.size <= k;
}

// Function to return the length of the
// longest substring which has K
// unique characters
function maxLenSubStr(s, k)
{
    // Check if the complete string
    // contains K unique characters
    var uni = new Set();
    s.split('').forEach(x => {
        uni.add(x);
    });
    if (uni.size < k)
        return -1;

    // Size of the string
    var n = s.length;

    // Apply binary search
    var lo = -1, hi = n + 1;
    while (hi - lo > 1) {
        var mid = (lo + hi) >> 1;
        if (isValidLen(s, mid, k))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

// Driver code
var s = "aabacbebebe";
var k = 3;
document.write(maxLenSubStr(s, k));
</script>
The time complexity of the given program is O(N*logN), where N is the length of the input string. The binary search tries O(logN) candidate lengths, and for each candidate the isValidLen function slides a window across the string once, which takes O(N) time on average with a hash-based map.
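The logarithmic factor can be checked by counting how often the window test runs. The sketch below is an instrumented variant, not the article's listing: `is_valid_len` and `max_len_with_call_count` are hypothetical names, and `collections.Counter` stands in for the frequency map.

```python
from collections import Counter
import math


def is_valid_len(s: str, length: int, k: int) -> bool:
    """Sliding-window test: some window of this length has at most k distinct chars."""
    counts = Counter(s[:length])
    if len(counts) <= k:
        return True
    for right in range(length, len(s)):
        counts[s[right]] += 1          # character entering the window
        left = s[right - length]       # character leaving the window
        counts[left] -= 1
        if counts[left] == 0:
            del counts[left]
        if len(counts) <= k:
            return True
    return False


def max_len_with_call_count(s: str, k: int):
    """Return (answer, number of is_valid_len calls made by the binary search)."""
    if len(set(s)) < k:
        return -1, 0
    calls = 0
    lo, hi = -1, len(s) + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        calls += 1
        if is_valid_len(s, mid, k):
            lo = mid
        else:
            hi = mid
    return lo, calls


s = "aabacbebebe"
answer, calls = max_len_with_call_count(s, 3)
print(answer, calls)                          # 7 4
print(math.ceil(math.log2(len(s) + 2)))       # 4, the bound on the call count
```

The search interval starts with N + 2 candidate positions and roughly halves on each step, so the number of O(N) window scans stays within ceil(log2(N + 2)).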
The space complexity of the given program is O(N), where N is the length of the input string. The frequency map (an unordered_map in the C++ version) holds one entry per unique character in the current window, and the set used for the initial check holds one entry per unique character in the whole string. Both are bounded by the number of unique characters in the input, which is O(N) in the worst case and never more than the size of the alphabet.
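When the alphabet is known and small, the map can be replaced by a fixed-size array so that the extra space no longer depends on N. The following is a minimal sketch under the assumption that the input contains only the lowercase letters 'a' to 'z'; the function name `is_valid_len_lowercase` is hypothetical.

```python
def is_valid_len_lowercase(s: str, length: int, k: int) -> bool:
    """Window test using 26 counters. Assumes s contains only 'a'..'z'."""
    counts = [0] * 26
    distinct = 0  # number of letters with a non-zero count in the window
    for i, ch in enumerate(s):
        idx = ord(ch) - ord('a')
        counts[idx] += 1
        if counts[idx] == 1:
            distinct += 1
        if i >= length:
            # Drop the character that just left the window.
            old = ord(s[i - length]) - ord('a')
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        if i >= length - 1 and distinct <= k:
            return True
    # Only the empty window is feasible when no full window was found.
    return length == 0


print(is_valid_len_lowercase("aabacbebebe", 7, 3))  # True
print(is_valid_len_lowercase("aabacbebebe", 8, 3))  # False
```

Tracking `distinct` alongside the counters avoids rescanning the 26 slots after every step.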