Python - Bigrams Frequency in String
Last Updated : 12 Apr, 2023
When working with text data in Python, we sometimes need to extract bigrams (pairs of adjacent characters) from a string, a common task in NLP. Often we also need the frequency of each unique bigram. Let's discuss several ways to perform this task.
Method #1 : Using Counter() + generator expression
A generator expression slices the string into overlapping two-character windows, and Counter() tallies how often each window occurs.
Python3
# Python3 code to demonstrate working of
# Bigrams Frequency in String
# Using Counter() + generator expression
from collections import Counter
# initializing string
test_str = 'geeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# Bigrams Frequency in String
# Using Counter() + generator expression
res = Counter(test_str[idx : idx + 2] for idx in range(len(test_str) - 1))
# printing result
print("The Bigrams Frequency is : " + str(dict(res)))
Output :
The original string is : geeksforgeeks
The Bigrams Frequency is : {'ge': 2, 'ee': 2, 'ek': 2, 'ks': 2, 'sf': 1, 'fo': 1, 'or': 1, 'rg': 1}
Method #2 : Using Counter() + zip() + map() + join
Here zip() pairs each character with its successor, map() with ''.join turns each pair into a two-character string, and Counter() computes the frequencies.
Python3
# Python3 code to demonstrate working of
# Bigrams Frequency in String
# Using Counter() + zip() + map() + join
from collections import Counter
# initializing string
test_str = 'geeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# Bigrams Frequency in String
# Using Counter() + zip() + map() + join
res = Counter(map(''.join, zip(test_str, test_str[1:])))
# printing result
print("The Bigrams Frequency is : " + str(dict(res)))
Output :
The original string is : geeksforgeeks
The Bigrams Frequency is : {'ge': 2, 'ee': 2, 'ek': 2, 'ks': 2, 'sf': 1, 'fo': 1, 'or': 1, 'rg': 1}
Time Complexity: O(n)
Auxiliary Space: O(n)
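As a quick sanity check, the two Counter-based methods above can be wrapped in functions and compared. This is a minimal sketch; the function names bigrams_by_slicing and bigrams_by_zip are hypothetical helpers introduced here for illustration only.

```python
from collections import Counter


def bigrams_by_slicing(text):
    # Method #1: slice a two-character window starting at every index.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigrams_by_zip(text):
    # Method #2: pair each character with its successor, then join the pair.
    return Counter(map(''.join, zip(text, text[1:])))


sample = 'geeksforgeeks'
slicing_counts = bigrams_by_slicing(sample)
zip_counts = bigrams_by_zip(sample)

# Both approaches should agree on every bigram count.
print(slicing_counts == zip_counts)  # True
```

Both helpers return an empty Counter for strings shorter than two characters, since there is no window to slice and nothing for zip() to pair.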
Method #3 : Using a loop and a dictionary
A plain dictionary keeps track of the bigram frequencies:
- Initialize an empty dictionary to keep track of the bigram frequencies.
- Loop through the characters in the input string, starting from the second character.
- For each character, get the previous character and concatenate them to form a bigram.
- Check if the bigram is already in the dictionary.
- If the bigram is not in the dictionary, add it with a frequency of 1.
- If the bigram is already in the dictionary, increment its frequency by 1.
- Print the bigram frequencies.
Python3
# Python3 code to demonstrate working of
# Bigrams Frequency in String
# Using a loop and dictionary
# initializing string
test_str = 'geeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# Bigrams Frequency in String
# Using a loop and dictionary
freq_dict = {}
for i in range(1, len(test_str)):
    # bigram made of the previous and the current character
    bigram = test_str[i-1:i+1]
    if bigram in freq_dict:
        freq_dict[bigram] += 1
    else:
        freq_dict[bigram] = 1
# printing result
print("The Bigrams Frequency is : " + str(freq_dict))
Output :
The original string is : geeksforgeeks
The Bigrams Frequency is : {'ge': 2, 'ee': 2, 'ek': 2, 'ks': 2, 'sf': 1, 'fo': 1, 'or': 1, 'rg': 1}
Time complexity: O(n), where n is the length of the input string.
Auxiliary space: O(k), where k is the number of unique bigrams in the input string.
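The membership check in the loop can be avoided with collections.defaultdict, which supplies a default count of 0 for unseen keys. The following is a minimal sketch of that variant; the function name bigram_frequencies is a hypothetical helper, not part of the original method.

```python
from collections import defaultdict


def bigram_frequencies(text):
    # Missing keys start at int() == 0, so no explicit "in" check is needed.
    counts = defaultdict(int)
    for i in range(1, len(text)):
        counts[text[i - 1:i + 1]] += 1
    # Convert back to a plain dict for printing and comparison.
    return dict(counts)


print(bigram_frequencies('geeksforgeeks'))
# {'ge': 2, 'ee': 2, 'ek': 2, 'ks': 2, 'sf': 1, 'fo': 1, 'or': 1, 'rg': 1}
```

The time and space behaviour is the same as the explicit if/else version; only the bookkeeping is shorter.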
Method #4 : Using count() method
Approach
- Create an empty dictionary freq_dict, then loop over test_str and append every bigram, obtained by slicing, to a list x.
- Loop over x and store each bigram as a key in freq_dict, with test_str.count(bigram) as its value.
- Display the dictionary
Python3
# Python3 code to demonstrate working of
# Bigrams Frequency in String
# Using count() method
# initializing string
test_str = 'geeksforgeeks'
# printing original string
print("The original string is : " + str(test_str))
# Bigrams Frequency in String
# Using count() method
freq_dict = {}
x = []
# collect every bigram of the string
for i in range(1, len(test_str)):
    bigram = test_str[i-1:i+1]
    x.append(bigram)
# map each bigram to its number of occurrences in the string
for i in x:
    freq_dict[i] = test_str.count(i)
# printing result
print("The Bigrams Frequency is : " + str(freq_dict))
Output :
The original string is : geeksforgeeks
The Bigrams Frequency is : {'ge': 2, 'ee': 2, 'ek': 2, 'ks': 2, 'sf': 1, 'fo': 1, 'or': 1, 'rg': 1}
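Time Complexity : O(n^2), where n is the length of the input string, because count() scans the whole string once for each of the n - 1 bigrams in x.
Auxiliary Space : O(n), for the list x and the dictionary freq_dict.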
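One caveat with count(): str.count() counts non-overlapping occurrences, so it can report fewer bigrams than the sliding-window methods when a character repeats. The sketch below illustrates the difference on a hypothetical input 'aaaa', chosen only for demonstration.

```python
from collections import Counter

text = 'aaaa'

# Sliding windows start at indices 0, 1 and 2, so 'aa' is seen three times.
windows = Counter(text[i:i + 2] for i in range(len(text) - 1))
print(windows['aa'])     # 3

# str.count() skips past each match, so it finds only two occurrences.
print(text.count('aa'))  # 2
```

For inputs such as 'geeksforgeeks', where no bigram overlaps with itself, both approaches give the same result.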