Python | Remove all duplicate words from a given sentence
Last Updated :
30 Dec, 2024
The goal is to process a sentence so that all duplicate words are removed, leaving only the first occurrence of each word. The final output should keep the words in the order in which they appeared in the original sentence. Let's look at different ways to achieve this:
Using set with join()
A set in Python is an unordered collection that automatically discards duplicate values, so converting the list of words to a set is the simplest way to remove duplicates, and it is fast. However, a set does not maintain insertion order, so this method is not suitable when the original word order must be kept.
Python
s1 = "Geeks for Geeks"
s2 = s1.split() # Split the sentence into words
# Convert the list to a set and back to a list to remove duplicates
s3 = list(set(s2))
# Join the list back into a sentence
s4 = ' '.join(s3)
print(s4)
Explanation:
- The sentence s1 = "Geeks for Geeks" is split into individual words using the split() method. This results in the list s2 = ["Geeks", "for", "Geeks"].
- The list s2 is converted into a set using set(s2). This automatically removes duplicates because sets do not allow repeated elements. The set will be {"Geeks", "for"}.
- The set is then converted back into a list with list(set(s2)), and the words are joined with ' '.join(s3). Because a set has no defined order, the order of elements may not be preserved, so the output can be either "Geeks for" or "for Geeks".
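If a set is still used for the deduplication step, the original order can be recovered afterwards. The following is a minimal sketch, not part of the original method: it sorts the unique words by the position of their first occurrence. The variable names words, unique_in_order and restored are illustrative. Note that words.index() scans the list on every call, so this is only reasonable for short sentences.

```python
s1 = "Geeks for Geeks"
words = s1.split()  # ["Geeks", "for", "Geeks"]

# set() removes duplicates; sorting by the index of each word's
# first occurrence restores the original order
unique_in_order = sorted(set(words), key=words.index)

restored = ' '.join(unique_in_order)
print(restored)  # Geeks for
```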
Other methods that remove duplicate words from a sentence while preserving the original order are:
Using List Comprehension with Set
This method uses a list comprehension, which is a concise way to create lists, together with a set that tracks the words already seen. Because checking membership in a set is fast, this removes duplicates efficiently while maintaining the original order of the words in the sentence.
Python
s1 = "Geeks for Geeks"
a = s1.split() # Split the sentence into words
seen = set() # Set to track unique words
# Use list comprehension to filter out duplicates while maintaining order
res = [word for word in a if not (word in seen or seen.add(word))]
# Join the list back into a sentence
s2 = ' '.join(res)
print(s2)
Explanation:
- An empty set seen is created to keep track of the words that have already been encountered.
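- A list comprehension iterates over a. For a word that has not been seen yet, word in seen is False, so seen.add(word) runs; it adds the word to the set and returns None, which makes the condition not (False or None) evaluate to True, and the word is kept. For a repeated word, word in seen is True, so the condition is False and the word is skipped. This keeps only the first occurrence of each word while maintaining the original order.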
- The unique words are joined back into a string with ' '.join(res), resulting in s2 = "Geeks for".
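The condition relies on the fact that set.add() always returns None. The short snippet below is only an illustration of that behaviour; the names first and second are made up for the demonstration and do not appear in the method above.

```python
seen = set()

# First time: "Geeks" is not in seen, so the right side of `or` runs.
# seen.add() adds the word and returns None, so the expression is None (falsy).
first = "Geeks" in seen or seen.add("Geeks")

# Second time: "Geeks" is already in seen, so the expression is True
# and seen.add() is never called.
second = "Geeks" in seen or seen.add("Geeks")

print(first, second)  # None True
```

Wrapping the expression in not (...) therefore yields True only for the first occurrence of a word.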
Using dict.fromkeys()
This method uses a dictionary. In Python 3.7 and later, dictionaries are guaranteed to remember the order in which keys are inserted. By using dict.fromkeys(), we can remove duplicates while keeping the order of the words intact.
Python
s1 = "Geeks for Geeks"
s2 = s1.split() # Split the sentence into words
# Use a dictionary to remove duplicates and preserve order
s3 = list(dict.fromkeys(s2))
# Join the list back into a sentence
s4 = ' '.join(s3)
print(s4)
Explanation:
- dict.fromkeys(s2) creates a dictionary where each word in s2 becomes a key. Since dictionaries do not allow duplicate keys, this automatically removes any duplicates while preserving the order. Converting the dictionary back to a list with list() gives s3 = ["Geeks", "for"].
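As a rough sketch, the same idea can be wrapped in a small reusable function. The function name remove_duplicate_words and the second test sentence are assumptions made for illustration. The snippet also prints the intermediate dictionary to show that dict.fromkeys() stores each word as a key with the value None.

```python
def remove_duplicate_words(sentence):
    """Hypothetical helper: keep only the first occurrence of each word."""
    # Joining a dict iterates over its keys in insertion order
    return ' '.join(dict.fromkeys(sentence.split()))

keys = dict.fromkeys("Geeks for Geeks".split())
print(keys)  # {'Geeks': None, 'for': None}

result = remove_duplicate_words("the quick the lazy quick dog")
print(result)  # the quick lazy dog
```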
Using Simple Loop
This method is the most basic way of removing duplicates, but it can be slower for longer sentences because each membership check scans the whole result list. It loops through each word, checks whether it has already been added to the result list, and adds it only if it hasn't appeared before.
Python
s1 = "Geeks for Geeks"
s2 = s1.split() # Split the sentence into words
res = [] # List to store unique words
# Loop through words and add only unique ones to the result
for word in s2:
    if word not in res:
        res.append(word)
# Join the list back into a sentence
s3 = ' '.join(res)
print(s3)
Explanation:
- An empty list res is initialized to store the unique words.
- The for loop iterates through each word in s2. If the word is not already in res, it is appended to res. This ensures that only the first occurrence of each word is added.
- The words in res are joined back into a string with ' '.join(res), resulting in s3 = "Geeks for".
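One possible way to keep the readability of the loop while avoiding the repeated list scans is to track seen words in a set alongside the result list. This is a sketch under that assumption rather than part of the method above; the function name dedupe_words is illustrative.

```python
def dedupe_words(sentence):
    """Hypothetical helper: loop version with a set for fast membership checks."""
    seen = set()  # words encountered so far
    res = []      # unique words in their original order
    for word in sentence.split():
        if word not in seen:
            seen.add(word)
            res.append(word)
    return ' '.join(res)

print(dedupe_words("Geeks for Geeks"))  # Geeks for
```

The list still determines the output order, while the set is used only to answer "has this word appeared before?".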