
Python | Remove all duplicate words from a given sentence

Last Updated : 30 Dec, 2024

The goal is to process a sentence so that all duplicate words are removed and only the first occurrence of each word is kept. The output should keep the words in the order in which they appeared in the original sentence. Let's look at several ways to achieve this:
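All of the methods below split the sentence with str.split(), so a "word" is any run of non-whitespace characters and comparisons are case-sensitive: "Geeks" and "geeks" count as different words. If letter case should be ignored, one possible approach (an assumption, not one of the methods in this article) is to compare a casefolded key while keeping the first spelling seen. The helper name dedupe_ignore_case is hypothetical.

```python
def dedupe_ignore_case(sentence):
    """Hypothetical helper: drop repeated words, ignoring letter case.

    The first spelling of each word is kept and the original order is preserved.
    """
    seen = set()    # casefolded forms of the words already kept
    result = []
    for word in sentence.split():
        key = word.casefold()   # case-insensitive comparison key
        if key not in seen:
            seen.add(key)
            result.append(word)  # keep the original spelling
    return ' '.join(result)


print("Geeks for geeks".split())              # ['Geeks', 'for', 'geeks']
print(dedupe_ignore_case("Geeks for geeks"))  # Geeks for
```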

Using set with join()

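A set in Python is a collection that automatically discards duplicate values, which makes this method short and fast. However, a set does not preserve the order of its elements, so the words can come back in any order (for strings, the order can even change from one run to the next). This method therefore does not meet the order requirement above and is only suitable when word order does not matter.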

Python
s1 = "Geeks for Geeks"
s2 = s1.split()  # Split the sentence into words

# Convert the list to a set and back to a list to remove duplicates
s3 = list(set(s2))

# Join the list back into a sentence
s4 = ' '.join(s3)
print(s4)

Output (one possible result; the word order may differ between runs)
for Geeks

Explanation:

  • The sentence s1 = "Geeks for Geeks" is split into individual words with the split() method, giving the list s2 = ["Geeks", "for", "Geeks"].
  • The list s2 is converted into a set with set(s2). Duplicates disappear because a set cannot hold repeated elements, so the set contains only "Geeks" and "for".
  • The set is converted back into a list with list(set(s2)), and the list is joined into a string with ' '.join(s3). Because sets are unordered, s3 may be ["for", "Geeks"] or ["Geeks", "for"], which is why the output can read "for Geeks".
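If you want to keep the set-based approach but still meet the order requirement, one option (a variation, not the method shown above) is to sort the unique words by the position where each first appears, using list.index() as the sort key. Since list.index() scans the list once per unique word, this sketch is best suited to short sentences.

```python
s1 = "Geeks for Geeks"
words = s1.split()  # ['Geeks', 'for', 'Geeks']

# set() removes duplicates; sorting by words.index restores first-occurrence order
unique_in_order = sorted(set(words), key=words.index)

print(' '.join(unique_in_order))  # Geeks for
```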

The following methods remove duplicate words while keeping the words in their original order:

Using List Comprehension with Set

This method uses a list comprehension, a concise way to build lists, together with a set that tracks the words already seen. Checking membership in a set is fast, so duplicates are removed efficiently while the words stay in their original order.

Python
s1 = "Geeks for Geeks"
a = s1.split()  # Split the sentence into words
seen = set()  # Set to track unique words

# Use list comprehension to filter out duplicates while maintaining order
res = [word for word in a if not (word in seen or seen.add(word))]

# Join the list back into a sentence
s2 = ' '.join(res)
print(s2)

Output
Geeks for

Explanation:

  • An empty set seen is created to keep track of the words that have already been encountered.
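  • A list comprehension iterates over a. For a word already in seen, word in seen is True, so the condition is False and the word is skipped. For a new word, seen.add(word) runs and returns None, so not (False or None) is True: the word is kept and is now recorded in seen. Each word is therefore kept only the first time it appears, in its original order.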
  • The unique words are joined back into a string with ' '.join(res), resulting in s2 = "Geeks for".
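Unpacked into an explicit loop, the comprehension's condition could look like the sketch below. It is an equivalent rewrite for readability, not a different algorithm; the first print confirms that set.add() returns None.

```python
s1 = "Geeks for Geeks"
a = s1.split()
seen = set()

print(seen.add("example"))  # None: set.add() has no return value
seen.clear()                # reset the set before the real pass

res = []
for word in a:
    if word in seen:    # already kept once: skip it
        continue
    seen.add(word)      # remember the word
    res.append(word)    # keep its first occurrence

print(' '.join(res))  # Geeks for
```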

Using dict.fromkeys()

This method uses a dictionary. Since Python 3.7, dictionaries are guaranteed to preserve insertion order, so dict.fromkeys() removes duplicates while keeping the words in their original order.

Python
s1 = "Geeks for Geeks"
s2 = s1.split()  # Split the sentence into words

# Use a dictionary to remove duplicates and preserve order
s3 = list(dict.fromkeys(s2))

# Join the list back into a sentence
s4 = ' '.join(s3)
print(s4)

Output
Geeks for

Explanation:

  • dict.fromkeys(s2) creates a dictionary whose keys are the words in s2. Dictionary keys are unique, so a repeated word does not create a new entry, and each key keeps the position of its first occurrence. Converting the dictionary to a list with list() returns its keys, giving s3 = ["Geeks", "for"].
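Printing the intermediate values shows what this method builds: each key maps to None, the default value dict.fromkeys() assigns when no value argument is given.

```python
s2 = "Geeks for Geeks".split()

d = dict.fromkeys(s2)
print(d)        # {'Geeks': None, 'for': None}
print(list(d))  # ['Geeks', 'for']
```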

Using Simple Loop

This is the most basic method, but it becomes slow on long sentences: word not in res scans the result list one element at a time, so the total work grows quadratically with the number of words. The loop goes through each word, checks whether it has already been added to the result list, and appends it only if it has not.

Python
s1 = "Geeks for Geeks"
s2 = s1.split()  # Split the sentence into words
res = []  # List to store unique words

# Loop through words and add only unique ones to the result
for word in s2:
    if word not in res:
        res.append(word)

# Join the list back into a sentence
s3 = ' '.join(res)
print(s3)

Output
Geeks for

Explanation:

  • An empty list res is initialized to store the unique words.
  • A for loop iterates through each word in s2. If the word is not already in res, it is appended to res, so only the first occurrence of each word is kept. Finally, ' '.join(res) produces s3 = "Geeks for".
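To see the speed difference on your own machine, one rough check is to time this loop against dict.fromkeys() with the standard timeit module. The generated sentence is an arbitrary test input and timings vary between machines, so no numbers are shown.

```python
import timeit

# Arbitrary test input: 2,000 distinct words, each appearing twice
words = [f"w{i}" for i in range(2000)] * 2
sentence = ' '.join(words)

def loop_with_list(s):
    res = []
    for word in s.split():
        if word not in res:   # linear scan of res on every check
            res.append(word)
    return ' '.join(res)

def with_dict_fromkeys(s):
    return ' '.join(dict.fromkeys(s.split()))  # hashed key lookups

# Both approaches must agree before their speed is compared
assert loop_with_list(sentence) == with_dict_fromkeys(sentence)

for fn in (loop_with_list, with_dict_fromkeys):
    seconds = timeit.timeit(lambda: fn(sentence), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```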
