FuzzyWuzzy Python Library
Last Updated: 29 Sep, 2024
There are many methods of comparing strings in Python. Some of the main methods are:
- Using regex
- Simple compare
- Using difflib
But one of the easiest methods is to use the fuzzywuzzy library, which gives a similarity score out of 100 that indicates how closely two strings match. This article shows how to get started with the fuzzywuzzy library. FuzzyWuzzy is a Python library for string matching. Fuzzy string matching is the process of finding strings that approximately match a given pattern. Under the hood, it uses Levenshtein Distance to calculate the differences between sequences.
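To make the idea concrete, here is a minimal pure-Python sketch of the classic Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. It is an illustration only, not FuzzyWuzzy's internal code (the function name levenshtein_distance is hypothetical), and FuzzyWuzzy reports a normalized 0-100 score rather than this raw count.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    # Row for the empty prefix of `a`: turning "" into b[:j] costs j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" costs i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)  # free if chars match
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein_distance("geeksforgeeks", "geeksgeeks"))  # 3: delete "f", "o", "r"
print(levenshtein_distance("kitten", "sitting"))            # 3
```

The two-row layout keeps memory proportional to the length of the second string instead of building the full table.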
FuzzyWuzzy was developed and open-sourced by SeatGeek, a service for finding sports and concert tickets. Their original use case is discussed in their blog.
Requirements of fuzzywuzzy
- Python 2.7 or higher
- python-Levenshtein (optional, used for speed)
- difflib (part of the Python standard library)
Install via pip:
pip install fuzzywuzzy
pip install python-Levenshtein
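After installing, a quick way to check whether the optional python-Levenshtein backend is present is to look for its importable module, which is named Levenshtein. The helper has_fast_backend below is a hypothetical name used only for illustration. When the module is missing, fuzzywuzzy still works but falls back to the slower pure-Python difflib.SequenceMatcher and emits a warning on import.

```python
import importlib.util


def has_fast_backend() -> bool:
    """Return True if the optional python-Levenshtein package is importable.

    python-Levenshtein installs a module named `Levenshtein`; without it,
    fuzzywuzzy falls back to difflib.SequenceMatcher.
    """
    return importlib.util.find_spec("Levenshtein") is not None


print("python-Levenshtein available:", has_fast_backend())
```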
How to Use the FuzzyWuzzy Python Library?
First, import these modules:
Python
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
Simple ratio usage:
Python
fuzz.ratio('geeksforgeeks', 'geeksgeeks')
87
# Exact match
fuzz.ratio('GeeksforGeeks', 'GeeksforGeeks')
100
fuzz.ratio('geeks for geeks', 'Geeks For Geeks ')
77
# ratio is case-sensitive, so the capital letters and the trailing space lower the score
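For intuition, fuzz.ratio can be approximated with the standard library alone: it is essentially 100 × 2M/T, where M is the number of matching characters and T is the total length of both strings, rounded to an integer. The sketch below assumes that formula and no preprocessing; simple_ratio is a hypothetical helper built on difflib.SequenceMatcher, not part of FuzzyWuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Approximate fuzz.ratio: 100 * 2*M / T, rounded to an integer.

    M = matching characters found by SequenceMatcher, T = len(a) + len(b).
    The comparison is case-sensitive and nothing is preprocessed.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio("geeksforgeeks", "geeksgeeks"))          # 87
print(simple_ratio("geeks for geeks", "Geeks For Geeks "))  # 77
print(simple_ratio("geeks for geeks!!!", "geeks for geeks"))  # 91
```

For the first pair, all 10 characters of "geeksgeeks" are matched, so the score is 100 × 20/23 ≈ 86.96, which rounds to 87.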
Partial ratio usage:
Python
fuzz.partial_ratio("geeks for geeks", "geeks for geeks!")
100
# The second string has an extra exclamation mark, but the first string
# appears in it unchanged, so the partial ratio is 100
fuzz.partial_ratio("geeks for geeks", "geeks geeks")
64
# The score is lower because the first string has an extra token ("for")
# in the middle that the second string lacks.
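The idea behind partial_ratio can be sketched by sliding the shorter string across the longer one and keeping the best score among equal-length windows. This brute-force version is a simplification: FuzzyWuzzy's own implementation only checks windows aligned with matching blocks found by its sequence matcher, so results may occasionally differ. partial_ratio_sketch is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Simplified partial ratio: best similarity between the shorter string
    and every same-length window of the longer string (brute force)."""
    shorter, longer = sorted((a, b), key=len)
    window = len(shorter)
    best = 0.0
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
    return round(100 * best)


print(partial_ratio_sketch("geeks for geeks", "geeks for geeks!"))  # 100
print(partial_ratio_sketch("geeks for geeks", "geeks geeks"))       # 64
```

Checking every window costs more than the block-based approach, but it makes the "best matching substring" idea easy to follow.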
Now, token sort ratio and token set ratio:
Python
# Token Sort Ratio
fuzz.token_sort_ratio("geeks for geeks", "for geeks geeks")
100
# This gives 100 because both strings contain the same words; only their order differs
# Token Set Ratio (compared with Token Sort Ratio on a string with a repeated word)
fuzz.token_sort_ratio("geeks for geeks", "geeks for for geeks")
88
fuzz.token_set_ratio("geeks for geeks", "geeks for for geeks")
100
# The score is 100 in the second case because token_set_ratio compares sets of words,
# so the duplicated "for" counts only once.
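Both token-based ratios can be sketched with the standard library: split each string into words, then either sort the words (token sort) or compare the shared and leftover word sets (token set). This is a simplified approximation that assumes a regex-based cleanup step in place of FuzzyWuzzy's own preprocessing; all helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List


def _ratio(a: str, b: str) -> float:
    # 0-100 similarity based on difflib.SequenceMatcher
    return 100 * SequenceMatcher(None, a, b).ratio()


def _tokens(s: str) -> List[str]:
    # Rough stand-in for FuzzyWuzzy's preprocessing: replace non-alphanumeric
    # characters with spaces, lowercase, and split on whitespace.
    return re.sub(r"\W+", " ", s).lower().split()


def token_sort_sketch(a: str, b: str) -> int:
    # Sort the words alphabetically so word order no longer matters.
    return round(_ratio(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b)))))


def token_set_sketch(a: str, b: str) -> int:
    # Work with sets of words, so repeated words count only once.
    set_a, set_b = set(_tokens(a)), set(_tokens(b))
    common = " ".join(sorted(set_a & set_b))
    combined_a = f"{common} {' '.join(sorted(set_a - set_b))}".strip()
    combined_b = f"{common} {' '.join(sorted(set_b - set_a))}".strip()
    # Compare the shared words with each "shared + leftover" string and
    # keep the best of the three pairwise scores.
    return round(max(_ratio(common, combined_a),
                     _ratio(common, combined_b),
                     _ratio(combined_a, combined_b)))


print(token_sort_sketch("geeks for geeks", "for geeks geeks"))      # 100
print(token_sort_sketch("geeks for geeks", "geeks for for geeks"))  # 88
print(token_set_sketch("geeks for geeks", "geeks for for geeks"))   # 100
```

With these helpers, the three examples above produce the same scores as the library: 100, 88 and 100.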
Now suppose we have a list of options and want to find the closest match(es). For this, we can use the process module:
Python
query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
# Get a list of matches ordered by score (at most 5 by default)
process.extract(query, choices)
[('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]
# If we want only the top one
process.extractOne(query, choices)
('g. for geeks', 95)
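Conceptually, process.extract scores each choice against the query and keeps the best `limit` results; by default FuzzyWuzzy also preprocesses the strings and uses WRatio as the scorer. The sketch below reproduces only the score, sort and truncate step, assuming a plain case-sensitive difflib ratio as a stand-in scorer. extract_sketch and difflib_score are hypothetical names, so the scores differ from the library's.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def difflib_score(a: str, b: str) -> int:
    # Stand-in scorer: rounded 0-100 SequenceMatcher similarity.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_sketch(query: str,
                   choices: List[str],
                   scorer: Callable[[str, str], int] = difflib_score,
                   limit: int = 5) -> List[Tuple[str, int]]:
    """Score every choice against the query and return the `limit` best
    (choice, score) pairs, highest score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
print(extract_sketch(query, choices))
# [('geek for geek', 93), ('g. for geeks', 81), ('geek geek', 75)]
print(extract_sketch(query, choices, limit=1)[0])
# ('geek for geek', 93)
```

Passing a different scorer, such as a weighted ratio, changes the ranking without changing the selection logic.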
Another frequently used ratio is WRatio (weighted ratio). It is often better to use WRatio instead of the simple ratio, because WRatio ignores differences in upper and lower case and punctuation, and combines several of the other ratios into a single score.
Python
fuzz.WRatio('geeks for geeks', 'Geeks For Geeks')
100
fuzz.WRatio('geeks for geeks!!!','geeks for geeks')
100
# whereas the simple ratio gives a lower score for the same pair
fuzz.ratio('geeks for geeks!!!','geeks for geeks')
91
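The following rough sketch shows how WRatio combines scores, under stated simplifications: strings are normalized with a regular expression (an approximation of FuzzyWuzzy's preprocessing), and only the branch for strings of similar length is modeled, taking the best of the plain ratio and 0.95 times the token sort and token set ratios. wratio_sketch is a hypothetical name; the real WRatio additionally uses scaled partial ratios when one string is much longer than the other.

```python
import re
from difflib import SequenceMatcher


def _normalize(s: str) -> str:
    # Approximation of FuzzyWuzzy's preprocessing: non-alphanumerics -> spaces,
    # lowercase, trim surrounding whitespace.
    return re.sub(r"\W+", " ", s).lower().strip()


def _ratio(a: str, b: str) -> float:
    return 100 * SequenceMatcher(None, a, b).ratio()


def wratio_sketch(a: str, b: str) -> int:
    """Simplified weighted ratio for strings of similar length: the best of
    the plain ratio and 0.95 x the token sort / token set ratios, computed on
    normalized text. The partial-ratio branch is deliberately omitted."""
    a, b = _normalize(a), _normalize(b)
    if not a or not b:
        return 0
    words_a, words_b = a.split(), b.split()
    token_sort = _ratio(" ".join(sorted(words_a)), " ".join(sorted(words_b)))
    set_a, set_b = set(words_a), set(words_b)
    common = " ".join(sorted(set_a & set_b))
    rest_a = f"{common} {' '.join(sorted(set_a - set_b))}".strip()
    rest_b = f"{common} {' '.join(sorted(set_b - set_a))}".strip()
    token_set = max(_ratio(common, rest_a), _ratio(common, rest_b), _ratio(rest_a, rest_b))
    return round(max(_ratio(a, b), 0.95 * token_sort, 0.95 * token_set))


print(wratio_sketch('geeks for geeks', 'Geeks For Geeks'))     # 100
print(wratio_sketch('geeks for geeks!!!', 'geeks for geeks'))  # 100
print(wratio_sketch('geeks for geeks', 'g. for geeks'))        # 95
```

Normalization is what turns the case and punctuation differences into perfect scores, while the 0.95 weight keeps token-based matches slightly below an exact match.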
Full Code
Python
# Python 3 code showing all the ratios together;
# make sure you have installed the fuzzywuzzy module (pip install fuzzywuzzy)
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
s1 = "I love GeeksforGeeks"
s2 = "I am loving GeeksforGeeks"
print("FuzzyWuzzy Ratio:", fuzz.ratio(s1, s2))
print("FuzzyWuzzy PartialRatio:", fuzz.partial_ratio(s1, s2))
print("FuzzyWuzzy TokenSortRatio:", fuzz.token_sort_ratio(s1, s2))
print("FuzzyWuzzy TokenSetRatio:", fuzz.token_set_ratio(s1, s2))
print("FuzzyWuzzy WRatio:", fuzz.WRatio(s1, s2))
# using the process module to rank a list of choices
query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
print("List of ratios:")
print(process.extract(query, choices))
print("Best among the above list:", process.extractOne(query, choices))
Output:
FuzzyWuzzy Ratio: 84
FuzzyWuzzy PartialRatio: 85
FuzzyWuzzy TokenSortRatio: 84
FuzzyWuzzy TokenSetRatio: 86
FuzzyWuzzy WRatio: 84
List of ratios:
[('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]
Best among the above list: ('g. for geeks', 95)
The FuzzyWuzzy library is built on top of the difflib library, and python-Levenshtein, when installed, is used for speed. This makes it one of the most convenient ways to do string matching in Python.
Conclusion
The FuzzyWuzzy library offers an efficient and straightforward approach for string comparison in Python. It simplifies the process of measuring similarity between strings by providing various ratios like Simple Ratio, Token Sort Ratio, and WRatio, making it highly versatile for different use cases. With its foundation built on Levenshtein Distance, FuzzyWuzzy is not only powerful but also easy to implement, requiring minimal setup via pip installation. Whether you’re working on text matching, data deduplication, or comparing user inputs, FuzzyWuzzy stands out as one of the best libraries for fuzzy string matching in Python.
FuzzyWuzzy Python Library - FAQs
1. What is FuzzyWuzzy used for in Python?
FuzzyWuzzy is a Python library used for fuzzy string matching, which helps find approximate matches between strings. It is commonly used for tasks like data deduplication, matching user inputs, and comparing text with minor differences by providing a similarity score.
2. How does FuzzyWuzzy calculate string similarity?
FuzzyWuzzy uses Levenshtein Distance to calculate the difference between two strings. It provides various ratios, such as Simple Ratio, Token Sort Ratio, and WRatio, to measure the similarity between strings and return a score out of 100.
3. What are the key features of FuzzyWuzzy?
Key features of FuzzyWuzzy include easy-to-use string comparison functions, multiple similarity ratios (like TokenSetRatio and WRatio), and support for finding the best match from a list of strings. Scorers such as WRatio and the token-based ratios also lowercase the strings and strip punctuation by default, so minor variations have less effect on the score.
4. When should I use WRatio over Simple Ratio in FuzzyWuzzy?
WRatio is more versatile than Simple Ratio because it ignores differences in case and punctuation and combines several ratios into one score. It is ideal when comparing strings with inconsistent casing or extra characters, offering a more robust comparison method.