Compare Two CSV Files Using Python
Last Updated: 30 Apr, 2025
We are given two CSV files, and our task is to compare them and report their differences in Python. In this article, we will see some commonly used methods for comparing two CSV files and printing the differences.
file1.csv contains
Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
file2.csv contains
Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
Using compare()
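The compare() method in pandas compares two DataFrames and returns their differences. It keeps only the rows and columns where the values differ, which suits structured data comparison. Both DataFrames must have identical row and column labels, and rows are matched by index position rather than by content.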
Python
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
# Compare DataFrames
res = df1.compare(df2)
print(res)
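Output
For the sample files, rows 1 and 2 differ in every column, so the result lists both rows, with the file1.csv value under self and the file2.csv value under other for Name, Age and City. Row 0 (John) is identical in both files and is omitted.
Explanation: It first reads file1.csv and file2.csv into two separate DataFrames, df1 and df2. The compare() method is then applied to identify the differences between the two DataFrames.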
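Because compare() matches rows by position and needs identically labeled DataFrames, it can be awkward when rows were added or removed. One alternative, not part of the method above, is an outer merge with pandas' indicator option. The following is a minimal sketch under the assumption that a row counts as "the same" only when every column matches; the sample data is inlined through io.StringIO so the snippet runs without the files on disk.

```python
import io
import pandas as pd

# Inline copies of the sample files (assumption: same content as file1.csv / file2.csv).
FILE1 = "Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n"
FILE2 = "Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n"

df1 = pd.read_csv(io.StringIO(FILE1))
df2 = pd.read_csv(io.StringIO(FILE2))

# Outer merge on all shared columns; the _merge column records
# whether a row came from the left frame, the right frame, or both.
merged = df1.merge(df2, how="outer", indicator=True)

only_in_file1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_in_file2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

print("Rows only in file1.csv:")
print(only_in_file1)
print("Rows only in file2.csv:")
print(only_in_file2)
```

With this approach a modified row (Michael, whose age changed) appears once on each side, since the merge treats it as two different rows.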
Using set operations
This method reads both files line-by-line and stores their content as sets. Using set difference (a - b) allows you to quickly identify lines that are present in one file but not the other.
Python
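with open('file1.csv') as f1, open('file2.csv') as f2:
    a = set(f1.readlines())
    b = set(f2.readlines())
# Lines present only in file1.csv
print(a - b)
# Lines present only in file2.csv
print(b - a)
Output
For the sample files, a - b contains the Emily,30,Los Angeles and Michael,40,Chicago lines, and b - a contains the Michael,45,Chicago and Emma,35,San Francisco lines. Each line keeps its newline character where one is present, and the order of elements within a printed set is arbitrary.
Explanation: It first opens file1.csv and file2.csv, reads their contents line by line and stores them as sets a and b. The difference a - b is printed to show lines present in file1.csv but not in file2.csv, and b - a shows lines present in file2.csv but not in file1.csv. Because sets are unordered and drop duplicates, this method ignores line order and repeated lines.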
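Set differences work on whole lines, so a row whose value changed shows up as one removed line and one added line. If the rows have a unique key, the same set operations can be applied to the keys instead. The sketch below assumes the Name column is unique in each file (an assumption made for illustration, not something the files guarantee); the helper load_by_key is a hypothetical name, and the data is inlined so the snippet is self-contained.

```python
import csv
import io

# Inline copies of the sample files (assumption: same content as file1.csv / file2.csv).
FILE1 = "Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n"
FILE2 = "Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n"


def load_by_key(text, key):
    """Map each row's key value to the full row dict (hypothetical helper)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


old = load_by_key(FILE1, "Name")
new = load_by_key(FILE2, "Name")

# Set operations on the keys separate removed and added rows.
removed = sorted(old.keys() - new.keys())
added = sorted(new.keys() - old.keys())

# For keys present in both files, record (old, new) for each differing column.
changed = {
    name: {
        col: (old[name][col], new[name][col])
        for col in old[name]
        if old[name][col] != new[name][col]
    }
    for name in sorted(old.keys() & new.keys())
    if old[name] != new[name]
}

print("removed:", removed)
print("added:", added)
print("changed:", changed)
```

Note that csv.DictReader returns every value as a string, so the ages are compared as text here.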
Using difflib
Python’s difflib module provides detailed differences between files, similar to Unix's diff command. It can generate unified or context diffs showing what was added, removed, or changed.
Python
import difflib
with open('file1.csv') as f1, open('file2.csv') as f2:
    # unified_diff returns a generator of diff lines
    d = difflib.unified_diff(f1.readlines(), f2.readlines(), fromfile='file1.csv', tofile='file2.csv')
    for line in d:
        print(line, end='')
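Output
For the sample files, the diff keeps the header and John rows as unchanged context, marks the Emily,30,Los Angeles and Michael,40,Chicago rows with a leading - and marks the Michael,45,Chicago and Emma,35,San Francisco rows with a leading +.
Explanation: It opens file1.csv and file2.csv, reads their contents, and uses difflib.unified_diff() to generate a line-by-line comparison. The output shows added, removed or changed lines between the two files in a unified diff format.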
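The context format mentioned above is produced by difflib.context_diff(), which takes the same arguments. The following is a rough sketch rather than a replacement for the example above; it inlines the sample data (assumed to match the two files) so it can run on its own, and collects the lines that the context format marks as changed with a leading "! ".

```python
import difflib

# Inline copies of the sample files (assumption: same content as file1.csv / file2.csv).
FILE1 = "Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n"
FILE2 = "Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n"

# keepends=True keeps the newline on each line, like readlines() does.
a = FILE1.splitlines(keepends=True)
b = FILE2.splitlines(keepends=True)

diff = list(difflib.context_diff(a, b, fromfile="file1.csv", tofile="file2.csv"))

# In a context diff, lines that were replaced start with "! ".
changed = [line[2:].rstrip("\n") for line in diff if line.startswith("! ")]

print("".join(diff), end="")
```

The context format prints the affected region of each file separately, which some readers find easier to scan than the interleaved unified format.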