Character Encoding Detection With Chardet in Python
Last Updated: 21 Mar, 2024
Text from files, websites, and other sources often arrives as raw bytes whose encoding is unknown. In this article, we will see how to detect the character encoding of such data with the Chardet library in Python.
Example:
Input: data = b'\xff\xfe\x41\x00\x42\x00\x43\x00'
Output: UTF-16
Explanation: Chardet detects the encoding of the given bytes as UTF-16. The leading bytes \xff\xfe are a UTF-16 little-endian byte order mark (BOM).
Character Encoding Detection With Chardet in Python
The examples below show how to detect character encoding with Chardet in Python:
Installing Chardet in Python
First, install chardet with the following command. The examples that follow assume it is installed:
pip install chardet
Example 1: Detecting Encoding of a String
In this example, the Python script uses the chardet library to detect the character encoding of a given byte sequence (data). chardet.detect() returns a dictionary that includes the detected encoding and a confidence level; the script prints only the encoding.
Python3
import chardet
# Byte sequence with unknown encoding
data = b'\xff\xfe\x41\x00\x42\x00\x43\x00'
# Detect the encoding
result = chardet.detect(data)
print(result['encoding'])
Output:
UTF-16
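As a minimal sketch building on the example above: the dictionary returned by chardet.detect() holds a 'confidence' value between 0 and 1 alongside 'encoding'. The helper below, decode_bytes (a hypothetical name, not part of chardet), uses the detected encoding to decode the bytes and falls back to an assumed default of UTF-8 when no encoding is detected.

```python
import chardet


def decode_bytes(data, fallback='utf-8'):
    """Decode bytes using the encoding chardet guesses (hypothetical helper)."""
    result = chardet.detect(data)
    # 'encoding' may be None when chardet cannot make a guess
    encoding = result['encoding'] or fallback
    print(encoding, result['confidence'])
    # errors='replace' avoids an exception if the guess turns out to be wrong
    return data.decode(encoding, errors='replace')


data = b'\xff\xfe\x41\x00\x42\x00\x43\x00'
print(decode_bytes(data))
```

For the sample bytes, the decoded text is ABC. The fallback encoding and the errors='replace' policy are illustrative choices; stricter code might raise an error when the confidence is low.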
Example 2: Detecting Encoding of a Website Content
In this example, the Python script uses the requests library to fetch the HTML content of the GeeksforGeeks webpage as raw bytes (response.content). The chardet library then detects the character encoding of the retrieved content, and the script prints the detected encoding.
Python3
import requests
import chardet
# Fetch the web page content
response = requests.get('https://www.geeksforgeeks.org/')
html_content = response.content
# Detect the encoding
result = chardet.detect(html_content)
print(result['encoding'])
Output:
utf-8
Example 3: Detecting Encoding of a Text File
In this example, the Python script reads the content of a text file ('utf-8.txt') in binary mode by calling open with the 'rb' mode. The chardet library then detects the character encoding of the file's content, and the script prints the detected encoding.
Python3
import chardet
# Read the text file
with open('utf-8.txt', 'rb') as f:
    data = f.read()
# Detect the encoding
result = chardet.detect(data)
print(result['encoding'])
Output:
utf-8
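Reading the whole file into memory, as above, may be impractical for large files. As a rough sketch, chardet's UniversalDetector class can be fed the file in pieces and stopped once it is confident. The function name detect_file_encoding and the 4096-byte chunk size are assumptions made for illustration.

```python
from chardet.universaldetector import UniversalDetector


def detect_file_encoding(path, chunk_size=4096):
    """Detect a file's encoding incrementally (hypothetical helper)."""
    detector = UniversalDetector()
    with open(path, 'rb') as f:
        # Read fixed-size chunks until the end of the file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            detector.feed(chunk)
            # Stop early once the detector has reached a conclusion
            if detector.done:
                break
    # close() finalizes the guess and fills in detector.result
    detector.close()
    return detector.result
```

Calling detect_file_encoding('utf-8.txt') returns the same kind of dictionary as chardet.detect(), so result['encoding'] can be printed in the same way.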