Python | Character Encoding Last Updated : 29 May, 2021 Summarize Comments Improve Suggest changes Share Like Article Like Report Finding the text which is having nonstandard character encoding is a very common step to perform in text processing. All the text would have been from utf-8 or ASCII encoding ideally but this might not be the case always. So, in such cases when the encoding is not known, such non-encoded text has to be detected and the be converted to a standard encoding. So, this step is important before processing the text further. Charade Installation : For performing the detection and conversion of encoding, charade - a Python library is required. This module can be simply installed using sudo easy_install charade or pip install charade. Let's see the wrapper function around the charade module. Code : encoding.detect(string), to detect the encoding Python3 # -*- coding: utf-8 -*- import charade def detect(s): try: # check it in the charade list if isinstance(s, str): return charade.detect(s.encode()) # detecting the string else: return charade.detect(s) # in case of error # encode with 'utf -8' encoding except UnicodeDecodeError: return charade.detect(s.encode('utf-8')) The detect functions will return 2 attributes : Confidence : the probability of charade being correct. Encoding : which encoding it is. Code : encoding.convert(string) to convert the encoding. Python3 # -*- coding: utf-8 -*- import charade def convert(s): # if in the charade instance if isinstance(s, str): s = s.encode() # retrieving the encoding information # from the detect() output encode = detect(s)['encoding'] if encode == 'utf-8': return s.decode() else: return s.decode(encoding) Code : Example Python3 # importing library import encoding d1 = encoding.detect('geek') print ("d1 is encoded as : ", d1) d2 = encoding.detect('ascii') print ("d2 is encoded as : ", d2) Output : d1 is encoded as : (confidence': 0.505, 'encoding': 'utf-8') d2 is encoded as : ('confidence': 1.0, 'encoding': 'ascii') detect() : It is a charade.detect() wrapper. It encodes the strings and handles the UnicodeDecodeError exceptions. It expects a bytes object so therefore the string is encoded before trying to detect the encoding.convert() : It is a charade.convert() wrapper. It calls detect() first to get the encoding. Then, it returns a decoded string. Comment More infoAdvertise with us Next Article Python | Character Encoding M mathemagic Follow Improve Article Tags : Python Natural-language-processing Practice Tags : python Similar Reads Python Escape Characters In Python, escape characters are used when we need to include special characters in a string that are otherwise hard (or illegal) to type directly. These are preceded by a backslash (\), which tells Python that the next character is going to be a special character. Theyâre especially helpful for:For 2 min read numpy.defchararray.encode() in Python numpy.core.defchararray.encode(arr, encoding): This numpy function encodes the string(object) based on the specified codec. Parameters: arr : array-like or string. encoding : [str] Name of encoding being followed. error : Specifying how to handle error. Returns : Encoded string Code:Â Python3 # Pyth 1 min read chr() Function in Python chr() function returns a string representing a character whose Unicode code point is the integer specified. chr() Example: Python3 num = 97 print("ASCII Value of 97 is: ", chr(num)) OutputASCII Value of 97 is: a Python chr() Function Syntaxchr(num) Parametersnum: an Unicode code integerRet 3 min read codecs.decode() in Python With the help of codecs.decode() method, we can decode the binary string into normal form by using codecs.decode() method. Syntax : codecs.decode(b_string) Return : Return the decoded string. Example #1 : In this example we can see that by using codecs.decode() method, we are able to get the decoded 1 min read base64.encodestring(s) in Python With the help of base64.encodestring(s) method, we can encode the string using base64 encoded data into the binary form. Syntax : base64.encodestring(string) Return : Return the encoded string. Example #1 : In this example we can see that by using base64.encodestring(s) method, we are able to get th 1 min read Like