Working with Binary Data in Python
Last Updated: 22 Jun, 2020
Alright, let's get this out of the way! The basics are pretty standard:
- There are 8 bits in a byte
- Each bit is either a 0 or a 1
- A byte can be written in different number systems, such as binary, octal, decimal, or hexadecimal
Note: These are not character encodings; those come later. They are just different ways (number systems) of writing the same set of 1's and 0's.
Examples:
Input : 10011011
Output :
1001 1011 ---- 9B (in hex)
1001 1011 ---- 155 (in decimal)
1001 1011 ---- 233 (in octal)
This shows that the same string of bits can be written in different number systems. We often use the hex form of a byte instead of the binary one because it is shorter to write. This is only a representation, not an interpretation: the underlying value stays the same.
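The conversions above can be reproduced with Python's built-in functions. This is a minimal sketch; the variable names are illustrative only.

```python
# The same byte value written in several notations.
value = 0b10011011          # binary literal

print(value)                # 155 (decimal)
print(bin(value))           # 0b10011011
print(oct(value))           # 0o233
print(hex(value))           # 0x9b

# int() parses a digit string in a given base back into the same integer.
assert int('10011011', 2) == int('233', 8) == int('9B', 16) == 155

# format() returns the digits without a prefix, zero-padded to a fixed width.
binary_text = format(value, '08b')
hex_text = format(value, '02X')
print(binary_text, hex_text)    # 10011011 9B
```

Whichever notation is used to write the number, Python stores a single integer; only the text form differs.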
Encoding
Now that we know what a byte is and what it looks like, let us see how it is interpreted, mainly in strings. Character encodings assign a byte, or a sequence of bytes, to each character in a given scheme. Some encodings are ASCII (probably the oldest), Latin-1, and UTF-8 (the most widely used today). In a sense, encodings are a way for computers to represent, send, and interpret human-readable characters. This means that text written in one encoding may become completely incomprehensible when it is read with another encoding.
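A small sketch of this idea, using the sample word 'café' (chosen here only for illustration): the same text produces different bytes under different encodings, and reading bytes with the wrong encoding changes the text.

```python
text = 'café'

utf8_bytes = text.encode('utf-8')       # 'é' takes two bytes in UTF-8
latin1_bytes = text.encode('latin-1')   # 'é' takes one byte in Latin-1

print(utf8_bytes)     # b'caf\xc3\xa9'
print(latin1_bytes)   # b'caf\xe9'
print(len(text), len(utf8_bytes), len(latin1_bytes))   # 4 5 4

# ASCII has no code for 'é', so encoding to ASCII fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)       # False

# Decoding UTF-8 bytes as Latin-1 does not fail, but the text is no longer
# the original: each of the two bytes of 'é' becomes its own character.
wrong = utf8_bytes.decode('latin-1')
print(wrong == text, len(wrong))   # False 5
```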
Python and Bytes
From a developer's point of view, the largest change in Python 3 is the handling of strings. In Python 2, the str type was used for two different kinds of values, text and bytes, whereas in Python 3 these are separate and incompatible types. Before Python 3 we could treat a set of bytes as a string and work from there. That is no longer the case: there is now a separate data type called bytes. It can be briefly described as an immutable sequence of bytes, so once a bytes object is created its contents cannot be changed.
Example:
Python3
bytestr = bytes(b'abc')
# initializing a string with b
# makes it a binary string
print(bytestr)
print(bytestr[0])
bytestr[0] = 97
Output:
b'abc'
97
Traceback (most recent call last):
  File "bytesExample.py", line 4, in <module>
    bytestr[0] = 97
TypeError: 'bytes' object does not support item assignment
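The separation between text and bytes can be seen directly. The following is a minimal sketch with illustrative variable names; it shows that the two types cannot be mixed and that "changing" a bytes object really means building a new one.

```python
text = 'abc'    # str: a sequence of Unicode characters
data = b'abc'   # bytes: a sequence of integers in range(256)

print(text == data)   # False: str and bytes never compare equal
print(list(data))     # [97, 98, 99]

# Mixing the two types raises TypeError instead of converting silently.
try:
    combined = text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)       # False

# Conversion has to be explicit and has to name an encoding.
print(text.encode('ascii') == data)   # True
print(data.decode('ascii') == text)   # True

# bytes is immutable, so a modified copy is built from slices.
changed = b'A' + data[1:]
print(changed)        # b'Abc'
```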
A bytestring is exactly what its name says: a string of bytes. For example, '© 𝌆 ☃' encoded in 'utf-8' is
b'\xC2\xA9\x20\xF0\x9D\x8C\x86\x20\xE2\x98\x83'
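This presents another problem: we need to know the encoding of a binary string, because the same bytes decoded with another encoding (latin-1) look different. The remaining bytes map to non-printable control characters, so only part of the result is visible:
Â© ð â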
Example:
Python3
print(b'\xC2\xA9\x20\xF0\x9D\x8C\x86\x20\xE2\x98\x83'.decode('utf-8'))
print(b'\xC2\xA9\x20\xF0\x9D\x8C\x86\x20\xE2\x98\x83'.decode('latin-1'))
Output:
© 𝌆 ☃
Â© ð â
As seen above, strings and binary strings can be converted into each other with the encode() and decode() methods. We need to know the encoding because some byte sequences cannot be decoded at all in some encodings. The problem compounds with non-Latin scripts such as Hebrew, Japanese, and Chinese, because in most encodings each of their characters takes more than one byte.
But what do we use when we need to modify a set of bytes? We use a bytearray.
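Before the bytearray example, a quick sketch of the decoding failure mentioned above may help. It reuses the byte string from the earlier example; the variable names are illustrative, and the `errors` argument shown is the standard optional parameter of `bytes.decode()`.

```python
data = b'\xC2\xA9\x20\xF0\x9D\x8C\x86\x20\xE2\x98\x83'

# ASCII only covers byte values 0-127, so the very first byte fails.
try:
    data.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start
print(failed_at)        # 0

# errors='replace' substitutes U+FFFD for every undecodable byte.
replaced = data.decode('ascii', errors='replace')
print(len(replaced))    # 11

# errors='ignore' silently drops the undecodable bytes.
ignored = data.decode('ascii', errors='ignore')
print(repr(ignored))    # '  '

# With the right encoding the bytes decode cleanly and round-trip.
decoded = data.decode('utf-8')
print([hex(ord(ch)) for ch in decoded])
# ['0xa9', '0x20', '0x1d306', '0x20', '0x2603']
assert decoded.encode('utf-8') == data
```

Both 'replace' and 'ignore' lose information, so they are best kept for display or logging rather than for data that must be written back.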
Example:
Python3
bytesArr = bytearray(b'\x00\x0F')
# Bytearray allows modification
bytesArr[0] = 255
bytesArr.append(255)
print(bytesArr)
Output:
bytearray(b'\xff\x0f\xff')
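A bytearray supports the usual mutable-sequence operations as well. The sketch below uses an illustrative variable called `buffer` and shows a few of them, plus the conversion back to an immutable bytes object.

```python
buffer = bytearray(b'\x00\x0F')

buffer.extend(b'\x10\x20')       # append several bytes at once
buffer[1:3] = b'\xAA\xBB\xCC'    # slice assignment may change the length
del buffer[0]                    # remove the first byte
print(buffer.hex())              # aabbcc20

# Every element must be an integer in range(256).
try:
    buffer.append(256)
    accepted = True
except ValueError:
    accepted = False
print(accepted)                  # False

# bytes() takes an immutable snapshot; later edits do not affect it.
frozen = bytes(buffer)
buffer[0] = 0
print(frozen.hex(), buffer.hex())   # aabbcc20 00bbcc20
```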
Bitwise Operations
In Python, bitwise operators perform calculations on integers bit by bit. The integers are treated as binary numbers and the operation is applied to each pair of corresponding bits, hence the name bitwise operators. The standard bitwise operations are demonstrated below.
Note: For more information, refer to Python Bitwise Operators
Example:
Python3
# Code to demonstrate bitwise operations
# Some bytes to play with
byte1 = int('11110000', 2) # 240
byte2 = int('00001111', 2) # 15
byte3 = int('01010101', 2) # 85
# Ones Complement (Flip the bits)
print(~byte1)
# AND
print(byte1 & byte2)
# OR
print(byte1 | byte2)
# XOR
print(byte1 ^ byte3)
# Shifting right by 3 drops the
# three right-most bits
print(byte2 >> 3)
# Shifting left will add a 0 bit
# on the right side
print(byte2 << 1)
# See if a single bit is set
bit_mask = int('00000001', 2) # Bit 1
# Is bit set in byte1?
print(bit_mask & byte1)
# Is bit set in byte2?
print(bit_mask & byte2)
Output:
-241
0
255
165
1
30
0
1
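The first result, -241, may look surprising for a "flipped byte". Python integers are not limited to 8 bits, so `~x` equals `-x - 1`. A minimal sketch, assuming we want to stay within a single byte, is to mask results with 0xFF; the same masks can be used to set, clear, and toggle individual bits. The name `flags` is illustrative only.

```python
byte1 = 0b11110000   # 240

# Keep only the low 8 bits of the complement.
inverted = ~byte1 & 0xFF
print(format(inverted, '08b'))   # 00001111

flags = 0b00000000
flags |= 1 << 3                  # set bit 3
flags |= 1 << 0                  # set bit 0
print(format(flags, '08b'))      # 00001001

flags &= ~(1 << 0) & 0xFF        # clear bit 0
print(format(flags, '08b'))      # 00001000

flags ^= 1 << 7                  # toggle bit 7
print(format(flags, '08b'))      # 10001000

# Test a single bit.
is_bit3_set = bool(flags & (1 << 3))
print(is_bit3_set)               # True

# A left shift can grow past 8 bits; masking discards the overflow.
shifted = (byte1 << 1) & 0xFF
print(format(shifted, '08b'))    # 11100000
```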
Some Other Applications
Binary data has several practical applications. For example, we can check whether two files are identical by comparing their bytes, and we can check whether a file is a JPEG (or any other image format) by looking at its first few bytes. The examples below illustrate both.
Example 1: Checking whether two files are the same. Two text files with different contents are used here.
Python3
with open('GFG.txt', 'rb') as file1, open('log.txt', 'rb') as file2:
    data1 = file1.read()
    data2 = file2.read()

if data1 != data2:
    print("Files do not match.")
else:
    print("Files match.")
Output:
Files do not match.
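Reading both files fully into memory is fine for small files. For large files, one possible alternative is to compare sizes first and then compare SHA-256 digests computed in chunks. This is a sketch, not part of the original example: the helper names `file_digest` and `files_match`, the chunk size, and the temporary demo files are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have the same size and the same digest."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return file_digest(path_a) == file_digest(path_b)


# Demo with throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, 'first.bin')
    second = os.path.join(folder, 'second.bin')
    copy = os.path.join(folder, 'copy.bin')
    for path, payload in [(first, b'hello'), (second, b'world'), (copy, b'hello')]:
        with open(path, 'wb') as handle:
            handle.write(payload)
    same = files_match(first, copy)
    different = files_match(first, second)

print(same, different)   # True False
```

The standard library also offers `filecmp.cmp(path_a, path_b, shallow=False)`, which compares file contents directly.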
Example 2: Checking if the given image is jpeg or not.
Python3
import binascii

# Common four-byte prefixes of JPEG files. Every JPEG starts with FF D8,
# followed by a marker such as FF DB, FF E0 (JFIF) or FF E1 (Exif).
jpeg_signatures = [
    binascii.unhexlify(b'FFD8FFDB'),
    binascii.unhexlify(b'FFD8FFE0'),
    binascii.unhexlify(b'FFD8FFE1')
]

with open('food.jpeg', 'rb') as file:
    first_four_bytes = file.read(4)

if first_four_bytes in jpeg_signatures:
    print("JPEG detected.")
else:
    print("File does not look like a JPEG.")
Output:
JPEG detected.
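The same idea extends to other formats. The following sketch checks a header against a small table of well-known signatures. The table, the function name `guess_image_type`, and the sample headers are illustrative assumptions, and the list of formats is far from complete.

```python
# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    'jpeg': (b'\xFF\xD8\xFF',),
    'png': (b'\x89PNG\r\n\x1a\n',),
    'gif': (b'GIF87a', b'GIF89a'),
}


def guess_image_type(header):
    """Return a format name if the header starts with a known signature."""
    for name, prefixes in SIGNATURES.items():
        # bytes.startswith() accepts a tuple of alternative prefixes.
        if header.startswith(prefixes):
            return name
    return None


# In practice the header would come from a file, for example:
#     with open('food.jpeg', 'rb') as file:
#         header = file.read(8)
print(guess_image_type(b'\xFF\xD8\xFF\xE0\x00\x10JF'))   # jpeg
print(guess_image_type(b'\x89PNG\r\n\x1a\n'))            # png
print(guess_image_type(b'GIF89a\x01\x00'))               # gif
print(guess_image_type(b'plain text'))                   # None
```

A matching signature only suggests the format; it does not prove that the rest of the file is a valid image.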