Chapter - 3 Binary Files: 3.1 Reading and Writing To A Binary File

The document discusses binary files and serialization in Python. It explains how to read and write binary files by opening them in binary mode ("rb" and "wb"). It also discusses the pickle module for serializing Python objects to binary streams for storage, including pickling and unpickling objects. Methods like pickle.dump() and pickle.load() are used to serialize and deserialize objects to and from files. The advantages of pickle include handling recursive and shared objects.

Uploaded by

caspindafnigovz
Copyright
© All Rights Reserved

Chapter - 3

BINARY FILES

3.1 Reading and Writing to a Binary File

The open() function opens a file in text mode by default. To open a file in binary
mode, add 'b' to the mode parameter. Hence the "rb" mode opens the file in binary mode
for reading, while the "wb" mode opens it in binary mode for writing. Unlike text
files, binary files are not human readable: when opened in a text editor, the
data is unrecognizable.
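The difference between the two modes can be sketched as follows. This is a minimal illustration, not part of the original chapter; the file name demo.bin and the temporary directory are assumptions chosen so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# A file opened in binary mode rejects str objects.
try:
    with open(path, "wb") as f:
        f.write("text")
    str_rejected = False
except TypeError:
    str_rejected = True

# Binary mode accepts bytes-like objects and writes them unchanged.
with open(path, "wb") as f:
    f.write(b"\x05\x0a\xff")

# Reading in binary mode returns a bytes object; no decoding is applied.
with open(path, "rb") as f:
    raw = f.read()

print(str_rejected)        # True
print(type(raw).__name__)  # bytes
print(list(raw))           # [5, 10, 255]
```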

The following code stores a list of numbers in a binary file. The list is first converted
to a byte array before writing. The built-in function bytearray() builds a mutable
sequence of bytes from the list, so every number must be an integer in the range 0 to 255.

Example: Write to a Binary File

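# Write five small integers to a file as raw bytes.
num = [5, 10, 15, 20, 25]
arr = bytearray(num)  # each value must be an integer in range(256)
with open("binfile.bin", "wb") as f:
    f.write(arr)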

To read the above binary file, the bytes object returned by the read() method is
converted to a list using the list() function.

Example: Reading a Binary File

# Read the raw bytes back and convert them to a list of integers.
with open("binfile.bin", "rb") as f:
    num = list(f.read())
print(num)  # [5, 10, 15, 20, 25]
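One caveat of the bytearray() approach: it only works for integers from 0 to 255. The sketch below shows the failure and one possible workaround using int.to_bytes() with a fixed width. The width of 4 bytes and the little-endian byte order are assumptions made for this illustration, not something the chapter prescribes.

```python
# bytearray() rejects integers outside range(256).
try:
    bytearray([5, 300])
    error = None
except ValueError as exc:
    error = exc

# Possible workaround: encode every number with a fixed width.
WIDTH = 4  # assumed number of bytes per value
nums = [5, 300, 70000]
data = b"".join(n.to_bytes(WIDTH, "little") for n in nums)

# Decode by slicing the byte string into WIDTH-sized chunks.
decoded = [
    int.from_bytes(data[i:i + WIDTH], "little")
    for i in range(0, len(data), WIDTH)
]

print(type(error).__name__)  # ValueError
print(len(data))             # 12
print(decoded)               # [5, 300, 70000]
```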

The main methods of a file object are listed below:

Method                        Description
file.close()                  Closes the file.
file.flush()                  Flushes the internal buffer.
next(file)                    Returns the next line from the file each time it is called.
file.read([size])             Reads at most size bytes from the file; reads to EOF if size is omitted.
file.readline()               Reads one entire line from the file.
file.readlines()              Reads until EOF and returns a list containing the lines.
file.seek(offset[, whence])   Sets the file's current position to offset, measured from whence
                              (0 = start of file, 1 = current position, 2 = end of file).
file.tell()                   Returns the file's current position.
file.write(data)              Writes a string (text mode) or a bytes-like object (binary mode)
                              and returns the number of characters or bytes written.
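A short sketch of how read(), tell() and seek() interact on a binary file. The temporary file path is an assumption for illustration; the data matches the five numbers used in section 3.1.

```python
import os
import tempfile

# Hypothetical path in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "binfile.bin")

with open(path, "wb") as f:
    written = f.write(bytes([5, 10, 15, 20, 25]))  # number of bytes written

with open(path, "rb") as f:
    first_two = f.read(2)        # reads at most 2 bytes
    pos_after_read = f.tell()    # position is now 2
    f.seek(-1, os.SEEK_END)      # move to the last byte of the file
    last = f.read()              # reads from there to EOF
    f.seek(0)                    # back to the start
    everything = list(f.read())

print(written)          # 5
print(list(first_two))  # [5, 10]
print(pos_after_read)   # 2
print(list(last))       # [25]
print(everything)       # [5, 10, 15, 20, 25]
```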
3.2 Pickle Module

Python's pickle module is used for serializing and de-serializing a Python object
structure. Most Python objects (lists, dicts, instances of importable classes, and so on)
can be pickled so that they can be saved on disk. Pickle "serializes" the object before
writing it to a file: it converts the Python object into a byte stream. This byte
stream contains all the information necessary to reconstruct the object in another Python
script.

Pickling: The process of converting a Python object into a byte stream. It is also
called ‘serialization’, ‘marshalling’, or ‘flattening’.

Unpickling: The inverse of pickling, in which a byte stream is converted back into an
object.

Data Serialization

Data serialization is the process of converting structured data to a format that
allows sharing or storage of the data in a form that allows recovery of its original structure.
In some cases, the secondary intention of data serialization is to minimize the data’s size,
which then reduces disk space or bandwidth requirements.

Flat vs. Nested data

Before beginning to serialize data, it is important to decide how the data
should be structured during serialization: flat or nested. The differences between the two
styles are shown in the examples below.

Flat style:

{ "Type" : "A", "field1": "value1", "field2": "value2", "field3": "value3" }

Nested style:

{ "A": { "field1": "value1", "field2": "value2", "field3": "value3" } }
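The two styles carry the same information, which a small conversion sketch can make concrete. The helper names to_nested and to_flat are hypothetical and introduced only for this illustration.

```python
flat = {"Type": "A", "field1": "value1", "field2": "value2", "field3": "value3"}

def to_nested(record, key="Type"):
    """Move the value stored under `key` up to become the outer dictionary key."""
    fields = {k: v for k, v in record.items() if k != key}
    return {record[key]: fields}

def to_flat(nested, key="Type"):
    """Inverse of to_nested for a dictionary with exactly one outer key."""
    (type_name, fields), = nested.items()
    return {key: type_name, **fields}

nested = to_nested(flat)
print(nested)
# {'A': {'field1': 'value1', 'field2': 'value2', 'field3': 'value3'}}
print(to_flat(nested) == flat)  # True
```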

3.3 Binary Files Modules

NumPy Array (flat data)

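A NumPy array can be serialized to and deserialized from its raw byte
representation. The raw bytes do not record the element type or the shape, so both
must be supplied again when the data is read back.
Example:

import numpy as np

# Converting a NumPy array to byte format
original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
byte_output = original.tobytes()

# Converting byte format back to a NumPy array (same dtype, then restore the shape)
array_format = np.frombuffer(byte_output, dtype=np.int64).reshape(3, 3)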

Pickle (nested data)

The native data serialization module for Python is called pickle.

Example:

import pickle

# Here's an example dict
grades = { 'Alice': 89, 'Bob': 72, 'Charles': 87 }

# Use dumps to convert the object to a serialized bytes object
serial_grades = pickle.dumps( grades )

# Use loads to de-serialize the bytes back into an object
received_grades = pickle.loads( serial_grades )


Program 1: Serialization using Pickle [program listing and output not recoverable]

Program 2: [program listing and output not recoverable]

3.4 Module Interface

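dumps() – This function serializes an object hierarchy and returns it as a bytes object.

loads() – This function de-serializes a bytes object and returns the reconstructed object
hierarchy.

dump() and load() do the same, but write to and read from an open binary file.

For more control over serialization and de-serialization, a Pickler or an Unpickler
object can be created, respectively.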
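A minimal sketch of the Pickler and Unpickler classes, using an in-memory io.BytesIO buffer in place of a file. The buffer and the sample objects are assumptions for illustration.

```python
import io
import pickle

buffer = io.BytesIO()  # in-memory stand-in for a file opened in "wb" mode

# A Pickler writes one pickle per dump() call to the same stream.
pickler = pickle.Pickler(buffer, protocol=pickle.DEFAULT_PROTOCOL)
pickler.dump({"Alice": 89})
pickler.dump([1, 2, 3])

# An Unpickler reads the pickles back in the order they were written.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(first)   # {'Alice': 89}
print(second)  # [1, 2, 3]
```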

Constants provided by the pickle module:

pickle.HIGHEST_PROTOCOL

This is an integer value representing the highest protocol version available. It can be
passed as the protocol value to the functions dump() and dumps().

pickle.DEFAULT_PROTOCOL

This is an integer value representing the default protocol used for pickling. Its
value may be less than that of the highest protocol.
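The constants can be inspected and passed to dumps() as sketched below. The actual numbers depend on the Python version, so the example prints them without assuming particular values; the grades dictionary is reused from section 3.3.

```python
import pickle

grades = {"Alice": 89, "Bob": 72, "Charles": 87}

# The concrete values vary between Python versions.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

default_bytes = pickle.dumps(grades)  # uses DEFAULT_PROTOCOL
highest_bytes = pickle.dumps(grades, protocol=pickle.HIGHEST_PROTOCOL)
oldest_bytes = pickle.dumps(grades, protocol=0)  # oldest, text-based protocol

# Every protocol round-trips to an equal object; only the encoding differs.
results = [pickle.loads(b) for b in (default_bytes, highest_bytes, oldest_bytes)]
print(all(r == grades for r in results))  # True
```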
3.5 Python Pickle dump

In this section, we learn how to store data using Python pickle. To do
so, we first import the pickle module.

Then use the pickle.dump() function to store the object data in a file. pickle.dump()
is usually called with up to three arguments. The first argument is the object that you
want to store. The second argument is the file object you get by opening the desired file
in write-binary (wb) mode. The third is the optional keyword argument protocol, which
selects the pickle protocol version; if it is omitted, pickle.DEFAULT_PROTOCOL is used.
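A minimal sketch of the steps described above. The file name grades.pkl, the temporary directory and the sample data are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical output path in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "grades.pkl")

grades = {"Alice": 89, "Bob": 72, "Charles": 87}

# Arguments: the object, the file opened in "wb" mode, and the optional protocol.
with open(path, "wb") as f:
    pickle.dump(grades, f, protocol=pickle.HIGHEST_PROTOCOL)

size = os.path.getsize(path)
print(size > 0)  # True: the pickled bytes are now on disk
```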


3.6 Python Pickle load

To retrieve pickled data, use the pickle.load() function. Its primary argument is the
file object that you get by opening the file in read-binary (rb) mode, and it returns
the reconstructed object.

The file must have been written earlier with pickle.dump(), as described in the
previous section.
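A minimal sketch of retrieving pickled data. To keep the example self-contained it first writes a file with pickle.dump(); the file name and data are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical path in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "grades.pkl")

# Store something first so there is data to load.
with open(path, "wb") as f:
    pickle.dump({"Alice": 89, "Bob": 72, "Charles": 87}, f)

# Open in "rb" mode and let pickle.load() rebuild the object.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)            # {'Alice': 89, 'Bob': 72, 'Charles': 87}
print(loaded["Bob"])     # 72
```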

3.7 Exceptions provided by the pickle module

1. exception pickle.PickleError

This exception inherits from Exception. It is the base class for the other exceptions
raised during pickling and unpickling.

2. exception pickle.PicklingError

This exception inherits from PickleError. It is raised when an unpicklable
object is encountered by a Pickler.

3. exception pickle.UnpicklingError

This exception inherits from PickleError. It is raised when there is a problem
such as data corruption or a security violation while unpickling an object.
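The exceptions can be caught as sketched below. The corrupted byte string and the lambda are assumptions chosen to trigger the errors. Note that pickling can also raise other exception types depending on the Python version and the object involved, so the second handler catches a few types rather than PicklingError alone.

```python
import pickle

# Unpickling: these bytes do not form a valid pickle stream.
try:
    pickle.loads(b"\x00not a pickle")
    unpickling_error = None
except pickle.UnpicklingError as exc:
    unpickling_error = exc

# Pickling: a lambda cannot be pickled. The exact exception type may differ
# between Python versions, so several candidates are caught here.
try:
    pickle.dumps(lambda x: x + 1)
    pickling_error = None
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    pickling_error = exc

print(type(unpickling_error).__name__)
print(type(pickling_error).__name__)
```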
3.8 Advantages of using Pickle Module

Recursive objects (objects containing references to themselves): Pickle keeps track of the
objects it has already serialized, so later references to the same object won’t be serialized
again. (The marshal module cannot handle such objects.)

Object sharing (references to the same object in different places): This is similar to self-
referencing objects; pickle stores the object once, and ensures that all other references
point to the master copy. Shared objects remain shared, which can be very important for
mutable objects.

User-defined classes and their instances: Marshal does not support these at all, but
pickle can save and restore class instances transparently. The class definition must be
importable and live in the same module as when the object was stored.
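The first two advantages can be checked with a short sketch. The dictionaries and list used here are assumptions for illustration.

```python
import pickle

# Recursive object: a dictionary that contains a reference to itself.
node = {"name": "root"}
node["self"] = node
restored = pickle.loads(pickle.dumps(node))
print(restored["self"] is restored)  # True: the cycle is preserved

# Shared object: the same list is referenced from two places.
shared = [1, 2]
container = {"a": shared, "b": shared}
copy = pickle.loads(pickle.dumps(container))
print(copy["a"] is copy["b"])        # True: still one list, not two copies

# Mutating through one reference is visible through the other.
copy["a"].append(3)
print(copy["b"])                     # [1, 2, 3]
```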

3.9 Append binary file

Append mode ("ab") is used to add new data to the end of a file while retaining the
existing data.

Program

# Append the entire contents of binary_file_2 to the end of binary_file_1.
with open("binary_file_1", "ab") as myfile, open("binary_file_2", "rb") as file2:
    myfile.write(file2.read())
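A self-contained sketch of append mode, first with raw bytes and then with several pickled records appended to one file. The file names, the temporary directory and the records are assumptions for illustration. Reading the records back relies on pickle.load() raising EOFError once the end of the file is reached.

```python
import os
import pickle
import tempfile

workdir = tempfile.mkdtemp()  # hypothetical working directory

# Raw bytes: "ab" keeps the existing content and writes at the end.
raw_path = os.path.join(workdir, "numbers.bin")
with open(raw_path, "wb") as f:
    f.write(bytes([1, 2, 3]))
with open(raw_path, "ab") as f:
    f.write(bytes([4, 5]))
with open(raw_path, "rb") as f:
    combined = list(f.read())
print(combined)  # [1, 2, 3, 4, 5]

# Pickled records: each dump() appends one complete pickle to the file.
log_path = os.path.join(workdir, "records.pkl")
for record in ({"id": 1}, {"id": 2}):
    with open(log_path, "ab") as f:
        pickle.dump(record, f)

# Load records one by one until the file is exhausted.
records = []
with open(log_path, "rb") as f:
    while True:
        try:
            records.append(pickle.load(f))
        except EOFError:
            break
print(records)  # [{'id': 1}, {'id': 2}]
```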
