Ch 2-Data Handling using Pandas - I NCERT Solved 2024-25

Informatics practices

Uploaded by

asiff.ansari0612
Chapter 2

Data Handling using Pandas – I


Last update on 07 Apr 2024
1. Write the definition of Python Pandas.
Ans. Pandas is an open-source software library, built on top of NumPy, that
allows data analysis and manipulation in Python.
Thus, Pandas is a high-level data manipulation tool used for
analysing data.
• Pandas is a powerful data analysis and manipulation library for Python.
• The name Pandas is derived from "panel data".
• Today, Pandas has become a popular choice for data analysis.

2. What do you mean by Data Analysis?

Ans. Data Analysis refers to the process of evaluating large data sets using analytical
and statistical tools to discover useful information and draw conclusions that support
business decision-making.

3. Write the advantages of using Pandas.

Ans. Pandas offers high-performance, easy-to-use data structures and data analysis
tools.
• It easily handles missing data.
• It uses Series as its one-dimensional data structure and DataFrame as its
two-dimensional data structure.
• It provides an efficient way to slice the data.
• It provides a flexible way to merge, concatenate or reshape the data.
• It includes powerful time-series tools.
4. How to install Pandas?
Ans. To install Pandas, go to your command line/terminal and type "pip
install pandas". If you have Anaconda installed on your system, type
"conda install pandas" instead. Once the installation is complete, go to
your IDE (Jupyter, PyCharm, etc.) and import it by typing "import
pandas as pd". PIP stands for "PIP Installs Packages".
C:\>pip install pandas
Class Notes : 2024-25

5. What is NumPy? How to install it?

Ans. • NumPy stands for 'Numerical Python'. It is a package that can be used for
numerical data analysis and scientific computing with Python.
• NumPy uses a multidimensional array object, and has functions and tools
for working with these arrays.
• The powerful n-dimensional array in NumPy speeds up data processing.
• NumPy can be easily interfaced with other Python packages and provides
tools for integrating with other programming languages like C, C++, etc.
Installing NumPy
NumPy can be installed by typing the following command:
pip install numpy
6. What is an Array? Write the important characteristics of an Array.
Ans. An array, in general, is a named group of homogeneous (same type) elements.
It is a data type used to store multiple values under a single identifier
(variable name). An array contains an ordered collection of data elements where
each element is of the same type and can be referenced by its index (position).
The important characteristics of an array are:
• Each element of the array is of the same data type, though the values stored in
them may be different.
• The entire array is stored contiguously in memory. This makes operations
on arrays fast.
• Each element of the array is identified or referred to using the name of the
array along with the index of that element, which is unique for each
element. The index of an element is an integral value associated with the
element, based on the element's position in the array. For example, consider
an array with 5 numbers: [10, 9, 99, 71, 90]
Index    0   1   2   3   4
Values  10   9  99  71  90
• This is called zero-based indexing.
7. Write the difference between List and Array.
Ans.
List | Array
A list can have elements of different data types, for example [1, 3.4, 'hello', 'a@']. | All elements of an array are of the same data type, for example an array of floats may be [1.2, 5.4, 2.7].
Elements of a list are not stored contiguously in memory. | Array elements are stored in contiguous memory locations. This makes operations on arrays faster than on lists.
Lists do not support element-wise operations such as addition or multiplication, because the elements may not be of the same type. | Arrays support element-wise operations. For example, if A1 is an array, A1/3 divides each element of the array by 3.
Lists can contain objects of different data types, so Python must store the type information for every element along with its value. Thus lists take more space in memory and are less efficient. | A NumPy array takes up less space in memory than a list because arrays do not need to store the data type of each element separately.
List is a part of core Python. | Array (ndarray) is a part of the NumPy library.

Prepared by: Taposh Karmakar, AIR FORCE SCHOOL JORHAT | Mobile: 7002070313
8. What is a NumPy array and how is it different from a list? What is the
name of the built-in array class in NumPy?
Ans. • A NumPy array is simply a grid that contains values of the same
(homogeneous) type.
• A NumPy array is like a Python list whose elements are all of the same type, but
NumPy arrays differ from Python lists in functionality.
• The NumPy array class is officially called ndarray but is commonly known as
array.
Differences between a NumPy array and a list:
• Unlike a list, once a NumPy array is created, you cannot change its size.
• Every NumPy array contains elements of a homogeneous type, i.e. all its
elements have the same data type, unlike Python lists.
• A NumPy array occupies much less space than a Python list.
• NumPy arrays support vectorized operations, i.e. if you apply a function, it
is performed on every item in the array, unlike a list.
Example :
>>> import numpy as np
>>> a=np.array([1,2,3])
>>> a
array([1, 2, 3])
>>> L=[1,2,3]
>>> L
[1, 2, 3]
>>> L+2
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
L+2
TypeError: can only concatenate list (not "int") to list
>>> a+2
array([3, 4, 5])
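The space and vectorization points can be checked roughly with a short sketch. The names `values` and `arr` and the explicit `dtype=np.int32` are assumptions made for illustration; note that `sys.getsizeof` on a list counts only the list's internal table of references, so it understates the list's real footprint.

```python
import sys
import numpy as np

values = list(range(1000))               # plain Python list of ints
arr = np.array(values, dtype=np.int32)   # dtype fixed explicitly (assumption, for a stable size)

# nbytes counts only the raw element data: 1000 elements * 4 bytes each
print(arr.nbytes)

# getsizeof(list) measures the list's reference table only, not the int objects it points to
print(sys.getsizeof(values))

# A list needs an explicit loop (here a comprehension) for element-wise work
doubled_list = [v * 2 for v in values]

# An array applies the operation to every element in one expression
doubled_arr = arr * 2
```

The array version needs no loop because the multiplication is applied to the whole block of same-typed data at once.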

9. Write the different forms of numpy array.

Ans. NumPy arrays come in two forms:
• Single-dimension array (1-D), known as a vector (single
row/column only). An array with only a single row of elements is
called a 1-D array.

Index    0   1   2   3
Values  10  23  56  17
>>> import numpy as np
>>> a=np.array([10,23,56,17])
>>> print(a)
Output : [10 23 56 17]
• Multi-dimensional array (2-D), known as a matrix (multiple
rows/columns). Two-dimensional (2-D) arrays are created by passing nested
lists to the array() function.
Indices   0   1   2
0        23  56  17
1        41  67  34
2        45  72  45
3        65  37  56

Accessing elements of a 2-D array (row index, column index):
0,0  0,1  0,2
1,0  1,1  1,2
2,0  2,1  2,2
3,0  3,1  3,2
>>> import numpy as np
>>> a=np.array([[23,56,17],[41,67,34]])
>>> print(a)
Output :
[[23 56 17]
 [41 67 34]]
# Accessing the element in the 1st row and 3rd column
>>> a[0][2]
Output:
17
10. Write the various attributes of NumPy Array.
Ans. Some important attributes of a NumPy ndarray object are:
i) ndarray.ndim: gives the number of dimensions of the array as an
integer value. Arrays can be 1-D, 2-D or n-D. In this chapter, we shall focus
on 1-D and 2-D arrays only. NumPy calls the dimensions axes (plural of
axis). Thus, a 2-D array has two axes. The row-axis is called axis-0 and the
column-axis is called axis-1. The number of axes is also called the array's rank.
Example :
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([5,-7.4,'a',7.2])
>>> array3 = np.array([[2.4,3], [4.91,7],[0,-1]])
>>> array1.ndim
1
>>> array3.ndim
2
ii) ndarray.shape: It gives the sequence of integers indicating the size of
the array for each dimension.

Example :
# array1 is a 1-D array, so nothing follows the comma in its shape tuple
>>> array1.shape
(3,)
>>> array2.shape
(4,)
>>> array3.shape
(3, 2)
The output (3, 2) means array3 has 3 rows and 2 columns.
iii) ndarray.size: It gives the total number of elements of the array. This is
equal to the product of the elements of shape.

Example:
>>> array1.size
3
>>> array3.size
6
iv) ndarray.dtype: is the data type of the elements of the array. All the
elements of an array are of the same data type. Common data types are int32,
int64, float32, float64, U32, etc.

Example :
>>> array1.dtype
dtype('int32')
>>> array2.dtype
dtype('<U32')
>>> array3.dtype
dtype('float64')
v) ndarray.itemsize: It specifies the size in bytes of each element of the
array. Data types int32 and float32 mean that each element of the array occupies 32
bits in memory. 8 bits form a byte. Thus, an array of elements of type int32 has
itemsize 32/8=4 bytes. Likewise, int64/float64 means each item has itemsize
64/8=8 bytes.

Example :
>>> array1.itemsize
4 # memory allocated to integer
>>> array2.itemsize
128 # memory allocated to string
>>> array3.itemsize
8 #memory allocated to float type
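The five attributes can be inspected together, as in the sketch below. The names `ints` and `table` are illustrative assumptions. The integer dtype is given explicitly because the default integer type depends on the platform and NumPy version: the int32 shown above is what some installations report, while many others report int64.

```python
import numpy as np

# dtype is stated explicitly so the result does not depend on the platform default
ints = np.array([10, 20, 30], dtype=np.int64)
table = np.array([[2.4, 3], [4.91, 7], [0, -1]])  # ints mixed with floats are stored as float64

for name, arr in (("ints", ints), ("table", table)):
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype, arr.itemsize)

# size is the product of the entries of shape; total bytes = size * itemsize
total_bytes = table.size * table.itemsize
```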
11. Write the other ways of creating a NumPy array.
Ans. 1. We can specify the data type (integer, float, etc.) while creating an array using
dtype as an argument to array(). This will convert the data automatically
to the mentioned type. In the following example, a nested list of integers is
passed to the array function. Since the data type has been declared as float, the
integers are converted to floating point numbers.
>>> array4 = np.array( [ [1,2], [3,4] ], dtype=float)
>>> array4
array([[1., 2.],
       [3., 4.]])

2. We can create an array with all elements initialised to 0 using the
function zeros(). By default, the data type of the array created by
zeros() is float. The following code will create an array with 3 rows
and 4 columns with each element set to 0.
>>> array5 = np.zeros((3,4))
>>> array5
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

3. We can create an array with all elements initialised to 1 using the
function ones(). By default, the data type of the array created by
ones() is float. The following code will create an array with 3 rows and
2 columns.
>>> array6 = np.ones((3,2))
>>> array6
array([[1., 1.],
       [1., 1.],
       [1., 1.]])

4. We can create an array with numbers in a given range and sequence using
the arange() function. This function is analogous to the range()
function of Python.
# an array of 6 elements is created with start value 0 and step size 1
>>> array7 = np.arange(6)
>>> array7
array([0, 1, 2, 3, 4, 5])
# Creating an array with start value -2, end value 24 and step size 4
>>> array8 = np.arange( -2, 24, 4 )
>>> array8
array([-2, 2, 6, 10, 14, 18, 22])
12. What is Slicing?
Ans. Slicing is a powerful way to retrieve subsets of data from an array or a pandas object. We
can define which part of the array is to be sliced by specifying the start and end
index values using [start : end : step] along with the array name.
>>> array8=np.array([-2, 2, 6, 10, 14, 18, 22])
>>> array8
array([-2, 2, 6, 10, 14, 18, 22])
# Excludes the value at the end index
>>> array8[3:5]
array([10, 14])
# Reverse the array
>>> array8[ : : -1]
array([22, 18, 14, 10, 6, 2, -2])

Slicing can also be done on 2-D arrays. For this, let us create a 2-D array called array9
having 3 rows and 4 columns.
>>> array9 = np.array([[ -7, 0, 10, 20],[ -5, 1, 40, 200],
[ -1, 1, 4, 30]])
# Access all the elements in the 3rd column
>>> array9[0:3,2]
array([10, 40, 4])
Note that we are specifying rows in the range 0:3 because the end value of the
range is excluded.
# Access elements of 2nd and 3rd row from 1st and 2nd column
>>> array9[1:3,0:2]
array([[-5, 1],
[-1, 1]])
If row indices are not specified, it means all the rows are to be considered.
Likewise, if column indices are not specified, all the columns are to be
considered. Thus, the statement to access all the elements in the 3rd column
can also be written as:
>>>array9[:,2]
array([10, 40, 4])
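A few more 2-D slices on the same array9 are sketched below. The result names (`second_row`, `every_other_col`, `last_two_cols`) are illustrative and do not appear in the notes.

```python
import numpy as np

array9 = np.array([[-7, 0, 10, 20],
                   [-5, 1, 40, 200],
                   [-1, 1, 4, 30]])

second_row = array9[1, :]          # all columns of the 2nd row
every_other_col = array9[:, ::2]   # columns 0 and 2 of every row (step of 2)
last_two_cols = array9[:, -2:]     # negative indices count from the end

print(second_row)
print(every_other_col)
print(last_two_cols)
```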

13. Write the basic arithmetic operations on NumPy arrays with examples.
Ans. When a basic arithmetic operation such as addition, subtraction,
multiplication or division is performed on two arrays, the operation is done on each
corresponding pair of elements.
>>> array1 = np.array([[3,6],[4,2]])
>>> array2 = np.array([[10,20],[15,12]])
>>> array1 + array2
array([[13, 26],
[19, 14]])
#Subtraction
>>> array1 - array2
array([[ -7, -14],
[-11, -10]])
#Multiplication
>>> array1 * array2
array([[ 30, 120],
[ 60, 24]])
#Matrix Multiplication
>>> array1 @ array2
array([[120, 132],
[ 70, 104]])
#Exponentiation
>>> array1 ** 3
array([[ 27, 216],
[ 64, 8]], dtype=int32)
#Division
>>> array2 / array1
array([[3.33333333, 3.33333333],
[3.75 , 6. ]])
#Element-wise remainder of division (modulus)
>>> array2 % array1
array([[1, 2],
[3, 0]], dtype=int32)
It is important to note that for element-wise operations, the shapes of both arrays
must be the same. That is, array1.shape must be equal to array2.shape.
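The shape rule can be seen in a minimal sketch. The arrays `a` and `b` and the `outcome` flag are illustrative assumptions. A single number is a separate case: as with `array1 ** 3` earlier, it is applied to every element.

```python
import numpy as np

a = np.array([[3, 6], [4, 2]])          # shape (2, 2)
b = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

try:
    a + b                                # shapes differ, so no element pairing is possible
    outcome = "added"
except ValueError as exc:
    outcome = "ValueError"
    print("Cannot add:", exc)

scaled = a + 10   # a scalar is combined with every element of the array
print(scaled)
```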
14. What is Concatenating Arrays?
Ans. • Concatenation means joining two or more arrays.
• Concatenating 1-D arrays means appending the sequences one after another.
• The numpy.concatenate() function can be used to concatenate two or more
2-D arrays either row-wise or column-wise.
• All the dimensions of the arrays to be concatenated must match exactly
except for the dimension or axis along which they need to be joined. Any
mismatch in the dimensions results in an error. By default, the
concatenation of the arrays happens along axis=0.
• Example :
>>> array1 = np.array([[10, 20], [-30,40]])
>>> array2 = np.zeros((2, 3), dtype=array1.dtype)
>>> array1
array([[ 10, 20],
[-30, 40]])
>>> array2
array([[0, 0, 0],
[0, 0, 0]])
>>> array1.shape
(2, 2)
>>> array2.shape
(2, 3)
>>> np.concatenate((array1,array2), axis=1)
array([[ 10, 20, 0, 0, 0],
[-30, 40, 0, 0, 0]])
>>> np.concatenate((array1,array2), axis=0)

Traceback (most recent call last):


File "<pyshell#33>", line 1, in <module>
np.concatenate((array1,array2),axis=0)
ValueError: all the input array dimensions except for the
concatenation axis must match exactly
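For contrast, concatenation along axis=0 works when the number of columns matches. The sketch below assumes a hypothetical `extra_rows` array with 3 rows and 2 columns, and also shows the 1-D case.

```python
import numpy as np

array1 = np.array([[10, 20], [-30, 40]])             # shape (2, 2)
extra_rows = np.zeros((3, 2), dtype=array1.dtype)    # shape (3, 2): same number of columns

# Row-wise join: the rows of extra_rows are placed below those of array1
stacked = np.concatenate((array1, extra_rows), axis=0)
print(stacked.shape)

# 1-D arrays are simply joined end to end
joined_1d = np.concatenate((np.array([1, 2]), np.array([3, 4, 5])))
print(joined_1d)
```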

15. Write the differences between Pandas and NumPy.

Ans. i) A NumPy array requires homogeneous data, while a Pandas DataFrame
can have different data types (float, int, string, datetime, etc.).
ii) Pandas has a simpler interface for operations like file loading, plotting,
selection, joining and GROUP BY, which come in very handy in data-processing
applications.
iii) Pandas DataFrames (with column names) make it very easy to keep track
of data.
iv) Pandas is used when data is in tabular format, whereas NumPy is used
for numeric array-based data manipulation.
16. Write the differences between Pandas Series and NumPy Array.
Ans. Differences between Pandas Series and NumPy arrays:
Pandas Series | NumPy Array
i) In a Series we can define our own labelled index to access elements. The labels can be numbers or letters. | i) NumPy arrays are accessed by their integer position only.
ii) The elements can be indexed in descending order also. | ii) The indexing starts with zero for the first element and the index is fixed.
iii) If two Series are not aligned, NaN or missing values are generated. | iii) There is no index alignment: operations pair elements by position, and arrays whose shapes do not match raise an error instead of producing NaN.
iv) Series require more memory. | iv) NumPy arrays occupy less memory.
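The alignment difference in point iii) can be illustrated with a small sketch. All names here (`s1`, `s2`, `a1`, `a2`) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Series are matched by label; labels present in only one Series give NaN
aligned_sum = s1 + s2
print(aligned_sum)

a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])

# Arrays are matched purely by position
positional_sum = a1 + a2
print(positional_sum)
```

Only the labels 'b' and 'c' exist in both Series, so only those two positions hold numbers in `aligned_sum`.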
17. What are the two important data structures in Pandas?
Ans. The two important data structures in Pandas are:
• Series
• DataFrame
i) Series - A Series is a one-dimensional array containing a sequence of
values of any data type (integer, string, float, Python
objects, etc.) which by default have numeric data/axis labels starting
from zero. The data/axis labels are collectively called the index. A Series can
be created using:
• Array
• Dictionary
• Scalar value or constant
ii) DataFrame - A DataFrame is a two-dimensional data structure,
i.e., data is aligned in a tabular fashion in rows and columns.
A DataFrame can be created using:
• Lists
• Dictionary
• Series
• NumPy ndarrays
• Another DataFrame

18. Write the basic features of Series.


Ans. A Series is a one-dimensional array-like structure with homogeneous data.
For example, the following series is a collection of integers 10, 23, 56, …

10 23 56 17 52 61

Basic features of a Series are:
• Homogeneous data
• Size immutable
• Values/data mutable
19. Write the basic features of DataFrame.

Ans. A DataFrame is a two-dimensional array with heterogeneous data. For example:

Emp No | Employee Name | Department | Salary
1001 | Amit Kumar | Account | 34000
1002 | Rohit Singh | Marketing | 38000
1003 | Sunil Arora | HR | 35000

Basic features of a DataFrame are:
• Heterogeneous data
• Size mutable
• Values/data mutable
20. Write the difference between Series and DataFrame. [Most Important]

Ans.
Property | Series | DataFrame
Dimension | 1-Dimensional | 2-Dimensional
Type of Data | Homogeneous, i.e. all the elements must be of the same data type in a Series object. | Heterogeneous, i.e. a DataFrame object can have elements of different data types.
Mutability | Value-mutable, i.e. the values of its elements can change. Size-immutable, i.e. the size of a Series object, once created, cannot change. If you want to add/drop an element, internally a new Series object will be created. | Value-mutable, i.e. the values of its elements can change. Size-mutable, i.e. the size of a DataFrame object, once created, can change in place. That is, you can add/drop elements in an existing DataFrame object.
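The heterogeneous and size-mutable properties of a DataFrame can be seen in a short sketch built from the employee table in question 19. The variable `emp` and the `Bonus` column are hypothetical additions for illustration only.

```python
import pandas as pd

emp = pd.DataFrame({
    "Emp No": [1001, 1002, 1003],
    "Employee Name": ["Amit Kumar", "Rohit Singh", "Sunil Arora"],
    "Department": ["Account", "Marketing", "HR"],
    "Salary": [34000, 38000, 35000],
})

print(emp.dtypes)   # each column carries its own data type
print(emp.shape)    # 3 rows, 4 columns

# A new column is added to the existing object (size-mutable)
emp["Bonus"] = emp["Salary"] // 10
print(emp.shape)    # now 3 rows, 5 columns
```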

21. Write the different ways to create a Series in Pandas?


Ans. Series can be created using-
i) Scalar value or constant
ii) Numpy Array
iii) Dictionary
22. How to create an empty Series?
Ans. The most basic Series is an empty Series, created without any parameters.
Syntax:
pandas.Series()

# import the pandas library with the alias pd
import pandas as pd
s = pd.Series()
print(s)

#OUTPUT
Series([], dtype: float64)
# Note: pandas 2.0 and later print dtype: object for an empty Series.
23. How to create a non-empty series using the range() function?
Ans. To create a non-empty series, specify parameters for data and index.
Syntax:
pandas.Series(data, index=idx)

# range(n) generates the sequence 0, 1, 2, …, n-1, where n is the length
import pandas as pd
s = pd.Series(range(5))
print(s)
#OUTPUT
0 0
1 1
2 2
3 3
4 4
dtype: int64

# List with 4 values
import pandas as pd
s = pd.Series([3.5,5.,6.5,8.])
print(s)
#OUTPUT
0 3.5
1 5.0
2 6.5
3 8.0
dtype: float64
24. How to Create a Series with ndarray without index?
How to Create a Series with ndarray with index?
Ans. If data is an ndarray, then index passed must be of the same length. If no
index is passed, then by default index will be range(n).
Syntax:
pandas.Series(data,index=idx)
#Creating Series with ndarray without index
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

#OUTPUT
0 a
1 b
2 c
3 d
dtype: object

#Creating Series with ndarray with index
import pandas as pd
import numpy as np
data = np.array(['a','b','c'])
s = pd.Series(data,index=[100,101,102])
print (s)

#OUTPUT
100 a
101 b
102 c
dtype: object
25. How to Create a Series from dict?
Ans. • A dict can be passed as input. If no index is specified, the dictionary
keys are used to construct the index. Recent pandas versions keep the keys in the
dictionary's insertion order; older versions sorted them.
• If an index is passed, the values in data corresponding to the labels in the index
will be pulled out.
# Create a Series from dict without index
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print (s)

#OUTPUT
a 0.0
b 1.0
c 2.0
dtype: float64

# Create a Series from dict with index


import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print (s)

#OUTPUT
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
# The index order is preserved and the missing element ('d') is filled with NaN (Not a Number).
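How the index is built from a dict can be checked with keys that are deliberately out of alphabetical order. This is a sketch assuming a recent pandas version, where the dict's insertion order is kept; the names `s_default`, `s_picked` and `s_sorted` are illustrative.

```python
import pandas as pd

data = {"c": 2.0, "a": 0.0, "b": 1.0}   # keys deliberately not in alphabetical order

s_default = pd.Series(data)                          # index follows the dict's own key order
s_picked = pd.Series(data, index=["a", "b", "z"])    # 'z' is not a key, so its value is NaN
s_sorted = s_default.sort_index()                    # sort explicitly when a sorted index is wanted

print(s_default)
print(s_picked)
print(s_sorted)
```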

26. How to Create a Series from Scalar?


Ans. If data is a scalar value, an index must be provided. The value will be repeated
to match the length of index.

#Creating Series from Scalar value
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
t = pd.Series(10, index=range(0,2))
u = pd.Series(15, index=range(1,6,2))
v = pd.Series('Yet to start',
index=['Indore','Kolkata','Delhi'])
print (s)
print (t)
print (u)
print (v)
#OUTPUT
0 5
1 5
2 5
3 5
dtype: int64
0 10
1 10
dtype: int64
1 15
3 15
5 15
dtype: int64
Indore Yet to start
Kolkata Yet to start
Delhi Yet to start
dtype: object

# Specify index(es) as well as data with Series()


import pandas as pd
import numpy as np
arr=[31,28,31,30]
mon=['Jan','Feb','Mar','Apr']
obj = pd.Series(data=arr, index=mon)
print (obj)

#OUTPUT
Jan 31
Feb 28
Mar 31
Apr 30
dtype: int64

# Specify data type along with data and index
import pandas as pd
import numpy as np
arr=[31,28,31,30]
mon=['Jan','Feb','Mar','Apr']
obj = pd.Series(data=arr, index=mon,dtype=np.float64)
print (obj)

#OUTPUT
Jan 31.0
Feb 28.0
Mar 31.0
Apr 30.0
dtype: float64

# Using Mathematical Expression to create data array in Series()


import pandas as pd
import numpy as np
a=np.arange(9,13)
obj = pd.Series(data=a*2, index=a)
print (obj)

#OUTPUT
9 18
10 20
11 22
12 24
dtype: int32
# Try yourself: data=a**2

# Duplicate entries can be entered in the index array and pandas
# won't raise any error.
import pandas as pd
import numpy as np
arr=np.arange(2.75, 50, 9.75)
print(arr)
obj = pd.Series(arr, index=['a','b','a','a','b']) # Duplicate entries in index array
print (obj)

#OUTPUT
[ 2.75 12.5 22.25 32. 41.75]
a 2.75
b 12.50
a 22.25
a 32.00
b 41.75
dtype: float64

# How are the values [ 2.75 12.5 22.25 32. 41.75] generated?

Solution: np.arange(2.75, 50, 9.75) starts at 2.75 and keeps adding the step 9.75,
stopping before 50:
2.75 + 9.75 = 12.5, 12.5 + 9.75 = 22.25, 22.25 + 9.75 = 32.0, 32.0 + 9.75 = 41.75
The next value, 51.5, is beyond 50 and is therefore not included.

27. How to access the elements of a Series?


Ans. There are two common ways of accessing the elements of a series: indexing
and slicing.
i) Indexing: Indexes are of two types: positional index and labelled
index.
A positional index takes an integer value that corresponds to its position in the
series starting from 0, whereas a labelled index takes any user-defined label as
index.
Data in the Series can be accessed by giving its index position in square
brackets.
Position | Label Index | Data
0 | a | 1
1 | b | 2
2 | c | 3
3 | d | 4
4 | e | 5

#Example: Positional Index


import pandas as pd
s = pd.Series([1,2,3,4,5])
#Elements in a Series
print("Series Elements :")
print(s)
#retrieve the first element
print ("\nAccessing First Element : ",s[0])
#retrieve the Fifth element using index position
print ("\nAccessing Fifth Element : ",s[4])
#retrieve the Third element using index position
print ("\nAccessing Third Element : ",s[2])

#OUTPUT
Series Elements :
0 1
1 2
2 3
3 4
4 5
dtype: int64

Accessing First Element :  1

Accessing Fifth Element :  5

Accessing Third Element :  3
Labelled index: when labels are specified, we can use the labels as indices while
selecting values from a Series.
#Example: Labelled Index
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#Elements in a Series
print("Series Elements :")
print(s)
#retrieve the Third element using label index value
print ("\nAccessing Third Element : ",s['c'])

#OUTPUT
Series Elements :
a 1
b 2
c 3
d 4
e 5
dtype: int64
Accessing Third Element : 3
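pandas also provides the `.loc` (label-based) and `.iloc` (position-based) accessors, which make the kind of index being used explicit. The sketch below is illustrative; the second Series `t` is a hypothetical example with integer labels that do not start at 0, where the two kinds of index differ.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_label = s.loc["c"]      # labelled index
by_position = s.iloc[2]    # positional index (counted from 0)

# Integer labels that are not 0, 1, 2, ...: label and position no longer coincide
t = pd.Series([10, 20, 30], index=[100, 101, 102])
first_value = t.iloc[0]    # element at position 0
same_value = t.loc[100]    # element whose label is 100

print(by_label, by_position, first_value, same_value)
```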
28. How to access the elements from Series using Slices?
Ans. A slice is a subset of data.
• Slicing is a powerful way to retrieve subsets of data from a pandas object.
• We can define which part of the series is to be sliced by specifying the start
and end parameters [start : end] with the series name.
• With integer values, slicing takes place position-wise and not by index label in a series
object.
• A step can also be specified, as [start : end : step].
• When we use positional indices for slicing, the value at the end index
position is excluded, i.e., only (end - start) data values of the
series are extracted. But if labelled indexes are used for slicing, then the value
at the end index label is also included.

Position | Index | Data
0 or -5 | a | 1
1 or -4 | b | 2
2 or -3 | c | 3
3 or -2 | d | 4
4 or -1 | e | 5

Example of positional indices (the end index position is excluded):

#Retrieve first three elements - S[:3]
Output:
0 or -5 | a | 1
1 or -4 | b | 2
2 or -3 | c | 3

#Retrieve last three elements - S[-3:]
Output:
2 or -3 | c | 3
3 or -2 | d | 4
4 or -1 | e | 5

#Retrieve elements in between - S[2:4]
Output:
2 or -3 | c | 3
3 or -2 | d | 4

#Retrieve step elements in between - S[0::2]
Output:
0 or -5 | a | 1
2 or -3 | c | 3
4 or -1 | e | 5

#Retrieve elements in reverse - S[::-1]
Output:
4 or -1 | e | 5
3 or -2 | d | 4
2 or -3 | c | 3
1 or -4 | b | 2
0 or -5 | a | 1

Example of labelled indices (the end index label is also included):

#Retrieve elements using label index - S['b' : 'd']
Output:
1 or -4 | b | 2
2 or -3 | c | 3
3 or -2 | d | 4
29. How to use slicing to modify the values of series elements?


Ans. We can also use slicing to modify the values of series elements as shown in the
following example:
>>> import pandas as pd
>>> import numpy as np
>>> seriesAlph = pd.Series(np.arange(10,16,1),
index = ['a', 'b', 'c', 'd', 'e', 'f'])
>>> seriesAlph
a 10
b 11
c 12
d 13
e 14
f 15
dtype: int32
>>> seriesAlph[1:3] = 50 #excludes the value at the end index position
>>> seriesAlph
a 10
b 50
c 50
d 13
e 14
f 15
dtype: int32

>>> seriesAlph['c':'e'] = 500 #include the end index label
>>> seriesAlph
a 10
b 50
c 500
d 500
e 500
f 15
dtype: int32

30. Write the purpose of using various attributes of Pandas Series.


Ans. We can access certain properties, called attributes, of a series by using that
property with the series name.
Attribute | Purpose
name | assigns a name to the Series
index.name | assigns a name to the index of the series
values | returns the values in the series as an array
size | returns the number of values in the Series object
empty | returns True if the series is empty, and False otherwise

Example :
>>> import pandas as pd
>>> seriesCapCntry = pd.Series(['New Delhi', 'Washington DC',
'London','Paris'], index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry.name = 'Capitals'
>>> seriesCapCntry
India New Delhi
USA Washington DC
UK London
France Paris
Name: Capitals, dtype: object
>>> seriesCapCntry.index.name = 'Countries'
>>> seriesCapCntry
Countries
India New Delhi
USA Washington DC
UK London
France Paris
Name: Capitals, dtype: object
>>> seriesCapCntry.values
array(['New Delhi', 'Washington DC', 'London', 'Paris'],
dtype=object)
>>> seriesCapCntry.size
4
>>> seriesCapCntry.empty
False

31. Write the various operations that can be performed on a Series object.
Ans. The various operations on a Series object are:
• Modifying elements of a Series object
• head() and tail() functions
• Vector operations on a Series object
32. Write a program to modify element(s) of series object in Python shell?
Ans. >>> import pandas as pd
>>> obj1=pd.Series([4,2,7])
>>> obj1
0 4
1 2
2 7
dtype: int64
>>> obj1[1]=8
>>> obj1
0 4
1 8
2 7
dtype: int64
33. Write the use of the head(), tail() and count() functions on a
pandas Series object.
Ans. • head(n) returns the first n rows of a pandas object.
• tail(n) returns the last n rows of a pandas object.
• count() returns the number of non-NaN values in the Series.
• If you do not provide any value for n, then head() and tail() return the first
5 and last 5 rows respectively of the pandas object.
34. Write a python program where n value is not provided in head() in
pandas series object.
Ans. >>> import pandas as pd
>>> obj3=pd.Series([10,15,20,25,30,35,40,45,50,55,60,65,70,75])
>>> obj3.head() #returns first 5 rows

Output:
0 10
1 15
2 20
3 25
4 30

35. Write a python program where n value is provided in head() in pandas
series object.
Ans. >>> import pandas as pd
>>> obj3=pd.Series([10,15,20,25,30,35,40,45,50,55,60,65,70,75])
>>> obj3.head(7) #returns first 7 rows

Output:
0 10
1 15
2 20
3 25
4 30
5 35
6 40
36. Write a python program where n value is not provided in tail() in
pandas series object.
Ans. >>> import pandas as pd
>>> obj3=pd.Series([10,15,20,25,30,35,40,45,50,55,60,65,70,75])
>>> obj3.tail() #returns last 5 rows

Output:
9 55
10 60
11 65
12 70
13 75
37. Write a Python program where the n value is provided in tail() on a pandas
Series object.
Ans. >>> import pandas as pd
>>> obj3=pd.Series([10,15,20,25,30,35,40,45,50,55,60,65,70,75])
>>> obj3.tail(3) #returns last 3 rows

Output:
11 65
12 70
13 75
38. Write a Python program to count the number of elements in a pandas
Series object.
Ans. >>> import pandas as pd
>>> import numpy as np
>>> s1=pd.Series([12,15,10])
>>> s2=pd.Series([12,np.nan,10])
>>> s1.count()
3
>>> s2.count()
2
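count() is easy to confuse with len(). A small sketch, reusing the values of s2 from the answer above, shows the difference: len() counts every element, while count() skips NaN. The variable names total, non_missing and missing are illustrative only.

```python
import numpy as np
import pandas as pd

s2 = pd.Series([12, np.nan, 10])

total = len(s2)             # every element, including NaN
non_missing = s2.count()    # only the non-NaN values
missing = s2.isna().sum()   # number of NaN values

print(total, non_missing, missing)
```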
39. What do you mean by Vector or Mathematical Operations on Series
objects?
Ans. A vector operation means that if you apply a function or expression to a
Series, it is applied individually to each item of the object.
The various Vector/Mathematical Operations are:

 Addition operation
 Multiplication operation
 Relational operation
 Square operation
Example 1: Mathematical operation on Pandas Series
>>> import pandas as pd
>>> obj1=pd.Series([4,2,7])
>>> obj2=pd.Series([6,3,5,4])
>>> obj3=pd.Series([8,5,9])
>>> obj1
0 4
1 2
2 7

>>> obj1+2 #Addition operation
0 6
1 4
2 9
dtype: int64

>>> obj1*3 #Multiplication operation
0 12
1 6
2 21
dtype: int64

>>> obj1>15 #Relational operation


0 False
1 False
2 False
dtype: bool

>>> obj1>5 #Relational operation
0 False
1 False
2 True
dtype: bool

>>> obj1**2 #Square operation


0 16
1 4
2 49
dtype: int64
>>> obj1
0 4
1 2
2 7
Arithmetic Operations on Series object

>>> obj1+obj2 #Addition of objects


0 10.0
1 5.0
2 12.0
3 NaN
dtype: float64

>>> obj1*obj3 #Multiplication of objects
0 32
1 10
2 63
dtype: int64

Example 2: Mathematical operation on Pandas Series

While performing mathematical operations on series, index matching is
implemented and all missing values are filled in with NaN by default.
>>> import pandas as pd
>>> seriesA = pd.Series([1,2,3,4,5], index =
['a', 'b', 'c', 'd', 'e'])
>>> seriesA
a 1
b 2
c 3
d 4
e 5
dtype: int64
>>> seriesB = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])
>>> seriesB
z 10
y 20
a -10
c -50
e 100
dtype: int64

# Addition Operation of two Series


Method 1 : Output of addition is NaN if one of the elements or both elements
have no value.

>>> seriesA + seriesB


a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64


Details of addition of two series

Index  SeriesA  SeriesB  SeriesA+SeriesB
a      1        -10      -9.0
b      2                 NaN
c      3        -50      -47.0
d      4                 NaN
e      5        100      105.0
y               20       NaN
z               10       NaN

Method 2 : When we do not want to have NaN values in the output.


We can use the series method add() and a parameter fill_value to replace
missing value with a specified value.

>>> seriesA.add(seriesB, fill_value=0)


a -9.0
b 2.0
c -47.0
d 4.0
e 105.0
y 20.0
z 10.0
dtype: float64

Details of addition of two series


Index SeriesA SeriesB SeriesA+SeriesB
a 1 -10 -9.0
b 2 0 2.0
c 3 -50 -47
d 4 0 4.0
e 5 100 105.0
y 0 20 20.0
z 0 10 10.0
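The labels that become NaN under Method 1 are the ones present in only one of the two series. A minimal sketch, assuming the same seriesA and seriesB as above, checks this with the index of each series (the names common, total and nan_labels are illustrative):

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# Labels that occur in both series
common = seriesA.index.intersection(seriesB.index)

# Plain addition: NaN wherever a label is missing from either series
total = seriesA + seriesB
nan_labels = sorted(total[total.isna()].index)

print(sorted(common))
print(nan_labels)
```

Only the common labels receive a computed value; every other label needs fill_value if NaN is not wanted.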

# Subtraction Operation of two Series


Method 1 : Output of subtraction is NaN if one of the elements or both
elements have no value.

>>> seriesA - seriesB


a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64

Method 2 : When we do not want to have NaN values in the output.


We can use the series method sub() and a parameter fill_value to replace
missing value with a specified value.

# using fill value 1000 while making explicit call of the method

>>> seriesA.sub(seriesB, fill_value=1000)


a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64

# Multiplication Operation of two Series


Method 1 : Output of multiplication is NaN if one of the elements or both
elements have no value.

>>> seriesA * seriesB
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64

Method 2 : When we do not want to have NaN values in the output.


We can use the series method mul() and a parameter fill_value to replace
missing value with a specified value.

# using fill value 0 while making explicit call of the method

>>> seriesA.mul(seriesB, fill_value=0)


a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64

# Division Operation of two Series


Method 1 : Output of division is NaN if one of the elements or both elements
have no value.

>>> seriesA/seriesB
a -0.10

b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64

Method 2 : When we do not want to have NaN values in the output.


We can use the series method div() and a parameter fill_value to replace
missing value with a specified value.

>>> seriesA.div(seriesB, fill_value=0)


a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64
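The inf values above appear because a missing divisor was replaced by 0. One possible way to avoid them, shown here only as a sketch, is to pass a non-zero fill_value such as 1. Whether 1 is a sensible substitute depends on the data; it is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])

# A missing value on either side is treated as 1 before dividing
result = seriesA.div(seriesB, fill_value=1)
print(result)

# True if any element of the result is infinite
has_inf = bool(np.isinf(result.to_numpy()).any())
print(has_inf)
```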
40. What is a DataFrame? Write the different ways to create a DataFrame.
Ans. A DataFrame is a two-dimensional labelled data structure, i.e., data is
aligned in a tabular fashion in rows and columns. Each column can have a
different type of value such as numeric, string, boolean, etc.
Syntax:
pandas.DataFrame(data, index, columns, dtype, copy)
A DataFrame can be created using-
• Lists
• Dict
• Series
• Numpy ndarrays
• Another DataFrame
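The last two sources in the list, a NumPy ndarray and another DataFrame, can be sketched as follows. The array values and the names arr, df_arr and df_copy are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# 2-D ndarray: each inner row becomes one row of the DataFrame
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_arr = pd.DataFrame(arr, columns=['A', 'B'])

# Another DataFrame as data; copy=True makes the new object independent
df_copy = pd.DataFrame(df_arr, copy=True)
df_copy.loc[0, 'A'] = 100

print(df_arr)
print(df_copy)
```

Because of copy=True, changing df_copy leaves df_arr untouched.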
41. How to create an empty dataframe?
Ans. The most basic DataFrame is an empty DataFrame, created by calling the
constructor without any parameter.
Syntax:
pandas.DataFrame()

#Empty dataframe without parameter


import pandas as pd #import the pandas library and aliasing as pd
df = pd.DataFrame()
print(df)

#Output:
Empty DataFrame
Columns: []
Index: []
42. How to Create a DataFrame from List?
Ans. A DataFrame can be created by passing a list as the data parameter.
Syntax:
pandas.DataFrame(data)
#Create a DataFrame with parameter as list data
import pandas as pd
data = [1,2,3,4,5] # List
df = pd.DataFrame(data)
print(df)

#Output:
0
0 1
1 2
2 3
3 4
4 5
#Create a DataFrame with parameter as list data and column name
import pandas as pd
data = [['Freya',10],['Mohak',12],['Dwivedi',13]]
df = pd.DataFrame(data, columns=['Name','Age'])
print (df)

#Output:
Name Age
0 Freya 10
1 Mohak 12
2 Dwivedi 13
#Create a DataFrame from Dict of ndarrays with Column name
import pandas as pd
data = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df = pd.DataFrame(data)
print (df)

#Output:
Name Age
0 Freya 9
1 Mohak 10

#Create a DataFrame from Dict of ndarrays with Column and Row Labels.
import pandas as pd
data = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df = pd.DataFrame(data, index=['Rank1','Rank2'])
print (df)
#Output:
Name Age
Rank1 Freya 9
Rank2 Mohak 10

#Create a DataFrame from List of Dictionaries


import pandas as pd
data = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df = pd.DataFrame(data)
print (df)
#Output:
x y z
0 1 2 NaN
1 5 4 5.0
#Create a DataFrame from Dictionary of lists
import pandas as pd
data = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df = pd.DataFrame(data)
print (df)

#Output:
Name Age
0 Freya 9
1 Mohak 10
43. How to Create a DataFrame from Series?
Ans. Consider the following three Series:
>>> import pandas as pd
>>> seriesA = pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesB = pd.Series ([1000,2000,-1000,-5000,1000],
index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesC = pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e'])

>>> dFrame6 = pd.DataFrame(seriesA) #DataFrame using a single series
>>> dFrame6
0
a 1
b 2
c 3
d 4
e 5
Here, the DataFrame dFrame6 has as many rows as there are elements in the
series, but has only one column.

To create a DataFrame from more than one series, we pass the series in a list:


>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
a b c d e
0 1 2 3 4 5
1 1000 2000 -1000 -5000 1000

Here, labels in the series object become the column names in the DataFrame
object (dFrame7) and each series becomes a row in the DataFrame.

>>> dFrame8 = pd.DataFrame([seriesA, seriesC])


>>> dFrame8
a b c d e z y
0 1.0 2.0 3.0 4.0 5.0 NaN NaN
1 -10.0 NaN -50.0 NaN 100.0 10.0 20.0

Here, the series do not have the same set of labels. The number of columns in
the DataFrame equals the number of distinct labels across all the series. So, if a
particular series does not have a corresponding value for a label, NaN is inserted
in the DataFrame column.
44. How to create a DataFrame from a Dictionary of Series?
Ans. A dictionary of series can also be used to create a DataFrame.

Example :
>>> import pandas as pd
>>> ResultSheet={'Arnab': pd.Series([90, 91, 97],
index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96],
index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88],
index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67],
index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99],
index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF


Arnab Ramit Samridhi Riya Mallika


Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

When a DataFrame is created from a Dictionary of Series, the resulting index
or row labels are a union of all series indexes used to create the DataFrame.

For example:

>>> dictForUnion = { 'Series1' : pd.Series([1,2,3,4,5],
index = ['a', 'b', 'c', 'd', 'e']) ,
'Series2' : pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']),
'Series3' : pd.Series([10,20,-10,-50,100],
index = ['z', 'y', 'a', 'c', 'e']) }
>>> dFrameUnion = pd.DataFrame(dictForUnion)
>>> dFrameUnion
Series1 Series2 Series3
a 1.0 -10.0 -10.0
b 2.0 NaN NaN
c 3.0 -50.0 -50.0
d 4.0 NaN NaN
e 5.0 100.0 100.0
y NaN 20.0 20.0
z NaN 10.0 10.0

OPERATIONS ON ROWS AND COLUMNS IN DATAFRAME

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a
tabular format in rows and columns. We can perform basic operations on
rows/columns like selecting, deleting, adding, and renaming.
A. OPERATIONS ON COLUMNS IN PANDAS DATAFRAME
i) Column Selection:
 In order to select columns in a Pandas DataFrame, we can access them
by their column names or by slicing the df.columns attribute.
 Syntax 1 :
df[['ColumnName1', 'ColumnName2', ..., 'ColumnName_n']]
 Syntax 2 :
df[df.columns[start:end]]

Example 1 :
# Import pandas package
import pandas as pd

# Define a dictionary containing employee data


data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Age':[27, 24, 22, 32], \
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],\
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)

# Displaying all rows and columns of DataFrame


print(df)

Name Age Address Qualification


0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd

# select all rows with selected two columns


print(df[['Name', 'Qualification']])

Name Qualification
0 Jai Msc
1 Princi MA
2 Gaurav MCA
3 Anuj Phd
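Syntax 2, df[df.columns[start:end]], selects columns by their position. A minimal sketch using the same employee data as Example 1 (the name subset is illustrative):

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# df.columns[1:3] gives the labels at positions 1 and 2 (3 is excluded)
subset = df[df.columns[1:3]]
print(subset)
```

Here the slice picks the 'Age' and 'Address' columns without naming them.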

ii) Column Addition:
 In order to add a column to a Pandas DataFrame, we can declare a new
list as a column and add it to an existing DataFrame.
 Assigning values to a new column label that does not exist will create a
new column at the end.
 Syntax 1:
df['ColumnName']=value or [List of Values]
 Syntax 2:
df.insert(index,'ColumnName',[List_Value])
 Syntax 3 (returns a new DataFrame):
df.assign(ColumnName=[List_Value])

Example 2 : Adding Column
# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Height': [5.1, 6.2, 5.1, 5.2], \
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
# Using 'Address' as the column name and equating it to the list
df['Address'] = address
# Observe the result
print(df)

Output:
Name Height Qualification Address
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

Example 3 : Adding Columns using insert() method


# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Height': [5.1, 6.2, 5.1, 5.2], \
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Using DataFrame.insert() to add a column
df.insert(2,"Age", [21, 23, 24, 21])
# Observe the result
print(df)

Output:
Name Height Age Qualification
0 Jai 5.1 21 Msc
1 Princi 6.2 23 MA
2 Gaurav 5.1 24 Msc
3 Anuj 5.2 21 Msc

Example 4: Adding Column using assign() method


# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Height': [5.1, 6.2, 5.1, 5.2], \
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# assign() returns a new DataFrame with the 'Address' column added
df2 = df.assign(Address = ['Delhi', 'Bangalore', 'Chennai',
'Patna'])
# Observe the result
print(df2)
Output:
Name Height Qualification Address
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

iii) Column Value Update:
 If the column already exists in the DataFrame then the assignment
statement will update the values of the already existing column.
df[column_name]=[values]
Example :
df2['Height']=[5.3,6.4,5.2,5.3]
Output:
Name Height Qualification Address
0 Jai 5.3 Msc Delhi
1 Princi 6.4 MA Bangalore
2 Gaurav 5.2 Msc Chennai
3 Anuj 5.3 Msc Patna
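The assignment also accepts a single value instead of a list; pandas then repeats that value in every row. The sketch below is illustrative only: the 'Country' column and the value 0.0 are assumptions, not part of the notes.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2]})

# New label: a column is created and the scalar is repeated in each row
df['Country'] = 'India'

# Existing label: every value of the column is replaced
df['Height'] = 0.0

print(df)
```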

iv) Column Deletion :
 In order to delete a column in a Pandas DataFrame, we can use
the drop() method. Columns are deleted by passing their column names
with axis=1.
 Syntax 1:
df.drop(labels=None, axis=0, inplace=False)
Here,
labels: String or list of strings referring to row or column names.
axis: int or string value; 0 or 'index' for rows and 1 or 'columns' for columns.
inplace: Makes changes in the original DataFrame if True.
Example 5: Deleting Column
# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Height': [5.1, 6.2, 5.1, 5.2],\
'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],\
'Address': ['Delhi', 'Bangalore', 'Chennai',
'Patna']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
# dropping passed columns
df.drop(["Height"], axis = 1, inplace = True)
# Observe the result
print(df)
Output:
Name Qualification Address
0 Jai Msc Delhi
1 Princi MA Bangalore
2 Gaurav Msc Chennai
3 Anuj Msc Patna
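When inplace is left at its default of False, drop() returns a new DataFrame and the original stays unchanged. A short sketch, assuming a cut-down version of the student data (the name trimmed is illustrative), also uses the columns= keyword, which has the same effect as labels= with axis=1:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'],
                   'Height': [5.1, 6.2],
                   'Address': ['Delhi', 'Bangalore']})

# Same effect as df.drop(['Height'], axis=1); df itself is not modified
trimmed = df.drop(columns=['Height'])

print(trimmed)
print(df.columns.tolist())
```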

v) Column Rename :
 In order to rename a column in a Pandas DataFrame, we can either
use the rename() method or assign a list of new column names to the
columns attribute.
 Syntax 1 : Rename Single Column
df.rename(columns={'Old_Col_Name':'New_Col_Name'}, inplace = True)

 Syntax 2 : By assigning a list of new column names
df.columns=['newColName1', ..., 'newColName_n']
Example 6 : Renaming Single Column
# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\
'Height': [5.1, 6.2, 5.1, 5.2],\
'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],\
'Address': ['Delhi', 'Bangalore', 'Chennai',
'Patna']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Before renaming the columns
print(df)
df.rename(columns = {'Qualification':'Qual'}, inplace = True)
# After renaming the columns
print("\nAfter modifying column:\n", df.columns)
print(df)
Output:
Name Height Qualification Address
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

After modifying column:


Index(['Name', 'Height', 'Qual', 'Address'], dtype='object')
Name Height Qual Address
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

Example 7 : Renaming Columns by assigning a list of new column names.
# Import pandas package
import pandas as pd
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],\

'Height': [5.1, 6.2, 5.1, 5.2],\
'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],\
'Address': ['Delhi', 'Bangalore', 'Chennai',
'Patna']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Before renaming the columns
print(df)
df.columns = ['Sname', 'Hgt', 'Qual','Addr']
# After renaming the columns
print("\nAfter modifying column:\n", df.columns)
print(df)
Output :
Name Height Qualification Address
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

After modifying column:


Index(['Sname', 'Hgt', 'Qual', 'Addr'], dtype='object')
Sname Hgt Qual Addr
0 Jai 5.1 Msc Delhi
1 Princi 6.2 MA Bangalore
2 Gaurav 5.1 Msc Chennai
3 Anuj 5.2 Msc Patna

Example 8 : Renaming columns and index labels using rename() with
dictionaries of old and new names.
import pandas as pd
df = pd.DataFrame({"A": ["P01", "P02","P03"],
"B": ["Pen", "Pencil", "Eraser"]})
df=df.rename(columns={"A": "PID", "B": "PNAME"})
df=df.rename(index={0: 'A', 1: 'B', 2: 'C'})
print(df)

Output :
PID PNAME
A P01 Pen
B P02 Pencil
C P03 Eraser


B. OPERATIONS ON ROWS IN PANDAS DATAFRAME


i) Retrieve Rows
 Pandas provides dedicated indexers to retrieve rows from a DataFrame.
 DataFrame.loc[] is used to retrieve rows from a Pandas DataFrame by
label or by condition. Rows can also be selected by integer position
using iloc[]. A loc[] slice includes the end label, while an iloc[] slice
excludes the end position.
 Syntax 1 :
df.loc[df['column name'] condition]
Example 1 : Retrieving rows using loc[] and iloc[]
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# retrieving row by loc method
first = df.loc[1:3]
print(first, end='\n\n')
# retrieving row by iloc method
second = df.iloc[1:3]
print(second, end='\n\n')
Output:
# retrieving row by loc method
Name Age Address Qualification
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd

# retrieving row by iloc method


Name Age Address Qualification
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
Example 2 : Retrieving rows using loc[] based on Condition
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# retrieving row by loc method based on condition
third = df.loc[df['Age']!=22]
print(third)
Output:
Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
3 Anuj 32 Kannauj Phd
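More than one condition can be combined inside loc[]. The following sketch, which reuses the employee data and picks two conditions purely for illustration, joins them with & (and). Each condition has to be enclosed in its own parentheses; | can be used in the same way for "or".

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# Rows where Age is above 22 and Qualification is not 'Phd'
result = df.loc[(df['Age'] > 22) & (df['Qualification'] != 'Phd')]
print(result)
```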
ii) Row Addition or Concatenation:
 In order to add a row to a Pandas DataFrame, we can concatenate the old
DataFrame with a new one by using the pandas concat() function, or
assign values to a new row label with DataFrame.loc[].
 Syntax 1 :
pd.concat([df_objs], axis=0, ignore_index=False)
 Syntax 2 :
df.loc[index_label]=[List_Values]
Example 3 : Adding or concatenating rows using concat().
import pandas as pd
df1= pd.DataFrame(["First","Second"],columns=['Col'])
df2= pd.DataFrame(["Third","Fourth"],columns=['Col'])
df = pd.concat([df2, df1], axis=0, ignore_index=True)
print(df)
Output :
Col
0 Third
1 Fourth
2 First
3 Second
[Figure: df2 (Third, Fourth) and df1 (First, Second) are concatenated row-wise to give df (Third, Fourth, First, Second), re-indexed from 0 to 3.]

Example 4 : Adding rows using loc[] method.
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad',
'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
#Adding a row to an existing DataFrame
df.loc['A']=['Taposh',47,'Jorhat','MCA']
print(df)
Output :
Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
A Taposh 47 Jorhat MCA <- New Row Added

iii) Row Deletion :
 In order to delete a row in a Pandas DataFrame, we can use
the drop() method. Rows are deleted by passing their index labels.
 Syntax 1:
df.drop(labels=None, axis=0, inplace=False)
Here,
labels: String or list of strings referring to row or column names.
axis: int or string value; 0 or 'index' for rows and 1 or 'columns' for columns.
inplace: Makes changes in the original DataFrame if True.

Example 5 : Deleting Row


# Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# Add the row labelled 'A' (as in Example 4) so that it can be deleted
df.loc['A'] = ['Taposh', 47, 'Jorhat', 'MCA']
print(df)
#Deleting row
df.drop(['A'], axis = 0, inplace = True)
print(df)
Output:
Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
A Taposh 47 Jorhat MCA

#After Delete Row


Name Age Address Qualification
0 Jai 27 Delhi Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
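drop() also accepts a list with several row labels. A brief sketch on the same employee data (the name remaining is illustrative) deletes two rows at once and, because inplace is not set, leaves the original DataFrame as it was:

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# axis=0 (rows) is the default, so only the labels are needed
remaining = df.drop([0, 2])

print(remaining)
print(len(df))
```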

45. How to access DataFrame elements through indexing?


Ans. Data elements in a DataFrame can be accessed using indexing. There are two
ways of indexing Dataframes : Label based indexing and Boolean Indexing.
(a) Label Based Indexing
There are several methods in Pandas to implement label based indexing.
DataFrame.loc[ ] is an important method that is used for label based indexing
with DataFrames.
Example :
# A single row label returns the row as a Series.
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99


>>> ResultDF.loc['Science']
Arnab 91
Ramit 81
Samridhi 91
Riya 71
Mallika 95
Name: Science, dtype: int64
# When a single column label is passed, it returns the column as a
Series.
>>> ResultDF.loc[:,'Arnab']

Maths 90
Science 91
Hindi 97
Name: Arnab, dtype: int64

# To read more than one row from a DataFrame, a list of row labels is
used
>>> ResultDF.loc[['Science', 'Hindi']]
Arnab Ramit Samridhi Riya Mallika
Science 91 81 91 71 95
Hindi 97 96 88 67 99

(b) Boolean Indexing


Boolean means a binary variable that can represent either of the two states -
True (indicated by 1) or False (indicated by 0). In Boolean indexing, we can
select the subsets of data based on the actual values in the DataFrame rather
than their row/column labels. Thus, we can use conditions on column names to
filter data values.
Example:
# Consider the DataFrame ResultDF, the following statement displays
True or False depending on whether the data value satisfies the given
condition or not.
>>> ResultDF.loc['Maths'] > 90
Arnab False
Ramit True
Samridhi False
Riya False
Mallika True
Name: Maths, dtype: bool

# To check in which subjects 'Arnab' has scored more than 90
>>> ResultDF.loc[:,'Arnab']>90
Maths False
Science True
Hindi True
Name: Arnab, dtype: bool
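The True/False results above can be passed back to loc[] to keep only the matching columns or rows. A minimal sketch, assuming ResultDF holds the marks shown earlier (the names top_maths and arnab_high are illustrative):

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97],
                         'Ramit': [92, 81, 96],
                         'Samridhi': [89, 91, 88],
                         'Riya': [81, 71, 67],
                         'Mallika': [94, 95, 99]},
                        index=['Maths', 'Science', 'Hindi'])

# Columns (students) with more than 90 in Maths
top_maths = ResultDF.loc[:, ResultDF.loc['Maths'] > 90]

# Rows (subjects) in which Arnab scored more than 90
arnab_high = ResultDF.loc[ResultDF['Arnab'] > 90]

print(top_maths)
print(arnab_high)
```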

46. How to access DataFrame elements through slicing?
Ans. Slicing is used to select a subset of rows and/or columns from a DataFrame.
For example:
# To retrieve a set of rows, slicing can be used with row labels.
>>> ResultDF.loc['Maths': 'Science']
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95

# To use a slice of labels with a column name to access values of those
rows in that column only.
Q. Display the rows with labels Maths and Science, and the column with label
Arnab.
>>> ResultDF.loc['Maths': 'Science', 'Arnab']
Maths 90
Science 91
Name: Arnab, dtype: int64

# To use a slice of labels with a slice of column names to access values
of those rows and columns.
>>> ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']
Arnab Ramit Samridhi
Maths 90 92 89
Science 91 81 91

# To use a slice of labels with a list of column names to access values of
those rows and columns.
>>> ResultDF.loc['Maths': 'Science',['Arnab','Samridhi']]
Arnab Samridhi
Maths 90 89
Science 91 91
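The same subsets can also be reached by position with iloc[]. The sketch below, assuming the same ResultDF, compares a label slice with a position slice; the label slice includes its end label, whereas the position slice stops before its end position.

```python
import pandas as pd

ResultDF = pd.DataFrame({'Arnab': [90, 91, 97],
                         'Ramit': [92, 81, 96],
                         'Samridhi': [89, 91, 88],
                         'Riya': [81, 71, 67],
                         'Mallika': [94, 95, 99]},
                        index=['Maths', 'Science', 'Hindi'])

# Label based: 'Science' and 'Samridhi' are included
by_label = ResultDF.loc['Maths':'Science', 'Arnab':'Samridhi']

# Position based: rows 0-1 and columns 0-2 (end positions excluded)
by_position = ResultDF.iloc[0:2, 0:3]

print(by_label.equals(by_position))
```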

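47. Name the method for joining, merging and concatenation of DataFrames.
Ans. For joining, the pandas.DataFrame.append() method can be used to merge two
DataFrames. It appends rows of the second DataFrame at the end of the first
DataFrame. Columns not present in the first DataFrame are added as new
columns. Note that append() was deprecated in pandas 1.4 and removed in
pandas 2.0; in those versions pandas.concat() is used for the same purpose.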
Example : Consider the two DataFrames – DF1 and DF2
# To append DF2 to DF1.
>>> DF1=pd.DataFrame([[1, 2, 3], [4, 5],[6]],
columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
>>> DF1

C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN

>>> DF2=pd.DataFrame([[10, 20], [30], [40,50]],
columns=['C2', 'C5'], index=['R4', 'R2','R5'])

>>> DF2
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0

>>> DF1=DF1.append(DF2) #append DF2 to DF1


>>> DF1
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0

# To append DF1 to DF2, the rows of DF2 precede the rows of DF1.
To make the column labels appear in sorted order, we can set the parameter
sort=True.

# append DF1 to DF2 with sort=True
>>> DF2 =DF2.append(DF1, sort=True)
>>> DF2
C1 C2 C3 C5
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN

# append DF1 to DF2 with sort=False
>>> DF2 = DF2.append(DF1, sort=False)
>>> DF2
C2 C5 C1 C3
R4 10.0 20.0 NaN NaN
R2 30.0 NaN NaN NaN
R5 40.0 50.0 NaN NaN
R1 2.0 NaN 1.0 3.0
R2 5.0 NaN 4.0 NaN
R3 NaN NaN 6.0 NaN
# The parameter ignore_index of append() method may be set to

True, when we do not want to use row index labels.
By default, ignore_index = False.
>>> DF1 = DF1.append(DF2, ignore_index=True)
>>> DF1
C1 C2 C3 C5
0 1.0 2.0 3.0 NaN
1 4.0 5.0 NaN NaN
2 6.0 NaN NaN NaN
3 NaN 10.0 NaN 20.0
4 NaN 30.0 NaN NaN
5 NaN 40.0 NaN 50.0
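DataFrame.append() is not available from pandas 2.0 onwards, so on a current installation the calls above raise an AttributeError. A minimal sketch of the same joining with pandas.concat() is given below. It assumes the original DF1 and DF2; the missing cells are written explicitly as None here, and the names combined and renumbered are illustrative.

```python
import pandas as pd

DF1 = pd.DataFrame([[1, 2, 3], [4, 5, None], [6, None, None]],
                   columns=['C1', 'C2', 'C3'], index=['R1', 'R2', 'R3'])
DF2 = pd.DataFrame([[10, 20], [30, None], [40, 50]],
                   columns=['C2', 'C5'], index=['R4', 'R2', 'R5'])

# Rows of DF2 are placed after the rows of DF1; row labels are kept
combined = pd.concat([DF1, DF2])

# ignore_index=True discards the row labels and numbers the rows 0..5
renumbered = pd.concat([DF1, DF2], ignore_index=True)

print(combined)
print(renumbered)
```

As with append(), columns that exist in only one DataFrame are filled with NaN in the rows coming from the other one.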

48. Write the Attributes of DataFrames.


Ans. Like Series, we can access certain properties called attributes of a DataFrame
by using that property with the DataFrame name.
Attributes of Pandas DataFrame
Attribute           Purpose
DataFrame.index     to display row labels
DataFrame.columns   to display column labels
DataFrame.dtypes    to display the data type of each column in the DataFrame
DataFrame.values    to display a NumPy ndarray having all the values in the
                    DataFrame, without the axes labels
DataFrame.shape     to display a tuple representing the dimensionality of the
                    DataFrame
DataFrame.size      to display the total number of values in the DataFrame
                    (number of rows x number of columns)
DataFrame.T         to transpose the DataFrame, i.e. row indices and column
                    labels of the DataFrame replace each other's position

Example: Consider the following DataFrame ForestAreaDF


>>> ForestArea = {
'Assam' :pd.Series([78438, 2797,
10192, 15116], index = ['GeoArea', 'VeryDense',
'ModeratelyDense', 'OpenForest']),
'Kerala' :pd.Series([ 38852, 1663,
9407, 9251], index = ['GeoArea' ,'VeryDense',
'ModeratelyDense', 'OpenForest']),
'Delhi' :pd.Series([1483, 6.72, 56.24,
129.45], index = ['GeoArea', 'VeryDense',
'ModeratelyDense', 'OpenForest'])}

>>> ForestAreaDF = pd.DataFrame(ForestArea)
>>> ForestAreaDF
Assam Kerala Delhi
GeoArea 78438 38852 1483.00
VeryDense 2797 1663 6.72
ModeratelyDense 10192 9407 56.24
OpenForest 15116 9251 129.45

>>> ForestAreaDF.index
Index(['GeoArea', 'VeryDense', 'ModeratelyDense',
'OpenForest'], dtype='object')

>>> ForestAreaDF.columns
Index(['Assam', 'Kerala', 'Delhi'], dtype='object')

>>> ForestAreaDF.dtypes
Assam int64
Kerala int64
Delhi float64
dtype: object

>>> ForestAreaDF.values
array([[7.8438e+04, 3.8852e+04, 1.4830e+03],
[2.7970e+03, 1.6630e+03, 6.7200e+00],
[1.0192e+04, 9.4070e+03, 5.6240e+01],
[1.5116e+04, 9.2510e+03, 1.2945e+02]])

>>> ForestAreaDF.shape
(4, 3)

>>> ForestAreaDF.size
12

>>> ForestAreaDF.T
GeoArea VeryDense ModeratelyDense OpenForest
Assam 78438.0 2797.00 10192.00 15116.00
Kerala 38852.0 1663.00 9407.00 9251.00
Delhi 1483.0 6.72 56.24 129.45

>>> ForestAreaDF.head(2)
Assam Kerala Delhi
GeoArea 78438 38852 1483.00
VeryDense 2797 1663 6.72
>>> ForestAreaDF.tail(2)
Assam Kerala Delhi
ModeratelyDense 10192 9407 56.24
OpenForest 15116 9251 129.45

49. What is CSV file?

Ans. CSV (Comma Separated Values) is a simple file format used to store
tabular data, such as a spreadsheet or database.
A CSV file stores tabular data (numbers and text) in plain text. Each line of the
file is a data record. Each record consists of one or more fields, separated by
commas. The use of the comma as a field separator is the source of the name for
this file format.
For working with CSV files, Python has a built-in module called csv. The CSV
format itself is widely used to exchange data between different applications.
50. Explain briefly the CSV format of storing files.
Ans. CSV format is a kind of tabular data separated by comma and is stored in the
form of plain text.
In CSV file format -
• Each row of the table is stored on one line of the file.
• The field values of a row are stored together, separated by commas.
#Illustration:
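A minimal sketch of how a small table looks in CSV form, written with Python's built-in csv module. The sample records are only illustrative (they mirror the Student.csv example used in these notes), and the text is kept in memory instead of being written to a file.

```python
import csv
import io

# Illustrative table: the first row is the header, the rest are records
rows = [
    ["RollNO", "Name", "Marks"],
    [1, "Anil", 89.0],
    [2, "Bunty", 68.0],
    [3, "Harish", 98.0],
]

buffer = io.StringIO()  # in-memory text file, nothing is written to disk
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)  # one table row becomes one line of text

csv_text = buffer.getvalue()
print(csv_text)
```

Each printed line is one record, and the commas mark where one field ends and the next begins.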

51. What are the advantages of CSV file formats?


Ans. Advantages of CSV file format are -
 A simple, compact and ubiquitous (universal) format for data storage.
 A common format for data interchange.
 It can be opened in popular spreadsheet packages like MS Excel, Open
Office-Calc, etc.
 Nearly all spreadsheets and databases support import/export to CSV
format.
52. Name the function used to read a CSV file into a Pandas DataFrame.
Ans. read_csv()

53. What is the purpose of using read_csv() method in Python Pandas dataframe?
Ans. The read_csv() function reads a CSV file and loads its data into a Pandas DataFrame.

54. Write a program to create and open “Student.csv” file using Pandas.


File Name : Student.csv


Ans. import pandas as pd
df=pd.read_csv("D:\\Python\\student.csv", sep=",", header=0)
print(df)
RollNO Name Marks
0 1 Anil 89.0
1 2 Bunty 68.0
2 3 Harish 98.0
3 4 Gautom NaN
4 5 Krishna 91.0

Note :
• Missing values in the CSV file are treated as NaN (Not a Number) in the
Pandas dataframe.
• By default, the read_csv() method takes the first row of the CSV file and
uses it as the dataframe header.
• The parameter sep specifies whether the values are separated by comma,
semicolon, tab, or any other character. The default value for sep is a
comma (",").
• The parameter header specifies the number of the row whose values are to be
used as the column names. It also marks the start of the data to be fetched.
header=0 implies that column names are inferred from the first line of the
file. When no column names are passed, read_csv() behaves as if header=0.
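We can explicitly specify column names using the parameter names while
creating the DataFrame using the read_csv() function. Because Student.csv
already has a header row, header=0 is passed as well, so that the old header is
replaced instead of being read as a data row.
df1=pd.read_csv("D:\\Python\\student.csv", sep=",", header=0,
names=['RNo', 'StdName', 'Sub1'])
print(df1)
RNo StdName Sub1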
0 1 Anil 89.0
1 2 Bunty 68.0
2 3 Harish 98.0
3 4 Gautom NaN
4 5 Krishna 91.0
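The effect of sep, header and names can be tried without a file on disk. The sketch below assumes the file contents shown in the string (an illustrative copy of Student.csv with one missing mark) and reads them through io.StringIO.

```python
import io
import pandas as pd

# Assumed file contents; Gautom's mark is left empty on purpose
csv_text = """RollNO,Name,Marks
1,Anil,89.0
2,Bunty,68.0
3,Harish,98.0
4,Gautom,
5,Krishna,91.0
"""

# header=0: the first line supplies the column names
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=0)
print(df)

# names= together with header=0: the first line is replaced by our labels
df1 = pd.read_csv(io.StringIO(csv_text), header=0,
                  names=["RNo", "StdName", "Sub1"])
print(df1.columns.tolist())

# names= without header=0: the old header line is read as a data row
df2 = pd.read_csv(io.StringIO(csv_text), names=["RNo", "StdName", "Sub1"])
print(len(df1), len(df2))
```

The last line shows why header=0 matters when names is used on a file that already has a header: df2 has one row more than df1.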

55. What is the use of SHAPE command in Pandas Dataframe?

Ans. shape is an attribute (not a function, so it is written without brackets). It
returns a tuple giving the number of rows and columns of the DataFrame read from
the CSV file. Example -
>>> import pandas as pd
>>> df=pd.read_csv("D:\\Python\\student.csv")
>>> df.shape
(5, 3)
>>> row,column = df.shape
>>> row
5
>>> column
3

56. What is the purpose of using usecols attribute in read_csv() method?


Ans. The usecols parameter of the read_csv() method is used to read only
specific/selected columns of a CSV file. Example –
>>> import pandas as pd
>>> df=pd.read_csv("D:\\Python\\student.csv",
usecols=['Name','Marks'])
>>> df
Name Marks
0 Anil 89.0
1 Bunty 68.0
2 Harish 98.0
3 Gautom NaN
4 Krishna 91.0
57. What is the purpose of using nrows attribute in read_csv() method?
Ans. The nrows parameter of the read_csv() method is used to read only the first
n rows of a CSV file. Example -
>>> import pandas as pd
>>> df=pd.read_csv("D:\\Python\\student.csv", nrows=3)
>>> df
RollNO Name Marks
0 1 Anil 89.0
1 2 Bunty 68.0
2 3 Harish 98.0
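usecols and nrows can also be combined in one call. A small sketch, again assuming illustrative file contents held in a string instead of the real Student.csv:

```python
import io
import pandas as pd

# Assumed file contents (illustrative copy of Student.csv)
csv_text = """RollNO,Name,Marks
1,Anil,89.0
2,Bunty,68.0
3,Harish,98.0
4,Gautom,
5,Krishna,91.0
"""

# Keep only two columns and only the first three data rows
subset = pd.read_csv(io.StringIO(csv_text),
                     usecols=["Name", "Marks"], nrows=3)
print(subset)
print(subset.shape)
```

Reading only the needed columns and rows is useful when the file is large and only a preview is required.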

58. How to read CSV file without header in Pandas Dataframe?


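Ans. A CSV file that has no header row is read by passing header=None to the
read_csv() method; pandas then labels the columns 0, 1, 2, ... If the file does
have a header line that should be ignored (as Student.csv does), skiprows=1 can
be passed as well. Example -
>>> import pandas as pd
>>> df=pd.read_csv("D:\\Python\\student.csv", header=None, skiprows=1)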
>>> df
0 1 2
0 1 Anil 89.0
1 2 Bunty 68.0
2 3 Harish 98.0
3 4 Gautom NaN
4 5 Krishna 91.0

59. How to read CSV file without index in Pandas Dataframe?
Ans. To read a CSV file without the default integer index (0, 1, 2, ...), pass
index_col = 0 to the read_csv() method. The first column of the file is then
used as the row labels. Example -
>>> import pandas as pd
>>> df=pd.read_csv("D:\\Python\\student.csv", index_col=0)
>>> df
           Name  Marks
RollNO
1          Anil   89.0
2         Bunty   68.0
3        Harish   98.0
4        Gautom    NaN
5       Krishna   91.0
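A short sketch contrasting header=None and index_col=0. The two strings are assumed sample data, one without and one with a header line, so the example runs without any file.

```python
import io
import pandas as pd

# Assumed data: the same records without and with a header line
headerless = """1,Anil,89.0
2,Bunty,68.0
3,Harish,98.0
"""
with_header = "RollNO,Name,Marks\n" + headerless

# header=None: no line is treated as a header, columns are numbered
df_a = pd.read_csv(io.StringIO(headerless), header=None)
print(df_a)

# index_col=0: the first column becomes the row labels
df_b = pd.read_csv(io.StringIO(with_header), index_col=0)
print(df_b)
print(df_b.loc[2, "Name"])  # row label 2 is now a roll number
```

With index_col=0 the roll number is no longer an ordinary column, so rows can be selected directly by roll number using loc.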
60. Write a program to create a CSV file from dataframe using dictionary.
Ans. #To create a StudEnrollment CSV file from dataframe.
import pandas as pd
StudAdm = {'AdmNo': [101, 102, 103, 104],
           'StudName': ['Anita', 'Anil', 'Bijay', 'Chiran'],
           'dob': ['12-05-2002', '22-09-2001', '15-04-2002', '25-01-2001'],
           'Class': ['XI', 'XII', 'XI', 'XII']}
df = pd.DataFrame(StudAdm, columns=['AdmNo', 'StudName', 'dob', 'Class'])
print(df)

Output:
AdmNo StudName dob Class
0 101 Anita 12-05-2002 XI
1 102 Anil 22-09-2001 XII
2 103 Bijay 15-04-2002 XI
3 104 Chiran 25-01-2001 XII
#Create csv file using to_csv()
df.to_csv("D:\\Python\\StudEnrollment.csv")
or,
df.to_csv(path_or_buf='D:/Python/StudEnrollment.csv', sep=',')
This creates a file by the name StudEnrollment.csv in the folder D:/Python on
the hard disk. When we open this file in any text editor or a spreadsheet, we
will find the above data along with the row labels and the column headers,
separated by comma.
Output :
# To open StudEnrollment.csv in Excel Spreadsheet –


# To open StudEnrollment.csv in Text Editor i.e. Notepad or Notepad++

In case, we do not want the column names to be saved to the file we may use
the parameter header=False.
Another parameter index=False is used when we do not want the row labels
to be written to the file on disk. For Example -
df.to_csv( 'd:/python/Student.txt', sep = '@', header =
False, index= False)
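The effect of header=False, index=False and sep can be inspected without creating a file: when no path is given, to_csv() returns the CSV text as a string. The small DataFrame below is an assumed cut-down version of the StudAdm data.

```python
import pandas as pd

# Assumed sample data (two rows of the StudAdm example)
df = pd.DataFrame({"AdmNo": [101, 102],
                   "StudName": ["Anita", "Anil"]})

# Default: row labels and column headers are both written
full = df.to_csv()
print(full)

# No header, no row labels, and '@' as the separator
bare = df.to_csv(sep="@", header=False, index=False)
print(bare)
```

In the first output the header line starts with a comma because the row-label column has no name; in the second only the data values remain.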

61. Which package is required to establish connection between Pandas and SQL?
Ans. Before establishing the connection between Pandas and SQL, we have to
install any of the following packages in a virtual environment:
• MySQLdb – MySQLdb is the Python module to work with MySQL
databases. It is one of the most commonly used Python packages for
MySQL.
• MySQL-connector-python – This package contains the mysql.connector
module, which is entirely written in Python.
• PyMySQL – This package contains PyMySQL module. It has been
designed as a replacement for MySQLdb.
62. How to install mysql-connector with Python?

Ans. • To connect Python to MySQL, we have to install the MySQL connector using
the 'pip' command on the command prompt (cmd) as -
C:\Users\your name\AppData\Local\Programs\Python\Python37-32\python -m pip install mysql-connector-python
• Once done, we need to check whether it has been properly installed or not.
63. How you can determine that the mysql python connector or driver is
installed successfully?
Ans. Type import mysql.connector in the Python shell. If no error message gets
displayed, then this signifies that the driver has been successfully installed.
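The same check can be done from a script. This is a small sketch using the standard importlib module; the helper name is_installed is only illustrative.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can locate the named module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'mysql') is itself missing
        return False

print("mysql.connector installed:", is_installed("mysql.connector"))
print("pandas installed:", is_installed("pandas"))
```

If the first line prints False, the connector has to be installed with pip before the MySQL programs in these notes can run.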
64. Write the steps to install MySQL-Python connector.
Ans. In a nutshell, the following steps are to be kept in mind for the successful
installation of the MySQL connector:
• Download Python 3.x and then install it.
• Download the MySQL API; an exe file will be downloaded, install it.
• Install the MySQL-Python connector.
• Now connect to the MySQL server using Python.
65. Which method is used to read SQL table as DataFrame using pandas?
Ans. Reading MySQL Table as DataFrame :
 Like CSV file, a database table can be read as a dataframe using pandas
read_sql() method. The syntax is:
pandas.read_sql(sql command, connection_obj)
 read_sql() method returns a Pandas dataframe object.
Example –
#Reading MySql Table as DataFrame
import pandas as pd
import mysql.connector
con=mysql.connector.connect(host="localhost",user="root",
passwd="root", database="studentattendance")
df=pd.read_sql("Select * from stud",con)
print(df)
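The example above needs a running MySQL server. To try read_sql() without one, the sketch below swaps in Python's built-in sqlite3 module with an in-memory database; the table name stud follows the example, while its columns and rows are assumed for illustration.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used in place of MySQL (an assumption so that
# the example runs without a server)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stud (rollno INTEGER, name TEXT, present INTEGER)")
con.executemany("INSERT INTO stud VALUES (?, ?, ?)",
                [(1, "Anil", 1), (2, "Bunty", 0), (3, "Harish", 1)])
con.commit()

# Whole table as a DataFrame
df = pd.read_sql("SELECT * FROM stud ORDER BY rollno", con)
print(df)

# Values are passed through params instead of being pasted into the SQL text
present = pd.read_sql(
    "SELECT name FROM stud WHERE present = ? ORDER BY rollno",
    con, params=(1,))
print(present)

con.close()
```

The column names of the table become the column labels of the DataFrame, and each table row becomes one DataFrame row.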


66. How to write data into SQL table using Dataframe using sqlalchemy?
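Ans. Writing data into an SQL table using a dataframe :
• The to_sql() method of a DataFrame is used to write its data into an SQL
table. For MySQL, the connection passed to to_sql() is created with the
create_engine() function of SQLAlchemy.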
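A minimal sketch of to_sql(), assuming an in-memory SQLite database in place of MySQL so that it runs without a server or SQLAlchemy (pandas accepts a sqlite3 connection directly). The table name studadm and the sample rows are illustrative.

```python
import sqlite3
import pandas as pd

# Assumed sample data
df = pd.DataFrame({"AdmNo": [101, 102, 103],
                   "StudName": ["Anita", "Anil", "Bijay"]})

con = sqlite3.connect(":memory:")

# index=False: do not store the row labels 0, 1, 2 as an extra column
# if_exists="replace": drop and recreate the table if it already exists
df.to_sql("studadm", con, index=False, if_exists="replace")

# Read the table back to confirm what was written
back = pd.read_sql("SELECT * FROM studadm ORDER BY AdmNo", con)
print(back)

con.close()
```

Other values of if_exists are "fail" (the default) and "append", which adds the rows to an existing table.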
67. Write Menu-driven program to demonstrate four major operations
performed on a table through MySQL-Python connectivity.
Ans.
def menu():
    c = 'y'
    while c == 'y':
        print("1. Add record")
        print("2. Update record")
        print("3. Delete record")
        print("4. Display records")
        print("5. Exit")
        choice = int(input("Enter your choice: "))
        if choice == 1:
            adddata()
        elif choice == 2:
            # updatedata()   # Method 1: fixed values
            udata()          # Method 2: values entered by the user
        elif choice == 3:
            deldata()
        elif choice == 4:
            fetchdata()
        elif choice == 5:
            print("Exiting")
            break
        else:
            print("Wrong input")
        c = input("Do you want to continue (y/n): ")

def fetchdata():
    import mysql.connector
    db = None
    try:
        db = mysql.connector.connect(user='root', password='root',
                                     host='localhost', database='test')
        print("Database Connected")
        cursor = db.cursor()
        sql = "SELECT * FROM student"
        cursor.execute(sql)
        results = cursor.fetchall()
        print("Name", "\t", "Stipend", "\t", "Stream", "\t",
              "Average Marks", "\t", "Grade", "\t", "Class")
        print("~~~~", "\t", "~~~~~~~", "\t", "~~~~~~", "\t",
              "~~~~~~~~~~~~~", "\t", "~~~~~", "\t", "~~~~~")
        for cols in results:
            nm = cols[0]
            st = cols[1]
            stream = cols[2]
            av = cols[3]
            gd = cols[4]
            cl = cols[5]
            print(nm, "\t", st, "\t\t", stream, "\t", av, "\t\t", gd, "\t", cl)
        print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n")
    except mysql.connector.Error:
        print("Error: unable to fetch data\n")
    # Close the connection only if it was actually opened
    if db is not None:
        db.close()

def adddata():
    import mysql.connector
    nm = input("Enter Name : ")
    stipend = int(input("Enter Stipend : "))
    stream = input("Stream: ")
    avgmark = float(input("Enter Average Marks : "))
    grade = input("Enter Grade : ")
    cls = int(input("Enter Class : "))
    db = mysql.connector.connect(user='root', password='root',
                                 host='localhost', database='test')
    cursor = db.cursor()
    # %s placeholders let the connector quote the values safely
    sql = "INSERT INTO student VALUES (%s, %s, %s, %s, %s, %s)"
    try:
        cursor.execute(sql, (nm, stipend, stream, avgmark, grade, cls))
        db.commit()
        print("Record Inserted !\n")
    except mysql.connector.Error as e:
        print(e)
        db.rollback()
    db.close()

def updatedata():  # Method 1: update with fixed values
    import mysql.connector
    db = None
    try:
        db = mysql.connector.connect(user='root', password='root',
                                     host='localhost', database='test')
        cursor = db.cursor()
        sql = "UPDATE student SET stipend = %s WHERE name = %s"
        cursor.execute(sql, (9500, 'Taposh'))
        db.commit()
    except Exception as e:
        print(e)
    # Close the connection only if it was actually opened
    if db is not None:
        db.close()

def udata():  # Method 2: update with values entered by the user
    import mysql.connector
    try:
        db = mysql.connector.connect(user='root', password='root',
                                     host='localhost', database='test')
        cursor = db.cursor()
    except mysql.connector.Error as e:
        print(e)
        return

    try:
        # Show the existing records first
        cursor.execute("SELECT * FROM student")
        results = cursor.fetchall()
        for cols in results:
            nm, st, stream, av, gd, cl = cols[:6]
            print("Name=%s, Stipend=%f, Stream=%s, Average Marks=%f, "
                  "Grade=%s, Class=%d" % (nm, st, stream, av, gd, cl))
    except mysql.connector.Error:
        print("Error: unable to fetch data\n")

    temp = input("Enter Student Name to be Updated : ")
    tempst = int(input("Enter New Stipend Amount : "))

    try:
        sql = "UPDATE student SET stipend = %s WHERE name = %s"
        cursor.execute(sql, (tempst, temp))
        db.commit()
        print("Record Updated !\n")
    except Exception as e:
        print(e)
    db.close()

def deldata():
    import mysql.connector
    try:
        db = mysql.connector.connect(user='root', password='root',
                                     host='localhost', database='test')
        cursor = db.cursor()
    except mysql.connector.Error as e:
        print(e)
        return

    try:
        # Show the existing records first
        cursor.execute("SELECT * FROM student")
        results = cursor.fetchall()
        for cols in results:
            nm, st, stream, av, gd, cl = cols[:6]
            print("Name=%s, Stipend=%f, Stream=%s, Average Marks=%f, "
                  "Grade=%s, Class=%d" % (nm, st, stream, av, gd, cl))
    except mysql.connector.Error:
        print("Error: unable to fetch data")

    temp = input("Enter Student Name to be deleted : ")

    try:
        sql = "DELETE FROM student WHERE name = %s"
        ans = input("Are you sure you want to delete the record (yes/no) : ")
        if ans.lower() == 'yes':
            cursor.execute(sql, (temp,))
            db.commit()
    except Exception as e:
        print(e)

    try:
        # Show the records again, reusing the same connection
        cursor.execute("SELECT * FROM student")
        results = cursor.fetchall()
        for row in results:
            nm, st, stream, av, gd, cl = row[:6]
            print(nm, st, stream, av, gd, cl)
    except mysql.connector.Error:
        print("Error: unable to fetch data")
    db.close()

menu()

NCERT TEXT BOOK QUESTIONS (NEW)

1. What is a Series and how is it different from a 1-D array, a list and a
dictionary?
Ans. A Series is a one-dimensional array having a sequence of values of any
data type (int, float, list, string, etc). By default series have numeric data
labels starting from zero.

Series vs 1-D array

• A Series can have default as well as pre-defined index labels, whereas a
NumPy 1-D array has only default (positional) indexes.
• A Series can contain values of any data type, whereas a NumPy array contains
elements of the same data type.

Series vs List

• A Series can have default as well as pre-defined index labels, whereas a
list has only default indexes.

Series vs Dictionary

• Series elements can be accessed using default (positional) indexes as well as
row labels, whereas dictionary elements cannot be accessed using default
indexes. They have to be accessed using the pre-defined keys.
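The differences listed above can be checked with a small sketch. The marks used here are assumed sample values.

```python
import numpy as np
import pandas as pd

# The same three values held in four different structures (sample data)
marks_list = [89, 68, 98]
marks_arr = np.array(marks_list)
marks_dict = {"Anil": 89, "Bunty": 68, "Harish": 98}
marks = pd.Series(marks_dict)   # dictionary keys become index labels

print(marks_list[1])      # list: position only
print(marks_arr[1])       # 1-D array: position only
print(marks_dict["Bunty"])  # dictionary: key only
print(marks["Bunty"])     # Series: by label ...
print(marks.iloc[1])      # ... and also by position

# A Series can hold mixed values; its dtype then becomes object
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

All five print statements for Bunty's marks give the same value; only the way of reaching it differs.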

2. What is a DataFrame and how is it different from a 2-D array?


Ans. A DataFrame is a two-dimensional labelled data structure from the Pandas
library, with rows and columns like a table. It differs from a NumPy 2-D array
as follows:
• A DataFrame can store columns of different data types, whereas a NumPy 2-D
array contains data of a single type.
• DataFrame elements can be accessed by their default indexes for rows and
columns as well as by the defined labels.
• NumPy 2-D array elements can be accessed using default index
specifications only.
• A DataFrame has a simpler interface for operations like file loading,
plotting, selection, joining and GROUP BY, which come in very handy in
data-processing applications.
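A short sketch of these points, using assumed sample data and row labels r1 and r2 chosen only for illustration.

```python
import numpy as np
import pandas as pd

# DataFrame with a text column and a numeric column (sample data)
df = pd.DataFrame({"Name": ["Anil", "Bunty"], "Marks": [89.0, 68.0]},
                  index=["r1", "r2"])
arr = np.array([[1, 2], [3, 4]])

print(df.dtypes)   # each column keeps its own data type
print(arr.dtype)   # one data type for the whole array

print(df.loc["r2", "Marks"])   # DataFrame: access by labels
print(df.iloc[1, 1])           # DataFrame: access by positions
print(arr[1, 1])               # array: access by positions only
```

loc and iloc reach the same cell of the DataFrame, one through labels and the other through positions.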
3. How are DataFrames related to Series?
Ans.  DataFrame and Series both are data structures from the Pandas
library.
 Series is a one-dimensional structure whereas DataFrame is a two-
dimensional structure.
 Series can only contain single list with index, whereas Dataframe
can be made of more than one series or we can say that a DataFrame
is a collection of series that can be used to analyse the data.
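The relation can be seen by building a DataFrame from two Series and then taking one column and one row back out. The subject names and marks are assumed sample data.

```python
import pandas as pd

# Two Series sharing the same index labels (sample data)
maths = pd.Series([90, 75], index=["Anil", "Bunty"])
science = pd.Series([85, 80], index=["Anil", "Bunty"])

# Each Series becomes one column of the DataFrame
df = pd.DataFrame({"Maths": maths, "Science": science})
print(df)

col = df["Maths"]     # a single column is a Series
row = df.loc["Anil"]  # a single row is also returned as a Series
print(type(col).__name__, type(row).__name__)
```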
4. What do you understand by the size of (i) a Series, (ii) a DataFrame?
Ans. Size attribute gives the number of elements present in Series or
DataFrames.
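A small sketch with assumed sample values, showing that size counts every element, including missing ones, and that for a DataFrame it equals rows multiplied by columns.

```python
import pandas as pd

s = pd.Series([10, 20, 30, None])   # the last value is missing (NaN)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(s.size)     # number of elements, NaN included
print(s.count())  # number of non-missing elements
print(df.shape)   # (rows, columns)
print(df.size)    # rows x columns
```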
5. Create the following Series and do the specified operations:
a. EngAlph, having 26 elements with the alphabets as values and
default index values.
b. Vowels, having 5 elements with index labels 'a', 'e', 'i', 'o' and 'u' and
all the five values set to zero. Check if it is an empty series.
c. Friends, from a dictionary having roll numbers of five of your friends
as data and their first name as keys.
d. MTseries, an empty Series. Check if it is an empty series.
e. MonthDays, from a numpy array having the number of days in the
12 months of a year. The labels should be the month numbers from 1
to 12.
Ans. a.
import pandas as pd
EngAlph=pd.Series(['a','b','c','d','e','f','g','h','i','j','k','l','m',
                   'n','o','p','q','r','s','t','u','v','w','x','y','z'])
print(EngAlph)
b.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])
print(vow)
print(vow.empty)   #False, because the Series has five elements

c.
import pandas as pd
d={"Samir":1,"Manisha":2,"Dhara":3,"Shreya":4,"Kusum":5}
friends=pd.Series(d)
print(friends)

d.
import pandas as pd

#Create an empty Series
MTseries=pd.Series(dtype=int)
print(MTseries)

#Check if it is empty
print(MTseries.empty)   #True

e.
import numpy as np
import pandas as pd

MonthDays=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
month=pd.Series(MonthDays,index=np.arange(1,13))
print(month)

6. Using the Series created in Question 5, write commands for the following:
a. Set all the values of Vowels to 10 and display the Series.
b. Divide all values of Vowels by 2 and display the Series.
c. Create another series Vowels1 having 5 elements with index labels
'a', 'e', 'i', 'o' and 'u' having values [2,5,6,3,8] respectively.
d. Add Vowels and Vowels1 and assign the result to Vowels3.
e. Subtract, Multiply and Divide Vowels by Vowels1.
f. Alter the labels of Vowels1 to ['A', 'E', 'I', 'O', 'U'].
Ans. a.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])

#Method 1
vow.loc['a':'u']=10
print(vow)

#Method 2
vow.iloc[0:5]=10
print(vow)

b.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])
vow.iloc[0:5]=10
print(vow/2)

c.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])
vow1=pd.Series([2,5,6,3,8],index=['a','e','i','o','u'])
print(vow1)

d.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])
vow1=pd.Series([2,5,6,3,8],index=['a','e','i','o','u'])
vow3=vow+vow1
print(vow3)

e.
import pandas as pd
vow=pd.Series(0,index=['a','e','i','o','u'])
vow1=pd.Series([2,5,6,3,8],index=['a','e','i','o','u'])
print(vow-vow1)
print(vow*vow1)
print(vow/vow1)

f.
import pandas as pd
vow1=pd.Series([2,5,6,3,8],index=['a','e','i','o','u'])
vow1.index=['A','E','I','O','U']
print(vow1)

7. Using the Series created in Question 5, write commands for the following:
a. Find the dimensions, size and values of the Series EngAlph, Vowels,
Friends, MTseries, MonthDays.
b. Rename the Series MTseries as SeriesEmpty.
c. Name the index of the Series MonthDays as monthno and that of

Series Friends as Fname.
d. Display the 3rd and 2nd value of the Series Friends, in that order.
e. Display the alphabets 'e' to 'p' from the Series EngAlph.
f. Display the first 10 values in the Series EngAlph.
g. Display the last 10 values in the Series EngAlph.
h. Display the MTseries.
Ans. a.
import numpy as np
import pandas as pd

#EngAlph
EngAlph=pd.Series(['a','b','c','d','e','f','g','h','i','j','k','l','m',
                   'n','o','p','q','r','s','t','u','v','w','x','y','z'])
print("Size:",EngAlph.size)
print("Dimension:",EngAlph.ndim)
print("Values:",EngAlph.values)

#vowels
vow=pd.Series(0,index=['a','e','i','o','u'])
print("Size:",vow.size)
print("Dimension:",vow.ndim)
print("Values:",vow.values)

#Friends
d={"Samir":1,"Manisha":2,"Dhara":3,"Shreya":4,"Kusum":5}
friends=pd.Series(d)
print("Size:",friends.size)
print("Dimension:",friends.ndim)
print("Values:",friends.values)

#MTseries
MTseries=pd.Series(dtype=int)

print("Size:",MTseries.size)
print("Dimension:",MTseries.ndim)
print("Values:",MTseries.values)

#MonthDays
MonthDays=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
month=pd.Series(MonthDays,index=np.arange(1,13))
print("Size:",month.size)
print("Dimension:",month.ndim)
print("Values:",month.values)

b.
import pandas as pd
MTseries=pd.Series(dtype=int)
print(MTseries)
MTseries=MTseries.rename("SeriesEmpty")
print(MTseries)

c.
import numpy as np

import pandas as pd

#Friends
d={"Samir":1,"Manisha":2,"Dhara":3,"Shreya":4,"Kusum":5}
friends=pd.Series(d)
friends.index.name="Fname"
print(friends.index)

#MonthDays
MonthDays=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
month=pd.Series(MonthDays,index=np.arange(1,13))
month.index.name="monthno"
print(month.index)

d.
import numpy as np
import pandas as pd

#Friends
d={"Samir":1,"Manisha":2,"Dhara":3,"Shreya":4,"Kusum":5}
friends=pd.Series(d)
friends.index.name="Fname"
print(friends.iloc[2:0:-1])

e.
import numpy as np
import pandas as pd

#EngAlph
EngAlph=pd.Series(['a','b','c','d','e','f','g','h','i','j','k','l','m',
                   'n','o','p','q','r','s','t','u','v','w','x','y','z'])
print(EngAlph.iloc[4:16])

f.
import numpy as np
import pandas as pd

#EngAlph
EngAlph=pd.Series(['a','b','c','d','e','f','g','h','i','j','k','l','m',
                   'n','o','p','q','r','s','t','u','v','w','x','y','z'])
print(EngAlph.head(10))

g.
import numpy as np
import pandas as pd

#EngAlph
EngAlph=pd.Series(['a','b','c','d','e','f','g','h','i','j','k','l','m',
                   'n','o','p','q','r','s','t','u','v','w','x','y','z'])
print(EngAlph.tail(10))

h.
Refer Answer (b)
8. Using the Series created in Question 5, write commands for the following:
a. Display the names of the months 3 through 7 from the Series

MonthDays.
b. Display the Series MonthDays in reverse order.
Ans. a.
import numpy as np
import pandas as pd

#MonthDays
MonthDays=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
#Month names are used as labels so that they can be displayed
month=pd.Series(MonthDays,index=["Jan","Feb","Mar","Apr","May","June",
                                 "July","Aug","Sept","Oct","Nov","Dec"])
#Method 1
print(month.iloc[2:7])

#Method 2
print(month.loc['Mar':'July'])

#Method 3
print(month['Mar':'July'])
b.
import numpy as np
import pandas as pd

#MonthDays
MonthDays=np.array([31,28,31,30,31,30,31,31,30,31,30,31])
month=pd.Series(MonthDays,index=["Jan","Feb","Mar","Apr","May","June",
                                 "July","Aug","Sept","Oct","Nov","Dec"])
#Method 1
print(month[::-1])

#Method 2
print(month.iloc[::-1])

#Method 3
print(month.loc["Dec":"Jan":-1])

9. Create the following DataFrame Sales containing year wise sales figures
for five sales persons in INR. Use the years as column labels, and sales
person names as row labels.
          2014    2015    2016   2017
Madhu     100.5   12000   20000  50000
Kusum     150.8   18000   50000  60000
Kinshuk   200.9   22000   70000  70000
Ankit     30000   30000  100000  80000
Shruti    40000   45000   25000  90000

Ans. import pandas as pd
d = {2014:[100.5,150.8,200.9,30000,40000],
     2015:[12000,18000,22000,30000,45000],
     2016:[20000,50000,70000,100000,25000],
     2017:[50000,60000,70000,80000,90000]}
Sales=pd.DataFrame(d,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])
print(Sales)

10. Use the DataFrame created in Question 9 above to do the following:


a) Display the row labels of Sales.
b) Display the column labels of Sales.
c) Display the data types of each column of Sales.
d) Display the dimensions, shape, size and values of Sales.
e) Display the last two rows of Sales.
f) Display the first two columns of Sales.
g) Create a dictionary using the following data. Use this dictionary to
create a DataFrame Sales2.
2018
Madhu 160000
Kusum 110000
Kinshuk 500000
Ankit 340000
Shruti 900000
Check if Sales2 is empty or it contains data.
Ans. a.
import pandas as pd
d = {2014:[100.5,150.8,200.9,30000,40000],
     2015:[12000,18000,22000,30000,45000],
     2016:[20000,50000,70000,100000,25000],
     2017:[50000,60000,70000,80000,90000]}
Sales=pd.DataFrame(d,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])
print(Sales.index)
b. print(Sales.columns)
c. print(Sales.dtypes)
d.
print("Dimensions:",Sales.ndim)
print("Shape:",Sales.shape)
print("Size:",Sales.size)
print("Values:",Sales.values)
e.
#Method 1
print(Sales.tail(2))

#Method 2
print(Sales.iloc[-2:])

#With Specific Columns, I have printed two columns


print(Sales.iloc[-2:,-2:])

f.
#Method 1
print(Sales[[2014,2015]])


#Method 2
print(Sales[Sales.columns[0:2]])

#Method 3
print(Sales.iloc[:, 0:2] )
g.
import pandas as pd
dict1={2018:[160000,110000,500000,340000,900000]}
Sales2=pd.DataFrame(dict1,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])
print(Sales2)

#Check if Sales2 is empty
print(Sales2.empty)   #False
11.Use the DataFrame created in Question 9 above to do the following:
a) Append the DataFrame Sales2 to the DataFrame Sales.
b) Change the DataFrame Sales such that it becomes its transpose.
c) Display the sales made by all sales persons in the year 2017.
d) Display the sales made by Madhu and Ankit in the year 2017 and
2018.
e) Display the sales made by Shruti 2016.
f) Add data to Sales for salesman Sumeet where the sales made are
[196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016,
2017, 2018] respectively.
g) Delete the data for the year 2014 from the DataFrame Sales.
h) Delete the data for sales man Kinshuk from the DataFrame Sales.
i) Change the name of the salesperson Ankit to Vivaan and Madhu to
Shailesh.
j) Update the sale made by Shailesh in 2018 to 100000.
k) Write the values of DataFrame Sales to a comma separated file
SalesFigures.csv on the disk. Do not write the row labels and column
labels.
l) Read the data in the file SalesFigures.csv into a DataFrame
SalesRetrieved and Display it. Now update the row labels and column
labels of SalesRetrieved to be the same as that of Sales.
Ans. a.
import pandas as pd

#Sales1
d = {2014:[100.5,150.8,200.9,30000,40000],
     2015:[12000,18000,22000,30000,45000],
     2016:[20000,50000,70000,100000,25000],
     2017:[50000,60000,70000,80000,90000]}
Sales1=pd.DataFrame(d,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])

#Sales 2
dict1={2018:[160000,110000,500000,340000,900000]}
Sales2=pd.DataFrame(dict1,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])

#Appending Dataframes
#DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same row-wise result
Sales1=pd.concat([Sales1,Sales2])
print(Sales1)

b.
import pandas as pd

d = {2014:[100.5,150.8,200.9,30000,40000],
     2015:[12000,18000,22000,30000,45000],
     2016:[20000,50000,70000,100000,25000],
     2017:[50000,60000,70000,80000,90000]}
Sales=pd.DataFrame(d,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])

#Sales.T only displays the transpose; use Sales=Sales.T to store it back in Sales
print(Sales.T)

c.
#Method 1
print(Sales[2017])

#Method 2
print(Sales.loc[:,2017])

d.
import pandas as pd

d = {2014:[100.5,150.8,200.9,30000,40000],
     2015:[12000,18000,22000,30000,45000],
     2016:[20000,50000,70000,100000,25000],
     2017:[50000,60000,70000,80000,90000]}

Sales=pd.DataFrame(d,index=["Madhu","Kusum","Kinshuk","Ankit","Shruti"])

#Add 2018 Data


Sales[2018]=[160000,110000,500000,340000,900000]

#Method 1
print(Sales.loc[['Madhu','Ankit'], [2017,2018]])

#Method 2
print(Sales.loc[Sales.index.isin(["Madhu","Ankit"]),[2017,2018]])

e.
print(Sales.loc[Sales.index=='Shruti',2016])

f.
Sales.loc["Sumeet"]=[196.2,37800,52000,78438,38852]
print(Sales)

g.

#Temporary deletion (returns a new DataFrame, Sales itself is unchanged)
Sales.drop(columns=2014)

#Permanent deletion
Sales.drop(columns=2014, inplace=True)
print(Sales)

h.

#Kinshuk is a row label, so axis=0 (the default); labels are case-sensitive
Sales=Sales.drop("Kinshuk",axis=0)
print(Sales)

i.

Sales=Sales.rename({"Ankit":"Vivaan","Madhu":"Shailesh"},axis="index")
print(Sales)

j.

Sales.loc[Sales.index=="Shailesh",2018]=100000
print(Sales)

k.

Sales.to_csv("d:\\SalesFigures.csv",index=False,header=False)

l.

#header=None because the file was written without column labels
SalesRetrieved=pd.read_csv("d:\\SalesFigures.csv",header=None)
print(SalesRetrieved)

#Copy the row labels and column labels from Sales
SalesRetrieved.index=Sales.index
SalesRetrieved.columns=Sales.columns
print(SalesRetrieved)
