L1_DataFrames_I

Uploaded by

priyanshu9107
1.11 DATAFRAME
In the previous sections, we have discussed series in detail. One limitation of series is that it is
not able to handle 2D or multidimensional data of the kind found in real-life applications.
For such tasks, Python Pandas provides another data structure called dataframe.
Dataframe object of Pandas can store 2D heterogeneous data. It is a two-dimensional
structure, just like any table (with rows & columns). Dataframe is similar to spreadsheets or
SQL tables. While working with Pandas, dataframe is the most commonly-used data structure.
The basic features of dataframe are:
(i) Columns can be of different types, i.e., it is possible to have any kind of data in columns,
i.e., numeric, string or floating point, etc.
(ii) Size of dataframe is mutable, i.e., the number of rows and columns can be increased or
decreased any time.
(iii) Its data/values are also mutable and can be changed any time.
(iv) Labelled axes (rows/columns).
(v) Arithmetic operations can be performed on rows and columns.
(vi) Indexes may constitute numbers, strings or letters.
1.11.1 Creation of Dataframe
Before creating a dataframe, we must understand the layout of dataframe.
The table given below represents the data of students with their marks along with the respective
grades. The data is represented in rows and columns. Each column represents an attribute
(like Name, Marks, Grade) and each row represents the record of a student.
       Name     Marks  Grade      <- column names
0      Vijaya   80     B1
1      Rahul    92     A2         <- row/record
2      Meghna   67     C
3      Radhika  95     A1
4      Shaurya  97     A1
^ index label
Creating an Empty Dataframe
Creating a dataframe begins with the creation of an empty dataframe. To create an empty dataframe,
DataFrame() method is used. To display a dataframe, the command print(<dataframe name>)
is used.
#Creating an Empty dataframe
import pandas as pd
df1 = pd.DataFrame()
print(df1)
Output:
Empty DataFrame
Columns: []
Index: []
As is evident from the output, columns and index are empty since no argument has been passed
to the method DataFrame().
Its syntax is:
pandas.DataFrame(data, index, columns)
Here,
data: The data can be taken in the form of an ndarray, series, lists, dict and even another
dataframe.
index: It defines the row labels to be used for the resulting frame and is optional.
If the index is not specified, by default, it will take the values from 0 to n-1, where n is
the total number of rows in the given data set.
columns: This argument is used to provide column names in the dataframe. If the column names
are not specified, by default, they will take the values from 0 to n-1, where n is the total
number of columns in the given data set.
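The effect of the index and columns arguments can be checked with a small sketch. The sample values and the labels 'r1', 'r2', 'a', 'b', 'c' are illustrative assumptions, not data from the text.

```python
import pandas as pd

# Illustrative data: two rows, three columns (values are made up)
rows = [[1, 2, 3], [4, 5, 6]]

# No labels given: rows and columns are both numbered from 0
default_df = pd.DataFrame(rows)
print(list(default_df.index))    # [0, 1]
print(list(default_df.columns))  # [0, 1, 2]

# Explicit row labels (index) and column names (columns)
labelled_df = pd.DataFrame(rows, index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(labelled_df)
```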
Let us see an example to understand the basic components of a Pandas dataframe.
Example: Carefully observe the following code:
import pandas as pd
Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)
print(df)

(i) List the index of the dataframe df.
Ans. The index labels of df will include Q1, Q2, Q3, Q4, A, B, C.
(ii) List the column names of dataframe df.
Ans. The column names of df will be: 1, 2.
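These two answers can be verified with a short sketch that rebuilds the same dataframe. The isna() check at the end is an addition for illustration: each column has values only for the keys of its own inner dictionary, so the remaining cells are missing.

```python
import pandas as pd

Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
df = pd.DataFrame({1: Year1, 2: Year2})

print(sorted(df.index))   # the seven inner keys, sorted for display
print(list(df.columns))   # the two outer keys
# Count of missing cells per column (keys that belong to the other dictionary)
print(df.isna().sum())
```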

Creating Dataframe in Pandas
Dataframe can be created with the following constructs:
1. Lists
2. Series
3. Dictionary
4. NumPy ndarrays
1. Creating Dataframe from Lists
The simplest form of generating dataframe is using lists. The list is passed as an argument
to DataFrame() method and gets converted into a dataframe with elements displayed as
columns and index automatically created by Pandas. The default column label is 0. We can
also provide row and column index values for arranging elements in the form of rows and
columns.
Example 1: To create a dataframe from list.
>>> import pandas as pd
>>> list1 = [10, 20, 30, 40, 50]
>>> df1 = pd.DataFrame(list1)
>>> print(df1)
    0      <- column label
0  10
1  20
2  30
3  40
4  50
The index values 0 to 4 are generated automatically.

Example 2: To create a dataframe from nested lists by passing column names and index values.
#Creating dataframe from student list
import pandas as pd1
data1 = [['Shreya', 20], ['Rakshit', 22], ['Srijan', 18]]
idx = ['Stud1', 'Stud2', 'Stud3']
#Defining column names to be displayed as headings
df1 = pd1.DataFrame(data1, index=idx, columns=['Name', 'Age'])
print(df1)
Output:
          Name  Age
Stud1   Shreya   20
Stud2  Rakshit   22
Stud3   Srijan   18

Example 3: To create a dataframe with appropriate column headings from the list given below:
[[101, 'Gurman', 98], [102, 'Rajveer', 95], [103, 'Samar', 96], [104, 'Yuvraj', 88]]
import pandas as pd
data = [[101, 'Gurman', 98], [102, 'Rajveer', 95], [103, 'Samar', 96],
        [104, 'Yuvraj', 88]]
df = pd.DataFrame(data, columns=['Rno', 'Name', 'Marks'])
print(df)
Output:
   Rno     Name  Marks
0  101   Gurman     98
1  102  Rajveer     95
2  103    Samar     96
3  104   Yuvraj     88

2. Creating Dataframe from Series
Dataframe is a two-dimensional representation of series. When we represent two or more
series in the form of rows and columns, it becomes a dataframe. To understand this concept
better, let us see how dataframes are created from one or more inputted series.
(i) To create a dataframe from a single series.
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> d1 = pd.DataFrame(s)
>>> d1
   0
0  1
1  2
2  3
3  4
4  5

1.24 Informatics Practices with Python-XI|


(ii) To create a dataframe from two series, pass them as a list to DataFrame() method.
>>> s2 = pd.Series([10, 20, 30, 40, 50])
>>> d2 = pd.DataFrame([s, s2])
>>> d2
    0   1   2   3   4
0   1   2   3   4   5
1  10  20  30  40  50

(iii) To create a resultant dataframe from the above two created series, pass them as a
dictionary to DataFrame() method.
>>> df = pd.DataFrame({'Ser1': s, 'Ser2': s2})
>>> df
   Ser1  Ser2
0     1    10
1     2    20
2     3    30
3     4    40
4     5    50

Thus, it can be concluded:
(a) When a single series is used to create a dataframe, the elements of the series become
the elements of the column in a dataframe.
(b) When two or more series are provided as an argument to a dataframe, the elements of
the series shall become the rows in the resultant dataframe.
(c) To create a resultant dataframe from series, we need to convert the series created in
above two steps into a dictionary which is passed as an argument to the DataFrame()
method along with the column labels.
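The three conclusions can be compared side by side by looking at the shape of each result. This is a minimal sketch that reuses the two series from the steps above; the variable names single, as_rows and as_cols are chosen here for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30, 40, 50])

single = pd.DataFrame(s1)                         # (a) one series -> one column
as_rows = pd.DataFrame([s1, s2])                  # (b) list of series -> one row per series
as_cols = pd.DataFrame({'Ser1': s1, 'Ser2': s2})  # (c) dictionary -> one column per series

print(single.shape)   # (5, 1)
print(as_rows.shape)  # (2, 5)
print(as_cols.shape)  # (5, 2)
```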
Practical Implementation-22
To create dataframe from two series of student data.
>>> import pandas as pd
>>> student_marks = pd.Series({'Vijaya': 80, 'Rahul': 92, 'Meghna': 67,
                               'Radhika': 95, 'Shaurya': 97})
>>> student_age = pd.Series({'Vijaya': 32, 'Rahul': 28, 'Meghna': 30,
                             'Radhika': 25, 'Shaurya': 20})
>>> df = pd.DataFrame({'Marks': student_marks, 'Age': student_age})
>>> df
         Marks  Age
Vijaya      80   32
Rahul       92   28
Meghna      67   30
Radhika     95   25
Shaurya     97   20

3. Creating Dataframe from Dictionary
Dictionaries can be passed as an input data to create a dataframe. The dictionary keys by default
are taken as column names. The different ways to create a dataframe using dictionary are:
(a) Dictionary of lists
(b) Dictionary of series
(c) List of dictionaries
(a) Creating a Dataframe using Dictionary of lists
First of all, we shall create a dictionary of lists with its elements representing student
names and marks in their respective subjects.
#Creating a DataFrame from Dictionary of lists
import pandas as pd
student = {'Name': ['Rinku', 'Ritu', 'Ajay', 'Pankaj', 'Aditya'],
           'English': [67, 78, 75, 88, 92],
           'Economics': [78, 67, 89, 90, 56],
           'IP': [78, 88, 98, 90, 87],
           'Accounts': [77, 70, 80, 67, 86]}
df = pd.DataFrame(student)
print(df)
Output:
     Name  English  Economics  IP  Accounts
0   Rinku       67         78  78        77
1    Ritu       78         67  88        70
2    Ajay       75         89  98        80
3  Pankaj       88         90  90        67
4  Aditya       92         56  87        86
As is evident from the output obtained, since no index values have been passed to the
DataFrame() method, the default index 0 to 4 is used. All the keys of the dictionary become
the column names of the dataframe and the values represented in the form of lists in the
dictionary become the values of the dataframe columns.
(b) Creating a Dataframe from Dictionary of series
A dictionary of series can also be used to create a dataframe. For example, Stud_Result
is a dictionary of series containing names and marks of four students in three subjects.
#Creating dataframe from dictionary of series
import pandas as pd
n = pd.Series(['Rinku', 'Deep', 'Shaurya', 'Radhika'])
Eng = pd.Series([89, 78, 89, 90])
Eco = pd.Series([87, 80, 60, 84])
IP = pd.Series([89, 78, 67, 90])
#Creating dictionary using series variables as values
Stud_Result = {"Name": n, "English": Eng, "Economics": Eco, "Informatics Practices": IP}
#Creating dataframe
df = pd.DataFrame(Stud_Result)
print(df)
Output:
      Name  English  Economics  Informatics Practices
0    Rinku       89         87                     89
1     Deep       78         80                     78
2  Shaurya       89         60                     67
3  Radhika       90         84                     90
Alternatively, we can also create a dataframe from a dictionary of series by directly passing
the values to the dictionary using Series() method as shown in the following Practical
Implementation:
Practical Implementation-23
To create a dataframe from a dictionary of series by directly passing the values to the
dictionary using Series() method.
import pandas as pd
Stud_Result = {"Name": pd.Series(['Rinku', 'Deep', 'Shaurya', 'Radhika']),
               "English": pd.Series([89, 78, 89, 90]),
               "Economics": pd.Series([87, 80, 60, 84]),
               "Informatics Practices": pd.Series([89, 78, 67, 90])}
#Creating dataframe
df = pd.DataFrame(Stud_Result)
print(df)
Output:
      Name  English  Economics  Informatics Practices
0    Rinku       89         87                     89
1     Deep       78         80                     78
2  Shaurya       89         60                     67
3  Radhika       90         84                     90

(c) Creating a Dataframe using List of dictionaries
A dataframe can be created using list of dictionaries where keys in the dictionaries will
be converted to column names and the values of each dictionary form one row of the dataframe.

Practical Implementation-24
To create a dataframe by passing a list of dictionaries.
#Creating a dataframe by passing a list of dictionaries
import pandas as pd
newstudent = [{'Rinku': 67, 'Ritu': 78, 'Ajay': 75, 'Pankaj': 88, 'Aditya': 92},
              {'Rinku': 77, 'Ritu': 58, 'Ajay': 87, 'Pankaj': 65},
              {'Rinku': 88, 'Ajay': 67, 'Pankaj': 74, 'Aditya': 70}]
newdf = pd.DataFrame(newstudent)
print(newdf)
Output:
   Rinku  Ritu  Ajay  Pankaj  Aditya
0     67  78.0    75      88    92.0
1     77  58.0    87      65     NaN
2     88   NaN    67      74    70.0

As shown in the output window, NaN (Not a Number) is automatically added in missing places.
A column that contains NaN is stored as floating point, which is why Ritu and Aditya show
values such as 78.0.
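The missing places can also be counted instead of being read off the printed table. The sketch below rebuilds the same dataframe and uses isna() together with sum(); the variable name missing is an illustrative choice.

```python
import pandas as pd

newstudent = [{'Rinku': 67, 'Ritu': 78, 'Ajay': 75, 'Pankaj': 88, 'Aditya': 92},
              {'Rinku': 77, 'Ritu': 58, 'Ajay': 87, 'Pankaj': 65},
              {'Rinku': 88, 'Ajay': 67, 'Pankaj': 74, 'Aditya': 70}]
newdf = pd.DataFrame(newstudent)

# Number of missing values in each column
missing = newdf.isna().sum()
print(missing)

# Data type of each column; columns holding NaN are floating point
print(newdf.dtypes)
```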
4. Creating a Dataframe using NumPy ndarray
We can create a dataframe using NumPy ndarray by importing NumPy module in our program.
Practical Implementation-25
To create a dataframe from NumPy ndarray.
#Creating DataFrame from ndarray
import pandas as pd
import numpy as np
#creating the NumPy array
array = np.array([[67, 78, 75, 78], [67, 78, 75, 88],
                  [78, 67, 89, 90], [78, 88, 98, 90]])
#creating a list of column names
column_values = ['English', 'Economics', 'IP', 'Accounts']
#creating the dataframe
df = pd.DataFrame(data=array, columns=column_values)
#displaying the dataframe
print(df)
Output:
   English  Economics  IP  Accounts
0       67         78  75        78
1       67         78  75        88
2       78         67  89        90
3       78         88  98        90

1.11.2 Attributes of a Dataframe
Like series, we can access certain properties called attributes of a dataframe by using that
property with the dataframe name, separated by dot (.) operator.
Syntax:
<DataFrameObject>.<attribute name>
The dataframe attributes discussed here are: index, columns, axes, dtypes, size, shape, ndim,
empty, T (Transpose) and count.
Let us understand all the attributes while considering the below dataframe as an example which
holds QPI (Quality Performance Index) in the four subjects for three successive years.
import pandas as pd
qpi = {'2018': [85.4, 88.2, 80.3, 79.0], '2019': [77.9, 80.5, 78.6, 76.2],
       '2020': [86.5, 90.0, 77.5, 80.5]}
df = pd.DataFrame(qpi, index=['Accountancy', 'IP', 'Economics', 'English'])
print(df)
Output:
             2018  2019  2020
Accountancy  85.4  77.9  86.5
IP           88.2  80.5  90.0
Economics    80.3  78.6  77.5
English      79.0  76.2  80.5

1. index
This attribute is used to fetch the index labels. The index could be 0, 1, 2, 3 and so on, or
it could be some names; in our example the indexes are: Accountancy, IP, Economics, English.
Syntax:
<DataFrameObject>.index
>>> print(df.index)
Index(['Accountancy', 'IP', 'Economics', 'English'], dtype='object')

2. columns
This attribute is used to fetch the column names; in our case it should give the column names
as: 2018, 2019, 2020.
Syntax:
<DataFrameObject>.columns
>>> print(df.columns)
Index(['2018', '2019', '2020'], dtype='object')

3. axes
This attribute is used to fetch both index and column names.
Syntax:
<DataFrameObject>.axes
>>> df.axes
[Index(['Accountancy', 'IP', 'Economics', 'English'], dtype='object'),
 Index(['2018', '2019', '2020'], dtype='object')]

4. dtypes
This attribute is used to fetch the data type of each column in the dataframe.
Syntax:
<DataFrameObject>.dtypes
>>> df.dtypes
2018    float64
2019    float64
2020    float64
dtype: object

5. size
This attribute is used to fetch the size of the dataframe, which is the product of the number
of rows and columns. It represents the number of elements in the dataframe and always
returns an integer value.
Here, in our example, we have 4 rows and 3 columns, so 4*3, i.e., 12 is the size of our
dataframe.
Syntax:
<DataFrameObject>.size
>>> df.size
12

6. shape
This attribute also describes the size, but as a shape, i.e., the number of rows
and the number of columns of the dataframe.
Syntax:
<DataFrameObject>.shape
>>> df.shape
(4, 3)
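The relation between size and shape described above can be confirmed with a short sketch on the same QPI data. The names qpi, rows and cols are illustrative.

```python
import pandas as pd

qpi = {'2018': [85.4, 88.2, 80.3, 79.0],
       '2019': [77.9, 80.5, 78.6, 76.2],
       '2020': [86.5, 90.0, 77.5, 80.5]}
df = pd.DataFrame(qpi, index=['Accountancy', 'IP', 'Economics', 'English'])

rows, cols = df.shape          # shape is a (rows, columns) tuple
print(rows, cols)              # 4 3
print(df.size == rows * cols)  # True
```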

Data Handling using Pandas 1.29


7. ndim
This attribute is used to fetch the dimensions (number of axes) of the given object. A series
is 1D and returns 1, while a dataframe is 2D, so for a dataframe it will return the output as 2.
Syntax:
<DataFrameObject>.ndim
>>> df.ndim
2

8. empty
This attribute gives you a boolean output in the form of True or False by which we can find
out whether the dataframe is empty, i.e., whether it holds no elements at all. A dataframe that
contains only NaN values is not empty, so this attribute does not detect missing values.
Syntax:
<DataFrameObject>.empty
>>> print(df.empty)
False

Apart from the above attribute, we have a method called isna() in Pandas that can check the
presence of NaNs (Not a Number) in dataframes.
Syntax:
<DataFrameObject>.isna()
import pandas as pd
qpi = {'2018': [85.4, 88.2, 80.3, 79.0], '2019': [77.9, 80.5, 78.6, 76.2],
       '2020': [86.5, 90.0, 77.5, 80.5]}
df = pd.DataFrame(qpi, index=['Accountancy', 'IP', 'Economics', 'English'])
print("Using 'empty' on DataFrame:", df.empty)
print('\n')
print('Finding NaN values....', '\n', df.isna())
print("NOT FOUND!!")
Output:
Using 'empty' on DataFrame: False

Finding NaN values....
              2018   2019   2020
Accountancy  False  False  False
IP           False  False  False
Economics    False  False  False
English      False  False  False
NOT FOUND!!
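The difference between empty and isna() is easiest to see on two small dataframes. This is a minimal sketch; the dataframes no_rows and only_nan are made up for illustration.

```python
import pandas as pd

# A dataframe with column names but no rows
no_rows = pd.DataFrame(columns=['A', 'B'])
# A dataframe with one row that holds only missing values
only_nan = pd.DataFrame({'A': [float('nan')], 'B': [float('nan')]})

print(no_rows.empty)                # True: there are no elements at all
print(only_nan.empty)               # False: NaN still counts as an element
print(only_nan.isna().any().any())  # True: at least one value is missing
```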
9. T (Transpose)
This attribute is used to transpose the dataframe, i.e., rows become columns and columns
become rows.
Syntax:
<DataFrameObject>.T
import pandas as pd
qpi = {'2018': [85.4, 88.2, 80.3, 79.0], '2019': [77.9, 80.5, 78.6, 76.2],
       '2020': [86.5, 90.0, 77.5, 80.5]}
df = pd.DataFrame(qpi, index=['Accountancy', 'IP', 'Eco', 'English'])
print(df)
print('\n')
df1 = df.T
print("After Transpose:")
print(df1)
Output:
             2018  2019  2020
Accountancy  85.4  77.9  86.5
IP           88.2  80.5  90.0
Eco          80.3  78.6  77.5
English      79.0  76.2  80.5

After Transpose:
      Accountancy    IP   Eco  English
2018         85.4  88.2  80.3     79.0
2019         77.9  80.5  78.6     76.2
2020         86.5  90.0  77.5     80.5

10. count
Unlike the attributes above, count() is a method. It calculates the number of non-null or
non-missing values in each column.
Syntax:
<DataFrameObject>.count()
>>> print(df.count())
2018    4
2019    4
2020    4
dtype: int64
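Since count() skips missing values, its result differs from the number of rows as soon as a NaN is present. The sketch below uses a shortened, made-up version of the QPI data with one missing entry; axis=1 is shown as a way to count along rows instead.

```python
import pandas as pd

# Illustrative data: the 2018 value for IP is missing
df = pd.DataFrame({'2018': [85.4, None, 80.3],
                   '2019': [77.9, 80.5, 78.6]},
                  index=['Accountancy', 'IP', 'Economics'])

per_column = df.count()        # non-missing values in each column
per_row = df.count(axis=1)     # non-missing values in each row
print(per_column)
print(per_row)
```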

1.11.3 Setting/Resetting index of Dataframe
We can set index of dataframe by providing values to the index argument of DataFrame().

Practical Implementation-26
To create an indexed dataframe using lists.
You can give index to the dataframe explicitly in the following format:
df = pd.DataFrame(student, index=['Sno1', 'Sno2', 'Sno3', 'Sno4', 'Sno5'])
>>> df = pd.DataFrame(student, index=['Sno1', 'Sno2', 'Sno3', 'Sno4', 'Sno5'])
>>> df
        Name  English  Economics  IP  Accounts
Sno1   Rinku       67         78  78        77
Sno2    Ritu       78         67  88        70
Sno3    Ajay       75         89  98        80
Sno4  Pankaj       88         90  90        67
Sno5  Aditya       92         56  87        86
In the given case, the default index value starting from 0 has been replaced by serial number as
the index for all the records in the dataframe.
Practical Implementation-27
To change the Index Column.
By default, the index is displayed from 0 to n-1, where n is the total number of records. Pandas
also provides us with the flexibility to select any other column present in the dataframe as the
index column instead of the traditional 0, 1, 2, ..., n-1. This can be done using set_index()
method in the following format:
Syntax:
df.set_index(<column name>, inplace=True)
Here,
column_name: is the name of the column which is to be set as the index of the dataframe.
Suppose we have to take Name as the index column. This is done as:
DataFrame for student
     Name  English  Economics  IP  Accounts
0   Rinku       67         78  78        77
1    Ritu       78         67  88        70
2    Ajay       75         89  98        80
3  Pankaj       88         90  90        67
4  Aditya       92         56  87        86
(0 to 4 are the default index values)
>>> df.set_index('Name', inplace=True)
>>> df
        English  Economics  IP  Accounts
Name
Rinku        67         78  78        77
Ritu         78         67  88        70
Ajay         75         89  98        80
Pankaj       88         90  90        67
Aditya       92         56  87        86
(index column changed to Name)

As observed from the output, the default index has been replaced by the field 'Name' as the
inplace attribute has been set to True. This is so because a dataframe by default is not
replaced/changed, and on changing the index, a new dataframe with modified values is
returned. Therefore, in order to make changes in the original dataframe, inplace attribute is
used and set to True.
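The role of inplace can be seen by calling set_index() once without it and once with it. The two-row dataframe below is a cut-down, illustrative version of the student data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rinku', 'Ritu'], 'English': [67, 78]})

indexed = df.set_index('Name')      # returns a new dataframe
print(list(df.columns))             # ['Name', 'English'] (df is unchanged)
print(list(indexed.index))          # ['Rinku', 'Ritu']

df.set_index('Name', inplace=True)  # modifies df itself
print(list(df.columns))             # ['English']
```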
Practical Implementation-28
To reset Index column.
To undo the above operation, i.e., allocating index values according to student names, we can
get the original index values back. Thus, we can get our original table with default integer
index, as in the earlier implementations, using reset_index() method.
The command is:
df.reset_index(inplace=True)
The 'inplace' keyword is made 'True' and shall result in restoring the default integers as the
index value.
>>> df.reset_index(inplace=True)
>>> df
     Name  English  Economics  IP  Accounts
0   Rinku       67         78  78        77
1    Ritu       78         67  88        70
2    Ajay       75         89  98        80
3  Pankaj       88         90  90        67
4  Aditya       92         56  87        86

It is clear from the above output that the default integers are displayed in place or, in other
words, in their original format.
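That reset_index() really undoes set_index() can be checked by comparing the result with a copy taken beforehand. This is a small sketch on illustrative data; the name original is chosen here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rinku', 'Ritu'], 'English': [67, 78]})
original = df.copy()

df.set_index('Name', inplace=True)   # Name becomes the index
df.reset_index(inplace=True)         # Name becomes an ordinary column again

print(list(df.index))                # [0, 1]
print(df.equals(original))           # True
```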
1.11.4 Retrieving and Accessing Rows/Columns from Dataframe
You can access and retrieve the records from a dataframe through slicing. Slicing shall result in
the display of retrieved records as per the range defined with the dataframe object.
Practical Implementation-29
To display records from the first to the third row, i.e., the rows with index 1 to 3.
import pandas as pd
student = {'Name': ['Rinku', 'Ritu', 'Ajay', 'Pankaj', 'Aditya'],
           'English': [67, 78, 75, 88, 92],
           'Economics': [78, 67, 89, 90, 56],
           'IP': [78, 88, 98, 90, 87],
           'Accounts': [77, 70, 80, 67, 86]}
#Adding the above data in a dataframe
df = pd.DataFrame(student)
print("DataFrame for student")
print(df)
print(df[1:4])   #Slicing operation on dataframe
Output:
DataFrame for student
     Name  English  Economics  IP  Accounts
0   Rinku       67         78  78        77
1    Ritu       78         67  88        70
2    Ajay       75         89  98        80
3  Pankaj       88         90  90        67
4  Aditya       92         56  87        86
>>> df[1:4]
     Name  English  Economics  IP  Accounts
1    Ritu       78         67  88        70
2    Ajay       75         89  98        80
3  Pankaj       88         90  90        67
Records from the 1st to the 3rd row are displayed; the stop value 4 is not included.
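A slice written directly on the dataframe works on row positions, which makes it equivalent to the same slice through iloc. The following sketch, on a two-column illustrative version of the student data, shows this.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rinku', 'Ritu', 'Ajay', 'Pankaj', 'Aditya'],
                   'English': [67, 78, 75, 88, 92]})

part = df[1:4]                     # positions 1, 2 and 3; position 4 is excluded
print(list(part['Name']))          # ['Ritu', 'Ajay', 'Pankaj']
print(part.equals(df.iloc[1:4]))   # True
```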

Accessing single value (Use of attributes .at and .iat)
In order to access and return data in a dataframe at the passed location, Pandas provides the
following two attributes:
(i) df.iat: In Pandas, the dataframe provides a property iat[] to access the single values from
dataframe by their row and column number. It is similar to iloc function that provides index
based access to the elements in a dataframe. The only difference is that it returns a single
value at the given row and column number from the dataframe. Whereas, if any column or
row number is out of bounds, it will raise IndexError.
Its syntax is:
DataFrame.iat[row_number, column_number]
row_number: As indexing starts from 0, so to select the value from the nth row, the row
number should be n-1.
column_number: As indexing starts from 0, so to select the value from the nth column,
the column number should be n-1.
For example,
      Name  English  Economics  Informatics Practices
0    Rinku       89         87                     89
1     Deep       78         80                     78
2  Shaurya       89         60                     67
3  Radhika       90         84                     90
>>> df.iat[1, 3]
78
(ii) df.at: It is similar to the loc function which provides label-based accessing of the elements
but the only difference is that it allows access to a single value only, based on a row/column
label pair from a dataframe.
Its syntax is:
DataFrame.at[row_label, 'column name']
For example,
      Name  English  Economics  Informatics Practices
0    Rinku       89         87                     89
1     Deep       78         80                     78
2  Shaurya       89         60                     67
3  Radhika       90         84                     90
>>> df.at[2, 'Economics']
60
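Both accessors, and the IndexError mentioned for iat, can be tried together on the same data. This is a minimal sketch; the out-of-range column position 10 is an arbitrary illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rinku', 'Deep', 'Shaurya', 'Radhika'],
                   'English': [89, 78, 89, 90],
                   'Economics': [87, 80, 60, 84],
                   'Informatics Practices': [89, 78, 67, 90]})

print(df.iat[1, 3])            # 78 (2nd row, 4th column, by position)
print(df.at[2, 'Economics'])   # 60 (row label 2, column label 'Economics')

try:
    df.iat[1, 10]              # column position 10 does not exist
except IndexError as err:
    print('IndexError:', err)
```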

1.11.5 Adding/Modifying a Row in a Dataframe
We can add a new row to a dataframe using the DataFrame.loc[] method. Consider the
dataframe newstudent that has three rows for the five students. We need to add a new
record at location 3rd in the dataframe.
import pandas as pd
newstudent = [{'Rinku': 67, 'Ritu': 78, 'Ajay': 75, 'Pankaj': 88, 'Aditya': 92},
              {'Rinku': 77, 'Ritu': 58, 'Ajay': 87, 'Pankaj': 65},
              {'Rinku': 88, 'Ajay': 67, 'Pankaj': 74, 'Aditya': 70}]
newdf = pd.DataFrame(newstudent)
print(newdf)
print("\n Dataframe After inserting new record")
newdf.loc['3'] = [45, 56, 67, 88, 99]   #New record added at location 3
print(newdf)
Output (the column order shown here may differ with the Pandas version in use):
   Aditya  Ajay  Pankaj  Rinku  Ritu
0    92.0    75      88     67  78.0
1     NaN    87      65     77  58.0
2    70.0    67      74     88   NaN

 Dataframe After inserting new record
   Aditya  Ajay  Pankaj  Rinku  Ritu
0    92.0    75      88     67  78.0
1     NaN    87      65     77  58.0
2    70.0    67      74     88   NaN
3    45.0    56      67     88  99.0
We cannot use this method to add a row of data with already existing (duplicate) index value
(label). In such a case, the row with this index label will be updated/modified as shown below:
>>> newdf.loc['3'] = [55, 66, 77, 98, 39]
>>> print(newdf)
   Aditya  Ajay  Pankaj  Rinku  Ritu
0    92.0    75      88     67  78.0
1     NaN    87      65     77  58.0
2    70.0    67      74     88   NaN
3    55.0    66      77     98  39.0
If we try to add a row with lesser values than the number of columns in the dataframe, it results
in a ValueError, with the error message:
ValueError: Cannot set a row with mismatched columns.
Similarly, if we try to add a column with lesser values than the number of rows in the
DataFrame, it results in a ValueError, with the error message:
ValueError: Length of values does not match length of index.
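The three behaviours of loc[] described above (adding, overwriting and rejecting a row) can be reproduced on a small dataframe. This is a sketch on made-up data with columns A, B and C; the exact wording of the error message may vary between Pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

df.loc[2] = [7, 8, 9]        # new label 2 -> a row is added
df.loc[2] = [10, 11, 12]     # existing label 2 -> the row is overwritten
print(df)

try:
    df.loc[3] = [13, 14]     # two values for three columns
except ValueError as err:
    print('ValueError:', err)
```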
An alternate method to add a new row in a dataframe is using .at attribute.
Syntax:
<df>.at[<row label>, :] = <new value(s)>
Since .at is meant for accessing a single value, this form may not be accepted by every Pandas
version; .loc[] as shown earlier is the dependable choice.

Example 3:
>>> df.at[4, :] = 80
>>> df
      Name  English  Economics  Informatics Practices
0    Rinku     89.0       87.0                   89.0
1     Deep     78.0       80.0                   78.0
2  Shaurya     89.0       60.0                   67.0
3  Radhika     90.0       84.0                   90.0
4       80     80.0       80.0                   80.0

Example 4:
      Name  English  Economics  Informatics Practices
0    Rinku       89         87                     89
1     Deep       78         80                     78
2  Shaurya       89         60                     67
3  Radhika       90         84                     90
>>> df.at[4, :] = ["Geet", 80, 60, 90]
>>> df
      Name  English  Economics  Informatics Practices
0    Rinku     89.0       87.0                   89.0
1     Deep     78.0       80.0                   78.0
2  Shaurya     89.0       60.0                   67.0
3  Radhika     90.0       84.0                   90.0
4     Geet     80.0       60.0                   90.0

1.11.6 Renaming Column Name in Dataframe
By default, the column label given to dataframe is range index, i.e., 0 to n-1, where n is the total
number of columns, but Pandas provides us with the flexibility to even change or rename any
column inside a dataframe.
Let us create a dataframe from the given list:
a1 = [101, 102, 103, 104, 105]
We shall be renaming this default column name '0' to Admno.
Syntax to change the name of a single column:
df.columns = [<new column name>]
>>> import pandas as pd
>>> a1 = [101, 102, 103, 104, 105]
>>> df = pd.DataFrame(a1)
>>> a1
[101, 102, 103, 104, 105]
>>> df
     0      <- default column name as 0
0  101
1  102
2  103
3  104
4  105

Practical Implementation-30
To rename a column name.
>>> df.columns = ['Admno']   #Renaming the column name to Admno
>>> df
   Admno
0    101
1    102
2    103
3    104
4    105

In the above Practical Implementation, we have learnt to rename the only given column in
the dataframe. To rename selected columns of a dataframe with several columns, Pandas
provides the function rename() as exhibited in the Practical Implementation below.

Practical Implementation-31
Create a dataframe student with columns such as name of the student, student's marks in
subjects IP and BST. Also, rename the column Name as Nm and IP as Informatics Practices.
#Renaming columns in a dataframe
import pandas as pd
s = [['Rinku', 79, 72], ['Ritu', 75, 73], ['Ajay', 80, 76]]
print("Series Generated As:")
print(s)
df = pd.DataFrame(s, columns=['Name', 'IP', 'BST'])
print(df)
df.rename(columns={'Name': 'Nm', 'IP': 'Informatics Practices'}, inplace=True)
print(df)
Output:
Series Generated As:
[['Rinku', 79, 72], ['Ritu', 75, 73], ['Ajay', 80, 76]]
    Name  IP  BST
0  Rinku  79   72
1   Ritu  75   73
2   Ajay  80   76
      Nm  Informatics Practices  BST
0  Rinku                     79   72
1   Ritu                     75   73
2   Ajay                     80   76

As is evident from the above code, rename() method has been used to rename columns in a
Pandas dataframe.
Using rename() to change column names is a much better way than assigning to df.columns. One
can change names of specific columns easily and not all column names need to be changed.
One of the biggest advantages of using rename() function is that we can use rename to change
as many column names as we want.
Its syntax is:
df.rename(columns=d, inplace=True)
where d is a dictionary and the keys are the columns you want to change. The values are the
new names for these columns. Also, inplace=True is given as the attribute of rename() to change
column names in place. Thus, as shown in the given output window, the column names of student
dataframe, Name and IP have been renamed as Nm and Informatics Practices respectively.
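The behaviour of rename() with and without inplace, and with a key that matches no column, can be tried on a one-row dataframe. The data and the names renamed and same below are illustrative; the sketch relies on rename() leaving unmatched keys alone by default.

```python
import pandas as pd

df = pd.DataFrame([['Rinku', 79, 72]], columns=['Name', 'IP', 'BST'])

# Without inplace=True a renamed copy is returned and df keeps its names
renamed = df.rename(columns={'IP': 'Informatics Practices'})
print(list(df.columns))       # ['Name', 'IP', 'BST']
print(list(renamed.columns))  # ['Name', 'Informatics Practices', 'BST']

# A key that matches no column is ignored by default
same = df.rename(columns={'Maths': 'Mathematics'})
print(list(same.columns))     # ['Name', 'IP', 'BST']
```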
1.11.7 Adding Column to a Dataframe
You can add new columns to an already existing dataframe.
Syntax to add or change a column:
<dfobject>[<New Col Name>] = <new value(s)>
Practical Implementation-32
To add a new column to a dataframe.
In Practical Implementation-30, we generated a dataframe for AdmNo of all students.
Suppose we have to add new columns for Name, Physics, Chemistry, Maths into the same
dataframe.
It will be done as under:
df['Name'] = ['Shruti', 'Gunjan', 'Tanya', 'Kirti', 'Vineet']
df['Physics'] = pd.Series([89, 78, 65, 45, 55])
df['Chemistry'] = pd.Series([77, 89, 74, 60, 56])
df['Maths'] = pd.Series([88, 65, 79, 78, 58])
df['Total'] = df['Physics'] + df['Chemistry'] + df['Maths']
Let us implement this in Pandas code.
>>> df['Name'] = ['Shruti', 'Gunjan', 'Tanya', 'Kirti', 'Vineet']
>>> df
   Admno    Name
0    101  Shruti
1    102  Gunjan
2    103   Tanya
3    104   Kirti
4    105  Vineet
>>> df['Physics'] = pd.Series([89, 78, 65, 45, 55])
>>> df['Chemistry'] = pd.Series([77, 89, 74, 60, 56])
>>> df['Maths'] = pd.Series([88, 65, 79, 78, 58])
>>> df['Total'] = df['Physics'] + df['Chemistry'] + df['Maths']
>>> df
   Admno    Name  Physics  Chemistry  Maths  Total
0    101  Shruti       89         77     88    254
1    102  Gunjan       78         89     65    232
2    103   Tanya       65         74     79    218
3    104   Kirti       45         60     78    183
4    105  Vineet       55         56     58    169
columns in addition to the column Admno
Now in the above code, we have created five new
have passed a list of Names for the new
which we renamed in Practical Implementation-30. We columns,
by Pandas. For the next
column Name.This shall be automatically copied to all the rows
their associated values. Then the sixth
Physics, Chemistry and Maths, we have passed series with
(Physics, Chemistry, Maths) and the total of
column comprises the total of these three columns
student.
allthe three subjects is displayed row-wise, i.e., for every
operators in any of the columns of the
Therefore, we can update the, column values by arithmetic
dataframe or its specific columns with
dataframe. Also, we can assign or copy the values of the
the help of assignment operator (-).
Suppose we have to add a new column for Average of total marks in three subjects for all the
students. It will be done as:

df['Average'] = df['Total'] / 3

>>> df['Average'] = df['Total'] / 3
>>> df
   Admno    Name  Physics  Chemistry  Maths  Total    Average
0    101  Shruti       89         77     88    254  84.666667
1    102  Gunjan       78         89     65    232  77.333333
2    103   Tanya       65         74     79    218  72.666667
3    104   Kirti       45         60     78    183  61.000000
4    105  Vineet       55         56     58    169  56.333333
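The column arithmetic described above can be tried end to end in one short script. The following is a minimal sketch: it uses the same marks data as the text, but building the whole dataframe from a single dictionary is an assumption made here for brevity, not the step-by-step method of the text.

```python
import pandas as pd

# Same marks data as in the text, built in one step for brevity (an assumption).
df = pd.DataFrame({
    'Admno': [101, 102, 103, 104, 105],
    'Name': ['Shruti', 'Gunjan', 'Tanya', 'Kirti', 'Vineet'],
    'Physics': [89, 78, 65, 45, 55],
    'Chemistry': [77, 89, 74, 60, 56],
    'Maths': [88, 65, 79, 78, 58],
})

# Column arithmetic works element by element, i.e., once for every row.
df['Total'] = df['Physics'] + df['Chemistry'] + df['Maths']
df['Average'] = df['Total'] / 3

print(df)
```

Because the operators act on whole columns, no explicit loop over the rows is needed.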

POINT TO REMEMBER
While adding a new column to an already created dataframe, the length of values of new column should
match and be equal to the length of the index column.
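The rule above can be checked with a small experiment. This is only a sketch: the three-row dataframe and the column name 'Bonus' are hypothetical and chosen for illustration. It also shows that a Series behaves differently from a list, because a Series is matched on index labels.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ritu', 'Ajay', 'Manu']})

# A list with the wrong number of values is rejected with a ValueError.
try:
    df['Score'] = [67, 78]          # only 2 values for 3 rows
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print("Error:", mismatch_error)

# A list of the right length is accepted.
df['Score'] = [67, 78, 89]

# A Series is matched on index labels instead; rows whose label is
# missing from the Series receive NaN. 'Bonus' is a hypothetical column.
df['Bonus'] = pd.Series([5, 10])    # labels 0 and 1 only
print(df)
```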



Practical Implementation-33
Create the following dataframe by the name (test_result) consisting of scores obtained by
students in the test.
Add a new column 'qualify' to define the status of the participants in the
qualifying exam.
#Creating a dataframe and adding a new column to it
import pandas as pd
test = {"Name": ['Ritu', 'Ajay', 'Manu', 'Rohit', 'Reema'],
        "Score": [67, 78, 89, 56, 90],
        "No_attempts": [4, 2, 1, 1, 3]}
test_result = pd.DataFrame(test, index=['Respondent0', 'Respondent1', 'Respondent2',
                                        'Respondent3', 'Respondent4'])
print(test_result)
#Adding a new column: qualify to test_result
test_result['qualify'] = ['NO', 'YES', 'NO', 'YES', 'YES']
print("dataframe after adding new column")
print(test_result)
>>>
              Name  Score  No_attempts
Respondent0   Ritu     67            4
Respondent1   Ajay     78            2
Respondent2   Manu     89            1
Respondent3  Rohit     56            1
Respondent4  Reema     90            3
dataframe after adding new column
              Name  Score  No_attempts qualify
Respondent0   Ritu     67            4      NO
Respondent1   Ajay     78            2     YES
Respondent2   Manu     89            1      NO
Respondent3  Rohit     56            1     YES
Respondent4  Reema     90            3     YES

Alternatively, we can also add a new column using insert() method.

By using insert() function, we can add a new column to the existing dataframe at any position/
column index.
Syntax to add a new column to the existing dataframe:
<dfobject>.insert(n, <new column name>, [data], allow_duplicates=False)
Here,
n: is the index of the column where the new column is to be inserted
<new column name>: name of the new column to be inserted
[data]: list of values to be added to the new column
With respect to the above syntax, we can add new column 'qualify' at index '3' in the dataframe
using insert() function.
#Modification to Practical Implementation-33
import pandas as pd
test = {"Name": ['Ritu', 'Ajay', 'Manu', 'Rohit', 'Reema'],
        "Score": [67, 78, 89, 56, 90],
        "No_attempts": [4, 2, 1, 1, 3]}
test_result = pd.DataFrame(test, index=['Respondent0', 'Respondent1', 'Respondent2',
                                        'Respondent3', 'Respondent4'])
print(test_result)
#Adding a new column: qualify to test_result
#using insert()
test_result.insert(3, 'qualify', ['NO', 'YES', 'NO', 'YES', 'YES'])
print("dataframe after adding new column")
print(test_result)
>>>
              Name  Score  No_attempts
Respondent0   Ritu     67            4
Respondent1   Ajay     78            2
Respondent2   Manu     89            1
Respondent3  Rohit     56            1
Respondent4  Reema     90            3
dataframe after adding new column
              Name  Score  No_attempts qualify
Respondent0   Ritu     67            4      NO
Respondent1   Ajay     78            2     YES
Respondent2   Manu     89            1      NO
Respondent3  Rohit     56            1     YES
Respondent4  Reema     90            3     YES
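The program above inserts the column at the last position, where plain assignment would have placed it anyway. The sketch below, on a smaller hypothetical dataframe, places the column in the middle to make the effect of the position argument visible, and shows what happens when the column name already exists.

```python
import pandas as pd

# A smaller, hypothetical version of test_result.
test_result = pd.DataFrame({'Name': ['Ritu', 'Ajay', 'Manu'],
                            'Score': [67, 78, 89]})

# Insert 'qualify' at column index 1, i.e., between Name and Score.
# insert() changes the dataframe in place and returns None.
test_result.insert(1, 'qualify', ['NO', 'YES', 'NO'])
print(test_result.columns.tolist())

# Inserting a column name that already exists raises ValueError
# unless allow_duplicates=True is passed.
try:
    test_result.insert(0, 'qualify', ['YES', 'YES', 'YES'])
    duplicate_error = None
except ValueError as err:
    duplicate_error = str(err)
print("Error:", duplicate_error)
```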

1.11.8 Selecting a Column from a Dataframe

The method of selecting/accessing a column or column(s) of a dataframe is similar to slicing
using series. Pandas provides three methods to access a dataframe column(s).
(i) Using the format of square brackets followed by the name of the column passed as a string
value, like df-object['column_name'].
(ii) Using the dot notation, like df-object.column_name.
(iii) Using numeric indexing and the iloc attribute, like df-object.iloc[:,<column_number>].
Selecting or Accessing Rows/Columns from a DataFrame using iloc and loc attribute of a
Dataframe.
loc: label-based indexing
iloc: index-based or integer-location-based indexing
Here, i stands for integer, which signifies that the rows and the columns are specified by
integer positions (counted from 0) denoting the row and the column range.
We can select and display any column from the dataframe by simply giving the name of
the column to dataframe object, df, like this:
df['Total'] (square bracket format) or df.Total (using dot notation), and implemented in
Pandas as follows:

>>> df
   Admno    Name  Physics  Chemistry  Maths  Total    Average
0    101  Shruti       89         77     88    254  84.666667
1    102  Gunjan       78         89     65    232  77.333333
2    103   Tanya       65         74     79    218  72.666667
3    104   Kirti       45         60     78    183  61.000000
4    105  Vineet       55         56     58    169  56.333333

Square Bracket Format
>>> df['Total']
0    254
1    232
2    218
3    183
4    169
Name: Total, dtype: int64

Dot Notation
>>> df.Total
0    254
1    232
2    218
3    183
4    169
Name: Total, dtype: int64
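The two notations return the same series, but they are not interchangeable in every case. The sketch below illustrates this; the column 'Unit Test' is a hypothetical name added here only because it contains a space.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shruti', 'Gunjan'],
                   'Total': [254, 232],
                   'Unit Test': [18, 20]})   # hypothetical column with a space

by_bracket = df['Total']      # works for any column name
by_dot = df.Total             # works only when the name is a valid identifier
print(by_bracket.equals(by_dot))

# A name containing a space can be reached only with square brackets.
print(df['Unit Test'])

# A list of names inside the brackets returns a dataframe, not a series.
subset = df[['Name', 'Total']]
print(type(subset).__name__)
```

The square bracket format is the safer default, since it also works for names that clash with dataframe attributes or contain spaces.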

Using iloc: With the above commands, a single column is displayed as a series. If we intend to
display selected rows and columns of the dataframe together, this can be done using iloc.
Syntax:
df.iloc[row-indexes, column-indexes]
For example,
If you want to access the columns at index 1 and index 5 from the given dataframe (the second
and the sixth columns, since counting starts at 0), this can be done using iloc like:
>>> df.iloc[:, [1,5]]
     Name  Total
0  Shruti    254
1  Gunjan    232
2   Tanya    218
3   Kirti    183
4  Vineet    169

As observed from the output displayed, all the rows of the columns at index 1 (Name) and
index 5 (Total) are shown respectively. Here, [:] signifies all rows and [1,5] indicates the
column indexes 1 and 5.
This was for selective columns. If you wish to extract a range of columns, suppose the first five
columns of the dataframe df, then the command shall be [0:5] as shown below:
>>> df.iloc[:, 0:5]
   Admno    Name  Physics  Chemistry  Maths
0    101  Shruti       89         77     88
1    102  Gunjan       78         89     65
2    103   Tanya       65         74     79
3    104   Kirti       45         60     78
4    105  Vineet       55         56     58

Similarly, for the first two columns, it would be:

>>> df.iloc[:, 0:2]
   Admno    Name
0    101  Shruti
1    102  Gunjan
2    103   Tanya
3    104   Kirti
4    105  Vineet
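Since iloc counts positions from 0 and a slice stops before its end position, it is easy to be off by one. The following sketch, on a two-row version of the same dataframe (shortened here as an assumption), prints which columns each selection actually returns and compares the result with label-based selection through loc.

```python
import pandas as pd

df = pd.DataFrame({
    'Admno': [101, 102], 'Name': ['Shruti', 'Gunjan'],
    'Physics': [89, 78], 'Chemistry': [77, 89],
    'Maths': [88, 65], 'Total': [254, 232],
})

# Positions are zero-based: 0 is Admno, 1 is Name, 5 is Total.
picked = df.iloc[:, [1, 5]]
print(picked.columns.tolist())

# A slice stops before its end position, so 0:2 gives positions 0 and 1.
first_two = df.iloc[:, 0:2]
print(first_two.columns.tolist())

# The same two columns selected by label with loc.
by_label = df.loc[:, ['Name', 'Total']]
print(by_label.equals(picked))
```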

Practical Implementation-34
Implementation of iloc and loc attribute with respect to 'student' dataframe.
#Selecting/Accessing Rows/Columns from a DataFrame
import pandas as pd
student = {"Roll No": [1, 2, 3, 4, 5],
           "Name": ['Rinku', 'Deep', 'Shaurya', 'Radhika', 'Rohit'],
           "English": [89, 78, 89, 90, 79],
           "Economics": [87, 80, 60, 84, 77],
           "IP": [89, 78, 67, 90, 92]}
df = pd.DataFrame(student)
print(df)
#Displaying row index 0 to 2
print(df.iloc[0:3])
#Displaying the contents from row index 0 to 2 and column index 0 to 2
print(df.iloc[0:3, 0:3])
#Displaying the contents of DataFrame using loc
#location name from 1 to 3
print(df.loc[1:3])
>>>
   Roll No     Name  English  Economics  IP
0        1    Rinku       89         87  89
1        2     Deep       78         80  78
2        3  Shaurya       89         60  67
3        4  Radhika       90         84  90
4        5    Rohit       79         77  92
   Roll No     Name  English  Economics  IP
0        1    Rinku       89         87  89
1        2     Deep       78         80  78
2        3  Shaurya       89         60  67
   Roll No     Name  English
0        1    Rinku       89
1        2     Deep       78
2        3  Shaurya       89
   Roll No     Name  English  Economics  IP
1        2     Deep       78         80  78
2        3  Shaurya       89         60  67
3        4  Radhika       90         84  90
>>>
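The output above shows a difference that is easy to miss: iloc[0:3] stops before position 3, whereas loc[1:3] includes the label 3. A minimal sketch on a cut-down 'student' dataframe (only two of its columns are kept here, as an assumption) makes the difference explicit by using the same slice with both attributes.

```python
import pandas as pd

df = pd.DataFrame({'Roll No': [1, 2, 3, 4, 5],
                   'Name': ['Rinku', 'Deep', 'Shaurya', 'Radhika', 'Rohit']})

by_position = df.iloc[1:3]   # positions 1 and 2; position 3 is excluded
by_label = df.loc[1:3]       # labels 1, 2 and 3; the end label is included

print(len(by_position), len(by_label))
print(by_label['Name'].tolist())
```

With the default index the labels and positions look identical, which is why the two slices are often confused.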

1.11.9 Deleting a Column from a Dataframe

Like adding a new column to a dataframe, you can also delete a column from a dataframe.
Columns can be deleted from an existing dataframe in three ways:
(i) Using the del keyword: It will simply delete the entire column and its contents from the
dataframe (in-place).
(ii) Using the pop() method: pop() method will delete the column from a dataframe by providing
the name of the column as an argument. It will return the deleted column along with its
values.
(iii) Using the drop() method: drop() method will delete the values from a dataframe. The values
can be of either row or column.

Its syntax is:
drop(labels, axis=1, inplace=True)
axis=1 means Column and 0 stands for Row. By default, the value of axis is 0.
With inplace=True, the column(s) are removed from the dataframe itself. Without it (the default
is inplace=False), drop() will return a new dataframe with the column(s) deleted or removed and
leave the original dataframe unchanged.
>>> df
   Admno    Name  Physics  Chemistry  Maths  Total    Average
0    101  Shruti       89         77     88    254  84.666667
1    102  Gunjan       78         89     65    232  77.333333
2    103   Tanya       65         74     79    218  72.666667
3    104   Kirti       45         60     78    183  61.000000
4    105  Vineet       55         56     58    169  56.333333
>>> del df['Average']
>>> df
   Admno    Name  Physics  Chemistry  Maths  Total
0    101  Shruti       89         77     88    254
1    102  Gunjan       78         89     65    232
2    103   Tanya       65         74     79    218
3    104   Kirti       45         60     78    183
4    105  Vineet       55         56     58    169
>>> df.pop('Total')
0    254
1    232
2    218
3    183
4    169
Name: Total, dtype: int64
>>> df.drop('Admno', axis=1)
     Name  Physics  Chemistry  Maths
0  Shruti       89         77     88
1  Gunjan       78         89     65
2   Tanya       65         74     79
3   Kirti       45         60     78
4  Vineet       55         56     58
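The three ways of deleting a column differ in what they return and in whether the original dataframe changes. The sketch below runs all three on a two-row dataframe (shortened here as an assumption) and prints the columns left after each step.

```python
import pandas as pd

df = pd.DataFrame({'Admno': [101, 102], 'Name': ['Shruti', 'Gunjan'],
                   'Total': [254, 232], 'Average': [84.67, 77.33]})

del df['Average']                     # removes the column in place, returns nothing
total = df.pop('Total')               # removes the column in place and returns it as a series
trimmed = df.drop('Admno', axis=1)    # returns a new dataframe; df itself is unchanged

print(df.columns.tolist())
print(trimmed.columns.tolist())
print(total.tolist())
```

Note that Admno is still present in df after the call to drop(), because inplace=True was not passed.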
1.11.10 Sorting Data in Dataframes*
We can sort the data present inside the dataframe by using sort_values() function. Here, two
arguments are passed: the first is the sorting field and the second is order of sorting. If we are
not providing anything then, by default, the data shall be sorted in the ascending order or it can be
explicitly defined as ascending=True; for descending, you have to give ascending=False.
Practical Implementation-35
To sort the data of student dataframe based on marks.
#Sorting the data in a dataframe
import pandas as pd
student_marks = pd.Series({'Vijaya': 80, 'Rahul': 92, 'Meghna': 67,
                           'Radhika': 95, 'Shaurya': 97})
student_age = pd.Series({'Vijaya': 32, 'Rahul': 28, 'Meghna': 30,
                         'Radhika': 25, 'Shaurya': 20})
student_df = pd.DataFrame({'Marks': student_marks,
                           'Age': student_age})
print(student_df)
#Sorting the data on the basis of marks in ascending order
#by keyword defines the field on the basis of which the data is to be sorted
print(student_df.sort_values(by=['Marks']))
#Sorted in descending order of Marks
print(student_df.sort_values(by=['Marks'], ascending=False))

'by' keyword defines the name of the field or column based on which the data is to be sorted.
Here, we have provided 'Marks' column and ascending=False that will sort the data inside the
dataframe on the basis of marks in descending order as displayed in the output window.
>>>
Unsorted Data
         Marks  Age
Vijaya      80   32
Rahul       92   28
Meghna      67   30
Radhika     95   25
Shaurya     97   20
Sorting data on the basis of Marks in ascending order
         Marks  Age
Meghna      67   30
Vijaya      80   32
Rahul       92   28
Radhika     95   25
Shaurya     97   20
Sorting data on the basis of Marks in descending order
         Marks  Age
Shaurya     97   20
Radhika     95   25
Rahul       92   28
Vijaya      80   32
Meghna      67   30
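One point worth checking for yourself is that sort_values() does not reorder the dataframe it is called on. The sketch below uses the same marks and ages (built directly from lists, which is an assumption made for brevity) and compares the row order before and after sorting.

```python
import pandas as pd

student_df = pd.DataFrame({'Marks': [80, 92, 67, 95, 97],
                           'Age': [32, 28, 30, 25, 20]},
                          index=['Vijaya', 'Rahul', 'Meghna', 'Radhika', 'Shaurya'])

ascending_df = student_df.sort_values(by='Marks')                     # default order
descending_df = student_df.sort_values(by='Marks', ascending=False)

# sort_values() returns a new dataframe; the original row order is untouched.
print(ascending_df.index.tolist())
print(descending_df.index.tolist())
print(student_df.index.tolist())
```

To keep the sorted order, assign the result back to a variable, as done here with ascending_df and descending_df.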
1.12 ITERATIONS IN DATAFRAME
Sometimes we need to perform iteration on a complete dataframe, i.e., accessing and retrieving
each record one by one in a dataframe. In such cases, it is difficult to write a code to access the
values separately. Therefore, it is necessary to perform iteration on dataframe which can be done
using any of the two methods:
<DFObject>.iterrows()-It represents dataframe row-wise, record by record.
<DFObject>.iteritems()-It represents dataframe column-wise.

*Not in syllabus but retained for extra learning.
Let us see how the above two methods work. Consider three data series for yearly sales of
ABC Ltd.

2015: Qtr1: 34500, Qtr2: 45000, Qtr3: 50000, Qtr4: 39000
2016: Qtr1: 44500, Qtr2: 65000, Qtr3: 70000, Qtr4: 49000
2017: Qtr1: 54500, Qtr2: 42000, Qtr3: 40000, Qtr4: 89000

The first step is to represent these data series into a dataframe and then to perform iteration
(repetition) for accessing and displaying each record one by one.
#Implementing iterrows()
import pandas as pd
total_sales = {2015: {'Qtr1': 34500, 'Qtr2': 45000, 'Qtr3': 50000, 'Qtr4': 39000},
               2016: {'Qtr1': 44500, 'Qtr2': 65000, 'Qtr3': 70000, 'Qtr4': 49000},
               2017: {'Qtr1': 54500, 'Qtr2': 42000, 'Qtr3': 40000, 'Qtr4': 89000}}
#Converting data series into Dataframe
df = pd.DataFrame(total_sales)
print(df)

>>>
       2015   2016   2017
Qtr1  34500  44500  54500
Qtr2  45000  65000  42000
Qtr3  50000  70000  40000
Qtr4  39000  49000  89000
>>>

Using iterrows():
The first step has been completed by creating a dataframe from the quarterly sales series. The
next step is to display the record in the created dataframe one by one by adding the following
code in the previous code:
for (row, rowSeries) in df.iterrows():
    print("RowIndex :", row)
    print("Containing :")
    print(rowSeries)

This code on execution shall display the records of the dataframe one by one.
These are the values of df which are processed one by one.
RowIndex : Qtr1
Containing :
2015    34500
2016    44500
2017    54500
Name: Qtr1, dtype: int64
RowIndex : Qtr2
Containing :
2015    45000
2016    65000
2017    42000
Name: Qtr2, dtype: int64
RowIndex : Qtr3
Containing :
2015    50000
2016    70000
2017    40000
Name: Qtr3, dtype: int64
RowIndex : Qtr4
Containing :
2015    39000
2016    49000
2017    89000
Name: Qtr4, dtype: int64
>>>
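Beyond printing, the row series delivered by iterrows() can be used in calculations. As a small sketch of this, the code below adds up the sales of each quarter across the three years; the dictionary name quarter_totals is a hypothetical name introduced here.

```python
import pandas as pd

total_sales = {2015: {'Qtr1': 34500, 'Qtr2': 45000, 'Qtr3': 50000, 'Qtr4': 39000},
               2016: {'Qtr1': 44500, 'Qtr2': 65000, 'Qtr3': 70000, 'Qtr4': 49000},
               2017: {'Qtr1': 54500, 'Qtr2': 42000, 'Qtr3': 40000, 'Qtr4': 89000}}
df = pd.DataFrame(total_sales)

# Each iteration yields the row label and the row as a series
# whose index is made of the column names (the years).
quarter_totals = {}
for quarter, row_series in df.iterrows():
    quarter_totals[quarter] = int(row_series.sum())

print(quarter_totals)
```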



Using iteritems():
This method shall display the data from the dataframe column-wise. After the creation of the
dataframe, which we have done in the above example, write the code for displaying the data
column-wise as shown below. (iteritems() has been removed in pandas 2.0 and later; there,
use df.items() instead, which works in the same way.)
for (col, colSeries) in df.iteritems():
    print("Column Index :", col)
    print("Containing :")
    print(colSeries)

>>>
Column Index : 2015
Containing :
Qtr1    34500
Qtr2    45000
Qtr3    50000
Qtr4    39000
Name: 2015, dtype: int64
Column Index : 2016
Containing :
Qtr1    44500
Qtr2    65000
Qtr3    70000
Qtr4    49000
Name: 2016, dtype: int64
Column Index : 2017
Containing :
Qtr1    54500
Qtr2    42000
Qtr3    40000
Qtr4    89000
Name: 2017, dtype: int64
>>>
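Column-wise iteration can also feed a calculation. The sketch below totals the sales of each year. It uses items(), which yields the same (label, series) pairs as iteritems() and is available in current pandas releases; the dictionary name yearly_totals is hypothetical.

```python
import pandas as pd

total_sales = {2015: {'Qtr1': 34500, 'Qtr2': 45000, 'Qtr3': 50000, 'Qtr4': 39000},
               2016: {'Qtr1': 44500, 'Qtr2': 65000, 'Qtr3': 70000, 'Qtr4': 49000},
               2017: {'Qtr1': 54500, 'Qtr2': 42000, 'Qtr3': 40000, 'Qtr4': 89000}}
df = pd.DataFrame(total_sales)

# Each iteration yields the column label and the column as a series
# whose index is made of the row labels (the quarters).
yearly_totals = {}
for year, col_series in df.items():
    yearly_totals[year] = int(col_series.sum())

print(yearly_totals)
```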

Practical Implementation-36
Write a program to iterate over a dataframe containing names and marks, then calculate grades
as per marks (as per the following criteria) and add them to the grade column:
Marks >= 90      Grade A+
Marks 70 - 90    Grade A
Marks 60 - 70    Grade B
Marks 50 - 60    Grade C
Marks 40 - 50    Grade D
Marks < 40       Grade F
import pandas as pd
import numpy as np
names = pd.Series(['Sanjeev', 'Rajeev', 'Sanjay', 'Abhay'])
marks = pd.Series([76, 86, 55, 54])
stud = {'Name': names, 'Marks': marks}
df = pd.DataFrame(stud, columns=['Name', 'Marks'])
df['Grade'] = np.nan
print("Initial values in DataFrame:")
print(df)
lstMarks = []   # Initialize an empty list to store grades
for index, row in df.iterrows():
    marks = row['Marks']
    if marks >= 90:
        lstMarks.append('A+')
    elif marks >= 70:
        lstMarks.append('A')
    elif marks >= 60:
        lstMarks.append('B')
    elif marks >= 50:
        lstMarks.append('C')
    elif marks >= 40:
        lstMarks.append('D')
    else:
        lstMarks.append('F')
df['Grade'] = lstMarks   # Assign the calculated grades to the Grade column
print("\nDataFrame after Calculation of Grades:")
print(df)

Initial values in DataFrame:
      Name  Marks  Grade
0  Sanjeev     76    NaN
1   Rajeev     86    NaN
2   Sanjay     55    NaN
3    Abhay     54    NaN

DataFrame after Calculation of Grades:
      Name  Marks Grade
0  Sanjeev     76     A
1   Rajeev     86     A
2   Sanjay     55     C
3    Abhay     54     C
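The grading criteria overlap at their boundaries (for example, 70 appears in both the A and the B range), so it helps to test the boundary values directly. One way to do this, sketched below, is to move the if/elif chain into a helper function; grade_for is a hypothetical name introduced here, and it resolves each boundary in favour of the higher grade, as the program above does.

```python
import pandas as pd

def grade_for(marks):
    """Return the grade for one marks value; boundaries go to the higher grade."""
    if marks >= 90:
        return 'A+'
    elif marks >= 70:
        return 'A'
    elif marks >= 60:
        return 'B'
    elif marks >= 50:
        return 'C'
    elif marks >= 40:
        return 'D'
    return 'F'

df = pd.DataFrame({'Name': ['Sanjeev', 'Rajeev', 'Sanjay', 'Abhay'],
                   'Marks': [76, 86, 55, 54]})

# Iterate row-wise and collect one grade per record.
grades = []
for index, row in df.iterrows():
    grades.append(grade_for(row['Marks']))
df['Grade'] = grades

print(df)
print(grade_for(90), grade_for(70), grade_for(39))
```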

1.13 BINARY OPERATIONS
In Mathematics, a binary operator is a function that combines two values to produce a new value.
The binary operator function could perform addition, subtraction and so on to return the new value.
Pandas Dataframe has several binary operator functions defined for combining two dataframes.
The binary operator functions return a new dataframe as a result of combining the two dataframes.
It is possible to perform add, subtract, multiply and divide operations on dataframe. Pandas
provides the methods add(), sub(), mul(), div() and related functions radd(), rsub() for
carrying out binary operations on dataframe.
Out of these operations, add(), sub(), mul() and div() methods perform the basic mathematical
operations for addition, subtraction, multiplication and division of two dataframes.
The functions rsub and radd stand for right-side subtraction and right-side addition. rsub() function
subtracts each element in a dataframe from the corresponding element in the other dataframe,
i.e., df.rsub(other) computes other - df. radd() function adds each element in a dataframe to the
corresponding element in the other dataframe, i.e., df.radd(other) computes other + df. Since all
these operations involve two dataframes to act upon, they are known as Binary ('bi' means 'two';
the two operands in this case are dataframes) operations.
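The difference between sub() and rsub() lies only in which dataframe stands on the left of the minus sign. The following sketch shows this on two small dataframes; the two-row data is a shortened, illustrative version of the unit test marks.

```python
import pandas as pd

ds = pd.DataFrame({'Unit Test-1': [5, 6], 'Unit Test-2': [7, 8]})
ds1 = pd.DataFrame({'Unit Test-1': [3, 3], 'Unit Test-2': [5, 9]})

# sub() keeps the calling dataframe on the left:   ds - ds1
# rsub() puts the calling dataframe on the right:  ds1 - ds
print(ds.sub(ds1))
print(ds.rsub(ds1))

# Addition gives the same result in either order, so add() and radd() agree.
print(ds.radd(ds1).equals(ds.add(ds1)))
```

The reversed form matters for subtraction and division, where the order of the operands changes the result.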
Practical Implementation-37
To perform binary operations on two dataframes. We will perform operations on the following
two dataframes: marks obtained by two students in a class test:
student = {'Unit Test-1': [5, 6, 8, 3, 10], 'Unit Test-2': [7, 8, 9, 6, 15]}
student1 = {'Unit Test-1': [3, 3, 6, 6, 8], 'Unit Test-2': [5, 9, 8, 10, 5]}
#Binary operations on Dataframes
import pandas as pd
student = {'Unit Test-1': [5, 6, 8, 3, 10], 'Unit Test-2': [7, 8, 9, 6, 15]}
student1 = {'Unit Test-1': [3, 3, 6, 6, 8], 'Unit Test-2': [5, 9, 8, 10, 5]}
ds = pd.DataFrame(student)
ds1 = pd.DataFrame(student1)
print(ds)
print(ds1)
print("Subtraction")
print(ds.sub(ds1))
print("rsub")
print(ds.rsub(ds1))
print("Addition")
print(ds.add(ds1))
print("radd")
print(ds.radd(ds1))
print("Multiplication")
print(ds.mul(ds1))
print("Division")
print(ds.div(ds1))

>>>
   Unit Test-1  Unit Test-2
0            5            7
1            6            8
2            8            9
3            3            6
4           10           15
   Unit Test-1  Unit Test-2
0            3            5
1            3            9
2            6            8
3            6           10
4            8            5
Subtraction
   Unit Test-1  Unit Test-2
0            2            2
1            3           -1
2            2            1
3           -3           -4
4            2           10
rsub
   Unit Test-1  Unit Test-2
0           -2           -2
1           -3            1
2           -2           -1
3            3            4
4           -2          -10
Addition
   Unit Test-1  Unit Test-2
0            8           12
1            9           17
2           14           17
3            9           16
4           18           20
radd
   Unit Test-1  Unit Test-2
0            8           12
1            9           17
2           14           17
3            9           16
4           18           20
Multiplication
   Unit Test-1  Unit Test-2
0           15           35
1           18           72
2           48           72
3           18           60
4           80           75
Division
   Unit Test-1  Unit Test-2
0     1.666667     1.400000
1     2.000000     0.888889
2     1.333333     1.125000
3     0.500000     0.600000
4     1.250000     3.000000
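For two dataframes with the same row and column labels, as in this program, the four basic methods give the same results as the ordinary arithmetic operators. The sketch below checks this on the same unit test data; the dictionary name results is a hypothetical name introduced here.

```python
import pandas as pd

ds = pd.DataFrame({'Unit Test-1': [5, 6, 8, 3, 10], 'Unit Test-2': [7, 8, 9, 6, 15]})
ds1 = pd.DataFrame({'Unit Test-1': [3, 3, 6, 6, 8], 'Unit Test-2': [5, 9, 8, 10, 5]})

# Compare every method with the matching operator.
results = {
    'add': ds.add(ds1).equals(ds + ds1),
    'sub': ds.sub(ds1).equals(ds - ds1),
    'mul': ds.mul(ds1).equals(ds * ds1),
    'div': ds.div(ds1).equals(ds / ds1),
}
print(results)

# Spot-check one column of the multiplication table shown above.
print(ds.mul(ds1)['Unit Test-2'].tolist())
```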

*Not in syllabus but retained for extra learning

