DataFrame Notes1

Uploaded by

sakitya j
Copyright
© © All Rights Reserved
DataFrame

A DataFrame is a two-dimensional labelled data structure
similar to a spreadsheet or a MySQL table. It contains rows and
columns, and therefore has both a row index and a column index.
Each column can hold a different type of value, such as numeric,
string or boolean.
NOTE: The number of rows and columns in a DataFrame can be
increased or decreased.
How to create a DataFrame in Python?

There are many ways to create a DataFrame in Python. Let us
discuss a few of them.

1. Creation of an empty DataFrame:
The code to create an empty DataFrame is given below
import pandas as pd
DF = pd.DataFrame( )
print(DF)

OUTPUT:
Empty DataFrame
Columns: []
Index: []
2. Creation of DataFrame from numpy arrays:
Let us create DataFrame from the numpy arrays
import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create DataFrame using first array only and observe the output
DF = pd.DataFrame(ar1)
print(DF)

OUTPUT:
0
0 1
1 2
2 3
3 4
---------------------
import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create DataFrame using first and second array only and observe the output
DF = pd.DataFrame([ar1, ar2]) #Creating dataframe using first and second array
print(DF)

OUTPUT:

0 1 2 3
0 1 2 3 4
1 10 20 30 40

---------------------

import numpy as np
import pandas as pd
ar1 = np.array([1, 2, 3, 4]) #First array created containing 4 integers
ar2 = np.array([10, 20, 30, 40]) #Second array created containing 4 integers
ar3 = np.array([-23, -43, 67, 90]) #Third array created containing 4 integers

#Let us create DataFrame using all the three arrays and observe the output
DF = pd.DataFrame([ar1, ar2, ar3]) #Creating dataframe using all three arrays
print(DF)

OUTPUT:

0 1 2 3
0 1 2 3 4
1 10 20 30 40
2 -23 -43 67 90
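In the outputs above every array became a row. If we would rather have each array as a column, one possible approach is to pass the arrays in a dictionary. The sketch below reuses the arrays from this section; the column labels 'A', 'B' and 'C' are made up for illustration only.

```python
import numpy as np
import pandas as pd

ar1 = np.array([1, 2, 3, 4])
ar2 = np.array([10, 20, 30, 40])
ar3 = np.array([-23, -43, 67, 90])

# A list of arrays: each array becomes one ROW of the DataFrame.
by_rows = pd.DataFrame([ar1, ar2, ar3])

# A dictionary of arrays: each array becomes one COLUMN and the
# keys ('A', 'B', 'C' are illustrative labels) become column labels.
by_cols = pd.DataFrame({'A': ar1, 'B': ar2, 'C': ar3})

print(by_rows.shape)  # 3 rows, 4 columns
print(by_cols.shape)  # 4 rows, 3 columns
print(by_cols)
```

Comparing the two shapes is a quick way to check which orientation we have obtained.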

3. Creation of DataFrame from Lists:

We can create a DataFrame from a list by passing the list to the
DataFrame( ) function. The elements of a simple list are displayed
as the rows of a single column, whose default label is 0. For example
Practical 1: To create dataframe from simple list.
import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55])
print(df)

OUTPUT:

0
0 11
1 22
2 33
3 44
4 55
Practical 2: To create dataframe from simple list by passing
appropriate column heading and row index.
import pandas as pd
df = pd.DataFrame([11, 22, 33, 44, 55], index=['R1',
'R2','R3','R4','R5'], columns=['C1'])
print(df)

OUTPUT:

C1
R1 11
R2 22
R3 33
R4 44
R5 55
Practical 3: To create dataframe from nested list.
import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'],
[12, 'XI','A']])
print(df)

OUTPUT:

0 1 2
0 21 X A
1 32 IX B
2 23 X A
3 12 XI A
Practical 4: To create dataframe from nested list by passing
appropriate column heading and row index.
import pandas as pd
df = pd.DataFrame([[21, 'X', 'A'], [32, 'IX', 'B'], [23, 'X', 'A'],[12,
'XI','A']], index= ['Rec1', 'Rec2', 'Rec3', 'Rec4'], columns =
["Rno", "Class", "Sec"])
print(df)

OUTPUT:

Rno Class Sec
Rec1 21 X A
Rec2 32 IX B
Rec3 23 X A
Rec4 12 XI A
4. Creation of DataFrame from Dictionary of lists: We can
create a dataframe from a dictionary of lists as shown below. For
example
Practical 1: To create dataframe using dictionaries of list.
import pandas as pd
df = pd.DataFrame({'Rno' : [21, 28, 31], 'Class' : ['IX', 'X', 'XI'],
'Sec' : ['B', 'A','C']})
print(df)
OUTPUT:

Rno Class Sec
0 21 IX B
1 28 X A
2 31 XI C
Practical 2: To create dataframe using dictionaries of list with
appropriate row index.
import pandas as pd
df = pd.DataFrame({'B_id' : ['B1', 'B8', 'B5'], 'Sub' : ['Hindi',
'Math', 'Science'], 'Cost' : [450, 520, 400]}, index=['R1', 'R2',
'R3'])
print(df)

OUTPUT:

B_id Sub Cost
R1 B1 Hindi 450
R2 B8 Math 520
R3 B5 Science 400
Note: Dictionary keys become column labels by default in a
DataFrame, and each list supplies the values of its column (one
value per row).
5. Creation of DataFrame from List of Dictionaries : We can
create dataframe from list of dictionaries. for example
import pandas as pd
df = pd.DataFrame([{'Ram' : 25, 'Anil' : 29, 'Simple' : 28},
{'Ram' : 21, 'Anil' : 25, 'Simple':23}, {'Ram' : 23, 'Anil' : 18,
'Simple' : 26}], index=['Term1', 'Term2', 'Term3'])
print(df)

OUTPUT:

Ram Anil Simple
Term1 25 29 28
Term2 21 25 23
Term3 23 18 26
Here, the keys of dictionaries are taken as column labels, and
the values corresponding to each key are taken as rows. There
will be as many rows as the number of dictionaries present in
the list.
NOTE: NaN (Not a Number) is inserted if a corresponding value
for a column is missing as shown in the following example.
import pandas as pd
df = pd.DataFrame([{'Ram' : 25, 'Anil' : 29, 'Simple' : 28},
{'Ram' : 21, 'Anil' : 25, 'Simple':23}, {'Ram' : 23, 'Anil' : 18}],
index=['Term1', 'Term2', 'Term3'])
print(df)

OUTPUT:

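Ram Anil Simple
Term1 25 29 28.0
Term2 21 25 23.0
Term3 23 18 NaN
Because NaN is a floating-point value, the whole column Simple is
displayed as float (28.0, 23.0).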
6. Creation of DataFrame from Series : We can create
dataframe from single or multiple Series. for example
Example 1: Creation of DataFrame from Single Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame(S1)
print(df)

OUTPUT:

0
0 10
1 20
2 30
3 40
Here, the DataFrame has as many rows as there are elements in
the series, but it has only one column.
Example 2: Creation of DataFrame from two Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2], index = ['R1', 'R2'])
print(df)

OUTPUT:

0 1 2 3
R1 10 20 30 40
R2 11 22 33 44
Example 3: Creation of DataFrame from three Series.
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54, 24])
df = pd.DataFrame([S1, S2, S3],index = ['R1', 'R2', 'R3'])
print(df)

OUTPUT:

0 1 2 3
R1 10 20 30 40
R2 11 22 33 44
R3 34 44 54 24
To create a DataFrame using more than one series, we need to
pass multiple series in the list as shown above
NOTE: if a particular series does not have a corresponding
value for a label, NaN is inserted in the DataFrame column. for
example
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([11, 22, 33, 44])
S3 = pd.Series([34, 44, 54])
df = pd.DataFrame([S1, S2, S3],index = ['R1', 'R2', 'R3'])
print(df)

OUTPUT:

0 1 2 3
R1 10.0 20.0 30.0 40.0
R2 11.0 22.0 33.0 44.0
R3 34.0 44.0 54.0 NaN
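NaN is inserted in the same way when Series are passed as a dictionary, in which case each Series becomes a column and the values are matched by their index labels. The following is a minimal sketch; the labels 'a', 'b', 'c' and the column names 'S1' and 'S2' are assumed only for this illustration.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([11, 22], index=['a', 'c'])  # no value for label 'b'

# Each Series becomes a column; rows are matched on the index labels.
df = pd.DataFrame({'S1': s1, 'S2': s2})
print(df)

# Count the missing values in every column.
print(df.isna().sum())
```

Here the row 'b' of column S2 has no matching value, so it holds NaN and the column S2 becomes float.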
Operations on rows and columns in DataFrames
We can perform some basic operations on rows and columns of
a DataFrame, such as
1. Adding a New Column to a DataFrame:
We can easily add a new column to a DataFrame. Let us see the
example given below
import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28},
{'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18,
'Simple':26}],index=['R1','R2','R3'])
print(df)
df['Amit']=[18, 22, 25] #Adding column to DataFrame
print(df)
df['Parth']=[28, 12, 30] #Adding column to DataFrame
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
Ram Anil Simple Amit
R1 25 29 28 18
R2 21 25 23 22
R3 23 18 26 25
Ram Anil Simple Amit Parth
R1 25 29 28 18 28
R2 21 25 23 22 12
R3 23 18 26 25 30
NOTE: If we try to add a column with fewer or more values than
the number of rows in the DataFrame, it results in a
ValueError, with the error message: ValueError: Length of
values does not match length of index. For example
import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28},
{'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18,
'Simple':26}],index=['R1','R2','R3'])
print(df)
df['Amit']=[18, 22]
print(df)

OUTPUT:
ValueError: Length of values does not match length of index
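If we do not want the program to stop on this error, one option is to catch the ValueError. The sketch below is only an illustration with a smaller DataFrame; the variable name new_values is assumed.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil': [29, 25, 18]},
                  index=['R1', 'R2', 'R3'])

new_values = [18, 22]  # only two values for three rows

try:
    df['Amit'] = new_values
except ValueError as err:
    # The assignment is rejected, so the DataFrame stays unchanged.
    print("Column not added:", err)

print(df.columns.tolist())
```

The exact wording of the error message may differ between pandas versions.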
2. Adding a New Row to a DataFrame:
We can add a new row to a DataFrame using DataFrame.loc[ ].
Let us see the example given below
import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28},
{'Ram':21, 'Anil':25, 'Simple':23}, {'Ram':23, 'Anil':18,
'Simple':26}], index=['R1', 'R2', 'R3'])
print(df)
df.loc['R4']=[12, 22, 10] #Adding new row
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
R4 12 22 10
NOTE: If we try to add a row with fewer or more values than the
number of columns in the DataFrame, it results in a
ValueError, with the error message: ValueError: cannot set a
row with mismatched columns. For example
import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28},
{'Ram':21, 'Anil':25, 'Simple':23}, {'Ram':23, 'Anil':18,
'Simple':26}], index=['R1', 'R2', 'R3'])
print(df)
df.loc['R4']=[12, 22] #Adding new row with fewer values than columns
print(df)

OUTPUT:

ValueError: cannot set a row with mismatched columns


3. Deleting a Row from a DataFrame:
We can use the DataFrame.drop() method to delete rows. To
delete a row, the parameter axis is assigned the value 0. Let us
see the examples given below
Example 1: To delete a single row from a DataFrame.
import pandas as pd
df = pd.DataFrame([{'Ram':25, 'Anil':29, 'Simple':28},
{'Ram':21, 'Anil':25, 'Simple':23},{'Ram':23, 'Anil':18,
'Simple':26}],index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop('R2', axis = 0) #Deleting a row from dataframe
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
----------------------------------------------------
Ram Anil Simple
R1 25 29 28
R3 23 18 26
Example 2: To delete multiple rows from a DataFrame.
import pandas as pd
df = pd.DataFrame({'Ram' : [25, 21, 23], 'Anil' : [29, 25, 18],
'Simple' : [28, 23, 26]}, index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop(['R2', 'R1'], axis = 0) #deleting multiple rows from dataframe
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
----------------------------------------------------
Ram Anil Simple
R3 23 18 26
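In both examples the result of drop() is assigned back to df. This is needed because drop() returns a new DataFrame and leaves the original one as it is. A short sketch of this behaviour is given below; the names smaller and same are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil': [29, 25, 18],
                   'Simple': [28, 23, 26]}, index=['R1', 'R2', 'R3'])

# drop() returns a new DataFrame without row 'R2'.
smaller = df.drop('R2', axis=0)

print(df.shape)       # original still has 3 rows
print(smaller.shape)  # new DataFrame has 2 rows

# The index keyword is another way to say "drop this row label".
same = df.drop(index='R2')
print(smaller.equals(same))
```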
4. Deleting a Column from a DataFrame:
We can delete columns from a dataframe by using the
following methods
1. pop( ): This method deletes a column from a dataframe
and also returns the values of the deleted column. For example:
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18],
'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df.pop('Simple')) #Deleting a particular Column and returning the value.
print("----------------------------------------------------")
print(df)

OUTPUT:

R1 28
R2 23
R3 26
Name: Simple, dtype: int64
----------------------------------------------------
Ram Anil
R1 25 29
R2 21 25
R3 23 18
2. drop( ): This method deletes the entire column from a
dataframe. To delete a column, the parameter axis is assigned
the value 1. Let us see the examples given below
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18],
'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop('Simple', axis=1) #Deleting column from dataframe
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
----------------------------------------------------
Ram Anil
R1 25 29
R2 21 25
R3 23 18
To delete multiple columns
import pandas as pd
df = pd.DataFrame({'Ram': [25, 21, 23], 'Anil':[29, 25, 18],
'Simple':[28, 23, 26]},index=['R1', 'R2', 'R3'])
print(df)
print("----------------------------------------------------")
df=df.drop(['Simple', 'Ram'], axis=1) #deleting multiple columns
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
----------------------------------------------------
Anil
R1 29
R2 25
R3 18
5. Renaming Row Labels of a DataFrame :
We can change the labels of rows in a DataFrame using the
DataFrame.rename() method. For example, to rename the row
index R1 to Maths, we can write the following code.
Example 1: To change row index ‘R1’ to ‘Maths’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index = ['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'R1' : 'Maths'}) #Statement to change 'R1' to 'Maths'
print(df)
OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
Ram Anil Simple
Maths 25 29 28
R2 21 25 23
R3 23 18 26
Example 2: To change row index ‘R1’ to ‘Maths’, ‘R2’ to
‘Science’ and ‘R3’ to ‘English’
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
df=df.rename({'R1' : 'Maths', 'R2' : 'Science', 'R3' : 'English'},
axis = 'index')
print("-----------------------------------------------------")
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26

-----------------------------------------------------
Ram Anil Simple
Maths 25 29 28
Science 21 25 23
English 23 18 26

NOTE: The parameter axis='index' is used to specify that the
row labels are to be changed. We can also skip it, because by default
the rename() function changes the row indices.
6. Renaming Column Labels of a DataFrame :
To alter the column names of a DataFrame we can use the
rename() method, as shown below. The parameter
axis='columns' implies we want to change the column labels:
Example 1: To change the column heading from ‘Ram’ to ‘Ravi’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18,
26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'Ram' : 'Ravi'}, axis = 'columns')
print("-----------------------------------------------------")
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
-----------------------------------------------------
Ravi Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
Example 2: To change the column heading from ‘Ram’ to
‘Ravi’ and from ‘Simple’ to ‘Sumit’
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18,
26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
df=df.rename({'Ram' : 'Ravi', 'Simple' : 'Sumit'}, axis =
'columns')
print("-----------------------------------------------------")
print(df)

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
-----------------------------------------------------
Ravi Anil Sumit
R1 25 29 28
R2 21 25 23
R3 23 18 26
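Row labels and column labels can also be renamed in a single call by using the keyword parameters index and columns of rename() instead of axis. The following sketch assumes the same DataFrame as above; the name renamed is used only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['R1', 'R2', 'R3'],
                  columns=['Ram', 'Anil', 'Simple'])

# index= renames row labels, columns= renames column labels.
renamed = df.rename(index={'R1': 'Maths'}, columns={'Ram': 'Ravi'})
print(renamed)

# rename() returns a new DataFrame; df itself keeps its old labels.
print(df.index.tolist())
```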
Accessing DataFrame Elements through Indexing
Data elements in a DataFrame can be accessed using indexing. There
are two ways of indexing DataFrames:

1. Label based indexing
There are several methods in Pandas to implement label
based indexing. DataFrame.loc[ ] is an important indexer that
is used for label based indexing with DataFrames.
Example 1: To display a single row from a dataframe using
loc[ ].
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['R2']) #row label indexing

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram 21
Anil 25
Simple 23
Name: R2, dtype: int64
Example 2: To display multiple rows from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R3']]) #Multiple rows from dataframe

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil Simple
R1 25 29 28
R3 23 18 26
Example 3: To display the values of a single column label without
using loc[ ].
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df['Ram']) #Column label indexing

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
R1 25
R2 21
R3 23
Name: Ram, dtype: int64
Example 4: To display the values of multiple columns from a
dataframe without using loc[ ].
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df[['Ram', 'Anil']]) #Multiple Column label indexing

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil
R1 25 29
R2 21 25
R3 23 18
Example 5: To display the values of a single column label using
loc[ ].
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[: , 'Ram']) #Column label indexing using loc( )

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
R1 25
R2 21
R3 23
Name: Ram, dtype: int64
Example 6: To display the values of multiple columns from a
dataframe using loc[ ].
import pandas as pd
df = pd.DataFrame([[25, 29, 28],[21,25,23],[23,
18,26]],index=['R1','R2','R3'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:, 'Ram' : 'Anil']) #Multiple Column label indexing

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil
R1 25 29
R2 21 25
R3 23 18
To access rows or columns of a dataframe by their integer
position instead of their labels, iloc[ ] is used. In a position
slice such as 0 : 2 the end position is excluded.
Example 7: To display first column from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18,
26]],index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0 : 1])

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram
R1 25
R2 21
R3 23
Example 8: To display first and second column from a
dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23],[23, 18, 26]],
index=['R1', 'R2', 'R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[:, 0 : 2]) # print(df.iloc[:, [0,1]]) can also be used

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil
R1 25 29
R2 21 25
R3 23 18
Example 9: To display only second row from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[1 : 2])

OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil Simple
R2 21 25 23
Example 10: To display first and second row from a dataframe
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index=['R1','R2','R3'], columns = ['Ram', 'Anil', 'Simple'])
print(df)
print("---------------------------------------------------")
print(df.iloc[0:2]) # print(df.iloc[[0,1]]) can also be used
OUTPUT:

Ram Anil Simple
R1 25 29 28
R2 21 25 23
R3 23 18 26
---------------------------------------------------
Ram Anil Simple
R1 25 29 28
R2 21 25 23
Example 11: To display first, second and fourth row from a
dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15]], index=['R1', 'R2', 'R3', 'R4'], columns
= ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc[['R1', 'R2', 'R4']]) # print(df.iloc[[0, 1, 3]]) or print(df.loc[[True, True, False, True]]) can also be used

OUTPUT:

Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
---------------------------------------------------
Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R4 20 18 30 15
Example 12: To display marks of subject Math, English and
Science of ‘Anil’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science',
'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science', 'Anil'])

OUTPUT:

Ram Anil Simple Anuj
Math 25 29 28 17
English 21 25 23 20
Science 23 18 26 23
Hindi 20 18 30 15
---------------------------------------------------
Math 29
English 25
Science 18
Name: Anil, dtype: int64
Example 13: To display marks of subject Math, English and
Science of ‘Ram’ and ‘Anil’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science',
'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science','Ram' : 'Anil'])

OUTPUT:

Ram Anil Simple Anuj
Math 25 29 28 17
English 21 25 23 20
Science 23 18 26 23
Hindi 20 18 30 15
---------------------------------------------------
Ram Anil
Math 25 29
English 21 25
Science 23 18
Example 14: To display marks of subject Math, English and
Science of ‘Ram’, ‘Anil’ and ‘Anuj’ from a dataframe.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15]], index=['Math', 'English', 'Science',
'Hindi'], columns = ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math' : 'Science', ['Ram', 'Anil', 'Anuj']])

OUTPUT:

Ram Anil Simple Anuj
Math 25 29 28 17
English 21 25 23 20
Science 23 18 26 23
Hindi 20 18 30 15
---------------------------------------------------
Ram Anil Anuj
Math 25 29 17
English 21 25 20
Science 23 18 23
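One point worth noting in the examples above is that a label slice in loc[ ] includes the end label, whereas a position slice in iloc[ ] excludes the end position. The sketch below compares the two on the marks DataFrame used above; the names by_label and by_position are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20],
                   [23, 18, 26, 23], [20, 18, 30, 15]],
                  index=['Math', 'English', 'Science', 'Hindi'],
                  columns=['Ram', 'Anil', 'Simple', 'Anuj'])

# Label slices: 'Science' and 'Anil' are INCLUDED in the result.
by_label = df.loc['Math':'Science', 'Ram':'Anil']

# Position slices: positions 3 and 2 are EXCLUDED from the result.
by_position = df.iloc[0:3, 0:2]

print(by_label)
print(by_label.equals(by_position))
```

Both selections give the same three rows and two columns.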
2. Boolean indexing
In Boolean indexing, we can select the data based on the
actual values in the DataFrame rather than their row/column
labels. We can use conditions on column names to filter data
values.
Example 1: Who scored more than 25 marks in Math
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index=['Math','English','Science'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc['Math']>25)

OUTPUT:

Ram Anil Simple
Math 25 29 28
English 21 25 23
Science 23 18 26
---------------------------------------------------
Ram False
Anil True
Simple True
Name: Math, dtype: bool
Example 2: To check in which subjects ‘Anil’ has scored more
than 25
import pandas as pd
df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
index=['Math','English','Science'], columns = ['Ram', 'Anil',
'Simple'])
print(df)
print("---------------------------------------------------")
print(df.loc[:,'Anil']>25)

OUTPUT:

Ram Anil Simple
Math 25 29 28
English 21 25 23
Science 23 18 26
---------------------------------------------------
Math True
English False
Science False
Name: Anil, dtype: bool
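The two examples above only display True or False for each label. To actually filter the data, the Boolean result can be passed back to the DataFrame. A minimal sketch is given below; the names mask, high_subjects, math_mask and top_in_math are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame([[25, 29, 28], [21, 25, 23], [23, 18, 26]],
                  index=['Math', 'English', 'Science'],
                  columns=['Ram', 'Anil', 'Simple'])

# One True/False value per subject (row).
mask = df['Anil'] > 25
# Keep only the rows where the mask is True.
high_subjects = df[mask]
print(high_subjects)

# One True/False value per student (column).
math_mask = df.loc['Math'] > 25
# Keep only the columns where the mask is True.
top_in_math = df.loc[:, math_mask]
print(top_in_math)
```

The first result keeps only the subjects in which Anil scored more than 25, and the second keeps only the students who scored more than 25 in Math.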
Merging of DataFrames
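We can merge two DataFrames by appending the rows of the second
DataFrame at the end of the first DataFrame. Columns not
present in the first DataFrame are added as new
columns. In pandas versions before 2.0 this is done with the
pandas.DataFrame.append() method; append() was removed in
pandas 2.0, where pandas.concat() is used instead. For example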
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil',
'Simple', 'Anuj'])
print(df)
print("-------------------------------------------------")
df1 = pd.DataFrame([[10, 12, 8, 7], [1, 5, 3, 2], [2, 1, 2, 2],[0,
1, 3, 5]], index=['R1', 'R2', 'R5', 'R6'], columns = ['Ram', 'Anil',
'Ravi', 'Ashish'])
print(df1)
print("-------------------------------------------------")
df = pd.concat([df, df1]) #merging two data frames (df.append(df1) in pandas versions before 2.0)
print(df)

OUTPUT:

Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
-------------------------------------------------
Ram Anil Ravi Ashish
R1 10 12 8 7
R2 1 5 3 2
R5 2 1 2 2
R6 0 1 3 5
-------------------------------------------------
Ram Anil Simple Anuj Ravi Ashish
R1 25 29 28.0 17.0 NaN NaN
R2 21 25 23.0 20.0 NaN NaN
R3 23 18 26.0 23.0 NaN NaN
R4 20 18 30.0 15.0 NaN NaN
R5 12 15 20.0 3.0 NaN NaN
R6 23 12 16.0 30.0 NaN NaN
R1 10 12 NaN NaN 8.0 7.0
R2 1 5 NaN NaN 3.0 2.0
R5 2 1 NaN NaN 2.0 2.0
R6 0 1 NaN NaN 3.0 5.0
To make the column labels appear in sorted order we can set the
parameter sort=True. For example
df = pd.concat([df, df1], sort=True) #df.append(df1, sort=True) in pandas versions before 2.0
print(df)
The output of the above code will be
Anil Anuj Ashish Ram Ravi Simple
R1 29 17.0 NaN 25 NaN 28.0
R2 25 20.0 NaN 21 NaN 23.0
R3 18 23.0 NaN 23 NaN 26.0
R4 18 15.0 NaN 20 NaN 30.0
R5 15 3.0 NaN 12 NaN 20.0
R6 12 30.0 NaN 23 NaN 16.0
R1 12 NaN 7.0 10 8.0 NaN
R2 5 NaN 2.0 1 3.0 NaN
R5 1 NaN 2.0 2 2.0 NaN
R6 1 NaN 5.0 0 3.0 NaN

NOTE: Observe that the column names are now arranged
alphabetically.
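In the merged output the row labels R1, R2, R5 and R6 appear twice. The sketch below, written with pandas.concat() (the replacement for DataFrame.append(), which was removed in pandas 2.0), shows how such repeated labels behave and how ignore_index=True can renumber the rows. The two small DataFrames are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[25, 29], [21, 25]],
                  index=['R1', 'R2'], columns=['Ram', 'Anil'])
df1 = pd.DataFrame([[10, 8], [1, 3]],
                   index=['R1', 'R2'], columns=['Ram', 'Ravi'])

# Rows of df1 are placed after the rows of df.
merged = pd.concat([df, df1])
print(merged)

# The row labels are repeated, so the index is not unique ...
print(merged.index.is_unique)
# ... and selecting 'R1' returns two rows.
print(merged.loc['R1'])

# ignore_index=True discards the old labels and numbers the rows 0, 1, 2, ...
renumbered = pd.concat([df, df1], ignore_index=True)
print(renumbered.index.tolist())
```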
Attributes of DataFrames
Like Series, a DataFrame has certain properties called attributes.
Some attributes of a Pandas DataFrame are
1. DataFrame.index: This attribute gives all the row labels of
the dataframe.
2. DataFrame.columns: This attribute gives all the column
labels of the dataframe.
3. DataFrame.dtypes: This attribute gives the data type of each
column in the dataframe.
4. DataFrame.shape: This attribute gives a tuple
representing the dimensions of the dataframe, that is, the
number of rows and the number of columns.
5. DataFrame.size: This attribute gives the total number
of values in the dataframe.
6. DataFrame.T: This attribute transposes the DataFrame, that
is, the row indices and column labels of the DataFrame
swap positions.
7. DataFrame.values: This attribute gives a NumPy ndarray
having all the values in the DataFrame, without the axes
labels.
8. DataFrame.empty: This attribute is True if the
DataFrame is empty and False otherwise.
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15]], index=['R1', 'R2', 'R3', 'R4'], columns
= ['Ram', 'Anil', 'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.index)
print("---------------------------------------------------")
print(df.columns)
print("---------------------------------------------------")
print(df.dtypes)
print("---------------------------------------------------")
print(df.shape)
print("---------------------------------------------------")
print(df.size)
print("---------------------------------------------------")
print(df.T)
print("---------------------------------------------------")
print(df.values)
print("---------------------------------------------------")
print(df.empty)

OUTPUT:

Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
---------------------------------------------------
Index(['R1', 'R2', 'R3', 'R4'], dtype='object')
---------------------------------------------------
Index(['Ram', 'Anil', 'Simple', 'Anuj'], dtype='object')
---------------------------------------------------
Ram int64
Anil int64
Simple int64
Anuj int64
dtype: object
---------------------------------------------------
(4, 4)
---------------------------------------------------
16
---------------------------------------------------
R1 R2 R3 R4
Ram 25 21 23 20
Anil 29 25 18 18
Simple 28 23 26 30
Anuj 17 20 23 15
---------------------------------------------------
[[25 29 28 17]
[21 25 23 20]
[23 18 26 23]
[20 18 30 15]]
---------------------------------------------------
False
Methods of DataFrames
1. head( ): This method displays the first n rows of the
DataFrame. If the parameter n is not specified, it gives the
first 5 rows by default. For example
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23], [20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil',
'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.head(2)) #display first two rows
print("---------------------------------------------------")
print(df.head(1)) #display only first row
print("---------------------------------------------------")
print(df.head()) #display first five rows as value of n not specified.
print("---------------------------------------------------")

OUTPUT:

Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
---------------------------------------------------
Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
---------------------------------------------------
Ram Anil Simple Anuj
R1 25 29 28 17
---------------------------------------------------
Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
---------------------------------------------------
2. tail( ): This method displays the last n rows of the
DataFrame. If the parameter n is not specified, it gives the
last 5 rows by default. For example
import pandas as pd
df = pd.DataFrame([[25, 29, 28, 17], [21, 25, 23, 20], [23, 18,
26, 23],[20, 18, 30, 15], [12, 15, 20, 3], [23, 12, 16, 30]],
index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'], columns = ['Ram', 'Anil',
'Simple', 'Anuj'])
print(df)
print("---------------------------------------------------")
print(df.tail(2)) #display last two rows
print("---------------------------------------------------")
print(df.tail(3)) #display last three rows
print("---------------------------------------------------")
print(df.tail()) #display last five rows as value of n not specified.
print("---------------------------------------------------")

OUTPUT:

Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
---------------------------------------------------
Ram Anil Simple Anuj
R5 12 15 20 3
R6 23 12 16 30
---------------------------------------------------
Ram Anil Simple Anuj
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
---------------------------------------------------
Ram Anil Simple Anuj
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
---------------------------------------------------
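Two further cases of head( ) and tail( ) may be useful to know: a value of n larger than the number of rows, and a negative value of n. The sketch below uses a single-column DataFrame made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Ram': [25, 21, 23, 20, 12, 23]},
                  index=['R1', 'R2', 'R3', 'R4', 'R5', 'R6'])

# n larger than the number of rows: all rows are returned, no error.
print(df.head(10).shape)

# Negative n with head(): all rows EXCEPT the last 2.
print(df.head(-2).index.tolist())

# Negative n with tail(): all rows EXCEPT the first 2.
print(df.tail(-2).index.tolist())
```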
Importing a CSV file to a DataFrame
In order to practise the code, you are advised to create this
csv file using a spreadsheet and save it on your computer with the
name “data.csv”. (Save the file in the folder from which you run
your Python program, or give the complete path in the code.)
Rollno Name Class Sec
1 Anil X A
2 Anuj XI B
3 Ravi XII B
4 Ananya VI A
5 Sumit VI C
6 Deepak VIII D
7 Parth X A
We can load the data from the data.csv file into a DataFrame,
say “stud” using Pandas read_csv() function as shown below:
import pandas as pd
stud = pd.read_csv("data.csv", sep=",", header=0)
print(stud)

OUTPUT:

Rollno Name Class Sec
0 1 Anil X A
1 2 Anuj XI B
2 3 Ravi XII B
3 4 Ananya VI A
4 5 Sumit VI C
5 6 Deepak VIII D
6 7 Parth X A
Explanation of the parameters used in the above code
1. The first parameter to read_csv() is the name of
the csv file along with its path.
2. The parameter sep specifies whether the values are
separated by a comma, semicolon, tab, or any other
character. The default value for sep is a comma.
3. header=0 implies that column names are inferred
from the first line of the file. This is also the default
behaviour when the parameter names is not given.
We can explicitly specify column names using the parameter
names while creating the DataFrame using
the read_csv() function. For example
import pandas as pd
m = pd.read_csv("data.csv", sep=",", header=0, names=['Rno',
'S_Name', 'S_Class', 'Section'])
print(m)

OUTPUT:

Rno S_Name S_Class Section
0 1 Anil X A
1 2 Anuj XI B
2 3 Ravi XII B
3 4 Ananya VI A
4 5 Sumit VI C
5 6 Deepak VIII D
6 7 Parth X A
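The effect of the parameters sep and names can be tried without creating a file on disk by reading the csv text from memory. This is only a sketch: io.StringIO stands in for the file, and the two-record csv text is a shortened version of data.csv made up for this illustration.

```python
import io
import pandas as pd

comma_text = "Rollno,Name,Class,Sec\n1,Anil,X,A\n2,Anuj,XI,B\n"
semicolon_text = comma_text.replace(",", ";")

# sep is not given, so the default separator (comma) is used.
stud = pd.read_csv(io.StringIO(comma_text))

# Values separated by semicolons need sep=";".
stud2 = pd.read_csv(io.StringIO(semicolon_text), sep=";")
print(stud.equals(stud2))

# header=0 together with names replaces the headings found in the file.
renamed = pd.read_csv(io.StringIO(comma_text), header=0,
                      names=['Rno', 'S_Name', 'S_Class', 'Section'])
print(renamed.columns.tolist())
```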
Exporting a Dataframe to a CSV file
We can use the to_csv() method to save a DataFrame to a csv
file. Suppose we have a dataframe named “df_stud” that contains
the following data.
Ram Anil Simple Anuj
R1 25 29 28 17
R2 21 25 23 20
R3 23 18 26 23
R4 20 18 30 15
R5 12 15 20 3
R6 23 12 16 30
We want to store the data of “df_stud” in a csv file named
“data.csv”. For this we will write the following code
df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',') #path will be according to your choice
The above code will create a file “data.csv” on the desktop.
When we open this file in any text editor or a spreadsheet, we
will find the above data along with the row labels and the
column headers, separated by commas.
In case we do not want the column names to be saved to the
file we may use the parameter header=False.
Another parameter index=False is used when we do not want
the row labels to be written to the file on disk. For example:
df_stud.to_csv(r'C:\Users\abc\Desktop\data.csv', sep=',', header=False, index=False)
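To see exactly what to_csv() writes, we can call it without a file path, in which case it returns the csv text as a string. The sketch below uses a two-row version of df_stud made up for this illustration and reads the text back with read_csv(); io.StringIO stands in for a file.

```python
import io
import pandas as pd

df_stud = pd.DataFrame({'Ram': [25, 21], 'Anil': [29, 25]},
                       index=['R1', 'R2'])

# No path given: the csv text is returned instead of being written to disk.
csv_text = df_stud.to_csv()
print(csv_text)

# Reading the text back; index_col=0 turns the first column into row labels.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)

# Without column headings and row labels only the values remain.
bare = df_stud.to_csv(header=False, index=False)
print(bare)
```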
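Difference between Pandas Series and NumPy Arrays
Pandas Series                        | NumPy Arrays
-------------------------------------+-------------------------------------
In series we can define our own      | NumPy arrays are accessed by their
labeled index to access elements of  | integer position using numbers only.
an array. These can be numbers or    |
letters.                             |
-------------------------------------+-------------------------------------
The elements can be indexed in       | The indexing starts with zero for
descending order also.               | the first element and the index is
                                     | fixed.
-------------------------------------+-------------------------------------
If two series are not aligned, NaN   | There is no concept of NaN values
or missing values are generated.     |
-------------------------------------+-------------------------------------
Series require more memory.          | NumPy arrays occupy less memory.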
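The difference in alignment can be seen by adding two Series whose labels only partly match and comparing this with adding two NumPy arrays. The values and labels in this sketch are made up for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Series are added label by label; 'a' and 'd' have no partner, so NaN appears.
total = s1 + s2
print(total)

a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])

# Arrays are added position by position; there are no labels to align.
print(a1 + a2)
```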


SUMMARY

1. A DataFrame is a two-dimensional labeled data structure
like a spreadsheet. It contains rows and columns and therefore
has both a row index and a column index.

2. When a dictionary is used to create a DataFrame, the keys of
the dictionary become the column labels of the DataFrame. A
DataFrame can be thought of as a dictionary of lists/Series (all
Series/columns sharing the same index label for a row).

3. Data can be loaded into a DataFrame from a file on disk by
using the Pandas read_csv() function.

4. Data in a DataFrame can be written to a text file on disk by
using the pandas.DataFrame.to_csv() method.

5. DataFrame.T gives the transpose of a DataFrame.

6. Pandas has a number of methods that support label based
indexing, but every label asked for must be in the index, or a
KeyError will be raised.

7. DataFrame.loc[ ] is used for label based indexing of rows in
DataFrames.

8. The pandas.DataFrame.append() method is used to merge two
DataFrames in pandas versions before 2.0. From pandas 2.0
onwards it has been removed and pandas.concat() is used instead.

9. Pandas supports non-unique index values. An exception is
raised only when an operation that does not support duplicate
index values is attempted.
