Dataframe
Problem statement: Create the following dataframe ‘Sport’ containing sport-wise marks for
five students. Use a 2D dictionary to create the dataframe.
      Student  Sport     Marks
I     Jai      Cricket   80
II    Raj      Football  76
III   John     Tennis    89
IV    Karan    Kabaddi   92
V     Chandu   Hockey    97
Solution:
Source Code:
import pandas as pd
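# Keys become column labels; values and labels are capitalised as in the table
D = {'Student':['Jai','Raj','John','Karan','Chandu'],
     'Sport':['Cricket','Football','Tennis','Kabaddi','Hockey'],
     'Marks':[80,76,89,92,97]}
# The Roman numerals are the row labels
sport = pd.DataFrame(D, index=['I','II','III','IV','V'])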
print(sport)
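The solution above uses a dictionary of lists. If "2D dictionary" is read as a dictionary of dictionaries, the same dataframe can probably be built as in the sketch below. The name sport_nested is only illustrative and does not come from the practical.

```python
import pandas as pd

# Outer keys become column labels; inner keys become row labels.
D = {
    'Student': {'I': 'Jai', 'II': 'Raj', 'III': 'John', 'IV': 'Karan', 'V': 'Chandu'},
    'Sport': {'I': 'Cricket', 'II': 'Football', 'III': 'Tennis',
              'IV': 'Kabaddi', 'V': 'Hockey'},
    'Marks': {'I': 80, 'II': 76, 'III': 89, 'IV': 92, 'V': 97},
}

# sport_nested is a hypothetical name used only for this sketch
sport_nested = pd.DataFrame(D)
print(sport_nested)
```

With a nested dictionary no separate index list is needed, because the inner keys already carry the row labels.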
Problem statement: Create a dataframe from a list of dictionaries holding the name and
price of the most economical bike from each of three companies. The company names should be
the row labels.
Solution:
Source Code:
import pandas as pd
L1 = {'Name':'Sports','Cost':60000}
L2 = {'Name':'Discover','Cost':62000}
L3 = {'Name':'Splendor','Cost':63000}
Bike = [L1,L2,L3]
df = pd.DataFrame(Bike, ['TVS','Bajaj','Hero'])
print(df)
Problem statement: Consider two Series objects, staff and salaries, that store the number of
people in various office branches and the salaries distributed in these branches, respectively.
Write a program to create another Series object that stores the average salary per branch and
then create a dataframe object from these Series objects.
After creating the dataframe, rename all row labels with the branch names.
Solution:
Source Code:
import pandas as pd
staff = pd.Series([20,24,30,18])
salary = pd.Series([240000,336000,450000,270000])
avg = salary/staff
org = {'Employees':staff,'Amount':salary,'Average':avg}
df = pd.DataFrame(org)
print("Without Row Label")
print(df)
df.index = ['sale','store','marketing','maintenance']
print("With Row Label")
print(df)
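The solution above divides two Series that share the default 0 to 3 index and attaches the branch names afterwards. As a sketch of an alternative, the branch names can be given to both Series up front, so the division is matched on the labels themselves. The list name branches is an assumption introduced for this example.

```python
import pandas as pd

# branches is a hypothetical helper list, not part of the original practical
branches = ['sale', 'store', 'marketing', 'maintenance']

staff = pd.Series([20, 24, 30, 18], index=branches)
salary = pd.Series([240000, 336000, 450000, 270000], index=branches)

# Series arithmetic is element-wise and aligned on the index labels
avg = salary / staff

df = pd.DataFrame({'Employees': staff, 'Amount': salary, 'Average': avg})
print(df)
```

Because the three Series carry the same labels, the dataframe picks them up as row labels and no later renaming is needed.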
Problem statement: Create the following dataframe ‘sales’ containing year-wise sales figures
for five salespersons in INR. Use the years as column labels and the salesperson names as row
labels.
          2014  2015  2016  2017
Madhu     1000  2000  2400  2800
Kusum     1500  1800  5000  6000
Kinshuk   2000  2200  7000  7000
Ankit     3000  3000  1000  8000
Shruti    4000  4500  1250  9000
Solution:
Source Code:
import pandas as pd
D = {2014:[1000,1500,2000,3000,4000],2015:[2000,1800,2200,3000,4500],
2016:[2400,5000,7000,1000,1250],2017:[2800,6000,7000,8000,9000]}
sale = pd.DataFrame(D,['Madhu','Kusum','Kinshuk','Ankit','Shruti'])
print("----DataFrame ---")
print(sale)
print("----Row Labels----")
print(sale.index)
print("----Column Labels ----")
print(sale.columns)
print("----Bottom two Rows ----")
print(sale.tail(2))
print("----Top two Rows ----")
print(sale.head(2))
Problem statement: Create a dataframe ‘sales2’ from a dictionary as given below and write a
program to append ‘sales2’ to the dataframe ‘sales’ created in practical 14.
          2018
Madhu     1600
Kusum     1100
Kinshuk   5000
Ankit     3400
Shruti    9000
Solution:
Source Code:
import pandas as pd
D = {2014:[1000,1500,2000,3000,4000],2015:[2000,1800,2200,3000,4500],
2016:[2400,5000,7000,1000,1250],2017:[2800,6000,7000,8000,9000]}
sale = pd.DataFrame(D,['Madhu','Kusum','Kinshuk','Ankit','Shruti'])
print("----DataFrame ---")
print(sale)
# 2018 figures as given in the table (Ankit: 3400)
sale2 = pd.DataFrame({2018:[1600,1100,5000,3400,9000]},
                     ['Madhu','Kusum','Kinshuk','Ankit','Shruti'])
print(sale2)
sale = sale.join(sale2)
print(sale)
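The solution above adds the 2018 column with join(). A possible alternative, sketched here, is pd.concat with axis=1, which also places the two dataframes side by side and matches rows on their labels. The names names and combined are assumptions made for this example, and the 2018 figures are taken from the table in the problem statement.

```python
import pandas as pd

# names and combined are hypothetical names used only in this sketch
names = ['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti']

sale = pd.DataFrame({2014: [1000, 1500, 2000, 3000, 4000],
                     2015: [2000, 1800, 2200, 3000, 4500],
                     2016: [2400, 5000, 7000, 1000, 1250],
                     2017: [2800, 6000, 7000, 8000, 9000]}, index=names)
sale2 = pd.DataFrame({2018: [1600, 1100, 5000, 3400, 9000]}, index=names)

# axis=1 concatenates column-wise; rows are aligned on the row labels
combined = pd.concat([sale, sale2], axis=1)
print(combined)
```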
Problem statement: Create a dataframe ‘cloth’ as given below and write a program to do the
following:
1. Check whether ‘cloth’ is empty or not
2. Change ‘cloth’ so that it becomes its transpose
3. Display the number of rows and columns of ‘cloth’
4. Count and display the non-NA values for each column
5. Count and display the non-NA values for each row
      CName    Size  Price
C1    Jeans    L     1200
C2    Jeans    XL    1350
C3    Shirt    XL    900
C4    Trouser  L     1000
C5    T-Shirt  XL    600
Solution:
Source Code:
import pandas as pd
D = {'CName':['Jeans','Jeans','Shirt','Trouser','T-Shirt'],
'Size':['L','XL','XL','L','XL'],
'Price':[1200,1350,900,1000,600]}
# Row labels C1 to C5 as shown in the table
cloth = pd.DataFrame(D, index=['C1','C2','C3','C4','C5'])
print("----Dataframe ---")
print(cloth)
print("----checking dataframe is empty or not ----")
if cloth.empty:
    print("Cloth is Empty")
else:
    print("Cloth is not Empty")
print("----Transpose Dataframe ----")
# cloth.T returns the transpose; assign it back (cloth = cloth.T) to change cloth itself
print(cloth.T)
print("----Total no of rows and columns ----")
print(cloth.shape)
print("----No of Non NA elements in each column ----")
print(cloth.count())
print("----No of Non NA elements in each row ----")
# axis=1 counts across the columns, giving one count per row
print(cloth.count(axis=1))
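Since ‘cloth’ has no missing values, both counts above simply equal the number of columns or rows. The sketch below uses a hypothetical variant, cloth_na, with two values deliberately removed, to show how count() skips NA entries. The missing values are invented for illustration and are not part of the practical's data.

```python
import numpy as np
import pandas as pd

# cloth_na is a hypothetical variant of 'cloth' with two values made missing
cloth_na = pd.DataFrame(
    {'CName': ['Jeans', 'Jeans', 'Shirt', 'Trouser', 'T-Shirt'],
     'Size': ['L', None, 'XL', 'L', 'XL'],          # Size of C2 is missing
     'Price': [1200, 1350, np.nan, 1000, 600]},     # Price of C3 is missing
    index=['C1', 'C2', 'C3', 'C4', 'C5'])

per_column = cloth_na.count()        # non-NA values in each column
per_row = cloth_na.count(axis=1)     # non-NA values in each row

print(per_column)
print(per_row)
```

Here the Size and Price columns each report 4 instead of 5, and rows C2 and C3 each report 2 instead of 3.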
Problem statement: Create a dataframe ‘cloth’ as given below and write a program to do the
following:
1. Change the name ‘Trouser’ to ‘Pant’ and ‘Jeans’ to ‘Denim’
2. Increase the price of all cloth by 10%
3. Rename all the indexes to [C001, C002, C003, C004, C005]
4. Delete the data of C3 (C003) from ‘cloth’
5. Delete the Size column from ‘cloth’
      CName    Size  Price
C1    Jeans    L     1200
C2    Jeans    XL    1350
C3    Shirt    XL    900
C4    Trouser  L     1000
C5    T-Shirt  XL    600
Solution:
Source Code:
import pandas as pd
D = {'CName':['Jeans','Jeans','Shirt','Trouser','T-Shirt'],
'Size':['L','XL','XL','L','XL'],
'Price':[1200,1350,900,1000,600]}
cloth = pd.DataFrame(D,['C1','C2','C3','C4','C5'])
print("----Dataframe ---")
print(cloth)
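print("----Change name of Trouser to Pant and Jeans to Denim ----")
cloth.at['C4','CName'] = 'Pant'
# Equivalent: cloth.loc['C4','CName'] = 'Pant'
# Jeans appears in two rows, so replace it by value rather than by label
cloth['CName'] = cloth['CName'].replace('Jeans','Denim')
print(cloth)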
print("----Increase price of all cloth by 10% ----")
cloth['Price'] = cloth['Price'] + cloth['Price']*10/100
print(cloth)
print("----Rename all the indexes to [C001,C002,C003,C004,C005] ----")
# inplace must be the boolean True, not the string 'True'
cloth.rename(index = {'C1':'C001','C2':'C002','C3':'C003',
                      'C4':'C004','C5':'C005'}, inplace = True)
print(cloth)
print("----Delete the data of C003 ----")
cloth = cloth.drop(['C003'])
print(cloth)
print("----Delete column 'Size' ----")
del cloth['Size']
print(cloth)
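The last two steps above remove a row with drop() and a column with del. As a sketch of an alternative, drop() can take both an index and a columns argument and return a new dataframe, leaving the original untouched. The name trimmed is an assumption made for this example, and the data here is the original table with the renamed row labels only.

```python
import pandas as pd

cloth = pd.DataFrame(
    {'CName': ['Jeans', 'Jeans', 'Shirt', 'Trouser', 'T-Shirt'],
     'Size': ['L', 'XL', 'XL', 'L', 'XL'],
     'Price': [1200, 1350, 900, 1000, 600]},
    index=['C001', 'C002', 'C003', 'C004', 'C005'])

# trimmed is a hypothetical name; drop() returns a new dataframe
trimmed = cloth.drop(index='C003', columns='Size')
print(trimmed)
```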
Problem statement: Create a dataframe ‘aid’ as given below and write a program to do the
following:
1. Display the books and shoes only
2. Display toys only
3. Display the quantity in MP and CG for toys and books
4. Display the quantity of books in AP
      Toys  Books  Shoes
MP    7000  4300   6000
UP    3400  3200   1200
AP    7800  5600   3280
CG    4100  2000   3000
Solution:
Source Code:
import pandas as pd
D = {'Toys':{'MP':7000,'UP':3400,'AP':7800,'CG':4100},
'Books':{'MP':4300,'UP':3200,'AP':5600,'CG':2000},
'Shoes':{'MP':6000,'UP':1200,'AP':3280,'CG':3000},}
aid = pd.DataFrame(D)
print('----DataFrame ---')
print(aid)
print('----Display the books and shoes only ----')
print(aid.loc[:,['Books','Shoes']])
print('----Display toys only ----')
print(aid['Toys'])
print('----Display quantity in MP and CG for toys and books ----')
print(aid.loc[['MP','CG'],['Toys','Books']])
print('----Display quantity of books in AP ----')
print(aid.at['AP','Books'])
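In the solution above, aid['Toys'] selects a single column and so returns a Series rather than a dataframe. The sketch below contrasts it with the list form aid[['Toys']], which keeps the result as a one-column dataframe. The names toys_series and toys_frame are assumptions made for this example.

```python
import pandas as pd

aid = pd.DataFrame({'Toys': {'MP': 7000, 'UP': 3400, 'AP': 7800, 'CG': 4100},
                    'Books': {'MP': 4300, 'UP': 3200, 'AP': 5600, 'CG': 2000},
                    'Shoes': {'MP': 6000, 'UP': 1200, 'AP': 3280, 'CG': 3000}})

# toys_series and toys_frame are hypothetical names used only in this sketch
toys_series = aid['Toys']      # single label: a Series
toys_frame = aid[['Toys']]     # list of labels: a one-column DataFrame

print(type(toys_series).__name__)
print(type(toys_frame).__name__)
```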
Problem statement: Create a dataframe ‘aid’ as given below and write a program to write
the values of ‘aid’ to a comma-separated file ‘aidfigures.csv’ on the disk. Do not write the
row labels or the column labels.
      Toys  Books  Shoes
MP    7000  4300   6000
UP    3400  3200   1200
AP    7800  5600   3280
CG    4100  2000   3000
Solution:
Source Code:
import pandas as pd
D = {'Toys':{'MP':7000,'UP':3400,'AP':7800,'CG':4100},
'Books':{'MP':4300,'UP':3200,'AP':5600,'CG':2000},
'Shoes':{'MP':6000,'UP':1200,'AP':3280,'CG':3000},}
aid = pd.DataFrame(D)
print(aid)
aid.to_csv(path_or_buf = 'C:/sample/aidfigures.csv',
header = False, index = False)
Problem statement: Read the data in the file ‘aidfigures.csv’ into a dataframe ‘aidretrieved’
and display it. Now update the row labels and column labels of ‘aidretrieved’ to be the
same as those of ‘aid’ of practical 19.
      Toys  Books  Shoes
MP    7000  4300   6000
UP    3400  3200   1200
AP    7800  5600   3280
CG    4100  2000   3000
Solution:
Source Code:
import pandas as pd
aidretrieved = pd.read_csv('C:/sample/aidfigures.csv',
names=['Toys','Books','Shoes'],)
aidretrieved.index = ['MP','UP','AP','CG']
print(aidretrieved)
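The write and read steps of the last two practicals can be tried together without touching the disk. The sketch below assumes an in-memory io.StringIO buffer as a stand-in for the file 'C:/sample/aidfigures.csv'. The buffer and the name csv_text are assumptions made for this example, not part of the practicals.

```python
import io
import pandas as pd

aid = pd.DataFrame({'Toys': {'MP': 7000, 'UP': 3400, 'AP': 7800, 'CG': 4100},
                    'Books': {'MP': 4300, 'UP': 3200, 'AP': 5600, 'CG': 2000},
                    'Shoes': {'MP': 6000, 'UP': 1200, 'AP': 3280, 'CG': 3000}})

# In-memory buffer standing in for the CSV file (an assumption of this sketch)
buffer = io.StringIO()
aid.to_csv(buffer, header=False, index=False)   # values only, no labels
csv_text = buffer.getvalue()
print(csv_text)

# Rewind the buffer and read the values back, supplying the labels again
buffer.seek(0)
aidretrieved = pd.read_csv(buffer, names=['Toys', 'Books', 'Shoes'])
aidretrieved.index = ['MP', 'UP', 'AP', 'CG']
print(aidretrieved)
```

Since the file holds no labels, the column names have to be passed through names and the row labels assigned afterwards, exactly as in the solution above.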