WORKSHEET — Data Handling Using Pandas
What will be the output of the following code?
import pandas as pd
s1=pd.Series([1,2,2,7,'Sachin',77.5])
print(s1.head())
print(s1.head(3))
Ans:
0         1
1         2
2         2
3         7
4    Sachin
dtype: object
0    1
1    2
2    2
dtype: object
Write a program in python to find the index of the maximum value in each column of a DataFrame.
Ans:
# importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.DataFrame({"A":[4, 5, 2, 6],
                   "B":[11, 2, 5, 8],
                   "C":[1, 8, 66, 4]})
# Print the dataframe
print(df)
# applying idxmax() function along the index axis
print(df.idxmax(axis = 0))
What is the purpose of the following statements?
1. df.columns
5. df.iloc[:-4, :]
Ans:
1. It displays the names of the columns of the DataFrame.
2. It will display all columns except the last 5 columns.
3. It will display rows with index 2 to 7.
4. It will display the entire DataFrame with all rows and columns.
5. It will display all rows except the last four rows.
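The slicing answers above can be checked on a small frame. This is only a sketch: the 8x6 DataFrame is made up for illustration, and the statements used for answers 2 and 3 (df.iloc[:, :-5] and df[2:8]) are assumptions, because those parts of the question did not survive in the worksheet.

```python
import pandas as pd

# Hypothetical 8x6 frame; the values only make the slices easy to see
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(8)],
    columns=list("ABCDEF"),
)

all_but_last_4_rows = df.iloc[:-4, :]   # rows 0..3, every column
all_but_last_5_cols = df.iloc[:, :-5]   # every row, only column A (assumed statement)
rows_2_to_7 = df[2:8]                   # positional row slice, end excluded (assumed statement)

print(list(df.columns))
print(all_but_last_4_rows.shape)
print(all_but_last_5_cols.shape)
print(rows_2_to_7.index.tolist())
```

A negative stop in a slice counts from the end, so ":-4" means "everything except the last four".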
Write a python program to sort the following data according to ascending
order of Age.
Name     Age  Designation
Sanjeev  37   Manager
Keshav   42   Clerk
Rahul    38   Accountant
Ans:
import pandas as pd
name=pd.Series(['Sanjeev','Keshav','Rahul'])
age=pd.Series([37,42,38])
designation=pd.Series(['Manager','Clerk','Accountant'])
d1={'Name':name,'Age':age,'Designation':designation}
df=pd.DataFrame(d1)
print(df)
df1=df.sort_values(by='Age')
print(df1)
Write a python program to sort the following data according to descending
order of Name.
Name     Age  Designation
Sanjeev  37   Manager
Keshav   42   Clerk
Rahul    38   Accountant
Ans:
import pandas as pd
name=pd.Series(['Sanjeev','Keshav','Rahul'])
age=pd.Series([37,42,38])
designation=pd.Series(['Manager','Clerk','Accountant'])
d1={'Name':name,'Age':age,'Designation':designation}
df=pd.DataFrame(d1)
print(df)
df2=df.sort_values(by='Name',ascending=False)
print(df2)
www.python4csip.com
Which of the following things can be data in Pandas?
1. A python dictionary
2. An ndarray
3. A scalar value
4. All of the above
Ans:
4. All of the above
All pandas data structures are ___ mutable but not always ___ mutable.
1. Size, value
2. Semantic, size
3. Value, size
4. None of the above
Ans:
3. Value, size
Data and index in an ndarray must be of the same length-
1. True
2. False
Ans:
1. True
What is the output of the following program?
import pandas as pd
df=pd.DataFrame(index=[0,1,2,3,4,5],columns=['one','two'])
print df['one'].sum()
Ans:
It will produce an error.
What will be the output of the following code?
Users.groupby('occupation').age.mean()
1. Get mean age of occupation
2. Groups users by mean age
3. Groups users by age and occupation
4. None
Ans:
1. Get mean age of occupation
Which object do you get after reading a CSV file using pandas read_csv()?
1. Dataframe
2. Nd array
3. Char Vector
4. None
Ans:
1. Dataframe
What will be the output of df.iloc[3:7,3:6]?
Ans:
It will display the rows with index 3 to 6 and columns with index 3 to 5 in a
dataframe 'df'.
How to select the rows where 'age' is missing?
1. df[df['age'].isnull]
2. df[df['age'...
3. df[df['age'...
4. None
Ans:
4. None, as the right answer is df[df['age'].isnull()]
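A minimal sketch of selecting rows with a missing 'age' value, as in the answer above. The DataFrame and its names are hypothetical sample data, not part of the worksheet.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing ages
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena", "John"],
                   "age": [21, np.nan, 35, np.nan]})

# isnull() must be called: it returns a boolean mask, one entry per row
mask = df["age"].isnull()
missing_age = df[mask]
print(missing_age)
```

Writing df['age'].isnull without the parentheses passes the method object instead of a boolean mask, which is why option 1 is not a correct answer.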
Consider the following records in dataframe IPL
Player          Team                   Category  BidPrice
Hardik Pandya   Mumbai Indians         Batsman   13
K L Rahul       Kings Eleven           Batsman   12
Andre Russel    Kolkata Knight Riders  Batsman   7
Jasprit Bumrah  Mumbai Indians         Bowler    10
Virat Kohli     RCB                    Batsman   17
Rohit Sharma    Mumbai Indians         Batsman   15
Retrieve first 2 and last 3 rows using python program.
Ans:
import pandas as pd
d={'Player':['Hardik Pandya','K L Rahul','Andre Russel','Jasprit Bumrah','Virat
Kohli','Rohit Sharma'],
'Team':['Mumbai Indians','Kings Eleven','Kolkata Knight Riders','Mumbai
Indians','RCB','Mumbai Indians'],
'Category':['Batsman','Batsman','Batsman','Bowler','Batsman','Batsman'],
'BidPrice':[13,12,7,10,17,15],
'Runs':[1000,2400,900,200,3600,3700]}
df=pd.DataFrame(d)
print(df)
print(df.iloc[:2,:])
print(df.iloc[-3:,:])
Write a command to find the most expensive player.
Ans:
print(df[df['BidPrice']==df['BidPrice'].max()])
Write a command to print total players per team.
Ans:
print(df.groupby('Team').Player.count())
17. Write a command to find player who had highest BidPrice from each team.
Ans:
val=df.groupby('Team')
print(val[['Player','BidPrice']].max())
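One caveat with the answer above: max() is applied to each column separately, so the Player it reports is the alphabetically last name in the team, which need not be the player with the highest bid. A possible alternative, sketched here with the IPL data from the question (the Category and Runs columns are left out for brevity), uses idxmax() to pick the row of the highest bid in each team.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Hardik Pandya", "K L Rahul", "Andre Russel",
               "Jasprit Bumrah", "Virat Kohli", "Rohit Sharma"],
    "Team": ["Mumbai Indians", "Kings Eleven", "Kolkata Knight Riders",
             "Mumbai Indians", "RCB", "Mumbai Indians"],
    "BidPrice": [13, 12, 7, 10, 17, 15],
})

# idxmax() gives, for every team, the row label of its highest BidPrice
best_rows = df.groupby("Team")["BidPrice"].idxmax()

# Use those labels to pull the complete rows, so Player and BidPrice stay paired
top_rows = df.loc[best_rows, ["Team", "Player", "BidPrice"]]
print(top_rows)
```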
Write a command to find average runs of each team.
Ans:
print(df.groupby(['Team']).Runs.mean())
Write a command to sort all players according to BidPrice.
Ans:
print(df.sort_values(by='BidPrice'))
We need to define an index in pandas-
1. True
2. False
Ans:
2. False
Who is a data scientist?
1. Mathematician
2. Statistician
3. Software Programmer
4. All of the above
Ans:
4. All of the above
22. What is the built-in database used for python?
1. Mysql
2. Pysqlite
3. Sqlite3
4. Pysqln
Ans:
3. Sqlite3
23. How can you drop columns in python that contain NaN?
Ans:
df1.dropna(axis=1)
How can you drop all rows that contain NaN?
Ans:
df1.dropna(axis=0)
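The effect of the axis argument in dropna() can be seen on a small frame. This is a sketch with made-up data; only dropna() itself comes from the answers above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single NaN in column B, row 1
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, np.nan, 6], "C": [7, 8, 9]})

without_nan_columns = df1.dropna(axis=1)  # drops column B
without_nan_rows = df1.dropna(axis=0)     # drops row 1

print(without_nan_columns)
print(without_nan_rows)
```

Both calls return new DataFrames; df1 itself is left unchanged unless the result is assigned back.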
A Series is ___ array, which is labelled and ___ type.
Ans:
One dimensional array, homogeneous
Minimum number of arguments we require to pass in pandas series-
Ans:
1. 0
27. What can we pass to a data frame in pandas?
1. Integer
2. String
3. Pandas series
4. All
Ans:
4. All
How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df1.merge(df2, on='key', how='outer')
1. 3
2. 4
3. 5
4. 6
Ans:
4. 6
29. How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df1.merge(df2, on='key', how='inner')
30. How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df1.merge(df2, on='key', how='right')
1. 3
2. 4
3. 5
4. 6
Ans:
2. 4
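The row counts in these merge questions can be verified by running each join type. The sketch below assumes df1 and df2 are as reconstructed from the partly garbled question text (keys a, b, c, d against a, b, e, b).

```python
import pandas as pd

# Frames as read from the questions; treat the exact values as an assumption
df1 = pd.DataFrame({"key": ["a", "b", "c", "d"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["a", "b", "e", "b"], "value": [5, 6, 7, 8]})

# Count the rows produced by every join type
row_counts = {how: len(df1.merge(df2, on="key", how=how))
              for how in ["outer", "inner", "right", "left"]}
print(row_counts)  # {'outer': 6, 'inner': 3, 'right': 4, 'left': 5}
```

The key 'b' appears twice in df2, so the single 'b' row of df1 matches two rows. That is why the left join has 5 rows rather than 4, and the inner join has 3 rows rather than 2.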
How many rows will the resultant data frame have?
import pandas as pd
df1=pd.DataFrame({'key':['a','b','c','d'], 'value':[1,2,3,4]})
df2=pd.DataFrame({'key':['a','b','e','b'], 'value':[5,6,7,8]})
df1.merge(df2, on='key', how='left')
1. 3
2. 4
3. 5
4. 6
Ans:
3. 5
___ method is used to delete the series and also return the series as a result.
___ is an interactive way to quickly summarize large amount of data.
Ans:
Pivoting
34. ___ method is used to rename the existing indexes in a data frame.
Ans:
rename
35. ___ attribute that can prohibit creating a new data frame in
sort_values() method.
Ans:
inplace
36. Write a program in python to calculate the sum of marks in CS subject in a
given dataset-
'CS':[45,55,78,95,99,97], 'IP':[87,89,98,94,78,77]
Ans:
import pandas as pd
d1={'CS':[45,55,78,95,99,97], 'IP':[87,89,98,94,78,77]}
df=pd.DataFrame(d1)
print(df['CS'].sum())
Write a python program to create a data frame from the list given below-
[[79,92],[86,96],[85,91],[80,99]]
Ans:
import pandas as pd
l=[[10,20],[20,30],[30,40]]
df=pd.DataFrame(l,columns=['CS','IP'])
print(df)
How can you find the total number of rows and columns in a data frame?
Ans:
df.shape
[Table: data frame with columns MaxTemp, MinTemp, City and RainFall and rows for Delhi, Guwahati, Chennai, Bangluru, Mumbai and Jaipur; the numeric values are illegible in the source.]
Consider the above data frame as df-
1. Write command to compute sum of every column of the data frame.
Ans:
print(df.sum(axis=0))
Based on the above data frame df, write a command to compute mean of
column MaxTemp.
Ans:
print(df['MaxTemp'].mean())
Based on the above data frame df, write a command to compute average
MinTemp, RainFall for first 4 rows.
Ans:
df[['MinTemp','RainFall']][:4].mean()
Which method is used to read the data from MySQL database through Data
Frame?
Ans:
read_sql_query()
Which method is used to perform a query in MySQL through Data Frame?
Ans:
execute()
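The two methods named above can be tried without a MySQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and an in-memory database instead of MySQL, and the student table and its rows are invented for illustration. With MySQL, only the connection object would differ.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a MySQL connection
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# execute() runs a single SQL statement through the cursor
cur.execute("CREATE TABLE student (name TEXT, marks INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [("Amy", 70), ("Risha", 69), ("Susan", 75)])
conn.commit()

# read_sql_query() runs a SELECT and returns the result as a DataFrame
df = pd.read_sql_query(
    "SELECT name, marks FROM student WHERE marks >= 70 ORDER BY name", conn)
print(df)
conn.close()
```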
What will be the output of the following code?
import pandas as pd
df=pd.DataFrame([45,50,41,56], index=[True, False, True, False])
print(df.iloc[True])
Ans:
It will display an error message like "Cannot index by location index with a
non-integer key", because iloc accepts only integer index.
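A short sketch of the behaviour described in this answer, using the same DataFrame. The exact wording of the error message may vary between pandas versions, so the code only reports whatever message is raised.

```python
import pandas as pd

df = pd.DataFrame([45, 50, 41, 56], index=[True, False, True, False])

# iloc is purely position based, so a scalar True is rejected
try:
    df.iloc[True]
    message = "no error"
except TypeError as exc:
    message = str(exc)
print(message)

# An integer position works as expected
first_row = df.iloc[0]
print(first_row)
```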
Write a program in python to join two data frames.
Ans:
import pandas as pd
xiia={'sub':['eng','mat','ip','phy','che',...],'id':['302','041','065','042','043','044']}
xiic={'sub':['eng','mat','ip',...],'id':[...,'55','056','057']}
df1=pd.DataFrame(xiia)
print(df1)
df2=pd.DataFrame(xiic)
print(df2)
print(df1.merge(df2,on=...))
print(df1.merge(df2,on=...))
What is a Series? Explain with the help of an example.
Ans:
Pandas Series is a one-dimensional labeled array capable of holding data of any
type (integer, string, float, python objects etc.). The axis labels are collectively called
the index.
import pandas as pd
data=pd.Series([1,2,3,4,5])
print(data)
Hitesh wants to display the last four rows of the dataframe df and has written
the following code:
df.tail()
But last 5 rows are being displayed. Identify the error and rewrite the correct
code so that last 4 rows get displayed.
Ans:
If tail() doesn't receive any argument, then by default the last 5 rows will be
displayed. The correct code is
df.tail(4)
Write the command to add a new column in the last place (3rd place) named
"Salary" from the list of values, Sal=[10000,15000,20000], in an existing
dataframe named EMP, assume already having 2 columns.
Ans:
EMP['Salary']=Sal
Consider the following python code and write the output:
import pandas as pd
s=pd.Series([2,4,6,8,10,12,14])
print(s.quantile([0.50,0.75]))
Ans:
0.50     8.0
0.75    11.0
dtype: float64
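The value 11.0 for the 0.75 quantile follows from linear interpolation, which is the default method of quantile(). The sketch below repeats the calculation by hand; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([2, 4, 6, 8, 10, 12, 14])
q = 0.75

values = sorted(s.tolist())
position = q * (len(values) - 1)   # 0.75 * 6 = 4.5
lower = int(position)              # position 4 holds the value 10
fraction = position - lower        # 0.5 of the way to the next value, 12

# Interpolate between the two neighbouring values
manual = values[lower] + fraction * (values[lower + 1] - values[lower])
print(manual, s.quantile(q))
```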
Write a small python code to drop a row from dataframe labeled as 0.
Ans:
df=df.drop(0)
What is Pivoting? Name any two functions of Pandas which support pivoting.
Ans:
Pivoting is a technique to quickly summarize large amount of data so that data can
be viewed in a different perspective. Pivot table in pivoting can be used to apply
aggregate functions like count.
The two functions for pivoting are: pivot() and pivot_table()
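A small sketch contrasting the two functions named above. The sales data is hypothetical. pivot_table() aggregates repeated index/column pairs, while pivot() only reshapes and expects each pair to occur once, so duplicates are removed before calling it here.

```python
import pandas as pd

# Hypothetical sales records; ("Delhi", "Pen") occurs twice
sales = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Delhi"],
    "Item": ["Pen", "Book", "Pen", "Book", "Pen"],
    "Qty": [10, 4, 7, 3, 5],
})

# pivot_table() summarises: the two Delhi/Pen rows are added together
summary = sales.pivot_table(index="City", columns="Item",
                            values="Qty", aggfunc="sum")
print(summary)

# pivot() only reshapes, so keep one row per (City, Item) pair first
unique = sales.drop_duplicates(subset=["City", "Item"])
reshaped = unique.pivot(index="City", columns="Item", values="Qty")
print(reshaped)
```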
Write a python code to create a dataframe with appropriate headings from the
list given below:
[['S101','Amy',70], ['S102','Risha',69], ['S104','Susan',75], ['S105','George',82]]
Ans:
import pandas as pd
L=[['S101','Amy',70], ['S102','Risha',69], ['S104','Susan',75], ['S105','George',82]]
df=pd.DataFrame(L,columns=['ID','Name','Points'])
print(df)
Consider the following dataframe, and answer the questions given below:
import pandas as pd
df= pd.DataFrame({"Quarter1":[2000, 4000, 5000, 4400, 10000],
"Quarter2":[5800, 2500, 5400, 3000, 2900],
"Quarter3":[20000, 16000, 7000, 3600, 8200],
"Quarter4":[1400, 3700, 1700, 2000, 6000]})
Write the code to find mean value from above dataframe df over the index and
column axis. (Skip NaN value)
Ans:
print(df.mean(axis=0,skipna=True))
print(df.mean(axis=1,skipna=True))
Use sum() function to find the sum of all the values over the index axis.
Ans:
print(df.sum(axis=0))
Find the median of the dataframe df.
Ans:
print(df.median())
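Find the output of the following code:
import pandas as pd
data=[{'a':10,'b':20},{'a':6,'b':32}]
df1=pd.DataFrame(data,columns=['a','b'])
df2=pd.DataFrame(data,columns=['a','b1'])
print(df1)
print(df2)
Ans:
    a   b
0  10  20
1   6  32
    a  b1
0  10 NaN
1   6 NaN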
Consider the following dataframes df1 and df2:
df1                   df2
   mark1  mark2          mark1  mark2
0     10    150       0     30     20
1     40    451       1     20     25
2     15    302       2     20     30
3     40    703       3     50     30
import pandas as pd
x1=[[10,150],[40,451],[15,302],[40,703]]
df1=pd.DataFrame(x1,columns=['mark1','mark2'])
x2=[[30,20],[20,25],[20,30],[50,30]]
df2=pd.DataFrame(x2,columns=['mark1','mark2'])
print(df1)
print(df2)
To add dataframes df1 and df2
print(df1.add(df2))
To subtract df2 from df1
print(df1.sub(df2))
To change index label of df1 from 0 to zero and from 1 to one
df1=df1.rename(index={0:'zero',1:'one'})
What will be the output of the following python code?
import pandas as pd
d={'Student':['Ali','Ali','Tom','Tom'],
'House':['Red','Red','Blue','Blue'],
'Points':[50,70,60,80]}
df=pd.DataFrame(d)
df1=df.pivot_table(index='Student',columns='House',values='Points',aggfunc='sum')
print(df1)
Ans:
House     Blue    Red
Student
Ali        NaN  120.0
Tom      140.0    NaN
For the given code fill in the blanks so that we get the desired output with
maximum value for Quantity and Average Value for Cost:
import pandas as pd
import numpy as np
d={...:['Apple','Pear','Banana','Grapes'],'Quantity':[100,150,200,250],
'Cost':[1000,1500,1200,900]}
df = pd.DataFrame(d)
Desired output:
Quantity     250.0
Cost        1150.0
dtype: float64
Ans:
df1=pd.Series([df['Quantity'].max(),df['Cost'].mean()],index=['Quantity','Cost'])
Find the output for the following program code:
import pandas as pd
df1=pd.DataFrame({'Icecream':['Vanila','ButterScotch','Caramel'],
'Cookies':['Goodday','Britannia','Oreo']})
df2=pd.DataFrame({'Chocolate':['DairyMilk','Kitkat'],
'Icecream':['Vanila','Butterscotch'],
'Cookies':['Hide and Seek','Britannia']})
df2.reindex_like(df1)
print(df2)
Ans:
   Chocolate      Icecream        Cookies
0  DairyMilk        Vanila  Hide and Seek
1     Kitkat  Butterscotch      Britannia
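The output above shows df2 unchanged because reindex_like() returns a new DataFrame rather than modifying df2. The sketch below assigns the result to a new name, df3 (a name introduced here for illustration), to show what the reindexed frame looks like.

```python
import pandas as pd

df1 = pd.DataFrame({"Icecream": ["Vanila", "ButterScotch", "Caramel"],
                    "Cookies": ["Goodday", "Britannia", "Oreo"]})
df2 = pd.DataFrame({"Chocolate": ["DairyMilk", "Kitkat"],
                    "Icecream": ["Vanila", "Butterscotch"],
                    "Cookies": ["Hide and Seek", "Britannia"]})

# The result takes df1's row labels and columns; df2 is not modified
df3 = df2.reindex_like(df1)
print(df3)   # columns Icecream, Cookies; row 2 is all NaN
print(df2)   # still 2 rows and 3 columns
```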
A dictionary Smarks contains the following data:
Smarks={...:['rashmi','harsh','priya'],'grade':['A1','A2',...]}
Write a statement to create DataFrame called df.
Assume that pandas has been imported as pd.
Ans:
df=pd.DataFrame(Smarks,index=[1,2,3])
In pandas, S is a series with the following result:
S=pd.Series([5,10,15,20,25])
The series object is automatically indexed as 0,1,2,3,4. Write a statement to
assign the series as a, b, c, d, e index explicitly.
Ans:
S=pd.Series([5,10,15,20,25],index=['a','b','c','d','e'])
66. Write python statement to delete the 3rd and 5th rows from dataframe df.
Ans:
df1=df.drop(index=[2,4],axis=0)
or
df1=df.drop([2,4])
Given the two dataframes df1 and df2 as given below:
[Tables df1 and df2, each with columns First, Second and Third; the values are illegible in the source.]
Write the commands to do the following on the dataframe:
To add dataframes df1 and df2.
Ans:
print(df1.add(df2))
... descending order.
To display those rows ...
Ans:
print(df1[df1['Third']>...])
Consider the following dataframe: student_df
Name    Class  Marks
Anamay  XI     95
Aditi   XI     82
Mehak   XI     65
Kriti   XI     45
Write a statement to get the minimum value of the column Marks.
Ans:
print(student_df['Marks'].min())
Write a small python code to add a row to a dataframe.
Ans:
import pandas as pd
student_df=pd.DataFrame({'Name':['Ananmay','Aditi','Mehak','Kriti'],'Class':['XI','XI',
'XI','XI'],'Marks':[95,82,65,45]},index=[1,2,3,4])
data={'Name':'Sohail',...}
newstd=pd.DataFrame(data,index=...)
student_df=student_df...
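The end of the answer above is cut off in the worksheet. One way to complete the idea is sketched below with pd.concat(), which also works in pandas 2.0 and later, where DataFrame.append() has been removed. The Class and Marks values for the new student and the index label 5 are assumptions made for illustration.

```python
import pandas as pd

student_df = pd.DataFrame({"Name": ["Ananmay", "Aditi", "Mehak", "Kriti"],
                           "Class": ["XI", "XI", "XI", "XI"],
                           "Marks": [95, 82, 65, 45]}, index=[1, 2, 3, 4])

# Class, Marks and the index label of the new row are made-up values
newstd = pd.DataFrame({"Name": "Sohail", "Class": "XI", "Marks": 77}, index=[5])

# concat() stacks the two frames row-wise and returns a new DataFrame
student_df = pd.concat([student_df, newstd])
print(student_df)
```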
Jitesh wants to sort a DataFrame df. He has written the following code.
df=pd.DataFrame({"a":[13, 24, 43, 4],"b":[51, 26, 37, 48]})
print(df)
df.sort_values('a')
print(df)
He is getting an output which is showing original DataFrame and not the sorted
DataFrame. Identify the error and suggest the correction so that the sorted
DataFrame is printed.
Ans:
sort_values() returns a new sorted DataFrame, so the original dataframe is not
modified. The correct statement is:
df.sort_values('a',inplace=True)
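The difference between the two calls can be sketched as follows, using the DataFrame from the question. The names sorted_copy, unchanged and result are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [13, 24, 43, 4], "b": [51, 26, 37, 48]})

# Without inplace, a sorted copy is returned and df keeps its order
sorted_copy = df.sort_values("a")
unchanged = df["a"].tolist()

# With inplace=True, df itself is reordered and the call returns None
result = df.sort_values("a", inplace=True)
print(df)
```

Assigning the result back, as in df = df.sort_values('a'), is an equivalent correction.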
Write a command to display the name of the company and the highest car price
from DataFrame having data about cars.
Ans:
import pandas as pd
car={'Name':['Innova','Tavera','Royal','Scorpio'],'Price':[300000,800000,250000,650000]}
df=pd.DataFrame(car,index=[1,2,3,4])
print(df[df.Price==df.Price.max()])
Write a command in python to print the total number of records in the
DataFrame.
Ans:
print(df.count())
Consider a DataFrame df created using the dictionary given below:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
'Michael', 'Matthew', 'Lara', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 4],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
Write a command to display the rows having NaN values.
Write a command to create a pivot table based on "qualify" column and display
sum of the score and attempts columns.
Ans:
print(df.pivot_table(columns='qualify',values=['score','attempts'],aggfunc='sum'))
... students who have qualified.
Using the dictionary given below, answer the questions given below:
[Dictionary with the columns col1 to col4 is missing in the source.]
Write a command to change the indices to 'zero', 'one', 'two', 'three' and 'four'
respectively.
Ans:
df=df.rename(index={0:'Zero',1:'One',2:'Two',3:'Three'})
Write command to compute mean of every column of the data frame.
Ans:
print(df.mean(axis=0))
Write command to add one more row to the data frame with data [5,12,33,3]
Ans:
df2={'col1':5,'col2':12,'col3':33,'col4':3}
df.append(df2, ignore_index=True)
[Table: employee data frame with columns including Name, Dept, Status and Salary; departments include IT, Finance and RD, and one legible row reads 140, Ruchi, RD, 17000. Most cells are illegible in the source.]
Consider the above Data frame as df.
Write a Python Code to calculate the average salary of the Regular employees
and the Contract employees separately.
Ans:
print(df.groupby('Status')...
Write a Python Code to print the dataframe in the descending order of Salary.
Ans:
df=df.sort_values(by='Salary',ascending=False)
print(df)
Write a Python Code to update the Salary of all Contract employees to Rs.
19000.
Ans:
df.loc[df.Status=='Contract','Salary']=19000
Write a Python Code to count the total number of employees in each
department.
Ans:
print(df.groupby('Dept').count().Name)
Write a Python Code to display the maximum salary of the "Contract" staff.
Ans:
print(df[df['Status']=='Contract'].max().Salary)
Write a Python Code to display the 4th Record.
Ans:
print(df.iloc[3:4,:])
88. Write a Python Code to delete the column Status.
Ans:
del df['Status']
89. Write a Python Code to display the maximum salary of the
'IT' department.
Ans:
print(df[df.Dept=='IT'].max().Salary)
Write a Python Code to delete the 1st and the last record.
Ans:
df=df.drop([0,4])
Consider a dataframe as follows:
[Table partly illegible in the source; the recoverable rows are 1: 56, 91, 13 and 2: -29, -63, 34.]
Write a Python Code to:
Replace all negative numbers with 0
Ans:
df[df<0]=0
Count the number of elements which are greater than 50
Ans:
print(df[df>50].count().sum())
Write Python code to count the number of even numbers and number of odd
numbers.
Ans:
print('No of Even Numbers:',df[df%2==0].count().sum())
print('No of Odd Numbers:',df[df%2==1].count().sum())
Consider the following data frame df.
employee  Sales   Quarter  State
Sahay     125600  1        Delhi
George    235600  1        Tamil Nadu
Priya     213400  1        Kerala
Manila    189000  1        Haryana
Raina     456000  1        West Bengal
Manila    172000  2        Haryana
Priya     201400  2        Kerala
Write Python Program to create the above dataframe.
Ans:
import pandas as pd
data={'employee':['Sahay','George','Priya','Manila','Raina','Manila','Priya'],
'Sales':[125600,235600,213400,189000,456000,172000,201400],
'Quarter':[1,1,1,1,1,2,2],'State':['Delhi','Tamil Nadu','Kerala','Haryana','West
Bengal','Haryana','Kerala']}
df=pd.DataFrame(data)
print(df)
Write Python Program to find total sales per state.
Ans:
print(df.groupby('State').sum().Sales)
Write Python Program to find total sales per employee.
Ans:
print(df.groupby('employee').sum().Sales)
Write Python Program to find average sales on both employee and state wise.
Ans:
print(df.groupby(['employee','State']).Sales.mean())
Write Python Program to find mean, median and minimum sale statewise.
Ans:
print(df.groupby('State').Sales.min())
print(df.groupby('State').Sales.mean())
print(df.groupby('State').Sales.median())
Write Python Program to find maximum sales quarter-wise.
Ans:
print(df.groupby('Quarter').max().Sales)
Write Python Program to create a Pivot Table with State as the index, Sales as
the values and calculating the maximum Sales in each State.
Ans:
print(df.pivot_table(index='State',values='Sales',aggfunc='max'))
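The state-wise mean, median and minimum asked for above can also be computed in one call. This is a sketch of an alternative, not the worksheet's own answer: it uses agg() with a list of function names on the same sales data.

```python
import pandas as pd

data = {"employee": ["Sahay", "George", "Priya", "Manila", "Raina", "Manila", "Priya"],
        "Sales": [125600, 235600, 213400, 189000, 456000, 172000, 201400],
        "Quarter": [1, 1, 1, 1, 1, 2, 2],
        "State": ["Delhi", "Tamil Nadu", "Kerala", "Haryana", "West Bengal",
                  "Haryana", "Kerala"]}
df = pd.DataFrame(data)

# One row per State, one column per statistic
stats = df.groupby("State")["Sales"].agg(["mean", "median", "min"])
print(stats)
```

Selecting the Sales column before aggregating keeps the text columns (employee, State) out of the calculation.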