This question paper contains 16 printed pages.
Your Roll No. ..............
Sr. No. of Question Paper : 4432 G
Unique Paper Code : 32347507
Name of the Paper : Data Analysis and Visualization
Name of the Course : B.Sc. (Hons.) Computer Science
Semester : V
Duration : 3 Hours Maximum Marks : 75
Instructions for Candidates
1. Write your Roll No. on the top immediately on receipt
of this question paper.
2. Question No. 1 is compulsory.
3. Attempt any four questions out of Q.2 to Q.7.
4. Parts of a question must be answered together.
P.T.O.
1. (a) Provide code to create a time series with two
index labels, 2011/9/01 and 2011/9/02. Assign
random values. (2)
(b) What will be the output of the following
codes?
(i) (2)
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print(arr[1,-1], arr[-1:])
(ii) (2)
List = [str[::-1] for str in ('happy', 'go', 'lucky')]
print(List)
(c) Reshape the following array to dimension (2,6)
[[3,4,5,6], [7,8,9,10], [11,12,13,14]] (2)
(d) Python is a strongly "typed" language. Comment.
(2)
(e) Give the output of the following code and
identify the role of is_unique attribute in the
code. (2)
import pandas as pd
series = pd.Series([{,, s,1,2, f,f,/i, S,6])
print(series)
print("Is Unique: ", series.is_unique)
(f) Differentiate between mutable and immutable
objects. (2)
(g) Write a program to create the given dataframe.
(3)
   id  value
0   1  a
1   1  a
2   2  b
3   3  None
4   3  a
5   4  a
6   4  None
7   4  b
Further, split it into groups and count unique values
of 'value' column.
(h) Provide the output of the following code : (3)
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
print(delta)
print(type(delta))
print(dt.replace(minute=0, second=0))
(i) Consider the given dataframe df containing data
of students admitted in the college. (3)
Id  Name  Age  Section  City     Gender  Marks
S0  Anit  10   A        Gurgaon  M       60
S1  Alka  22   B        Delhi    F       80
S2  Sid   13   C        Mumbai   M       60
S3  Ruhi  21   B        Delhi    F       55
S4  Nehu  12   B        Mumbai   F       60
S5  Geet  11   A        Delhi    F       56
S6  Om    17   A        Mumbai   M       45
Set the first column 'Id' as the row index of the
given dataframe df. Create a pivot table of df to
display the total number of admissions as per
'Section' and 'Gender'.
(j) (i) Provide the output of the following code :
(4)
import numpy as np
import pandas as pd
df = pd.DataFrame({
'a': np.arange(1,7),
'b': np.arange(7,13),
'c': np.arange(12,18),
'd': np.arange(17,23),
'e': np.arange(23,29),
'f': np.arange(29,35)},
columns=['a', 'b', 'c', 'd', 'e', 'f'],
index=['Svaksh', 'Sarah', 'Svaraj', 'Rivika',
'Rahul', 'Geet'])
print(df)
df.iloc[2:4, [1,2]] = np.nan
print(df)
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd':
'blue', 'e': 'red', 'f': 'orange'}
(ii) Using the above dataframe, group df by
mapping and find the sum.
(k) Consider the following dataset to perform the
following operations : (4)
   Age  Section  City     Gender  Favourite_color
0  10   A        Gurgaon  M       red
1  22   B        Delhi    F       NaN
2  13   C        Mumbai   F       yellow
3  21   B        Delhi    M       NaN
4  12   B        Mumbai   M       black
5  11   A        Delhi    M       green
6  17   A        Mumbai   F       red
(i) Find all the rows where Age is greater
than or equal to 12 and the Gender is
male.
(ii) If Age is greater than 20, then use the loc
function to update Section with "S" and
City with "Pune".
(iii) Select rows 1 to 2 with columns 2 to 3
using iloc.
(l) What is the output of the following code: (4)
import numpy as np
import pandas as pd
fruits = ['apple', 'orange', 'apple', 'apple'] * 2
N = len(fruits)
print(N)
df = pd.DataFrame({'fruit': fruits, 'basket_ID':
np.arange(N), 'count':
np.random.randint(3, 15, size=N), 'weight':
np.random.uniform(0, 4, size=N)})
print(df)
Fruit_cat = df['fruit'].astype('category')
print(Fruit_cat)
print(df.dtypes)
2. (a) Differentiate between :
(i) qcut and cut methods
(ii) pandas.merge and pandas.concat (2)
(b) Consider the following numeric grades (out of 4).
Formulate bins for the given grades as per the
following condition : (3)
Below 2.5            Very bad
Between 2.5 to 3     Bad
Between 3 to 3.25    Average
Between 3.25 to 3.5  Good
Between 3.5 to 3.75  Very good
Between 3.75 to 4    Excellent
(c) Given the following dataframe, provide the output
for the following commands : (5)
    ord_no   purch_amt  ord_date    customer_id
0   NaN      NaN        NaN         NaN
1   NaN      270.65     2012-09-10  3001.0
2   70002.0  65.26      NaN         3001.0
3   NaN      NaN        NaN         NaN
4   NaN      948.50     2012-09-10  3002.0
5   70005.0  2400.60    2012-07-27  3001.0
6   NaN      5760.00    2012-09-10  3001.0
7   70010.0  1983.43    2012-10-10  3004.0
8   70003.0  2480.40    2012-10-10  3003.0
9   70012.0  250.45     2012-06-27  3002.0
10  NaN      75.29      2012-08-17  3001.0
11  NaN      NaN        NaN         NaN
(i) df.dropna(thresh=2)
(ii) df.dropna(how='all')
(iii) df.dropna(how='all', axis=1)
(iv) df.isnull()
(v) df.isnull().values.any()
3. (a) Write the code to read each row of a given csv
file. Skip the header of the file while reading.
Also print the number of rows and the field names.
(6)
(b) (i) Differentiate between ffill and bfill. (4)
(ii) Provide the output of the given code :
import pandas as pd
obj3 = pd.Series(['blue', 'purple', 'yellow'],
index=[0, 2, 4])
print(obj3.reindex(range(6), method='ffill'))
print(obj3.reindex(range(6), method='bfill'))
4. (a) Consider the following dataframe.
ord_no   purch_amt  ord_date    customer_id  salesman_id
70009.0  890.00     2012-09-11  3004.0       5001
70002.0  270.65     2012-09-10  3001.0       5006
70007.0  65.26      2012-09-11  3001.0       5005
70008.0  78.00      2012-09-10  3002.0       5003
70006.0  948.50     2012-09-17  3002.0       5002
70005.0  2400.60    2012-07-27  3001.0       5001
70004.0  5760.00    2012-09-10  3001.0       5003
70010.0  1983.43    2012-10-10  3004.0       5006
70003.0  2480.40    2012-10-10  3003.0       5005
70012.0  250.45     2012-06-27  3002.0       5002
70034.0  75.29      2012-08-17  3001.0       5004
70022.0  56.90      2012-06-27  3003.0       5005
With respect to the above dataframe, write the
code for the following :
(i) Group the data on the column ord_date
and calculate the total purchase amount
purch_amt year wise and month wise.
(2)
(ii) Group the data on the column customer_id
and create a list of order date ord_date
for each group. (2)
(iii) Group on the columns customer_id,
salesman_id and then sort sum of
purch_amt within the groups. (2)
(b) (i) Write a generator function to print Fibonacci
numbers. (4)
(ii) What is the output of the following code :
def simpleGeneratorFunc():
    yield 1
    yield 2
x = simpleGeneratorFunc()
print(next(x))
print(next(x))
5. (a) Give output of the following code. Justify your
answer. (2)
var = (1, 2, (3,4))
var[1] = 'geet'
print(var)
(b) Write the code to merge the two given datasets
using key1, key2. (4)
data1:
   key1  key2  P   Q
0  K0    K0    P0  Q0
1  K0    K1    P1  Q1
2  K1    K0    P2  Q2
3  K2    K1    P3  Q3
data2:
   key1  key2  R   S
0  K0    K0    R0  S0
1  K1    K0    R1  S1
2  K1    K0    R2  S2
3  K2    K0    R3  S3
(c) Write the code to split the given dataset into
groups based on customer_id and create a
list of order date ord_date for each group.
(4)
    ord_no   purch_amt  ord_date    customer_id
0   70009.0  890.00     2012-09-11  3004.0
1   70002.0  270.65     2012-09-10  3001.0
2   70007.0  65.26      2012-09-11  3001.0
3   70008.0  78.00      2012-09-10  3002.0
4   70006.0  948.50     2012-09-17  3002.0
5   70005.0  2400.60    2012-07-27  3001.0
6   70004.0  5760.00    2012-09-10  3001.0
7   70010.0  1983.43    2012-10-10  3004.0
8   70003.0  2480.40    2012-10-10  3003.0
9   70012.0  250.45     2012-06-27  3002.0
10  70034.0  75.29      2012-08-17  3001.0
11  70022.0  56.90      2012-06-27  3003.0
6. (a) Create a Timeseries Dataframe with date range
01-02-2022 to 30-02-2022 with 1 min frequency
interval. The dataframe has two columns populated
with random values. (3)
(b) Identify the need to resample Timeseries data.
(2)
(c) Consider the following dataset.
Datetime             value1  value2  value3
2020-01-01 00:00:00  2       92      56
2020-01-01 00:01:00  9       78      80
2020-01-01 00:02:00  69      83      43
2020-01-01 00:03:00  47      62      45
2020-01-01 00:04:00  47      90      13
...                  ...     ...     ...
2020-02-27 23:56:00  73      81      35
2020-02-27 23:57:00  20      66      58
2020-02-27 23:58:00  42      16      48
2020-02-27 23:59:00  32      40      19
2020-02-28 00:00:00  37      63      95
83521 rows x 3 columns
(i) Resample for 10 min with sum function for
value1, mean for value2 and max for
value3. (3)
(ii) Downsample data to 30s.
(2)
7. (a) Create a DataFrame of 8 rows and 8 columns
containing random integers in the range of 1 to
10. Compute the correlation of each row with the
preceding row.
(2)
(b) Consider the following table that lists Delhi's
AQI for the last week.
AQI  Date
67   2/10/2022
79   3/10/2022
80   4/10/2022
90   5/10/2022
99   6/10/2022
110  7/10/2022
112  8/10/2022
140  9/10/2022
165  10/10/2022
178  11/10/2022
(i) Plot a line graph showing AQI (Air Quality
Index) against date with line colour as red,
line width as "4 pixels" and dashed line
style. (4)
(ii) Add title "Delhi AQI for last
ten days".
(1)
(iii) Set label for x-axis "Date" and y-axis
"AQI".
(1)
(iv) Show grids in the background. (1)
(v) Set marker as '*'.
(1)