
Cs Sem V Dav Upc 32347507 Sl. No. Qp. 4432 Dec '23

This document is a question paper for a Data Analysis and Visualization course, containing 16 printed pages with a total of 75 marks. It includes various programming tasks and questions related to Python, specifically focusing on data manipulation using libraries such as pandas and numpy. Candidates are instructed to answer specific questions, with some being compulsory, and to provide code outputs for given scenarios.

Uploaded by

Avi Chadha

[This question paper contains 16 printed pages.]

Your Roll No. ...............

Sr. No. of Question Paper : 4432 G

Unique Paper Code : 32347507

Name of the Paper : Data Analysis and Visualization

Name of the Course : B.Sc. (Hons.) Computer Science

Semester : V

Duration : 3 Hours          Maximum Marks : 75

Instructions for Candidates

1. Write your Roll No. on the top immediately on receipt of this question paper.

2. Question No. 1 is compulsory.

3. Attempt any four questions out of Q.2 to Q.7.

4. Parts of a question must be answered together.

P.T.O.
4432 2
1. (a) Provide code to create a time-series with two index labels - 2011/9/01 and 2011/9/02. Assign random values. (2)
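
One possible answer to part (a) is sketched below. It assumes the two labels mean 1 and 2 September 2011 and draws the random values from NumPy's standard normal generator; the names `dates` and `ts` are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Two index labels: 2011/9/01 and 2011/9/02 (assumed to be year/month/day).
dates = [datetime(2011, 9, 1), datetime(2011, 9, 2)]

# A Series indexed by datetime objects gets a DatetimeIndex,
# which is what makes it a time series in pandas.
ts = pd.Series(np.random.randn(2), index=dates)
print(ts)
```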

(b) What will be the output of the following codes?

(i) (2)

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print(arr[1,-1], arr[-1:])

(ii) (2)

List = [str[::-1] for str in ('happy', 'go', 'lucky')]
print(List)

(c) Reshape the following array to dimension (2,6)

[[3,4,5,6], [7,8,9,10], [11,12,13,14]] (2)

(d) Python is a strongly "typed" language. Comment. (2)
(e) Give the output of the following code and identify the role of the is_unique attribute in the code. (2)

import pandas as pd
series = pd.Series([{,, s,1 ,2, f,f ,/i, S,6])
print(series)
print("Is Unique: ", series.is_unique)

(f) Differentiate between mutable and immutable objects. (2)
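
The distinction in part (f) can be illustrated with a short sketch. It uses a list as the mutable example and a tuple as the immutable one; the variable names are illustrative.

```python
# Lists are mutable: an in-place change is visible through every name
# bound to the same object.
nums = [1, 2, 3]
alias = nums
alias.append(4)
same_object = alias is nums  # no new list was created by append

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 9
    error = None
except TypeError as exc:
    error = str(exc)

print(nums, same_object, error)
```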

(g) Write a program to create the given dataframe. (3)

   id  value
0   1  a
1   1  a
2   2  b
3   3  None
4   3  a
5   4  a
6   4  None
7   4  b

Further, split it into groups and count unique values of the 'value' column.
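
A minimal sketch for part (g) follows. It assumes the groups are formed on the `id` column; note that `nunique` skips missing values (`None`) by default, so they are not counted as a distinct value.

```python
import pandas as pd

# Rebuild the dataframe shown in the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 4, 4, 4],
    "value": ["a", "a", "b", None, "a", "a", None, "b"],
})

# Split into groups by id, then count the distinct non-missing values.
unique_counts = df.groupby("id")["value"].nunique()
print(unique_counts)
```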

(h) Provide the output of the following code : (3)

from datetime import datetime, date, time

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
print(delta)
print(type(delta))
print(dt.replace(minute=0, second=0))

(i) Consider the given dataframe df containing data of students admitted in the college. (3)

Id  Name  Age  Section  City     Gender  Marks
S0  Anit  10   A        Gurgaon  M       60
S1  Alka  22   B        Delhi    F       80
S2  Sid   13   C        Mumbai   M       60
S3  Ruhi  21   B        Delhi    F       55
S4  Nehu  12   B        Mumbai   F       60
S5  Geet  11   A        Delhi    F       56
S6  Om    17   A        Mumbai   M       45

Set the first column 'Id' as the row index of the given dataframe df. Create a pivot table of df to display the total number of admissions as per 'Section' and 'Gender'.
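
One way to approach part (i) is sketched here. It assumes "total number of admissions" means a count of students per Section and Gender, and counts the `Name` column for that purpose; `pd.crosstab` would be an equally reasonable choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["S0", "S1", "S2", "S3", "S4", "S5", "S6"],
    "Name": ["Anit", "Alka", "Sid", "Ruhi", "Nehu", "Geet", "Om"],
    "Age": [10, 22, 13, 21, 12, 11, 17],
    "Section": ["A", "B", "C", "B", "B", "A", "A"],
    "City": ["Gurgaon", "Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Gender": ["M", "F", "M", "F", "F", "F", "M"],
    "Marks": [60, 80, 60, 55, 60, 56, 45],
})

# Use the first column as the row index.
df = df.set_index("Id")

# Count students for each Section/Gender pair; empty pairs become 0.
admissions = pd.pivot_table(
    df, index="Section", columns="Gender",
    values="Name", aggfunc="count", fill_value=0,
)
print(admissions)
```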

(j) (i) Provide the output of the following code : (4)

df = pd.DataFrame({
    'a': np.arange(1,7),
    'b': np.arange(7,13),
    'c': np.arange(12,18),
    'd': np.arange(17,23),
    'e': np.arange(23,29),
    'f': np.arange(29,35)},
    columns=['a', 'b', 'c', 'd', 'e', 'f'],
    index=['Svaksh', 'Sarah', 'Svaraj', 'Rivika', 'Rahul', 'Geet'])
print(df)
df.iloc[2:4, [1,2]] = np.NaN
print(df)

mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f': 'orange'}

(ii) Using the above dataframe, group df by mapping and find the sum.


(k) Consider the following dataset to perform the following operations : (4)

   Age  Section  City     Gender  Favourite_color
0  10   A        Gurgaon  M       red
1  22   B        Delhi    F       NaN
2  13   C        Mumbai   F       yellow
3  21   B        Delhi    M       NaN
4  12   B        Mumbai   M       black
5  11   A        Delhi    M       green
6  17   A        Mumbai   F       red

(i) Find all the rows where Age is greater than or equal to 12 and the Gender is male.

(ii) If Age is greater than 20, then use the loc function to update Section with "S" and City with Pune.

(iii) Select rows 1 to 2 with columns 2 to 3 using iloc.
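
The three operations in part (k) could be written as in the sketch below. It assumes "rows 1 to 2" and "columns 2 to 3" are zero-based positions with both ends included, which translates to the half-open slices `1:3` and `2:4` in `iloc`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [10, 22, 13, 21, 12, 11, 17],
    "Section": ["A", "B", "C", "B", "B", "A", "A"],
    "City": ["Gurgaon", "Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Gender": ["M", "F", "F", "M", "M", "M", "F"],
    "Favourite_color": ["red", np.nan, "yellow", np.nan, "black", "green", "red"],
})

# (i) Boolean masks are combined with & and each condition is parenthesised.
older_males = df[(df["Age"] >= 12) & (df["Gender"] == "M")]

# (ii) loc takes a row mask and a column label; matching rows are updated in place.
over_20 = df["Age"] > 20
df.loc[over_20, "Section"] = "S"
df.loc[over_20, "City"] = "Pune"

# (iii) iloc slices are positional and exclude the stop value.
subset = df.iloc[1:3, 2:4]

print(older_males, df, subset, sep="\n\n")
```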

(l) What is the output of the following code: (4)

import pandas as pd
fruits = ['apple', 'orange', 'apple', 'apple']*2
N = len(fruits)
print(N)
df = pd.DataFrame({'fruit': fruits, 'basket_ID': np.arange(N),
                   'count': np.random.randint(3,15,size=N),
                   'weight': np.random.uniform(0,4,size=N)})
print(df)
Fruit_cat = df['fruit'].astype('category')
print(Fruit_cat)
print(df.dtypes)

2. (a) Differentiate between :

(i) qcut and cut methods

(ii) pandas.merge and pandas.concat (2)


(b) Consider the following numeric grades (out of 4). Formulate bins for the given grades as per the following condition : (3)

Below 2.5               Very bad
Between 2.5 to 3        Bad
Between 3 to 3.25       Average
Between 3.25 to 3.5     Good
Between 3.5 to 3.75     Very good
Between 3.75 to 4       Excellent
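
The binning in part (b) maps naturally onto `pd.cut`, as in this sketch. The grade list is hypothetical, since the question's own values are not reproduced here, and placing a boundary grade (for example exactly 2.5) in the lower bin is an assumption; passing `right=False` would put it in the upper bin instead.

```python
import pandas as pd

# Hypothetical grades out of 4, used only to exercise the bins.
grades = [2.0, 2.75, 3.1, 3.3, 3.6, 3.9]

bins = [0, 2.5, 3, 3.25, 3.5, 3.75, 4]
labels = ["Very bad", "Bad", "Average", "Good", "Very good", "Excellent"]

# With the default right=True each bin is (low, high];
# include_lowest=True also admits a grade of exactly 0.
categories = pd.cut(grades, bins=bins, labels=labels, include_lowest=True)
print(categories)
```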

(c) Given the following dataframe, provide the output for the following commands : (5)

    ord_no   purch_amt  ord_date    customer_id
0   NaN      NaN        NaN         NaN
1   NaN      270.65     2012-09-10  3001.0
2   70002.0  65.26      NaN         3001.0
3   NaN      NaN        NaN         NaN
4   NaN      948.50     2012-09-10  3002.0
5   70005.0  2400.60    2012-07-27  3001.0
6   NaN      5760.00    2012-09-10  3001.0
7   70010.0  1983.43    2012-10-10  3004.0
8   70003.0  2480.40    2012-10-10  3003.0
9   70012.0  250.45     2012-06-27  3002.0
10  NaN      75.29      2012-08-17  3001.0
11  NaN      NaN        NaN         NaN

(i) df.dropna(thresh=2)

(ii) df.dropna(how='all')

(iii) df.dropna(how='all', axis=1)

(iv) df.isnull()

(v) df.isnull().values.any()

3. (a) Write the code to read each row of a given CSV file. Skip the header of the file while reading. Also print the number of rows and the field names. (6)
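
A possible approach to part (a) with the standard `csv` module is sketched below. To stay self-contained it reads from an in-memory stand-in with made-up contents; for a real file, `open(path, newline="")` would take the place of `io.StringIO`.

```python
import csv
import io

# Stand-in for a CSV file on disk (hypothetical contents).
sample = io.StringIO("name,age\nAnit,10\nAlka,22\n")

reader = csv.reader(sample)

# The first row is the header: consume it so the loop below skips it.
field_names = next(reader)

row_count = 0
for row in reader:
    print(row)
    row_count += 1

print("Field names:", field_names)
print("Number of rows:", row_count)
```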

(b) (i) Differentiate between ffill and bfill. (4)

(ii) Provide the output of the given code :

import pandas as pd
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
print(obj3.reindex(range(6), method='ffill'))
print(obj3.reindex(range(6), method='bfill'))

4. (a) Consider following dataframe.

ord_no   purch_amt  ord_date    customer_id  salesman_id
70009.0  890.00     2012-09-11  3004.0       5001
70002.0  270.65     2012-09-10  3001.0       5006
70007.0  65.26      2012-09-11  3001.0       5005
70008.0  78.00      2012-09-10  3002.0       5003
70006.0  948.50     2012-09-17  3002.0       5002
70005.0  2400.60    2012-07-27  3001.0       5001
70004.0  5760.00    2012-09-10  3001.0       5003
70010.0  1983.43    2012-10-10  3004.0       5006
70003.0  2480.40    2012-10-10  3003.0       5005
70012.0  250.45     2012-06-27  3002.0       5002
70034.0  75.29      2012-08-17  3001.0       5004
70022.0  56.90      2012-06-27  3003.0       5005

With respect to the above dataframe, write the code for the following :

(i) Group the data on the column ord_date and calculate the total purchase amount purch_amt year wise and month wise. (2)

(ii) Group the data on the column customer_id and create a list of order date ord_date for each group. (2)

(iii) Group on the columns customer_id, salesman_id and then sort sum of purch_amt within the groups. (2)

(b) (i) Write a generator function to print Fibonacci numbers. (4)

(ii) What is the output of the following code :

def simpleGeneratorFunc():
    yield 1
    yield 2

x = simpleGeneratorFunc()
print(next(x))
print(next(x))
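
For part (b)(i), a generator can be sketched as follows. Starting the sequence at 0 and printing only the first ten terms are assumptions; the question does not fix either.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers indefinitely: 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# The generator is infinite, so islice limits how many values are taken.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```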

5. (a) Give output of the following code. Justify your answer. (2)

var = (1, 2, (3,4))
var[1] = 'geet'
print(var)

(b) Write the code to merge the two given datasets using key1, key2. (4)

data1:
   key1  key2  P   Q
0  K0    K0    P0  Q0
1  K0    K1    P1  Q1
2  K1    K0    P2  Q2
3  K2    K1    P3  Q3

data2:
   key1  key2  R   S
0  K0    K0    R0  S0
1  K1    K0    R1  S1
2  K1    K0    R2  S2
3  K2    K0    R3  S3
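
A sketch of the merge in part (b) is given below. It assumes an inner join is wanted (the default for `pd.merge`) and that the value columns of the second dataset are named `R` and `S`, matching their contents.

```python
import pandas as pd

data1 = pd.DataFrame({
    "key1": ["K0", "K0", "K1", "K2"],
    "key2": ["K0", "K1", "K0", "K1"],
    "P": ["P0", "P1", "P2", "P3"],
    "Q": ["Q0", "Q1", "Q2", "Q3"],
})
data2 = pd.DataFrame({
    "key1": ["K0", "K1", "K1", "K2"],
    "key2": ["K0", "K0", "K0", "K0"],
    "R": ["R0", "R1", "R2", "R3"],
    "S": ["S0", "S1", "S2", "S3"],
})

# Rows are kept only where the (key1, key2) pair occurs in both frames;
# a pair that appears twice on one side yields two output rows.
merged = pd.merge(data1, data2, on=["key1", "key2"])
print(merged)
```

Passing `how="outer"`, `how="left"` or `how="right"` would keep the unmatched key pairs as well, with NaN in the missing columns.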
(c) Write the code to split the given dataset into groups based on customer_id and create a list of order date ord_date for each group. (4)

    ord_no   purch_amt  ord_date    customer_id
0   70009.0  890.00     2012-09-11  3004.0
1   70002.0  270.65     2012-09-10  3001.0
2   70007.0  65.26      2012-09-11  3001.0
3   70008.0  78.00      2012-09-10  3002.0
4   70006.0  948.50     2012-09-17  3002.0
5   70005.0  2400.60    2012-07-27  3001.0
6   70004.0  5760.00    2012-09-10  3001.0
7   70010.0  1983.43    2012-10-10  3004.0
8   70003.0  2480.40    2012-10-10  3003.0
9   70012.0  250.45     2012-06-27  3002.0
10  70034.0  75.29      2012-08-17  3001.0
11  70022.0  56.90      2012-06-27  3003.0
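
Part (c) can be approached with `groupby` followed by `apply(list)`, as sketched here. Only the two columns the task needs are rebuilt, and the dates are kept as strings for brevity; both choices are simplifications.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [3004.0, 3001.0, 3001.0, 3002.0, 3002.0, 3001.0,
                    3001.0, 3004.0, 3003.0, 3002.0, 3001.0, 3003.0],
    "ord_date": ["2012-09-11", "2012-09-10", "2012-09-11", "2012-09-10",
                 "2012-09-17", "2012-07-27", "2012-09-10", "2012-10-10",
                 "2012-10-10", "2012-06-27", "2012-08-17", "2012-06-27"],
})

# One group per customer; each group's order dates are collected into a list.
dates_by_customer = df.groupby("customer_id")["ord_date"].apply(list)
print(dates_by_customer)
```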

6. (a) Create a Timeseries Dataframe with date range 01-02-2022 to 30-02-2022 with 1 min frequency interval. The dataframe has two columns populated with random values. (3)
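
A minimal sketch for part (a) follows. February 2022 has only 28 days, so the sketch assumes the range is meant to end on 28-02-2022; the column names `col1` and `col2` are placeholders.

```python
import numpy as np
import pandas as pd

# Assumed end date: 28-02-2022, the last valid day of the month.
index = pd.date_range(start="2022-02-01", end="2022-02-28", freq="min")

rng = np.random.default_rng()
df = pd.DataFrame(
    {"col1": rng.random(len(index)), "col2": rng.random(len(index))},
    index=index,
)
print(df.head())
```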

(b) Identify the need to resample Timeseries data. (2)

(c) Consider following dataset.

Datetime             value1  value2  value3
2020-01-01 00:00:00  2       92      56
2020-01-01 00:01:00  9       78      80
2020-01-01 00:02:00  69      83      43
2020-01-01 00:03:00  47      62      45
2020-01-01 00:04:00  47      90      13
...                  ...     ...     ...
2020-02-27 23:56:00  73      81      35
2020-02-27 23:57:00  20      66      58
2020-02-27 23:58:00  42      16      48
2020-02-27 23:59:00  32      40      19
2020-02-28 00:00:00  37      63      95

83521 rows × 3 columns

(i) Resample for 10min with sum function for value1, mean for value2 and max for value3. (3)

(ii) Downsample data to 30s. (2)
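
The two resampling steps in part (c) might look like the sketch below. It runs on a small stand-in frame with made-up values rather than the 83521-row dataset. Because the source data is at 1-minute frequency, moving to 30 seconds creates new timestamps, so a fill rule is needed; forward fill is an assumption here.

```python
import numpy as np
import pandas as pd

# Stand-in data: 20 rows at 1-minute frequency (hypothetical values).
index = pd.date_range("2020-01-01", periods=20, freq="min")
df = pd.DataFrame(
    {
        "value1": np.arange(20),
        "value2": np.arange(20) * 2.0,
        "value3": np.arange(20)[::-1],
    },
    index=index,
)

# (i) 10-minute bins with a different aggregation per column.
ten_min = df.resample("10min").agg(
    {"value1": "sum", "value2": "mean", "value3": "max"}
)

# (ii) 30-second frequency; new rows take the previous observation.
half_min = df.resample("30s").ffill()

print(ten_min)
print(half_min.head())
```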

7. (a) Create a DataFrame of 8 rows and 8 columns containing random integers in the range of 1 to 10. Compute the correlation of each row with the preceding row. (2)
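
One way to answer part (a) is sketched below. It assumes the range 1 to 10 is inclusive at both ends and uses Pearson correlation, the default of `Series.corr`; the seed is fixed only so that repeated runs agree.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed for reproducibility only
# integers() excludes the upper bound, so 11 gives values 1..10.
df = pd.DataFrame(rng.integers(1, 11, size=(8, 8)))

# Row 0 has no preceding row, so the result has 7 entries (rows 1..7).
row_corr = [df.iloc[i].corr(df.iloc[i - 1]) for i in range(1, len(df))]
print(row_corr)
```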

(b) Consider the following table that lists Delhi's AQI for the last week.

AQI  Date
67   2/10/2022
79   3/10/2022
80   4/10/2022
90   5/10/2022
99   6/10/2022
110  7/10/2022
112  8/10/2022
140  9/10/2022
165  10/10/2022
178  11/10/2022

(i) Plot a line graph showing AQI (Air Quality Index) against date with line colour as red, line width as "4 pixels" and dashed line style. (4)

(ii) Add title "Delhi AQI for last ten days". (1)

(iii) Set label for x-axis "Date" and y-axis "AQI". (1)

(iv) Show grids in the background. (1)

(v) Set marker as '*'. (1)
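
Parts (i) to (v) can be combined into a single Matplotlib sketch such as the one below. Two assumptions: Matplotlib measures `linewidth` in points rather than pixels, so 4 is used as the closest equivalent, and the non-interactive `Agg` backend is selected so the script runs without a display. In an interactive session that line can be dropped and `plt.show()` called at the end.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt

dates = ["2/10/2022", "3/10/2022", "4/10/2022", "5/10/2022", "6/10/2022",
         "7/10/2022", "8/10/2022", "9/10/2022", "10/10/2022", "11/10/2022"]
aqi = [67, 79, 80, 90, 99, 110, 112, 140, 165, 178]

fig, ax = plt.subplots()

# (i) red dashed line of width 4, (v) '*' marker at each data point.
(line,) = ax.plot(dates, aqi, color="red", linewidth=4,
                  linestyle="--", marker="*")

ax.set_title("Delhi AQI for last ten days")  # (ii)
ax.set_xlabel("Date")                        # (iii)
ax.set_ylabel("AQI")
ax.grid(True)                                # (iv)
fig.autofmt_xdate()                          # tilt date labels so they do not overlap
```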
