23HCS4142 PDF

This document is a practical file for a Data Analysis and Visualization course using Python at Deen Dayal Upadhyaya College. It includes various programming tasks using NumPy and Pandas, such as creating arrays, performing statistical analysis, handling missing values, and visualizing data with plots. The file also contains instructions for working with Excel files and the Iris dataset to demonstrate data manipulation and visualization techniques.

Uploaded by

Rohan Singh

COMPUTER SCIENCE

PRACTICAL FILE

OF

Data Analysis and Visualization using Python
(DSE:01)

Deen Dayal Upadhyaya College
(University of Delhi)
Sector-3, Dwarka · New Delhi-110078

Submitted To:
Prof. Arpita Sharma
Prof. Deepak Mittal
(CS Department)

Submitted By:
Rohan Singh
Roll no. 23HCS4142
BSc CS (H)
Q1. Write programs in Python using NumPy library to do the following:

a) Create a two-dimensional array, ARR1 having random values from 0 to 1. Compute the
mean, standard deviation, and variance of ARR1 along the second axis.
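
One possible approach to part (a) is sketched below. The 4 x 5 shape and the fixed seed are assumptions made only so the run is repeatable; `axis=1` selects the second axis, so each statistic is computed per row.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed is arbitrary, fixed only for repeatability
ARR1 = rng.random((4, 5))             # uniform values in [0, 1); the shape is an assumption

# axis=1 is the second axis, so every statistic below has one entry per row
row_mean = ARR1.mean(axis=1)
row_std = ARR1.std(axis=1)
row_var = ARR1.var(axis=1)

print(ARR1)
print("Mean along axis 1:", row_mean)
print("Standard deviation along axis 1:", row_std)
print("Variance along axis 1:", row_var)
```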

b) Create a 2-dimensional array of m x n integer elements. Print the shape, type and
data type of the array, then reshape it into an n x m array, where n and m are user
inputs given at run time.
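
A minimal sketch for part (b) follows. The helper name `describe_and_reshape` and the array contents are hypothetical, and fixed values stand in for the run-time input so the example runs unattended; the commented line shows where `input()` would go.

```python
import numpy as np

def describe_and_reshape(m, n):
    """Build an m x n integer array, report its properties, and reshape it to n x m."""
    arr = np.arange(m * n).reshape(m, n)   # placeholder contents: 0 .. m*n - 1
    print("Shape:", arr.shape)
    print("Type:", type(arr))
    print("Data type:", arr.dtype)
    return arr, arr.reshape(n, m)

# In the practical, m and n would be read at run time, for example:
#   m = int(input("Rows: ")); n = int(input("Columns: "))
arr, reshaped = describe_and_reshape(3, 4)
print(reshaped)
```

Note that `reshape` keeps the elements in their original row-major order, so the n x m result is not the transpose of the original array.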

c) Test whether the elements of a given 1D array are zero, non-zero and NaN. Record the
indices of these elements in three separate arrays.
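
Part (c) could be handled as in the sketch below; the sample values are assumed. Because `NaN != 0` evaluates to true, NaN positions are excluded explicitly from the non-zero set.

```python
import numpy as np

arr = np.array([0.0, 3.5, np.nan, 0.0, -2.0, np.nan, 7.0])   # sample data (assumed)

nan_mask = np.isnan(arr)
zero_idx = np.flatnonzero(arr == 0)                     # indices of zeros
nan_idx = np.flatnonzero(nan_mask)                      # indices of NaN entries
nonzero_idx = np.flatnonzero((arr != 0) & ~nan_mask)    # non-zero and not NaN

print("Zero indices:", zero_idx)
print("Non-zero indices:", nonzero_idx)
print("NaN indices:", nan_idx)
```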

d) Create three random arrays of the same size: Array1, Array2 and Array3. Subtract
Array2 from Array3 and store the result in Array4. Create another array, Array5, having
two times the values in Array1. Find the covariance and correlation of Array1 with
Array4 and Array5 respectively.
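
The wording of part (d) can be read in more than one way, so the sketch below simply computes both covariance and correlation of Array1 with each of Array4 and Array5. The array length of 20 and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for repeatability
Array1 = rng.random(20)
Array2 = rng.random(20)
Array3 = rng.random(20)

Array4 = Array3 - Array2
Array5 = 2 * Array1

# np.cov and np.corrcoef return 2 x 2 matrices; the off-diagonal entry [0, 1]
# is the statistic between the two inputs
cov_1_4 = np.cov(Array1, Array4)[0, 1]
corr_1_4 = np.corrcoef(Array1, Array4)[0, 1]
cov_1_5 = np.cov(Array1, Array5)[0, 1]
corr_1_5 = np.corrcoef(Array1, Array5)[0, 1]

print("Array1 vs Array4: covariance", cov_1_4, "correlation", corr_1_4)
print("Array1 vs Array5: covariance", cov_1_5, "correlation", corr_1_5)
```

Since Array5 is an exact positive multiple of Array1, their correlation comes out as 1 up to floating-point error.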

e) Create two random arrays of the same size 10: Array1, and Array2. Find the sum of the
first half of both the arrays and product of the second half of both the arrays.
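
A sketch for part (e), assuming the task means the element-wise sum of the two first halves and the element-wise product of the two second halves; integer values are used only to keep the output easy to read.

```python
import numpy as np

rng = np.random.default_rng(seed=2)        # arbitrary seed
Array1 = rng.integers(1, 10, size=10)      # random integers in [1, 10)
Array2 = rng.integers(1, 10, size=10)

half = Array1.size // 2
first_half_sum = Array1[:half] + Array2[:half]        # element-wise sum of first halves
second_half_product = Array1[half:] * Array2[half:]   # element-wise product of second halves

print("Array1:", Array1)
print("Array2:", Array2)
print("Sum of first halves:", first_half_sum)
print("Product of second halves:", second_half_product)
```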

f) Create an array with random values. Determine the size of the memory occupied by the
array.
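
For part (f), the `nbytes` attribute reports the size of the data buffer, which equals the number of elements times the bytes per element. The 3 x 4 shape below is an assumption.

```python
import numpy as np

arr = np.random.default_rng(seed=3).random((3, 4))   # float64 values, 8 bytes each

print("Number of elements:", arr.size)
print("Bytes per element:", arr.itemsize)
print("Memory occupied by the data (bytes):", arr.nbytes)
```

`nbytes` covers only the element data, not the small fixed overhead of the Python array object itself.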

g) Create a 2-dimensional array of size m x n having integer elements in the range (10,100).
Write statements to swap any two rows, reverse a specified column and store the updated
array in another variable.
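
Part (g) might look like the sketch below. The sizes, the rows swapped (0 and 2) and the column reversed (1) are all arbitrary choices; the work is done on a copy so the original array stays unchanged.

```python
import numpy as np

rng = np.random.default_rng(seed=4)         # arbitrary seed
m, n = 4, 3                                 # assumed sizes
arr = rng.integers(10, 100, size=(m, n))    # integers from 10 up to 99

updated = arr.copy()                        # the updated array lives in another variable
updated[[0, 2]] = updated[[2, 0]]           # swap rows 0 and 2
updated[:, 1] = updated[::-1, 1].copy()     # reverse column 1

print("Original:\n", arr)
print("Updated:\n", updated)
```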

Q2. Do the following using PANDAS Series:

a. Create a series with 5 elements. Display the series sorted on index and also sorted on
values separately.

b. Create a series with N elements with some duplicate values. Find the minimum and
maximum ranks assigned to the values using the ‘first’ and ‘max’ methods.

c. Display the index value of the minimum and maximum element of a Series
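
The three parts of Q2 can be sketched together as follows. The sample values and index labels are assumptions chosen so the effect of each call is easy to see.

```python
import pandas as pd

# (a) sort on index and on values
s = pd.Series([40, 10, 30, 20, 50], index=["d", "a", "e", "b", "c"])  # sample data (assumed)
by_index = s.sort_index()
by_value = s.sort_values()

# (b) ranks with duplicates: 'first' breaks ties by order of appearance,
#     'max' gives every tied value the highest rank in its group
dup = pd.Series([7, 3, 7, 1, 3, 7])
rank_first = dup.rank(method="first")
rank_max = dup.rank(method="max")

# (c) index labels of the minimum and maximum elements
min_label = s.idxmin()
max_label = s.idxmax()

print(by_index, by_value, sep="\n")
print(pd.DataFrame({"value": dup, "first": rank_first, "max": rank_max}))
print("Index of minimum:", min_label, "| index of maximum:", max_label)
```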

Q3. Create a data frame having at least 3 columns and 50 rows to store numeric data
generated using a random function. Replace 10% of the values by null values whose
index positions are generated using a random function.
Do the following:

a. Identify and count missing values in a data frame

b. Drop the column having more than 5 null values.

c. Identify the row label having maximum of the sum of all values in a row and drop that
row.

d. Sort the data frame on the basis of the first column.

e. Remove all duplicates from the first column.

f. Find the correlation between first and second column and covariance between second and
third column.

g. Discretize the second column and create 5 bins.
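
A sketch covering Q3 (a) to (g) is given below. The column names A, B, C and the seed are assumptions. Each part is applied to the same starting frame rather than chained, so that dropping a column in (b) does not leave later parts without a second or third column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=5)                 # arbitrary seed
data = rng.random((50, 3))
# pick 10% of the 150 cell positions at random and blank them out
null_positions = rng.choice(data.size, size=data.size // 10, replace=False)
data.flat[null_positions] = np.nan
df = pd.DataFrame(data, columns=["A", "B", "C"])    # column names are assumptions

# (a) missing values per column and in total
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

# (b) keep only columns with at most 5 nulls
few_nulls = df.loc[:, missing_per_column <= 5]

# (c) row label with the largest row sum (NaN is skipped), then drop that row
max_label = df.sum(axis=1).idxmax()
without_max_row = df.drop(index=max_label)

# (d) sort on the first column, (e) drop duplicates in the first column
sorted_df = df.sort_values(by=df.columns[0])
deduped = df.drop_duplicates(subset=df.columns[0])

# (f) correlation of columns 1 and 2, covariance of columns 2 and 3
corr_ab = df["A"].corr(df["B"])
cov_bc = df["B"].cov(df["C"])

# (g) discretize the second column into 5 equal-width bins
binned = pd.cut(df["B"], bins=5)

print(missing_per_column, "total:", total_missing)
print("Row dropped in (c):", max_label)
print("corr(A, B) =", corr_ab, " cov(B, C) =", cov_bc)
print(binned.value_counts().sort_index())
```

`drop_duplicates` treats missing values as equal to each other, so at most one row with a null in the first column survives part (e).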

Q4. Consider two Excel files holding the attendance of two workshops, each of duration 5 days.
Each file has three fields, ‘Name’, ‘Date’ and ‘Duration’ (in minutes), where names may
repeat within a file. Note that duration may take one of three values (30, 40, 50) only.
Import the data into two data frames and do the following:

a. Perform merging of the two data frames to find the names of students who had attended
both workshops.

b. Find names of all students who have attended a single workshop only.

c. Merge two data frames row-wise and find the total number of records in the data frame.

d. Merge two data frames row-wise and use two columns viz. names and dates as multi-row
indexes. Generate descriptive statistics for this hierarchical data frame.
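
The four parts of Q4 can be sketched as below. The records are small placeholders, not the actual attendance data, and the file name in the comment is hypothetical; in the practical the two frames would be read with `pd.read_excel`.

```python
import pandas as pd

# Placeholder records standing in for the two Excel files, e.g.
#   ws1 = pd.read_excel("workshop1.xlsx")   # hypothetical file name
ws1 = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Asha", "Meera"],
    "Date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "Duration": [30, 40, 50, 30],
})
ws2 = pd.DataFrame({
    "Name": ["Ravi", "Kabir", "Ravi", "Asha"],
    "Date": pd.to_datetime(["2024-02-01", "2024-02-01", "2024-02-02", "2024-02-03"]),
    "Duration": [40, 50, 30, 40],
})

names1 = ws1[["Name"]].drop_duplicates()
names2 = ws2[["Name"]].drop_duplicates()

# (a) inner merge on Name keeps students present in both workshops
both = pd.merge(names1, names2, on="Name")["Name"].tolist()

# (b) outer merge with an indicator column marks where each name came from
flagged = pd.merge(names1, names2, on="Name", how="outer", indicator=True)
single_only = flagged.loc[flagged["_merge"] != "both", "Name"].tolist()

# (c) row-wise merge (concatenation) and record count
combined = pd.concat([ws1, ws2], ignore_index=True)
total_records = len(combined)

# (d) Name and Date as a two-level row index, then descriptive statistics
hierarchical = combined.set_index(["Name", "Date"]).sort_index()
stats = hierarchical.describe()

print("Attended both:", both)
print("Attended one only:", single_only)
print("Total records:", total_records)
print(stats)
```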

Q5. Using Iris data, plot the following with proper legend and axis labels. (Download the Iris
data from https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn datasets.)

a. Load the data into a pandas data frame. Use the DataFrame.info() method to look at the
datatypes in the dataset.

b. Find the number of missing values in each column (Check number of null values in a
column using df.isnull().sum())

c. Plot bar chart to show the frequency of each class label in the data.

d. Draw a scatter plot for Petal Length vs Sepal Length and fit a regression line

e. Plot density distribution for feature Petal width.

f. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.

g. Draw heatmap for any two numeric attributes

h. Compute mean, mode, median, standard deviation, confidence interval and standard error
for each numeric feature

i. Compute correlation coefficients between each pair of features and plot heatmap
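
A sketch for several parts of Q5 follows, covering (a), (b), (c), (d), (h) and (i). It assumes the data is loaded from `sklearn.datasets`, fits the regression line with `np.polyfit`, draws the heatmap with plain matplotlib, and uses a normal approximation (z = 1.96) for the 95% confidence interval; these are illustrative choices, not necessarily the ones used in the original file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
features = df[iris.feature_names]            # the four numeric measurements

df.info()                                    # (a) dtypes and non-null counts
missing = df.isnull().sum()                  # (b) missing values per column

# (c) frequency of each class label as a bar chart
class_counts = df["species"].value_counts()
fig_bar, ax_bar = plt.subplots()
class_counts.plot(kind="bar", ax=ax_bar, label="samples")
ax_bar.set_xlabel("Class label")
ax_bar.set_ylabel("Frequency")
ax_bar.legend()

# (d) scatter plot with a straight line fitted by least squares
x = df["sepal length (cm)"]
y = df["petal length (cm)"]
slope, intercept = np.polyfit(x, y, deg=1)
xs = np.linspace(x.min(), x.max(), 50)
fig_reg, ax_reg = plt.subplots()
ax_reg.scatter(x, y, label="observations")
ax_reg.plot(xs, slope * xs + intercept, color="red", label="least-squares fit")
ax_reg.set_xlabel("Sepal length (cm)")
ax_reg.set_ylabel("Petal length (cm)")
ax_reg.legend()

# (h) summary statistics; the interval uses the normal approximation (z = 1.96)
std_err = features.std() / np.sqrt(len(features))
summary = pd.DataFrame({
    "mean": features.mean(),
    "median": features.median(),
    "mode": features.mode().iloc[0],         # first mode if several values tie
    "std": features.std(),
    "std_err": std_err,
    "ci_low": features.mean() - 1.96 * std_err,
    "ci_high": features.mean() + 1.96 * std_err,
})

# (i) pairwise Pearson correlations drawn as a heatmap
corr = features.corr()
fig_hm, ax_hm = plt.subplots()
image = ax_hm.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_hm.set_xticks(range(len(corr)))
ax_hm.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_hm.set_yticks(range(len(corr)))
ax_hm.set_yticklabels(corr.columns)
fig_hm.colorbar(image, ax=ax_hm, label="Pearson correlation")

print(missing)
print(summary)
```

In an interactive session `plt.show()` would display the figures. The density plot, pair plot and two-attribute heatmap of parts (e) to (g) are not covered here; libraries such as seaborn provide `kdeplot`, `pairplot` and `heatmap` for them.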

Q6. Consider a data set of ship passengers (for example, the Titanic data set) that records
each passenger's class, gender, age, fare and survival status. Do the following:

a. Clean the data by dropping the column which has the largest number of missing values.

b. Find the total number of passengers with age more than 30.

c. Find the total fare paid by passengers of second class.

d. Compare the number of survivors of each passenger class.

e. Compute descriptive statistics for the age attribute gender-wise.

f. Draw a scatter plot for passenger fare paid by Female and Male passengers separately

g. Compare density distribution for features age and passenger fare

h. Draw the pie chart for three groups labelled as class 1, class 2, class 3 respectively
displayed in different colors. The occurrence of each group converted into percentage
should be displayed in the pie chart. Appropriately Label the chart.

i. Find % of survived passengers for each class and answer the question “Did class play a
role in survival?”
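
The parts of Q6 refer to passengers, age, fare, class and survival, so the sketch below assumes a passenger table with those attributes. The eight records and the column names (`pclass`, `sex`, `age`, `fare`, `survived`, `cabin`) are placeholders invented for illustration, not real data. It covers parts (a) to (e), (h) and (i).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder records, NOT real passenger data; column names are assumptions
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3, 3, 2],
    "sex":      ["female", "male", "female", "male", "male", "female", "male", "male"],
    "age":      [38, 54, 27, 35, 22, 4, np.nan, 31],
    "fare":     [71.3, 51.9, 11.1, 26.0, 7.3, 16.7, 8.1, 13.0],
    "survived": [1, 0, 1, 0, 0, 1, 0, 1],
    "cabin":    ["C85", np.nan, np.nan, np.nan, np.nan, "G6", np.nan, np.nan],
})

# (a) drop the column with the largest number of missing values
worst_column = df.isnull().sum().idxmax()
clean = df.drop(columns=worst_column)

# (b) passengers older than 30 (a missing age never satisfies the comparison)
older_than_30 = int((clean["age"] > 30).sum())

# (c) total fare paid by second-class passengers
second_class_fare = clean.loc[clean["pclass"] == 2, "fare"].sum()

# (d) number of survivors in each class
survivors = clean.groupby("pclass")["survived"].sum()

# (e) descriptive statistics of age for each gender
age_by_gender = clean.groupby("sex")["age"].describe()

# (h) pie chart of class shares, with percentages printed on the wedges
class_counts = clean["pclass"].value_counts().sort_index()
fig, ax = plt.subplots()
ax.pie(class_counts,
       labels=[f"class {c}" for c in class_counts.index],
       colors=["tab:blue", "tab:orange", "tab:green"],
       autopct="%1.1f%%")
ax.set_title("Share of passengers by class")

# (i) percentage of survivors within each class
survival_pct = clean.groupby("pclass")["survived"].mean() * 100

print("Dropped column:", worst_column)
print("Older than 30:", older_than_30)
print("Second-class fare total:", second_class_fare)
print(survivors, age_by_gender, survival_pct.round(1), sep="\n")
```

Comparing the entries of `survival_pct` across classes is one way to approach the question in part (i); with real data the conclusion depends on the values obtained, not on this placeholder table.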

Q7. Consider the following data frame containing a family name, gender of the family
member and her/his monthly income in each record.

a. Calculate and display familywise gross monthly income.


b. Display the highest and lowest monthly income for each family name.
c. Calculate and display monthly income of all members earning income less than Rs.
80000.00.
d. Display total number of females along with their average monthly income.
e. Delete rows with Monthly income less than the average income of all members.
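
The data frame for Q7 did not survive in this copy, so the sketch below uses a small placeholder frame. The column names (`Name`, `Gender`, `MonthlyIncome`) and every record are assumptions made for illustration only.

```python
import pandas as pd

# Placeholder records; the original data frame is not available in this copy
df = pd.DataFrame({
    "Name": ["Shah", "Vats", "Vats", "Kumar", "Shah", "Kumar"],
    "Gender": ["Male", "Male", "Female", "Female", "Female", "Male"],
    "MonthlyIncome": [114000.0, 65000.0, 43150.0, 69500.0, 155000.0, 103000.0],
})

# (a) gross monthly income per family
family_total = df.groupby("Name")["MonthlyIncome"].sum()

# (b) highest and lowest monthly income per family
family_range = df.groupby("Name")["MonthlyIncome"].agg(["max", "min"])

# (c) members earning less than Rs. 80000.00
below_80k = df[df["MonthlyIncome"] < 80000.00]

# (d) number of females and their average monthly income
females = df[df["Gender"] == "Female"]
female_count = len(females)
female_avg = females["MonthlyIncome"].mean()

# (e) delete rows whose income is below the overall average
overall_avg = df["MonthlyIncome"].mean()
kept = df[df["MonthlyIncome"] >= overall_avg]

print(family_total, family_range, below_80k, sep="\n")
print("Females:", female_count, "average income:", round(female_avg, 2))
print(kept)
```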