
DAV Manual

The document is a laboratory manual for a Data Analysis and Visualization course at the Ahmedabad Institute of Technology, detailing various experiments and exercises. It includes tasks related to data correlation, statistical analysis, COVID case trends, college selection using machine learning, and data visualization techniques. The manual also outlines evaluation criteria for each experiment and provides theoretical background for the exercises.

Uploaded by

darshandarji1222
Copyright
© All Rights Reserved


Ahmedabad Institute of Technology


IT Department
Data Analysis and Visualization

(3161613)

Laboratory Manual

NAME

ENROLLMENT NUMBER

BATCH

YEAR

Subject Faculty: Dr. Shital Patel
Head of Department: Dr. Ashish Chaurasiya (HOD IT)

Ahmedabad Institute of Technology


IT Department

CERTIFICATE
This is to certify that Mr. / Ms. _________________________________ of
Enrollment No. _________________________ has satisfactorily completed the
course in _________________ as prescribed by the Gujarat Technological University
for _______________ Year (B.E.), Semester _______ of Information Technology in
the Academic Year _______________.
Date of Submission: ________________

Dr. Shital Patel
Dr. Ashish Chaurasiya (Head of Department)
Signature: _________________

INDEX

Columns: Sr. No., Experiment, Date, Page No. (From, To), Signature

1(a). Prepare a synthetic data set for student data, consisting of enrollment number,
name, gender, semester-wise and subject-wise marks, difficulty level of the subject,
SPI (Semester Performance Index), and address with geographical location.
(i) Write a program to find the correlation between gender and semester marks.
(ii) Write a program to find the correlation between geographical location and
semester marks. Analyze which two are highly correlated.

1(b). Write a program to calculate the correlation between difficulty level and subject
marks. The higher the difficulty level, the lower the marks should be, so the two
should be negatively correlated. Analyze the correlation.

2. Consider a sample of 50 students. Gather the university exam scores of the students
across all semesters of Engineering for one college. Write a program to find the
mean and standard deviation for this college. Now consider a sample of students
from different colleges of Gujarat for university exam scores. Write a program to
find the mean and standard deviation. Write the observations.

3. Collect the month-wise COVID case data for the cities Ahmedabad, Vadodara, Rajkot,
and Surat. Plot this time series data. Analyze the trend over time.

4. 12th standard students need advice on which college to choose for engineering
education. Decide the features to use for grading the engineering colleges. Prepare
the data set. Write a program to apply the random forest algorithm and suggest the
best suited college for 12th standard students.

5. Consider the following data set. Write a program for the KNN algorithm to find the
weight lifting category for height 161 cm and weight 61 kg.

6. Take the data of the students prepared in exercise 1. Visualize the data to show
region-wise results, branch-wise results, and subject-wise results. Decide which
visualization technique (bar chart, pie chart, map, scatter plot) shows each kind of
data appropriately.

7. Use D3.js to show the following. (i) Take year-wise population. (ii) Show an
appropriately sized circle for the population of each year. (iii) Fill color in the
circle. (iv) Prepare a bar chart and a pie chart. (v) Explore other functionality of
D3.js.

EXPERIMENT NO: 1 DATE: / /

TITLE: Prepare a synthetic data set for student data, consisting of enrollment
number, name, gender, semester-wise and subject-wise marks, difficulty level of the
subject, SPI (Semester Performance Index), and address with geographical location.
(i) Write a program to find the correlation between gender and semester marks.
(ii) Write a program to find the correlation between geographical location and
semester marks. Analyze which two are highly correlated.

Theory:
Many statistical and machine learning approaches assume that the features are
independent of one another.
To check this assumption, we first assess their correlation, that is, the degree to
which the variables are related.
Correlation is measured by the correlation coefficient r, which ranges from -1 to 1.
The closer r is to 1 or -1, the stronger the correlation between the two variables.
If r is close to 0, the two variables have little or no linear relationship. This
alone does not prove that they are independent.
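
As a minimal sketch of the idea above, the following Python program computes Pearson's r between a 0/1-encoded category and marks. Everything in it is an assumption made for illustration: the synthetic students, the two location labels, and the mark offsets are invented, and encoding a two-level category as 0/1 is only one possible way to correlate a categorical variable with a numeric one.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(42)  # fixed seed so the synthetic data is reproducible

# HYPOTHETICAL synthetic students: gender is drawn independently of marks,
# while the made-up location label shifts the marks by design.
students = []
for _ in range(200):
    gender = random.choice(["F", "M"])
    location = random.choice(["urban", "rural"])
    base = 70 if location == "urban" else 55  # assumed effect, illustration only
    students.append((gender, location, base + random.gauss(0, 8)))

# Encode each two-level category as 0/1 so that Pearson's r can be applied.
gender_code = [1 if g == "F" else 0 for g, _, _ in students]
location_code = [1 if loc == "urban" else 0 for _, loc, _ in students]
marks = [m for _, _, m in students]

r_gender = pearson_r(gender_code, marks)
r_location = pearson_r(location_code, marks)
print(f"gender vs marks:   r = {r_gender:+.3f}")
print(f"location vs marks: r = {r_location:+.3f}")
```

Because the synthetic marks depend on location but not on gender, the location coefficient should come out clearly larger in magnitude. With real data, the comparison of the two coefficients is the observation the exercise asks for.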

Exercise:
1. Write a program to find correlation between gender and Semester marks.
2. Write a program to find correlation between geographical location and semester
marks. Analyze which two are highly correlated.

EXPERIMENT NO: 1(b)

TITLE: Write a program to calculate the correlation between difficulty level and
subject marks. The higher the difficulty level, the lower the marks should be, so the
two should be negatively correlated. Analyze the correlation.

Theory:

The Pearson correlation coefficient (r) is the most common way of measuring a
linear correlation. It is a number between -1 and 1 that describes the strength and
direction of the linear relationship between two quantitative variables.
It is a descriptive statistic, meaning that it summarizes a characteristic of a
dataset.
Spearman's rank correlation measures the strength and direction of the monotonic
relationship between two variables that are at least ordinal. It is Pearson's r
computed on the ranks of the values, which suits an ordinal variable such as
difficulty level.
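
The two coefficients described above can be compared on a small example. The sketch below is illustrative only: the difficulty levels (1 = easy to 5 = hard) and the marks are hypothetical values chosen so that marks fall as difficulty rises, and the helper names are not from any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# HYPOTHETICAL data: difficulty level of each subject and its average marks.
difficulty = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
marks = [88, 85, 80, 78, 70, 72, 61, 58, 45, 49]

r = pearson_r(difficulty, marks)
rho = spearman_rho(difficulty, marks)
print(f"Pearson r    = {r:+.3f}")
print(f"Spearman rho = {rho:+.3f}")
```

Both values are negative for this data, which is the relationship the exercise expects. On real marks the two coefficients can differ noticeably when the relationship is monotonic but not linear.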

Exercise:
1. Write a program to calculate correlation between difficulty level and subject
marks.

EVALUATION:

Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 2 DATE: / /

TITLE: Consider the sample of 50 students. Gather the university exam score of
the students across all semesters of Engineering for one college. Write a program
to find out mean and standard deviation for this college. Now consider the sample
of students of different colleges of Gujarat for university exam score. Write a
program to find out mean and standard deviation. Write the observations.

Theory:
The mean is the sum of all the entries divided by the number of entries. For
example, for the list of 5 numbers [1, 2, 3, 4, 5], the mean is
(1+2+3+4+5)/5 = 3.
Standard deviation is a measure of the amount of variation or dispersion of a set
of values. To compute it, first calculate the mean of the values, then the
variance, and finally the standard deviation.
Steps to Calculate Mean:
● Take the sum of all the entries.
● Divide the sum by the number of entries.
Steps to Calculate Standard Deviation:
● Calculate the mean as discussed above. The mean of [1, 2, 3, 4, 5] is 3.
● Calculate the deviation of each entry by subtracting the mean from the value of
the entry. The deviations are [-2, -1, 0, 1, 2].
● Square each deviation and sum the results. For the above
example, this gives 4+1+0+1+4 = 10.
● Divide the sum by the number of data points minus one. This gives
the sample variance: 10/(5-1) = 2.5.
● The square root of the variance is the standard deviation:
sqrt(2.5) = 1.5811388300841898.
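
The steps above can be sketched in Python as follows. The function name is illustrative, and the cross-check against `statistics.stdev` (which also divides by n - 1) is there only to confirm that the hand-written steps agree with the standard library.

```python
import math
import statistics


def mean_and_sample_std(values):
    """Follow the manual's steps: mean, squared deviations, variance, square root."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    variance = sum(squared_deviations) / (n - 1)  # sample variance (n - 1)
    return mean, math.sqrt(variance)


scores = [1, 2, 3, 4, 5]  # the worked example from the theory section
mean, std = mean_and_sample_std(scores)
print(f"mean = {mean:.1f}, standard deviation = {std:.4f}")

# Cross-check against the standard library implementation.
assert math.isclose(std, statistics.stdev(scores))
```

For the exercise, the same function could be applied once to the scores of the single college and once to the scores pooled from several colleges, and the two pairs of values compared.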
Exercise:
1. Write a program to find out mean and standard deviation.

EVALUATION:
Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 3 DATE: / /

TITLE: Collect the month-wise COVID case data for the cities Ahmedabad, Vadodara,
Rajkot, and Surat. Plot this time series data. Analyze the trend over time.

Theory:
Time series analysis is a specific way of analyzing a sequence of data points collected
over an interval of time. Analysts record data points at consistent intervals over a set
period rather than intermittently or randomly. However, this type of analysis is more
than the act of collecting data over time.
What sets time series data apart from other data is that the analysis can show how
variables change over time. In other words, time is a crucial variable because it shows
how the data evolves from one data point to the next as well as where it ends up. It
provides an additional source of information and a set order of dependencies between
the data points.
Time series analysis typically requires a large number of data points to ensure
consistency and reliability. An extensive data set ensures you have a representative
sample size and that analysis can cut through noisy data. It also ensures that any
trends or patterns discovered are not outliers and can account for seasonal variance.
Additionally, time series data can be used for forecasting—predicting future data
based on historical data.
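
One simple way to see a trend through noisy monthly counts is a moving average, sketched below. This is a technique chosen here for illustration and is not prescribed by the manual. The monthly numbers are placeholders invented for the example, not real COVID data, and the function names are hypothetical. The smoothed series could afterwards be passed to any plotting library.

```python
def moving_average(series, window=3):
    """Trailing moving average; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def describe_trend(series, window=3):
    """Label a series 'rising', 'falling', or 'flat' from its smoothed endpoints."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "rising"
    if smoothed[-1] < smoothed[0]:
        return "falling"
    return "flat"


# PLACEHOLDER monthly counts, invented for illustration; NOT real COVID data.
# Replace them with the collected month-wise figures for each city.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_cases = {
    "Ahmedabad": [100, 200, 300, 400, 500, 600],
    "Surat": [600, 500, 400, 300, 200, 100],
}

for city, cases in monthly_cases.items():
    print(city, moving_average(cases), describe_trend(cases))
```

A wider window smooths more aggressively but reacts more slowly to real changes, so the window size is a choice to justify in the observations.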

Exercise:
1. Plot this time series Data. Analyze the trend as per time.

EVALUATION:
Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 4 DATE: / /

TITLE: 12th standard students need advice on which college to choose for engineering
education. Decide the features to use for grading the engineering colleges. Prepare the
data set. Write a program to apply the random forest algorithm and suggest the best
suited college for 12th standard students.

Theory:
Random forest is a powerful tree-based ensemble learning technique in machine learning.
It works by creating a number of decision trees during the training phase. Each tree
is built on a random sample of the data set and considers a random subset of
features at each split. This randomness introduces variability among the individual
trees, reducing the risk of overfitting and improving overall prediction performance.
In prediction, the algorithm aggregates the results of all trees, either by voting (for
classification tasks) or by averaging (for regression tasks). This collaborative
decision-making process, drawing on the insights of multiple trees, provides
stable and precise results. Random forests are widely used for classification and
regression tasks and are known for their ability to handle complex data, reduce
overfitting, and provide reliable predictions in different environments.
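
To make the mechanism concrete, here is a deliberately small, from-scratch sketch of a random forest classifier using only the Python standard library. It is a simplified illustration, not a production implementation: the three features (placement %, PhD faculty %, lab score out of 10), the training rows, the "good"/"average" labels, and the candidate colleges are all hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, n_features, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is a label, an internal node is a 4-tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    # Random subset of features considered at this split.
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [y for _, y in left],
                       n_features, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       n_features, rng, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def build_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in sample],
                                 [labels[i] for i in sample], n_features, rng))
    return forest


def vote_share(forest, row, label):
    """Fraction of trees that predict `label` for `row`."""
    return sum(predict_tree(tree, row) == label for tree in forest) / len(forest)


# HYPOTHETICAL training data: (placement %, PhD faculty %, lab score out of 10).
train_rows = [(92, 70, 9), (88, 65, 8), (85, 72, 9), (90, 60, 8), (83, 68, 7),
              (60, 30, 5), (55, 35, 4), (65, 25, 5), (50, 40, 6), (58, 28, 4)]
train_labels = ["good"] * 5 + ["average"] * 5
forest = build_forest(train_rows, train_labels)

# HYPOTHETICAL candidate colleges, ranked by the share of trees voting "good".
candidates = {"College X": (86, 66, 8), "College Y": (45, 20, 3)}
shares = {name: vote_share(forest, row, "good") for name, row in candidates.items()}
best_college = max(shares, key=shares.get)
print(shares, "->", best_college)
```

Ranking by vote share rather than by the majority label alone gives an ordering among several colleges that would all be classified the same way. For the actual exercise, a library implementation trained on the prepared data set would normally replace this hand-written forest.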

Exercise:

1. Write a program to apply random forest algorithm and suggest the best suited
college for 12th standard students.

EVALUATION:
Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 5 DATE: / /

TITLE: Take the data of the students prepared in exercise 1. Visualize the data to
show region-wise results, branch-wise results, and subject-wise results. Decide which
visualization technique (bar chart, pie chart, map, scatter plot) shows each kind of
data appropriately.

Theory:
Bar chart: A bar chart or bar graph presents categorical data with rectangular bars
whose heights or lengths are proportional to the values that they represent.

Pie chart: A pie chart shows how a total amount is divided between levels of a
categorical variable as a circle divided into radial slices.

Map: A map is a symbolic representation of selected characteristics of a place, usually
drawn on a flat surface. Maps present information about the world in a simple, visual
way.

Scatter plot: A scatter plot (also known as a scatter chart or scatter graph) uses dots to
represent values for two different numeric variables. The position of each dot on the
horizontal and vertical axes indicates the values for an individual data point. Scatter
plots are used to observe relationships between variables.

Exercise:

1. Visualize the data to show region wise results, branch wise results, subject wise
results.
EVALUATION:
Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 6 DATE: / /

TITLE: Consider the following data set. Write a program for KNN algorithm to find
out weight lifting category for height 161cm and weight 61kg.

Exercise:
1. Write a program for KNN algorithm to find out weight lifting category for height
161cm and weight 61kg
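
The data set that the title refers to is not reproduced in this text, so the sketch below uses a small placeholder table. The heights, weights, and category names ("Light", "Medium", "Heavy") are hypothetical, as is the choice of k = 3 and of Euclidean distance on unscaled features. It only illustrates how a k-nearest-neighbours vote works.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of ((height_cm, weight_kg), label) pairs and the
    distance is plain Euclidean distance on the raw, unscaled features.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# PLACEHOLDER data set (height in cm, weight in kg, category); replace it with
# the table given for the experiment.
training = [
    ((158, 58), "Light"),
    ((160, 59), "Light"),
    ((163, 60), "Light"),
    ((163, 64), "Medium"),
    ((165, 66), "Medium"),
    ((168, 68), "Medium"),
    ((170, 72), "Heavy"),
]

query = (161, 61)  # height 161 cm, weight 61 kg, as stated in the exercise
category = knn_predict(training, query, k=3)
print(f"Predicted category for {query}: {category}")
```

With this placeholder table, two of the three nearest neighbours are "Light", so that label wins the vote. An odd k avoids ties between two classes, and scaling the features first is worth considering when their ranges differ widely.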

EVALUATION:

Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:



EXPERIMENT NO: 7 DATE: / /

TITLE: Use D3.js to show the following. (i) Take year-wise population. (ii) Show an
appropriately sized circle for the population of each year. (iii) Fill color in the
circle. (iv) Prepare a bar chart and a pie chart. (v) Explore other functionality of
D3.js.

Theory:

D3.js (also known as D3, short for Data-Driven Documents) is a JavaScript library for
producing dynamic, interactive data visualizations in web browsers. It makes use of the
Scalable Vector Graphics (SVG), HTML5, and Cascading Style Sheets (CSS) standards.
It is the successor to the earlier Protovis framework. Its development drew attention
in 2011, when version 2.0.0 was released in August. With the release of version 4.0.0 in
June 2016, D3 was changed from a single library into a collection of smaller, modular
libraries that can be used independently.

Exercise:
1. Take year wise population.
2. Show appropriate size circle for population as per year.
3. Fill color in circle.
4. Prepare bar chart and pie chart.
5. Explore other functionality of D3.js

EVALUATION:
Involvement (4) | Understanding / Problem solving (3) | Timely Completion (3) | Total (10)

Signature with date:
