Data Analysis and Visualization (VisionPapers - In)
Bachelor of Engineering
Subject Code: 3161613
Semester – VI
Subject Name: Data Analysis and Visualization
Prerequisite: NA
Rationale: Data analytics involves data discovery that supports smart decision-making and generates
suggestions for options based on previous choices. Data visualization reveals the patterns in data and
also exposes the data points that do not fit a pattern.
Content:
Sr. No.   Content   Total Hrs   Marks Weightage (%)
1 Math, probability and statistical modeling 05 20
Introducing clustering basics, identifying clusters, Categorizing data with Random forest
algorithm
3 Modeling instances 06 15
Recognizing the Difference between Clustering and Classification, Making sense of data
with nearest neighbor analysis, classifying data with average nearest neighbor algorithms,
classifying data with K-nearest neighbor algorithms, Solving real-world problems
4 Principles of Data Visualization Design 05 20
Data visualization: The big three, Designing to meet the needs, Picking the most
appropriate Design style, Choosing how to add context, Selecting the appropriate Data
Graphic Type, Choosing a Data Graphic, Using D3.js for Data Visualization.
5 Web-based applications for visualization design, Exploring best practices in Dashboard
Design, Making maps from spatial data 06 20
w.e.f. AY 2018-19
GUJARAT TECHNOLOGICAL UNIVERSITY
6 Data science for driving growth in E-commerce 03 10
Books
1) Data Science for Dummies by Lillian Pierson, Wiley
2) Doing Data Science by Cathy O'Neil and Rachel Schutt, O'Reilly Media, Inc.
3) Data Analytics for Beginners: Basic Guide to Master Data Analytics by Paul Kinley
1) https://www.analyticsvidhya.com/
List of Practical:
1. Prepare a synthetic data set of student data consisting of enrollment number, name, gender,
semester-wise and subject-wise marks, difficulty level of each subject, SPI (Semester Performance
Index), and address with geographical location.
a. (i) Write a program to find the correlation between gender and semester marks.
(ii) Write a program to find the correlation between geographical location and semester marks.
Analyze which of the two pairs is more strongly correlated.
b. Write a program to calculate the correlation between difficulty level and subject marks. The higher
the difficulty level, the lower the marks should be, so the two should be negatively correlated.
Analyze the correlation.
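A minimal sketch of part (b) in Python, under stated assumptions: the data is generated on the spot, and the 1 to 5 difficulty scale, the linear marks model, and the noise level are illustrative choices, not part of the syllabus. Pearson's coefficient is computed by hand so the formula stays visible.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the synthetic data is reproducible

# ASSUMPTION: difficulty is rated 1 (easy) to 5 (hard); marks are out of 100.
difficulty = [rng.randint(1, 5) for _ in range(200)]

# ASSUMPTION: marks fall by about 10 per difficulty level, plus random noise,
# clipped to the valid 0..100 range.
marks = [max(0.0, min(100.0, 90 - 10 * d + rng.gauss(0, 8))) for d in difficulty]

r = pearson(difficulty, marks)
print(f"correlation(difficulty, marks) = {r:.3f}")
```

A coefficient close to -1 indicates a strong negative relationship. Categorical fields such as gender or geographical location need a numeric encoding before the same function can be applied to parts (a)(i) and (a)(ii).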
2. Consider a sample of 50 students. Gather the university exam scores of these students across all
semesters of Engineering for one college. Write a program to find the mean and standard deviation
for this college. Now consider a sample of students from different colleges of Gujarat and their
university exam scores. Write a program to find the mean and standard deviation. Write the observations.
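The first half of this practical could be sketched as follows, using Python's standard `statistics` module. The 50 scores are randomly generated placeholders (an assumption for illustration); real exam scores would replace them.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the placeholder data is reproducible

# ASSUMPTION: placeholder scores out of 100 for 50 students of one college.
college_scores = [round(rng.uniform(35, 95), 1) for _ in range(50)]

mean_score = statistics.mean(college_scores)
std_sample = statistics.stdev(college_scores)       # sample std (divides by n - 1)
std_population = statistics.pstdev(college_scores)  # population std (divides by n)

print(f"mean = {mean_score:.2f}")
print(f"sample std = {std_sample:.2f}, population std = {std_population:.2f}")
```

Because the 50 students are a sample of a larger population, the sample standard deviation is usually the appropriate one to report. Running the same code on the multi-college sample allows the two means and spreads to be compared in the observations.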
3. Collect the month-wise COVID case data for the cities Ahmedabad, Vadodara, Rajkot, and Surat. Plot
this time series data. Analyze the trend over time.
4. There is a need to advise 12th standard students on which college they should choose for
engineering education. Decide the features to use for grading engineering colleges. Prepare the
data set. Write a program to apply the random forest algorithm and suggest the best-suited college for
12th standard students.
5. Write a program for the KNN algorithm to find the weight-lifting category for height 161 cm and weight 61 kg.
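A minimal sketch of a KNN classifier for this practical, assuming Python 3.8 or later. The training records and the category names "light" and "medium" are hypothetical placeholders, since no training data is given here; only the query point (161 cm, 61 kg) comes from the exercise.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training rows by Euclidean distance from the query point.
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# HYPOTHETICAL training data: ((height_cm, weight_kg), category).
train = [
    ((158, 58), "light"),
    ((160, 59), "light"),
    ((163, 61), "light"),
    ((165, 62), "light"),
    ((168, 66), "medium"),
    ((170, 68), "medium"),
    ((173, 72), "medium"),
    ((176, 75), "medium"),
]

prediction = knn_predict(train, (161, 61), k=3)
print(f"Predicted category for 161 cm, 61 kg: {prediction}")
```

An odd `k` avoids tied votes when there are two categories. With real data, the features may need to be scaled first so that height and weight contribute comparably to the distance.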
6. Take the student data prepared in exercise 1. Visualize the data to show region-wise results,
branch-wise results, and subject-wise results. Choose the visualization technique best suited to
each view, for example bar chart, pie chart, map, or scatter plot.