CCS346 Exploratory Data Analysis

NOTES AND QUESTION BANK
L T P C: 2 0 2 3

COURSE OBJECTIVES:
• To outline an overview of exploratory data analysis.
• To implement data visualization using Matplotlib.
• To perform univariate data exploration and analysis.
• To apply bivariate data exploration and analysis.
• To use data exploration and visualization techniques for multivariate and time series data.

UNIT I EXPLORATORY DATA ANALYSIS 6


EDA fundamentals – Understanding data science – Significance of EDA – Making
sense of data – Comparing EDA with classical and Bayesian analysis – Software
tools for EDA – Visual aids for EDA – Data transformation techniques: merging
databases, reshaping and pivoting, transformation techniques.

UNIT II EDA USING PYTHON 6


Data Manipulation using Pandas – Pandas Objects – Data Indexing and Selection –
Operating on Data – Handling Missing Data – Hierarchical Indexing – Combining
datasets – Concat, Append, Merge and Join – Aggregation and grouping – Pivot
Tables – Vectorized String Operations.
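
As an illustration of several Unit II topics (handling missing data, vectorized string operations, merging, grouping, and pivot tables), the following is a minimal sketch. The `sales` and `stores` tables and all of their values are hypothetical, invented only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [100.0, 120.0, 80.0, np.nan, 60.0],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": [" chennai", "madurai ", "salem"],
})

# Handling missing data: treat the missing revenue as 0 (an assumption
# made for this example; other strategies may suit real data better).
sales["revenue"] = sales["revenue"].fillna(0.0)

# Vectorized string operations: trim whitespace and capitalize city names.
stores["city"] = stores["city"].str.strip().str.title()

# Merge: attach each sale to its store's city (left join on "store").
merged = sales.merge(stores, on="store", how="left")

# Aggregation and grouping: total revenue per city.
revenue_by_city = merged.groupby("city")["revenue"].sum()

# Pivot table: cities as rows, quarters as columns, summed revenue as cells.
pivot = merged.pivot_table(
    index="city", columns="quarter", values="revenue", aggfunc="sum"
)

print(revenue_by_city)
print(pivot)
```

Cells with no underlying rows (here, a city with no sale in a given quarter) appear as NaN in the pivot table, which is itself a useful signal during exploration.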

UNIT III UNIVARIATE ANALYSIS 6


Introduction to Single Variable: Distributions and Variables – Numerical Summaries of
Level and Spread – Scaling and Standardizing – Inequality.
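
To make the Unit III topics of level, spread, and standardizing concrete, here is a small sketch using pandas. The `incomes` values are hypothetical, and the choice of median with midspread (interquartile range) alongside mean with standard deviation is one common pairing, not necessarily the one used in the course.

```python
import pandas as pd

# Hypothetical incomes, invented for illustration; note the one large value.
incomes = pd.Series([12, 15, 18, 20, 22, 25, 30, 90], name="income")

# Level: the median is resistant to the extreme value, the mean is not.
median = incomes.median()
mean = incomes.mean()

# Spread: midspread (interquartile range) and sample standard deviation.
q1 = incomes.quantile(0.25)
q3 = incomes.quantile(0.75)
midspread = q3 - q1
std = incomes.std()

# Standardizing: z-scores have mean 0 and standard deviation 1.
z_scores = (incomes - mean) / std

# A resistant alternative: center on the median, scale by the midspread.
resistant_scores = (incomes - median) / midspread

print(f"median={median}, mean={mean}, midspread={midspread}, std={std:.2f}")
print(z_scores.round(2).tolist())
```

Comparing the mean with the median on the same data shows how a single extreme value pulls the mean upward while leaving the median almost unchanged.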

UNIT IV BIVARIATE ANALYSIS 6


Relationships between Two Variables - Percentage Tables - Analysing
Contingency Tables - Handling Several Batches - Scatterplots and Resistant Lines.
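
The percentage tables and contingency tables named in Unit IV can be sketched with `pandas.crosstab`. The `survey` data below is hypothetical and exists only to show the mechanics.

```python
import pandas as pd

# Hypothetical survey responses, invented for illustration only.
survey = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "F"],
    "votes": ["yes", "no", "yes", "no", "no", "yes", "no", "yes"],
})

# Contingency table of raw counts, with row and column totals ("All").
counts = pd.crosstab(survey["gender"], survey["votes"], margins=True)

# Percentage table: each row is normalized to sum to 100, so rows can be
# compared even when the groups differ in size.
row_pct = pd.crosstab(
    survey["gender"], survey["votes"], normalize="index"
) * 100

print(counts)
print(row_pct)
```

Normalizing by row answers "of each gender, what share voted yes?"; passing `normalize="columns"` instead would answer "of each vote, what share came from each gender?".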

UNIT V MULTIVARIATE AND TIME SERIES ANALYSIS 6


Introducing a Third Variable - Causal Explanations - Three-Variable Contingency
Tables and Beyond – Fundamentals of TSA – Characteristics of time series data –
Data Cleaning – Time-based indexing – Visualizing – Grouping – Resampling.
30 PERIODS
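
The time series topics in Unit V (data cleaning, time-based indexing, grouping, and resampling) can be sketched in pandas as follows. The `readings` series is synthetic, and linear interpolation is only one possible cleaning choice, assumed here for simplicity.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings for 60 days, invented for illustration only.
index = pd.date_range("2023-01-01", periods=60, freq="D")
readings = pd.Series(np.arange(60, dtype=float), index=index, name="reading")

# Simulate a missing observation, then clean it by linear interpolation.
readings.iloc[10] = np.nan
cleaned = readings.interpolate()

# Time-based indexing: select all of January with a partial date string.
january = cleaned.loc["2023-01"]

# Resampling: convert daily data to monthly means ("MS" = month start).
monthly_mean = cleaned.resample("MS").mean()

# Grouping: average reading for each day of the week (0 = Monday).
by_weekday = cleaned.groupby(cleaned.index.dayofweek).mean()

print(monthly_mean)
print(by_weekday)
```

Resampling changes the frequency of the series along the time axis, whereas grouping by day of week pools observations from different weeks into the same bucket.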

PRACTICAL EXERCISES:

30 PERIODS
1. Install a data analysis and visualization tool: R / Python / Tableau Public / Power BI.
2. Perform exploratory data analysis (EDA) on a dataset such as an email dataset.
Export all your emails as a dataset, import them into a pandas DataFrame,
visualize them, and draw insights from the data.
3. Work with NumPy arrays and pandas DataFrames, and create basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply
various plot features in R to sample datasets and visualize them.
5. Perform time series analysis and apply various visualization techniques.
6. Perform data analysis and representation on a map using various map
datasets, with mouse rollover effects, user interaction, etc.

Course Outcomes:

At the end of this course, the students will be able to:

1. Understand the fundamentals of exploratory data analysis.
2. Implement data visualization using Matplotlib.
3. Perform univariate data exploration and analysis.
4. Apply bivariate data exploration and analysis.
5. Use data exploration and visualization techniques for multivariate and time series data.

Text Books:
1. Suresh Kumar Mukhiya, Usman Ahmed, Hands-On Exploratory Data Analysis with Python,
Packt Publishing, 2020. (Unit 1)
2. Jake VanderPlas, Python Data Science Handbook: Essential Tools for Working with Data,
First Edition, O'Reilly, 2017. (Unit 2)
3. Catherine Marsh, Jane Elliott, Exploring Data: An Introduction to Data Analysis for Social
Scientists, Wiley Publications, 2nd Edition, 2008. (Units 3, 4, 5)
