Unit 3 Notes

Exploratory data analysis (EDA) refers to analyzing data sets to understand their key characteristics and relationships. The goals of EDA include data cleaning, descriptive statistics, data visualization, feature engineering, identifying correlations and relationships, data segmentation, hypothesis generation, and data quality assessment. There are different types of EDA, including univariate analysis of single variables, bivariate analysis of relationships between pairs of variables, multivariate analysis of interactions between multiple variables, and time series analysis of data with a temporal component. EDA techniques help explore data, discover patterns, and gain insights to inform further formal statistical analysis or modeling.

Uploaded by

patilamrutak2003

Typical data format and the types of EDA

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) refers to the method of studying and exploring data sets to understand their main characteristics, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking more formal statistical analyses or modeling.

The Main Goals of EDA

1. Data Cleaning: EDA involves examining the data for errors, missing values, and inconsistencies. It includes techniques such as data imputation, handling missing values, and identifying and removing outliers.
2. Descriptive Statistics: EDA uses descriptive statistics to understand the central tendency, variability, and distribution of variables. Measures such as the mean, median, mode, standard deviation, range, and percentiles are commonly used.
3. Data Visualization: EDA uses visual techniques to represent the data graphically. Visualizations such as histograms, box plots, scatter plots, line plots, heatmaps, and bar charts help identify patterns, trends, and relationships within the data.
4. Feature Engineering: EDA allows exploration of variables and their transformations to create new features or derive meaningful insights. Feature engineering can involve scaling, normalization, binning, encoding categorical variables, and creating interaction or derived variables.
5. Correlation and Relationships: EDA helps discover relationships and dependencies between variables. Techniques such as correlation analysis, scatter plots, and cross-tabulations offer insights into the strength and direction of relationships between variables.
6. Data Segmentation: EDA can involve dividing the data into meaningful segments based on certain criteria or characteristics. This segmentation helps gain insights into specific subgroups within the data and can lead to more focused analysis.
7. Hypothesis Generation: EDA aids in generating hypotheses or research questions based on the initial exploration of the data. It helps form the foundation for further analysis and model building.
8. Data Quality Assessment: EDA allows for assessing the quality and reliability of the data. It involves checking for data integrity, consistency, and accuracy to make sure the data is suitable for analysis.

Types of EDA

Depending on the number of columns (variables) being analyzed, EDA can be divided into univariate, bivariate, and multivariate analysis.
EDA, or Exploratory Data Analysis, refers to the process of examining and analyzing data sets to uncover patterns, identify relationships, and gain insights. Various types of EDA techniques can be employed, depending on the nature of the data and the goals of the analysis. Here are some common types of EDA:
1. Univariate Analysis: This type of analysis focuses on individual variables in the data set. It involves summarizing and visualizing a single variable at a time to understand its distribution, central tendency, spread, and other relevant statistics. Techniques such as histograms, box plots, bar charts, and summary statistics are commonly used in univariate analysis.
2. Bivariate Analysis: Bivariate analysis involves exploring the relationship between two variables. It helps find associations, correlations, and dependencies between pairs of variables. Scatter plots, line plots, correlation matrices, and cross-tabulation are commonly used techniques in bivariate analysis.
3. Multivariate Analysis: Multivariate analysis extends bivariate analysis to include more than two variables. It aims to understand the complex interactions and dependencies among multiple variables in a data set. Techniques such as heatmaps, parallel coordinates, factor analysis, and principal component analysis (PCA) are used for multivariate analysis.
4. Time Series Analysis: This type of analysis is mainly applied to data sets that have a temporal component. Time series analysis involves examining and modeling patterns, trends, and seasonality in the data over time. Techniques such as line plots, autocorrelation analysis, moving averages, and ARIMA (AutoRegressive Integrated Moving Average) models are commonly used in time series analysis.
5. Missing Data Analysis: Missing data is a common issue in datasets, and it can affect the reliability and validity of the analysis. Missing data analysis involves identifying missing values, understanding the patterns of missingness, and using suitable techniques to handle missing data. Techniques such as examining missing-data patterns, imputation methods, and sensitivity analysis are used in missing data analysis.
6. Outlier Analysis: Outliers are data points that deviate substantially from the general pattern of the data. Outlier analysis involves identifying outliers and understanding their presence, their possible causes, and their impact on the analysis. Techniques such as box plots, scatter plots, z-scores, and clustering algorithms are used for outlier analysis.
7. Data Visualization: Data visualization is a critical part of EDA that involves creating visual representations of the data to facilitate understanding and exploration. Various visualization techniques, such as bar charts, histograms, scatter plots, line plots, heatmaps, and interactive dashboards, are used to represent different kinds of data.
These are just a few examples of the types of EDA techniques that can be employed during data analysis. The choice of techniques depends on the characteristics of the data, the research questions, and the insights sought from the analysis.
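Of the techniques listed under time series analysis, moving averages and autocorrelation are simple enough to sketch directly. The code below is a minimal standard-library Python illustration on a hypothetical 12-month series with a pattern that repeats every 4 months; it does not attempt ARIMA modeling, which is normally done with a dedicated statistics package. The function names are illustrative.

```python
def moving_average(series, window):
    """Trailing simple moving average; the first window-1 positions have no value (None)."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    averages = [None] * (window - 1)
    for end in range(window, len(series) + 1):
        averages.append(sum(series[end - window:end]) / window)
    return averages


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`: the sum of lagged products of deviations
    from the mean, divided by the total sum of squared deviations."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    xbar = sum(series) / n
    denom = sum((x - xbar) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - xbar) * (series[t + lag] - xbar) for t in range(n - lag))
    return num / denom


# Hypothetical monthly values with a pattern that repeats every 4 months.
sales = [10, 20, 30, 20, 10, 20, 30, 20, 10, 20, 30, 20]

print(moving_average(sales, 4))             # [None, None, None, 20.0, 20.0, ...]
print(round(autocorrelation(sales, 4), 3))  # 0.667: strong positive at the seasonal lag
print(round(autocorrelation(sales, 2), 3))  # -0.833: values half a cycle apart move oppositely
```

A moving-average window equal to the seasonal period (4 here) averages the repeating pattern away entirely, and the large autocorrelation at lag 4 is the numeric signature of that seasonality.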

OBJECTIVES OF EXPLORATORY DATA ANALYSIS

The objectives of exploratory data analysis include, but are not limited to:

1. identifying data outliers,
2. identifying trends in time and space,
3. detecting patterns of interest,
4. generating hypotheses,
5. opening opportunities for new ways to collect data, and
6. enabling hypothesis testing through experiments.
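The first objective, identifying outliers, is often approached with two simple rules: flag values with a large z-score, or flag values outside Tukey's fences, Q1 - 1.5×IQR and Q3 + 1.5×IQR, which is the rule many boxplot implementations use for their whiskers. The sketch below uses only Python's `statistics` module on hypothetical readings; the thresholds (3 and 2.5 for z-scores, k = 1.5 for the IQR rule) are common conventions rather than fixed standards, and the function names are illustrative.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` (population) standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; the input need not be sorted
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious value.
readings = [12, 13, 12, 14, 13, 15, 14, 13, 12, 48]

print(iqr_outliers(readings))                    # [48]
print(zscore_outliers(readings, threshold=2.5))  # [48]
print(zscore_outliers(readings))                 # []
```

The last call shows a known weakness of z-scores on small samples: with the population standard deviation, no value in a sample of size n can have |z| larger than sqrt(n - 1), so a cutoff of 3 can never trigger when n is 10 or fewer. Quartile conventions also differ between tools (`statistics.quantiles` defaults to its "exclusive" method), so IQR fences computed elsewhere may shift slightly.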

TYPES OF EXPLORATORY DATA ANALYSIS:

1. Univariate Non-graphical
2. Multivariate Non-graphical
3. Univariate graphical
4. Multivariate graphical
1. Univariate Non-graphical: This is the simplest form of data analysis, because it uses just one variable to examine the data. The standard goal of univariate non-graphical EDA is to understand the underlying sample distribution of the data and make observations about the population. Outlier detection is also part of the analysis. The characteristics of the population distribution include:
• Central tendency: The central tendency or location of a distribution has to do with its typical or middle values. The commonly useful measures of central tendency are the statistics called the mean, the median, and sometimes the mode, of which the most common is the mean. For a skewed distribution, or when there is concern about outliers, the median may be preferred.
• Spread: Spread indicates how far from the center the data values tend to lie. The standard deviation and the variance are two useful measures of spread. The variance is the mean of the squared deviations from the mean (for a sample, the sum of squared deviations is usually divided by n - 1 instead of n), and the standard deviation is the square root of the variance.
• Skewness and kurtosis: Two more useful univariate descriptors are the skewness and kurtosis of the distribution. Skewness is a measure of asymmetry, and kurtosis is a more subtle measure of peakedness and tail weight compared with a normal distribution.
2. Multivariate Non-graphical: Multivariate non-graphical EDA techniques are generally used to show the relationship between two or more variables, in the form of either cross-tabulation or statistics.
• For categorical data, an extension of tabulation called cross-tabulation is extremely useful. For two variables, cross-tabulation is performed by making a two-way table with column headings that match the levels of one variable and row headings that match the levels of the other variable, then filling in the counts of all subjects that share the same pair of levels.
• For one categorical variable and one quantitative variable, we compute statistics for the quantitative variable separately for each level of the categorical variable, then compare the statistics across the levels of the categorical variable.
• Comparing the means is an informal version of ANOVA, and comparing medians is a robust informal version of one-way ANOVA.
3. Univariate graphical: Non-graphical methods are quantitative and objective, but they cannot give a complete picture of the data; therefore graphical methods, which involve a degree of subjective analysis, are also required. Common types of univariate graphics are:
• Histogram: The most basic graph is the histogram, a bar plot in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values. Histograms are one of the simplest ways to quickly learn a lot about your data, including central tendency, spread, modality, shape, and outliers.
• Stem-and-leaf plots: A simple substitute for a histogram is the stem-and-leaf plot. It shows all data values as well as the shape of the distribution.
• Boxplots: Another very useful univariate graphical technique is the boxplot. Boxplots are excellent at presenting information about central tendency; they show robust measures of location and spread and also provide information about symmetry and outliers, although they can be misleading about aspects such as multimodality. One of the best uses of boxplots is in the form of side-by-side boxplots.
• Quantile-normal plots: The last univariate graphical EDA technique is the most intricate. It is called the quantile-normal or QN plot or, more generally, the quantile-quantile or QQ plot. It is used to see how well a particular sample follows a particular theoretical distribution. It allows detection of non-normality and diagnosis of skewness and kurtosis.
4. Multivariate graphical: Multivariate graphical EDA uses graphics to display relationships between two or more sets of data. The most commonly used one is the grouped barplot, with each group representing one level of one of the variables and each bar within a group representing a level of the other variable.
Other common types of multivariate graphics are:
• Scatterplot: For two quantitative variables, the essential graphical EDA technique is the scatterplot, which has one variable on the x-axis, one on the y-axis, and a point for every case in the dataset.
• Run chart: A line graph of data plotted over time.
• Heat map: A graphical representation of data in which values are depicted by color.
• Multivariate chart: A graphical representation of the relationships between factors and a response.
• Bubble chart: A data visualization that displays multiple circles (bubbles) in a two-dimensional plot, where the bubble size typically encodes a third variable.
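The non-graphical techniques described above (central tendency, spread, skewness and kurtosis, cross-tabulation, and per-level comparison of means and medians) can be sketched with Python's standard library. This is a minimal illustration on hypothetical toy data with illustrative function names. Skewness and kurtosis here use simple moment estimators without the small-sample corrections some packages apply, so other tools may report slightly different numbers.

```python
from collections import Counter, defaultdict
from statistics import mean, median, mode, pvariance, stdev, variance


def moment_summary(xs):
    """Univariate non-graphical summary of a list of numbers.

    Skewness is m3 / m2**1.5 and excess kurtosis is m4 / m2**2 - 3, where
    m_k is the mean of the k-th powers of deviations (no bias correction).
    """
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((x - xbar) ** 2 for x in xs) / n
    m3 = sum((x - xbar) ** 3 for x in xs) / n
    m4 = sum((x - xbar) ** 4 for x in xs) / n
    return {
        "mean": xbar,
        "median": median(xs),
        "mode": mode(xs),
        "variance (n)": pvariance(xs),    # mean of squared deviations
        "variance (n-1)": variance(xs),   # usual sample variance
        "std dev (n-1)": stdev(xs),       # square root of the sample variance
        "range": max(xs) - min(xs),
        "skewness": m3 / m2 ** 1.5,
        "excess kurtosis": m4 / m2 ** 2 - 3,
    }


def crosstab(pairs):
    """Two-way table of counts: rows are levels of the first variable, columns of the second."""
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def summarize_by_level(pairs):
    """Mean and median of a quantitative variable for each level of a categorical one."""
    groups = defaultdict(list)
    for level, value in pairs:
        groups[level].append(value)
    return {lvl: (mean(vals), median(vals)) for lvl, vals in sorted(groups.items())}


# Hypothetical toy data, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
answers = [("A", "yes"), ("A", "no"), ("B", "yes"), ("A", "yes"), ("B", "no"), ("B", "no")]
measurements = [("A", 10), ("A", 12), ("A", 30), ("B", 11), ("B", 13), ("B", 12)]

summary = moment_summary(scores)
table = crosstab(answers)
by_level = summarize_by_level(measurements)

for name, value in summary.items():
    print(f"{name:>16}: {value:.4g}")
print(table)     # {'A': {'no': 1, 'yes': 2}, 'B': {'no': 2, 'yes': 1}}
print(by_level)
```

The positive skewness of `scores` (about 0.66) reflects the longer right tail created by 7 and 9. In `by_level`, level A's mean is pulled upward by the single value 30 while both medians are 12, which is why comparing medians serves as the robust counterpart to comparing means.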
