Exploratory Data Analysis_v4_part2

Uploaded by

ahmedpandit48

Exploratory Data Analysis
Benefits, Techniques, and Examples
Part 2
Recall

Identifying the Right Data

Clean the Data

What is in the data?
Bivariate Analysis
• A statistical method used in data analysis to examine and understand
the relationship, association, or interaction between two different
variables.
• It involves the simultaneous analysis of two variables, which can be
numerical or categorical.
• It explores how changes in one variable are associated with changes
in another variable.
Types of Bivariate Analysis
Bivariate analysis involves examining the relationships or associations
between two different variables.
The type of bivariate analysis you choose depends on the nature of the
variables you're working with.
Here are the common types of bivariate analysis:
1. Numerical-Numerical Analysis
2. Categorical-Categorical Analysis
3. Categorical-Numerical Analysis
Categorical-Categorical Analysis
• Examines the relationship between two categorical variables.
• Used to determine whether the two variables are associated (dependent).
• Common techniques include contingency tables and chi-square tests.
Categorical-Categorical Analysis
Contingency Table
• Choose variables
• Create a Contingency Table (also called a cross-tabulation or crosstab)
to display the distribution of one categorical variable in relation to the other.
Rows represent categories of one variable, and columns represent categories
of the other variable.
The values in the table are the counts or frequencies of observations falling
into each combination of categories.
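The counting step described above can be sketched in a few lines of Python. This is only an illustration: the `observations` list is hypothetical data invented for the example, and `contingency_table` is a helper name introduced here, not something from the slides.

```python
from collections import Counter

# Hypothetical observations: (gender, preferred product) pairs, invented for illustration.
observations = [
    ("Female", "A"), ("Male", "B"), ("Female", "A"), ("Male", "A"),
    ("Female", "B"), ("Male", "B"), ("Male", "B"), ("Female", "A"),
]

def contingency_table(pairs):
    """Count observations for every (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

rows, cols, table = contingency_table(observations)

# Print the crosstab: one row per category of the first variable.
print(" " * 8 + "  ".join(f"{c:>3}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "  ".join(f"{table[r][c]:>3}" for c in cols))
```

Rows hold the categories of the first variable and columns those of the second, matching the layout described on the slide.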
Categorical-Categorical Analysis
Chi-Square Test
• Create Contingency Table
• Calculate Expected Frequencies
• For each cell, under the assumption of independence: expected = (row total × column total) / grand total.
• Perform the Chi-Square Test
• The chi-square test for independence is used to determine whether there is a
significant association between the two categorical variables.
• It tests the null hypothesis that the variables are independent.
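A minimal sketch of these steps in Python, assuming a hypothetical 2x2 table of counts (the numbers are invented for illustration). It computes the expected frequencies and the chi-square statistic without a continuity correction. The p-value line relies on a closed form that holds only for 1 degree of freedom; for larger tables a statistics library would normally be used instead.

```python
import math

# Hypothetical 2x2 contingency table (rows: Female/Male, columns: product A/B).
observed = [[30, 10],
            [20, 40]]

def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = chi_square_independence(observed)

# For df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value}")
```

With these made-up counts the statistic is large and the p-value is far below 0.05, so the null hypothesis of independence would be rejected.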
Steps for Categorical-Categorical
Analysis
• Interpret the Results
Examine the chi-square statistic and its associated p-value.
If the p-value is less than a chosen significance level (e.g., 0.05), you can
reject the null hypothesis and conclude that there is a statistically significant
association between the two variables.
Categorical-Numerical Analysis
• Explore the relationship/association between a categorical variable and a numerical variable
• Determine if there are statistically significant differences in the numerical variable across different categories

ID  Education  Gender  Income
1   FSc        Female   30,000
2   BS         Male     50,000
3   B.Ed       Male     45,000
4   MS         Female   90,000
5   FSc        Male     32,000
6   Ph.D.      Male    150,000
7   BS         Male     80,000
8   FSc        Male     37,000
9   Ph.D.      Female  220,000

Is there a significant difference in Income based on education level?

What is the relationship between Gender and Income?


Categorical-Numerical Analysis
Types

• Descriptive Analysis
• Summarize the numerical variable’s statistics within each category
• Inferential Analysis
• Use statistical tests
• Such as T-Test, ANOVA
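As a sketch of the descriptive step, the snippet below summarizes Income within each category, using the nine rows of the Education/Gender/Income example table from the earlier slide. The helper name `summarize_by` is an assumption introduced for this illustration.

```python
from collections import defaultdict
from statistics import mean

# (Education, Gender, Income) rows copied from the example table.
records = [
    ("FSc", "Female", 30000), ("BS", "Male", 50000), ("B.Ed", "Male", 45000),
    ("MS", "Female", 90000), ("FSc", "Male", 32000), ("Ph.D.", "Male", 150000),
    ("BS", "Male", 80000), ("FSc", "Male", 37000), ("Ph.D.", "Female", 220000),
]

def summarize_by(records, key_index, value_index=2):
    """Group the numerical column by a categorical column and summarize each group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key_index]].append(row[value_index])
    return {
        category: {"n": len(v), "mean": mean(v), "min": min(v), "max": max(v)}
        for category, v in groups.items()
    }

by_education = summarize_by(records, key_index=0)
by_gender = summarize_by(records, key_index=1)

for category, stats in by_education.items():
    print(f"{category:<6} n={stats['n']} mean={stats['mean']:.0f}")
```

Summaries like these only describe the sample; whether the group differences are statistically significant is what the inferential tests (t-test, ANOVA) address.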
Categorical-Numerical Analysis
ANOVA

• ANOVA: Analysis of Variance

• Statistical method to analyze differences among group means in a
sample.
• Tests for statistically significant differences between the means of three or more
independent groups.
• Assumes normally distributed data and independence of
observations
Categorical-Numerical Analysis
ANOVA: Purpose

• Test the null hypothesis (H0): the means of several groups are equal

• No significant difference between the groups
• Variation in the data is caused either by differences between groups or by random
fluctuations within groups
Categorical-Numerical Analysis
ANOVA: Types

• One-Way ANOVA
• 1 categorical (independent) variable with 3 or more groups
• Two-Way ANOVA
• 2 categorical (independent) variables
• Analyze individual and interactive effects on the dependent variable
Categorical-Numerical Analysis
ANOVA: F-Statistic

• Variance between group means VS variance within groups

• A larger F-statistic and a smaller p-value indicate significant differences between groups
• Calculation of F-Statistic:
• Explained Variation – variation between groups
• Also known as “between-group Sum of Squares (SSB)” or “Treatment Sum of Squares
(SST)”
• Measure of difference between group means vs overall mean
Categorical-Numerical Analysis
ANOVA: Calculating F-Statistic

1. Explained Variation – variation between groups

• Also known as “between-group Sum of Squares (SSB)” or “Treatment Sum of
Squares (SST)”
• Measure of difference between group means vs overall mean
Categorical-Numerical Analysis
ANOVA: Calculating F-Statistic

2. Unexplained Variation – variation within groups

• Also known as “within-group Sum of Squares (SSW)” or “Error Sum of Squares
(SSE)”
• Measure of difference between individual data points within each group vs
group mean
Categorical-Numerical Analysis
ANOVA: Calculating F-Statistic

3. Degrees of Freedom (df)

• Degrees of Freedom for the between-group variation (dfB):
• Number of groups(k)
• dfB = k-1
• Degrees of Freedom for the within-group variation (dfW):
• Total Number of Observations(N)
• dfW = N-k
Categorical-Numerical Analysis
ANOVA: Calculating F-Statistic

4. F-Statistic = (SSB/dfB) / (SSW/dfW)

Next for ANOVA:

• Use a Cumulative Distribution Function (CDF) for the F-distribution to
find the p-value associated with F-Statistic.
• Compare p-value to the threshold.
• If p-value is less than the threshold, H0 is rejected.
Categorical-Numerical Analysis
ANOVA: Example

Step 1:
• Null Hypothesis (H0): There is no significant difference in the test
scores among the three schools (μa = μb = μc).
• Alternative Hypothesis (Ha): There is a significant difference in the
test scores among at least one pair of schools.
Categorical-Numerical Analysis
ANOVA: Example

Step 2:
• Test Scores for students from 3 schools
• School A:[85,88,90,82,89]
• School B:[78,81,83,80,85]
• School C:[92, 89, 94, 88, 91]
Categorical-Numerical Analysis
ANOVA: Example

Step 3:
• Mean for each school:
• μa: (85+88+90+82+89)/5 = 86.8
• μb : (78+81+83+80+85)/5 = 81.4
• μc: (92+89+94+88+91)/5 = 90.8
Categorical-Numerical Analysis
ANOVA: Example

Step 4:
• Calculate the overall mean (μ) for all test scores:
• Overall Mean (μ) = (86.8 + 81.4 + 90.8) / 3 = 86.33
Categorical-Numerical Analysis
ANOVA: Example

Step 5:
• Calculate the Between-Group Variation (SSB)

• $SSB = \sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2$
• Where: k = number of groups

• $n_j$ = size of group j
• $\bar{y}_j$ = mean of data items in group j
• $\bar{y}$ = mean of all data items in the dataset
• SSB = (5 * (86.8 - 86.33)²) + (5 * (81.4 - 86.33)²) + (5 * (90.8 - 86.33)²)
≈ 222.53
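The between-group sum of squares for the three schools can be checked numerically. The sketch below uses only the scores listed in Step 2 and unrounded means. It evaluates to about 222.53.

```python
from statistics import mean

# Test scores from Step 2 of the example.
schools = {
    "A": [85, 88, 90, 82, 89],
    "B": [78, 81, 83, 80, 85],
    "C": [92, 89, 94, 88, 91],
}

# Overall mean across all 15 scores.
all_scores = [score for scores in schools.values() for score in scores]
grand_mean = mean(all_scores)

# SSB: group size times the squared distance of each group mean from the overall mean.
ssb = sum(len(scores) * (mean(scores) - grand_mean) ** 2 for scores in schools.values())

print(f"overall mean = {grand_mean:.2f}, SSB = {ssb:.2f}")
```

Because every school has the same number of students, the overall mean of all scores equals the mean of the three school means (86.33).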
Categorical-Numerical Analysis
ANOVA: Example

Step 6:
• Calculate the Within-Group Variation (SSW)

• $SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2$
• Where: k = number of groups

• $n_j$ = size of group j
• $\bar{y}_j$ = mean of data items in group j
• $y_{ij}$ = ith observation in group j

• SSW = (85 - 86.8)² + (88 - 86.8)² + … = ?
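One way to work out the within-group sum of squares is to sum the squared deviations school by school, as sketched below in Python with the Step 2 scores. The helper name `within_group_ss` is introduced here for illustration.

```python
from statistics import mean

# Test scores from Step 2 of the example.
schools = {
    "A": [85, 88, 90, 82, 89],
    "B": [78, 81, 83, 80, 85],
    "C": [92, 89, 94, 88, 91],
}

def within_group_ss(scores):
    """Sum of squared deviations of each score from its own group mean."""
    group_mean = mean(scores)
    return sum((score - group_mean) ** 2 for score in scores)

ssw_by_school = {name: within_group_ss(scores) for name, scores in schools.items()}
ssw = sum(ssw_by_school.values())

for name, value in ssw_by_school.items():
    print(f"School {name}: {value:.1f}")
print(f"SSW = {ssw:.1f}")
```

The three contributions come to roughly 42.8, 29.2 and 22.8, giving SSW ≈ 94.8.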

Categorical-Numerical Analysis
ANOVA: Example

Step 7:
• Calculate Degrees of Freedom
• dfB = k-1 = ?
• dfW = N-k = ?
Categorical-Numerical Analysis
ANOVA: Example

Step 8:
• Calculate F-Statistic:
• F = (SSB / dfB) / (SSW / dfW) = ?
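Steps 5 to 8 can be combined into one small function. The following is a sketch, not a library API: `one_way_anova` is a name introduced here, and it simply applies F = (SSB / dfB) / (SSW / dfW) to the three schools' scores.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, dfB, dfW, F) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_b, df_w = k - 1, n_total - k
    f_stat = (ssb / df_b) / (ssw / df_w)
    return ssb, ssw, df_b, df_w, f_stat

school_a = [85, 88, 90, 82, 89]
school_b = [78, 81, 83, 80, 85]
school_c = [92, 89, 94, 88, 91]

ssb, ssw, df_b, df_w, f_stat = one_way_anova([school_a, school_b, school_c])
print(f"dfB = {df_b}, dfW = {df_w}, F = {f_stat:.2f}")
```

For this data the function gives dfB = 2, dfW = 12 and an F-statistic of about 14.08.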
Categorical-Numerical Analysis
ANOVA: Example

Step 9:
• Identify the critical F-value for the chosen threshold (e.g. α = 0.05)
• For α = 0.05, (numerator) dfB=2, (denominator) dfW=12, the critical F-value is
~3.8853
• Calculate the p-value using the F-distribution.
• Get the CDF of the F-distribution
• p_value = 1 - cumulative probability

# Calculate the p-value
p_value <- 1 - pf(F, dfB, dfW)

• For F=3.8853 (the critical value), dfB=2, dfW=12, the calculated p-value is 0.04999981, i.e. ≈ α
• For the F-statistic of the three-school data (F ≈ 14.08), the p-value is ≈ 0.0007
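The p-value step can also be checked without statistical tables. The Python standard library has no F-distribution CDF, so this sketch relies on a special case: when the numerator degrees of freedom equal 2 (as here, with three groups), the upper-tail probability has the closed form P(F > f) = (1 + 2f/dfW)^(-dfW/2). The function name is introduced for illustration, and F ≈ 14.0844 is the value obtained from the three schools' scores. For other degrees of freedom a statistics library would be the usual choice.

```python
def f_upper_tail_df2(f_stat, df_w):
    """P(F > f_stat) for an F-distribution with 2 and df_w degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_w) ** (-df_w / 2)

alpha = 0.05
p_at_critical = f_upper_tail_df2(3.8853, 12)   # critical F-value for alpha = 0.05
p_example = f_upper_tail_df2(14.0844, 12)      # F computed from the schools data

print(f"p at critical value = {p_at_critical:.6f}")
print(f"p for example data  = {p_example:.6f}")
print("reject H0" if p_example <= alpha else "fail to reject H0")
```

The critical value 3.8853 maps back to a p-value of about 0.05, while the example's F-statistic gives a p-value near 0.0007, well below α.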

Categorical-Numerical Analysis
F-Table
Categorical-Numerical Analysis
ANOVA: Example

Step 10:
• Compare the p-value to the chosen significance level (α)
• If the p-value ≤ α
• Reject the null hypothesis (Ha is supported).
• If the p-value > α
• Fail to reject the null hypothesis (there is not enough evidence against H0).
Categorical-Numerical Analysis
ANOVA: Example

Step 11:
• Interpret the Results:
• Since H0 is rejected, we conclude that there is a significant difference in the
test scores among at least one pair of schools.
Categorical-Numerical Analysis
T-Test

• Statistical hypothesis test

• Determines if there is a significant difference between means of two
groups.
• Useful for assessing whether the mean of a continuous variable in one group
differs from that in the other.
• Parametric Test: Assumes,
• Data is normally distributed
• Variances in the two groups are equal
Categorical-Numerical Analysis
T-Test: Types

• Independent Samples T-Test (Student’s T-Test)

• Two independent Groups
• Compare means of a variable between these groups
• Assess, if the means are significantly different from each other.
• Paired Samples T-Test
• One group of subjects
• Measure same variable for each subject twice, under different conditions or
different time points.
• Assess, if the means of the paired observations are significantly different.
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

• Independent Samples T-Test (Student’s T-Test)

• Determine if there is a significant difference in the average test scores of two
groups of students (Group A and Group B)
• Group A: [85, 88, 90, 82, 89]
• Group B: [78, 81, 83, 80, 85]

Score  Group
85     Group A
88     Group A
90     Group A
82     Group A
89     Group A
78     Group B
81     Group B
83     Group B
80     Group B
85     Group B
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 1: Formulate the Hypotheses

• Null Hypothesis (H0)
• means of test scores in Group A and Group B are equal ($\mu_A = \mu_B$)
• Alternate Hypothesis (Ha)
• means of test scores in Group A and Group B are not equal ($\mu_A \neq \mu_B$)
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 2: Calculate the Means

Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 3: Calculate the Variance and Standard Error

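• Sample variance of each group: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• Standard error: $SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = ?$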
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 4: Calculate the T-Statistic

The general formula for the t-statistic is $t = \frac{\bar{x} - \mu}{SE}$, where:
• $\bar{x}$ is the sample mean.
• μ is the hypothesized population mean (under the null hypothesis).
• SE is the standard error.

For two independent groups:
• $t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$
• $t$ represents the t-statistic.
• $\bar{x}_A$ and $\bar{x}_B$ are the sample means for Group A and Group B, respectively.
• $s_A^2$ and $s_B^2$ are the sample variances for Group A and Group B, respectively.
• $n_A$ and $n_B$ are the sample sizes for Group A and Group B, respectively.
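Steps 2 to 4 can be sketched in Python with the standard `statistics` module, using the Group A and Group B scores from the example. The standard error here is the square root of (variance A / size A + variance B / size B). Because both groups have five students, the pooled-variance form of Student's t-test gives the same value.

```python
from math import sqrt
from statistics import mean, variance

group_a = [85, 88, 90, 82, 89]
group_b = [78, 81, 83, 80, 85]

# Step 2: sample means.
mean_a, mean_b = mean(group_a), mean(group_b)

# Step 3: sample variances (n - 1 denominator) and the standard error of the difference.
var_a, var_b = variance(group_a), variance(group_b)
se = sqrt(var_a / len(group_a) + var_b / len(group_b))

# Step 4: t-statistic.
t_stat = (mean_a - mean_b) / se

print(f"means: {mean_a:.1f}, {mean_b:.1f}")
print(f"variances: {var_a:.1f}, {var_b:.1f}")
print(f"SE = {se:.3f}, t = {t_stat:.3f}")
```

With these scores the means are 86.8 and 81.4, the sample variances 10.7 and 7.3, and the t-statistic is about 2.846.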
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 5: Determine Degrees of Freedom

• $df = n_A + n_B - 2$
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 6: Find the Critical T-Value

• Based on chosen threshold (α) and degrees of freedom
• Use T-Table

Step 7: Calculate p-value

• Associated with the t-statistic using a t-distribution table
Categorical-Numerical Analysis
T-Test: Independent Samples T-Test (Example)

Step 8: Make a decision

• If |t-statistic| > critical t-value and p-value < α, then reject the null
hypothesis.
• If |t-statistic| ≤ critical t-value or p-value ≥ α, then fail to reject the
null hypothesis.
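Steps 6 to 8 can be illustrated as follows. This is a sketch under stated assumptions: t ≈ 2.846 and df = 8 are the values obtained from the Group A / Group B scores, 2.306 is the standard t-table critical value for α = 0.05 (two-tailed) with 8 degrees of freedom, and the p-value is approximated by numerically integrating the t-distribution density with Simpson's rule rather than read from a table. The function names are introduced for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

alpha = 0.05
t_stat, df = 2.846, 8      # from the Group A / Group B example
t_critical = 2.306         # t-table value for alpha = 0.05 (two-tailed), df = 8

p_value = two_sided_p_value(t_stat, df)
reject_h0 = abs(t_stat) > t_critical and p_value < alpha

print(f"p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Both criteria agree here: |t| exceeds the critical value and the p-value is below 0.05, so the null hypothesis of equal means is rejected.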
Categorical-Numerical Analysis
T-Test: Paired Samples T-Test (Example)

• Determine if there is a significant difference in the test scores of
students before and after a special tutoring program
• Pre-Tutor: [85, 88, 90, 82, 89]
• Post-Tutor: [78, 81, 83, 80, 85]

Score  When
85     Pre-Tutor
88     Pre-Tutor
90     Pre-Tutor
82     Pre-Tutor
89     Pre-Tutor
78     Post-Tutor
81     Post-Tutor
83     Post-Tutor
80     Post-Tutor
85     Post-Tutor
Categorical-Numerical Analysis
T-Test: Paired Samples T-Test (Example)

Step 1: Formulate the Hypotheses

• Null Hypothesis (H0)
• no significant difference between the mean test scores before tutoring and
after tutoring ($\mu_{pre} = \mu_{post}$)
• Alternate Hypothesis (Ha)
• a significant difference between the mean test scores before tutoring and
after tutoring ($\mu_{pre} \neq \mu_{post}$)
Categorical-Numerical Analysis
T-Test: Paired Samples T-Test (Example)

Step 2: Calculate the differences

• Differences d = [85-78, 88-81, 90-83, 82-80, 89-85] = [7, 7, 7, 2, 4]
Step 3: Calculate the Sample Mean and Sample SD

Step 4: Calculate t-statistic
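Steps 2 to 4 of the paired test can be sketched as below, using the Pre-Tutor and Post-Tutor scores from the example. The paired t-statistic is taken here as the mean difference divided by its standard error (sample SD of the differences over the square root of the number of pairs), which is the usual definition and is assumed to match the slide's formula.

```python
from math import sqrt
from statistics import mean, stdev

pre_tutor = [85, 88, 90, 82, 89]
post_tutor = [78, 81, 83, 80, 85]

# Step 2: per-student differences.
d = [pre - post for pre, post in zip(pre_tutor, post_tutor)]

# Step 3: sample mean and sample standard deviation (n - 1 denominator) of the differences.
mean_d = mean(d)
sd_d = stdev(d)

# Step 4: paired t-statistic.
t_paired = mean_d / (sd_d / sqrt(len(d)))

print(f"d = {d}")
print(f"mean = {mean_d:.2f}, SD = {sd_d:.3f}, t = {t_paired:.3f}")
```

For this data the mean difference is 5.4, the sample SD is about 2.302, and the t-statistic is about 5.245.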

Categorical-Numerical Analysis
T-Test: Paired Samples T-Test (Example)

Step 5: Calculate the degrees of freedom

Step 6: Find the Critical T-Value ($t_{\alpha/2,\,df}$)

Step 7: Calculate the p-value ($p_{paired}$)

Step 8: Make a decision

• If $|t_{paired}| > t_{\alpha/2,\,df}$ and $p_{paired} < \alpha$
• Reject H0
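A short sketch of Steps 5, 6 and 8 for the tutoring example. The assumptions are labeled in the comments: t ≈ 5.245 is the paired t-statistic obtained from the differences [7, 7, 7, 2, 4], and 2.776 is the standard t-table critical value for α = 0.05 (two-tailed) with 4 degrees of freedom. The helper `decide` is a name introduced here.

```python
def decide(t_stat, t_critical):
    """Compare |t| with the critical value and return the decision as text."""
    return "reject H0" if abs(t_stat) > t_critical else "fail to reject H0"

n_pairs = 5
df = n_pairs - 1        # Step 5: degrees of freedom for a paired test
t_paired = 5.245        # from the differences [7, 7, 7, 2, 4]
t_critical = 2.776      # Step 6: t-table value for alpha = 0.05 (two-tailed), df = 4

decision = decide(t_paired, t_critical)   # Step 8
print(f"df = {df}, |t| = {abs(t_paired):.3f}, critical = {t_critical}: {decision}")
```

Since 5.245 is well above 2.776, the pre- and post-tutoring means differ significantly for this sample. Note that in this data the scores are lower after tutoring.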
Multivariate Analysis
Next time!
