
Week 5 MKMN 1018 - SE

Uploaded by

abhiatricanada
Copyright
© © All Rights Reserved

MKMN 1018

Data Analytics and Data Mining


Schedule for Today

Agenda
• t test: 3 types
• ANOVA
• Linear Regression
• Test Statistic review
WEEK 5
T TEST
General Hypothesis Testing Steps

A hypothesis is a premise or a claim that we want to test.

1. Formulate the hypothesis
2. Choose the significance level α
3. Collect data
4. Select the appropriate test*
5. Calculate the test statistic
6. Determine the p value
7. Compare the p value to the chosen significance level α

*t value, z value, F value, χ² value
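As a rough illustration of the steps above, here is a minimal Python sketch using a one-sample z test, chosen only because its p value can be computed with the standard library. The course performs these calculations in Excel; the function names, the numbers, and the assumption that the population standard deviation is known are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Return (z, two-tailed p value) for a one-sample z test."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-tailed: probability of a value at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def decide(p_value, alpha):
    """Final step: reject H0 only when the p value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical numbers, for illustration only.
alpha = 0.05
z, p = one_sample_z_test(sample_mean=52, hypothesized_mean=50,
                         population_sd=10, n=100)
print(f"z = {z:.2f}, p = {p:.4f}, decision: {decide(p, alpha)}")
```

The decision rule is the same for every test in this deck: only the way the test statistic and p value are calculated changes.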


The type of test statistic you use depends on a couple of factors.
We will be using Excel to perform all calculations.

Type of Data: Categorical Data
• Business Objective: Comparison (comparing independent groups)
  Test Statistic: Proportion z test for binary categorical data
• Business Objective: Association (determining relationships)
  Test Statistic: Chi-Squared (χ²)
  - Goodness of fit (1 variable)
  - Test of independence (2 variables)

Type of Data: Numerical Data
• Business Objective: Comparison (comparing independent groups)
  Test Statistic:
  - t test: independent, one-sample or paired
  - z test: one sample or two sample
  - ANOVA (2+ samples)
• Business Objective: Association (determining relationships)
  Test Statistic: Linear regression

Key Assumption: Your data is normally distributed (check by plotting in a histogram and observing the shape)
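The selection grid above can also be written as a small lookup. This is only a sketch: the dictionary and the function name `candidate_tests` are hypothetical, and the entries simply restate the grid.

```python
# Candidate tests keyed by (type of data, business objective).
TEST_BY_DATA_AND_OBJECTIVE = {
    ("categorical", "comparison"): [
        "proportion z test (binary categorical data)",
    ],
    ("categorical", "association"): [
        "chi-squared goodness of fit (1 variable)",
        "chi-squared test of independence (2 variables)",
    ],
    ("numerical", "comparison"): [
        "t test (independent, one-sample or paired)",
        "z test (one sample or two sample)",
        "ANOVA (2+ samples)",
    ],
    ("numerical", "association"): [
        "linear regression",
    ],
}


def candidate_tests(data_type, objective):
    """Look up the candidate tests for a data type and business objective."""
    key = (data_type.strip().lower(), objective.strip().lower())
    return TEST_BY_DATA_AND_OBJECTIVE[key]


print(candidate_tests("Numerical", "Comparison"))
```

The lookup narrows the choice to a family of tests; picking within the family still depends on sample size and the number of groups.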
#4: What if our sample size is small (<30)…use a t test
Types of t tests: independent, one-sample, or paired

When to use this test:
When comparing means of a small sample (n < 30)

Test Assumptions:
Sample size is small (<30)
Data is normally distributed
Population variance is unknown – so sample variance is used
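For readers who want to see what a paired t test computes, here is a minimal standard-library Python sketch. It is not the course's method (the course uses Excel). The data are made up, the function names are hypothetical, and the p value is approximated by numerically integrating the t distribution's density, which is adequate for small samples.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p value using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1 - 2 * area)


def paired_t_test(before, after):
    """Return (t, degrees of freedom, two-tailed p) for paired samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Sample standard deviation stands in for the unknown population value.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    df = n - 1
    return t, df, two_tailed_p(t, df)


# Made-up satisfaction scores for the same 8 people at two points in time.
before = [6, 5, 7, 6, 4, 5, 6, 7]
after = [7, 6, 7, 8, 5, 5, 7, 8]
t, df, p = paired_t_test(before, after)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

The sample standard deviation appears in the denominator because the population variance is unknown, which is the assumption listed above.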
#5: What if we have more than 2 variables?
Analysis of Variance (ANOVA): Single Factor

Use sample data to make inferences about the properties of entire populations.

H0: μ1 = μ2 = μ3
HA: At least one μ differs from the others

When to use this test:
When comparing means of 3+ small samples (n < 30). Only looking at a single factor for this course.

Test Assumptions:
Sample size is small (<30)
Data is normally distributed
Population variance is unknown – so sample variance is used

Example:
You are testing 3 brands of batteries to see what the average life of a battery is.

You use 18 flashlights and test the 3 brands.

Your goal is to determine the level of variation between groups and within groups.
#5: What if we have more than 2 variables?
Analysis of Variance (ANOVA): Single Factor

Test Steps:
1. Determine your Null and Alternative Hypothesis
2. Gather your data and ensure you meet all assumptions
3. Use the Data Analysis tool in Excel to output the Anova: Single Factor analysis (F test)
4. Compare the p value to your alpha value
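To make the "between groups" and "within groups" variation concrete, the sketch below computes the pieces of a single-factor ANOVA table by hand in Python. The battery-life numbers are invented for illustration (3 brands, 6 flashlights each), the function name is hypothetical, and the p value is left to Excel's Anova: Single Factor output.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom, and F statistic."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: how far each value sits from its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_stat}


# Invented battery life in hours: 3 brands, 6 flashlights each.
brand_a = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
brand_b = [11.0, 10.7, 11.3, 10.9, 11.1, 10.8]
brand_c = [9.5, 9.9, 9.7, 9.4, 9.8, 9.6]
result = one_way_anova([brand_a, brand_b, brand_c])
print(f"F = {result['F']:.2f} with df = "
      f"({result['df_between']}, {result['df_within']})")
```

A large F means the group means differ by more than the spread inside each group would explain; the p value reported by Excel turns that into the comparison against alpha.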
#6: What if we want to determine the relationship between numerical variables?
Linear Regression

Models the relationship between 2 variables by fitting a linear equation to the observed data.

y = mx + b

y is the dependent variable
x is the independent variable
m is the slope of the line
b is the y intercept (value of y when x = 0)

Assumptions
1. For any fixed value of x, y is normally distributed
2. Observations are independent of each other
3. The proposed relationship between x and y is linear

Test Steps:
1. Scatterplot to visually determine if a relationship exists
2. Create a Null and Alternative Hypothesis. Null hypothesis is that there is no correlation.
3. Typically a two tailed test (the relationship being positive or negative does not usually matter)
4. Use the data analysis tool to determine the p value
5. Check the p value against your alpha value
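As a sketch of what "fitting a linear equation" means, the Python below estimates m and b in y = mx + b by ordinary least squares and also returns the correlation coefficient r. The function name and the data are hypothetical; the p value for the relationship would still come from the data analysis tool in Excel.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Return (slope m, intercept b, correlation r) for y = m*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx            # slope of the line
    b = y_bar - m * x_bar    # value of y when x = 0
    r = sxy / sqrt(sxx * syy)
    return m, b, r


# Hypothetical data: purchase hour of day vs. transaction size in dollars.
hour = [9, 11, 13, 15, 17, 19]
transaction = [22.0, 25.5, 24.0, 30.0, 31.5, 35.0]
m, b, r = fit_line(hour, transaction)
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}")
```

A value of r near 0 is consistent with the null hypothesis of no correlation, while values near +1 or -1 suggest a linear relationship in either direction, which is why the test is usually two tailed.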
Putting it all together….

Match the scenario to the best test statistic and explain your choice:

1. A marketer is looking at customer satisfaction before and after the launch of a new online customer
service portal. 20 people were asked to rate their satisfaction at two different points in time.
2. A sales team is looking at the impact that three variables have on the average transaction size.
3. A customer service team is trying to understand if there is any link between customer satisfaction
(Satisfied Yes/No) and gender (Male/Female).
4. A marketing team is trying to determine if there is a relationship between average transaction size
and the purchase time of day.

• Chi Squared Test for Independence
• Paired t test
• Linear Regression
• ANOVA
Week 5 Recap
What we did this week:
1. Test Statistics
- t test
- ANOVA
- Linear regression

Looking ahead to next week:


2. Deep dive on surveys and coding of data
3. Getting ready for Test #1

Action items:
4. Catch up on chapter readings and review concepts from class
5. Review test statistics and bring any questions with you to class next week.
