Business Research Methods S16

The document discusses One-Way Analysis of Variance (ANOVA), a statistical method used to compare the means of different categories based on a single independent variable. It details the components involved in ANOVA, including the calculation of sums of squares, the F statistic for testing hypotheses, and the interpretation of results. An illustrative example is provided, demonstrating the application of ANOVA in evaluating the effect of in-store promotions on sales.

ONE-WAY ANALYSIS OF VARIANCE

Marketing researchers are often interested in examining the differences in the mean
values of the dependent variable for several categories of a single independent
variable or factor. For example:
 Do the various segments differ in terms of their volume of product
consumption?
 Do the brand evaluations of groups exposed to different commercials
vary?
 What is the effect of consumers' familiarity with the store (measured
as high, medium, and low) on preference for the store?
STATISTICS ASSOCIATED WITH ONE-WAY
ANALYSIS OF VARIANCE (1 OF 2)

 eta2 ( 2). The strength of the effects of X (independent variable or


factor) on Y (dependent variable) is measured by eta2 ( 2). The
value of  2 varies between 0 and 1.
 F statistic. The null hypothesis that the category means are equal
in the population is tested by an F statistic based on the ratio of
mean square related to X and mean square related to error.
 Mean square. This is the sum of squares divided by the
appropriate degrees of freedom.
STATISTICS ASSOCIATED WITH ONE-WAY
ANALYSIS OF VARIANCE (2 OF 2)

 SSbetween. Also denoted as SSx, this is the variation in Y related to the
variation in the means of the categories of X. This represents variation between
the categories of X, or the portion of the sum of squares in Y related to X.
 SSwithin. Also referred to as SSerror, this is the variation in Y due to
the variation within each of the categories of X. This variation is not
accounted for by X.
 SSy. This is the total variation in Y.
CONDUCTING ONE-WAY ANOVA
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
DECOMPOSE THE TOTAL VARIATION (1 OF 2)

The total variation in Y, denoted by SSy, can be decomposed into two components:
SSy = SSbetween + SSwithin

where the subscripts between and within refer to the categories of X.

SSbetween is the variation in Y related to the variation in the means of the
categories of X. For this reason, SSbetween is also denoted as SSx.

SSwithin is the variation in Y related to the variation within each category of X.
SSwithin is not accounted for by X. Therefore it is referred to as SSerror.
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
DECOMPOSE THE TOTAL VARIATION (2 OF 2)

The total variation in Y may be decomposed as:
SSy = SSx + SSerror

where

SSy = Σ (Yi − Ȳ)²,  summed over i = 1 to N

SSx = Σ n(Ȳj − Ȳ)²,  summed over j = 1 to c

SSerror = Σ Σ (Yij − Ȳj)²,  summed over j = 1 to c and i = 1 to n

Yi = individual observation
Ȳj = mean for category j
Ȳ = mean over the whole sample, or grand mean
Yij = ith observation in the jth category
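The decomposition above can be checked numerically. The sketch below is a minimal plain-Python illustration (the function name and toy data are ours, not from the slides): it computes SSy, SSx, and SSerror for any set of categories and confirms that the two components add up to the total variation.

```python
def anova_sums_of_squares(groups):
    """Decompose total variation in Y into between- and within-group parts.

    groups: one list of Y observations per category of X.
    Returns (ss_y, ss_x, ss_error).
    """
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)

    # SSy: variation of every observation about the grand mean
    ss_y = sum((y - grand_mean) ** 2 for y in all_obs)

    # SSx: variation of the category means about the grand mean,
    # weighted by the number of observations in each category
    ss_x = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # SSerror: variation of observations about their own category mean
    ss_error = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

    return ss_y, ss_x, ss_error


# Toy example with three categories of X
ss_y, ss_x, ss_error = anova_sums_of_squares([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
assert abs(ss_y - (ss_x + ss_error)) < 1e-9  # SSy = SSx + SSerror
```

The same function applies directly to the department store data analyzed later in these slides.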
DECOMPOSITION OF THE TOTAL VARIATION:
ONE-WAY ANOVA
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
In analysis of variance, we estimate two measures of variation: within groups
(SSwithin) and between groups (SSbetween). Thus, by comparing the Y variance
estimates based on between-group and within-group variation, we can test the null
hypothesis.

Measure the Effects

The strength of the effects of X on Y is measured as follows:

η² = SSx/SSy = (SSy − SSerror)/SSy

The value of η² varies between 0 and 1.
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
TEST SIGNIFICANCE (1 OF 2)
In one-way analysis of variance, the interest lies in testing the null
hypothesis that the category means are equal in the population.

H0: µ1 = µ2 = µ3 = ... = µc

Under the null hypothesis, SSx and SSerror come from the same source of
variation. In other words, the estimate of the population variance of Y can be
based either on between-category variation,

Sy² = SSx / (c − 1) = mean square due to X = MSx

or on within-category variation,

Sy² = SSerror / (N − c) = mean square due to error = MSerror
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
TEST SIGNIFICANCE (2 OF 2)
The null hypothesis may be tested by the F statistic based on the ratio of
these two estimates:

F = [SSx / (c − 1)] / [SSerror / (N − c)] = MSx / MSerror

This statistic follows the F distribution with (c − 1) and (N − c) degrees of
freedom (df).
CONDUCTING ONE-WAY ANALYSIS OF VARIANCE
INTERPRET THE RESULTS

 If the null hypothesis of equal category means is not rejected, then the
independent variable does not have a significant effect on the dependent
variable.
 On the other hand, if the null hypothesis is rejected, then
the effect of the independent variable is significant.
 A comparison of the category mean values will indicate
the nature of the effect of the independent variable.
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE

Department Store Data

A department store is attempting to determine the effect of in-store promotion (X) on sales
(Y). For the purpose of illustrating hand calculations, the data are shown in the tables that
follow. The null hypothesis is that the category means are equal:

H0: µ1 = µ2 = µ3
EFFECT OF PROMOTION AND CLIENTELE ON SALES (1
OF 2)

Store Number | In-Store Promotion | Sales | Clientele Rating
1 1 10 9
2 1 9 10
3 1 10 8
4 1 8 4
5 1 9 6
6 2 8 8
7 2 8 4
8 2 7 10
9 2 9 6
10 2 6 9
11 3 5 8
12 3 7 9
13 3 6 6
14 3 4 10
15 3 5 4
EFFECT OF PROMOTION AND CLIENTELE ON SALES (2
OF 2)

Store Number | In-Store Promotion | Sales | Clientele Rating
16 1 8 10
17 1 9 6
18 1 7 8
19 1 7 4
20 1 6 9
21 2 4 6
22 2 5 8
23 2 5 10
24 2 6 4
25 2 4 9
26 3 2 4
27 3 3 6
28 3 2 10
29 3 1 9
30 3 2 8
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (1 OF 7)
Level of In-Store Promotion (Normalized Sales)
Store No. | High (1) | Medium (2) | Low (3)
1 10 8 5
2 9 8 7
3 10 7 6
4 8 9 4
5 9 6 5
6 8 4 2
7 9 5 3
8 7 5 2
9 7 6 1
10 6 4 2
Column totals: 83 | 62 | 37
Category means, Ȳj: 83/10 = 8.3 | 62/10 = 6.2 | 37/10 = 3.7
Grand mean, Ȳ = (83 + 62 + 37)/30 = 6.067
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (2 OF 7)
To test the null hypothesis, the various sums of squares are computed as follows:
SSy = (10−6.067)² + (9−6.067)² + (10−6.067)² + (8−6.067)² + (9−6.067)²
+ (8−6.067)² + (9−6.067)² + (7−6.067)² + (7−6.067)² + (6−6.067)²
+ (8−6.067)² + (8−6.067)² + (7−6.067)² + (9−6.067)² + (6−6.067)²
+ (4−6.067)² + (5−6.067)² + (5−6.067)² + (6−6.067)² + (4−6.067)²
+ (5−6.067)² + (7−6.067)² + (6−6.067)² + (4−6.067)² + (5−6.067)²
+ (2−6.067)² + (3−6.067)² + (2−6.067)² + (1−6.067)² + (2−6.067)²
= (3.933)² + (2.933)² + (3.933)² + (1.933)² + (2.933)²
+ (1.933)² + (2.933)² + (0.933)² + (0.933)² + (−0.067)²
+ (1.933)² + (1.933)² + (0.933)² + (2.933)² + (−0.067)²
+ (−2.067)² + (−1.067)² + (−1.067)² + (−0.067)² + (−2.067)²
+ (−1.067)² + (0.933)² + (−0.067)² + (−2.067)² + (−1.067)²
+ (−4.067)² + (−3.067)² + (−4.067)² + (−5.067)² + (−4.067)²
= 185.867
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (3 OF 7)

SSx = 10(8.3−6.067)² + 10(6.2−6.067)² + 10(3.7−6.067)²
= 10(2.233)² + 10(0.133)² + 10(−2.367)²
= 106.067

SSerror = (10−8.3)² + (9−8.3)² + (10−8.3)² + (8−8.3)² + (9−8.3)²
+ (8−8.3)² + (9−8.3)² + (7−8.3)² + (7−8.3)² + (6−8.3)²
+ (8−6.2)² + (8−6.2)² + (7−6.2)² + (9−6.2)² + (6−6.2)²
+ (4−6.2)² + (5−6.2)² + (5−6.2)² + (6−6.2)² + (4−6.2)²
+ (5−3.7)² + (7−3.7)² + (6−3.7)² + (4−3.7)² + (5−3.7)²
+ (2−3.7)² + (3−3.7)² + (2−3.7)² + (1−3.7)² + (2−3.7)²
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (4 OF 7)

= (1.7)² + (0.7)² + (1.7)² + (−0.3)² + (0.7)²
+ (−0.3)² + (0.7)² + (−1.3)² + (−1.3)² + (−2.3)²
+ (1.8)² + (1.8)² + (0.8)² + (2.8)² + (−0.2)²
+ (−2.2)² + (−1.2)² + (−1.2)² + (−0.2)² + (−2.2)²
+ (1.3)² + (3.3)² + (2.3)² + (0.3)² + (1.3)²
+ (−1.7)² + (−0.7)² + (−1.7)² + (−2.7)² + (−1.7)²
= 79.80
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (5 OF 7)

It can be verified that

SSy = SSx + SSerror

as follows:

185.867 = 106.067 + 79.80

The strength of the effects of X on Y is measured as follows:

η² = SSx/SSy = 106.067/185.867 = 0.571
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (6 OF 7)
In other words, 57.1% of the variation in sales (Y) is accounted for by in-store
promotion (X), indicating a strong effect. The null hypothesis may now be tested:

F = [SSx / (c − 1)] / [SSerror / (N − c)] = MSx / MSerror

F = [106.067 / (3 − 1)] / [79.800 / (30 − 3)] = 53.033 / 2.956 = 17.944
ILLUSTRATIVE APPLICATIONS OF ONE-WAY
ANALYSIS OF VARIANCE (7 OF 7)

 From the F table in the Statistical Appendix we see that for 2 and 27 degrees of freedom,
the critical value of F is 3.35 for α = 0.05. Because the calculated value of F (17.944) is
greater than the critical value, we reject the null hypothesis.
 We now illustrate the analysis of variance procedure using a computer program. The
results of conducting the same analysis by computer are presented in Table 16.4.
ONE-WAY ANOVA: EFFECT OF IN-STORE PROMOTION ON
STORE SALES
Table 16.4 One-Way ANOVA: Effect of In-Store Promotion on Store
Sales

Source of Variation | Sum of Squares | df | Mean Square | F Ratio | F Prob.
Between groups (In-store promotion) | 106.067 | 2 | 53.033 | 17.944 | 0.000
Within groups (Error) | 79.800 | 27 | 2.956 | |
TOTAL | 185.867 | 29 | 6.409 | |

Cell Means
Level of In-Store Promotion | Count | Mean
High (1) | 10 | 8.300
Medium (2) | 10 | 6.200
Low (3) | 10 | 3.700
TOTAL | 30 | 6.067
ASSUMPTIONS IN ANALYSIS OF VARIANCE

The salient assumptions in analysis of variance can be summarized as follows:

1. Ordinarily, the categories of the independent variable are assumed to be fixed.
Inferences are made only to the specific categories considered. This is referred
to as the fixed-effects model.
2. The error term is normally distributed, with a zero mean and a constant
variance. The error is not related to any of the categories of X.
3. The error terms are uncorrelated. If the error terms are correlated (i.e., the
observations are not independent), the F ratio can be seriously distorted.
IN CLASS EXERCISE

 Callaway Golf is the largest golf equipment manufacturing company in the
United States, specializing in golf clubs, bags, balls, apparel, and gloves.
The firm is currently experimenting with three designs of a golf club. The
firm's R&D department claims that all three designs are equivalent in terms
of power. To check this claim, you collected random samples of ball distance
(in yards) for the three designs. Conduct a one-way ANOVA to test the
hypothesis that the designs are equivalent.
PRODUCT MOMENT CORRELATION

 The product moment correlation, r, summarizes the strength of association
between two metric (interval or ratio scaled) variables, say X and Y.

 It is an index used to determine whether a linear or straight-line
relationship exists between X and Y.

 As it was originally proposed by Karl Pearson, it is also known as the
Pearson correlation coefficient.

 It is also referred to as simple correlation, bivariate correlation, or merely
the correlation coefficient.
PRODUCT MOMENT CORRELATION (2 OF 5)

From a sample of n observations, X and Y, the product moment correlation, r,
can be calculated as:

r = COVxy / (Sx Sy)

where COVxy, the covariance of X and Y, is given by

COVxy = Σ (Xi − X̄)(Yi − Ȳ) / (n − 1)

Cars sold vs TV ads example

 r varies between −1.0 and +1.0.
 The correlation coefficient between two variables will be the same
regardless of their underlying units of measurement.
EXPLAINING ATTITUDE TOWARD THE CITY OF RESIDENCE

Respondent No. | Attitude Toward the City | Duration of Residence | Importance Attached to Weather
1 6 10 3
2 9 12 11
3 8 12 4
4 3 4 1
5 10 12 11
6 4 6 1
7 5 8 7
8 2 2 4
9 11 18 8
10 9 9 10
11 10 17 8
12 2 2 5
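Using the attitude data above, the product moment correlation between attitude toward the city (Y) and duration of residence (X) can be computed directly from the covariance definition. A minimal plain-Python sketch (variable names are ours):

```python
# Attitude toward the city (Y) and duration of residence (X), 12 respondents
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]

n = len(attitude)
mean_x = sum(duration) / n
mean_y = sum(attitude) / n

# Sample covariance and standard deviations (n - 1 denominators;
# they cancel in r, so any shared denominator gives the same result)
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(duration, attitude)) / (n - 1)
s_x = (sum((x - mean_x) ** 2 for x in duration) / (n - 1)) ** 0.5
s_y = (sum((y - mean_y) ** 2 for y in attitude) / (n - 1)) ** 0.5

r = cov_xy / (s_x * s_y)
print(round(r, 4))  # 0.9361
```

The value agrees with the Multiple R (0.93608) reported in the bivariate regression output for these data.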
REGRESSION ANALYSIS
Regression analysis examines associative relationships between a metric dependent variable
and one or more independent variables in the following ways:
 Determine whether the independent variables explain a significant variation in the
dependent variable: whether a relationship exists.
 Determine how much of the variation in the dependent variable can be explained by
the independent variables: strength of the relationship.
 Determine the structure or form of the relationship: the mathematical equation
relating the independent and dependent variables.
 Predict the values of the dependent variable.
 Control for other independent variables when evaluating the contributions of a
specific variable or set of variables.
 Regression analysis is concerned with the nature and degree of association between
variables and does not imply or assume any causality.
A WORD OF CAUTION

Although independent variables may explain the variation in the dependent
variable, this does not imply causation.
https://www.tylervigen.com/spurious-correlations
STATISTICS ASSOCIATED WITH BIVARIATE REGRESSION
ANALYSIS (1 OF 3)
 Bivariate regression model. The basic regression equation is Yi = β0 +
β1 Xi + ei, where Y = dependent or criterion variable,
X = independent or predictor variable, β0 = intercept of the line,
β1 = slope of the line, and ei is the error term associated with the i th
observation.
 Coefficient of determination. The strength of association is measured
by the coefficient of determination, r2. It varies between 0 and 1 and
signifies the proportion of the total variation in Y that is accounted for by
the variation in X.
 Estimated or predicted value. The estimated or predicted value of Yi is Ŷi = a +
bXi, where Ŷi is the predicted value of Yi, and a and b are estimators of β0 and β1,
respectively.
STATISTICS ASSOCIATED WITH BIVARIATE REGRESSION
ANALYSIS (2 OF 3)

 Regression coefficient. The estimated parameter b is usually referred to as the non-


standardized regression coefficient.

 Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables


for all the cases or observations.

 Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y
values from the predicted Ŷ values.

 Standard error. The standard deviation of b, SEb, is called the standard error.
CONDUCTING BIVARIATE REGRESSION ANALYSIS
FORMULATE THE BIVARIATE REGRESSION MODEL

In the bivariate regression model, the general form of a straight line is: Y = β0 + β1 X

where
Y = dependent or criterion variable
X = independent or predictor variable
β0 = intercept of the line
β1 = slope of the line

The regression procedure adds an error term to account for the probabilistic or stochastic nature of the
relationship:

Y i = β 0 + β 1 Xi + e i
where ei is the error term associated with the i th observation.
PLOT OF ATTITUDE WITH DURATION
WHICH STRAIGHT LINE IS BEST?
BIVARIATE REGRESSION (1 OF 2)
CONDUCTING BIVARIATE REGRESSION ANALYSIS
ESTIMATE THE PARAMETERS (1 OF 3)
Y = β0 + β1X

In most cases, β0 and β1 are unknown and are estimated from the sample
observations using the equation

Ŷi = a + bXi

where Ŷi is the estimated or predicted value of Yi, and a and b are estimators of β0
and β1, respectively.
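The least-squares estimators have the usual closed forms: b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and a = Ȳ − bX̄. A minimal sketch applying them to the attitude/duration data from the earlier table (plain Python; variable names are ours):

```python
# Duration of residence (X) and attitude toward the city (Y)
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: ratio of the centered cross-products to the sum of squares of X
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)

# Intercept: the fitted line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

print(round(a, 4), round(b, 4))  # 1.0793 0.5897
```

These estimates match the computer output shown later for this model (b = 0.58972, constant = 1.07932).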
ASSUMPTIONS

 The error term is normally distributed.


 The mean of the error term is 0.
 The variance of the error term is constant. This variance does not depend on the values
assumed by X.
 The error terms are uncorrelated. In other words, the observations have been drawn
independently.
RESIDUAL PLOT INDICATING THAT VARIANCE IS NOT
CONSTANT
PLOT OF RESIDUALS INDICATING THAT A FITTED MODEL IS
APPROPRIATE
STEPS
BIVARIATE REGRESSION (2 OF 2)

Multiple R 0.93608
R² 0.87624
Adjusted R² 0.86387
Standard error 1.22329

Analysis of Variance
Source | df | Sum of Squares | Mean Square
Regression | 1 | 105.95222 | 105.95222
Residual | 10 | 14.96444 | 1.49644
F = 70.80266, Significance of F = 0.0000

Variables in the Equation
Variable | b | SEb | Beta (β) | t | Significance of t
Duration | 0.58972 | 0.07008 | 0.93608 | 8.414 | 0.0000
(Constant) | 1.07932 | 0.74335 | | 1.452 | 0.1772
MULTIPLE REGRESSION

 Can variation in sales be explained in terms of variation in


advertising expenditures, prices, and level of distribution?
 Can variation in market share be accounted for by the size
of sales force, advertising expense, and sales promotion
budgets?
MULTIPLE REGRESSION (1 OF 2)

The general form of the multiple regression model is as follows:

Y = β0+ β1X1+ β2X2+ β3X3+…+ βkXk+ e

which is estimated by the following equation:

Ŷ = a + b1X1+ b2X2+ b3X3+…+ bkXk

As before, the coefficient a represents the intercept, but the b's are now the partial regression
coefficients.
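For two predictors, the partial regression coefficients can be obtained by solving the centered normal equations. The sketch below does this for the attitude data used elsewhere in these slides, with a 2×2 solve by Cramer's rule (plain Python; variable names are ours, not from the text):

```python
# Attitude (Y), duration of residence (X1), importance of weather (X2)
y  = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
x1 = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
x2 = [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]

n = len(y)
my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n

# Centered sums of squares and cross-products
s11 = sum((u - m1) ** 2 for u in x1)
s22 = sum((u - m2) ** 2 for u in x2)
s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
s1y = sum((u - m1) * (v - my) for u, v in zip(x1, y))
s2y = sum((u - m2) * (v - my) for u, v in zip(x2, y))

# Normal equations:  s11*b1 + s12*b2 = s1y ;  s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det  # partial coefficient for duration
b2 = (s11 * s2y - s12 * s1y) / det  # partial coefficient for importance
b0 = my - b1 * m1 - b2 * m2         # intercept

print(round(b0, 4), round(b1, 4), round(b2, 4))  # 0.3373 0.4811 0.2887
```

The estimates agree with the computer output for this model (duration 0.48108, importance 0.28865, constant 0.33732).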
STATISTICS ASSOCIATED WITH MULTIPLE REGRESSION (1 OF 2)

 Adjusted R2. R2, coefficient of multiple determination, is adjusted


for the number of independent variables and the sample size to
account for the diminishing returns. After the first few variables,
the additional independent variables do not make much
contribution.

 Coefficient of multiple determination. The strength of


association in multiple regression is measured by the square of
the multiple correlation coefficient, R2, which is also called the
coefficient of multiple determination.
MULTIPLE REGRESSION (2 OF 2)

Multiple R 0.97210
R² 0.94498
Adjusted R² 0.93276
Standard error 0.85974

Analysis of Variance
Source | df | Sum of Squares | Mean Square
Regression | 2 | 114.26425 | 57.13213
Residual | 9 | 6.65241 | 0.73916
F = 77.29364, Significance of F = 0.0000

Variables in the Equation
Variable | b | SEb | Beta (β) | t | Significance of t
Importance | 0.28865 | 0.08608 | 0.31382 | 3.353 | 0.0085
Duration | 0.48108 | 0.05895 | 0.76363 | 8.160 | 0.0000
(Constant) | 0.33732 | 0.56736 | | 0.595 | 0.5668
MORE ON EXAMINING RESIDUALS

 A residual is the difference between the observed value of Yi and the value predicted by
the regression equation Ŷi.
 Scattergrams of the residuals, in which the residuals are plotted against the predicted
values, Ŷi, time, or predictor variables, provide useful insights in examining the
appropriateness of the underlying assumptions and regression model fit.
 The assumption of a normally distributed error term can be examined by constructing a
histogram of the residuals.
 The assumption of constant variance of the error term can be examined by plotting the
residuals against the predicted values of the dependent variable, Ŷi.
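These checks can be sketched numerically: fit the bivariate model of attitude on duration, form the residuals Yi − Ŷi, and confirm that they sum to (essentially) zero and that their sum of squares matches the residual sum of squares in the regression output. A plain-Python sketch, using the fitting formulas from the bivariate section (variable names are ours):

```python
# Attitude (Y) regressed on duration of residence (X)
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# Residuals: observed minus predicted values, Yi - (a + b*Xi)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

sum_resid = sum(residuals)                 # ~0 by construction of least squares
ss_resid = sum(e ** 2 for e in residuals)  # residual sum of squares

print(round(ss_resid, 5))  # 14.96444, as in the ANOVA table of the output
```

In practice these residuals would then be plotted (against Ŷi, time, or the predictors) to examine the assumptions listed above.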
EXAMINING RESIDUALS

 A plot of residuals against time, or the sequence of observations,


will throw some light on the assumption that the error terms are
uncorrelated.
 Plotting the residuals against the independent variables provides
evidence of the appropriateness or inappropriateness of using a
linear model. Again, the plot should result in a random pattern.