Multivariate Analysis IBS
Multiple Regression Analysis
The general multiple regression equation with k independent variables is:
$Y' = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
Greek letters are used for a (α) and b (β) when referring to the population parameters.
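The general equation is an intercept plus a weighted sum of the independent variables. A minimal Python sketch of evaluating Y' for one observation follows; the function name `predict` and the sample numbers are hypothetical, chosen only for illustration.

```python
def predict(a, b, x):
    """Return Y' = a + b1*X1 + b2*X2 + ... + bk*Xk for one observation.

    a -- intercept
    b -- slopes b1..bk
    x -- values X1..Xk, one per slope
    """
    if len(b) != len(x):
        raise ValueError("need exactly one slope per independent variable")
    return a + sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical coefficients and inputs, for illustration only.
print(predict(10.0, [2.0, -0.5], [3.0, 4.0]))  # 10 + 6 - 2 = 14.0
```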
$Y' = a + b_1(\text{Income}) + b_2(\text{Size}) + b_3(\text{College})$
The variable College (recorded as Student in the data) is called a dummy or indicator variable. It can take only one of two possible outcomes: a child either is or is not a college student.
Examples of dummy variables: gender, whether a part is acceptable or not, and whether a voter will or will not vote for the incumbent governor.
We usually code one value of the dummy variable as “1” and the other as “0.”
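In code, dummy coding is a simple mapping from the two outcomes to 1 and 0. The sketch below assumes the raw data arrive as labels such as "yes"/"no"; `to_dummy` and the sample answers are hypothetical.

```python
def to_dummy(values, one):
    """Code a two-outcome variable as 1 (value == one) or 0 (the other outcome)."""
    outcomes = set(values)
    if len(outcomes) > 2:
        raise ValueError(f"a dummy variable has two outcomes, got {sorted(outcomes)}")
    return [1 if v == one else 0 for v in values]


# Hypothetical answers to "Is the child a college student?"
answers = ["yes", "no", "no", "yes"]
print(to_dummy(answers, one="yes"))  # [1, 0, 0, 1]
```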
Family   Food   Income   Size   Student
1        3900   376      4      0
2        5300   515      5      1
3        4300   516      4      0
4        4900   468      5      0
5        6400   538      6      1
6        7300   626      7      1
7        4900   543      5      0
8        5300   437      4      0
9        6100   608      5      1
10       6400   513      6      1
11       7400   493      6      1
12       5800   563      5      0
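The coefficients of the three-variable model can be estimated from this table by ordinary least squares. The NumPy sketch below is an assumption (the slides use MINITAB); it builds the design matrix with a leading column of 1s for the intercept a. If the table matches the data MINITAB used, the results should agree with the printed equation Food = 954 + 1.09 Income + 748 Size + 565 Student up to rounding, with R² near 0.804.

```python
import numpy as np

# Example 1 data; columns: Food, Income, Size, Student (1 = college student, 0 = not)
data = np.array([
    [3900, 376, 4, 0],
    [5300, 515, 5, 1],
    [4300, 516, 4, 0],
    [4900, 468, 5, 0],
    [6400, 538, 6, 1],
    [7300, 626, 7, 1],
    [4900, 543, 5, 0],
    [5300, 437, 4, 0],
    [6100, 608, 5, 1],
    [6400, 513, 6, 1],
    [7400, 493, 6, 1],
    [5800, 563, 5, 0],
], dtype=float)

y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])  # column of 1s estimates a

coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares a, b1, b2, b3
a, b1, b2, b3 = coef

fitted = X @ coef
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - fitted) ** 2)
r_squared = 1 - ss_resid / ss_total

print(f"Y' = {a:.0f} + {b1:.2f} Income + {b2:.0f} Size + {b3:.0f} Student")
print(f"R^2 = {r_squared:.3f}")
```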
Example 1 continued
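Estimated food expenditure for a family with income 500, family size 4, and no college student:
Food expenditure = $954 + $1.09(500) + $748(4) + $565(0) = $4,491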
The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student
Analysis of Variance
Source           DF        SS        MS      F      P
Regression        3  10762903   3587634  10.94  0.003
Residual Error    8   2623764    327970
Total            11  13386667
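Each entry of the ANOVA table follows from the sums of squares and degrees of freedom: MS = SS/DF, F = MS(Regression)/MS(Residual Error), and R² = SS(Regression)/SS(Total). The plain-Python sketch below reproduces those figures from the table values. It does not compute the P column, which comes from the F distribution with 3 and 8 degrees of freedom; with SciPy available, `scipy.stats.f.sf(f_stat, 3, 8)` would give that upper-tail probability.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_reg, df_reg = 10762903, 3
ss_err, df_err = 2623764, 8
ss_total = ss_reg + ss_err        # the Total row, 13386667

ms_reg = ss_reg / df_reg          # mean square for regression
ms_err = ss_err / df_err          # mean square error (MSE)
f_stat = ms_reg / ms_err          # global F statistic
r_squared = ss_reg / ss_total     # coefficient of determination

print(f"MS regression = {ms_reg:.0f}, MS error = {ms_err:.0f}")
print(f"F = {f_stat:.2f}, R^2 = {r_squared:.3f}")
```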
The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

Correlation matrix
          Food    Income  Size    College
Food      1.000
Income    0.587   1.000
Size      0.876   0.609   1.000
College   0.773   0.491   0.743   1.000

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food (r = 0.876).
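A correlation matrix like this one can be computed directly from the Example 1 data. This NumPy sketch assumes the table columns in the order Food, Income, Size, Student (labelled College in the matrix), prints the lower triangle, and picks out the independent variable most strongly correlated with Food.

```python
import numpy as np

# Example 1 data; columns: Food, Income, Size, Student (the College dummy)
data = np.array([
    [3900, 376, 4, 0], [5300, 515, 5, 1], [4300, 516, 4, 0],
    [4900, 468, 5, 0], [6400, 538, 6, 1], [7300, 626, 7, 1],
    [4900, 543, 5, 0], [5300, 437, 4, 0], [6100, 608, 5, 1],
    [6400, 513, 6, 1], [7400, 493, 6, 1], [5800, 563, 5, 0],
], dtype=float)

names = ["Food", "Income", "Size", "College"]
corr = np.corrcoef(data, rowvar=False)  # rowvar=False: each column is a variable

# Print the lower triangle, as in the MINITAB correlation matrix
for i, name in enumerate(names):
    row = "  ".join(f"{corr[i, j]:.3f}" for j in range(i + 1))
    print(f"{name:<8}{row}")

# Independent variable with the largest |r| against Food (column 0)
strongest = max(range(1, 4), key=lambda j: abs(corr[0, j]))
print("Strongest correlation with Food:", names[strongest])
```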
Conduct an individual test on each coefficient to determine which coefficients are not zero. These are the hypotheses for the independent variable family size:
$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$
Using the 5% level of significance, reject H0 if the p-value < .05.
From the MINITAB output, the only significant variable, judged by the p-values, is FAMILY (family size). The other variables can be omitted from the model.
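The individual tests can be sketched by computing each coefficient's t statistic, t = b/s_b, from the least-squares fit. As an assumption, this NumPy version compares |t| with the two-tailed 5% critical value 2.306 for n − (k + 1) = 12 − 4 = 8 degrees of freedom instead of computing p-values; the two rules lead to the same decisions.

```python
import numpy as np

# Example 1 data; columns: Food, Income, Size, Student
data = np.array([
    [3900, 376, 4, 0], [5300, 515, 5, 1], [4300, 516, 4, 0],
    [4900, 468, 5, 0], [6400, 538, 6, 1], [7300, 626, 7, 1],
    [4900, 543, 5, 0], [5300, 437, 4, 0], [6100, 608, 5, 1],
    [6400, 513, 6, 1], [7400, 493, 6, 1], [5800, 563, 5, 0],
], dtype=float)

y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])
n, p = X.shape                       # 12 observations, 4 parameters (a, b1, b2, b3)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
mse = resid @ resid / (n - p)        # residual mean square, 8 degrees of freedom
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))  # standard error of each coefficient
t = coef / se                        # t statistic for H0: beta = 0

T_CRIT = 2.306                       # two-tailed 5% critical value of t with 8 df
for name, ti in zip(["Income", "Size", "Student"], t[1:]):
    verdict = "reject H0" if abs(ti) > T_CRIT else "do not reject H0"
    print(f"{name:<8} t = {ti:6.2f}  {verdict}")
```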
If we rerun the analysis using only the significant independent variable, family size, the new regression equation is:
$Y' = 340 + 1031X_2$
The coefficient of determination is 76.8 percent. We dropped two independent variables, and R² was reduced by only 3.6 percentage points (from 80.4 to 76.8 percent).
Regression Analysis: Food versus Size
Analysis of Variance
Source           DF        SS        MS      F      P
Regression        1  10275977  10275977  33.03  0.000
Residual Error   10   3110690    311069
Total            11  13386667
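The reduced model can be refitted the same way. This sketch uses `numpy.polyfit` with degree 1 (an assumption; any least-squares routine works) and recomputes R² and the F statistic; the results should be close to Y' = 340 + 1031X2, R² = 76.8 percent, and F = 33.03.

```python
import numpy as np

food = np.array([3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800], dtype=float)
size = np.array([4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5], dtype=float)
n = len(food)

b, a = np.polyfit(size, food, 1)        # degree-1 fit returns [slope, intercept]
fitted = a + b * size

ss_total = np.sum((food - food.mean()) ** 2)
ss_resid = np.sum((food - fitted) ** 2)
ss_reg = ss_total - ss_resid
r_squared = ss_reg / ss_total
f_stat = (ss_reg / 1) / (ss_resid / (n - 2))  # DF: 1 for regression, n - 2 = 10 residual

print(f"Y' = {a:.0f} + {b:.0f} X2")
print(f"SS: regression {ss_reg:.0f}, residual {ss_resid:.0f}, total {ss_total:.0f}")
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.2f}")
```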
Analysis of Residuals
A residual is the difference between the actual value of Y and the predicted value Y', that is, Y − Y'.
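Residuals are simple to compute once the fitted equation is known. The sketch below applies the reduced model Y' = 340 + 1031X2 to the Example 1 data, using the rounded coefficients as printed, so the residuals sum to −2 rather than to exactly zero as they would with the unrounded least-squares estimates.

```python
# Reduced model from Example 1 with the rounded coefficients: Y' = 340 + 1031 * Size
food = [3900, 5300, 4300, 4900, 6400, 7300, 4900, 5300, 6100, 6400, 7400, 5800]
size = [4, 5, 4, 5, 6, 7, 5, 4, 5, 6, 6, 5]

predicted = [340 + 1031 * x for x in size]                    # Y' for each family
residuals = [y - y_hat for y, y_hat in zip(food, predicted)]  # Y - Y'

for y, y_hat, e in zip(food, predicted, residuals):
    print(f"Y = {y:5d}   Y' = {y_hat:5d}   residual = {e:5d}")
print("Sum of residuals:", sum(residuals))
```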
[Figure: Residual Plots against Estimated Values of Y (residuals plotted against Y')]
[Figure: Histogram of Residuals (frequency of each residual value)]