
ANOVA, Correlation and Regression
Dr. Faris Al Lami
MB ChB, PhD, FFPH

ANALYSIS OF VARIANCE
(ANOVA, F-test)
Learning Objectives

1. Use analysis of variance (ANOVA) to test differences between the means of more than two samples.
2. Obtain a measure of the linear relationship between two quantitative random variables (X and Y).
3. Interpret the scatter diagram.
4. Interpret the value of the linear correlation coefficient (r).
5. Identify the use of simple linear regression.
ANALYSIS OF VARIANCE (ANOVA)
• It is a technique in which the total variation present in a set of data is partitioned into several components, each attributable to a specific source of variation.
• ANOVA can ascertain the magnitude of the contribution of each of these sources to the total variation.
• ANOVA tests the hypothesis that three or more population means are equal.
H0: µ1 = µ2 = µ3 = . . . = µk
H1: At least one mean is different
ANOVA methods require the F-distribution
1. The F-distribution is not symmetric; it is skewed to the right.
2. Values of F can be zero or positive; they cannot be negative.
3. There is a different F-distribution for each pair of degrees of freedom (numerator and denominator). Critical values of F are given in the F table.
One-Way ANOVA
Assumptions
1. The populations have approximately normal distributions.
2. The populations have the same variance σ².
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The different samples come from populations that are categorized in only one way.
The total sum of squares (SST)
• The sum of the squared deviations of each observation from the mean of all the observations taken together (the grand mean).

Within-Groups Sum of Squares (SSW)
• Computed within each group as the sum of the squared deviations of the individual observations from their group mean. Also called the error sum of squares.

Among-Groups Sum of Squares (SSA)
• Computed for each group as the squared deviation of the group mean from the grand mean, multiplied by the size of the group, then summed over the groups.

• SST = SSA + SSW
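The decomposition SST = SSA + SSW can be checked numerically. The sketch below uses made-up data for three hypothetical groups (the slides themselves use SPSS; this is only an illustration of the formulas):

```python
# Sketch: verifying SST = SSA + SSW on three hypothetical groups
# (the group values below are made-up illustration data).
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Total sum of squares: deviations of every observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# Within-groups (error) sum of squares: deviations from each group's own mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# Among-groups sum of squares: squared deviation of each group mean from
# the grand mean, weighted by the group size
ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(sst, ssa + ssw)  # the two totals should match
```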


Mean Squares (MS)
The sums of squares, SS(among) and SS(within), divided by their corresponding degrees of freedom:

MST = SS(total) / (N − 1)
MSW = SS(within) / (N − k)
MSA = SS(among) / (k − 1)
Test Statistic for ANOVA

F = MS(among) / MS(within)

• Numerator df = k − 1
• Denominator df = N − k
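Putting the pieces together, a minimal sketch of the F statistic on made-up data, checked against SciPy's one-way ANOVA (an illustration only, not the SPSS workflow the slides use):

```python
# Sketch: F = MSA / MSW computed from the mean squares, compared with
# SciPy's one-way ANOVA on the same (made-up) three groups.
from scipy import stats

groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # N
all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / n_total

ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

msa = ssa / (k - 1)         # among-groups mean square, df = k - 1
msw = ssw / (n_total - k)   # within-groups mean square, df = N - k
f_by_hand = msa / msw

f_scipy, p_value = stats.f_oneway(*groups)
print(f_by_hand, f_scipy, p_value)  # the two F values should agree
```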
SPSS output
Go to Options and select Descriptives.
ANOVA model: Example

• As shown in the descriptive output, the mean score was highest for the front row and lowest for the back row.
• Null hypothesis: no difference between the three population mean scores, i.e., there is nothing special about the sitting position affecting students' scores.
F-test

• The F ratio is 15.7.
• The P value of the ANOVA is <0.001.
• We reject the null hypothesis.
• There is a statistically significant difference in mean score for at least one group compared with the others.
ANOVA model: Where is the difference?

• ANOVA itself cannot identify exactly where the difference in means lies (front vs middle, front vs back, middle vs back).
• Repeated independent-samples t-tests cannot be used, since this would inflate the type I error (alpha) above 0.05.
Testing for Significant Differences Between Individual Pairs of Means
• Bonferroni t-test
• Tukey's HSD test
• Both are multiple-comparison procedures for the null hypothesis that all possible pairs of treatment means are equal.
• HSD uses a single critical value against which all pairwise differences are compared.
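The Bonferroni idea can be sketched as follows: run every pairwise t-test, but divide alpha by the number of comparisons. The group names and scores below are made up for illustration (the slides use SPSS for this step):

```python
# Sketch of the Bonferroni approach: all pairwise independent-samples
# t-tests, each judged against alpha / (number of comparisons), so the
# overall type I error stays near 0.05. Data are invented.
from itertools import combinations
from scipy import stats

groups = {"front":  [85.0, 88, 90, 87],
          "middle": [80.0, 79, 83, 81],
          "back":   [70.0, 72, 68, 71]}
alpha = 0.05
pairs = list(combinations(groups, 2))
adjusted_alpha = alpha / len(pairs)   # Bonferroni correction

for name_a, name_b in pairs:
    t, p = stats.ttest_ind(groups[name_a], groups[name_b])
    print(f"{name_a} vs {name_b}: t={t:.2f}, p={p:.4f}, "
          f"significant={p < adjusted_alpha}")
```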
THE CORRELATION MODEL
OBJECTIVE
• Obtain a measure of the relationship between two random variables (X and Y).

Pearson's Correlation Coefficient (r)
• A measure of the linear (straight-line) relationship between two interval-level variables.
• Its value lies between −1 and +1:
−1: perfect inverse linear correlation
+1: perfect positive linear correlation
0: no linear correlation
Pearson's Correlation Coefficient (r)
• The absolute value of r indicates the strength of the relationship:
• <0.2 : very weak
• 0.2 to <0.4 : weak
• 0.4 to <0.7 : moderate
• 0.7 to <0.9 : strong
• ≥0.9 : very strong
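These cut-offs translate directly into a small helper function (the function name is my own, not from the slides):

```python
# A small helper reflecting the cut-offs above: classify the strength of a
# correlation from the absolute value of r. The sign (direction) is a
# separate question, handled on the next slide.
def correlation_strength(r: float) -> str:
    a = abs(r)
    if a < 0.2:
        return "very weak"
    if a < 0.4:
        return "weak"
    if a < 0.7:
        return "moderate"
    if a < 0.9:
        return "strong"
    return "very strong"

print(correlation_strength(0.955))   # "very strong"
print(correlation_strength(-0.55))   # "moderate" (the sign is ignored)
```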
Pearson's Correlation Coefficient (r)
• The sign of r indicates the direction of the relationship.
• A positive correlation indicates that high scores on one variable are associated with high scores on the second variable.
• A negative correlation indicates that high scores on one variable are associated with low scores on the second variable.
Testing the significance of (r)
• The r value is the sample value; ρ (rho) is the population value.
• H0: ρ = 0
• HA: ρ ≠ 0
• t = r √((n − 2) / (1 − r²)), with df = n − 2
• A larger r and a bigger sample size give a higher calculated t value, and thus a higher probability of statistical significance.
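The t formula above can be checked against the P value SciPy reports for the same data. The x and y values below are made up and roughly linear, purely to illustrate the calculation:

```python
# Sketch: turning a sample r into a t statistic with df = n - 2, and
# confirming the resulting two-sided P value against scipy.stats.pearsonr.
import math
from scipy import stats

x = [1.0, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8]   # made-up, roughly linear

r, p_scipy = stats.pearsonr(x, y)
n = len(x)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
df = n - 2

# Two-sided P value from the t-distribution
p_by_hand = 2 * stats.t.sf(abs(t), df)
print(r, t, p_by_hand, p_scipy)  # the two P values should agree
```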
Scatter Diagram
• A scatter diagram is a graphic device used to visually summarize the relationship between two variables.
• The X-axis represents the independent variable.
• The Y-axis represents the dependent variable.
• In the correlation model it is not important to know which variable is dependent and which is independent; in the regression model this distinction is crucial.
• The closer the dots (representing pairs of observations for the study subjects) lie to the regression line, the stronger the linear correlation.
Systolic Blood Pressure Readings (mmHg) by Two Methods in 25 Patients with Essential Hypertension

Patient No.   Method I   Method II
1             132        130
2             138        134
3             144        132
4             146        140
5             148        150
6             152        144
7             158        150
8             130        122
9             162        160
10            168        150
11            172        160
12            174        178
13            180        168
14            180        174
15            188        186
16            194        172
17            194        182
18            200        178
19            200        196
20            204        188
21            210        180
22            210        196
23            216        210
24            220        190
25            220        202
[Scatter plot: Method II (y-axis, 100 to 220 mmHg) against Method I (x-axis, 100 to 220 mmHg). Systolic blood pressure readings (mmHg), 25 patients with essential hypertension.]
Example: SPSS output

• The linear correlation coefficient is 0.955.
• Its P value is <0.001.
• There is a statistically significant, very strong, direct linear correlation between method 1 and method 2 for measuring systolic blood pressure.
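As an illustration outside SPSS, the same coefficient can be computed from the tabulated data with SciPy (the slide reports r = 0.955):

```python
# Sketch: Pearson's r for the 25 paired systolic readings tabulated above
# (Method I vs Method II).
from scipy import stats

method1 = [132, 138, 144, 146, 148, 152, 158, 130, 162, 168, 172, 174, 180,
           180, 188, 194, 194, 200, 200, 204, 210, 210, 216, 220, 220]
method2 = [130, 134, 132, 140, 150, 144, 150, 122, 160, 150, 160, 178, 168,
           174, 186, 172, 182, 178, 196, 188, 180, 196, 210, 190, 202]

r, p = stats.pearsonr(method1, method2)
print(f"r = {r:.3f}, P = {p:.2e}")
```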
Simple Linear Regression

Y = a + bX
Simple Linear Regression

• It is another way to quantify the strength of the association between two quantitative variables, under the assumption of a normal distribution (a dose-response relationship).
• The independent variable (X) is pre-selected and called a non-random or mathematical variable. For each value of X there is a set of normally distributed values of Y.
Simple Linear Regression
The least-squares line summarizes the relationship between X and Y:
Y = a + bX
a = intercept: the point where the line crosses the vertical axis (i.e., the value of Y when X = 0)
b = slope: the amount by which Y changes for each unit change in X
X = independent variable
Y = dependent variable
Simple Linear Regression
It is helpful in:
• Ascertaining the probable form of the relationship between variables.
• Predicting or estimating the value of one variable corresponding to a given value of another: entering a specific value of X into the regression equation predicts the value of Y.
• Assessing the strength of the association between two quantitative variables measured on an interval/ratio scale. The higher the value of b (the regression coefficient), the stronger the effect of X on the value of Y (a stronger dose-response linear relation).
• Power of prediction of the model: measured by R² (the determination coefficient), which equals the square of r (the linear correlation coefficient). It measures the proportion of the observed variation in the response variable explained by the regression model.
• The least-squares method is used to estimate the two points needed to draw the regression line.
• The calculated regression coefficient (beta, or slope) is tested for statistical significance by a t-test against the null hypothesis that beta = 0 at the population level.
• The overall regression equation is tested for statistical significance by ANOVA. The model should be statistically significant before the results can be generalized to the reference population.
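The least-squares slope, intercept and R² can be sketched directly from their formulas; here again the two-methods blood pressure data serve as an illustration (the slides obtain these values from SPSS):

```python
# Sketch: least-squares slope and intercept, plus R², computed from the
# defining formulas on the two-methods blood pressure data.
import numpy as np

x = np.array([132, 138, 144, 146, 148, 152, 158, 130, 162, 168, 172, 174,
              180, 180, 188, 194, 194, 200, 200, 204, 210, 210, 216, 220,
              220], dtype=float)  # Method I (independent variable)
y = np.array([130, 134, 132, 140, 150, 144, 150, 122, 160, 150, 160, 178,
              168, 174, 186, 172, 182, 178, 196, 188, 180, 196, 210, 190,
              202], dtype=float)  # Method II (dependent variable)

# Slope: covariance of x and y over the variance of x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()        # intercept: line passes through the means

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2   # determination coefficient: share of variation explained

print(f"Y = {a:.1f} + {b:.3f} X,  R^2 = {r_squared:.3f}")
```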
(The same systolic blood pressure data tabulated earlier in the correlation example are used here.)
SPSS output

• The first table in the SPSS output is descriptive, specifying the independent and dependent variables.
SPSS output

• Unstandardized B (beta) is the regression coefficient. For each 1-score increase in the new test, the standardized test score is expected to increase by an average of 1.1.
• The calculated regression coefficient is statistically significant (P < 0.001).
• Regression equation: Y = −1 + 1.124 X
• Y is the response (dependent) variable and X is the explanatory (independent) variable.
SPSS output

• R is the simple linear correlation coefficient.
• R square is the determination coefficient.
• The new test explains 91.4% of the variation in the dependent variable.
SPSS output

• The ANOVA model tests the statistical significance of the regression model.
• The P value is <0.001, which indicates that the regression model is statistically significant.
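For simple linear regression the model ANOVA is tightly linked to the slope test: F equals the square of the slope's t statistic, and can be recovered from R². A sketch on made-up data (SciPy's linregress stands in for the SPSS output):

```python
# Sketch: the overall model F for simple linear regression recovered from
# R² as F = R² (n - 2) / (1 - R²); its P value from the F-distribution
# with (1, n - 2) df matches the slope's t-test P value. Data are invented.
from scipy import stats

x = [1.0, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8]   # made-up, roughly linear

res = stats.linregress(x, y)
n = len(x)
r_squared = res.rvalue ** 2
f_model = r_squared * (n - 2) / (1 - r_squared)

# P value of the whole model from the F-distribution with (1, n - 2) df
p_model = stats.f.sf(f_model, 1, n - 2)
print(f_model, p_model, res.pvalue)  # p_model matches the slope's P value
```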
