
Lecture 4 Correlation and regression

The document provides an overview of correlation and regression analysis in biostatistics, focusing on the relationship between continuous variables. It explains correlation coefficients, scatter plots, and the types of regression analysis, including linear and curvilinear relationships. Additionally, it discusses the interpretation of regression results and the significance of dependent and independent variables.

Uploaded by

danuberh

Aksum University
College of Health Science and Referral Hospital
Department of Public Health
Course Name: Biostatistics

Correlation and Regression

Negasi A. (Assistant Prof. in Biostatistics)

CORRELATION ANALYSIS
• Correlation is the method of analysis to use when studying the possible association between two continuous variables.
• Correlation analysis measures the strength of the linear (straight-line) association between two variables.
• The standard method (Pearson correlation) leads to a quantity called r that can take any value from -1 to +1.
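
The Pearson coefficient described above can be sketched in a few lines of Python. This is a minimal illustration using the usual deviation-from-the-mean definition; the function name `pearson_r` and the age and blood pressure values are made up for this example and do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
age = [30, 40, 50, 60, 70]
sbp = [118, 125, 131, 140, 146]
r = pearson_r(age, sbp)
print(f"r = {r:.3f}")
```

A value close to +1, as here, indicates a strong positive linear association; the function fails with a division by zero if either variable is constant, since r is undefined in that case.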
• The correlation between two variables is positive if higher values of one variable are associated with higher values of the other, and negative if one variable tends to be lower as the other gets higher.
• A correlation of around zero indicates that there is no linear relation between the values of the two variables (i.e. they are uncorrelated).
• It is important to note that a correlation between two variables shows that they are associated but does not necessarily imply a ‘cause and effect’ relationship.
• Correlation treats the two variables equally: neither is singled out as dependent or independent.
• In essence, r is a measure of the scatter of the points around an underlying linear trend: the greater the spread of the points, the lower the correlation.
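
Two of the points above, that r treats both variables equally and that more spread around the trend lowers r, can be checked with a small sketch. The helper names `pearson_r` and `scattered`, and the artificial alternating offsets used to create the spread, are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (deviation-from-mean form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 11))

def scattered(scale):
    """Points around the line y = 2x, pushed up and down alternately by `scale`."""
    return [2 * xi + scale * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]

r_tight = pearson_r(x, scattered(0.5))   # small spread around the line
r_loose = pearson_r(x, scattered(5.0))   # large spread around the line
r_swapped = pearson_r(scattered(5.0), x) # same data with the roles of x and y exchanged

print(f"tight: {r_tight:.3f}, loose: {r_loose:.3f}, swapped: {r_swapped:.3f}")
```

The underlying trend is the same in both cases; only the spread differs, and the coefficient drops accordingly. Exchanging the two variables leaves r unchanged.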
SCATTER PLOT
• A scatter plot is a graph of the ordered pairs (x, y), with the independent variable X on the horizontal axis and the dependent variable Y on the vertical axis.
• Scatter plots are especially useful when you are examining the relationship/association between two or more continuous variables using statistical techniques such as correlation or regression.
• That is, they help you to understand how well a linear regression fits your data and whether the two variables are positively related, negatively related, or unrelated.
• Before trying to calculate r or fit any model, it is better to look at the scatter plot of the data.
TYPES OF SCATTER PLOTS
(a) Simple scatter plot
The simple scatter plot graphs the relationship between two quantitative variables.
(b) Matrix scatter plot
• It allows you to see the relationship between all pairs of several variables at once.
• Each variable is plotted against every other variable to visualize the relationship between two or more variables.
• Every combination is plotted twice so that each variable appears on both the X and Y axis.
Example of a Matrix Scatter Plot
Example: Construct a matrix scatter plot with four variables: age, weight loss, weight before treatment, and weight after treatment.
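
The layout of such a matrix can be sketched without any data: each off-diagonal panel is an ordered pair of distinct variables, so every pair shows up twice. The snippet below only enumerates the panels for the four variables named in the example; it does not draw anything, and the variable list is the only input taken from the slide.

```python
from itertools import permutations

variables = ["age", "weight loss", "weight before treatment", "weight after treatment"]

# Each off-diagonal panel is an ordered (Y-axis, X-axis) pair of distinct variables
panels = list(permutations(variables, 2))

# Ignoring which variable is on which axis leaves the distinct pairs
unordered_pairs = {frozenset(p) for p in panels}

for y_var, x_var in panels:
    print(f"panel: {y_var} (Y) against {x_var} (X)")

print(len(panels), "panels for", len(unordered_pairs), "distinct pairs")
```

With four variables there are 12 off-diagonal panels covering 6 distinct pairs. In practice a plotting library would draw them; for instance, pandas offers `pandas.plotting.scatter_matrix` for this purpose.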

CORRELATION COEFFICIENT
• The population correlation coefficient ρ (rho) measures the strength of the association between the variables.
• The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.

Features of ρ and r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear relationship
  o Examples: depression and self-esteem; studying and test errors; GPA and average TV watching time
• The closer to 1, the stronger the positive linear relationship
  o Examples: GPA and studying time; smoking and lung damage; performance evaluation and sociability
• The closer to 0, the weaker the linear relationship
  o Example: GPA and shoe size
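
The "unit free" property can be illustrated by changing the units of both variables and recomputing r. This is a small sketch with invented weight and height values; `pearson_r` is a helper written for this example, not something defined in the lecture.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (deviation-from-mean form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, invented for illustration only
weight_kg = [50, 60, 70, 80, 90]
height_cm = [150, 160, 165, 175, 180]

# Same measurements expressed in different units
weight_lb = [w * 2.20462 for w in weight_kg]
height_m = [h / 100 for h in height_cm]

r_metric = pearson_r(weight_kg, height_cm)
r_rescaled = pearson_r(weight_lb, height_m)
print(f"kg/cm: {r_metric:.4f}   lb/m: {r_rescaled:.4f}")
```

Both calls give the same value up to rounding error, because multiplying a variable by a positive constant rescales the numerator and denominator of r by the same factor.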


Examples of Approximate r Values

EXAMPLE
o Systolic Blood Pressure against Age

Calculating the Correlation Coefficient
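
For n paired observations (x_i, y_i), the sample Pearson coefficient is usually written in one of two equivalent forms. These are the standard textbook definitions, given here as a reference rather than as the exact notation of the original slides:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right]
                \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]}}
```

The first form works with deviations from the means; the second uses only raw sums and is convenient for hand calculation.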

Interpretation of correlation
• A very small correlation does not necessarily indicate that two variables are not associated.
• It indicates only that there is no linear association.
• To be sure, we should study a scatter plot of the data, because it is possible that the two variables display a non-linear relationship (for example, cyclical or curved).
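
A short sketch makes this concrete: below, y is completely determined by x through a curve, yet the Pearson coefficient is zero. The data and the helper `pearson_r` are constructed for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (deviation-from-mean form)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]   # a perfect curved (quadratic) relationship

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The positive and negative halves of the curve cancel in the cross-product sum, so r is zero even though the association is perfect. Only a scatter plot would reveal the curved pattern.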
EXERCISE-1:
In an experiment designed to determine the relationship between the risk factor, X, and the outcome, Y, the following data were obtained.

• What type of relationship do you observe between x and y? Is an increase in x followed by an increase in y?

EXERCISE-2:
• The following data are records of birth weight in kg (x10), current average income (x1000), and years of college education completed by mothers for a simple random sample of 10 births occurring in a single hospital in one month.

• Then, produce a scatter plot of birth weight against years of college education completed by mothers.

INTRODUCTION TO REGRESSION ANALYSIS
What is Regression Analysis?
Regression analysis is a tool/technique in biostatistics for:
• the investigation and modeling of relationships between variables, one known as the dependent variable and the remainder known as independent variables.
• studying the dependence of one variable on one or more other variables. In other words, we can use it to examine the relationship that may exist among certain variables.
• Often, the dependent variable is denoted by y and the independent variables by x1, x2, x3, ..., xk.

CLASSIFICATION OF VARIABLES IN REGRESSION

There are two types of variables in regression analysis.
• These variables are often classified as y's and x's. The y-variable is called the
  o Dependent variable
  o Outcome variable
  o Output variable
  o Response variable
  o Target variable
and the x-variable is known as the
  o Independent variable
  o Risk factor
  o Explanatory variable
  o Predictor
  o Input variable

• Note that we always have one dependent variable, but for this dependent variable we can have one or more independent variables.

Example: We may employ regression analysis
(1) To study how heart rate is affected by, or depends on:
  o Emotional stress level
  o Illness: when the body's immune system becomes compromised, e.g. injury, anemia
  o Exercise
In this case,
• Heart rate is the dependent variable
• Exercise, illness, stress, … are the independent variables

Types of Regression Analysis
The relationship among the regression variables could be:
(i) Linear or straight-line relationship:
It is one in which the relationship between X and Y can best be represented by a straight line.
(ii) Curvilinear relationship:
• A curvilinear relationship is one in which the relationship between X and Y can best be represented by a curved line (such as a quadratic, cubic, or other polynomial curve).

• Linear regression analysis is, therefore, a statistical technique used to examine the linear relationship that can exist between two groups of variables, one dependent and the other independent.
It can be classified as:
(a) Simple Linear Regression Analysis
It is a regression between two variables, one dependent and the other independent. The nondeterministic model used in this regard is given by

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

where β0 is the intercept, β1 is the slope, and ε_i is the random error term.

Simple regression: one dependent variable, one independent variable.
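
The coefficients of the simple linear model are usually estimated by least squares, giving sample estimates b0 and b1 of β0 and β1. The sketch below uses the standard closed-form estimates; the function name `fit_simple_linear` and the x and y values are hypothetical and chosen only to show the arithmetic.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: change in y per unit change in x
    b0 = mean_y - b1 * mean_x    # intercept: fitted y when x = 0
    return b0, b1

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(x, y)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```

The fitted line always passes through the point of means (x̄, ȳ), which is why the intercept is obtained from the slope and the two means.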
Example: Simple Linear Regression
• A researcher wishes to examine the relationship between the average daily amount of food eaten by each child in a sample of 24 children and the weight each child gained in one month (both measured in kg). The content of the food is the same for all of them.
• Dependent variable (y) = weight gained in one month, measured in kilograms
• Independent variable (x) = average weight of food eaten per day by a child, measured in kilograms
Sample Data for Child Weight Model

REGRESSION RESULTS USING SPSS

Interpretation of the Intercept, b0

Interpretation of the Slope Coefficient, b1

Least Squares Regression Properties

Explained and Unexplained Variation

Coefficient of Determination, R²
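
The split of the total variation in y into an explained and an unexplained part, and the resulting coefficient of determination, can be sketched as follows. The names SST, SSR, and SSE are the conventional textbook labels and are an assumption about the notation; the data are hypothetical.

```python
def variation_decomposition(xs, ys):
    """Return (SST, SSR, SSE, R2) for a least-squares straight-line fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)              # total variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)          # explained by the regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained (residual)
    return sst, ssr, sse, ssr / sst

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse, r2 = variation_decomposition(x, y)
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, R2 = {r2:.4f}")
```

R² is the proportion of the total variation in y explained by the regression on x. In simple linear regression it equals the square of the Pearson correlation coefficient r.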

The Standard Deviation of the Regression Slope

Inference about the Slope: t Test

Inference about the Slope: t Test Example
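
A minimal sketch of the usual t test for the slope is given below, assuming the null hypothesis H0: β1 = 0 and the standard formulas for the standard error of the estimate and of the slope. The function name `slope_t_statistic` and the data are hypothetical; they are not the child weight data of the lecture.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b1, standard error of b1, t statistic, degrees of freedom)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual sum of squares around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s_e / math.sqrt(sxx)      # standard deviation (standard error) of the slope
    t = b1 / s_b1                    # test statistic for H0: beta1 = 0
    return b1, s_b1, t, n - 2

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, s_b1, t, df = slope_t_statistic(x, y)
print(f"b1 = {b1:.3f}, s_b1 = {s_b1:.4f}, t = {t:.2f}, df = {df}")
```

The computed t is compared with the critical value of the t distribution with n - 2 degrees of freedom, taken from a t table or from statistical software. The Python standard library does not provide that distribution, so the comparison is left out of the sketch.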

Confidence Interval estimation
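
For the slope, the usual 100(1 - α)% confidence interval has the form below. This is the standard textbook expression, stated under the assumption that the lecture follows the common notation with b1 as the estimated slope and n - 2 degrees of freedom:

```latex
b_1 \pm t_{\alpha/2,\, n-2} \; s_{b_1},
\qquad
s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```

If the interval does not contain zero, the data are consistent with a non-zero slope at the chosen confidence level, which matches the conclusion of the two-sided t test.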
