Basic Biostatistics

Haramaya University

College of Health and Medical Sciences

School of Public Health

Continuous Data Analysis

By Adisu Birhanu (Assistant prof. of Biostatistics)

Feb 2025
Session Objectives

Describe continuous variables and methods of analysis

Describe relationships between continuous variables

Interpret the outputs from linear regression models


Analysis of Continuous Data
A continuous variable is one which can take on infinitely many, uncountable possible values within a range of real numbers.

Data analysis methods such as scatter plots, line graphs and histograms are applicable for describing numerical data.

More advanced methods for inferential analysis of continuous data include correlation, the t-test, ANOVA and linear regression.
Comparison of means
The t-test is appropriate for comparing two means from two populations.

There are three different t-tests:

One sample t-test

Two independent sample t-test

Paired sample t-test

ANOVA is used when the independent variable has more than two groups.
One sample t-test

 It is used to compare a sample estimate with a hypothesized population mean, to see if the sample mean is significantly different.

 There is one group being compared against a standard value.
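
A minimal Stata sketch, assuming the infant data set has a variable weight (birth weight in grams) and a hypothesized population mean of 3000 g — both the variable name and the value are illustrative:

STATA CODE: ttest weight == 3000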

Independent two sample t-test

Used to compare the means of two unrelated or independent groups.

The groups come from two different populations (e.g., different people from two separate cities).

Hypothesis: Ho: Mean of group 1 = Mean of group 2
HA: Mean of group 1 ≠ Mean of group 2

Example
Research question: to test whether there is a significant difference in the birth weight of male and female infants → the independent t-test is appropriate.
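A minimal Stata sketch, assuming the data contain weight (birth weight) and a binary variable sex coding male/female (illustrative names):

STATA CODE: ttest weight, by(sex)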

Interpretation

The 95% confidence interval for the difference of means does not contain 0.

The p-value is less than 0.05.

Hence, we conclude that there is a significant difference in birth weight between male and female infants.

Paired t-test
 Compares means when each observation in one sample has one and only one pair in the other sample, so the two samples are dependent on each other.

 In this case the groups come from a single population (e.g., measuring before and after an experimental treatment); perform a paired t-test.

 Hypothesis: Ho: Mean difference = 0 vs HA: Mean difference ≠ 0
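
A minimal Stata sketch, assuming paired before/after measurements stored in hypothetical variables bp_before and bp_after:

STATA CODE: ttest bp_before == bp_after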

One way ANOVA (Analysis of Variance)
For two normal distributions, the two sample means are compared by the t-test.

When the means of more than two distributions need to be compared, ANOVA is used.

One way ANOVA…
The t-test methodology generalizes to the one-way analysis of variance (ANOVA) for categorical variables with more than two categories.

ANOVA does not tell you which group is different, only whether a difference exists.

To know which group is different, we use post hoc tests (Bonferroni, Tukey, Scheffé).
One way ANOVA…

For K means (K ≥ 3):

Ho: µ1 = µ2 = … = µK

HA: at least one of the means is different.

There is one factor of grouping (one-way ANOVA).
One way ANOVA…

Consider the infant data:

Outcome variable: birth weight

Factor variable: residence (urban = 1, semi-urban = 2, rural = 3)

Objective: compare weight among the three place categories
STATA CODE: oneway weight place

One way ANOVA…
We reject the null hypothesis (p-value < 0.05) and
we can conclude that at least one of the groups' means differs on body weight.

Now the question is: which groups are different?

Answering this question requires multiple comparisons (post hoc tests).

Bonferroni, Tukey and Scheffé are commonly used methods.

The Bonferroni method corrects the probability of a Type I error for the number of comparisons made.
Interpretation:
All pairwise comparisons are statistically significant at the 0.05 level: urban versus semi-urban, urban versus rural, and semi-urban versus rural.

STATA CODE: oneway weight place, bonferroni

Correlation

Correlation is used to quantify the degree to which two continuous random variables are related.

Common correlation measure:
Pearson correlation coefficient: for a linear relationship between two variables
Scatterplot
A helpful tool for exploring the relationship between two variables.
 If there is no relationship between the proposed explanatory and dependent variables, then fitting a linear regression model to the data probably will not provide a useful model.
 Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest.
 This does not necessarily imply that one variable causes the other, but that there is some significant association between the two variables.
[Figure: scatter plot of CD4 count versus age of patients]
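
A minimal Stata sketch for producing such a plot, assuming variables cd4 and age (illustrative names); the dependent variable is listed first:

STATA CODE: scatter cd4 age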
Correlation coefficient
A valuable numerical measure of the relationship between two variables.
A value between -1 and 1 indicating the strength of the linear relationship between two variables.
 The population correlation coefficient ρ (rho) measures the strength of the linear relationship between two variables.
 The sample correlation coefficient, r, is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
Correlation coefficient
Basic features of the sample and population correlation coefficients:
 Unit-free, ranging between -1 and 1

 The closer to -1, the stronger the negative linear relationship

 The closer to 1, the stronger the positive linear relationship

 The closer to 0, the weaker the linear relationship
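
A minimal Stata sketch for estimating r with a significance test, again assuming illustrative variables cd4 and age:

STATA CODE: pwcorr cd4 age, sig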


Coefficient of determination / R squared
The coefficient of determination is a measure of the strength of the model.
Variation in the dependent variable is split into two parts:
Variation in y = SSE + SSR
Sum of Squares Error (SSE):
 Measures the amount of variation in y that remains unexplained (i.e. due to error)
Sum of Squares Regression (SSR):
 Measures the amount of variation in y explained by variation in the independent variable x
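
In symbols, writing the total variation as SST = SSR + SSE, the coefficient of determination is R² = SSR/SST = 1 − SSE/SST.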
Coefficient of determination…
 The coefficient of determination does not have a critical value that enables us to draw conclusions.
 The higher the value of R squared, the better the model fits the data.
 If R² = 1, there is a perfect match between the line and the data points.
 If R² = 0, there is no linear relationship between x and y.
 It is a quantitative measure of how well the independent variables account for the outcome.
 When R² is multiplied by 100, it can be thought of as the percentage of the variance in the dependent variable explained by the independent variables.
Linear Regression
We frequently measure two or more variables on the same individual to explore the nature of the relationship among these variables.
Regression analysis is a predictive modelling technique which investigates the relationship between a dependent and an independent variable.
Questions to be answered:
What is the relationship between Y and X?
How can changes in Y be explained by changes in X?
Linear regression (#2)
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.
 Explanatory variable (X): can be any type of variable

 Dependent variable: Y

 The dependent variable for linear regression should be numeric (continuous)
Linear regression (#3)
The goal of linear regression is to find the line that best predicts the dependent variable from the independent variables.

 Linear regression does this by finding the line that minimizes the sum of the squares of the vertical distances of the points from the line.
How does linear regression work?
Least-squares method (OLS)
 Calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.
 If a point lies exactly on the fitted line, its vertical deviation is 0.
 The goal of regression is to minimize the sum of the squares of the vertical distances of the points from the line.
Linear Regression Model
To understand linear regression, therefore, you must understand the model:
Y = intercept + slope × X + error = α + β·X + ε
When X equals 0, the equation gives Y = α.
The slope, β, is the change in Y for every unit change in X.
Epsilon (ε) represents random variability.
The simplest way to express the dependence of the expected response Yi on the predictor xi is to assume that it is a linear function, say E(Yi) = α + β·xi.

Constant or intercept (α):
 This parameter represents the expected response when xi = 0.

Slope (β):
 This parameter represents the expected increment in the response per unit change in xi.
 Note: Both α and β are population parameters which are usually unknown and hence estimated from the data by a and b.
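
A minimal Stata sketch of a simple linear regression, assuming an outcome weight and a single continuous predictor age (illustrative names); the output reports a, b, their standard errors and R squared:

STATA CODE: regress weight age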
Assumptions of linear regression
Linearity: the relationship between the independent and dependent variable is linear.
 To check this assumption we draw a scatter plot of the residuals against the y values.
 If the scatter plot follows a linear pattern (i.e. not a curvilinear pattern), the linearity assumption is met.
Linear Regression Assumptions
Normality (normally distributed error terms): the error terms follow the normal distribution. We can use qnorm and pnorm to check the normality of the residuals.

The Shapiro-Wilk test can also be used.
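
A minimal Stata sketch of these checks after fitting a model with regress; resid is a hypothetical name for the stored residuals:

STATA CODE:
predict resid, residuals
qnorm resid
pnorm resid
swilk resid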


Homoscedasticity of Residuals
Homoscedasticity: the variance of the error terms is constant.

It is about homogeneity of variance of the residuals.

If the model is well-fitted, there should be no pattern in the residuals plotted against the fitted values.

If the variance of the residuals is non-constant, the model is heteroscedastic.


Homoscedasticity…
The Breusch-Pagan test is used.
If the p-value < 0.05, reject the hypothesis that the variance is homogeneous.
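
A minimal Stata sketch: after regress, the Breusch-Pagan test and a residual-versus-fitted plot are available as postestimation commands:

STATA CODE:
estat hettest
rvfplot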
Multicollinearity
When there is a perfect linear relationship among the predictors, the estimates cannot be uniquely computed.
The term collinearity implies that two variables are near-perfect linear combinations of one another.
The regression model estimates of the coefficients become unstable.
The standard errors for the coefficients can get wildly inflated.
We can use the VIF or tolerance to check for multicollinearity.
Multicollinearity…
As a rule of thumb, a variable whose VIF is greater than 5 may need further investigation.

Tolerance, defined as 1/VIF, is used by many researchers to check the degree of collinearity.
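
A minimal Stata sketch: VIFs (and their reciprocals, 1/VIF) are reported by a postestimation command after regress:

STATA CODE: estat vif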
Multiple Linear Regression

Simple linear regression can be extended to multiple linear regression models:
Two or more independent variables, which could be categorical or continuous.
 The response variable is modelled as a function of k explanatory variables x1, x2, …, xk.
Its purposes are mainly:
 Prediction and explanation
 Adjusting for the effects of confounders
Multiple Linear Regression

Best fitting model:
 Minimizes the sum of squared residuals

 Residuals are the deviations between the observed response values and the values predicted by the fitted model
 The smaller the residuals, the closer the fitted line

 Note that the residuals ei are given by: ei = yi − ŷi (observed value minus fitted value)
Coefficients in multiple linear regression

The beta coefficient measures the amount of increase or decrease in the dependent variable for a one-unit difference in a continuous independent variable.
If an independent variable has a nominal scale with more than two categories:
 Dummy variables are needed
 Each dummy should be considered as an independent variable
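
A minimal Stata sketch of a multiple linear regression on the infant data, assuming illustrative variables weight, age, sex and place; the i. prefix asks Stata to create the dummy variables for a categorical predictor automatically:

STATA CODE: regress weight age i.sex i.place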
Assumptions: Specification of the model (model building)

Strategies to identify a subset of variables:

Option 1: Variable selection based on significance in univariable models (simple linear regression):
 All variables that show a significant effect in univariable models are included
 Commonly, a variable with a p-value of less than 0.25 is taken forward to the MLR model

Option 2: Variable selection based on significance in the multivariable model:
 Backward selection

 Stepwise selection

 Forward selection
Backward/stepwise/forward selection
Backward selection:
 All variables are entered into the model
 Variables are then removed step by step until only significantly contributing variables are left in the model
 The least contributing variable is removed first
 Then the second least contributor is removed, and so on
Forward selection:
 The model starts empty (the null model)
 The most significantly contributing variable enters first
 This continues step by step until only significantly contributing variables have entered the model
Stepwise selection:
 Same as forward selection, except that even if a variable is included in the model, its contribution is re-tested after the inclusion of other variables
 Variables are added but can subsequently be removed if they no longer contribute to the prediction
(see the Stata sketch after these descriptions)
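
A minimal Stata sketch, assuming the same illustrative variables; Stata's stepwise prefix performs backward selection with pr() (probability for removal) and forward selection with pe() (probability for entry):

STATA CODE:
stepwise, pr(0.05): regress weight age sex place
stepwise, pe(0.05): regress weight age sex place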
Option 3: Variable selection based on subject matter knowledge:
 The best way to select variables, as it is not data-driven and is therefore considered to yield unbiased results
Practical session for Multiple linear
regression using STATA
Thank you!!
