Lecture 5. Part 1 - Regression Analysis

- Intrinsically linear models use data transformations to linearize relationships between variables that are not truly linear. An example showed transforming hours and unit-number data using natural logs to create a linear relationship.
- Logistic regression predicts the probability of a binary/dichotomous outcome, such as disease presence/absence, from continuous predictor variables. It expresses the log-odds as a linear combination of the predictors.
- Kaplan-Meier survival analysis estimates the survival function from lifetime data and handles censored observations, where the event was not observed for some subjects.


Special Regression Topics

Lecture 5. STT153A
Overview
In this lecture, we are going to discuss the following:
• Intrinsically Linear Models
• Logistic Regression
• Kaplan-Meier Survival Analysis
Intrinsically Linear Models
Reference: Mislick, G. K., & Nussbaum, D. A. (2015). Cost estimation: Methods and
tools. John Wiley & Sons.
Intrinsically Linear Models
What if the relationship between our variables is NOT linear?

• We will be using data transformations to “linearize” the model


Example
Transform the data:

• using a calculator (take the natural log of each value)
• or the Excel function =LN(value)

ln(Hours)   ln(Unit #)
 4.09434      1.60944
 3.80666      2.48491
 3.46574      3.55535
 3.21888      4.31749
 3.04452      4.82831
Since there is only one independent variable, use simple linear regression on the transformed data.

Hours   Unit Number   ln(Hours)   ln(Unit #)
 60          5          4.09434     1.60944
 45         12          3.80666     2.48491
 32         35          3.46574     3.55535
 25         75          3.21888     4.31749
 21        125          3.04452     4.82831

Enter the data in Statistica:
Statistics -> Multiple Regression -> Dependent: ln(Hours), Independent: ln(Unit #) -> OK -> Summary: Regression Results

Get the regression model of the transformed data.
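As an optional cross-check outside Statistica, here is a minimal Python sketch that applies the same log transformation and fits the simple linear regression; it assumes only the Hours / Unit Number values from the table above.

```python
# Minimal sketch: linearize the Hours vs. Unit Number data with natural logs,
# then fit a simple linear regression to the transformed values.
import numpy as np

hours = np.array([60, 45, 32, 25, 21], dtype=float)
units = np.array([5, 12, 35, 75, 125], dtype=float)

ln_hours = np.log(hours)   # matches the ln(Hours) column
ln_units = np.log(units)   # matches the ln(Unit #) column

# Least-squares fit of ln(Hours) = b0 + b1 * ln(Unit #)
b1, b0 = np.polyfit(ln_units, ln_hours, 1)
print(f"ln(Hours) = {b0:.4f} + {b1:.4f} * ln(Unit #)")

# Back-transforming gives the equivalent power (learning-curve) form:
# Hours = exp(b0) * Unit^b1
print(f"Hours = {np.exp(b0):.2f} * Unit^{b1:.4f}")
```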
Logistic Regression
Logistic regression
The basic difference between linear regression and logistic regression is
the dependent variable.
In linear regression the dependent variable is continuous (e.g., ice cream sales, GPA, household income).
In logistic regression the dependent variable is dichotomous, with two possible outcomes (e.g., has a disease = 1, does not have a disease = 0). The observed outcome is either 0 or 1 (0% or 100%), and logistic regression models the probability of the outcome, which takes values between 0 and 1.
This video is a good introduction comparing linear regression and logistic regression:
https://www.youtube.com/watch?v=C5268D9t9Ak (minutes 1 to 5)
The model gives the probability

P(y = 1 | x1, ..., xk) = 1 / (1 + e^-(b1·x1 + ... + bk·xk + a))

or it can be expressed in log-odds form as

ln( P / (1 - P) ) = b1·x1 + ... + bk·xk + a

Log Likelihood: the coefficients b1, ..., bk and a are estimated by maximizing the log likelihood of the observed data.
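As a quick numerical illustration (not from the lecture), the sketch below verifies that the probability form and the log-odds form agree; the intercept, slope, and predictor value are hypothetical.

```python
# Hypothetical intercept, slope, and predictor value, chosen only to show
# that the two forms of the logistic model describe the same thing.
import numpy as np

a, b = -1.0, 0.5
x = 2.0

p = 1.0 / (1.0 + np.exp(-(a + b * x)))   # probability form: P(y = 1 | x)
log_odds = np.log(p / (1.0 - p))         # log-odds form: ln(P / (1 - P))

print(p)                      # 0.5, the modeled probability
print(log_odds, a + b * x)    # both equal 0.0, i.e. a + b*x
```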
Example 1. Encode the data in Statistica:

Buys the Product   Salary   Age
Yes                 1500     33
No                  1200     33
No                  2200     34
No                  2100     42
Yes                 1500     29
Yes                 1700     19
No                  3000     50
No                  3000     55
Yes                 2800     31
Yes                 2900     46
No                  2750     36
No                  2550     48
Yes                 1200     24

Statistics -> Advanced Linear/Nonlinear Models -> Stepwise Model Builder -> Logistic Regression
Select Variables -> Dependent, Continuous, Categorical -> Bad code (Yes), Good code (No). This is because the model will predict the probability of the "bad code", and in this situation our interest is whether the customer will buy the product.
-> Full Sample -> Add variables (select all variables from the Marginal Results Table) -> Summary
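For readers without Statistica, here is a minimal sketch that fits the same logistic regression on the Example 1 data using Python's statsmodels; since it is not the Stepwise Model Builder, the estimates may differ slightly from those reported on the following slides.

```python
# Example 1 data re-entered by hand; Buys the Product coded Yes = 1, No = 0.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "buys":   [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "salary": [1500, 1200, 2200, 2100, 1500, 1700, 3000, 3000,
               2800, 2900, 2750, 2550, 1200],
    "age":    [33, 33, 34, 42, 29, 19, 50, 55, 31, 46, 36, 48, 24],
})

# Logistic regression of buys on salary and age
model = smf.logit("buys ~ salary + age", data=data).fit()
print(model.summary())        # coefficients and p-values
print(np.exp(model.params))   # odds ratios
```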
Example 1

The general model is

P(y = 1 | x1, ..., xk) = 1 / (1 + e^-(b1·x1 + ... + bk·xk + a))

and the fitted model, with x1 = Salary and x2 = Age, is

P(Buys the product = Yes | Salary, Age) = 1 / (1 + e^-(0.000777·x1 - 0.212893·x2))

Interpretation
Logistic regression analysis was performed to examine the influence of Salary and Age on the variable Buys the Product, predicting the value "Yes".

The coefficient of the variable Salary is b1 = 0.000777, which is positive. This means that an increase in Salary is associated with an increase in the probability that the dependent variable is "Yes". However, the p-value of 0.5837 indicates that this influence is not statistically significant. The odds ratio of exp(0.000777) = 1.000777 indicates that a one-unit increase in Salary multiplies the odds that the dependent variable is "Yes" by about 1.0008 (a 0.08% increase).

The coefficient of the variable Age is b2 = -0.212893, which is negative. This means that an increase in Age is associated with a decrease in the probability that the dependent variable is "Yes". However, the p-value of 0.094680 indicates that this influence is not statistically significant. The odds ratio of exp(-0.212893) = 0.808 indicates that a one-year increase in Age multiplies the odds that the dependent variable is "Yes" by about 0.81, i.e. decreases the odds by roughly 19%.
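A quick numeric check of these odds ratios, using only the coefficients reported above:

```python
# Odds ratios are exp(coefficient); the coefficient values come from the slide.
import numpy as np

b1_salary = 0.000777
b2_age = -0.212893

print(np.exp(b1_salary))  # ~1.0008: odds multiplied by about 1.0008 per unit of Salary
print(np.exp(b2_age))     # ~0.808:  odds multiplied by about 0.81 per year of Age (~19% lower)
```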
Example 2.
Open Data CHD Logistic
Reference: https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression
Variables
Demographic:
• Sex: male or female (Nominal)
• Age: age of the patient (Continuous; although the recorded ages are truncated to whole numbers, age itself is continuous)
Behavioral:
• Current Smoker: whether or not the patient is a current smoker (Nominal)
• Cigs Per Day: the average number of cigarettes the person smoked per day (can be treated as continuous, since any number of cigarettes is possible, even half a cigarette)
Medical (history):
• BP Meds: whether or not the patient was on blood pressure medication (Nominal)
• Prevalent Stroke: whether or not the patient had previously had a stroke (Nominal)
• Prevalent Hyp: whether or not the patient was hypertensive (Nominal)
• Diabetes: whether or not the patient had diabetes (Nominal)
Medical (current):
• Tot Chol: total cholesterol level (Continuous)
• Sys BP: systolic blood pressure (Continuous)
• Dia BP: diastolic blood pressure (Continuous)
• BMI: body mass index (Continuous)
• Heart Rate: heart rate (Continuous; although heart rate is technically discrete, it is treated as continuous because of the large number of possible values)
• Glucose: glucose level (Continuous)

Predicted variable (desired target):
• 10-year risk of coronary heart disease, CHD (binary: 1 means "Yes", 0 means "No")
Example 2. Variables
Mark the continuous and categorical variables, and exclude Education.
Bad code = 1 (Yes): 10-year risk of coronary heart disease (CHD).
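For those working outside Statistica, the sketch below fits the same six-predictor model with Python's statsmodels. The file name framingham.csv and the column names (TenYearCHD, age, cigsPerDay, totChol, sysBP, glucose, male, education) are assumptions based on the Kaggle dataset referenced above; adjust them to match your copy of the data.

```python
# Minimal sketch of the Example 2 model; file and column names are assumed
# from the Kaggle heart-disease dataset and may need to be adjusted.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("framingham.csv")

# Exclude the education column and drop rows with missing values
df = df.drop(columns=["education"]).dropna()

# TenYearCHD = 1 means "Yes" (the "bad code" the model predicts)
model = smf.logit(
    "TenYearCHD ~ age + cigsPerDay + totChol + sysBP + glucose + male",
    data=df,
).fit()
print(model.summary())
```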
Example 2.
Variable      Coefficient Estimate   Odds Ratio (exp(coefficient))
Intercept          -8.84912               0.000143508
age                 0.06590               1.068116101
cigsPerDay          0.01923               1.01941152
totChol             0.00227               1.002274682
sysBP               0.01753               1.017688966
glucose             0.00728               1.00730684
male               -0.28072               0.755237497
Scale               1.00000               2.718281828
Example 2.
Interpreting the results: Odds Ratio
• The fitted model shows that, holding all other features constant, the odds of getting diagnosed with heart disease for males (male = 1) relative to females (male = 0) is exp(-0.28072) = 0.755. In terms of percent change, the odds for males are about 100% - 76% = 24% lower than the odds for females.
• The coefficient for age says that, holding all else constant, there is about a 7% increase in the odds of getting diagnosed with CHD for a one-year increase in age, since exp(0.06590) = 1.068.
• Similarly, with every extra cigarette smoked per day there is about a 2% increase in the odds of CHD.
• For total cholesterol level and glucose level the change is very small.
• There is about a 1.7% increase in the odds for every unit increase in systolic blood pressure.
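The percent changes quoted above come from the odds ratios; a small sketch of the conversion, using the coefficients from the table:

```python
# Percent change in odds per one-unit increase, holding other variables fixed:
# percent change = (exp(coefficient) - 1) * 100
import numpy as np

coefficients = {
    "age": 0.06590, "cigsPerDay": 0.01923, "totChol": 0.00227,
    "sysBP": 0.01753, "glucose": 0.00728, "male": -0.28072,
}
for name, b in coefficients.items():
    print(f"{name}: {(np.exp(b) - 1) * 100:+.1f}% change in odds per unit")
```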
Conclusions
• In this fit, women appear to be more susceptible to heart disease than men. Increases in age, number of cigarettes smoked per day, and systolic blood pressure are also associated with increasing odds of heart disease.
• Total cholesterol shows no significant change in the odds of CHD. This could be due to the presence of 'good' cholesterol (HDL) in the total cholesterol reading. Glucose likewise causes only a negligible change in the odds (less than 1%).

Note that this interpretation follows the post on Kaggle. The data are not clean: there were 648 cases with NA values, which is not practical for analysis. After cleaning the data, the result for men and women is different.
