Module 4 - Logistic Regression - Afterclass1b
1
Ask the experts!
2
Experts are human
§ Experts are limited by memory and time
3
Replicating expert assessment
§ Can we develop analytical tools that replicate expert assessment on a large
scale?
4
Claims data
§ Electronically available
§ Standardized, well-established codes
§ Not 100% accurate (human generated)
§ Under-reporting is common (recording claims is tedious work)
§ When hospitals or doctors are evaluated based on patients’ experiences, they
may misreport
§ Claims for hospital visits can be vague
5
Creating the dataset: Claims samples
[Diagram: Claims → Sample]
6
Creating the dataset: Expert Review
[Diagram: Claims → Sample → Expert Review]
7
Creating the dataset: Expert Assessment
8
Creating the dataset: Variable extraction
9
Creating the dataset: Variable extraction
§ Dependent Variable
– Quality of care
§ Independent Variables
– Diabetes treatment
– Patient demographics
– Healthcare utilization
– Providers
– Claims
– Prescriptions
10
Data preview
Variable – Description
MemberID – A unique identifier for each observation.
ERVisits – The number of times the patient visited the emergency room.
OfficeVisits – The number of times the patient visited any doctor’s office.
Narcotics – The number of prescriptions the patient had for narcotics.
ProviderCount – The number of providers that saw or treated the patient.
NumberClaims – The total number of medical claims the patient had.
StartedOnCombination – Whether or not the patient was started on a combination of drugs to treat their diabetes.
PoorCare – Whether or not the patient received poor care (1 if the patient had poor care, 0 otherwise).
11
Predicting quality of care
§ How can we extend the idea of linear regression to situations where the
outcome variable is categorical?
– Only want to predict 1 or 0
– Could round outcome to 1 or 0
– But we can do better with logistic regression
12
Quick Question
§ Which of the following outcome variables is binary?
– The winner of an election with two candidates (binary: only two possible values)
– The day of the week with the highest revenue (not binary: seven possible values)
– The number of daily car thefts in New York City (not binary: a count)
13
Quick Question
14
Why does linear regression fail?
15
Why does linear regression fail?
16
Logistic function
§ G(z) = 1 / (1 + e^(−z))
– G(z) → 0 as z → −∞, and G(z) → 1 as z → +∞
§ z = β0 + β1x (e.g., x = number of office visits)
§ Combining the two:
P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1x)))
17
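The logistic function above can be sketched in a few lines of Python; the name `G` follows the slide’s notation:

```python
import math

def G(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# G(0) = 0.5; large positive z pushes G toward 1, large negative z toward 0
print(G(0), G(10), G(-10))
```

This is why rounding a linear prediction is unnecessary: the S-shaped output is already a valid probability.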
Interpretation
§ P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1x)))
§ The two outcome probabilities sum to one: P(y = 1 | x) + P(y = 0 | x) = 1
18
Logistic Regression Model
§ P(y = 1 | x) = 1 / (1 + exp(−Logit)) = 1 / (1 + exp(−(β0 + β1x)))
§ Odds = P(y = 1 | x) / P(y = 0 | x) = exp(Logit)
– Odds > 1 if y = 1 is more likely
– Odds < 1 if y = 0 is more likely
§ Logit = ln( P(y = 1 | x) / P(y = 0 | x) ) = β0 + β1x
– Interpretation: for each unit increase in x, the Logit increases by β1 (holding
everything else constant in a multivariate regression).
19
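A minimal sketch of the Logit/Odds/probability relationships on this slide, using nothing beyond the formulas themselves:

```python
import math

def prob_from_logit(logit):
    # P(y = 1 | x) = 1 / (1 + exp(-Logit))
    return 1.0 / (1.0 + math.exp(-logit))

def odds_from_logit(logit):
    # Odds = exp(Logit)
    return math.exp(logit)

def logit_from_prob(p):
    # Logit = ln( p / (1 - p) )
    return math.log(p / (1.0 - p))

p = prob_from_logit(2.0)
# Odds computed as p/(1-p) must agree with exp(Logit),
# and converting back must recover the original Logit.
print(p / (1.0 - p), odds_from_logit(2.0), logit_from_prob(p))
```

The round trip shows that Logit, Odds, and probability are three views of the same quantity.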
Decision Boundary (Prediction)
§ Predict “y = 1” if P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1x))) ≥ 0.5
§ Equivalently, predict “y = 1” if
– β0 + β1x ≥ 0 (one variable)
– β0 + β1x1 + β2x2 ≥ 0 (two variables)
20
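The equivalence above (probability ≥ 0.5 exactly when the Logit ≥ 0) can be checked numerically; the coefficients b0 = −2.0 and b1 = 0.5 below are hypothetical, chosen only for illustration:

```python
import math

def prob(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict_by_prob(b0, b1, x):
    # Threshold the predicted probability at 0.5
    return 1 if prob(b0, b1, x) >= 0.5 else 0

def predict_by_logit(b0, b1, x):
    # Threshold the Logit at 0 (the equivalent decision boundary)
    return 1 if b0 + b1 * x >= 0 else 0

b0, b1 = -2.0, 0.5   # hypothetical coefficients; boundary at x = 4
for x in [0, 1, 3, 4, 5, 10]:
    assert predict_by_prob(b0, b1, x) == predict_by_logit(b0, b1, x)
```

The logit form avoids computing the exponential at prediction time, which is why the boundary is usually stated in terms of β0 + β1x.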
Quick Question
§ We have an observation with the following values for the independent
variables:
x1 = 1, x2 = 5
– What is the value of the Logit for this observation?
– What is the value of the Odds for this observation?
– What is the value of P(y = 1) for this observation?
§ With the model’s estimated coefficients, the Logit for this observation is −1, so:
– Odds = exp(Logit) = exp(−1) ≈ 0.37
– P(y = 1) = 1 / (1 + exp(−Logit)) = 1 / (1 + e) ≈ 0.27
21
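The quick-question arithmetic (starting from Logit = −1 for this observation) can be verified directly:

```python
import math

logit = -1.0                       # beta0 + beta1*x1 + beta2*x2 for this observation
odds = math.exp(logit)             # Odds = exp(Logit)
p = 1.0 / (1.0 + math.exp(-logit)) # P(y = 1) = 1 / (1 + exp(-Logit))

print(round(odds, 3))              # 0.368
print(round(p, 3))                 # 0.269, i.e. about 0.27
```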
Estimation of parameters (Not required)
§ How can we estimate β0 and β1?
§ Linear regression: choose b0, b1 to minimize SSE(b0, b1), with ŷi = b0 + b1xi
§ Logistic regression: ŷi = G(b0 + b1xi), where G(z) = 1 / (1 + e^(−z))
22
Maximum likelihood estimation (Not required)
§ Under what values of β0 and β1 is the likelihood of obtaining our sample the
highest?
§ Choose b0, b1 to maximize the product of P(yi | xi) over all observations
23
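A brute-force sketch of the likelihood being maximized, on a tiny made-up sample; the data and the candidate coefficient values are illustrative, not from the slides:

```python
import math

def likelihood(b0, b1, xs, ys):
    """Probability of observing labels ys given features xs under (b0, b1)."""
    L = 1.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # P(y = 1 | x)
        L *= p if y == 1 else (1.0 - p)
    return L

# Made-up sample where larger x tends to go with y = 1
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

# A slope pointing the right way fits this sample better than a flat model
print(likelihood(-2.5, 1.0, xs, ys), likelihood(0.0, 0.0, xs, ys))
```

Maximum likelihood estimation searches over all (b0, b1) for the pair that makes this product largest; software does the search numerically.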
Data preview
24
Data preview
§ Frequency table: count how many cases of poor care and good care
§ Baseline model: predict the most frequent outcome for all observations
– Predict all patients are receiving good care
– Accuracy: 98/131 = 74.8%
25
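The baseline accuracy above comes from the frequency table; the poor-care count of 33 is implied by the 98/131 figure on the slide:

```python
# Baseline model: predict the most frequent outcome ("good care") for everyone.
good_care, poor_care = 98, 33       # counts from the frequency table
n = good_care + poor_care           # 131 patients in total
baseline_accuracy = good_care / n

print(round(baseline_accuracy, 3))  # 0.748, i.e. 74.8%
```

Any logistic regression model we build should beat this baseline to be worth using.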
Create training and testing sets
§ Training dataset: used to build model
§ Testing dataset: used to test the model’s out-of-sample accuracy
§ If there is no chronological order to the observations, we randomly assign
observations to the training set or testing set.
§ Training data set (75% of the data)
26
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)
27
Build a logistic regression model
§ Used data for 99 patients to build the model (75% of the data)
Logit = −2.646 + 0.082*OfficeVisits + 0.076*Narcotics
§ Are higher values in these variables indicative of poor care (1) or good care (0)?
§ Now that we have a model, how do we evaluate the quality of the model?
28
Build a logistic regression model
Logit = −2.646 + 0.082*OfficeVisits + 0.076*Narcotics
§ The output reports each estimated coefficient and its standard error
§ H0: Coefficient = 0 versus HA: Coefficient ≠ 0
§ z-stat = (Estimated Coefficient − 0)/std.error
29
Estimated Coefficients
Logit = −2.646 + 0.082*OfficeVisits + 0.076*Narcotics
§ Odds = exp(Logit), so increasing OfficeVisits by 1 (holding Narcotics constant)
multiplies the Odds by exp(0.082)
§ exp(0.082) = 1.085: each additional office visit is associated with an 8.5%
increase in the odds of poor care
32
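The 8.5% odds-increase figure can be reproduced directly from the estimated coefficient:

```python
import math

b_office = 0.082                  # estimated coefficient on OfficeVisits

# Adding one office visit adds b_office to the Logit, which multiplies
# the Odds = exp(Logit) by exp(b_office).
odds_multiplier = math.exp(b_office)

print(round(odds_multiplier, 3))  # 1.085, i.e. about an 8.5% increase in the odds
```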
Classification of the prediction
33
Threshold value
§ Often selected based on which errors are “worse”
34
Selecting a threshold value
§ Confusion/Classification matrix
Predicted = 0 Predicted = 1
Actual = 0 True Negatives (TN) False Positives (FP)
Actual = 1 False Negatives (FN) True Positives (TP)
35
Selecting a threshold value
§ A different threshold value changes the types of errors
§ Quantify this trade-off using
– true positive rate (or sensitivity)
– true negative rate (or specificity)
§ True positive rate measures percentage of actual y=1 cases that we classify
correctly
§ True negative rate measures percentage of actual y=0 cases that we classify
correctly
36
Selecting a threshold value
§ True positive rate = TP / (number of actual 1s)
§ Threshold ⇑
– Predicted positive cases ⇓
q True positive rate ⇓
– Predicted negative cases ⇑
q True negative rate ⇑
§ Threshold ⇓
– Predicted positive cases ⇑
q True positive rate ⇑
– Predicted negative cases ⇓
q True negative rate ⇓
36
Prediction results
§ Threshold = 0.5
Predicted = 0 Predicted = 1
Actual = 0 70 4
Actual = 1 15 10
§ True positive rate = 10/(10 + 15) = 0.4; true negative rate = 70/74 ≈ 0.95
37
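From the threshold-0.5 confusion matrix above, the rates defined on the previous slides can be computed as:

```python
# Confusion matrix at threshold = 0.5 (counts from the slide)
TN, FP = 70, 4
FN, TP = 15, 10

sensitivity = TP / (TP + FN)                # true positive rate = 10/25
specificity = TN / (TN + FP)                # true negative rate = 70/74
accuracy = (TP + TN) / (TN + FP + FN + TP)  # 80 correct out of 99

print(round(sensitivity, 2), round(specificity, 3), round(accuracy, 3))
```

Note the accuracy (about 81%) beats the baseline, but the sensitivity of 0.4 shows most poor-care patients are still missed at this threshold.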
Prediction results
§ Threshold = 0.7
Predicted = 0 Predicted = 1
Actual = 0 73 1
Actual = 1 17 8
§ Threshold = 0.2
Predicted = 0 Predicted = 1
Actual = 0 54 20
Actual = 1 9 16
Predicted = 0 Predicted = 1
Actual = 0 15 10
Actual = 1 5 20
Predicted = 0 Predicted = 1
Actual = 0 20 5
Actual = 1 10 15
39
Measure performance of logistic regression
Evaluate predictive ability of the logistic regression model
§ Classification/confusion matrix
o Accuracy of the model = # of correct predictions / # of data points
o # of false positive errors: predict 1 but actually 0
o # of false negative errors: predict 0 but actually 1
41
Receiver Operating Characteristic (ROC) curve
§ Threshold value = 1
Predicted = 0 Predicted = 1
Actual = 0 TN FP = 0
Actual = 1 FN TP = 0
§ True positive rate = 0/(0+FN) = 0
§ True negative rate = TN/(TN+0) = 1
– False positive rate = 1 - True negative rate = 0
43
Receiver Operating Characteristic (ROC) curve
45
Selecting a threshold using ROC
46
Area Under the ROC Curve (AUC)
§ For our model, AUC = 0.775
47
Area Under the ROC Curve (AUC)
48
Area Under the ROC Curve (AUC)
§ AUC measures model’s ability to distinguish between two outcomes (based
on observed characteristics)
§ 0 ≤ AUC ≤ 1 (AUC = 1 corresponds to a perfect classifier)
§ Diagonal line: prediction by random guessing (baseline)
– Predicting positive with probability q irrespective of any observed
characteristic gives the point (q, q)
– AUC = 0.5
§ The farther the ROC curve lies above the diagonal line, the larger the AUC
and the better the model.
49
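Each of the confusion matrices from the prediction-results slides contributes one point to the ROC curve; a short script makes the threshold trade-off explicit:

```python
# (threshold, TN, FP, FN, TP) taken from the prediction-results slides
results = [
    (0.7, 73, 1, 17, 8),
    (0.5, 70, 4, 15, 10),
    (0.2, 54, 20, 9, 16),
]

# Each threshold gives one (false positive rate, true positive rate)
# point on the ROC curve.
points = []
for thr, TN, FP, FN, TP in results:
    fpr = FP / (FP + TN)
    tpr = TP / (TP + FN)
    points.append((thr, fpr, tpr))

# Lowering the threshold moves up and to the right along the ROC curve:
# both the true positive rate and the false positive rate increase.
for (t1, f1, s1), (t2, f2, s2) in zip(points, points[1:]):
    assert t1 > t2 and f1 <= f2 and s1 <= s2
```

Sweeping the threshold continuously from 1 down to 0 traces out the full curve, whose area is the AUC.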
Measures of accuracy
§ Confusion matrix:
Predicted = 0 Predicted = 1
Actual = 0 TN FP
Actual = 1 FN TP
§ N: number of observations
§ Overall accuracy = (TN + TP)/N; overall error rate = (FP + FN)/N
§ Sensitivity = TP/(TP + FN); specificity = TN/(TN + FP)
50
Making predictions
§ Just like in linear regression, we want to make predictions on a test set to
compute out-of-sample metrics
51
Making predictions
§ If we use a threshold value of 0.3, we get the following confusion matrix
Predicted Good Care Predicted Poor Care
Actually Good Care 19 5
Actually Poor Care 2 6
§ Out-of-sample accuracy of (19+6)/32 = 78.1%
52
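The out-of-sample accuracy can be recomputed from the test-set confusion matrix above:

```python
# Test-set confusion matrix at threshold = 0.3 (counts from the slide)
good_pred_good = 19   # actually good care, predicted good care
good_pred_poor = 5    # actually good care, predicted poor care
poor_pred_good = 2    # actually poor care, predicted good care
poor_pred_poor = 6    # actually poor care, predicted poor care

n = good_pred_good + good_pred_poor + poor_pred_good + poor_pred_poor  # 32
accuracy = (good_pred_good + poor_pred_poor) / n  # correct predictions

print(round(accuracy, 3))  # 0.781, i.e. 78.1%
```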
Summary
§ An expert-trained model can accurately identify diabetics receiving low-quality
care
– Out-of-sample accuracy of 81.3%, if the threshold equals 0.5
– Identifies most patients receiving poor care
§ In practice, the probabilities returned by the logistic regression model can be used
to prioritize patients for intervention
53
Takeaway messages
§ While humans can accurately analyze small amounts of information, models
scale to far larger volumes of data
§ Models can integrate the assessments of many experts into one final,
unbiased, and unemotional prediction
54