Logistic regression models were fit to a data set of customer defaults using several predictors. The full model with balance, student, and income (Model 3) fit best, and stepwise AIC selection subsequently dropped the insignificant income term. At a 0.5 cutoff, the model correctly classified 97.32% of observations (misclassification rate 2.68%), and its ROC curve indicated good discrimination.

Logistic Regression

Soumya Roy

# Load the R library "ISLR"
library(ISLR)
# Attach the "Default" data set available in "ISLR" library
attach(Default)
# Name of the variables in "Default" data set
names(Default)

## [1] "default" "student" "balance" "income"

# Dimension of the "Default" data set
dim(Default)

## [1] 10000 4

# Descriptive summary of the data set
summary(Default)

##  default    student       balance           income     
##  No :9667   No :7056   Min.   :   0.0   Min.   :  772  
##  Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
##                        Median : 823.6   Median :34553  
##                        Mean   : 835.4   Mean   :33517  
##                        3rd Qu.:1166.3   3rd Qu.:43808  
##                        Max.   :2654.3   Max.   :73554  

## Boxplots
boxplot(balance~default, col=c("red","blue"), xlab="Default", ylab="Balance",
        main="Balance vs Default")
boxplot(income~default, col=c("red","blue"), xlab="Default", ylab="Income",
        main="Income vs Default")
## Barplot
T=table(default,student)
T

## student
## default No Yes
## No 6850 2817
## Yes 206 127

P=prop.table(T,margin=2)
P

## student
## default No Yes
## No 0.97080499 0.95686141
## Yes 0.02919501 0.04313859
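As a quick sanity check, the column proportions returned by `prop.table()` can be reproduced directly from the counts in `T` (values copied from the output above):

```r
# Default rate by student status, computed by hand from the contingency table:
# non-students: 206 defaulters out of 6850 + 206; students: 127 out of 2817 + 127
round(c(No = 206 / (6850 + 206), Yes = 127 / (2817 + 127)), 4)
# matches the second row of P: 0.0292 (non-students) vs 0.0431 (students)
```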

# Second row of P gives the default rate
barplot(P[2,], col=c("red","blue"), xlab="Student", ylab="Default Rate")

# Fitting a logistic regression model using the predictor "balance"
# The function "glm()" fits generalized linear models, a class of models that
# includes logistic regression as a special case
# The function "glm()" is similar to "lm()", except that we have to pass the
# argument "family=binomial" in order to fit a logistic regression model
mod_1=glm(default~balance,data=Default,family=binomial)
summary(mod_1)

##
## Call:
## glm(formula = default ~ balance, family = binomial, data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2697 -0.1465 -0.0589 -0.0221 3.7589
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.065e+01 3.612e-01 -29.49 <2e-16 ***
## balance 5.499e-03 2.204e-04 24.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 1596.5 on 9998 degrees of freedom
## AIC: 1600.5
##
## Number of Fisher Scoring iterations: 8
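To make the fitted coefficients concrete, the predicted default probability at a given balance can be recomputed by hand with the inverse logit (`plogis`); the coefficient values below are copied from the summary above:

```r
# P(default = Yes | balance) = plogis(beta0 + beta1 * balance)
b0 <- -1.065e+01  # intercept from summary(mod_1)
b1 <- 5.499e-03   # slope for balance
round(plogis(b0 + b1 * c(1000, 2000)), 4)
# a balance of $1,000 implies roughly a 0.6% default probability; $2,000 roughly 59%
```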
# Fitting a logistic regression model using the predictor "student"
mod_2=glm(default~student,data=Default,family=binomial)
summary(mod_2)

##
## Call:
## glm(formula = default ~ student, family = binomial, data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2970 -0.2970 -0.2434 -0.2434 2.6585
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.50413 0.07071 -49.55 < 2e-16 ***
## studentYes 0.40489 0.11502 3.52 0.000431 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 2908.7 on 9998 degrees of freedom
## AIC: 2912.7
##
## Number of Fisher Scoring iterations: 6

# Fitting a logistic regression model using the predictors "balance",
# "student", and "income"
mod_3=glm(default~balance+student+income,data=Default,family=binomial)
summary(mod_3)

##
## Call:
## glm(formula = default ~ balance + student + income, family = binomial,
## data = Default)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.4691 -0.1418 -0.0557 -0.0203 3.7383
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.087e+01 4.923e-01 -22.080 < 2e-16 ***
## balance 5.737e-03 2.319e-04 24.738 < 2e-16 ***
## studentYes -6.468e-01 2.363e-01 -2.738 0.00619 **
## income 3.033e-06 8.203e-06 0.370 0.71152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2920.6 on 9999 degrees of freedom
## Residual deviance: 1571.5 on 9996 degrees of freedom
## AIC: 1579.5
##
## Number of Fisher Scoring iterations: 8

# Getting the odds ratios and their 95% CIs
require(MASS)

## Loading required package: MASS

exp(cbind(coef(mod_3), confint(mod_3)))

## Waiting for profiling to be done...

##                                  2.5 %       97.5 %
## (Intercept) 1.903854e-05 7.074481e-06 0.0000487808
## balance     1.005753e+00 1.005309e+00 1.0062238757
## studentYes  5.237317e-01 3.298827e-01 0.8334223982
## income      1.000003e+00 9.999870e-01 1.0000191246
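The odds ratio for balance (~1.0058 per dollar) looks negligible only because of the unit; rescaling makes it easier to interpret. A small check using the coefficient from `summary(mod_3)`:

```r
b_balance <- 5.737e-03   # log-odds coefficient for balance (from mod_3)
exp(b_balance)           # odds ratio per $1 increase in balance, ~1.0058
exp(100 * b_balance)     # odds ratio per $100 increase, ~1.77
```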

# Hosmer-Lemeshow test for checking model fit
library(ResourceSelection)

## ResourceSelection 0.3-5 2019-07-22


default_new=ifelse(default=="Yes", 1, 0)
hoslem.test(default_new, fitted(mod_3))

##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: default_new, fitted(mod_3)
## X-squared = 3.6823, df = 8, p-value = 0.8846

# Using the "predict()" function to obtain probabilities of the form "P(Y=1|X)"
# "type=response" ensures output of the form "P(Y=1|X)", rather than other
# information such as the logit
mod_3.probs=predict(mod_3,type="response")
# Printing first ten predicted probabilities
mod_3.probs[1:10]

##            1            2            3            4            5            6 
## 1.428724e-03 1.122204e-03 9.812272e-03 4.415893e-04 1.935506e-03 1.989518e-03 
##            7            8            9           10 
## 2.333767e-03 1.086718e-03 1.638333e-02 2.080617e-05

# Using the "contrasts()" function to check the dummy variable created by R
contrasts(default)

## Yes
## No 0
## Yes 1

# Conversion of probabilities into class labels
mod_3.pred=rep("No",10000)
mod_3.pred[mod_3.probs>.5]="Yes"

# Creating a confusion matrix to check how many observations are correctly or
# incorrectly classified
table(mod_3.pred,default)

## default
## mod_3.pred No Yes
## No 9627 228
## Yes 40 105

# Calculating the fraction of observations for which the prediction was correct
mean(mod_3.pred==default)

## [1] 0.9732

# Calculating the misclassification rate
mean(mod_3.pred!=default)

## [1] 0.0268
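Overall accuracy hides the class imbalance: at the 0.5 cutoff the model misses most actual defaulters. Sensitivity and specificity can be computed from the confusion matrix above:

```r
# Confusion matrix at cutoff 0.5 (rows = predicted, cols = actual), from above
cm <- matrix(c(9627, 40, 228, 105), nrow = 2,
             dimnames = list(pred = c("No", "Yes"), default = c("No", "Yes")))
sens <- cm["Yes", "Yes"] / sum(cm[, "Yes"])  # 105/333: only ~32% of defaulters caught
spec <- cm["No", "No"] / sum(cm[, "No"])     # 9627/9667: ~99.6% of non-defaulters correct
round(c(sensitivity = sens, specificity = spec), 4)
```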

# Changing the cut-off

# Conversion of probabilities into class labels
mod_3.pred=rep("No",10000)
mod_3.pred[mod_3.probs>.2]="Yes"

# Creating a confusion matrix to check how many observations are correctly or
# incorrectly classified
table(mod_3.pred,default)

## default
## mod_3.pred No Yes
## No 9390 130
## Yes 277 203

# Calculating the fraction of observations for which the prediction was correct
mean(mod_3.pred==default)

## [1] 0.9593

# Calculating the misclassification rate
mean(mod_3.pred!=default)

## [1] 0.0407
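Lowering the cutoff to 0.2 trades overall accuracy for sensitivity. Comparing the two confusion matrices shows the gain in defaulters caught against the extra false alarms:

```r
# Sensitivity/specificity at each cutoff, from the two confusion matrices above
sens <- c(cut_0.5 = 105 / (228 + 105), cut_0.2 = 203 / (130 + 203))
spec <- c(cut_0.5 = 9627 / (9627 + 40), cut_0.2 = 9390 / (9390 + 277))
round(rbind(sensitivity = sens, specificity = spec), 4)
# sensitivity roughly doubles (0.3153 -> 0.6096) while specificity drops only
# slightly (0.9959 -> 0.9713)
```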

# ROC Plot
library(pROC)

## Type 'citation("pROC")' for a citation.

##
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

R=roc(default,mod_3.probs)

## Setting levels: control = No, case = Yes

## Setting direction: controls < cases

plot(roc(default,mod_3.probs),col="blue",legacy.axes = TRUE)

## Setting levels: control = No, case = Yes


## Setting direction: controls < cases
coords(R, "best", ret = "threshold")

## Warning in coords.roc(R, "best", ret = "threshold"): The 'transpose' argument
## to FALSE by default since pROC 1.16. Set transpose = TRUE explicitly to revert
## to the previous behavior, or transpose = FALSE to silence this warning. Type
## help(coords_transpose) for additional information.

## threshold
## 1 0.03120876
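By default, `coords(R, "best")` picks the threshold maximizing Youden's J statistic (sensitivity + specificity - 1). A self-contained sketch on toy data (made-up scores, not the Default data) illustrates the idea:

```r
# Youden's J over candidate thresholds: J(t) = sensitivity(t) + specificity(t) - 1
y <- c(0, 0, 0, 1, 0, 1, 1, 1)                  # toy true classes
p <- c(.05, .10, .20, .30, .40, .60, .70, .90)  # toy predicted probabilities
thr <- sort(unique(p))
J <- sapply(thr, function(t) {
  sens <- sum(p >= t & y == 1) / sum(y == 1)
  spec <- sum(p < t & y == 0) / sum(y == 0)
  sens + spec - 1
})
thr[which.max(J)]  # best threshold for this toy example
```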

# Model Selection
library(MASS)
stepAIC(mod_3,trace=F)

##
## Call: glm(formula = default ~ balance + student, family = binomial,
## data = Default)
##
## Coefficients:
## (Intercept) balance studentYes
## -10.749496 0.005738 -0.714878
##
## Degrees of Freedom: 9999 Total (i.e. Null); 9997 Residual
## Null Deviance: 2921
## Residual Deviance: 1572 AIC: 1578
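`stepAIC()` drops income because removing it barely changes the deviance while saving a parameter. The AIC values can be checked by hand as deviance + 2 × (number of estimated coefficients), using the (rounded) deviances reported above:

```r
# AIC = residual deviance + 2 * number of estimated coefficients
aic_full    <- 1571.5 + 2 * 4  # intercept + balance + studentYes + income
aic_reduced <- 1572   + 2 * 3  # intercept + balance + studentYes (rounded deviance)
c(full = aic_full, reduced = aic_reduced)  # the reduced model wins by ~1.5
```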

# Lift and Gain Charts
library(funModeling)

## Loading required package: Hmisc

## Loading required package: lattice

## Loading required package: survival

## Loading required package: Formula

## Loading required package: ggplot2

##
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:base':
## 
##     format.pval, units

## funModeling v.1.9.4 :)
## Examples and tutorials at livebook.datascienceheroes.com
## / Now in Spanish: librovivodecienciadedatos.ai

Default$mod_3.probs=predict(mod_3,type="response")
gain_lift(data=Default, score='mod_3.probs', target='default')
## Population Gain Lift Score.Point
## 1 10 78.38 7.84 0.07092534738
## 2 20 91.89 4.59 0.02104190396
## 3 30 96.70 3.22 0.00880320034
## 4 40 98.80 2.47 0.00401693056
## 5 50 99.10 1.98 0.00196619538
## 6 60 99.70 1.66 0.00094485119
## 7 70 100.00 1.43 0.00044286132
## 8 80 100.00 1.25 0.00017553872
## 9 90 100.00 1.11 0.00005139724
## 10 100 100.00 1.00 0.00001025695
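In the table above, Gain is the cumulative percentage of all defaulters captured within the top score percentiles (the top 10% of scores already contains 78.38% of defaulters), and Lift is that gain relative to random targeting. The Lift column can be reproduced from the Gain column:

```r
# Lift = cumulative gain / population percentile (values from the table above)
gain <- c(78.38, 91.89, 96.70, 98.80, 99.10)
pop  <- c(10, 20, 30, 40, 50)
round(gain / pop, 2)  # matches the Lift column: 7.84 4.59 3.22 2.47 1.98
```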
