Logistic Regression Insights

Logistic regression is a classification algorithm used to predict binary outcomes by analyzing the relationship between a categorical dependent variable and one or more categorical or continuous independent variables. It requires that the dependent variable be dichotomous, such as passed/failed, churned/not churned, etc. A confusion matrix is used to evaluate the performance of a logistic regression model, with metrics like accuracy, true/false positive and negative rates, precision, and prevalence.


By
Dr. K.A. Asraar Ahmed
Assistant Professor, XIME
asraar@xime.org
▪ Logistic regression is used to analyze the relationship between a
categorical dependent variable and one or more metric (continuous) or
categorical independent variables.

▪ It is a classification algorithm used to predict binary outcomes for a
given set of independent variables.
▪ Your dependent variable should be measured on a dichotomous scale.
▪ Examples of dichotomous variables include
▪ Customer will churn or not,
▪ Employee will resign or not,
▪ Employee will get promoted or not,
▪ Loan will be approved or not,
▪ presence of heart disease (two groups: "yes" and "no"),
▪ personality type (two groups: "introversion" or "extroversion"),
▪ body composition (two groups: "obese" or "not obese"), and so forth.
▪ For example, the dependent variable could be "exam performance",
measured on a dichotomous scale – "passed" or "failed" – with three
independent variables: "revision time", "test anxiety" and "lecture
attendance".

Independent Variables (Continuous Data) | Dependent Variable (Categorical Data)
Revision time                           | Passed or Failed (categorical data)
Test anxiety                            |
Lecture attendance                      |
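The exam-performance example can be sketched in Python with scikit-learn. The dataset below is invented purely for illustration (six hypothetical students); the column meanings follow the table above, but the numbers themselves are assumptions, not the author's data:

```python
# Sketch of the exam-performance example: fit a logistic regression on
# three continuous predictors and a dichotomous passed/failed outcome.
# All data values are hypothetical, made up for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: revision time (hours), test anxiety (score), lecture attendance (%)
X = np.array([
    [20, 3, 90],
    [ 5, 8, 40],
    [15, 5, 75],
    [ 2, 9, 30],
    [25, 2, 95],
    [ 8, 7, 50],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = passed, 0 = failed

model = LogisticRegression().fit(X, y)

# Predict pass/fail for a new (hypothetical) student who revised 12 hours,
# scored 6 on test anxiety, and attended 60% of lectures.
new_student = [[12, 6, 60]]
print(model.predict(new_student))        # predicted class (0 or 1)
print(model.predict_proba(new_student))  # probability of each class
```

`predict_proba` exposes the fitted probability of each class, which is the quantity logistic regression actually models; `predict` simply thresholds it at 0.5.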
▪ It is a classification algorithm used to predict binary outcomes for a given set
of independent variables.
▪ Here the dependent variable is discrete/categorical.
▪ Example: predicting customer churn or employee resignation through a Yes/No
dependent variable.
▪ A confusion matrix is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data for which the true
values are known.
▪ The confusion matrix itself is relatively simple to understand, but the related
terminology can be confusing.
▪ There are two possible predicted classes: "yes" and "no". If we were predicting
customer churn, for example, "yes" would mean the customer has churned,
and "no" would mean the customer has not.
▪ The classifier made a total of 165 predictions (i.e., 165 customers were
analysed for churn).
▪ Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55
times.
▪ In reality, 105 customers in the sample have churned, and 60 customers have
not.
▪ True positives (TP): These are cases in which we predicted yes (the customer
churned), and they actually churned.

▪ True negatives (TN): We predicted no, and they did not churn.

▪ False positives (FP): We predicted yes, but they did not actually churn. (Also
known as a "Type I error.")

▪ False negatives (FN): We predicted no, but they actually churned. (Also
known as a "Type II error.")
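The four cell counts described above can be reproduced with scikit-learn's `confusion_matrix`. The label arrays below are synthetic, constructed only so that the counts match the slides' example (TP=100, TN=50, FP=10, FN=5):

```python
# Reconstruct the churn example's confusion matrix from synthetic labels.
# 1 = churned ("yes"), 0 = not churned ("no").
import numpy as np
from sklearn.metrics import confusion_matrix

# Build actual/predicted arrays with exactly the counts from the slides:
# 100 TP, 50 TN, 10 FP, 5 FN (165 predictions in total).
y_true = np.array([1]*100 + [0]*50 + [0]*10 + [1]*5)
y_pred = np.array([1]*100 + [0]*50 + [1]*10 + [0]*5)

# With labels [0, 1], sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 50 10 5 100
```

Note the cell ordering: scikit-learn puts true negatives in the top-left, so `ravel()` yields TN, FP, FN, TP rather than starting with TP.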
▪ Accuracy: Overall, how often is the classifier correct? (TP+TN)/total =
(100+50)/165 = 0.91
▪ Misclassification Rate: Overall, how often is it wrong? (FP+FN)/total =
(10+5)/165 = 0.09, equivalent to 1 minus Accuracy; also known as "Error
Rate".
▪ True Positive Rate: When it's actually yes, how often does it predict yes?
TP/actual yes = 100/105 = 0.95; also known as "Sensitivity" or "Recall".
▪ False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17
▪ True Negative Rate: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83, equivalent to 1 minus False Positive Rate; also
known as "Specificity".
▪ Precision: When it predicts yes, how often is it correct? TP/predicted yes =
100/110 = 0.91
▪ Prevalence: How often does the yes condition actually occur in our sample?
actual yes/total = 105/165 = 0.64.
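As a quick sketch, the arithmetic above can be checked directly from the four counts given in the slides (plain Python, no assumptions beyond those counts):

```python
# Verify the slide's metrics from the raw confusion-matrix counts.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn        # 165 predictions

accuracy    = (tp + tn) / total  # (100+50)/165
error_rate  = (fp + fn) / total  # (10+5)/165, = 1 - accuracy
sensitivity = tp / (tp + fn)     # 100/105, TPR / Recall
fpr         = fp / (fp + tn)     # 10/60, False Positive Rate
specificity = tn / (fp + tn)     # 50/60, = 1 - FPR
precision   = tp / (tp + fp)     # 100/110
prevalence  = (tp + fn) / total  # 105/165

print(round(accuracy, 2), round(sensitivity, 2),
      round(specificity, 2), round(precision, 2))  # 0.91 0.95 0.83 0.91
```

Note that sensitivity and specificity are computed over the *actual* yes/no columns, while precision is computed over the *predicted* yes row, which is why their denominators differ.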
