
Session9-LogisticRegression_a6c5bc556df30fa3eb779e22e464a08a - Copy

The document discusses Logistic Regression as a statistical method for binary classification, highlighting its applications in employee attrition prediction, customer churn analysis, credit risk assessment, email spam classification, and disease diagnosis. It explains the working principle, including the use of the sigmoid function to model probabilities, and covers key concepts such as confusion matrix, accuracy, precision, recall, F1 score, ROC curve, and AUC. Additionally, it addresses limitations and considerations for using Logistic Regression, emphasizing its interpretability and effectiveness in various domains.

Uploaded by

Rabjoat Singh
Copyright
© All Rights Reserved

Logistic Regression

By: Prof. Dnyanesh Khedekar


Data Analysis and Decision making
Decision needs Data Analysis
• Employee Attrition Prediction:
• Question: Can we predict the likelihood of an employee leaving the
company based on factors such as job satisfaction, performance ratings,
and work hours?
• Application: Logistic Regression can model the probability of employee
attrition, helping HR departments proactively identify high-risk individuals
and implement retention strategies.
Decision needs Data Analysis
• Customer Churn Analysis:
• Question: What factors contribute to customer churn, and can we predict
which customers are at a higher risk of leaving our service?
• Application: Logistic Regression can analyze historical data on customer
behavior, such as usage patterns, customer support interactions, and
contract details, to predict the likelihood of churn.
Decision needs Data Analysis
• Credit Risk Assessment:
• Question: Can we assess the risk of default for a loan applicant based on
their credit score, income, debt-to-income ratio, and other financial
indicators?
• Application: Logistic Regression can help financial institutions evaluate
the creditworthiness of applicants by modeling the probability of loan
default.
Decision needs Data Analysis
• Email Spam Classification:
• Question: How can we automatically classify emails as spam or non-spam
based on features like sender information, subject, and content?
• Application: Logistic Regression can be applied to build a spam filter,
predicting the probability that an email is spam and automatically
filtering out unwanted messages.
Decision needs Data Analysis
• Disease Diagnosis:
• Question: Given patient data such as medical history, age, and test
results, can we predict the likelihood of a patient having a specific
disease?
• Application: Logistic Regression can be used in healthcare to model the
probability of disease occurrence, aiding in early diagnosis and timely
intervention.
“YES” or “NO”
• All the examples we discussed involve predicting a dependent variable.
• This variable is always binary (it can hold only 2 values: 0 or 1).
• Now that we know why, let us understand what and how.
Agenda

1. Introduction to Logistic Regression
2. Binary Classification Basics
3. Confusion Matrix
4. Accuracy
5. Precision
6. Recall
7. F1 Score
8. ROC Curve
9. Area Under the Curve (AUC)
10. Conclusion
Introduction to Logistic Regression

• Brief overview of Logistic Regression as a classification algorithm.


• Application areas and importance in machine learning.
Regression

• Regression models the relationship between a dependent variable
and independent variables.
• Types:
• Linear Regression: For continuous outcomes.
• Logistic Regression: For binary (or categorical) outcomes.
Quick Recap of Linear Regression

• Purpose: Predict a continuous outcome.


• Model Equation: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
Scatter Plot with Best-Fit Line
Limitations for Classification

• Issue: Linear regression can predict values outside the [0,1] range.
• For binary outcomes (0 or 1), such predictions are not
interpretable as probabilities.
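
The range problem can be illustrated with a minimal sketch. The toy data below (a single feature and a 0/1 outcome) is hypothetical and chosen only for illustration; the helper `fit_line` is an assumed name implementing ordinary least squares for one feature.

```python
# Hypothetical toy data: one feature (xs) and a binary outcome (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

b0, b1 = fit_line(xs, ys)
low = b0 + b1 * 0.0    # linear prediction at x = 0
high = b0 + b1 * 10.0  # linear prediction at x = 10
print(low, high)       # one value falls below 0, the other above 1
```

On this data the fitted line gives a negative value at x = 0 and a value above 1 at x = 10, neither of which can be read as a probability.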
Linear Predictions Outside [0,1]
Transition to Logistic Regression

• Motivation: Model probabilities directly for classification.


• Key Idea: Use a function that “squashes” outputs into the [0,1]
range.
• Instead of predicting a continuous value, predict the probability
of class membership.
The Sigmoid (Logistic) Function

• Definition: σ(z) = 1 / (1 + e^(−z))

• Maps any real number to the interval (0, 1)


• S-shaped curve
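
A minimal sketch of the sigmoid in Python, using only the standard library. The two-branch form is an implementation choice (not from the slides) that avoids overflow in `math.exp` for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z < 0, so exp(z) < 1
    return ez / (1.0 + ez)

# Values rise from near 0 to near 1, passing through 0.5 at z = 0.
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))
```

The curve is symmetric around z = 0: sigmoid(z) + sigmoid(−z) = 1.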
Sigmoid Function
Using sigmoid to convert to probabilities
Model Equation:
σ(z) = P(y = 1 | x) = 1 / (1 + e^(−(β₀ + β₁x₁ + … + βₙxₙ))), where z = β₀ + β₁x₁ + … + βₙxₙ

• Interpretation: Output is the probability that the dependent
variable equals 1.
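
As a sketch of how the model equation turns features into a probability: the coefficients below are hypothetical values picked for illustration, not fitted to any data, and `predict_proba` is an assumed helper name.

```python
import math

def predict_proba(x, intercept, coefs):
    """P(y = 1 | x) for one observation under a logistic model."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration (not fitted).
intercept = -1.0      # β₀
coefs = [0.8, -0.5]   # β₁, β₂

# z = -1.0 + 0.8 * 2.0 - 0.5 * 1.0 = 0.1, so p is slightly above 0.5.
p = predict_proba([2.0, 1.0], intercept, coefs)
print(round(p, 3))
```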
Definition: Logistic Regression

• Logistic Regression is a statistical method used for binary classification.


• Objective: Predict the probability that an instance belongs to a particular class.
• Sigmoid Function: Core of Logistic Regression, mapping real-valued numbers to
probabilities (0 to 1).

• Logistic Regression is widely used in various domains, including healthcare,
finance, and marketing.
Logistic Regression : Working Principle

• Probability Prediction: Logistic Regression predicts the probability of the
positive class (class 1).
• Decision Boundary: Threshold value to classify instances into the positive
or negative class.
• Parameters: Coefficients (weights) and bias term determine the decision
boundary.
• Training: Adjust parameters to maximize the likelihood of the observed
data.
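
The four steps above can be sketched for a single feature. This is a minimal illustration, assuming plain gradient ascent on the average log-likelihood with a fixed learning rate; real libraries typically use more sophisticated optimizers. The toy data, `fit_logistic`, and `classify` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the
    average log-likelihood of the observed labels."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-likelihood: mean of (y - p) * feature.
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

def classify(x, b0, b1, threshold=0.5):
    """Apply the decision threshold to the predicted probability."""
    return 1 if sigmoid(b0 + b1 * x) >= threshold else 0

# Hypothetical toy data with overlapping classes, so the fit stays finite.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(classify(1.0, b0, b1), classify(8.0, b0, b1))
```

With a threshold of 0.5 the decision boundary sits where β₀ + β₁x = 0; raising or lowering the threshold shifts that boundary.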
Key Features
• Linear Decision Boundary: Logistic Regression assumes a linear
relationship between input features and log-odds of the positive class.
• Interpretability: Each coefficient gives the change in the log-odds of the
positive class per unit increase in its feature; its exponential is the odds ratio.
• Output Interpretation: Predicted probabilities can be converted into class
predictions based on a chosen threshold.
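
The link between coefficients and odds can be checked numerically. In this sketch the coefficients are hypothetical; it shows that raising the feature by one unit multiplies the odds p / (1 − p) by e^β₁, regardless of the starting point.

```python
import math

def prob(x, b0, b1):
    """P(y = 1 | x) for a one-feature logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1.0 - p)

b0, b1 = -2.0, 0.7   # hypothetical coefficients, for illustration only

# Odds ratio for a one-unit increase in x (from 3 to 4).
ratio = odds(prob(4.0, b0, b1)) / odds(prob(3.0, b0, b1))
print(round(ratio, 4), round(math.exp(b1), 4))  # the two values agree
```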
Loss Function in Logistic Regression

• Objective: Understand how Logistic Regression minimizes a loss function
during training.
• Loss Function: Log Loss (Cross-Entropy Loss).
Cost Function
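• Cost Function: Derived from the loss function, represents the error
aggregated over the entire dataset.
• Formula (standard log-loss form, averaged over N observations):
J = −(1/N) Σᵢ [Yᵢ log(Pᵢ) + (1 − Yᵢ) log(1 − Pᵢ)]

• Explanation: Measures the difference between predicted probabilities
(Pᵢ) and actual labels (Yᵢ).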
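
A minimal sketch of the log loss (binary cross-entropy), assuming the common form that averages over all observations. The clipping constant `eps` is an implementation detail added here to avoid taking log(0); the example predictions are made up.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]   # hypothetical predictions
confident_wrong = [0.1, 0.9, 0.2, 0.8]

print(log_loss(y_true, confident_right))
print(log_loss(y_true, confident_wrong))  # larger: confident mistakes are penalized heavily
```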
Binary Classification Basics

• Reminder that Logistic Regression is commonly used for binary
classification (e.g., spam or not spam).
• Two classes: Positive and Negative.
Confusion Matrix

• Definition: A table that cross-tabulates actual classes against predicted classes.


• Elements: True Positive, True Negative, False Positive, False Negative.
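
The four elements can be counted directly from actual and predicted labels. A minimal sketch, with hypothetical labels and an assumed helper name `confusion_counts`:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical actual labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 5 1 1
```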
Accuracy

• Formula: (TP + TN) / (TP + TN + FP + FN).


• Limitation: Can be misleading when the classes are imbalanced.
• Importance of context in interpreting accuracy.
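
Why context matters can be seen in a small sketch. The counts are hypothetical: 100 cases with only 5 positives, and a model that always predicts the negative class.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical always-negative model on 95 negatives and 5 positives:
# every negative is correct (TN = 95), every positive is missed (FN = 5).
acc = accuracy(tp=0, tn=95, fp=0, fn=5)
print(acc)  # 0.95
```

The accuracy looks high even though the model never identifies a single positive case.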
Precision

• Formula: TP / (TP + FP).


• Focus on the proportion of correctly predicted positive instances.
• Emphasis on minimizing false positives.
Recall

• Formula: TP / (TP + FN).


• Focus on the proportion of actual positive instances correctly predicted.
• Emphasis on minimizing false negatives.
F1 Score

• Formula: 2 * (Precision * Recall) / (Precision + Recall).


• Harmonic mean of precision and recall.
• Balancing precision and recall.
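
Precision, recall, and F1 can be computed from the confusion-matrix counts. A minimal sketch with hypothetical counts; returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the slides.

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
tp, fp, fn = 30, 10, 20
print(precision(tp, fp), recall(tp, fn), round(f1_score(tp, fp, fn), 4))
```

Because the harmonic mean is pulled toward the smaller value, F1 stays low unless both precision and recall are reasonably high.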
Specificity and Sensitivity

• Sensitivity (Recall): TP / (TP + FN).


• Specificity: TN / (TN + FP).
• Both matter: sensitivity shows how well actual positives are caught;
specificity shows how well actual negatives are ruled out.
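
A short sketch of the two formulas side by side, using hypothetical counts:

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that were ruled out."""
    return tn / (tn + fp)

# Hypothetical counts: 50 actual positives and 50 actual negatives.
tp, fn, tn, fp = 30, 20, 45, 5
print(sensitivity(tp, fn), specificity(tn, fp))
```

Here the model rules out negatives well but misses a large share of positives, which a single accuracy figure would hide.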
ROC Curve

• Introduction to the Receiver Operating Characteristic (ROC) curve.


• Plotting the true positive rate (sensitivity) against the false positive rate
(1 − specificity).
• Visualization of model performance across different thresholds.
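
The points of an ROC curve can be generated by sweeping the threshold over the predicted scores. A minimal sketch, assuming both classes are present; the labels, scores, and the helper name `roc_points` are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]   # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)
```

Each threshold yields one point; lowering the threshold moves the curve from (0, 0) toward (1, 1).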
Area Under the Curve (AUC)

• AUC is the area under the ROC curve.


• AUC as a measure of the classifier's ability to distinguish between positive
and negative instances.
• AUC interpretation: 0.5 (random) to 1 (perfect classifier).
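
One way to compute AUC, sketched here under the assumption that both classes are present, uses its ranking interpretation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. The data and the helper name `auc_pairwise` are hypothetical.

```python
def auc_pairwise(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_pairwise(y_true, scores))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly, giving 0.75. The pairwise loop is quadratic in the number of instances, so it suits small examples rather than large datasets.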
Use Cases and Considerations
• Use Cases: Logistic Regression is effective when the relationship between
features and the log-odds of the positive class is approximately linear.
• Considerations:
• Sensitivity to outliers.
• Sensitive to multicollinearity: highly correlated features make coefficient estimates unstable.
• May not perform well if the classes are imbalanced.
• Logistic Regression combines simplicity with interpretability in
classification tasks.
