Linear Regression Model
(MBA ZG536)
ON
“Classification Model”
2023mb21003
1. What is the problem statement? What are the independent and dependent
variables? Explain their meaning/significance in one line each.
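For our problem statement, Left or Stay is the dependent variable (whether an
employee left the company or stayed), and Gender, Promotion, Function
(Operation/Support/Sales), Marital status and Employee group are the independent
variables (the employee attributes used to predict that outcome).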
In this problem statement, the values of the independent variables differ from
employee to employee, and the Left or Stay variable responds to those
differences.
2. Build classification models using 3 different algorithms. The models should have
between 4 and 8 features.
Answer: I have used the following three algorithms to build the classification models:
i. Logistic Regression - Logistic regression extends linear regression to
classification: a linear combination of the features is passed through
the logistic (sigmoid) function to give a probability, and a prediction
is assigned to a class based on a probability threshold.
ii. Random Forest - A random forest is an ensemble learning method
that combines the predictions from multiple decision trees to
produce a more accurate and stable prediction.
iii. Decision Tree - Decision trees are a very common way to represent
and visualize the possible outcomes of a decision or an action. In a
nutshell, a decision tree is a hierarchical representation of the
outcome space where every internal node represents a choice (a test
on a feature), and the leaves represent the predicted outcomes.
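The three models described above could be fit along the following lines with scikit-learn. This is only a sketch under stated assumptions: the data is synthetic (generated by `make_classification`) and stands in for the encoded employee features, and the feature count of 6, the split ratio and all variable names are illustrative choices, not the setup used in this report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the encoded employee data.
# 6 features (within the required 4 to 8), binary target: 1 = Left, 0 = Stay.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each model and record its accuracy on the held-out test set.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")

# Logistic regression assigns a class by thresholding the predicted probability.
proba_left = models["Logistic Regression"].predict_proba(X_test)[:, 1]
thresholded = (proba_left >= 0.5).astype(int)
```

The 0.5 threshold in the last step is the conventional default; a different threshold could be chosen if missing a leaver is more costly than a false alarm.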
3. Paste the confusion matrix and classification report for each algorithm below.
Answer: Classification reports:-
a) Logistic Regression
Output :-
b) Random Forest
Output:-
c) Decision Tree
Output:-
Confusion Matrix:-
a) Logistic Regression
Output:-
b) Random Forest
Output:-
c) Decision Tree
Output:-
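For reference, a classification report and a confusion matrix of the kind requested here can be produced with scikit-learn roughly as follows. This sketch uses synthetic data and only the logistic regression model; the data, the class labels "Stay" and "Left", and the variable names are assumptions for illustration and do not reproduce the outputs of this report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data, binary target (0 = Stay, 1 = Left).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall and F1-score, plus overall accuracy.
report = classification_report(y_test, y_pred, target_names=["Stay", "Left"])
print(report)

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The same two calls can be repeated for the random forest and decision tree models by swapping the estimator.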
4. Which metric will you use to evaluate these models? Provide the justification.
Answer: The coefficients tell how much, and in which direction, each feature contributed to
predicting the target variable. This is a type of model-driven exploratory data analysis. I will
use the “New Promotion” column as it has the highest feature importance. The models are
compared on accuracy, together with a check for overfitting. I will select the Logistic
Regression model because it has good accuracy and is neither overfitted nor underfitted.
Although Random Forest has 1% better accuracy than Logistic Regression, Random Forest is
an overfitted model, hence I will select Logistic Regression.
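One common way to check the overfitting argument is to compare each model's accuracy on the training data with its accuracy on held-out test data: a large gap suggests overfitting. The sketch below illustrates this, along with reading the logistic regression coefficients and the random forest feature importances. It runs on synthetic data, and the data, split and names are assumptions rather than the analysis performed for this report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data with 6 features and a binary target.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
}

# Gap between training and test accuracy: a large positive gap hints at overfitting.
gaps = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    gaps[name] = train_acc - test_acc
    print(f"{name}: train = {train_acc:.3f}, test = {test_acc:.3f}, gap = {gaps[name]:.3f}")

# Sign and size of each coefficient show the direction and strength of a feature's effect.
coefs = models["Logistic Regression"].coef_[0]
# Random forest importances are non-negative and sum to 1 across features.
importances = models["Random Forest"].feature_importances_
print("Logistic regression coefficients:", coefs.round(3))
print("Random forest importances:", importances.round(3))
```

Coefficients are only comparable across features when the features are on similar scales, so standardizing the inputs first is usually advisable before ranking them.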