Linear Regression Model


Foundations of Data Science

(MBA ZG536)

Experimental Learning Assignment

ON

“Classification Model”

Under the guidance of: Prof. PRAVIN MHASKE

Submitted By: RISHAV SRIVASTAVA (2023mb21003)

1. What is the problem statement? What are the independent and dependent
variables? Explain their meaning/significance in one line each.

Answer:- The problem statement is employee attrition prediction, i.e. predicting
whether an employee will leave (resign from) the current company. I have done this
using three different algorithms.

A dependent variable is a variable whose value depends on another variable.
Dependent variables are also called response or outcome variables because they
represent the outcome we are measuring.

An independent variable is a variable whose value does not depend on another
variable in the model. Independent variables are also called predictor variables
because you can use them to predict the outcome of a dependent variable.

For our problem statement, the dependent variable is the employee's status (Left
or Stay), and Gender, Promotion, Function (Operation/Support/Sales), Marital status
and Employee group are the independent variables.
In this problem statement, the values of the independent variables differ from
employee to employee, and the Left or Stay outcome responds to those differences.
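
To make the split between independent and dependent variables concrete, here is a minimal sketch in plain Python. The column names follow the variables listed above; the three records, the "Status" column name and the 1/0 encoding are assumptions made only for illustration and are not taken from the assignment's dataset.

```python
# Hypothetical toy records: the values are invented purely for illustration.
records = [
    {"Gender": "Male", "Promotion": "Yes", "Function": "Sales",
     "Marital status": "Single", "Employee group": "B1", "Status": "Stay"},
    {"Gender": "Female", "Promotion": "No", "Function": "Support",
     "Marital status": "Married", "Employee group": "B2", "Status": "Left"},
    {"Gender": "Male", "Promotion": "No", "Function": "Operation",
     "Marital status": "Married", "Employee group": "B1", "Status": "Stay"},
]

# Independent (predictor) variables named in the answer above.
FEATURES = ["Gender", "Promotion", "Function", "Marital status", "Employee group"]
# Assumed name for the column that holds the Left/Stay outcome.
TARGET = "Status"


def split_features_target(rows, features, target):
    """Return (X, y): predictor values per employee and a 1/0 outcome."""
    X = [[row[f] for f in features] for row in rows]
    # Encode the dependent variable: 1 = Left, 0 = Stay.
    y = [1 if row[target] == "Left" else 0 for row in rows]
    return X, y


X, y = split_features_target(records, FEATURES, TARGET)
print(X[0])  # predictor values for the first employee
print(y)     # one outcome per employee
```

Each row of X holds one employee's predictor values and y holds the single outcome the models try to predict.
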

2. Build classification models using 3 different algorithms. The models should have
between 4 and 8 features.

Answer: I have used the below three algorithms to build the classification models:-
i. Logistic Regression - Logistic regression extends linear regression
to classification: a linear combination of the features is passed
through the logistic (sigmoid) function to give a class probability,
and the prediction is assigned to a class based on a probability
threshold.
ii. Random Forest - A random forest is an ensemble learning method
that combines the predictions from multiple decision trees to
produce a more accurate and stable prediction.
iii. Decision Tree - Decision trees are a very common way to represent
and visualize the possible outcomes of a decision or an action. In a
nutshell, a decision tree is a hierarchical representation of the
outcome space in which every internal node represents a choice (a
test on a feature) and every leaf represents an outcome (here, the
predicted class).
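
The two mechanisms described above, thresholding a logistic probability and combining the votes of several trees, can be sketched in plain Python. This is only an illustration under stated assumptions: the weights, bias and tree votes are hypothetical values, not the fitted models from this assignment.

```python
import math
from collections import Counter


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_predict(features, weights, bias, threshold=0.5):
    """Return (label, probability) for one employee.

    The label is 1 (Left) when the probability reaches the threshold,
    otherwise 0 (Stay).
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return label, probability


def majority_vote(tree_predictions):
    """Combine the class votes of several trees, random-forest style."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical coefficients for two encoded features (e.g. promoted, sales).
label, probability = logistic_predict([1, 0], weights=[-2.0, 0.5], bias=0.3)
print(label, round(probability, 3))

# Hypothetical votes from three trees (an odd count avoids ties).
print(majority_vote([1, 0, 1]))
```

Lowering the threshold below 0.5 would flag more employees as likely to leave, at the cost of more false alarms.
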
3. Paste the confusion matrix and classification report for each algorithm below.
Answer: Classification reports:-
a) Logistic Regression

Output:-

b) Random Forest

Output:-

c) Decision Tree

Output:-

Confusion Matrix:-

a) Logistic Regression

Output:-

b) Random Forest

Output:-

c) Decision Tree

Output:-
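
For reference, the quantities shown in a confusion matrix and a classification report can be computed as in the sketch below. It assumes a binary encoding of 1 = Left and 0 = Stay, and the label lists are hypothetical toy values, not the outputs of the three models above. The matrix uses true classes as rows and predicted classes as columns, the same layout scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (Stay) and 1 (Left)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return [[tn, fp], [fn, tp]]


def classification_report(y_true, y_pred):
    """Return accuracy plus precision, recall and F1 for the Left class."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels for eight employees.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

matrix = confusion_matrix(y_true, y_pred)
report = classification_report(y_true, y_pred)
print(matrix)
print(report)
```
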
4. Which metric will you use to evaluate these models? Provide the justification.

Answer: The coefficients show how much, and in which direction, each feature contributes to
predicting the target variable; examining them is a form of model-driven exploratory data
analysis. I will use the “New Promotion” column, as it has the highest feature importance. I
will evaluate the models on accuracy and select the logistic regression model, because it has
the best accuracy among the models that are neither overfitted nor underfitted. Random Forest
has 1% better accuracy than logistic regression, but it is an overfitted model, hence I will
select logistic regression.
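
The selection rule used in this answer can be sketched as follows: discard any model whose training accuracy exceeds its test accuracy by too much (a sign of overfitting), then pick the most accurate of the rest. The accuracy figures and the 0.05 gap limit below are hypothetical placeholders chosen to mirror the reasoning, not the results obtained in this assignment, and the sketch does not check for underfitting.

```python
def select_model(scores, max_gap=0.05):
    """Pick the model with the best test accuracy among non-overfitted ones.

    scores maps a model name to (train_accuracy, test_accuracy). A model is
    treated as overfitted when train minus test accuracy exceeds max_gap.
    Returns None if every model is rejected.
    """
    candidates = {
        name: test
        for name, (train, test) in scores.items()
        if train - test <= max_gap
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical (train, test) accuracies, for illustration only.
example_scores = {
    "Logistic Regression": (0.88, 0.87),
    "Random Forest": (1.00, 0.88),
    "Decision Tree": (1.00, 0.84),
}

chosen = select_model(example_scores)
print(chosen)
```

With these placeholder numbers Random Forest has the highest test accuracy, but its large train/test gap rules it out, so logistic regression is selected.
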
