Linear Regression and Logistic Regression
Task No. 1:
Explain Linear Regression with example (Lab work)
Answer:
Regression:
Regression analysis is a form of predictive modelling technique that
investigates the relationship between a dependent variable and one or more independent variables.
Uses of regression:
Three major uses of regression analysis are:
Determining the strength of predictors.
Forecasting an effect.
Trend forecasting.
Linear Regression:
Linear Regression is a statistical method used to find the relationship
between one or more independent (predictor) variables and a dependent variable. It solves
regression problems, and its graph is a straight line.
Uses of linear regression:
Linear regression is used for:
Evaluating Trends and Sales Estimates.
Analyzing the impact of Price Changes.
Assessment of risk in financial services and insurance domain.
Formula:
y = mx + c
where
y = Output / Dependent Variable.
m = Slope.
x = Input / Independent Variable.
c = Constant / y-intercept.
For example:
x    y (Actual Values)
1    3
2    4
3    2
4    4
5    5
First, we find the mean of the x and y values using the formula:

Mean = (sum of all values) / (number of values)

Mean of x:
x̄ = (1 + 2 + 3 + 4 + 5) / 5 = 3

Mean of y:
ȳ = (3 + 4 + 2 + 4 + 5) / 5 = 3.6
Now, we use the linear equations:
y = mx + c        ... equation (A)
ȳ = m·x̄ + c       ... equation (B)
Now we find the slope using the formula:

m = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)²

Putting the values into the formula:

m = [(1−3)(3−3.6) + (2−3)(4−3.6) + (3−3)(2−3.6) + (4−3)(4−3.6) + (5−3)(5−3.6)]
    / [(1−3)² + (2−3)² + (3−3)² + (4−3)² + (5−3)²]

m = 4 / 10 = 0.4
Now, we find c by putting the values into equation (B):
3.6 = 0.4(3) + c
c = 3.6 − 1.2
c = 2.4
Predicted values of y:
We find the predicted values of y by putting m, x, and c into equation (A).
y₁ = 0.4(1) + 2.4 = 2.8
y₂ = 0.4(2) + 2.4 = 0.8 + 2.4 = 3.2
y₃ = 0.4(3) + 2.4 = 1.2 + 2.4 = 3.6
y₄ = 0.4(4) + 2.4 = 1.6 + 2.4 = 4.0
y₅ = 0.4(5) + 2.4 = 2.0 + 2.4 = 4.4
R²:

R² = ∑(ŷ − ȳ)² / ∑(y − ȳ)²

R² = [(2.8−3.6)² + (3.2−3.6)² + (3.6−3.6)² + (4.0−3.6)² + (4.4−3.6)²]
     / [(3−3.6)² + (4−3.6)² + (2−3.6)² + (4−3.6)² + (5−3.6)²]

R² = 1.6 / 5.2 ≈ 0.307

R² = 0.307 × 100% = 30.7%
R-squared: The R-squared value is a statistical measure of how close the data are
to the fitted regression line. It is also known as the coefficient of determination
(or, with multiple predictors, the coefficient of multiple determination).
If R² = 1, the line fits the data perfectly (no error).
If R² = 0, the line does not fit the data at all.
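The worked example above can be reproduced in plain Python. This is a minimal sketch with no libraries, using the same five (x, y) pairs and the same mean, slope, intercept, and R² formulas.

```python
# Fit y = mx + c by least squares and compute R^2 for the five (x, y)
# pairs from the worked example above.
x = [1, 2, 3, 4, 5]
y = [3, 4, 2, 4, 5]

# Means of x and y
x_mean = sum(x) / len(x)            # 3.0
y_mean = sum(y) / len(y)            # 3.6

# Slope m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
den = sum((xi - x_mean) ** 2 for xi in x)
m = num / den

# Intercept c from equation (B): y_mean = m * x_mean + c
c = y_mean - m * x_mean

# Predictions and R^2 = sum((y_pred - y_mean)^2) / sum((y - y_mean)^2)
y_pred = [m * xi + c for xi in x]
ss_reg = sum((yp - y_mean) ** 2 for yp in y_pred)
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r2 = ss_reg / ss_tot

print(round(m, 2), round(c, 2), round(r2, 3))  # → 0.4 2.4 0.308
```

Note that 1.6 / 5.2 is 0.3077, so the text's 0.307 is the truncated value and 0.308 is the rounded one.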
Task No. 2:
Implement Linear Regression from scratch (Lab work)
Task No. 3:
Implement Linear Regression on HeadBrain dataset from scratch (Lab work)
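The HeadBrain dataset is not bundled with these notes, so the sketch below first writes a tiny sample file standing in for it. The file name `HeadBrain_sample.csv` and the column names `Head Size(cm^3)` and `Brain Weight(grams)` are assumptions based on the commonly distributed HeadBrain.csv; swap in the real file and its actual headers when running this in the lab.

```python
import csv

# The real HeadBrain.csv is not included here, so write a tiny stand-in
# sample first (values are illustrative, not the real dataset).
sample = """Head Size(cm^3),Brain Weight(grams)
4512,1530
3738,1297
4261,1335
3777,1282
4177,1590
"""
with open("HeadBrain_sample.csv", "w") as f:
    f.write(sample)

# Load head size (x) and brain weight (y) from the CSV
head, brain = [], []
with open("HeadBrain_sample.csv") as f:
    for row in csv.DictReader(f):
        head.append(float(row["Head Size(cm^3)"]))
        brain.append(float(row["Brain Weight(grams)"]))

# Same least-squares formulas as in Task 1
x_mean = sum(head) / len(head)
y_mean = sum(brain) / len(brain)
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(head, brain)) \
    / sum((x - x_mean) ** 2 for x in head)
c = y_mean - m * x_mean

pred = [m * x + c for x in head]
r2 = sum((p - y_mean) ** 2 for p in pred) / sum((y - y_mean) ** 2 for y in brain)
print(f"m={m:.4f}, c={c:.2f}, R^2={r2:.3f}")
```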
Task No. 4:
Implement Linear Regression using built-in Model (Lab work)
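One common choice of built-in model is scikit-learn's `LinearRegression` (assuming scikit-learn is installed in the lab environment). Fitting it on the same five points from Task 1 reproduces the hand-computed slope, intercept, and R²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same five points as the worked example; sklearn expects 2-D inputs
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 4, 2, 4, 5])

model = LinearRegression().fit(X, y)

print(model.coef_[0])         # slope m ≈ 0.4
print(model.intercept_)       # intercept c ≈ 2.4
print(model.score(X, y))      # R^2 ≈ 0.308
print(model.predict([[6]]))   # prediction for x = 6: 0.4 * 6 + 2.4 ≈ 4.8
```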
Task No. 5:
Explain and implement Logistic Regression on any dataset (Homework)
Answer:
Logistic Regression:
Logistic Regression produces a result in binary format and is used to predict the outcome
of a categorical dependent variable. It solves classification problems. Its graph is an
S-curve.
So the outcome should be discrete / categorical such as:
0 or 1.
Yes or No.
True or False.
High or Low.
Logistic regression allows you to analyze a set of variables and predict a categorical
outcome. Since here we need to predict whether your sister will get into grad school or not,
which is a classification problem, logistic regression will be used.
Why are we not using linear regression in this case?
The reason is that linear regression predicts a continuous quantity, rather than a
categorical one. Here we are predicting whether or not your sister is going to get into
grad school, which is clearly a categorical outcome. When the outcome can take only a
limited set of values, such as two classes, it is sensible to have a model that predicts
the value as either 0 or 1, or as a probability that ranges between zero and one. Linear
regression does not have this ability: if we use it to model a binary outcome, the
resulting model will not restrict its predictions to the range 0 to 1, because linear
regression works on continuous dependent variables, not on categorical ones.
That is why we use logistic regression. In short, linear regression is used to predict
continuous quantities, and logistic regression is used to predict categorical quantities.
Logistic regression is a primary classification technique. It is similar to linear
regression because both belong to the family of generalized linear models; the name comes
from the logistic (sigmoid) function on which it is built. Logistic regression is mainly
used for classification because the dependent variable to be predicted is categorical in
nature.
The logistic regression equation is derived from the same straight-line equation, except
that we need to make a few alterations, because the output is categorical. Logistic
regression does not necessarily output exactly zero or one; it outputs a probability
between zero and one.
Logistic Regression Equation:
The logistic regression equation is derived from the straight-line equation.
Equation of a straight line:
y = β₀ + β₁x₁ + β₂x₂ + … + ε. The range is from −∞ to ∞.
where
β₀ = c = constant / y-intercept.
β₁ = m = slope.
x = Input / Independent Variable.
y = Output / Dependent Variable.
Let us derive the logistic regression equation from the straight-line equation:
y = β₀ + β₁x₁ + β₂x₂ + …
In logistic regression, y can only range from 0 to 1. To get a range from 0 to infinity,
we transform y into the odds:
y / (1 − y)
which is 0 when y = 0 and approaches infinity as y approaches 1. Now the range is 0 to
infinity. Let us transform it further to get a range between −∞ and ∞ by taking the
logarithm:
log[y / (1 − y)] = β₀ + β₁x₁ + β₂x₂ + …   (final logistic regression equation)
How logistic regression works:
Consider the sigmoid graph. As noted above, the outcome in logistic regression is
categorical: it will either be 0 or 1, or a probability that ranges between zero and one.
Some of you might wonder why we have an S-curve when we could obviously have a straight
line. We use a sigmoid curve because it produces values between zero and one, which
represent probabilities. For example, an output of 0.7 is a probability value; since it is
above 0.5, the outcome is classified as one. That is why the curve is sigmoid-shaped.
First we take a linear regression equation:
y = β₀ + β₁x₁ + ε
We want to represent the relationship between p(X) = Pr(Y = 1 | X) and X, where Pr denotes
probability: the probability that y = 1 given some value of x. If we calculate this
probability using the linear model, it takes the form

P(X) = e^(β₀ + β₁x) / (e^(β₀ + β₁x) + 1)

The next step is to calculate the logit function. The logit is a link function whose
inverse is the S-shaped sigmoid curve ranging between 0 and 1; it connects the linear
score to the probability of the output variable.
Starting from the probability and solving for the linear part:

P(X) = e^(β₀ + β₁x) / (e^(β₀ + β₁x) + 1)

p(e^(β₀ + β₁x) + 1) = e^(β₀ + β₁x)

p·e^(β₀ + β₁x) + p = e^(β₀ + β₁x)

p = e^(β₀ + β₁x) − p·e^(β₀ + β₁x)

p = e^(β₀ + β₁x)(1 − p)

p / (1 − p) = e^(β₀ + β₁x)

ln[p / (1 − p)] = β₀ + β₁x
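The sigmoid/logit pair derived above can be checked numerically. The coefficients below are hypothetical, chosen only to show that the logit (log-odds) recovers the linear score:

```python
import math

def sigmoid(z):
    """p = e^z / (e^z + 1), the logistic function from the derivation above."""
    return math.exp(z) / (math.exp(z) + 1)

def logit(p):
    """ln(p / (1 - p)), the inverse of the sigmoid."""
    return math.log(p / (1 - p))

# Hypothetical coefficients, just to illustrate the round trip
b0, b1, x = -1.5, 0.8, 3.0
score = b0 + b1 * x           # linear part: beta0 + beta1 * x = 0.9

p = sigmoid(score)            # probability strictly between 0 and 1
print(round(p, 4))
print(round(logit(p), 4))     # recovers the linear score: 0.9
```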
Uses of logistic regression:
Logistic regression is used across many scientific fields. In Natural Language Processing
(NLP), it’s used to determine the sentiment of movie reviews, while in Medicine it can be
used to determine the probability of a patient developing a particular disease.
Implement Logistic Regression:
The main steps are:
Collect Data: import the libraries and load the dataset.
Analyze Data: create different plots to check the relationships between variables.
Data Wrangling: clean the data by removing NaN values and unnecessary columns from the
dataset.
Train and Test Data: build the model on the training data and predict the output on the
test data.
Accuracy Check: calculate the accuracy to check how accurate your predictions are.
From Scratch:
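A minimal from-scratch version can train the sigmoid model by gradient descent on the log-loss. The tiny 1-D dataset below (hours studied → pass/fail), the learning rate, and the iteration count are illustrative assumptions, not from the text:

```python
import math

# Illustrative toy data: hours studied (x) -> pass (1) / fail (0)
X = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Gradient descent on the log-loss for p = sigmoid(b0 + b1 * x)
b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    g0 = sum(sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(X, y)) / len(X)
    g1 = sum((sigmoid(b0 + b1 * xi) - yi) * xi for xi, yi in zip(X, y)) / len(X)
    b0 -= lr * g0
    b1 -= lr * g1

# Classify with the usual 0.5 probability threshold
preds = [1 if sigmoid(b0 + b1 * xi) >= 0.5 else 0 for xi in X]
print(preds)
```

On this separable toy data the learned decision boundary −b0/b1 falls between the two classes, so the predictions match the labels.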
Using Built-in Commands:
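The same toy task can be solved with scikit-learn's built-in `LogisticRegression` (assuming scikit-learn is installed; the data is the same illustrative sample, not a real dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same illustrative toy data: hours studied -> pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

print(clf.predict(X))              # predicted class labels
print(clf.predict_proba([[2.5]]))  # [P(fail), P(pass)] for a new point
print(clf.score(X, y))             # training accuracy
```

On a real dataset, follow the steps listed above: split into train and test sets, fit on the training portion, and report accuracy on the held-out test portion instead of the training data.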