
Foundations of Machine Learning
M.Sc. in Data Sciences and Business Analytics
CentraleSupélec

Lab 2: Logistic Regression

Instructor: Fragkiskos Malliaros
TA: Benjamin Mahuu
October 21, 2021

1 Description
In this lab, we will study supervised learning algorithms, and in particular Logistic Regression. Initially, we discuss the basic characteristics of the algorithm, and then we examine how it can be applied to real classification problems.

2 Logistic Regression
Logistic regression is a discriminative classification model that can be used to predict the probability of occurrence of an event. It is a supervised learning algorithm that can be applied to binary or multinomial classification problems. As in linear regression, each feature has a coefficient θ (i.e., a weight) that captures the contribution of the feature to the variable y (recall that in linear regression with only one feature x, we want to predict the value y of a new instance according to y = θ1 x + θ0; the goal is to learn the parameters θ0 and θ1 from the training data). In the case of two-class logistic regression, the variable y does not take continuous values (as in linear regression), but instead corresponds to the class values, namely 0 or 1.
That way, one might try to model the probability that a new instance belongs to class 1 directly as a linear function: p(Y = 1|x) = θ1 x + θ0. In other words, the goal of logistic regression is to predict the probability that a new instance belongs to class 0 or 1. This is the main reason why the probability p(x) cannot be treated as the variable y in a linear regression problem: a probability must lie in the [0, 1] range, while a linear function is unbounded.

To overcome this problem, we apply the logistic transformation of the probability p(x). In other words, we replace the probability p with logit(p) = log(p / (1 − p)).[1] While the probability p takes values from 0 to 1, logit(p) ranges from −∞ to +∞. Figure 1 shows an example of the logistic function.

[Figure 1: Logistic function. The horizontal axis shows logit(p), ranging from −8 to 8; the vertical axis shows p, ranging from 0 to 1.]

[1] Wikipedia's lemma for Logit: http://en.wikipedia.org/wiki/Logit

Observe that the logistic function is symmetric around 0, where it takes the value 0.5. In our case, the logistic function is useful because it can take an input with any value from negative to positive infinity, whereas the output always takes values between zero and one, and hence is interpretable as a probability. That way, the logistic regression model can be expressed as:

$$\log \frac{p(x)}{1 - p(x)} = \theta_0 + \theta_1 x. \qquad (1)$$
Solving for p we have:

$$p(x) = \frac{e^{\theta_0 + \theta_1 x}}{1 + e^{\theta_0 + \theta_1 x}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}. \qquad (2)$$
In the case of a binary classification problem (i.e., two classes), we predict class Y = 1 when p ≥ 0.5 and class Y = 0 when p < 0.5. This means guessing class 1 whenever θ1 x + θ0 is non-negative, and class 0 otherwise. So logistic regression gives us a linear classifier. The decision boundary separating the two predicted classes is the solution of θ1 x + θ0 = 0, which is a point if x is one-dimensional, a line if it is two-dimensional, a plane in the case of three features, and so on. Logistic regression not only describes where the boundary between the classes is, but also indicates (via Eq. (2)) that the class probabilities depend on the distance from the boundary in a particular way, and that they go towards the extremes (0 and 1) more rapidly when ‖θ‖ is larger.
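
To make this concrete, the following small sketch (with made-up parameter values, purely for illustration) shows how a one-feature model turns θ1 x + θ0 into a class probability and a predicted label via Eq. (2):

import numpy as np

# Hypothetical fitted parameters for a one-feature model (illustrative only).
theta0, theta1 = -4.0, 2.0

def p_class1(x):
    # Eq. (2): predicted probability of class 1 for input x.
    return 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

for x in [0.0, 2.0, 4.0]:
    prob = p_class1(x)
    print('x = %.1f -> p = %.3f -> class %d' % (x, prob, 1 if prob >= 0.5 else 0))

With these values, the decision boundary θ1 x + θ0 = 0 sits at x = 2, where p = 0.5; moving away from the boundary pushes the probability towards 0 or 1.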
In the case where we have more than one feature, the logistic regression model can be expressed as follows:

$$\log \frac{p(x)}{1 - p(x)} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n. \qquad (3)$$
Solving for p we have:

$$p(x) = \frac{e^{\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n}}{1 + e^{\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n)}}. \qquad (4)$$

Logistic regression learns the weights θi so as to maximize the likelihood of the data. The likelihood
function can be expressed in terms of the Bernoulli distribution:
$$p(Y \mid \theta) = \prod_{i=1}^{m} p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i}, \qquad (5)$$

where p(xi) is the predicted probability that xi belongs to the first class (as given by Eq. (4)), yi is the class label of instance i, and m is the number of instances. Based on this, we can define an error (or cost) function by taking the negative logarithm of the likelihood and applying the product rule for logarithms, so that the product over instances becomes a sum (averaged over the m instances):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log\left( \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \ldots + \theta_n x_{in})}} \right) + (1 - y_i) \log\left( 1 - \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \ldots + \theta_n x_{in})}} \right) \right]. \qquad (6)$$
Thus, logistic regression aims to find the parameters θ that minimize the above error (or cost) function. This can be achieved by taking the partial derivative of the cost function with respect to each parameter θ0, θ1, . . . , θn and applying the gradient descent method; the partial derivative with respect to θj is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{i1} + \theta_2 x_{i2} + \ldots + \theta_n x_{in})}} - y_i \right) x_{ij}. \qquad (7)$$
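
To make the update rule concrete, here is a minimal gradient-descent sketch based on Eqs. (6) and (7); the learning rate alpha, the iteration count, and the function names are illustrative assumptions, not part of the lab's required API (the lab itself uses SciPy's minimize(), as described in Section 2.1):

import numpy as np

def sigmoid(z):
    # Logistic function S(z) = 1 / (1 + exp(-z)), applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.001, n_iters=10000):
    # X is assumed to already include a leading column of ones, so that
    # theta[0] plays the role of the intercept theta_0.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        p = sigmoid(X.dot(theta))     # predicted probabilities, Eq. (4)
        grad = X.T.dot(p - y) / m     # gradient of the cost, Eq. (7)
        theta -= alpha * grad         # step against the gradient
    return theta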

2.1 Pipeline of the Task
The goal of this part is to implement the logistic regression method and use it to find the decision boundary in a classification problem. The task is to estimate the probability that a student will be admitted to the Master's program of a university, based on the grades of two courses.
The idea is that we already have available the grades of students from previous years, as well as the admission decision (accepted or not). This will be the training dataset for the logistic regression classifier. Figure 2 depicts the training data. The axes correspond to the two courses. The blue dots show the grades of the students that were admitted to the Master's program, while the red × marks correspond to the students that were rejected. Next, we will apply logistic regression to design a model that predicts the acceptance probability for a student.

[Figure 2: Training data. The horizontal axis is the Exam 1 score and the vertical axis the Exam 2 score; blue dots mark admitted students and red × marks rejected ones.]

The pipeline of the task is in the logistic/main.py script file. Initially, we load the data contained in the data1.txt file. The first two columns correspond to the two features of the dataset (grades on the two courses), while the third one corresponds to the class label (admitted or not). We can also plot the data, as shown in Fig. 2.
# Load the dataset.
# The first two columns contain the exam scores and the third column
# contains the label.
from numpy import loadtxt  # import added here for completeness

data = loadtxt('data1.txt', delimiter=',')

X = data[:, 0:2]
y = data[:, 2]

Then, we proceed with the main task. We initialize the parameter vector θ with zeros and call the minimize() optimization function of SciPy[2], which will minimize the value of the computeCost() function defined in Eq. (6) with respect to the parameter θ. In the call to minimize(), we also pass the computeGrad() function, which computes the gradient of the cost function defined in Eq. (7). Also note that the args parameter contains the training data X and the corresponding class labels. The function returns the optimal values of the parameter θ.
# Initialize fitting parameters
initial_theta = zeros((3, 1))

# Run minimize() to obtain the optimal theta (op is scipy.optimize).
# X is assumed to have a leading column of ones prepended for the intercept.
Result = op.minimize(fun=computeCost, x0=initial_theta, args=(X, y),
                     method='TNC', jac=computeGrad)
theta = Result.x

We can also plot the training data and the decision boundary produced by the logistic regression
method.
# Plot the decision boundary: the line where theta^T x = 0, solved for
# the second feature as a function of the first.
plot_x = array([min(X[:, 1]), max(X[:, 1])])
plot_y = (-1.0 / theta[2]) * (theta[1] * plot_x + theta[0])
plt.plot(plot_x, plot_y)

[2] scipy.optimize.minimize function: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

plt.scatter(X[pos, 1], X[pos, 2], marker='o', c='b')
plt.scatter(X[neg, 1], X[neg, 2], marker='x', c='r')
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
# Legend entries follow the plotting order: boundary, admitted (pos), rejected (neg).
plt.legend(['Decision Boundary', 'Admitted', 'Not admitted'])
plt.show()

Lastly, we perform the classification of the data, i.e., the prediction of the class labels. As described earlier, this can be done using the logistic function, which returns a value in the range [0, 1]: if the predicted value for an instance is at least 0.5, we assign it to class 1 (otherwise to class 0). This is implemented by the predict() function in the predict.py file.
# Compute accuracy on the training set
p = predict(array(theta), X)
counter = 0
for i in range(y.size):
    if p[i] == y[i]:
        counter += 1
print('Train Accuracy: %f' % (counter / float(y.size) * 100.0))
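
As a side note, assuming p and y are NumPy arrays of the same length, the same accuracy can be computed in a single line:

print('Train Accuracy: %f' % (100.0 * (p == y).mean()))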

2.2 Tasks to be Performed

The goal of the lab is to implement the basic components of the logistic regression classification algorithm:
• Fill in the code of the logistic function in the sigmoid.py file. Here we will use the following logistic sigmoid function[3]:

$$S(z) = \frac{1}{1 + e^{-z}}.$$
• Fill in the code of the computeCost() function in the computeCost.py file, based on the formula of Eq. (6). Note that the terms within the logarithms of Eq. (6) correspond to the sigmoid (logistic) function of the dot product between the input data X and the parameter θ.

• Fill in the code of the computeGrad() function in the computeGrad.py file, based on the formula of Eq. (7).

• Finally, fill in the code in the predict.py file to implement the predict() function. This can be done by applying the logistic (sigmoid) function to the test data (the dot product between the test data X and the parameters θ). Note that the dataset used in the lab is not split into training and test sets; thus, you are encouraged to apply the model evaluation techniques that we have seen in class, in particular cross-validation[4]. (For simplicity – though this is a methodologically incorrect approach, since it yields an overly optimistic estimate of performance – you can also measure the accuracy on the training set.) One possible sketch of these four functions is given below for reference.
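
The following is one possible sketch of the four functions, under the assumption that the input matrix X already includes a leading column of ones for the intercept; treat it as a reference point rather than the unique expected solution:

import numpy as np

def sigmoid(z):
    # Logistic function S(z) = 1 / (1 + exp(-z)), applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

def computeCost(theta, X, y):
    # Average negative log-likelihood of Eq. (6).
    m = y.size
    h = sigmoid(X.dot(theta))
    return -(y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h))) / m

def computeGrad(theta, X, y):
    # Gradient of the cost, Eq. (7): one partial derivative per theta_j.
    m = y.size
    h = sigmoid(X.dot(theta))
    return X.T.dot(h - y) / m

def predict(theta, X):
    # Assign class 1 when the predicted probability is at least 0.5.
    return (sigmoid(X.dot(theta)) >= 0.5).astype(float)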

[3] Wikipedia's lemma for the sigmoid function: http://en.wikipedia.org/wiki/Sigmoid_function
[4] Cross-validation in scikit-learn: http://scikit-learn.org/stable/modules/cross_validation.html
