
Notes: Logistic Regression - Stanford ML Andrew Ng
Jose Parreno Garcia
March 2018
1 What is Logistic Regression?
1.1 Intuition behind logistic regression
2 Introducing the logistic function
2.1 Steps to calculate logit
2.1.1 Step 1
2.1.2 Additional info for step 1
2.1.3 Step 2
3 Effect of threshold in model performance
3.1 Confusion matrix
3.1.1 Calculating the parameters from the confusion matrix
3.1.2 Small comment on accuracy
3.2 ROC curves
3.2.1 AUC - Area under the curve
4 Hypothesis
5 Understanding the sigmoid function
5.1 EXAMPLE 1: General sigmoid function
5.2 EXAMPLE 2: Linear boundary
5.3 EXAMPLE 3: Circular boundary
6 Cost function
6.1 Simplified cost function
6.2 Choosing the θ parameters: using gradient descent
7 Multiclass classification
7.1 Overfitting/underfitting
8 Introducing regularization
8.1 Regularised linear regression
8.2 Regularised logistic regression
8.3 Final intuition for the regularization parameter
9 Visualising the data
9.1 How to fix this?
9.2 Calculations
9.3 Effect of lambda


library(knitr)

1 What is Logistic Regression?


Both linear and logistic regression are used to predict results based on historical data. The main difference between them is the type of prediction each one makes:
1. Linear regression → predicts any value. To do this, linear regression takes as input CONTINUOUS VARIABLES, in other words, variables that can take any value; hence our prediction will also be a continuous variable that can take any value.
2. Logistic regression → predicts the probability of a bounded set of possible outcomes; in other words, logistic regression is a CLASSIFICATION algorithm.

As mentioned, logistic regression is one of the main and simplest CLASSIFICATION algorithms. It extends the idea of linear regression to cases where the dependent variable, y, only has two possible outcomes, called classes (careful, this only applies to binary logistic regression; multinomial logistic regression deals with situations where the outcome can have three or more possible types, e.g., “disease A” vs. “disease B” vs. “disease C”).

Examples of dependent variables that could be used with logistic regression are predicting whether a new
business will succeed or fail, predicting the approval or disapproval of a loan, and predicting whether a
stock will increase or decrease in value. These are all called classification problems, since the goal is to
figure out which class each observation belongs to.

1.1 Intuition behind logistic regression


The following images show different classification examples.


For us humans it would be really easy to differentiate the groups in each of these datasets, and we would immediately be able to draw a boundary that separates them. But how can a machine do this? From here on, this post will focus on the example shown in the third figure, which represents a binary classification.

2 Introducing the logistic function


The logistic function is a particular case of the well-known family of sigmoid functions, and its mathematical expression is the following:

g(t) = 1 / (1 + e^(-t))
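As a minimal sketch in Octave/MATLAB (the language used later in this post), the function can be implemented element-wise as:

function g = sigmoid(z)
  % Logistic (sigmoid) function, applied element-wise to scalars,
  % vectors or matrices.
  g = 1 ./ (1 + exp(-z));
end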

As a curiosity, the logistic function is used to describe many real-world situations, for example, population growth. This is easily understood by looking at the normalised graph: the initial stages show exponential growth, but after some time, due to competition for certain resources (a bottleneck), the growth rate decreases until it reaches a plateau and there is no further growth.

This is really cool, but how is it going to help us determine the probability of classifying data into different classes?

2.1 Steps to calculate logit


2.1.1 Step 1
The first step is to compute the probability that an observation belongs to class 1, using the Logistic Response Function. In this case, our t parameter (instead of being a temporal variable) will be the linear regression equation:

P(y = 1) = 1 / (1 + e^(-(θ_0 + θ_1*x_1 + ... + θ_n*x_n)))

The coefficients, or θ values, are selected to maximise the likelihood of predicting a high probability for observations actually belonging to class 1, and a low probability for observations actually belonging to class 0. The graph hints at the kind of coefficients we would like to achieve:

2.1.2 Additional info for step 1


Another way to understand the probabilities used to classify elements into classes is through the ODDS:

Odds = P(y = 1) / P(y = 0)

Taking the logarithm of the odds gives log(Odds) = θ_0 + θ_1*x_1 + ... + θ_n*x_n. This is called the “Logit”, and it looks like linear regression.
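A quick numeric sketch of the relationship (the probability value 0.8 is just an assumption for illustration):

p = 0.8;                        % P(y = 1)
odds = p / (1 - p);             % 4: class 1 is four times as likely as class 0
logit = log(odds);              % ~1.386; this is the linear part, theta' * x
p_back = 1 / (1 + exp(-logit)); % the logistic function recovers p = 0.8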

2.1.3 Step 2
In the second step of logistic regression, a threshold value is used to classify each observation into one of the classes. For example, if we choose a threshold of 0.5, then any observation with P(y = 1) > 0.5 is classified into class 1, and the rest into class 0.
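A sketch of this classification rule, assuming X is the design matrix and theta the fitted coefficients:

p = sigmoid(X * theta);   % P(y = 1) for every observation
pred = (p > 0.5);         % class 1 above the threshold, class 0 otherwise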

The threshold value is something the user can specify, so which is the best value for it? The choice often depends on error preferences, and there are two types of errors that we need to take into consideration: false positives and false negatives.

A false positive error is made when the model predicts class 1, but the observation actually belongs to class 0. A false negative error is made when the model predicts class 0, but the observation actually belongs to class 1. The perfect model would classify everything correctly: all 1's (or trues) as 1's, and all 0's (or falses) as 0's. We would then have FN = FP = 0! Fantastic!

3 Effect of threshold in model performance
As we all know, the perfect predictive model doesn’t exist. So what happens when we choose different
threshold values?


1. Higher threshold value. Classify as 1 if P(y=1) > 0.7. The model is more restrictive when classifying as 1's, and therefore more false negative errors will be made.
2. Lower threshold value. Classify as 1 if P(y=1) > 0.3. The model is now less strict and we classify more examples as class 1; therefore, we make more false positive errors.

Ok, this is all very good to know, but how do we choose the strategy to follow? How do we know whether we want to make more FN errors or more FP errors? A fair answer is that the strategy initially depends on your experience and on how risk-averse the problem you are trying to solve is. Let's present 2 examples:

Example 1: Treating patients.

One application where decision-makers often have an error preference is in disease prediction. Suppose
you built a model to predict whether or not someone will develop heart disease in the next 10 years. We
will consider class 1 to be the outcome in which the person does develop heart disease, and class 0 the
outcome in which the person does not develop heart disease. If you pick a high threshold, you will tend to
make more false negative errors, which means that you predicted that the person would not develop heart
disease, but they actually did. If you pick a lower threshold, you will tend to make more false positive
errors, which means that you predicted they would develop heart disease, but they actually did not. In this
case, a false positive error is often preferred. Unnecessary resources might be spent treating a patient who
did not need to worry, but you did not let as many patients go untreated (which is what a false negative
error does).

Example 2: Email junkbox.

Now, let's consider spam filters. Almost every email provider has a built-in spam filter that tries to detect
whether or not an email message is spam. Let’s classify spam messages as class 1 and non-spam
messages as class 0. Then if we build a logistic regression model to predict spam, we will probably want
to select a high threshold. Why? In this case, a false positive error means that we predicted a message
was spam, and sent it to the spam folder, when it actually was not spam. We might have just sent an
important email to the junk folder! On the other hand, a false negative error means that we predicted a
message was not spam, when it actually was. This creates a slight annoyance for the user (since they have
to delete the message from the inbox themselves) but at least an important message was not missed.

To understand better how to choose the threshold value, we make use of the confusion matrix and
the ROC curve.

3.1 Confusion matrix


A confusion matrix, also known as a contingency table or an error matrix, is a specific table layout that
allows visualization of the performance of an algorithm. Each column of the matrix represents the
instances in a predicted class, while each row represents the instances in an actual class. The name stems
from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabelling
one as another). We have actually seen a confusion matrix earlier in this post, with the image of true
positives, true negatives, false positives and false negatives. Here goes another equivalent image:


Numeric example

Imagine we have created a classification system that has been trained to distinguish between cats, dogs and rabbits. A confusion matrix will summarise the results of testing the algorithm for further inspection. Assuming a sample of 27 animals (8 cats, 6 dogs, and 13 rabbits), the resulting confusion matrix could look like the one below.
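The original figure is not reproduced here, so the matrix below is only an assumed example, chosen to be consistent with the 27-animal totals:

% Rows are actual classes, columns are predicted classes (assumed numbers).
confusion = [ 5  3  0;    % actual cat    (8 animals)
              2  3  1;    % actual dog    (6 animals)
              0  2 11 ];  % actual rabbit (13 animals)
correct  = trace(confusion);             % 19 animals on the diagonal
accuracy = correct / sum(confusion(:));  % 19/27, roughly 0.70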

3.1.1 Calculating the parameters from the confusion matrix

- SENSITIVITY: measures the proportion of positives which are correctly identified as such.
- SPECIFICITY: measures the proportion of negatives which are correctly identified as such.
- ACCURACY: measures the proportion of positives and negatives that have been correctly labelled.
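A sketch of these metrics, with assumed binary counts (TP, TN, FP, FN would come from your own confusion matrix):

TP = 40; TN = 45; FP = 5; FN = 10;              % assumed counts
sensitivity = TP / (TP + FN);                   % 0.80: positives caught
specificity = TN / (TN + FP);                   % 0.90: negatives caught
accuracy    = (TP + TN) / (TP + TN + FP + FN);  % 0.85 overall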

3.1.2 Small comment on accuracy


Needless to say, if you test different thresholds and achieve better accuracy, you are obviously doing the correct thing. But you have to take into consideration a weakness of accuracy as an analysis tool: it yields misleading results if the data set is unbalanced (that is, when the number of samples in different classes varies greatly). For example, if there were 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying all the samples as cats. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the cat class and a 0% recognition rate for the dog class. Therefore, if your goal was to build a machine able to classify both types, accuracy alone wouldn't be useful, because dogs would never be detected. This is why you always have to study the results of the data and not rely solely on a single number or parameter.

3.2 ROC curves


In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the
performance of a binary classifier system as its discrimination threshold is varied. The curve is created by
plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold
settings.

The best possible prediction method would yield a point in the upper left corner or coordinate (0,1) of the
ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives).
The (0,1) point is also called a perfect classification.
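A sketch of how the curve is traced, assuming p holds the predicted probabilities and y the true 0/1 labels:

thresholds = 1:-0.01:0;
TPR = zeros(size(thresholds));
FPR = zeros(size(thresholds));
for i = 1:numel(thresholds)
  pred = (p >= thresholds(i));
  TPR(i) = sum(pred & (y == 1)) / sum(y == 1);  % sensitivity
  FPR(i) = sum(pred & (y == 0)) / sum(y == 0);  % 1 - specificity
end
plot(FPR, TPR); xlabel('False positive rate'); ylabel('True positive rate');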

3.2.1 AUC - Area under the curve


The ROC curve motivates an important metric for classification problems: the AUC, or Area Under the
Curve. The AUC of a model gives the area under the ROC curve, and is a number between 0 and 1. The
higher the AUC, the more area under the ROC curve, and the better the model. The AUC of a model can
be interpreted as the model’s ability to distinguish between the two different classes. If the model were
handed two random observations from the dataset, one belonging to one class and one belonging to the
other class, the AUC gives the proportion of the time when the observation from class 1 has a higher
predicted probability of being in class 1. If you were to just guess which observation was which, this would
be an AUC of 0.5. So a model with an AUC greater than 0.5 is doing something smarter than just guessing,
but we want the AUC of a model to be as close to 1 as possible.
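This pairwise interpretation translates directly into a slow but transparent sketch, with p and y as in the ROC sketch above:

pos = p(y == 1);  neg = p(y == 0);
wins = 0;  ties = 0;
for i = 1:numel(pos)
  wins = wins + sum(pos(i) > neg);   % class-1 example ranked above class-0
  ties = ties + sum(pos(i) == neg);  % ties count half
end
AUC = (wins + 0.5 * ties) / (numel(pos) * numel(neg));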


4 Hypothesis
Recall that the hypothesis expression for linear regression was:

h_θ(x) = θᵀx

Logistic regression cannot rely solely on a linear expression to classify; in addition, using a linear classifier boundary would require the user to establish a threshold at which the predicted continuous values are grouped into the different classes. This is why logistic regression makes use of the sigmoid function. The logistic regression hypothesis function is:

h_θ(x) = g(θᵀx) = 1 / (1 + e^(-θᵀx))


5 Understanding the sigmoid function


5.1 EXAMPLE 1: General sigmoid function

Let's say we decide to establish a threshold of 0.5 (which is exactly where the sigmoid function cuts the Y-axis). So we predict y = 1 whenever h_θ(x) ≥ 0.5, and y = 0 otherwise.

In order to achieve this, note that g(z) ≥ 0.5 exactly when z ≥ 0, so we predict y = 1 whenever θᵀx ≥ 0.


5.2 EXAMPLE 2: Linear boundary


We have calculated the θ parameters of the linear expression and have the following data:

Using the same analysis as we did for the general sigmoid function:

5.3 EXAMPLE 3: Circular boundary


We have calculated the θ parameters of the polynomial expression and have the following data:

Using the same analysis as we did for the general sigmoid function:


IMPORTANT NOTE! From these 3 examples we observe that the decision boundary is not directly
defined by the training dataset, but by the θ parameters.
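A sketch of Examples 2 and 3 with illustrative parameter values (the θ values below are assumptions chosen to match the classic lecture examples, since the figures are not reproduced here):

theta_lin  = [-3; 1; 1];        % assumed: boundary x1 + x2 = 3
theta_circ = [-1; 0; 0; 1; 1];  % assumed: boundary x1^2 + x2^2 = 1
x1 = 2; x2 = 2;                 % a sample point
p_lin  = sigmoid([1, x1, x2] * theta_lin);              % > 0.5, so class 1
p_circ = sigmoid([1, x1, x2, x1^2, x2^2] * theta_circ); % > 0.5, so class 1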

6 Cost function
Now that we know the expression of our logistic regression hypothesis, we need to define the cost function used to evaluate the errors a logistic model makes. Recall the cost function for linear regression:

J(θ) = (1/2m) * Σ (h_θ(x^(i)) - y^(i))^2

If we minimised this function using our new hypothesis h_θ(x^(i)), we could not be sure of converging to the global minimum of the cost function! Since h_θ(x) = 1/(1 + e^(-θᵀx)) is not linear, the resulting cost is non-convex and we might end up in a local minimum. Therefore, our new per-example cost will be:

Cost(h_θ(x), y) = -log(h_θ(x))      if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))  if y = 0


6.1 Simplified cost function


Thankfully, as we are dealing with a binary classification problem and y can only be 0 or 1, the cost function can be simplified to the following expression:

J(θ) = -(1/m) * Σ [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ]
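A minimal sketch of this simplified cost in Octave/MATLAB:

function J = costFunction(theta, X, y)
  % Unregularized logistic regression cost; X includes the intercept column.
  m = length(y);
  h = sigmoid(X * theta);
  J = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h));
end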


6.2 Choosing the θ parameters: using gradient descent

The gradient descent iterative process used in logistic regression is exactly the same as the one used for linear regression; the only difference between the two is the hypothesis. Therefore, the gradient descent algorithm is again:

θ_j := θ_j - (α/m) * Σ (h_θ(x^(i)) - y^(i)) * x_j^(i)   (simultaneously for all j)
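A vectorised sketch of the update loop (the learning rate and iteration count are assumed settings):

alpha = 0.01;  num_iters = 400;  m = length(y);
for iter = 1:num_iters
  h = sigmoid(X * theta);                      % the only change vs linear regression
  theta = theta - (alpha/m) * (X' * (h - y));  % simultaneous update of all thetas
end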

7 Multiclass classification
Until now we have been focusing only on binary classification, but what happens when, instead of classifying with a Yes or a No, we want to classify into many classes? For example, classifying your emails into family, work, travel, bills, etc.? The answer is very simple: we run a binary classifier for each possible class (one-vs-all) and then pick the class with the highest predicted value.
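A sketch of the prediction step, assuming Theta stores one trained classifier per column:

probs = sigmoid(X * Theta);           % m x K matrix: P(class k) per observation
[~, pred_class] = max(probs, [], 2);  % pick the class with the highest probability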


7.1 Overfitting/underfitting
The best way to understand over- and underfitting is to look at 3 simple examples:

Underfitting: the model is clearly too simple and requires additional features to fit the data better. Overfitting: the model fits the training data incredibly well, but when applied to unseen data it fails to generalise to new examples.


How do we know if our model is overfitting?
1. In very simple models where we only use 2 parameters, we can plot the data (just like we did above) and check whether the model is under- or overfitting.
2. With models where the number of features is really high, we cannot take this approach, as visualising the data would be extremely complicated.
2.1. This can be solved by feature reduction (to then plot the data), which will be explained in further posts.
2.2. It can also be solved by REGULARIZATION.

8 Introducing regularization
Suppose we have the previous models and that the regression models are the following:

The general idea is that we want to make the high-order parameters (θ_3 and θ_4) really small, so as to obtain a simpler model that is less prone to overfitting.

Perfect: by doing this we have reduced our function to a second-order model, which is simpler. But in this case we knew which the high-order terms were. How do we deal with the general case, where we might have 100 features? How do we know which the high-order terms are? To tackle this issue, we introduce the REGULARIZATION PARAMETER.

8.1 Regularised linear regression


Cost function:

J(θ) = (1/2m) * [ Σ (h_θ(x^(i)) - y^(i))^2 + λ * Σ θ_j^2 ]   (summing θ_j^2 over j = 1, ..., n; the intercept θ_0 is not penalised)


Gradient descent:

θ_0 := θ_0 - (α/m) * Σ (h_θ(x^(i)) - y^(i)) * x_0^(i)
θ_j := θ_j - α * [ (1/m) * Σ (h_θ(x^(i)) - y^(i)) * x_j^(i) + (λ/m) * θ_j ]   for j ≥ 1
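A sketch of one regularised update, where by convention the intercept theta(1) is not penalised:

h = X * theta;                                          % linear hypothesis
grad = (1/m) * (X' * (h - y));
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);  % shrink all but theta(1)
theta = theta - alpha * grad;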

8.2 Regularised logistic regression


Cost function:

J(θ) = -(1/m) * Σ [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ] + (λ/2m) * Σ θ_j^2

Gradient descent: the update rule has exactly the same form as for regularised linear regression above, but with the logistic hypothesis h_θ(x) = 1/(1 + e^(-θᵀx)).


8.3 Final intuition for the regularization parameter

Even though in following posts we will look at the regularization parameter in more detail, I wanted to show a simple example of how λ can affect the model. Suppose we have the same example as before and we choose a huge value for λ:

9 Visualising the data


In this case we will learn to implement regularized logistic regression to predict whether microchips from a
fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to
ensure it is functioning correctly. Suppose you are the product manager of the factory and you have the
test results for some microchips on two different tests. From these two tests, you would like to determine
whether the microchips should be accepted or rejected. To help you make the decision, you have a
dataset of test results on past microchips, from which you can build a logistic regression model.

Data sample:


Plotting the data:

The figure shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot. Therefore, a straightforward application of logistic regression will not perform well on this dataset, since logistic regression will only be able to find a linear decision boundary.

9.1 How to fix this?


1. One way is to create more features for each data point, i.e., creating polynomial features from x1 and x2. For example, we could map the features into all polynomial terms of x1 and x2 up to the sixth power. What have we achieved with this? Well, firstly we had only a 2-dimensional vector with x1 and x2, and now we have a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary, which will appear nonlinear when drawn in the original 2-dimensional plot. HOWEVER, this strategy is much more susceptible to overfitting (see the sketch after this list).


2. To apply a non-linear decision boundary while at the same time tackling the overfitting that such a high-dimensional vector can introduce, we add regularization parameters to our logistic regression model.
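A sketch of the polynomial feature mapping from point 1, assuming degree 6 (which gives exactly 28 features, including the intercept term):

function out = mapFeature(x1, x2)
  % All monomials x1^(i-j) * x2^j with total degree up to 6: 28 columns.
  degree = 6;
  out = ones(size(x1, 1), 1);   % intercept column
  for i = 1:degree
    for j = 0:i
      out(:, end+1) = (x1 .^ (i - j)) .* (x2 .^ j);
    end
  end
end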

9.2 Calculations

To show a numeric example of how the choice and optimisation of the theta values affect the cost of the model, let's say we decide that the THETA parameters are all equal to zero:

This is our starting point for checking whether the model is actually converging to a minimum. In linear regression, we saw how gradient descent works. Here, I want to present another way to optimise the theta parameters that minimise the cost function: a built-in Matlab function called FMINUNC. Here is the code to use it:

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);


fminunc is given the cost function (which, in this case, you have to code yourself; you can implement the simplified cost function for logistic regression shown above, plus the regularization term), an initial theta, and some options, such as the maximum number of iterations and a flag telling fminunc that our cost function also returns the gradient. It returns both the optimised theta parameters and the value of the cost function.
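As noted, costFunctionReg has to be written by hand; here is a minimal sketch consistent with the regularised cost of section 8.2 (the intercept theta(1) is not penalised):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = sigmoid(X * theta);
  J = -(1/m) * sum(y .* log(h) + (1 - y) .* log(1 - h)) ...
      + (lambda/(2*m)) * sum(theta(2:end) .^ 2);
  grad = (1/m) * (X' * (h - y));
  grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);
end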

The results show that we have improved the model by choosing thetas that drastically reduce the magnitude of the cost function. Let's plot the decision boundary that the logistic regression model has created:

The graph shows the decision boundary the logistic regression model has created. It divides most examples correctly, although there are some where the model predicts the real outcome incorrectly. Let's evaluate the accuracy on these training examples (in this case we haven't produced a test set on which to measure accuracy on unseen data). As we saw in the theory, the logistic regression model returns the probability of each example belonging to each class. To finally classify in a binary classification model like this one, we need to establish a threshold. Let's evaluate the performance of the model with different thresholds. These different selections will show the effect the threshold has on the model's accuracy and which one would suit your specific problem best.

Taking into consideration the nature of the problem, if we want to detect the microchips that have defects, we need to decide which threshold to apply to the model. This decision would need to take into account many factors, such as chip manufacturing costs (in case we throw away non-defective chips that were labelled as defective), costs related to delivering bad-quality chips, etc. Let's say we want to detect as many defective chips as possible. To do that, we would like to categorise as 1's (accepted) only the chips that have a very high probability; in other words, those whose quality parameters are nearly perfect. If we are very selective when categorising as 1's, then we are allowing more chips to be labelled as defective. Our accuracy would decrease, but we would be flagging nearly all (if not all) defective chips. Again, this is a really simplified way of thinking about quality checks, but it is useful for understanding threshold selection and the logistic regression classification model.
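A sketch of that evaluation, with p the predicted probabilities on the training set and y the true labels:

for t = [0.3 0.5 0.7 0.9]
  acc = mean((p >= t) == y);   % training-set accuracy at this threshold
  fprintf('threshold %.1f -> accuracy %.3f\n', t, acc);
end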


9.3 Effect of lambda


For “educational” purposes, just as I did with the different threshold selections, in this case we will see the effect of lambda on the logistic regression model. In the theory section, we learnt that lambda acts as a parameter that controls high-order features; in other words, it controls under- and overfitting behaviour. Let's see what happens to the model when choosing different lambda parameters.


With a regularization parameter lambda = 0, no high-order features are controlled or attenuated, and therefore the model is much more prone to overfitting.
