Deep Learning Week 204-4
Logistic regression is a supervised learning algorithm for binary classification problems, where the target variable y can take only two values, typically represented as 0 and 1. The objective is to find the parameters that minimize the discrepancy between the predicted probabilities and the actual binary outcomes in the training dataset.
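A standard way to measure this discrepancy (the specific loss is not spelled out in this section) is the binary cross-entropy, or log loss, averaged over the m training examples, where ŷ⁽ⁱ⁾ is the predicted probability for example i:

L(w, b) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(ŷ⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − ŷ⁽ⁱ⁾) ]

Minimizing this loss pushes ŷ⁽ⁱ⁾ toward 1 when y⁽ⁱ⁾ = 1 and toward 0 when y⁽ⁱ⁾ = 0.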
Given an input feature vector x (e.g., representing an image like that used in the last lesson), the goal is to develop an algorithm that outputs a prediction, denoted as ŷ, which represents the estimated probability that the true label y is equal to 1, given the input features x. In other words, if x is an image, ŷ indicates the likelihood that the image depicts a truck. Given x, ŷ = P(y = 1 | x), where 0 ≤ ŷ ≤ 1.

Example: there is a truck (1); there is not a truck (0).

Logistic regression takes an input feature vector (x), multiplies it by the weights vector (w), adds the bias term (b), and then applies the sigmoid function (σ) to the result. This produces the output (ŷ), which represents the predicted probability of the input belonging to the positive class. The training process involves learning the optimal values for the weights (w) and bias (b) based on the provided training labels (y).

Input feature vector (x): An n-dimensional vector that represents the input data. Each element of the vector corresponds to a specific feature of the input, and the dimension of the vector is determined by the number of features. -> x ∈ ℝ^(n_x), where n_x is the number of features.
Training label (y): The true label or target variable associated with each input example. In binary classification, y is either 0 or 1, indicating the class to which the input belongs (e.g., truck or not truck). -> y ∈ {0, 1}
Weights (w): An n-dimensional vector that represents the learned parameters of the model. Each element of the weights vector corresponds to the importance or contribution of its respective feature in making the classification decision. -> w ∈ ℝ^(n_x), where n_x is the number of features.
Bias (b): A scalar value that represents the bias term, or intercept, of the logistic regression model. It shifts the decision boundary of the model. -> b ∈ ℝ
Output (ŷ): The predicted probability of the input belonging to the positive class (usually denoted as class 1). It is calculated by applying the sigmoid function to the weighted sum of the input features plus the bias term. -> ŷ = σ(wᵀx + b)
Sigmoid function (σ): A mathematical function that maps any real-valued input to a value between 0 and 1. In logistic regression, the sigmoid function is used to squash the weighted sum of the input features and the bias term, transforming it into a probability value. The formula for the sigmoid function is ŷ = σ(wᵀx + b) = σ(z) = 1 / (1 + e^(−z)), where z = wᵀx + b is the input to the function.
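A minimal NumPy sketch of this forward pass, following the notation above; the specific feature and parameter values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Logistic regression forward pass: y_hat = sigma(w^T x + b)."""
    z = np.dot(w, x) + b        # linear combination, any real number
    return sigmoid(z)           # squashed into (0, 1)

# Hypothetical example with n_x = 3 features and made-up parameters.
w = np.array([0.5, -1.2, 0.3])  # weights, w in R^(n_x)
b = 0.1                         # bias, b in R
x = np.array([1.0, 0.5, 2.0])   # input feature vector, x in R^(n_x)

y_hat = predict_proba(w, b, x)  # estimated P(y = 1 | x)
print(y_hat)                    # a value between 0 and 1
```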
In logistic regression, we want to predict the probability of an input belonging to a particular class. To do this, we use a linear function wᵀx + b that combines the input features (x) with the learned weights (w) and a bias term (b). This linear function can output any real number. However, since we are looking for a probability, we need to map the output of the linear function to a value between 0 and 1. That is where the sigmoid function comes in: it takes the output of the linear function and squashes it into the range [0, 1], which ensures that the output can be interpreted as a probability. The sigmoid function has an S-shaped curve that smoothly transitions from 0 to 1, with the steepest change occurring around z = 0.
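In code, evaluating σ(z) = 1 / (1 + e^(−z)) directly can overflow for large negative z in floating point; a common numerical trick is to branch on the sign of z, as in this minimal sketch for scalar inputs:

```python
import math

def stable_sigmoid(z):
    """Numerically stable sigmoid for a scalar z.

    For z >= 0, e^(-z) <= 1, so 1 / (1 + e^(-z)) is safe to evaluate.
    For z < 0, the algebraically equivalent form e^z / (1 + e^z)
    keeps the exponent non-positive and avoids overflow in e^(-z).
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)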
Key Observations of the Sigmoid Function
If z is a large positive number, the sigmoid function outputs a value close to 1. This means that when the linear function gives a large positive output, the sigmoid function indicates a high probability of the input belonging to the positive class.
If z is a large negative number, the sigmoid function outputs a value close to 0. This means that when the linear function gives a large negative output, the sigmoid function indicates a low probability of the input belonging to the positive class.
If z is equal to 0, the sigmoid function outputs a value of 0.5. This represents an equal probability of the input belonging to either class.
By applying the sigmoid function to the output of the linear function, we obtain a probability estimate between 0 and 1, which is suitable for binary classification tasks.
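These three observations can be checked numerically with a small self-contained example (the output values in the comments are approximate):

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (10.0, 0.0, -10.0):
    print(f"sigma({z:+.0f}) = {sigmoid(z):.6f}")
# sigma(+10) = 0.999955  -> large positive z: probability close to 1
# sigma(+0)  = 0.500000  -> z = 0: equal probability for either class
# sigma(-10) = 0.000045  -> large negative z: probability close to 0
```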