Samatrix Kaa Kaam
Logistic regression is a statistical method used for binary classification problems, where the goal is to
predict a binary outcome (i.e., 0 or 1). It is a type of generalized linear model that uses a logistic function
to model the relationship between the dependent variable (the binary outcome) and one or more
independent variables (predictors).
The logistic regression model uses a logistic function, which is an S-shaped curve that maps any real-
valued input to a value between 0 and 1. The logistic function is defined as:
p(x) = 1 / (1 + e^(-z))
where p(x) is the predicted probability of the binary outcome (i.e., 1), x is the input variable or predictor,
and z is the weighted sum of the input variables.
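As a minimal sketch, the logistic function and the weighted sum z can be computed with NumPy (the weights, bias, and input values below are illustrative assumptions, not fitted values):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real-valued z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights w, bias b, and a single input vector x.
w = np.array([0.5, -1.2])
b = 0.1
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b   # weighted sum of the input variables
p = sigmoid(z)         # predicted probability that the outcome is 1
print(round(p, 4))     # ≈ 0.475
```

Whatever the value of z, the output p always lies strictly between 0 and 1, which is what lets it be read as a probability.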
Q-2 Differentiate between linear regression and logistic regression.
Linear regression and logistic regression are both statistical methods used to model relationships
between a dependent variable and one or more independent variables. However, there are some
important differences between the two methods.
Dependent variable:
Linear regression is used for continuous numerical outcomes, whereas logistic regression is used for
binary categorical outcomes.
Functional form:
Linear regression assumes a linear relationship between the dependent variable and independent
variables, while logistic regression assumes a non-linear relationship between the dependent variable
and independent variables through a logistic function.
Output:
Linear regression produces an unbounded continuous value, while logistic regression produces a probability between 0 and 1.
Model interpretation:
In linear regression, the coefficients represent the change in the dependent variable associated with a
unit change in the independent variable. In logistic regression, the coefficients represent the change in
the log odds of the event occurring associated with a unit change in the independent variable.
Error distribution:
Linear regression assumes a normal distribution of the error term, while logistic regression assumes a
binomial distribution of the error term.
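To make the interpretation difference concrete: a logistic regression coefficient acts on the log odds, so exponentiating it gives an odds ratio (the coefficient value below is hypothetical, chosen only for illustration):

```python
import math

# Hypothetical fitted logistic-regression coefficient for one predictor.
coef = 0.7

# A one-unit increase in that predictor multiplies the odds of the
# event by e^coef, holding the other predictors fixed.
odds_ratio = math.exp(coef)
print(round(odds_ratio, 3))   # ≈ 2.014
```

So a coefficient of 0.7 roughly doubles the odds per unit increase, whereas a linear regression coefficient of 0.7 would simply add 0.7 to the predicted value.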
In logistic regression, the cost function (also known as the log loss or cross-entropy loss) measures the difference between the predicted probabilities and the actual binary outcomes of the training data. The goal of training is to minimize this difference by adjusting the values of the weights in the logistic regression model.
J(w) = -(1/m) * Σ [yi * log(h(xi)) + (1 - yi) * log(1 - h(xi))]
where J(w) is the cost function, m is the number of training examples, yi is the actual binary outcome (0 or 1) for the ith training example, h(xi) is the predicted probability of the binary outcome for the ith training example, and w is the vector of weights.
The cost function penalizes the model for making incorrect predictions, with a higher penalty for more
confident predictions that turn out to be wrong. It is minimized using an optimization algorithm such as gradient descent, which iteratively updates the weights to find the values that minimize the cost function.
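A minimal from-scratch sketch of the log loss and the gradient-descent update (the learning rate, step count, and tiny dataset below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    """J(w) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    h = sigmoid(X @ w)
    m = len(y)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(steps):
        h = sigmoid(X @ w)
        grad = (1.0 / m) * (X.T @ (h - y))  # gradient of the log loss
        w -= lr * grad                      # iterative weight update
    return w

# Tiny illustrative dataset: an intercept column plus one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = gradient_descent(X, y)
print(log_loss(w, X, y) < log_loss(np.zeros(2), X, y))  # loss decreased
```

Note that the gradient has the same algebraic form as in linear regression, (1/m) * Xᵀ(h − y); only the hypothesis h changes from a linear function to the sigmoid of one.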
Like any other statistical method, logistic regression has certain assumptions that must be met for it to
be valid and reliable. Here are some of the key assumptions for logistic regression:
Binary Outcome: The dependent variable (outcome) must be binary (e.g., 0 or 1).
Independence of Observations: The observations used in the model should be independent of each
other.
Linearity: The relationship between the independent variables and the log odds of the outcome should
be linear. This assumption can be checked by examining the scatter plot of the log odds of the outcome
versus each independent variable.
No Multicollinearity: The independent variables should be linearly independent and not highly correlated with each other. Multicollinearity can lead to unstable and unreliable coefficient estimates.
Large Sample Size: Logistic regression performs best with a large sample size, since its coefficients are estimated by maximum likelihood and those estimates are only approximately normally distributed in large samples.
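One simple way to screen for the multicollinearity assumption is to inspect pairwise correlations between predictors (the 0.9 threshold below is a common rule of thumb, not a fixed standard, and the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Pairwise correlation matrix of the predictors (columns).
corr = np.corrcoef(X, rowvar=False)

# Flag predictor pairs whose absolute correlation exceeds 0.9.
n = corr.shape[0]
flagged = [(i, j) for i in range(n) for j in range(i + 1, n)
           if abs(corr[i, j]) > 0.9]
print(flagged)   # the pair (x1, x2) is highly correlated
```

For a more thorough check, variance inflation factors (VIF) catch multicollinearity that involves more than two predictors at once, which pairwise correlations can miss.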