Unit 3c Linear Regression

This document discusses different types of linear regression models. Simple linear regression involves one dependent and one independent variable. Multiple linear regression involves one dependent variable and two or more independent variables, which allows modeling of more complex relationships in which several factors influence the dependent variable. Ordinary least squares and gradient descent are two common methods for fitting linear regression models by minimizing the error between predicted and actual values.


Regression

Simple and Multiple Linear Regression, Nonlinear Regression, Logistic Regression
Supervised/Unsupervised Learning

• Supervised Learning is the machine learning task of inferring a function from labeled training data
  Example: Classification & Regression problems

• Unsupervised Learning refers to the problem of trying to find hidden structure in unlabeled data
  Example: Clustering problems

You only have to deal with ……
Examples
• Regression Problem
• Prediction of wheat production
• Prediction of rainfall
• Point prediction of Stock Exchange
• Prediction of Price

• Classification Problems
• Prediction of cancer
• Win prediction of Sh. Narendra Modi ji
• Diabetic Prediction
• Classification of e-mail
Understand the Data……..
Classification & Regression
Data Set: UCI Machine Learning Repository
Google “uci dataset”
Linear Regression
A statistical method that is used for predictive analysis
Makes predictions for continuous/real or numeric variables such as sales, salary,
age, product price, etc.

Shows a linear relationship between a dependent (y) and one or more independent (X) variables, hence called linear regression

Dependent variable is continuous in nature

Linear regression shows the linear relationship, i.e. it finds how the value of the dependent variable changes with the value of the independent variable
Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x)

So, this regression technique finds out a linear relationship between x (input) and y (output). Hence the name Linear Regression

When the value of x increases, the value of y likewise increases

Example: X (input) is the work experience and Y (output) is the salary of a person

The regression line is the best-fit line for our model
Simple Linear Regression

If there is a single input variable (x), such linear regression is called simple linear regression
y= b*x+a
or
y= a0+a1*x
• Y= Dependent Variable (Target Variable) (ESTIMATED OUTPUT)
• X= Independent Variable (predictor Variable) (INPUT)
• a0= Intercept of the line (Gives an additional degree of freedom) (regression coefficient)
• a1 = Linear regression coefficient (scale factor to each input value) (SLOPE) (regression coefficient)
The motive of the linear regression algorithm is to find the best values for a_0 and a_1
Example — X: amount of fertilizer, y: size of crop (every time we add a unit to X, the dependent variable y increases proportionally)
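As an illustration (not from the slides), a minimal sketch of fitting simple linear regression with scikit-learn on made-up fertilizer/crop numbers:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: amount of fertilizer (x) vs. size of crop (y)
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # sklearn expects a 2-D input
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(x, y)
print("a0 (intercept):", model.intercept_)
print("a1 (slope):", model.coef_[0])
print("prediction for x = 6:", model.predict([[6]])[0])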
Multiple Linear Regression
If there is more than one input variable, such linear regression is called multiple linear
regression
y= a + b_1*x_1 + b_2*x_2 + b_3*x_3+… + b_n*x_n
or
y= a_0 + a_1*x_1 + a_2*x_2+.. + a_n*x_n

Eg: Add amount of sunlight and rainfall in a growing season to the fertilizer variable, with all 3
affecting y

X_1: amount of fertilizer, X_2: amount of sunlight, X_3: amount of rainfall, y: size of crop
y= 0.9 + 1.2*x1 + 2*x2 + 4*x3

Which feature is more important?

Example: Result (y) predicted from quiz (x1), assignments (x2) and project (x3) scores

A Linear Regression model’s main aim is to find the best-fit linear line and the optimal values of the intercept and coefficients such that the error is minimized

Error is the difference between the actual value and the predicted value, and the goal is to reduce this difference
• x is our independent variable, which is plotted on the x-axis
• y is the dependent variable, which is plotted on the y-axis
• Black dots are the data points, i.e. the actual values
• a0 is the intercept, which is 10 in the figure
• a1 is the slope of the x variable
• The blue line is the best-fit line predicted by the model, i.e. the predicted values lie on the blue line

The vertical distance between a data point and the regression line is known as the error or residual

Each data point has one residual, and the sum of all these differences is known as the Sum of Residuals/Errors
Residual/Error = Actual value – Predicted value

Sum of Residuals/Errors = Σ (Actual – Predicted values)

Sum of Squared Residuals/Errors = Σ (Actual – Predicted values)^2


Cost Function (J)
• To figure out the best possible values for a_0 and a_1 which would provide the best fit line for the data
points
• Since we want the best values for a_0 and a_1, we convert this search problem into a minimization problem
where we would like to minimize the error between the predicted value and the actual value

This provides the average squared error over all the data points. Therefore, this cost function is also known as
the Mean Squared Error (MSE) function
Now, using this MSE function we are going to change the values of a_0 and a_1 such that the MSE value settles
at the minima
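For reference (the cost formula is not reproduced in this text version), with n data points the MSE cost in the notation above is:

J(a_0, a_1) = (1/n) * Σ_i ( y_i - (a_0 + a_1*x_i) )^2

Some texts use 1/2n instead of 1/n so that the 2 cancels when differentiating; the deck uses that ½ convention later for multiple regression.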
Linear regression: A sample curve-fitting

• Y = m*x + b
• Method of least squares
• Gradient descent approach

Ordinary Least Squares (OLS) Method

y= m*x+b
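For a single input, the OLS estimates have the standard closed-form expressions (added here for reference, in the slide's notation):

m = Σ (x_i - mean(x)) * (y_i - mean(y)) / Σ (x_i - mean(x))^2
b = mean(y) - m * mean(x)

where mean(x) and mean(y) are the averages of the inputs and outputs.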

Linear Regression Simplified - Ordinary Least Square vs Gradient Descent: https://towardsdatascience.com/linear-regression-simplified-ordinary-least-square-vs-gradient-descent-48145de2cf76
https://shakewingo.github.io/GD-vs-OLS/
Gradient Descent

• Gradient descent is a method of updating a_0 and a_1 to reduce the cost function (MSE)
• The idea is that we start with some values for a_0 and a_1 and then we change these
values iteratively to reduce the cost

In the gradient descent algorithm, the size of the steps you take is the learning rate

This decides how fast the algorithm converges to the minima
Gradient Descent

To find these gradients, we take partial derivatives with respect to a_0 and a_1

The partial derivatives are the gradients, and they are used to update the values of a_0 and a_1
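Written out for the plain MSE cost above (the expressions are not reproduced in this text version), the gradients are:

∂J/∂a_0 = (2/n) * Σ_i ( (a_0 + a_1*x_i) - y_i )
∂J/∂a_1 = (2/n) * Σ_i ( (a_0 + a_1*x_i) - y_i ) * x_i

(with a 1/2n cost the factor of 2 disappears)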

Alpha is the learning rate which is a hyper-parameter that you must specify

A smaller learning rate could get you closer to the minima but takes more time to
reach the minima, a larger learning rate converges sooner but there is a chance
that you could overshoot the minima
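A minimal sketch (not from the slides) of batch gradient descent for simple linear regression, with hypothetical data and a hand-picked learning rate:

import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

a0, a1 = 0.0, 0.0   # start from zero
alpha = 0.01        # learning rate
n = len(x)

for _ in range(5000):
    y_pred = a0 + a1 * x
    # Gradients of the MSE cost with respect to a0 and a1
    d_a0 = (2 / n) * np.sum(y_pred - y)
    d_a1 = (2 / n) * np.sum((y_pred - y) * x)
    a0 -= alpha * d_a0
    a1 -= alpha * d_a1

print("a0:", a0, "a1:", a1)   # should end up close to the OLS solution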
Multiple/Multivariate Linear Regression
Simple Linear Regression

One dependent variable and only one independent variable

If “income” is explained by the “education” of an individual, then the regression is expressed in terms of simple linear regression as follows:

Income = b0 + b1*Education + e

e is the model's random error

It’s the simplest form of Linear Regression that is used when there is a single input
variable for the output variable
Multiple Linear Regression

We will see how multiple input variables together influence the output variable, while also learning how the calculations differ from those of the Simple LR model
Multiple Linear Regression

Instead of one, there will be two or more independent variables

If we have an additional variable (let’s say “experience”) in the previous equation, then it becomes a multiple regression:

Income= b0+ b1*Education+ b2*Experience+ e

It’s a form of linear regression that is used when there are two or more predictors
Multiple Linear Regression

Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable. You can use multiple linear regression
when you want to know:

 How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added
affect crop growth)
 The value of the dependent variable at a certain value of the independent variables
(e.g. the expected yield of a crop at certain levels of rainfall, temperature, and
fertilizer addition)
Applications

• You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns and gather data on the percentage of people in each town who smoke, the percentage of people in each town who bike to work, and the percentage of people in each town who have heart disease

• Because you have two independent variables and one dependent variable, and all your variables are quantitative, you can use multiple linear regression to analyze the relationship between them
Applications

• Prediction of used-car prices based on make, model, year, shift and color
  [make, model, year, shift, color] → car prices

• Prediction of the price for a house in the market based on location, lot size, number of bedrooms, neighborhood characteristics, etc.
  [location, lot size, # bedrooms, crime rate, school ratings] → house prices
Simple vs. Multi Linear Regression

Can we use simple linear regression to study our output against all independent
variables separately? NO

Some of these factors will affect the price of the house positively
For example: the larger the area, the higher the price
On the other hand, factors like distance from the workplace and the crime rate can influence your estimate of the house price negatively

Running separate simple linear regressions will lead to a different outcome for each predictor, when we are really interested in a single combined estimate
There may also be an input variable that is itself correlated with, or dependent on, some other predictor. This can cause wrong predictions and unsatisfactory results
Multiple Linear Regression

The line of best fit through the data points is a straight line, rather than a curve or some sort of grouping factor
Multiple Linear Regression

Y is the output variable, and the X terms are the corresponding input variables

Notice that this equation is just an extension of Simple Linear Regression: each predictor has a corresponding slope coefficient (β)
Multiple Linear Regression

The first β term (β0) is the intercept constant and is the value of Y in the absence of all predictors (i.e. when all X terms are 0)

It may or may not hold any significance in a given regression problem. It’s generally there to give a relevant nudge to the line/plane of regression
Multiple Linear Regression

β1, β2, …, βn are the slope coefficients with respect to x1, x2, …, xn

If the value of x1 is increased by 1 unit, β1 says by how much the price (output) is affected, i.e. the effect that increasing the value of that independent variable has on the predicted y value (with the other inputs held fixed)
Cost Function
It is a function that assigns a cost to instances where the model deviates from the
observed data. In this case, cost is the sum of squared errors.

The cost is the sum of the squared differences between our predicted values and the actual values, divided by twice the number of data points
A smaller mean squared error implies a better performance

• m is the number of training examples

• ½ is a constant that helps cancel the 2 in the derivative of the function when doing the calculations for gradient descent
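In symbols (the formula itself is not reproduced in this text version), this cost is:

J = (1/2m) * Σ_i ( y'_i - y_i )^2

where y'_i is the model's prediction for the i-th training example.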
Understanding of Dataset
• The advertising data set consists of
the sales of a product in 200
different markets, along with
advertising budgets for three
different media: TV, radio, and
newspaper.
• The first row of the data says that
the advertising budgets for TV,
radio, and newspaper were $230.1k,
$37.8k, and $69.2k respectively, and
the corresponding number of units
that were sold was 22.1k (or 22,100)
Applying Multi-Regression

In Simple Linear Regression, we can see how each advertising medium affects
sales when applied without the other two media

However, in practice, all three might be working together to impact net sales. We
did not consider the combined effect of these media on sales.

Multiple Linear Regression solves the problem by taking account of all the
variables in a single expression.
Applying Multi-Regression

Finding the values of these constants (β) is what the regression model does, by minimizing the error function and fitting the best line or hyperplane (depending on the number of input variables)

This is done by minimizing the Residual Sum of Squares (RSS), which is obtained by squaring the differences between actual and predicted outcomes and summing them up
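A sketch (not from the slides) of how coefficients like those on the next slide can be obtained with scikit-learn, assuming Advertising.csv has columns TV, radio, newspaper and sales:

import pandas as pd
from sklearn.linear_model import LinearRegression

ad = pd.read_csv("Advertising.csv")        # dataset linked later in this unit
X = ad[["TV", "radio", "newspaper"]]
y = ad["sales"]

model = LinearRegression().fit(X, y)
print("Intercept:", model.intercept_)
print(dict(zip(X.columns, model.coef_)))   # one slope per advertising medium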
Applying Multi Regression to get the coefficients

Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037

• If we fix the budgets for TV & newspaper, then increasing the radio budget by $1000 will lead to an increase in sales of around 189 units (0.189 * 1000)
• Similarly, fixing the radio & newspaper budgets, we infer an approximate rise of 46 units of product per $1000 increase in the TV budget
Applying Multi Regression to get the coefficients

Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037

• A unit increase in the TV budget is associated with an increase of about 0.045 units in sales, and so on
• For the newspaper budget, since the coefficient is quite negligible (close to zero), it’s evident that newspaper advertising is not affecting the sales
• In fact, it’s on the negative side of zero (-0.001), which, if the magnitude were big enough, could have meant that this medium was actually causing sales to fall
• So we don’t have to spend much on newspaper advertising
Applying Multi Regression to get the coefficients

Output:
Intercept     2.938889
TV            0.045765
radio         0.188530
newspaper    -0.001037

If we run Simple Linear Regression using just the newspaper budget against sales, we’ll observe a coefficient value of around 0.055, which is quite significant in comparison to what we saw above
Collinearity
How are these variables correlated with each other?

 In multiple linear regression, it is possible that some of the independent variables are actually correlated with one another, so it is important to check this before developing the regression model

 If two independent variables are too highly correlated, then only one of them
should be used in the regression model.

import pandas as pd

ad = pd.read_csv("Advertising.csv")
ad.corr()

Download Dataset from: https://www.statlearning.com/s/Advertising.csv

Example Code link: https://towardsdatascience.com/multiple-linear-regression-8cf3bee21d8b
Collinearity: How are these variables correlated with each other?
 A situation in which two or more input variables are linearly related
 The independent variables should not be too highly correlated with each other
Close to 1: strong correlation (consider keeping only one of the two variables)
Close to 0: weak correlation

Correlation Matrix for advertising data

Correlation Heatmap for Advertising Data
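One possible way (not from the slides) to produce such a heatmap, assuming the same Advertising.csv file:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

ad = pd.read_csv("Advertising.csv")
corr = ad[["TV", "radio", "newspaper", "sales"]].corr()   # correlation matrix

sns.heatmap(corr, annot=True, cmap="coolwarm")            # annotated heatmap
plt.title("Correlation Heatmap for Advertising Data")
plt.show()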


Dummy Variables (One Hot Encoding)
As we know, in the Multiple Regression Model we often use a lot of categorical data. Using categorical data is a good way to include non-numeric data in the regression model. Categorical data refers to data values that represent categories, i.e. values with a fixed and unordered number of levels, for instance gender (male/female). In the regression model, these values can be represented by dummy variables.

These variables consist of values such as 0 or 1, representing the presence or absence of a categorical value.
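A small sketch (not from the slides) of creating dummy variables with pandas; the column names and values are hypothetical:

import pandas as pd

df = pd.DataFrame({
    "area":  [2600, 3000, 3200],
    "town":  ["monroe", "west windsor", "monroe"],   # categorical column
    "price": [550000, 565000, 610000],
})

# One 0/1 column per category; drop_first avoids a redundant column
dummies = pd.get_dummies(df["town"], drop_first=True)
df_encoded = pd.concat([df.drop(columns="town"), dummies], axis=1)
print(df_encoded)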
Feature Selection
Forward Selection: We start with a model without any predictor and just the intercept
term. We then perform simple linear regression for each predictor to find the best
performer(lowest RSS). We then add another variable to it and check for the best 2-
variable combination again by calculating the lowest RSS(Residual Sum of Squares). After
that the best 3-variable combination is checked, and so on. The approach is stopped
when some stopping rule is satisfied.

Backward Selection: We start with all variables in the model, and remove the variable
that is the least statistically significant. This is repeated until a stopping rule is reached.
For instance, we may stop when there is no further improvement in the model score.
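For reference, scikit-learn ships a sequential selector that works in the same spirit (it scores candidates by cross-validation rather than raw RSS); a minimal sketch:

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Forward selection: start with no predictors and greedily add the best one at each step
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=3,
                                direction="forward")   # use "backward" for backward selection
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))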
R squared
 The coefficient of determination (R-squared) is a statistical metric that is used to measure
how much of the variation in outcome can be explained by the variation in the
independent variables

 R2 by itself can't thus be used to identify which predictors should be included in a model
and which should be excluded

 R² (between 0 and 1) is the measure of the degree to which the variance in the data is explained by the model. Mathematically, it’s the square of the correlation between actual and predicted outcomes

 0 indicates that the outcome cannot be predicted by any of the independent variables and
1 indicates that the outcome can be predicted without error from the independent
variables
R squared

R² closer to 1 indicates that the model is good and explains the variance in data well. A value
closer to zero indicates a poor model.
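A quick sketch (not from the slides) of computing R² for a fitted model with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1], [2], [3], [4], [5]])   # hypothetical data
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

model = LinearRegression().fit(X, y)
print("R squared:", r2_score(y, model.predict(X)))   # equivalently model.score(X, y)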
Implement multi-linear regression to calculate price

3000 sqr ft area, 3 bedrooms, 40 year old ------------- > PRICE?????

2500 sqr ft area, 4 bedrooms, 5 year old ------------- > PRICE?????

https://github.com/codebasics/py/blob/master/ML/2_linear_reg_multivariate/2_linear_regression_multivariate.ipynb
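A possible sketch (not the linked notebook itself), assuming a small homeprices-style table with columns area, bedrooms, age and price; the training numbers are made up:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data
df = pd.DataFrame({
    "area":     [2600, 3000, 3200, 3600, 4000],
    "bedrooms": [3, 4, 3, 3, 5],
    "age":      [20, 15, 18, 30, 8],
    "price":    [550000, 565000, 610000, 595000, 760000],
})

model = LinearRegression().fit(df[["area", "bedrooms", "age"]], df["price"])

# The two queries from the slide
query = pd.DataFrame({"area": [3000, 2500], "bedrooms": [3, 4], "age": [40, 5]})
print(model.predict(query))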
Nonlinear Regression
Nonlinear Regression

 Non-linear regression is a method to model a non-linear relationship between the dependent variable and a set of independent variables

 Simple linear regression relates two variables (X and Y) with a straight line (y = mx + b), while nonlinear regression relates the two variables with a nonlinear (curved) relationship

Nonlinear Regression

If the data shows a curvy trend, then linear regression will not produce very
accurate results when compared to a non-linear regression because, as the
name implies, linear regression presumes that the data is linear

Why Nonlinear Regression?

These data points correspond to China’s GDP from 1960–2014

The first column is the year and the second is China’s corresponding annual gross domestic income in US dollars for that year

Why Nonlinear Regression?

First, can GDP be predicted based on time?

Second, can we use a simple linear regression to model it?

If the data shows a curvy trend, then linear regression would not produce very
accurate results when compared to a non-linear regression
Simply because, as the name implies, linear regression presumes that the data is
linear

Why Nonlinear Regression?
The scatter plot shows that there seems to be a strong relationship between GDP and time, but the relationship is not linear
The growth starts off slowly, then from 2005 onward it becomes very significant
Finally, it decelerates slightly in the 2010s
It looks like either a logistic or an exponential function
So, it requires the special estimation methods of the non-linear regression procedure
Methods to handle nonlinear data

 Polynomial Regression

 Linearization of non-linear problems

Polynomial Regression

Polynomial Regression

Essentially, any relationship that is not linear can be termed non-linear and is usually represented by a polynomial of degree k (the maximum power of x):

y = a_0 + a_1*x + a_2*x^2 + …… + a_k*x^k

Substituting Z_1 = x, Z_2 = x^2, …, Z_k = x^k turns this into

y = a_0 + a_1*Z_1 + a_2*Z_2 + …… + a_k*Z_k  →  MULTIPLE LINEAR REGRESSION

Implementation

https://www.geeksforgeeks.org/python-implementation-of-polynomial-regression/
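A short sketch (not the linked implementation) of polynomial regression via scikit-learn's PolynomialFeatures, which performs exactly the substitution described above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(0, 5, 20).reshape(-1, 1)                  # hypothetical inputs
y = 1 + 2 * x.ravel() - 0.5 * x.ravel() ** 2 + np.random.normal(0, 0.2, 20)

poly = PolynomialFeatures(degree=2, include_bias=False)   # builds [x, x^2]
Z = poly.fit_transform(x)

model = LinearRegression().fit(Z, y)                      # ordinary multiple LR on Z
print(model.intercept_, model.coef_)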

Logistic Regression
Classification
 Classification is a process of categorizing a given set of data into classes

 The process starts with predicting the class of given data points

 The classes are often referred to as target, label or categories


Binary vs. Multi-Classification
Classification Vs. Regression
Parameter              | Classification                                                                                       | Regression
Basic                  | The mapping function maps values to predefined classes                                              | The mapping function maps values to continuous output
Involves prediction of | Discrete values                                                                                      | Continuous values
Method of calculation  | By measuring accuracy                                                                                | By measuring root mean square error
Example algorithms     | Logistic Regression, Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, Decision Trees, etc. | Simple Linear Regression, Multiple Linear Regression, Non-linear Regression, etc.
Logistic Regression
A learning algorithm that we use when the output labels Y in a supervised learning
problem are all either zero or one

FOR BINARY CLASSIFICATION PROBLEM


Logistic Regression

When the dependent variable(target) is categorical

For example,
To predict whether an email is spam (1) or not (0)

Linear Regression Continuous (can’t apply)

Logistic Regression model builds a regression model to predict the probability that a
given data entry belongs to the category numbered as “1”
Linear vs. Logistic Regression
Sigmoid Function (Threshold=0.5)
Sigmoid Function

σ(z) = 1 / (1 + e^(-z))

z = Σ wi*xi + b
  = w1*x1 + w2*x2 + … + wn*xn + b
  = w^T*x + b

y' = σ(z) = σ(w^T*x + b)
Sigmoid Function
 If ‘z’ goes to infinity, Y (predicted) will become 1

 If ‘z’ goes to negative infinity, Y (predicted) will become 0

If z is a very large positive number, e^(-z) will be close to zero, so σ(z) ≈ 1
If z is a very large negative number, e^(-z) becomes huge, so σ(z) ≈ 0

*Keep b and W as separate parameters
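A tiny numerical sketch (not from the slides) of the sigmoid and the 0.5 decision threshold:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))     # 0.5: exactly on the decision boundary
print(sigmoid(10))    # ~1.0: confidently class 1
print(sigmoid(-10))   # ~0.0: confidently class 0

w, b = np.array([0.8, -0.4]), 0.1   # hypothetical weights and bias
x = np.array([2.0, 1.0])
print(sigmoid(w @ x + b) >= 0.5)    # predicted label with threshold 0.5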


Cost Function

To train the parameters W and b of logistic regression, we need to define a cost function

A first idea is to train the weights (W) to minimize a cost defined as the Mean Squared Error (MSE); the next slides show why this does not work well here
Loss (Error) Function

To measure how well our algorithm is doing

The loss function could be the squared error:

L(y', y) = 1/2 (y' - y)^2

We want this to be as small as possible

But we won't use this loss, because it leads to an optimization problem which is non-convex, meaning it contains local optimum points
Loss (Error) Function

If we try to use the cost function of linear regression in Logistic Regression, it would be of no use: it would end up being a non-convex function with many local minima, in which it would be very difficult to minimize the cost value and find the global minimum
Loss (Error) Function

So, Gradient Descent may not find the global optimum

We need to define a loss that measures how good our output y-hat is when the true label is y
Loss (Error) Function

So, we define a different loss function for logistic regression that plays a similar role to the squared error but gives us a convex optimization problem:

L(y', y) = - ( y*log(y') + (1-y)*log(1-y') )

*We want this to be as small as possible


Loss (Error) Function

L(y',y) = - (y*log(y') + (1-y)*log(1-y'))


For single training example

CASE 1: If y = 1 ==> L(y',1) = -log(y') ==> we want y' to be as large as possible ==> the biggest value y' can take is 1
(but y' can't be larger than 1, because of the sigmoid, so we want it to be close to 1)

CASE 2: If y = 0 ==> L(y',0) = -log(1-y') ==> we want 1-y' to be as large as possible ==> y' should be as small as possible, i.e. close to 0 (since y' lies between 0 and 1)
Loss (Error) Function
For single training example
For logistic regression, the Cost function is defined as:
−log(y') if y = 1
−log(1−y') if y = 0

If y=1, when the model predicts correctly (positive class with 100% probability), the sample cost is 0; the cost keeps increasing as the predicted probability of the positive class decreases; and when it incorrectly predicts that there is no chance of the positive class, the cost grows without bound
Loss (Error) Function
For single training example
For logistic regression, the Cost function is defined as:
−log(y') if y = 1
−log(1−y') if y = 0

If y=0, when the model predicts correctly (negative class with 100% probability), the sample cost is 0; the cost keeps increasing as the predicted probability of the positive class increases; and when it incorrectly predicts that there is no chance of the negative class, the cost grows without bound
Loss (Error) Function
Cost Function

J(w,b) = (1/m) * Σ_i L(y'^(i), y^(i))

J(w,b) = -(1/m) * Σ_i [ y^(i)*log(y'^(i)) + (1−y^(i))*log(1−y'^(i)) ]

 The loss function computes the error for a single training example
 The cost function is the average of the loss functions of the entire
training set
 It is also known as logarithmic loss or log loss
Gradient Descent

• Now the question arises, how do we reduce the cost value


• This can be done by using Gradient Descent
• The main goal of Gradient descent is to minimize the cost value. i.e. min J(w).

J(w,b) = -(1/m) * Σ_i [ y^(i)*log(y'^(i)) + (1−y^(i))*log(1−y'^(i)) ]


Gradient Descent

• To minimize the cost function, we need to run the gradient descent function on
each parameter i.e.

w = w - alpha * dw
Gradient Descent

• We want to predict w and b that minimize the cost function

• Our cost function is convex

• First we initialize w and b to 0,0, or initialize them to a random value on the convex function, and then iteratively improve the values to reach the minimum

• In Logistic regression we generally always use 0,0 instead of random.


Gradient Descent
Gradient Descent

• The gradient descent algorithm repeats: w = w - alpha * dw
  where alpha is the learning rate (how big a step we take on each iteration of GD)
  and dw is the derivative of w (the change to w); the derivative is also the slope of the cost with respect to w

• The derivative give us the direction to improve our parameters

• The actual equations we will implement:

 w = w - alpha * d(J(w,b) / dw) (how much the function slopes in the w direction)
 b = b - alpha * d(J(w,b) / db) (how much the function slopes in the b direction)
Gradient Descent
A lower value of “alpha” is preferred, because if the learning rate is too big a number then we may miss the minimum point and keep oscillating around it on the convex curve
Derivatives (slope of function)

• The derivative of a linear function is its slope

• e.g. f(a) = 3a ==> d(f(a))/d(a) = 3

• if a = 2 then f(a) = 6

• if we move a a little bit, a = 2.001, then f(a) = 6.003
  This means the change in f(a) equals the derivative (slope) multiplied by the change in a, added to the previous result

• For a straight line, the derivative (slope) is the same everywhere
Derivatives
• f(a) = a^2 ==> d(f(a))/d(a) = 2a
a = 2 ==> f(a) = 4
a = 2.0001 ==> f(a) = 4.0004 approx.

• f(a) = a^3 ==> d(f(a))/d(a) = 3a^2

• f(a) = log(a) ==> d(f(a))/d(a) = 1/a

To conclude, the derivative is the slope, and the slope can be different at different points of the function
Computational Graph

J(a,b,c) = 3(a+bc)

u = bc
v = a+u
J = 3v
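As a small illustration (not from the slides), the forward and backward pass of this graph in Python, assuming b = 3 and c = 2 (any values with b*c = 6 match the worked example that follows):

# Forward pass of J(a, b, c) = 3 * (a + b*c)
a, b, c = 5, 3, 2      # assumed values; the worked example below uses a = 5, v = 11, J = 33
u = b * c              # u = 6
v = a + u              # v = 11
J = 3 * v              # J = 33

# Backward pass (chain rule), right to left
dJ_dv = 3              # J = 3v
dJ_da = dJ_dv * 1      # dv/da = 1  -> dJ/da = 3
dJ_du = dJ_dv * 1      # dv/du = 1
dJ_db = dJ_du * c      # du/db = c  -> dJ/db = 6
dJ_dc = dJ_du * b      # du/dc = b  -> dJ/dc = 9
print(J, dJ_da, dJ_db, dJ_dc)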
Computational Graph

In Logistic Regression, J is the cost function that we try to minimize

Forward path: left to right (to compute J)

But for derivatives there is a right-to-left computation (backward) to yield the derivatives of the final output variable
Derivatives with Computational Graph

The calculus chain rule says: if x -> y -> z (x affects y and y affects z),
then d(z)/d(x) = d(z)/d(y) * d(y)/d(x)

dJ/dv = ?
If we take the value of v and change it a little bit, how would the value of J change?
Derivatives with Computational Graph

dJ/dv = ?   Since J = 3v, dJ/dv = 3

Derivatives with Computational Graph

a = 5 → 5.001
v = 11 → 11.001   (dv/da = 1)
J = 33 → 33.003

dJ/da = ?
dJ/da = dJ/dv * dv/da = 3 * 1 = 3
Logistic Regression Derivatives

We want to modify the parameters w and b to reduce the loss (L)

So we compute derivatives of this loss (i.e., going backward)


Logistic Regression Derivatives

Then, from right to left, we calculate the derivatives compared to the result:

d(a) = d(L)/d(a) = -(y/a) + ((1-y)/(1-a))
d(z) = d(L)/d(z) = a - y
d(W1) = X1 * d(z)
d(W2) = X2 * d(z)
d(b) = d(z)
Logistic Regression Derivatives

w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db

We are done with computing the derivatives and implementing GD with respect to a single training example
For m training examples → average the gradients (divide by m)
Logistic Regression Pseudo code
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# x1, x2, Y are arrays of length m (two input features and the 0/1 labels);
# alpha is the learning rate. One full pass = one gradient-descent step:
J = 0; dw1 = 0; dw2 = 0; db = 0   # cost and gradient accumulators
w1 = 0; w2 = 0; b = 0             # weights initialized to zero

for i in range(m):
    # Forward pass
    z = w1 * x1[i] + w2 * x2[i] + b
    a = sigmoid(z)
    J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))

    # Backward pass
    dz = a - Y[i]
    dw1 += dz * x1[i]
    dw2 += dz * x2[i]
    db += dz

J /= m
dw1 /= m
dw2 /= m
db /= m

# Gradient descent update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
Practical: Implement logistic regression to predict Heart Disease

https://www.w3schools.com/python/python_ml_logistic_regression.asp

https://www.kaggle.com/datasets/dileep070/heart-disease-prediction-using-logistic-regression
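A possible sketch for this practical, assuming the Kaggle file is framingham.csv with a TenYearCHD target column (adjust names to the actual dataset):

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("framingham.csv").dropna()   # assumed file name from the Kaggle dataset
X = df.drop(columns="TenYearCHD")             # assumed target column name
y = df["TenYearCHD"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)       # sigmoid + log loss under the hood
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))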
