ML-U2-Regression
Regression Algorithms
There are many different types of regression algorithms, but some of the most
common include:
● Linear Regression
○ Linear regression is one of the simplest and most widely used statistical techniques. It assumes a linear relationship between the independent and dependent variables. This means that the change in the dependent variable is proportional to the change in the independent variable.
● Polynomial Regression
○ Polynomial regression fits a polynomial function of the input features, which allows it to model non-linear relationships.
● Support Vector Regression (SVR)
○ SVR is based on the support vector machine (SVM), an algorithm that is used for classification tasks, but it can also be used for regression tasks. SVR works by finding a hyperplane that minimizes the sum of the prediction errors within a tolerance margin.
● Decision Tree Regression
○ Decision tree regression uses a decision tree to predict the target value. A decision tree is a tree-like structure that consists of nodes and branches. Each node represents a decision, and each branch represents the outcome of that decision. The goal of decision tree regression is to build a tree that can accurately predict the target value for new data points.
● Random Forest Regression
○ Random forest regression is an ensemble method that uses multiple decision trees to predict the target value. Ensemble methods are techniques that combine the predictions of several models. Each tree is trained on a different subset of the training data. The final prediction is made by averaging the predictions of the individual trees.
Applications of Regression
Regression is used in many prediction tasks, such as estimating house prices from size, location, and other features, or forecasting sales and stock prices from historical data.
There are many regression techniques, each suited for different types of data and different types of relationships.
1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Support Vector Regression
7. Ridge Regression
8. Lasso Regression
9. ElasticNet Regression
Linear Regression
Linear regression is a linear approach for modeling the relationship between the criterion (or response) and the predictors; it predicts the response given the values of the predictors. For linear regression, there is a linear relationship between the dependent and independent variables. The model learns this linear relationship between the input features (X) and the target values (y).
Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:
y = β0 + β1X
where:
● y is the dependent variable
● X is the independent variable
● β0 is the intercept
● β1 is the slope
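As a sketch of how these coefficients are found, the least-squares closed form (β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², β0 = ȳ − β1·x̄) can be coded directly; the data below is made up for illustration:

```python
# Simple linear regression y = b0 + b1*x via the least-squares closed form.
# Illustrative data: y is exactly 2*x + 1, so the fit should recover b1 = 2, b0 = 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope: covariance of x and y divided by the variance of x.
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / \
     sum((xi - x_mean) ** 2 for xi in x)
# Intercept: the fitted line passes through the point of means.
b0 = y_mean - b1 * x_mean

print(b0, b1)  # b0 ≈ 1.0, b1 ≈ 2.0
```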
Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
● β0 is the intercept
● β1, β2, …, βn are the coefficients of the independent variables X1, X2, …, Xn
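To see the multiple-regression equation in action, the coefficients can be fitted numerically; this is a minimal sketch using batch gradient descent on made-up data generated from y = 1 + 2·x1 + 3·x2 (gradient descent is used here only for illustration — libraries usually solve the least-squares problem in closed form):

```python
# Fit y = b0 + b1*x1 + b2*x2 by batch gradient descent on the MSE loss.
# Toy data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [(0.0, 1.0), (1.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]

b0, b1, b2 = 0.0, 0.0, 0.0
lr = 0.02
n = len(X)
for _ in range(20000):
    # Gradients of the MSE with respect to each coefficient.
    g0 = g1 = g2 = 0.0
    for (x1, x2), yi in zip(X, y):
        err = (b0 + b1 * x1 + b2 * x2) - yi
        g0 += err
        g1 += err * x1
        g2 += err * x2
    b0 -= lr * 2 * g0 / n
    b1 -= lr * 2 * g1 / n
    b2 -= lr * 2 * g2 / n

print(round(b0, 2), round(b1, 2), round(b2, 2))  # ≈ 1.0 2.0 3.0
```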
The goal of the algorithm is to find the best-fit line equation that can predict the values of y from the values of X. In regression, a set of records is present with X and Y values, and these values are used to learn a function that maps X to Y. The quality of the fit is measured by the error between the predicted and actual values for all the data points. The difference is squared to ensure that positive and negative errors do not cancel out.
MSE = (1/n) Σi=1..n (yi − ŷi)²
Here,
● yi is the actual value and ŷi is the predicted value for the i-th data point.
● A lower MSE indicates a better score.
MAE = (1/n) Σi=1..n |Yi − Ŷi|
Here,
● Yi is the actual value and Ŷi is the predicted value for the i-th data point.
The square root of the residuals’ variance is the Root Mean Squared Error (RMSE). It describes how well the observed data points match the values predicted by the model.
RMSE = √(RSS/n) = √( Σi=1..n (yi_actual − yi_predicted)² / n )
Rather than dividing by the entire number of data points in the model, one must divide the sum of the squared residuals by the number of degrees of freedom to obtain an unbiased estimate. This figure is then referred to as the Residual Standard Error (RSE):
RSE = √(RSS/(n−2)) = √( Σi=1..n (yi_actual − yi_predicted)² / (n−2) )
RMSE can fluctuate when the units of the variables vary, since its value depends on the scale of the target variable. R-squared (the coefficient of determination) is a scale-free measure of fit:
R2 = 1 − (RSS / TSS)
Residual Sum of Squares (RSS): the sum of the squared errors between the actual values and the values predicted by the regression line.
● RSS = Σi=1..n (yi − b0 − b1xi)²
Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the mean of the response variable is known as the total sum of squares, or TSS.
● TSS = Σ (yi − ȳ)²
A higher R2 indicates a better fit of the model. Adjusted R2 additionally accounts for the number of predictors in the model:
Adjusted R2 = 1 − ((1 − R2)(n − 1) / (n − k − 1))
Here,
● R2 is the coefficient of determination
● n is the number of observations
● k is the number of independent variables
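To make these definitions concrete, here is a small sketch computing MSE, MAE, RMSE, R2, and adjusted R2 for a toy set of actual and predicted values (all numbers are made up for illustration):

```python
import math

# Toy actual and predicted values (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.9, 5.2, 6.8, 9.1]
n = len(y_true)
k = 1  # number of independent variables, assumed to be 1 for this sketch

errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
mse = sum(e ** 2 for e in errors) / n              # Mean Squared Error
mae = sum(abs(e) for e in errors) / n              # Mean Absolute Error
rmse = math.sqrt(mse)                              # Root Mean Squared Error

y_mean = sum(y_true) / n
rss = sum(e ** 2 for e in errors)                  # Residual Sum of Squares
tss = sum((yt - y_mean) ** 2 for yt in y_true)     # Total Sum of Squares
r2 = 1 - rss / tss                                 # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)      # adjusted R2

print(mse, mae, rmse, r2, adj_r2)
# mse = 0.025, mae = 0.15, r2 = 0.995, adj_r2 = 0.9925
```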
The regression line formula used in statistics is the same used in algebra:
y = mx + b
where:
● y = vertical axis (dependent variable)
● m = slope of the line
● x = horizontal axis (independent variable)
● b = y-intercept
CORRELATION COEFFICIENT
Correlation is a statistical measure that describes the extent to which two
variables are related to each other. It quantifies the direction and strength
of the linear relationship between variables. Generally, a correlation can be one of the following:
● Positive Correlation
● Zero Correlation
● Negative Correlation
The Pearson correlation coefficient is the most popular formula to get the statistical correlation coefficient. It measures the strength (given by the coefficient r, a value between −1 and +1) and the existence (given by a p-value) of a linear relationship between two variables.
When r is close to −1 or +1, the data points tend to follow a straight line when plotted together. It’s important to note that just because two variables are related, it doesn’t mean one causes the other.
Interpreting the correlation coefficient:
1. Strength of Relationship:
● The closer |r| is to 1, the stronger the linear relationship. For example, hours studied and exam scores are typically positively correlated, while outdoor temperature and heating costs are negatively correlated: as the outdoor temperature decreases, heating costs increase.
2. Direction of Relationship:
● A positive r indicates that as one variable increases, the other tends to increase as well, and vice versa; a negative r indicates that as one variable increases, the other tends to decrease.
3. Significance:
● The statistical significance of r can be assessed with a t-test for the correlation coefficient, with the null hypothesis stating that the true correlation is zero.
● If the p-value is less than the chosen significance level (e.g., 0.05), the correlation is considered statistically significant.
4. Scatterplot Examination:
● Visual inspection of a scatterplot can provide additional insights into the form of the relationship, such as non-linearity or the presence of outliers.
5. Caution:
● Correlation does not imply causation. Even if a strong correlation is observed between two variables, it does not necessarily mean that changes in one variable cause changes in the other.
6. Sample Size:
● The reliability of r depends on the sample size; small samples can produce unstable or misleading correlations.
7. Context Dependence:
● The interpretation of r should consider the specific context and
subject matter of the study. What is considered a strong or weak correlation may
vary depending on the field of research and the variables under investigation.
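Before working an example by hand, the Pearson formula r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² · Σ(yi − ȳ)²) can be sketched directly in code (the data below is made up for illustration, not taken from the example table):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - x_mean) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - y_mean) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfectly linear increasing relationship gives r close to +1,
# and a perfectly linear decreasing one gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```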
Example: Calculate the correlation coefficient for the following table with the