
Regression

What is Regression Analysis?

Regression analysis is a set of statistical methods used to estimate the relationships between a
dependent variable and one or more independent variables. It can be used to assess the strength of
the relationship between variables and to model the future relationship between them.

The process of performing regression analysis helps us understand which factors are
important, which factors can be ignored, and how the factors influence each other.

Regression analysis includes several variations, such as linear, multiple linear, and
nonlinear regression. The most common models are simple linear and multiple linear regression.

•Dependent Variable: This is the variable that we are trying to understand or forecast.

•Independent Variables: These are the factors that influence the target variable and provide us with
information about their relationship with it.

Regression is concerned with specifying the relationship between a single numeric dependent variable
(the value to be predicted) and one or more numeric independent variables (the predictors).
Regression analysis is used for prediction and forecasting, which overlaps substantially with the field of
machine learning. This statistical method is used across different industries, such as:

•Financial industry: Understand trends in stock prices, forecast prices, and evaluate risks in the
insurance domain.

•Marketing: Understand the effectiveness of marketing campaigns, and forecast the pricing and sales of a
product.

•Manufacturing: Evaluate the relationships among the variables that determine engine design in order to
achieve better performance.

•Medicine: Forecast how different combinations of medicines will perform in order to prepare generic
medicines for diseases.
Linear Regression and Multiple Regression

The simplest of all regression types is linear regression, which tries to establish a
relationship between the independent and dependent variables. The dependent
variable considered here is always a continuous variable.

If there is only a single independent variable, this is known as simple linear
regression; otherwise it is known as multiple regression. Both of these models
assume that the dependent variable is continuous.
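
As a concrete illustration of this distinction, the following sketch fits both forms. It assumes scikit-learn and NumPy are installed, and the data are invented purely for illustration.

```python
# Illustrative sketch (not from the slides): simple vs. multiple linear regression.
# Assumes scikit-learn and NumPy are available; the numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

rainfall = np.array([[100], [120], [140], [160], [180]])        # one predictor
rain_temp = np.array([[100, 22], [120, 24], [140, 25],
                      [160, 27], [180, 30]])                    # two predictors
crop_yield = np.array([2.1, 2.6, 3.0, 3.4, 3.9])                # continuous dependent variable

simple = LinearRegression().fit(rainfall, crop_yield)           # simple linear regression
multiple = LinearRegression().fit(rain_temp, crop_yield)        # multiple regression

print("simple:   intercept =", simple.intercept_, "slope =", simple.coef_)
print("multiple: intercept =", multiple.intercept_, "slopes =", multiple.coef_)
```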
What is Linear Regression?
Linear regression is a predictive model used for finding the linear relationship between a dependent
variable and one or more independent variables.

Here, ‘Y’ is our dependent variable, which is a continuous numerical variable, and we are trying to
understand how ‘Y’ changes with ‘X’.
Examples of Independent & Dependent Variables:

• x is Rainfall and y is Crop Yield

• x is Advertising Expense and y is Sales

• x is sales of goods and y is GDP

If the relationship involves only a single independent variable, then it is known as
simple linear regression.
Simple Linear Regression: X → Y

Multiple Linear Regression: X1, X2, …, Xn → Y


Simple linear regression
Simple linear regression defines the relationship between a dependent variable and a single
independent predictor variable using a line denoted by an equation in the following form:

y = α + βx

The intercept, α (alpha), describes where the line crosses the y axis, while the slope, β (beta), describes
the change in y for each one-unit increase in x.
A line with a positive slope (β > 0) indicates a positive relationship, while a line with a negative slope
(β < 0) indicates a negative relationship.
Suppose we know that the estimated regression parameters in
the equation for the shuttle launch data are:

• a = 4.30
• b = -0.057

Hence, the full linear equation is y = 4.30 − 0.057x. Ignoring for
a moment how these numbers were obtained, we can plot the
line on the scatterplot:
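
A minimal plotting sketch is shown below; it assumes matplotlib and NumPy are installed. The original scatterplot data are not reproduced here, so only the fitted line is drawn, over an assumed range of x values.

```python
# Sketch: drawing the fitted line y = 4.30 - 0.057x from the shuttle launch example.
# Assumes matplotlib and NumPy are installed; the x range is an assumption, since the
# original data points are not included here.
import numpy as np
import matplotlib.pyplot as plt

a, b = 4.30, -0.057            # estimated intercept and slope from the slide
x = np.linspace(50, 85, 100)   # assumed predictor range
y_hat = a + b * x              # predicted y values from the linear equation

plt.plot(x, y_hat, label="y = 4.30 - 0.057x")
plt.xlabel("x")
plt.ylabel("predicted y")
plt.legend()
plt.show()
```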
Ordinary least squares estimation
In order to determine the optimal estimates of α and β, an estimation method known as ordinary least
squares (OLS) was used. In OLS regression, the slope and intercept are chosen such that they
minimize the sum of the squared errors, that is, the vertical distance between the predicted y value and
the actual y value. These errors are known as residuals.
The quantity minimized can be written as Σe² = Σ(y − ŷ)². In plain language, this equation defines e
(the error) as the difference between the actual y value and the predicted y value. The error values are
squared and summed across all points in the data.

The caret character (^) above the y term is a commonly used feature of statistical notation. It indicates
that the term is an estimate of the true y value, and it is referred to as y-hat.
It can be shown using calculus that the value of b that results in the minimum squared error is

b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = Cov(x, y) / Var(x)

where x̄ and ȳ are the means of x and y. The intercept then follows as a = ȳ − b·x̄.
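
A from-scratch sketch of these OLS estimates is shown below (NumPy assumed; the data are invented for illustration):

```python
# Sketch: ordinary least squares estimates computed directly from the formulas above.
# NumPy assumed; the data are invented purely for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: a = mean(y) - b * mean(x), so the line passes through the point of means
a = y.mean() - b * x.mean()

residuals = y - (a + b * x)          # e = y - y_hat
sse = np.sum(residuals ** 2)         # the sum of squared errors that OLS minimizes
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {sse:.3f}")
```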
Covariance

Covariance is a measure of the relationship between two random variables.

In other words, it is essentially a measure of how two variables vary together.

However, because its magnitude depends on the scale of the variables, covariance alone does not
indicate the strength of the dependency between them.
•Positive covariance: Indicates that two variables tend to move in the same direction.

•Negative covariance: Reveals that two variables tend to move in inverse directions.
The correlation between two variables is a number that indicates how closely their relationship follows
a straight line. Without additional qualification, correlation refers to Pearson's correlation coefficient,
which was developed by the 20th century mathematician Karl Pearson. The correlation ranges between
-1 and +1. The extreme values indicate a perfectly linear relationship, while a correlation close to zero
indicates the absence of a linear relationship.

ρ(X, Y) = Cov(X, Y) / (σX · σY)

Where:
•ρ(X,Y) – the correlation between the variables X and Y
•Cov(X,Y) – the covariance between the variables X and Y
•σX – the standard deviation of the X variable
•σY – the standard deviation of the Y variable
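
The sketch below computes the covariance and the Pearson correlation from these definitions and checks the result against NumPy's built-in corrcoef (NumPy assumed; the data are invented for illustration):

```python
# Sketch: covariance and Pearson correlation computed from the definitions above.
# NumPy assumed; the data are invented for illustration.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.1, 4.2, 6.3, 7.4])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # covariance Cov(X, Y)
rho = cov_xy / (x.std() * y.std())                  # rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)

print("covariance: ", cov_xy)
print("correlation:", rho)
print("NumPy check:", np.corrcoef(x, y)[0, 1])      # should agree with rho
```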
Multiple linear regression

•Most real-world analyses have more than one independent variable

•Multiple linear regression is the form most often used when regression is applied to a numeric
prediction task.

•Example: predicting crop yield from rainfall and temperature.
The strengths and weaknesses of multiple linear regression
Multiple regression extends the simple model to additional independent variables:

y = α + β1x1 + β2x2 + … + βixi + ε

Here, y changes by the amount βi for each unit increase in xi. The intercept, α, is then the expected value
of y when the independent variables are all zero.

Since the intercept is really no different from any other regression parameter, it can also be denoted as
β0 (pronounced beta-naught), giving the equivalent form:

y = β0 + β1x1 + β2x2 + … + βixi + ε
The dependent variable is now a vector, Y, with a row for every example. The independent variables
have been combined into a matrix, X, with a column for each feature plus an additional column of '1'
values for the intercept term. The regression coefficients β and the errors ε are also now vectors, so in
matrix form the model is Y = Xβ + ε.
The goal now is to solve for the vector β that minimizes the sum of the squared errors between the
predicted and actual y values. Finding the optimal solution requires the use of matrix algebra.

The best estimate of the vector β can be computed as:

β̂ = (XᵀX)⁻¹ XᵀY
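
The sketch below applies this formula to a small invented data set (NumPy assumed). It uses np.linalg.solve rather than forming the matrix inverse explicitly, which is numerically preferable but yields the same β̂.

```python
# Sketch: estimating the coefficient vector via the normal equations,
# beta_hat = (X^T X)^(-1) X^T y.  NumPy assumed; the data are invented.
import numpy as np

# Two features plus a leading column of 1s for the intercept term (beta_0)
X = np.array([[1.0, 2.0, 50.0],
              [1.0, 3.0, 60.0],
              [1.0, 4.0, 65.0],
              [1.0, 5.0, 80.0],
              [1.0, 6.0, 90.0]])
y = np.array([10.0, 12.5, 14.0, 17.5, 20.0])

beta = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) beta = X^T y
print("beta (intercept, b1, b2):", beta)
print("fitted values:", X @ beta)
```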
