Least Square Regression Line
Last Updated: 07 Jul, 2021
Given a set of coordinates in the form of (X, Y), the task is to find the least-squares regression line that best fits them.
In statistics, Linear Regression is a linear approach to model the relationship between a scalar response (or dependent variable), say Y, and one or more explanatory variables (or independent variables), say X.
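Regression Line: If our data shows a linear relationship between X and Y, then the straight line which best describes the relationship is the regression line. "Best" here has a precise meaning: it is the straight line that minimizes the sum of the squared vertical distances (residuals) between the observed Y values and the values the line predicts. The line does not have to pass through any of the data points.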
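To make the least-squares criterion concrete, the short Python sketch below compares the sum of squared residuals of the fitted line with a few nudged lines, using the data from the first example. The helper name `sum_squared_residuals` and the particular nudges are illustrative assumptions, not part of the original article.

```python
# Sum of squared vertical residuals for a candidate line y = a + b*x.
def sum_squared_residuals(xs, ys, a, b):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


X = [95, 85, 80, 70, 60]
Y = [90, 80, 70, 65, 60]

# Least-squares coefficients for this data as exact fractions;
# they round to a = 5.685 and b = 0.863.
a_fit, b_fit = 415 / 73, 63 / 73

best = sum_squared_residuals(X, Y, a_fit, b_fit)

# Hypothetical alternative lines: shift the intercept or tilt the slope.
alternatives = [
    (a_fit + 1.0, b_fit),
    (a_fit - 1.0, b_fit),
    (a_fit, b_fit + 0.05),
    (a_fit, b_fit - 0.05),
]

for a, b in alternatives:
    sse = sum_squared_residuals(X, Y, a, b)
    print(f"a = {a:.3f}, b = {b:.3f}: SSE = {sse:.3f} (fitted line: {best:.3f})")
```

Every alternative line gives a larger sum of squared residuals than the fitted line, which is what "least squares" means.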
Examples:
Input: X = [95, 85, 80, 70, 60]
Y = [90, 80, 70, 65, 60]
Output: Y = 5.685 + 0.863*X
Explanation:
For the data
X = [95, 85, 80, 70, 60]
Y = [90, 80, 70, 65, 60]
the regression line obtained is Y = 5.685 + 0.863*X.
When plotted, the points lie close to this line. It is the line for which the sum of squared vertical distances to the points is smallest, not necessarily one that passes through the most points.
Input: X = [100, 95, 85, 80, 70, 60]
Y = [90, 95, 80, 70, 65, 60]
Output: Y = 4.007 + 0.890*X
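The second example is a useful check because the means of X and Y are not whole numbers, so the means must be computed with floating-point division. The sketch below is one way to verify both outputs. It uses the mean-centered form of the slope, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically equivalent to the formula given in the Approach section. The function name `fit_line` is an assumption made for this illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n  # true division, so the means keep their fractions
    mean_y = sum(ys) / n

    # Centered sums: covariance-like and variance-like terms.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a1, b1 = fit_line([95, 85, 80, 70, 60], [90, 80, 70, 65, 60])
print(f"Y = {a1:.3f} + {b1:.3f}*X")  # Y = 5.685 + 0.863*X

a2, b2 = fit_line([100, 95, 85, 80, 70, 60], [90, 95, 80, 70, 65, 60])
print(f"Y = {a2:.3f} + {b2:.3f}*X")  # Y = 4.007 + 0.890*X
```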
Approach:
A regression line is given as Y = a + b*X, where the slope b and the intercept a are given by:
b = (n*\sum(x_i*y_i) - \sum(x_i)*\sum(y_i)) / (n*\sum(x_i^{2}) - (\sum x_i)^{2})
a = \bar{y} - b*\bar{x}
where \bar{x} and \bar{y} are the means of x and y respectively.
- To find the regression line, we need to find a and b.
- Calculate b first, since a depends on it:
b = (n*\sum(x_i*y_i) - \sum(x_i)*\sum(y_i)) / (n*\sum(x_i^{2}) - (\sum x_i)^{2})
- Calculate a, which is given by a = (\sum y_i)/n - b*(\sum x_i)/n
- Put the values of a and b into the equation of the regression line, Y = a + b*X.
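As a worked check of these steps, the numbers below substitute the first example (X = [95, 85, 80, 70, 60], Y = [90, 80, 70, 65, 60]) into the formulas for b and a.

```latex
\begin{aligned}
n &= 5,\quad \sum x_i = 390,\quad \sum y_i = 365,\quad
     \sum x_i y_i = 29100,\quad \sum x_i^{2} = 31150 \\
b &= \frac{5 \cdot 29100 - 390 \cdot 365}{5 \cdot 31150 - 390^{2}}
   = \frac{145500 - 142350}{155750 - 152100}
   = \frac{3150}{3650} \approx 0.863 \\
a &= \frac{365}{5} - b \cdot \frac{390}{5}
   = 73 - 0.863014 \cdot 78 \approx 5.685
\end{aligned}
```

This agrees with the stated output Y = 5.685 + 0.863*X.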
Below is the implementation of the above approach.
C++
// C++ program to find the
// least-squares regression line
#include <cstdio>
#include <numeric>
using namespace std;

// Function to calculate the slope b
double calculateB(int x[], int y[], int n)
{
    // sum of array x
    int sx = accumulate(x, x + n, 0);

    // sum of array y
    int sy = accumulate(y, y + n, 0);

    // sum of the products x[i] * y[i]
    int sxsy = 0;

    // sum of the squares of x
    int sx2 = 0;

    for (int i = 0; i < n; i++) {
        sxsy += x[i] * y[i];
        sx2 += x[i] * x[i];
    }

    double b = (double)(n * sxsy - sx * sy) /
               (n * sx2 - sx * sx);
    return b;
}

// Function to find the
// least-squares regression line
void leastRegLine(int X[], int Y[], int n)
{
    // Finding b
    double b = calculateB(X, Y, n);

    // Means must be computed in floating point;
    // integer division would truncate them
    double meanX = accumulate(X, X + n, 0) / (double)n;
    double meanY = accumulate(Y, Y + n, 0) / (double)n;

    // Calculating a
    double a = meanY - b * meanX;

    // Printing regression line
    printf("Regression line:\n");
    printf("Y = %.3f + %.3f*X\n", a, b);
}

// Driver code
int main()
{
    // Statistical data
    int X[] = { 95, 85, 80, 70, 60 };
    int Y[] = { 90, 80, 70, 65, 60 };
    int n = sizeof(X) / sizeof(X[0]);

    leastRegLine(X, Y, n);
    return 0;
}
// This code is contributed by PrinciRaj1992
Java
// Java program to find the
// least-squares regression line
import java.util.Arrays;

public class GFG {

    // Function to calculate the slope b
    private static double calculateB(int[] x, int[] y)
    {
        int n = x.length;

        // sum of array x
        int sx = Arrays.stream(x).sum();

        // sum of array y
        int sy = Arrays.stream(y).sum();

        // sum of the products x[i] * y[i]
        int sxsy = 0;

        // sum of the squares of x
        int sx2 = 0;

        for (int i = 0; i < n; i++) {
            sxsy += x[i] * y[i];
            sx2 += x[i] * x[i];
        }

        double b = (double)(n * sxsy - sx * sy)
                   / (n * sx2 - sx * sx);
        return b;
    }

    // Function to find the
    // least-squares regression line
    public static void leastRegLine(int[] X, int[] Y)
    {
        // Finding b
        double b = calculateB(X, Y);
        int n = X.length;

        // Means must be computed in floating point;
        // integer division would truncate them
        double meanX = (double)Arrays.stream(X).sum() / n;
        double meanY = (double)Arrays.stream(Y).sum() / n;

        // Calculating a
        double a = meanY - b * meanX;

        // Printing regression line
        System.out.println("Regression line:");
        System.out.printf("Y = %.3f + %.3f*X%n", a, b);
    }

    // Driver code
    public static void main(String[] args)
    {
        // Statistical data
        int[] X = { 95, 85, 80, 70, 60 };
        int[] Y = { 90, 80, 70, 65, 60 };

        leastRegLine(X, Y);
    }
}
Python3
# Python program to find the
# least-squares regression line

# Function to calculate the slope b
def calculateB(x, y, n):

    # sum of array x
    sx = sum(x)

    # sum of array y
    sy = sum(y)

    # sum of the products x[i] * y[i]
    sxsy = 0

    # sum of the squares of x
    sx2 = 0

    for i in range(n):
        sxsy += x[i] * y[i]
        sx2 += x[i] * x[i]

    b = (n * sxsy - sx * sy) / (n * sx2 - sx * sx)
    return b

# Function to find the
# least-squares regression line
def leastRegLine(X, Y, n):

    # Finding b
    b = calculateB(X, Y, n)

    # Keep the means as floats; truncating them
    # with int() would give a wrong intercept
    meanX = sum(X) / n
    meanY = sum(Y) / n

    # Calculating a
    a = meanY - b * meanX

    # Printing regression line
    print("Regression line:")
    print("Y = ", '%.3f' % a, " + ", '%.3f' % b, "*X", sep="")

# Driver code

# Statistical data
X = [95, 85, 80, 70, 60]
Y = [90, 80, 70, 65, 60]
n = len(X)

leastRegLine(X, Y, n)
# This code is contributed by avanitrachhadiya2155
C#
// C# program to find the
// least-squares regression line
using System;
using System.Linq;

class GFG {

    // Function to calculate the slope b
    private static double calculateB(int[] x, int[] y)
    {
        int n = x.Length;

        // Sum of array x
        int sx = x.Sum();

        // Sum of array y
        int sy = y.Sum();

        // Sum of the products x[i] * y[i]
        int sxsy = 0;

        // Sum of the squares of x
        int sx2 = 0;

        for (int i = 0; i < n; i++) {
            sxsy += x[i] * y[i];
            sx2 += x[i] * x[i];
        }

        double b = (double)(n * sxsy - sx * sy) /
                   (n * sx2 - sx * sx);
        return b;
    }

    // Function to find the
    // least-squares regression line
    public static void leastRegLine(int[] X, int[] Y)
    {
        // Finding b
        double b = calculateB(X, Y);
        int n = X.Length;

        // Means must be computed in floating point;
        // integer division would truncate them
        double meanX = (double)X.Sum() / n;
        double meanY = (double)Y.Sum() / n;

        // Calculating a
        double a = meanY - b * meanX;

        // Printing regression line
        Console.WriteLine("Regression line:");
        Console.Write("Y = ");
        Console.Write("{0:F3}", a);
        Console.Write(" + ");
        Console.Write("{0:F3}", b);
        Console.Write("*X");
    }

    // Driver code
    public static void Main(String[] args)
    {
        // Statistical data
        int[] X = { 95, 85, 80, 70, 60 };
        int[] Y = { 90, 80, 70, 65, 60 };

        leastRegLine(X, Y);
    }
}
// This code is contributed by gauravrajput1
JavaScript
<script>
// JavaScript program to find the
// least-squares regression line

// Function to calculate the slope b
function calculateB(x, y)
{
    let n = x.length;

    // sum of array x
    let sx = x.reduce((a, b) => a + b, 0);

    // sum of array y
    let sy = y.reduce((a, b) => a + b, 0);

    // sum of the products x[i] * y[i]
    let sxsy = 0;

    // sum of the squares of x
    let sx2 = 0;

    for (let i = 0; i < n; i++) {
        sxsy += x[i] * y[i];
        sx2 += x[i] * x[i];
    }

    let b = (n * sxsy - sx * sy)
            / (n * sx2 - sx * sx);
    return b;
}

// Function to find the
// least-squares regression line
function leastRegLine(X, Y)
{
    // Finding b
    let b = calculateB(X, Y);
    let n = X.length;

    // Means of X and Y
    let meanX = X.reduce((a, b) => a + b, 0) / n;
    let meanY = Y.reduce((a, b) => a + b, 0) / n;

    // Calculating a
    let a = meanY - b * meanX;

    // Printing regression line
    document.write("Regression line:<br>");
    document.write("Y = ");
    document.write(a.toFixed(3));
    document.write(" + ");
    document.write(b.toFixed(3));
    document.write("*X");
}

// Driver code

// Statistical data
let X = [95, 85, 80, 70, 60];
let Y = [90, 80, 70, 65, 60];

leastRegLine(X, Y);
// This code is contributed by ab2127
</script>
Output: Regression line:
Y = 5.685 + 0.863*X