Linear Regression with a Known Fixed Intercept in R
Linear regression is a fundamental statistical method used to model the relationship between a dependent variable and one or more independent variables. Typically, in linear regression, both the intercept and slope are estimated from the data. However, there are situations where the intercept is known beforehand and should be fixed at a specific value during the regression analysis. This article will guide you through the theory and practical implementation of performing linear regression with a known fixed intercept in R Programming Language.
Fixed Intercept in Linear Regression
There are scenarios where the intercept is predetermined and should not be estimated from the data. For example, in certain scientific or engineering applications, theoretical models might dictate a specific intercept. In such cases, we need to fit the linear regression model while keeping the intercept fixed at the known value.
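Formally, instead of estimating both parameters of the usual model y = β0 + β1x + ε, we treat the intercept β0 = c as known and estimate only the slope β1 in the equivalent model y − c = β1x + ε. Each of the methods below is a different way of fitting this reduced model.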
Implementing Linear Regression with a Fixed Intercept in R
R provides several ways to fit a linear regression model with a fixed intercept. Here, we will explore the most common methods.
1. Implementing Linear Regression with a Fixed Intercept Using lm() with an Offset
One way to fix the intercept is by using the offset() function within the lm() function. The offset() function adds a term to the model formula whose coefficient is fixed at 1, so the term is included in the fit but excluded from estimation.
R
# Sample data
set.seed(123)
x <- rnorm(100)
y <- 3 + 2 * x + rnorm(100, sd = 0.5)
# Known intercept
fixed_intercept <- 3
# Linear regression with fixed intercept using offset
model <- lm(y ~ x + offset(fixed_intercept - x * 0))
# Summary of the model
summary(model)
Output:
Call:
lm(formula = y ~ x + offset(fixed_intercept - x * 0))
Residuals:
Min 1Q Median 3Q Max
-0.95367 -0.34175 -0.04375 0.29032 1.64520
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05140 0.04878 -1.054 0.295
x 1.97376 0.05344 36.935 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4854 on 98 degrees of freedom
Multiple R-squared: 0.933, Adjusted R-squared: 0.9323
F-statistic: 1364 on 1 and 98 DF, p-value: < 2.2e-16
- offset(fixed_intercept - x * 0) adds the known intercept (3) to every fitted value without estimating a coefficient for it; the x * 0 term simply expands the scalar into a vector of the same length as the data, which offset() requires.
- Note that the formula above still contains a free intercept, so lm() estimates a residual intercept on top of the offset (-0.05140 in the output). Its non-significant p-value (0.295) shows the data are consistent with the assumed intercept of 3, but the intercept is not yet truly fixed; the correction follows below.
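To hold the intercept exactly at the known value, also remove the free intercept with - 1 so that the slope is the only estimated coefficient. A minimal sketch reusing the same offset idiom (model_fixed is an illustrative name):
R
# "- 1" removes the free intercept; the offset supplies the fixed value of 3
model_fixed <- lm(y ~ x - 1 + offset(fixed_intercept - x * 0))
# A single estimated coefficient remains: the slope
# (1.96819 here, identical to the fit in Method 2 below)
coef(model_fixed)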
2. Manually Adjusting the Dependent Variable
Another method involves manually adjusting the dependent variable by subtracting the known intercept, and then fitting the model without an intercept.
R
# Adjust the dependent variable by subtracting the known intercept
y_adjusted <- y - fixed_intercept
# Linear regression without intercept
model <- lm(y_adjusted ~ x - 1)
# Summary of the model
summary(model)
Output:
Call:
lm(formula = y_adjusted ~ x - 1)
Residuals:
Min 1Q Median 3Q Max
-1.00049 -0.39505 -0.08999 0.23467 1.58812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 1.96819 0.05321 36.99 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4856 on 99 degrees of freedom
Multiple R-squared: 0.9325, Adjusted R-squared: 0.9319
F-statistic: 1368 on 1 and 99 DF, p-value: < 2.2e-16
- The dependent variable y is adjusted by subtracting the known intercept.
- The model is then fitted without an intercept using - 1 in the formula, ensuring that only the slope is estimated. Because the model is fitted to the adjusted response, predictions must add the intercept back, as in the sketch below.
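A practical caveat with this approach: predict() returns estimates of the adjusted response (y minus the intercept), so the known intercept must be added back to get predictions on the original scale. A minimal sketch, with the new x values chosen purely for illustration:
R
# Predictions from the adjusted model estimate y - fixed_intercept,
# so add the known intercept back to recover predictions of y itself
new_data <- data.frame(x = c(-1, 0, 1))
predict(model, newdata = new_data) + fixed_intercept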
3. Using Matrix Algebra
For those familiar with matrix algebra, you can fit a linear model with a fixed intercept by directly solving the normal equations.
R
# Subtract the known intercept from y, then build the design matrix
# with only the slope column (there is no intercept column to estimate)
X <- matrix(x, ncol = 1)
y_adjusted <- y - fixed_intercept
# Solve the normal equations for the slope
beta <- solve(t(X) %*% X) %*% t(X) %*% y_adjusted
slope <- beta[1]
# Print the slope, rounded to the precision reported by summary()
print(round(slope, 5))
Output:
[1] 1.96819
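With a single predictor, the normal equations reduce to a closed-form expression that can serve as a sanity check on the matrix computation:
R
# Least squares through the origin on the adjusted response:
# slope = sum(x * (y - c)) / sum(x^2), where c is the fixed intercept
slope_direct <- sum(x * (y - fixed_intercept)) / sum(x^2)
print(round(slope_direct, 5))  # matches the matrix result: 1.96819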
This method makes the computation explicit: the intercept is fixed by construction, because it is subtracted from the response before the normal equations are solved. Let's go through a complete practical example where we fit a linear regression model with a known fixed intercept.
Example: Predicting Weight from Height with a Fixed Intercept
Suppose you have a dataset of individuals' heights and weights, and you want to predict weight from height. Based on prior knowledge, you know that the intercept should be fixed at 50.
R
# Sample data
height <- c(150, 160, 170, 180, 190)
weight <- c(55, 60, 65, 70, 75)
# Known intercept
fixed_intercept <- 50
# Adjust the dependent variable by subtracting the known intercept
weight_adjusted <- weight - fixed_intercept
# Fit the model without an intercept
model <- lm(weight_adjusted ~ height - 1)
# Summary of the model
summary(model)
Output:
Call:
lm(formula = weight_adjusted ~ height - 1)
Residuals:
1 2 3 4 5
-8.6598 -4.5704 -0.4811 3.6082 7.6976
Coefficients:
Estimate Std. Error t value Pr(>|t|)
height 0.09107 0.01701 5.354 0.00587 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.488 on 4 degrees of freedom
Multiple R-squared: 0.8775, Adjusted R-squared: 0.8469
F-statistic: 28.66 on 1 and 4 DF, p-value: 0.005871
- weight_adjusted is the weight with the fixed intercept subtracted.
- The linear regression is performed without an intercept, and only the slope is estimated.
- The output of summary(model) provides the slope (0.09107), which describes the relationship between height and weight given the fixed intercept of 50. As shown below, predictions on the original weight scale add the intercept back.
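To predict weight for new heights on the original scale, add the fixed intercept back to the model's output. A minimal sketch, with the new heights chosen purely for illustration:
R
# Predicted weight = fixed intercept + estimated slope * height;
# predict() alone returns the intercept-subtracted values
new_heights <- data.frame(height = c(165, 175))
fixed_intercept + predict(model, newdata = new_heights)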
Conclusion
Performing linear regression with a known fixed intercept in R is straightforward using methods such as the offset() function (combined with - 1 to suppress the free intercept), manual adjustment of the dependent variable, or matrix algebra. Understanding how to implement these techniques is crucial for situations where theoretical or empirical reasons dictate a fixed intercept in your model. By mastering these methods, you can enhance the accuracy and interpretability of your linear regression models in R.