Plot Logistic Regression Line Over Heat Plot in R

Last Updated : 26 Jun, 2024

Plotting a logistic regression line over a heat plot can be a powerful way to visualize the relationship between predictor variables and a binary outcome. This article will guide you through the steps to create such a visualization in R.

What is a Logistic Regression Line?

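In logistic regression, the predicted probability follows an S-shaped curve produced by the logistic function, rather than the straight line of linear regression. With two predictors, the "logistic regression line" drawn in this article is the decision boundary: the set of points where the predicted probability equals a chosen threshold (0.5 here). Because the model is linear in the predictors on the log-odds scale, this boundary is a straight line in the (x1, x2) plane.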
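The difference between the S-shaped probability curve and the straight decision boundary can be checked numerically. The following is a minimal base-R sketch; the coefficients b0, b1 and b2 are made-up values for illustration and are not taken from the model fitted later in this article.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -1
b1 <- 2
b2 <- 1

# The logistic function maps a linear predictor to a probability;
# plogis(eta) is the same as 1 / (1 + exp(-eta))
eta <- seq(-6, 6, by = 0.5)
p <- plogis(eta)
print(round(plogis(c(-4, -2, 0, 2, 4)), 3))

# With two predictors, P(y = 1) = 0.5 exactly where b0 + b1*x1 + b2*x2 = 0,
# which is a straight line in the (x1, x2) plane
x1 <- c(-2, 0, 2)
x2_boundary <- -(b0 + b1 * x1) / b2
boundary_prob <- plogis(b0 + b1 * x1 + b2 * x2_boundary)
print(boundary_prob)
```

Every point on the line x2 = -(b0 + b1 * x1) / b2 gets a probability of 0.5, while the probabilities themselves rise along an S-shaped curve as the linear predictor increases.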

Heat Plot in R

Heat plots, also known as heatmaps, are powerful visualization tools used to represent data matrices. They are particularly useful for displaying the magnitude of data at the intersection of two variables, making patterns and correlations easy to identify.

The following steps show how to plot a logistic regression line over a heat plot in R.

Step 1: Generate Example Data

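First, we create some example data for the logistic regression model. Note that y is drawn independently of x1 and x2 here, so the fitted model will show no real relationship between them; the data only serves to demonstrate the plotting steps.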

R
# Load necessary libraries
library(ggplot2)
library(dplyr)

# Set seed for reproducibility
set.seed(123)

# Generate example data
n <- 1000
data <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y = as.factor(rbinom(n, 1, prob = 0.5))
)

# View the first few rows of the data
head(data)

Output:

           x1          x2 y
1 -0.56047565 -0.99579872 0
2 -0.23017749 -1.03995504 1
3  1.55870831 -0.01798024 1
4  0.07050839 -0.13217513 1
5  0.12928774 -2.54934277 0
6  1.71506499  1.04057346 0
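If you would rather work with data in which the outcome really depends on the predictors, one possible variation is sketched below. The "true" coefficients (0.5, 1.5, -1) and the names `true_prob`, `data_signal` and `signal_model` are assumptions introduced for this example and are not part of the original walkthrough.

```r
set.seed(123)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical "true" coefficients, chosen only for illustration
true_prob <- plogis(0.5 + 1.5 * x1 - 1 * x2)

data_signal <- data.frame(
  x1 = x1,
  x2 = x2,
  y = as.factor(rbinom(n, 1, prob = true_prob))
)

# Class balance, then a quick fit to see whether the signs are recovered
print(table(data_signal$y))
signal_model <- glm(y ~ x1 + x2, data = data_signal, family = binomial)
print(coef(signal_model))
```

With data generated this way, the decision boundary in the later steps should separate regions that differ visibly in their share of each class.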

Step 2: Fit the Logistic Regression Model

Next, we fit a logistic regression model using the example data.

R
# Fit logistic regression model
logistic_model <- glm(y ~ x1 + x2, data = data, family = binomial)

# Summary of the logistic regression model
summary(logistic_model)

Output:

Call:
glm(formula = y ~ x1 + x2, family = binomial, data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.04175    0.06336  -0.659    0.510
x1          -0.06587    0.06418  -1.026    0.305
x2          -0.02970    0.06298  -0.472    0.637

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1385.8  on 999  degrees of freedom
Residual deviance: 1384.4  on 997  degrees of freedom
AIC: 1390.4

Number of Fisher Scoring iterations: 3
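The fitted coefficients determine the 0.5 decision boundary directly: it is the line where intercept + b1 * x1 + b2 * x2 = 0. The sketch below solves that equation for x2 and checks the result with predict. It recreates the data and model from Steps 1 and 2 so that it runs on its own; `boundary_slope`, `boundary_intercept` and `check` are names introduced here for illustration.

```r
# Recreate the data and model from Steps 1 and 2 so this block runs on its own
set.seed(123)
n <- 1000
data <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y = as.factor(rbinom(n, 1, prob = 0.5))
)
logistic_model <- glm(y ~ x1 + x2, data = data, family = binomial)

# Solve b0 + b1*x1 + b2*x2 = 0 for x2
b <- coef(logistic_model)
boundary_slope <- -b[["x1"]] / b[["x2"]]
boundary_intercept <- -b[["(Intercept)"]] / b[["x2"]]

# Points on that line should have a predicted probability of 0.5
check <- data.frame(x1 = c(-1, 0, 1))
check$x2 <- boundary_intercept + boundary_slope * check$x1
check$prob <- unname(predict(logistic_model, newdata = check, type = "response"))
print(check)
```

The same line could be drawn on a ggplot with geom_abline(intercept = boundary_intercept, slope = boundary_slope). Since none of the coefficients here is statistically significant, the position of this boundary is poorly determined for the example data.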

Step 3: Create the Heat Plot

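To create a heat plot, we estimate the two-dimensional density of the data points on a regular grid with kde2d from the MASS package, reshape the result into a data frame, and plot it using ggplot2.

R
# Estimate the 2D kernel density on a 50 x 50 grid
kde <- MASS::kde2d(data$x1, data$x2, n = 50)

# Reshape the density matrix into a long data frame (one row per grid cell);
# expand.grid varies x1 fastest, matching the column-major order of kde$z
density_df <- expand.grid(x1 = kde$x, x2 = kde$y)
density_df$density <- as.vector(kde$z)

# Plot heat map; the aesthetics live in the tile layer so that layers added
# later with their own data do not inherit them
heat_plot <- ggplot() +
  geom_tile(data = density_df, aes(x = x1, y = x2, fill = density)) +
  scale_fill_gradient(low = "white", high = "blue") +
  theme_minimal() +
  labs(title = "Heat Plot with Logistic Regression Line",
       x = "x1",
       y = "x2")

# Display the heat plot
print(heat_plot)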

Output:

[Figure: Heatmap]
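As an alternative to computing the density by hand, ggplot2 can estimate and draw it in a single layer. The sketch below uses stat_density_2d with a raster geom; the data frame `points` and the object name `density_plot` are introduced here for illustration and stand in for the example data.

```r
library(ggplot2)

# Stand-in for the predictors of the example data
set.seed(123)
points <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# Let ggplot2 estimate the 2D density and fill each raster cell with it
density_plot <- ggplot(points, aes(x = x1, y = x2)) +
  stat_density_2d(
    geom = "raster",
    aes(fill = after_stat(density)),
    contour = FALSE
  ) +
  scale_fill_gradient(low = "white", high = "blue") +
  theme_minimal()
```

Calling print(density_plot) renders the figure. This version is shorter, but it gives you less control over the grid than calling kde2d yourself.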

Step 4: Add Logistic Regression Line

To add the logistic regression line, we need to create a grid of values over which to predict the probability of the outcome and plot the contour lines for a specific probability threshold.

R
# Create a grid of values
grid <- expand.grid(
  x1 = seq(min(data$x1), max(data$x1), length.out = 100),
  x2 = seq(min(data$x2), max(data$x2), length.out = 100)
)

# Predict probabilities using the logistic model
grid$prob <- predict(logistic_model, newdata = grid, type = "response")

# Add contour lines to the heat plot; inherit.aes = FALSE makes this layer
# use only the aesthetics given here
final_plot <- heat_plot +
  geom_contour(data = grid, aes(x = x1, y = x2, z = prob, color = after_stat(level)),
               breaks = c(0.5), inherit.aes = FALSE) +
  scale_color_gradient(low = "red", high = "darkred", guide = "none") +
  labs(title = "Heat Plot with Logistic Regression Line")

# Display the final plot
print(final_plot)

Output:

[Figure: Heatmap with logistic regression line]

To summarize the steps:
  • Data Generation: We create a data frame with two predictor variables (x1, x2) and a binary outcome variable (y).
  • Logistic Regression Model: We fit a logistic regression model using the glm function with the binomial family.
  • Heat Plot: We calculate the density of the data points using kde2d from the MASS package and convert it to a data frame. We then plot a heat map using geom_tile in ggplot2.
  • Logistic Regression Line: We create a grid of predictor values, predict the probabilities of the outcome using the logistic regression model, and add contour lines to the heat plot for a specific probability threshold (e.g., 0.5). We use the after_stat(level) notation recommended by ggplot2 in place of the older ..level.. syntax.
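A related variation is to fill the heat plot with the predicted probability itself, so that the 0.5 contour sits where the color changes. The sketch below is one way to do this under the same setup; it recreates the data and model so that it runs on its own, and `prob_grid` and `prob_plot` are names introduced here for illustration.

```r
library(ggplot2)

# Recreate the data and model from Steps 1 and 2 so this block runs on its own
set.seed(123)
n <- 1000
data <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y = as.factor(rbinom(n, 1, prob = 0.5))
)
logistic_model <- glm(y ~ x1 + x2, data = data, family = binomial)

# Predict over a regular grid covering the observed range
prob_grid <- expand.grid(
  x1 = seq(min(data$x1), max(data$x1), length.out = 100),
  x2 = seq(min(data$x2), max(data$x2), length.out = 100)
)
prob_grid$prob <- predict(logistic_model, newdata = prob_grid, type = "response")

# Fill by predicted probability and mark the 0.5 contour
prob_plot <- ggplot(prob_grid, aes(x = x1, y = x2)) +
  geom_raster(aes(fill = prob)) +
  geom_contour(aes(z = prob), breaks = 0.5, color = "black") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0.5) +
  theme_minimal() +
  labs(fill = "P(y = 1)")
```

Calling print(prob_plot) renders the figure. Because the coefficients fitted on this example data are close to zero, the predicted probabilities stay near 0.5 and the colors will be pale; data with a real relationship between predictors and outcome gives a much stronger gradient.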

Conclusion

Plotting a logistic regression line over a heat plot shows where the data points are concentrated and where the model's predicted class changes, which helps in understanding the relationship between predictor variables and a binary outcome. By following the steps outlined in this guide, you can create such plots in R using the ggplot2 and MASS packages. This approach can be particularly useful in fields like epidemiology, social sciences, and marketing, where binary outcomes are common.

