
Comprehensive Guide to Scatter Plot using ggplot2 in R

Last Updated : 03 May, 2025

A scatter plot uses dots to represent the values of two numeric variables and is used to observe the relationship between them. To create a scatter plot we use the geom_point() function from the ggplot2 package, a free and open-source visualization package widely used in R.

The package can be installed with the R function install.packages(), using the command below.

R
install.packages("ggplot2")

For example: We are using the ggplot2 library to create a scatter plot of the Sepal.Length vs. Sepal.Width from the iris dataset.

R
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()

Output:

Basic Scatterplot with ggplot2 in R

Syntax :

geom_point(size, color, fill, shape, stroke)

Parameters :

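  • size: Size of the points
  • color: Color of the points (the border color for shapes 21 to 25)
  • fill: Fill (interior) color of the points; only used by shapes 21 to 25
  • shape: Shape of the points, an integer from 0 to 25
  • stroke: Thickness of the point border
  • Return: A layer that draws the points of the scatter plot.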
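Arguments set outside aes() apply one fixed value to every point. The sketch below uses shape 21, one of the filled shapes (21 to 25), for which color sets the border and fill sets the interior. The particular colors, size and stroke are arbitrary values chosen for illustration.

```r
library(ggplot2)

# All aesthetics are constants (outside aes()), so no legend is created
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(shape = 21,        # filled circle with a separate border
             size = 3,          # point size
             color = "black",   # border color
             fill = "orange",   # interior color (used by shapes 21 to 25)
             stroke = 1)        # border thickness
print(p)
```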

1. Scatter plot with groups

Here we distinguish the values by group (factor-level data). Mapping color inside aes() assigns a different color to each group, so the grouping variable should be a factor.

Syntax: 

aes(color = factor(variable))

We are creating a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset and using the geom_point() function to color the points based on different values of Sepal.Width, treating it as a factor.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(aes(color = factor(Sepal.Width)))

Output:

Basic Scatterplot with ggplot2 in R
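factor() is most useful for a numeric column that takes only a few distinct values. As an illustration, the sketch below uses the built-in mtcars dataset, which is not part of the original example. Its cyl column holds the values 4, 6 and 8, so factor(cyl) produces three discrete colors instead of a continuous gradient.

```r
library(ggplot2)

# cyl is numeric (4, 6 or 8); factor() turns it into three discrete groups
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 2) +
  labs(color = "Cylinders")   # readable legend title instead of "factor(cyl)"
print(p)
```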

2. Changing color in Scatter plot

Here we use the color argument of aes() to color the data points by a variable. We create a scatter plot with the points colored by the Species variable.

R
ggplot(iris) +
    geom_point(aes(x = Sepal.Length, 
                   y = Sepal.Width, 
                   color = Species))

Output:

Basic Scatterplot with ggplot2 in R
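When the variable mapped to color is numeric rather than a factor, ggplot2 uses a continuous color gradient instead of distinct colors. The small sketch below assumes Petal.Length as the coloring variable and sets an arbitrary pair of gradient colors with scale_color_gradient().

```r
library(ggplot2)

# Petal.Length is numeric, so color is mapped to a continuous gradient
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Petal.Length)) +
  scale_color_gradient(low = "lightblue", high = "darkblue")
print(p)
```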

3. Changing Shape of Data points in a Scatter plot

To change the shape of the data points, map the shape argument inside aes(). We create a scatter plot that distinguishes points by both shape and color based on the Species variable.

R
ggplot(iris) +
    geom_point(aes(x = Sepal.Length, y = Sepal.Width, 
                   shape = Species, color = Species))

Output:

Basic Scatterplot with ggplot2 in R
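Shapes 21 to 25 have both a border (color) and an interior (fill). This means a grouping variable can be mapped to fill while the border stays fixed. The minimal sketch below uses an arbitrarily chosen border color and point size.

```r
library(ggplot2)

# Fixed shape 21 (outside aes()), with the interior fill mapped to Species
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(fill = Species), shape = 21, color = "black", size = 3)
print(p)
```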

4. Changing the size aesthetic in Scatter plot

To change the size of the data points, we use the size argument. A constant size is set outside aes(). Placing a value inside aes() maps it as data instead, which rescales the points and adds a legend. Here, we set the size of all points to a constant value of 0.5.

R
ggplot(iris) +
    geom_point(aes(x = Sepal.Length,
                   y = Sepal.Width),
               size = 0.5)  # constant size, set outside aes()

Output:

Basic Scatterplot with ggplot2 in R
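Placing size inside aes() with a variable maps point size to that variable, so larger values get larger points. The sketch below assumes Petal.Length as the size variable. It uses scale_size() with an illustrative range of 1 to 4 to bound the smallest and largest point sizes.

```r
library(ggplot2)

# size inside aes(): point size now varies with Petal.Length
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(size = Petal.Length), alpha = 0.6) +
  scale_size(range = c(1, 4))   # smallest and largest rendered sizes
print(p)
```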

5. Label points in Scatter plot

To add labels to the data points, we map the label aesthetic in geom_text(). In this example, we create a scatter plot and set the point color for each Species with a manual color palette. Labels are added to the points with geom_text(). The plot is further customized with a title, axis labels and a minimal theme, and the legend is positioned on the right.

R
library(ggplot2)
color_palette <- c("blue", "green", "red")

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  geom_text(aes(label = Species), 
            position = position_nudge(x = 0.05, y = 0.05),
            size = 3, 
            show.legend = FALSE) +
  scale_color_manual(values = color_palette) +
  theme_minimal() +
  ggtitle("Sepal Length vs. Sepal Width") +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme(legend.position = "right")

Output:

Basic Scatterplot with ggplot2 in R
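Labelling every point of a dense plot quickly becomes unreadable. One option uses the built-in mtcars dataset, which is not part of the original example. The sketch stores the row names in a hypothetical model column. It then passes check_overlap = TRUE to geom_text(), which skips labels that would overlap ones already drawn.

```r
library(ggplot2)

# mtcars keeps car names as row names; copy them into a column for aes()
cars <- mtcars
cars$model <- rownames(mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text(aes(label = model),
            nudge_y = 0.7,          # lift labels slightly above the points
            size = 3,
            check_overlap = TRUE)   # drop labels that would overlap
print(p)
```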

Regression lines in Scatter plot with ggplot2 in R

Regression models a target value based on one or more independent variables and is mostly used to study relationships between variables and for forecasting. In ggplot2, the stat_smooth() function adds a fitted regression line or smoothed curve to a scatter plot.

Example: We are creating a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset and adding a linear regression line using stat_smooth() with the lm method to show the best-fit line.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() + 
    stat_smooth(method=lm)

Output:

Basic Scatterplot with ggplot2 in R

Syntax:

stat_smooth(method = "method_name", formula = formula_to_be_used, geom = "geom_name")

Parameters:  

  • method: The smoothing method (function) used to fit the line, e.g. "lm" or "loess"
  • formula: The formula used in the smoothing function, written in terms of x and y
  • geom: The geometric object used to display the fitted data (default "smooth")
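The formula argument refers to the aesthetics x and y, not to the original column names. For example, if a quadratic fit is wanted, a second-degree polynomial can be requested with poly(), as in this sketch.

```r
library(ggplot2)

# method = "lm" with a quadratic formula written in x and y (the aesthetics)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2))
print(p)
```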

1. Using stat_smooth with LOESS mode in a Scatter plot

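We are creating a scatter plot and adding a smoothing line using stat_smooth() without specifying a method. With the default method = "auto", ggplot2 uses LOESS for fewer than 1,000 observations (such as the 150 rows of iris) and a generalized additive model otherwise.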

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() + 
    stat_smooth()

Output:

Basic Scatterplot with ggplot2 in R
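The amount of LOESS smoothing is controlled by the span argument of stat_smooth(), which is 0.75 by default. Smaller values make the curve follow the data more closely. The sketch below compares an illustrative span of 0.5 against the default.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  # smaller span: each local fit uses fewer neighbouring points (wigglier)
  stat_smooth(method = "loess", formula = y ~ x, span = 0.5,
              se = FALSE, color = "red") +
  # default span of 0.75 for comparison
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE)
print(p)
```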

Alternative Method:

The geom_smooth() function can also be used to draw a regression line or a smoothed curve.

Syntax:

geom_smooth(method = "method_name", formula = formula_to_be_used)

Parameters:

  • method: It is the smoothing method (function) to use for smoothing the line
  • formula: It is the formula to use in the smoothing function

Example: We are creating a scatter plot and adding a smoothing line using geom_smooth() which automatically selects the smoothing method (default is LOESS) to fit the data.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() + 
    geom_smooth()

Output:

Basic Scatterplot with ggplot2 in R

For this data, geom_smooth() therefore fits a LOESS curve with the formula y ~ x. The same fit can be requested explicitly by passing method = "loess" and formula = y ~ x.
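The sketch below shows that explicit call. Naming both method and formula also stops ggplot2 from printing the message that reports which defaults it chose.

```r
library(ggplot2)

# Explicit LOESS smoother; matches the automatic choice for n < 1000
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)
print(p)
```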

2. Intercept and slope in a Scatter plot

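We are creating a scatter plot and adding a straight line with an intercept of 37 and a slope of -5, drawn in red as a dashed line with a line width of 1.5. A line with a given intercept and slope is drawn with geom_abline(); geom_smooth() does not accept intercept or slope arguments and ignores them.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() + 
    geom_abline(intercept = 37, slope = -5, color = "red", 
                linetype = "dashed", linewidth = 1.5)  # linewidth replaced size for lines in ggplot2 3.4.0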

Output:

Basic Scatterplot with ggplot2 in R
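Instead of typing the intercept and slope by hand, they can be taken from a fitted model. The minimal sketch below fits a simple linear regression of Sepal.Width on Sepal.Length with lm(). The resulting line should coincide with the one drawn by stat_smooth(method = lm).

```r
library(ggplot2)

# Fit Sepal.Width = intercept + slope * Sepal.Length with base R's lm()
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
coefs <- coef(fit)   # named vector: "(Intercept)" and "Sepal.Length"

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_abline(intercept = coefs[["(Intercept)"]],
              slope = coefs[["Sepal.Length"]],
              color = "red", linetype = "dashed")
print(p)
```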

3. Change the point color, shape and size manually

ggplot2 provides the built-in scales scale_color_manual(), scale_fill_manual(), scale_shape_manual(), scale_size_manual() and scale_linetype_manual() for assigning chosen values to the levels of categorical data. Here we use scale_color_manual() to set the point colors and scale_shape_manual() to set the point shapes.

Syntax : 

  • scale_shape_manual(values) for point shapes
  • scale_color_manual(values) for point colors
  • scale_size_manual(values) for point sizes

Parameter :

  • values: A set of aesthetic values mapped to the levels of the data. Here we pass the desired colors and shapes.

Example: We are creating a scatter plot and coloring the points based on the Species variable. A linear regression line is added using geom_smooth() with no confidence interval (se=FALSE) and extended across the full range. Custom shapes and colors are applied to the points and the legend is positioned at the top.

R
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +   # shape must be mapped for scale_shape_manual() to apply
  geom_smooth(method = lm, formula = y ~ x, se = FALSE, fullrange = TRUE) +
  scale_shape_manual(values = c(3, 16, 17)) + 
  scale_color_manual(values = c('pink', 'yellow', 'green')) +
  theme(legend.position = "top")

Output:

Basic Scatterplot with ggplot2 in R
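An unnamed values vector is matched to the factor levels in order. Giving the vector names ties each value to a specific level explicitly, which is safer if the level order changes. The colors in the sketch below are arbitrary.

```r
library(ggplot2)

# Named vector: each color is tied to a Species level by name, not position
species_colors <- c(setosa = "darkorange",
                    versicolor = "purple",
                    virginica = "darkcyan")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_colors)
print(p)
```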

4. Marginal rugs to a Scatter plot with ggplot2 in R

To add marginal rugs to the scatter plot, we use geom_rug(). Here we create a scatter plot and add marginal rugs to show the distribution of values along the x and y axes.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()+
    geom_rug()

Output:

Basic Scatterplot with ggplot2 in R
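By default geom_rug() draws on the bottom and left sides (sides = "bl"). The sides string can combine any of "t", "r", "b" and "l". The sketch below draws a rug along the bottom only and uses alpha to soften overlapping ticks; both values were chosen for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_rug(sides = "b", alpha = 0.4)   # bottom rug only, semi-transparent
print(p)
```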

Scatter plots with the 2-D density estimation

To add a 2-D density estimate to a scatter plot, we use geom_density_2d() (contour lines) or geom_density_2d_filled() (filled contour bands) from ggplot2.

Example: We are creating a scatter plot and adding a 2D density contour plot using geom_density_2d() to visualize the density of data points in the plot.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()+
    geom_density_2d()

Output:

Basic Scatterplot with ggplot2 in R

Syntax:

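ggplot(data, aes(x, y)) + geom_density_2d(color, alpha)

Parameters:

  • color: Color of the contour lines
  • alpha: Transparency of the contour lines
  • fill: Not used by geom_density_2d(); the filled variant geom_density_2d_filled() fills the bands between contours by density level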
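geom_density_2d() also accepts contour settings such as bins, which sets how many contour levels are computed. The sketch below uses an arbitrary 5 bins and a fixed line color.

```r
library(ggplot2)

# bins controls the number of contour levels; color is a fixed line color
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density_2d(bins = 5, color = "steelblue")
print(p)
```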

1. Adding aesthetics to the 2-D density estimations

We are creating a scatter plot and adding a semi-transparent 2D density contour plot using geom_density_2d(alpha = 0.5) and filling the contours with colors using geom_density_2d_filled() to visualize the data density in the plot.

R
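ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_density_2d_filled() +      # filled density bands drawn first (bottom layer)
    geom_density_2d(alpha = 0.5) +  # semi-transparent contour lines on top
    geom_point()                    # points drawn last so they stay visible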

Output:

Basic Scatterplot with ggplot2 in R
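Mapping color to a grouping variable estimates a separate density for each group. This can show whether the groups occupy different regions of the plot. The sketch below assumes Species as the group.

```r
library(ggplot2)

# color = Species: a separate set of contour lines for each species
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(alpha = 0.5) +
  geom_density_2d()
print(p)
```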

2. Scatter plots with ellipses

To draw an ellipse around a cluster of data points, we use the stat_ellipse() function. It computes a confidence ellipse for each group of points, by default a 95% ellipse that assumes a multivariate t-distribution. Mapping a categorical variable such as Species therefore gives one ellipse per group. In this example, we create a scatter plot and add ellipses with stat_ellipse() to show the region covered by each species.

R
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point()+
    stat_ellipse()

Output:

Basic Scatterplot with ggplot2 in R
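stat_ellipse() also takes a type argument and a level argument. type is "t" by default, "norm" for a multivariate normal ellipse, or "euclid". level is 0.95 by default. The sketch below draws 80% normal-theory ellipses per species, with the level chosen for illustration.

```r
library(ggplot2)

# type = "norm": assume a multivariate normal; level = 0.8: 80% ellipses
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.8, linetype = "dashed")
print(p)
```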

In this article, we explored how to create and customize scatter plots using ggplot2 in the R programming language.


