Histogram in R using ggplot2

Last Updated : 02 May, 2025

A histogram is an approximate representation of the distribution of numerical data. In a histogram, each bar groups numbers into ranges. Taller bars show that more data falls in that range. It is used to display the shape and spread of continuous sample data.
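To make the idea of bins concrete, here is a minimal base R sketch that counts how many values fall into each range, without drawing anything. The vector `x` and the bin edges are made up purely for illustration.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2, 3, 5, 7, 8, 8, 9, 12, 13, 18)

# Bin edges defining the ranges (0,5], (5,10], (10,15], (15,20]
breaks <- seq(0, 20, by = 5)

# cut() assigns each value to a range; table() counts the values per range
bin_counts <- table(cut(x, breaks = breaks))
print(bin_counts)
```

Each count corresponds to the height of one bar in a histogram drawn with the same bin edges.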

Plotting Histogram using ggplot2 in R

We can use the ggplot2 package in R to plot a histogram. The geom_histogram() function is provided by ggplot2 and draws the bars from binned counts.

1. Creating Sample Data

We’re setting a seed for reproducibility and creating a data frame with simulated income data for two groups: Average Female income and Average Male income. Each group has 20,000 values generated from a normal distribution.


R
set.seed(123)
df <- data.frame(
  # Group labels: 20,000 rows per group
  gender = factor(rep(c("Average Female income", "Average Male income"),
                      each = 20000)),
  # Simulated incomes, rounded to whole numbers
  Average_income = round(c(rnorm(20000, mean = 15500, sd = 500),
                           rnorm(20000, mean = 17500, sd = 600)))
)
head(df)

Output:

Sample Data
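Before plotting, it can help to confirm that the simulated data looks as intended. The sketch below rebuilds the same data frame so that it runs on its own, then checks the group sizes and per-group summaries with base R. The names `group_mean` and `group_sd` are introduced here for illustration.

```r
set.seed(123)
df <- data.frame(
  gender = factor(rep(c("Average Female income", "Average Male income"),
                      each = 20000)),
  Average_income = round(c(rnorm(20000, mean = 15500, sd = 500),
                           rnorm(20000, mean = 17500, sd = 600)))
)

# Number of rows in each group
print(table(df$gender))

# Mean and standard deviation per group; these should be close to the
# parameters passed to rnorm() above
group_mean <- tapply(df$Average_income, df$gender, mean)
group_sd   <- tapply(df$Average_income, df$gender, sd)
print(round(group_mean))
print(round(group_sd))
```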

2. Plotting a Histogram

We’re loading the ggplot2 package and creating a histogram of the Average_income variable from the data frame using ggplot(). This helps visualize the distribution of income values across both groups.

R
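# Install ggplot2 once if it is not already available:
# install.packages("ggplot2")
library(ggplot2)

# With no bins or binwidth given, geom_histogram() defaults to 30 bins
ggplot(df, aes(x = Average_income)) + geom_histogram()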

Output:

Histogram in R using ggplot2
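When neither bins nor binwidth is supplied, ggplot2 prints a message noting that it falls back to 30 bins. The sketch below sets the number of bins explicitly and uses ggplot_build() to inspect the computed bars. The data frame `income` is a small hypothetical stand-in for the income data, and 50 bins is an arbitrary choice.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for the income data (1,000 simulated values)
income <- data.frame(Average_income = round(rnorm(1000, mean = 15500, sd = 500)))

# Request 50 bins instead of relying on the default
p <- ggplot(income, aes(x = Average_income)) +
  geom_histogram(bins = 50)

# ggplot_build() exposes the computed layer data: one row per bar,
# including its count and its x-range (xmin, xmax)
bin_data <- ggplot_build(p)$data[[1]]
print(head(bin_data[, c("xmin", "xmax", "count")]))

p
```

Every observation lands in exactly one bar, so the counts in `bin_data` add up to the number of rows in `income`.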

Customize the Histogram

A histogram can be customized in several ways to suit the data and the audience.

1. Changing the border color of the Histogram

In this modified code, the color argument of geom_histogram() is set to "black" to draw a black border around each bar, while fill sets the color of the bars themselves.

R
ggplot(df, aes(x = Average_income)) +
  geom_histogram(color = "black", fill = "steelblue") +
  labs(x = "Average Income", y = "Frequency") +
  ggtitle("Histogram of Average Income") +
  theme_minimal()

Output:

Histogram in R using ggplot2

2. Changing the Bin Width of Histogram

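We’re using ggplot() to plot a histogram of Average_income, setting binwidth = 1 so that each bin covers one unit of income. Because the incomes are rounded to whole numbers, this produces one bar per distinct value, which shows fine detail but can look noisy. A larger binwidth gives a smoother picture of the distribution.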

R
ggplot(df, aes(x = Average_income)) +
  geom_histogram(binwidth = 1)

Output:

Histogram in R using ggplot2
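Rather than picking a bin width by hand, one common heuristic is the Freedman-Diaconis rule, which sets the width to 2 * IQR / n^(1/3). The sketch below applies it to a hypothetical stand-in data frame named `income`. The rule is one reasonable starting point and is not something the original example prescribes.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for the income data (1,000 simulated values)
income <- data.frame(Average_income = round(rnorm(1000, mean = 15500, sd = 500)))

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(income$Average_income) / nrow(income)^(1 / 3)
print(fd_width)

ggplot(income, aes(x = Average_income)) +
  geom_histogram(binwidth = fd_width, color = "white")
```

The resulting width adapts to both the spread of the data and the sample size, so it is usually a sensible value to adjust from.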

3. Changing colors of the Histogram

We’re creating a histogram of Average_income with white borders and a red fill using ggplot(). This enhances the visual contrast and makes the distribution easier to interpret.

R
plot <- ggplot(df, aes(x=Average_income)) +   
   geom_histogram(color="white", fill="red")

plot

Output:

Histogram in R using ggplot2

4. Add Descriptive Statistics to Histogram Using geom_vline()

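We create a histogram of Average_income by gender with overlapping bars (position = "identity"), a bin width of 500, and partial transparency. To mark the center of each group, we first compute the mean and median per gender, then draw them with geom_vline(): dashed lines for the means and dotted lines for the medians. Calling mean() or median() inside aes() would instead use all rows at once and place both groups' lines at the same overall value. Colors are set with scale_fill_manual() and scale_color_manual(), the plot uses theme_minimal(), and the title, axis labels, and legend position are adjusted for clarity.

R
# Mean and median income per gender (tapply returns them in factor-level order)
group_stats <- data.frame(
  gender        = levels(df$gender),
  mean_income   = as.numeric(tapply(df$Average_income, df$gender, mean, na.rm = TRUE)),
  median_income = as.numeric(tapply(df$Average_income, df$gender, median, na.rm = TRUE))
)

histogram_plot <- ggplot(df, aes(x = Average_income, fill = gender)) +
  geom_histogram(binwidth = 500, position = "identity", alpha = 0.7) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_vline(data = group_stats, aes(xintercept = mean_income, color = gender),
             linetype = "dashed", linewidth = 1) +
  geom_vline(data = group_stats, aes(xintercept = median_income, color = gender),
             linetype = "dotted", linewidth = 1) +
  scale_fill_manual(values = c("blue", "green")) +
  scale_color_manual(values = c("red", "black")) +
  theme_minimal() +
  ggtitle("Distribution of Average Income by Gender") +
  xlab("Average Income") +
  ylab("Frequency") +
  theme(legend.position = "top")

print(histogram_plot)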

Output:

Basic ggplot2 Histogram in R

5. Plotting Probability Densities of Histogram

We are creating a histogram with a density plot overlay to visualize the distribution of Average_income. We use geom_histogram() to create the bars, with density values on the y-axis, and add a vertical dashed line for the mean using geom_vline(). A density curve is added with geom_density() to highlight the overall distribution shape. We customize the plot with a title, axis labels, and apply a minimal theme.

R
ggplot(df, aes(x = Average_income)) +
  # Put density, not count, on the y-axis
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "lightblue",
                 color = "black", alpha = 0.7) +
  # A constant xintercept draws a single line at the overall mean
  # (linewidth replaces size for lines in ggplot2 3.4.0 and later)
  geom_vline(xintercept = mean(df$Average_income, na.rm = TRUE), color = "red",
             linetype = "dashed", linewidth = 1.5) +
  # geom_density() plots density on the y-axis by default
  geom_density(color = "black", linewidth = 1.5) +
  ggtitle("Distribution of Average Income") +
  xlab("Average Income") +
  ylab("Density") +
  theme_minimal()

Output:

Histogram in R using ggplot2
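On a density scale, the bar heights are chosen so that the total area of the bars equals 1, which is what lets the histogram share an axis with a density curve. The sketch below checks this numerically with ggplot_build(). The data frame `income` is a hypothetical stand-in for the income data.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for the income data (1,000 simulated values)
income <- data.frame(Average_income = round(rnorm(1000, mean = 15500, sd = 500)))

p <- ggplot(income, aes(x = Average_income)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# Computed bars: density is the bar height, xmin/xmax give the bar's range
bars <- ggplot_build(p)$data[[1]]

# Area of each bar = height * width; the areas should sum to 1
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
print(total_area)
```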

6. Plotting Histogram Based on Groups

We are creating a histogram of Sepal.Length from the iris dataset, with the fill color mapped to the Species column. Because geom_histogram() uses position = "stack" by default, the bars of the three species are stacked on top of each other within each bin. The bars are outlined in black and drawn with alpha = 0.7, and scale_fill_manual() sets the color for each species. The plot includes a title, axis labels, and a minimal theme.

R
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, color = "black", alpha = 0.7) +
  
  ggtitle("Distribution of Sepal Length by Species") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  
  scale_fill_manual(values = c("blue", "pink", "red")) +
  theme_minimal()

Output:

Histogram in R using ggplot2
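Stacked bars show the combined count per bin, but they make it harder to read each group's own shape. A possible variation, sketched below, overlays the groups with position = "identity" and a lower alpha so that every species starts from the baseline. The name `p_overlap` and the alpha value are illustrative choices.

```r
library(ggplot2)

# Overlay the species instead of stacking them; each group's bars start at zero
p_overlap <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, position = "identity",
                 color = "black", alpha = 0.5) +
  ggtitle("Distribution of Sepal Length by Species (overlaid)") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  theme_minimal()

p_overlap
```

With overlaid bars the heights are per-species counts, so the y-axis reads differently from the stacked version.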

Alternative: Plotting Histogram of Sepal Length (Faceted by Species)

We are creating a histogram of Sepal.Length from the iris dataset, with the fill color mapped to the Species column. The plot is faceted by Species, so each species gets its own panel. Setting scales = "free" lets every panel choose its own x and y axis ranges. We customize the labels and apply a minimal theme.

R
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, color = "black", alpha = 0.7) +
  
  facet_wrap(~Species, scales = "free") +

  ggtitle("Histogram of Sepal Length by Species") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  theme_minimal()

Output:

Histogram in R using ggplot2
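Free scales make each panel easy to read on its own, but they hide differences in location between species. As a sketch of an alternative, the version below keeps the default fixed scales and stacks the panels in one column, so the histograms line up on a shared x-axis. The name `p_facets` is illustrative, and the legend is dropped because the panel titles already name the species.

```r
library(ggplot2)

# One panel per species, stacked vertically with shared (fixed) axes
p_facets <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 30, color = "black", alpha = 0.7) +
  facet_wrap(~Species, ncol = 1) +
  ggtitle("Histogram of Sepal Length by Species") +
  xlab("Sepal Length") +
  ylab("Frequency") +
  theme_minimal() +
  theme(legend.position = "none")

p_facets
```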

In this article, we explored how to create histograms in R using the ggplot2 package, covering basic plotting, customization, and enhancements to effectively visualize data distributions.


