Ignore Outliers in ggplot2 Boxplot in R
Last Updated :
30 Jun, 2021
In this article, we will see how to ignore or remove outliers in a ggplot2 boxplot in the R programming language.
Removing or ignoring outliers is generally not a good idea, because highlighting outliers is one of the main advantages of box plots. However, extreme outliers can stretch the axis and obscure the other characteristics of a box plot, so in those circumstances it can be better to leave them out. In ggplot2, we can hide the outlier points by setting the outlier.shape argument of geom_boxplot() to NA. This only stops the points from being drawn: the outliers remain in the data, and the y-axis still spans their full range, because ggplot2 does not readjust the axis when the points are hidden. To focus on the box, adjust the axis with the coord_cartesian() function, which zooms in on a chosen range without discarding any data.
To create a boxplot with outliers, we need two functions: ggplot() and geom_boxplot().
Dataset Used: Crop_recommendation
Let us first create a regular boxplot, without removing any outliers so that the difference becomes apparent.
Example:
R
# Load the ggplot2 package
library(ggplot2)
# Read the data set and store it in the ds variable
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# Preview the first rows instead of printing the whole data frame
head(ds)
# Create a boxplot of rainfall with geom_boxplot();
# outliers are drawn by default
box_plot_crop <- ggplot(data = ds, aes(y = rainfall))
box_plot_crop + geom_boxplot()
Output:
Now, to remove the outliers from the plot, set the outlier.shape argument of geom_boxplot() to NA.
Syntax:
geom_boxplot(outlier.shape = NA)
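The following is a minimal sketch of what outlier.shape = NA does and does not do. It uses simulated data as a stand-in (an assumption, since the Crop_recommendation file is not bundled with R), and the names demo, p_with and p_without are made up for illustration. It suggests that hiding the points leaves the range of the y scale unchanged, which is why the box can still look squashed.

```r
library(ggplot2)

# Simulated stand-in data (an assumption; not the Crop_recommendation data set):
# 100 ordinary values plus two extreme values that show up as outliers
set.seed(42)
demo <- data.frame(value = c(rnorm(100, mean = 100, sd = 10), 250, 300))

# Same boxplot, with and without the outlier points drawn
p_with    <- ggplot(demo, aes(y = value)) + geom_boxplot()
p_without <- ggplot(demo, aes(y = value)) + geom_boxplot(outlier.shape = NA)

# layer_scales() returns the trained x and y scales of a plot;
# get_limits() gives the data range the y scale covers
range_with    <- layer_scales(p_with)$y$get_limits()
range_without <- layer_scales(p_without)$y$get_limits()

# Both ranges still reach up to the largest outlier
range_with
range_without
```

Because the scale still covers the hidden points, a second step is needed to zoom the axis in on the box itself.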
Since ggplot2 does not automatically adjust the axes after the outliers are hidden, you can change the axis range directly with the coord_cartesian() function. In coord_cartesian(), you set the limits of the axes with the xlim and ylim arguments.
Syntax:
coord_cartesian(xlim = NULL, ylim = NULL, expand = TRUE, default = FALSE, clip = "on")
Parameters:
- xlim, ylim: the limits of the x- and y-axis. Setting them zooms the plot in or out without changing the underlying data.
- expand: TRUE by default. If TRUE, a small expansion is added to the limits so that data and axes do not overlap. If FALSE, the limits are taken exactly from the data or from xlim/ylim.
- default: whether this is the default coordinate system. If FALSE (the default), replacing this coordinate system with another one produces a message; if TRUE, that message is suppressed.
- clip: whether the drawing should be clipped to the extent of the plot panel. "on" (the default) clips; "off" allows drawing outside the panel.
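To illustrate why coord_cartesian() is used here rather than limits on the scale, the sketch below compares the two on simulated stand-in data (an assumption; the names demo, p_zoom and p_limit are hypothetical). coord_cartesian() only zooms the view, so the boxplot statistics are computed from all values, whereas scale_y_continuous(limits = ...) drops out-of-range values before the statistics are computed.

```r
library(ggplot2)

# Simulated stand-in data (an assumption; not the Crop_recommendation data set)
set.seed(1)
demo <- data.frame(value = c(rnorm(200, mean = 100, sd = 10), 250, 300, 320))

# Zoom only: every value still contributes to the box
p_zoom <- ggplot(demo, aes(y = value)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(60, 140))

# Scale limits: values outside 60-140 are turned into NA and removed
# before the boxplot statistics are computed
p_limit <- ggplot(demo, aes(y = value)) +
  geom_boxplot(outlier.shape = NA) +
  scale_y_continuous(limits = c(60, 140))

# layer_data() returns the computed statistics of a layer;
# the 'middle' column holds the median line of the box
median_zoom  <- layer_data(p_zoom)$middle
# ggplot2 warns about the removed rows here, so the warning is silenced
median_limit <- suppressWarnings(layer_data(p_limit)$middle)

# Medians to compare against: all values vs. only the in-range values
in_range <- demo$value[demo$value >= 60 & demo$value <= 140]
c(zoom = median_zoom, all_data = median(demo$value))
c(limit = median_limit, in_range = median(in_range))
```

With a few extreme values the difference is small, but it grows with the share of data that falls outside the limits, so zooming is the safer default when the box should describe the full data set.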
Example:
R
# Load the ggplot2 package
library(ggplot2)
# Read the data set and store it in the ds variable
ds <- read.csv("c://crop//archive//Crop_recommendation.csv", header = TRUE)
# Preview the first rows instead of printing the whole data frame
head(ds)
# Hide the outlier points and zoom the y-axis to the range 50 to 300
box_plot_crop <- ggplot(data = ds, aes(y = rainfall))
box_plot_crop +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(50, 300))
Output:
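The limits c(50, 300) in the example above are hard-coded for the rainfall column. One possible alternative, sketched here on simulated stand-in data (an assumption; demo, whiskers, pad and zoom are illustrative names), is to derive the limits from the whisker ends reported by base R's boxplot.stats(). Note that boxplot.stats() uses hinges, which can differ slightly from the quartiles ggplot2 computes, so the limits are padded a little.

```r
library(ggplot2)

# Simulated stand-in data (an assumption; not the Crop_recommendation data set)
set.seed(7)
demo <- data.frame(value = c(rnorm(150, mean = 100, sd = 15), 260, 290))

# boxplot.stats()$stats holds five numbers: lower whisker, lower hinge,
# median, upper hinge, upper whisker; keep the two whisker ends
whiskers <- boxplot.stats(demo$value)$stats[c(1, 5)]

# Pad by 5% of the whisker span (an arbitrary choice) so the whiskers
# do not touch the edge of the panel
pad  <- 0.05 * diff(whiskers)
zoom <- c(whiskers[1] - pad, whiskers[2] + pad)

# Hide the outlier points and zoom to the computed range
p <- ggplot(demo, aes(y = value)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = zoom)
p
```

For the article's data, the same idea would apply with ds$rainfall in place of demo$value.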