
Creating Boxplots Without Outliers in Matplotlib

Last Updated : 27 Sep, 2024

Box plots, also known as box-and-whisker plots, are a compact way to visualize the distribution of a dataset. They summarize the data with the first quartile (Q1), median (Q2), and third quartile (Q3), plus whiskers that reach toward the minimum and maximum values. In Matplotlib, the whiskers by default extend only to the most extreme data points within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box. Points beyond the whiskers are drawn individually as outliers, which Matplotlib calls "fliers".
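Matplotlib's default rule treats points lying more than 1.5 times the IQR beyond the quartiles as outliers. The following is a minimal sketch of that rule using NumPy only. The sample data mirrors the examples later in this article, and the variable names are illustrative rather than part of any Matplotlib API.

```python
import numpy as np

np.random.seed(123)
# 100 standard-normal samples plus three deliberately extreme values
data = np.concatenate([np.random.normal(0, 1, 100), [5, 6, 7]])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: anything outside them counts as an outlier under the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"whiskers: [{whisker_low:.2f}, {whisker_high:.2f}]")
print("outliers:", np.sort(outliers))
```

Note that the whiskers stop at actual data points, not at the fences themselves.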

Why Hide Outliers in Box Plots?

Outliers can sometimes skew the scale of a box plot, making it difficult to visualize the main body of the data. Here are a few reasons why you might want to hide outliers:

  • Improved Scale: By hiding outliers, the scale of the plot can be adjusted to better represent the majority of the data points, making it easier to analyze the central tendency and dispersion.
  • Focus on Main Distribution: Outliers can be interesting but sometimes distract from the main distribution of the data. Hiding them helps in focusing on the core characteristics of the dataset.
  • Enhanced Readability: A plot without outliers can be more readable, especially when dealing with large datasets or when the outliers are significantly far from the rest of the data.
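The effect on scale can be checked numerically. The sketch below, which reuses the sample data from the examples in this article, prints the automatically chosen y-axis limits with and without fliers. The `Agg` backend line is only there so the snippet runs without a display and is an assumption about the environment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
data = np.concatenate([np.random.normal(0, 1, 100), [5, 6, 7]])

fig, (ax_with, ax_without) = plt.subplots(1, 2, figsize=(10, 4))
ax_with.boxplot(data, showfliers=True)
ax_without.boxplot(data, showfliers=False)

# Autoscaled y-limits of each panel
ylim_with = ax_with.get_ylim()
ylim_without = ax_without.get_ylim()
print("with outliers:   ", ylim_with)
print("without outliers:", ylim_without)
```

The panel without fliers only needs to cover the whiskers, so the box takes up more of the available height.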

Hiding Outliers in Box Plots

1. Using the showfliers Parameter

To hide outliers in a box plot in Matplotlib, pass showfliers=False to boxplot(). The box, median, and whiskers are computed exactly as before; only the outlier markers are omitted:

Python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
data = np.random.normal(0, 1, 100)

# Introduce outliers
outliers = np.array([5, 6, 7])  # Add some outliers
data_with_outliers = np.concatenate([data, outliers])

# Create subplots
fig, axs = plt.subplots(1, 2, figsize=(10, 4))

# Boxplot with outliers
axs[0].boxplot(data_with_outliers, showfliers=True)
axs[0].set_title('Boxplot with Outliers')
axs[0].set_ylabel('Value')

# Boxplot without outliers
axs[1].boxplot(data_with_outliers, showfliers=False)
axs[1].set_title('Boxplot without Outliers')
axs[1].set_ylabel('Value')

plt.tight_layout()
plt.show()

Output:

Figure: Box plots of the same data with outliers (left) and without outliers (right), using showfliers
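Hiding fliers changes only what is drawn, so it can be useful to report the hidden points separately. One possible approach, sketched below, uses `matplotlib.cbook.boxplot_stats`, which computes the box plot statistics without drawing anything. The variable names are illustrative.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(123)
data = np.concatenate([np.random.normal(0, 1, 100), [5, 6, 7]])

# boxplot_stats returns one dict per dataset; whis defaults to 1.5
stats = cbook.boxplot_stats(data)[0]

# Points beyond the whiskers, i.e. what showfliers=False leaves out of the plot
hidden = stats["fliers"]

print(f"whiskers: [{stats['whislo']:.2f}, {stats['whishi']:.2f}]")
print(f"{len(hidden)} point(s) not drawn:", np.sort(hidden))
```

Stating the number of hidden points in a caption or title keeps the plot honest about what was left out.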

While showfliers=False is the recommended method, there are alternative approaches that can be used, especially in older versions of Matplotlib.

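2. Using an Empty String or Invisible Flier Markers

In older code you may see outliers hidden by passing an empty string as the flier symbol (sym=''), or by making the flier markers invisible through the flierprops parameter. The example below uses flierprops: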

Python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
data = np.random.normal(0, 1, 100)

# Introduce outliers
outliers = np.array([5, 6, 7])  # Add some outliers
data_with_outliers = np.concatenate([data, outliers])

# Create subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 6))

# Boxplot with outliers
axs[0].boxplot(data_with_outliers)
axs[0].set_title('Boxplot with Outliers')
axs[0].set_ylabel('Value')

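# Boxplot without outliers: make the flier markers invisible via flierprops
# (the legacy empty-string form is boxplot(data_with_outliers, sym=''))
invisible = dict(markeredgecolor='none', markerfacecolor='none', markersize=0)
axs[1].boxplot(data_with_outliers, flierprops=invisible)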
axs[1].set_title('Boxplot without Outliers')
axs[1].set_ylabel('Value')

plt.tight_layout()
plt.show()

Output:

Figure: Box plots of the same data with outliers (left) and with the flier markers made invisible through flierprops (right)

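However, this method is less intuitive and less commonly used than showfliers=False. It also only makes the markers invisible: the points are still part of the plot, so the y-axis generally still stretches to include them.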
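The alternatives differ in how they affect the axis range. The sketch below compares the autoscaled y-limits for invisible flier markers and for the legacy `sym=''` form. It assumes a Matplotlib version that still accepts the `sym` parameter, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view the figure
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
data = np.concatenate([np.random.normal(0, 1, 100), [5, 6, 7]])

fig, (ax_invisible, ax_sym) = plt.subplots(1, 2, figsize=(10, 4))

# Fliers are still plotted, just with markers that cannot be seen
invisible = dict(markeredgecolor='none', markerfacecolor='none', markersize=0)
ax_invisible.boxplot(data, flierprops=invisible)
ax_invisible.set_title('flierprops (invisible markers)')

# Legacy form: an empty flier symbol
ax_sym.boxplot(data, sym='')
ax_sym.set_title("sym=''")

ylim_invisible = ax_invisible.get_ylim()
ylim_sym = ax_sym.get_ylim()
print("invisible markers:", ylim_invisible)
print("sym='':           ", ylim_sym)
```

If the goal is a tighter scale rather than just a cleaner look, invisible markers alone are not enough. Use showfliers=False, or set the limits explicitly with set_ylim().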

Conclusion

Box plots are a powerful tool for data visualization, and hiding outliers can sometimes be necessary to better understand the main distribution of the data. Using Matplotlib, you can easily create box plots without outliers by setting the showfliers parameter to False.

