Seaborn is a widely used Python library for creating statistical data visualizations. Built on top of Matplotlib and designed to work with Pandas, it lets us make complex plots with fewer lines of code. It specializes in visualizing data distributions, relationships and categorical data using attractive built-in styles and color palettes. Common visualizations include line plots, violin plots and heatmaps. These graphical representations help us interpret and communicate data insights.
Installing Seaborn for Data Visualization
To install Seaborn, run the following pip command. If pip is not installed on your system, install pip first.
pip install seaborn
Creating Plots with Seaborn
Let’s see various types of plots with simple code to understand how to use it effectively.
1. Line plot
A line plot displays the relationship between two numeric variables, showing how one variable changes over intervals or time. Through semantic groupings such as the hue parameter, it can also show multiple categories so that different groups can be compared in the same plot.
Syntax:
sns.lineplot(x=None, y=None, data=None)
Parameters:
- x, y: Numeric input variables. These can be arrays, lists or column names from a DataFrame.
- data: DataFrame containing the data.
Returns: An Axes object with the line plot.
Example:
Python
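import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# Plot Age against the row position (0 to 3)
sns.lineplot(x=df.index, y='Age', data=df)
plt.show()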
Output:
Line plot
2. Scatter Plot
Scatter plots visualize the relationship between two numerical variables by drawing each observation as a point on a two-dimensional graph. They help identify correlations or patterns.
Syntax:
sns.scatterplot(x=None, y=None, data=None)
Parameters:
- x, y: Input data variables that should be numeric.
- data (optional): Dataset containing the variables.
Returns: An Axes object with the scatter plot.
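Since scatter plots are often used to spot correlations, it can help to compute the correlation coefficient alongside the plot. This sketch uses pandas' Series.corr (Pearson by default) on a small hypothetical dataset; the column names hours and score are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: hours studied vs. a score
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5],
                   "score": [2, 4, 5, 4, 5]})

# Pearson correlation coefficient between the two columns
r = df["hours"].corr(df["score"])
print(f"Pearson r = {r:.3f}")

# Each row becomes one point; the title reports the correlation
ax = sns.scatterplot(x="hours", y="score", data=df)
ax.set_title(f"Hours vs. score (r = {r:.2f})")
plt.show()
```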
Example:
Python
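import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# One point per row: row position on x, Age on y
sns.scatterplot(x=df.index, y='Age', data=df)
plt.show()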
Output:
Scatter Plot
3. Box plot
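A box plot depicts groups of numerical data through their quartiles, optionally split across the levels of a categorical variable.
It summarizes 5 key statistics: Minimum, First Quartile (25%), Median (Second Quartile, 50%), Third Quartile (75%) and Maximum. By default, Seaborn's whiskers extend only to the most extreme points within 1.5 times the interquartile range (IQR) of the box, and points beyond that range are drawn individually as outliers.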
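To see what the box encodes, the statistics for the Age column used in the examples can be computed directly with pandas. This is a sketch of the idea, assuming Seaborn's default whisker rule (whis=1.5); Matplotlib computes the box statistics itself internally, so treat this as an illustration of the rule rather than a reproduction of Seaborn's code.

```python
import pandas as pd

# The Age values used in the examples of this article
ages = pd.Series([21, 23, 20, 24], name="Age")

# Five-number summary; quantile() interpolates linearly between data points by default
summary = {
    "min": ages.min(),
    "q1": ages.quantile(0.25),
    "median": ages.median(),
    "q3": ages.quantile(0.75),
    "max": ages.max(),
}
print(summary)

# Whisker limits under the 1.5 * IQR rule; values outside them would be outliers
iqr = summary["q3"] - summary["q1"]
low, high = summary["q1"] - 1.5 * iqr, summary["q3"] + 1.5 * iqr
outliers = ages[(ages < low) | (ages > high)]
print(low, high, outliers.tolist())
```

For this data the limits are 17.0 and 27.0, so no age is drawn as an outlier.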
Syntax:
sns.boxplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y, hue: Variables for plotting long-form data.
- data: Dataset to plot. If x and y are absent, data is treated as wide-form.
Returns: An Axes object with the box plot.
Example:
Python
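import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# A single box summarizing the whole Age column (no grouping variable)
sns.boxplot(y='Age', data=df)
plt.show()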
Output:
Box plot
4. Violin Plot
A violin plot is similar to a box plot, but it also shows a kernel density estimate of each distribution. It displays quantitative data across one or more categorical variables so that those distributions can be compared.
Syntax:
sns.violinplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y, hue: Inputs for plotting long-form data.
- data: Dataset for plotting.
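The difference from a box plot is easiest to see side by side. This sketch draws both plots for the same hypothetical two-group data (the group and score columns are made up); inner="quartile" asks the violin plot to draw the quartiles as lines inside each violin.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical scores for two groups with differently shaped distributions
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "score": [55, 60, 62, 65, 70, 72, 40, 50, 58, 66, 75, 90],
})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
# Left: the box plot shows only the quartile summary
sns.boxplot(x="group", y="score", data=df, ax=ax_box)
# Right: the violin plot adds the estimated density around the quartile lines
sns.violinplot(x="group", y="score", data=df, inner="quartile", ax=ax_violin)
plt.show()
```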
Example:
Python
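import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# Density of the Age column drawn as a single violin
sns.violinplot(y='Age', data=df)
plt.show()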
Output:
Violin Plot
5. Swarm plot
A swarm plot displays individual data points along a categorical axis, adjusting their positions so they do not overlap, which gives a clear view of the distribution's density.
Syntax:
sns.swarmplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y, hue: Inputs for plotting long-form data.
- data: Dataset for plotting.
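Because a swarm plot shows every observation, it is commonly layered on top of a box plot so the summary and the raw points appear together. This is a sketch with hypothetical group and score columns; passing ax= draws the swarm onto the same Axes as the box plot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical scores for two groups
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "score": [55, 60, 62, 65, 70, 72, 40, 50, 58, 66, 75, 90],
})

# Draw the box summary first, then overlay every observation as a non-overlapping point
ax = sns.boxplot(x="group", y="score", data=df, color="lightgray")
sns.swarmplot(x="group", y="score", data=df, color="black", ax=ax)
plt.show()
```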
Example:
Python
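import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# Each row position is treated as its own category on the x-axis
sns.swarmplot(x=df.index, y='Age', data=df)
plt.show()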
Output:
Swarm plot
6. Bar plot
A bar plot represents an estimate of central tendency for a numeric variable (the mean by default) with the height of each rectangle, and indicates the uncertainty around that estimate with error bars.
Syntax:
sns.barplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y: Variables or column names for long-form data.
- hue: (optional) Column for color encoding.
- data: (optional) Dataset to plot.
Returns: Axes object with the bar plot.
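To confirm what the bar heights represent, this sketch compares them with per-group means computed by pandas. The team and score columns are hypothetical. The estimator parameter can swap in another statistic (for example estimator=np.median), and the error bars default to a bootstrapped 95% confidence interval.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical scores: three observations per team
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "score": [10, 12, 14, 20, 22, 30],
})

# The statistic each bar shows by default: the per-team mean
means = df.groupby("team")["score"].mean()
print(means)

ax = sns.barplot(x="team", y="score", data=df)
# Read the drawn bar heights back from the Axes to compare with the means
bar_heights = sorted(patch.get_height() for patch in ax.patches)
print(bar_heights)
plt.show()
```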
Example:
Python
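import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# One bar per name; with a single value per name, the bar height is that age
sns.barplot(x='Name', y='Age', data=df)
plt.show()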
Output:
Bar plot
7. Point plot
A point plot shows point estimates and confidence intervals using scatter-plot glyphs, where each point represents the central tendency of a numeric variable for one category.
Syntax:
sns.pointplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y: Inputs for plotting long-form data.
- hue: (optional) column name for color encoding.
- data: Dataframe as a Dataset for plotting.
Returns: Axes object with the point plot.
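Point plots are handy for comparing how a mean changes across categories for several groups. This sketch uses hypothetical before/after measurements (the time, group and value columns are made up); hue draws one connected series per group, and dodge=True shifts the series apart so their error bars do not overlap.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical repeated measurements before and after a change, for two groups
df = pd.DataFrame({
    "time": (["before"] * 4 + ["after"] * 4) * 2,
    "group": ["X"] * 8 + ["Y"] * 8,
    "value": [5, 6, 5, 7, 8, 9, 8, 10, 4, 5, 4, 6, 5, 5, 6, 6],
})

# Each point is a group mean; lines connect the points of the same group
ax = sns.pointplot(x="time", y="value", hue="group", data=df, dodge=True)
plt.show()
```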
Example:
Python
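import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# One point per name, joined by a line
sns.pointplot(x='Name', y='Age', data=df)
plt.show()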
Output:
Point plot
8. Count plot
A count plot displays the number of occurrences of each category using bars, visualizing the distribution of a categorical variable.
Syntax:
sns.countplot(x=None, y=None, hue=None, data=None)
Parameters:
- x, y: Inputs for plotting long-form data.
- hue: (optional) column name for color encoding.
- data: Dataframe as a Dataset for plotting.
Returns: Axes object with the count plot.
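A count plot is essentially a bar chart of value counts. This sketch checks that idea on a hypothetical pet column by comparing pandas' value_counts() with the bar heights the count plot draws.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical column
df = pd.DataFrame({"pet": ["cat", "dog", "cat", "bird", "cat", "dog"]})

# The same counts that countplot draws, computed directly with pandas
counts = df["pet"].value_counts()
print(counts)

ax = sns.countplot(x="pet", data=df)
# Sorted bar heights, read back from the drawn rectangles
bar_heights = sorted(patch.get_height() for patch in ax.patches)
plt.show()
```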
Example:
Python
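import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Small sample dataset of names and ages
data = {'Name': ['ANSH', 'SAHIL', 'JAYAN', 'ANURAG'],
        'Age': [21, 23, 20, 24]}
df = pd.DataFrame(data)
# Count how many rows each name appears in (here, once each)
sns.countplot(x='Name', data=df)
plt.show()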
Output:
Count plot
9. KDE Plot
A KDE (Kernel Density Estimate) plot visualizes the probability density of a continuous variable across its range of values. Several samples can also be plotted on a single graph, which makes comparing their distributions easier.
Syntax:
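sns.kdeplot(data=None, *, x=None, y=None, hue=None, fill=None, palette=None, **kwargs)
Parameters:
- data: DataFrame (or array) containing the data.
- x, y: Vectors or data keys. Assigning the variable to y draws the density vertically (the older vertical parameter is deprecated).
- hue: (optional) Variable that produces a separate density curve for each level.
- fill: If True, fill the area under the curve (replaces the deprecated shade parameter).
- palette: Color palette for the hue levels.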
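To show what the curve actually is, here is a hand-rolled sketch of a one-dimensional Gaussian KDE. Seaborn's kdeplot uses a Gaussian kernel with Scott's rule for the bandwidth by default (bw_method="scott"); this NumPy version follows the same idea but is not Seaborn's internal code. The function name gaussian_kde_1d and the ten sample values are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian KDE with a Scott's-rule bandwidth at every point of grid."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: bandwidth = sample std * n ** (-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every sample
    u = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the density integrates to 1
    return kernels.sum(axis=1) / (n * h), h

# Ten hypothetical sepal-length measurements
sepal_length = [6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2]
grid = np.linspace(3.0, 10.0, 2001)
density, bandwidth = gaussian_kde_1d(sepal_length, grid)

# A density should enclose an area of about 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(round(bandwidth, 3), round(area, 3))
```

A wider bandwidth gives a smoother curve; kdeplot exposes this trade-off through its bw_adjust parameter.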
Example:
Python
from sklearn import datasets
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset and wrap the feature matrix in a DataFrame
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=['Sepal_Length',
                                           'Sepal_Width', 'Petal_Length', 'Petal_Width'])
# Map the numeric class codes to readable species names
iris_df['Target'] = pd.Series(iris.target).map({0: 'Iris_Setosa',
                                                1: 'Iris_Versicolor',
                                                2: 'Iris_Virginica'})
# Filled density of sepal length for Iris virginica only
sns.kdeplot(iris_df.loc[(iris_df['Target'] == 'Iris_Virginica'),
            'Sepal_Length'], color='b', fill=True, label='Iris_Virginica')
plt.legend()
plt.show()
Output:
KDE Plot
How to Customize Seaborn Plots with Python?
Customizing Seaborn plots increases their readability and visual appeal, which makes the data insights clearer and more informative. Here are several ways to customize plots in Seaborn:
1. Changing Plot Style and Theme
Seaborn provides several built-in themes that change the overall look of our plots, making them better suited to different contexts. The available styles are darkgrid, whitegrid, dark, white and ticks.
Python
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("whitegrid")
sns.boxplot(x='species', y='petal_length', data=sns.load_dataset('iris'))
plt.title('Petal Length Distribution by Species')
plt.show()
Output:
Changing Plot Style and Theme
2. Customizing Color Palettes
Seaborn lets us choose from predefined color palettes such as deep, muted or bright, or create custom palettes with the sns.color_palette() function. Choosing colors that suit the dataset can make visualizations easier to read.
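A palette returned by sns.color_palette() is a list of RGB tuples, and it can be built either from a named palette or from our own list of colors. In this sketch the three hex codes are arbitrary choices, not a Seaborn preset.

```python
import seaborn as sns

# A named palette: 8 evenly spaced hues from the HUSL color space
husl = sns.color_palette("husl", 8)
print(len(husl), husl[0])

# A custom palette from an arbitrary list of hex colors
custom = sns.color_palette(["#1b9e77", "#d95f02", "#7570b3"])
print(custom.as_hex())

# Make the custom palette the default color cycle for subsequent plots
sns.set_palette(custom)
```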
Python
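import seaborn as sns
import matplotlib.pyplot as plt
# Build an 8-color palette from the HUSL color space and make it the default
custom_palette = sns.color_palette("husl", 8)
sns.set_palette(custom_palette)
sns.violinplot(x='species', y='petal_length', data=sns.load_dataset('iris'))
plt.title('Petal Length Distribution by Species')
plt.show()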
Output:
Customizing Color Palettes
3. Adding Titles and Axis Labels
Descriptive titles and axis labels make our plots more understandable and informative. Use Matplotlib's plt.title(), plt.xlabel() and plt.ylabel() to set them.
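Because Seaborn's axes-level functions return a Matplotlib Axes, the title and labels can also be set on that object directly with Axes.set(), which is useful when a figure has several subplots. The small DataFrame below is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
                   "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0]})

ax = sns.scatterplot(x="sepal_length", y="sepal_width", data=df)
# Axes.set() sets the title and both labels on this specific Axes in one call
ax.set(title="Sepal Length vs Sepal Width",
       xlabel="Sepal Length (cm)", ylabel="Sepal Width (cm)")
plt.show()
```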
Python
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x='sepal_length', y='sepal_width', data=sns.load_dataset('iris'))
plt.title('Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
Output:
Adding Titles and Axis Labels
4. Adjusting Figure Size and Aspect Ratio
We can adjust the figure size using plt.figure(figsize=(width, height)), with both dimensions in inches, to control the plot's dimensions. This makes it easier to fit plots into different presentations or reports.
Python
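import seaborn as sns
import matplotlib.pyplot as plt
# A 10 x 6 inch figure; the line shows yearly totals with a confidence band across months
plt.figure(figsize=(10, 6))
sns.lineplot(x='year', y='passengers', data=sns.load_dataset('flights'))
plt.title('Number of Passengers Over Time')
plt.show()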
Output:
Adjusting Figure Size and Aspect Ratio
5. Adding Markers to Line Plots
Markers can be added to Seaborn line plots with the marker argument to highlight individual data points. For example, sns.lineplot(x='x', y='y', marker='o') adds circular markers to the line.
Python
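import seaborn as sns
import matplotlib.pyplot as plt
# marker='o' draws a circle at each aggregated (year) point
sns.lineplot(x='year', y='passengers', data=sns.load_dataset('flights'), marker='o')
plt.title('Number of Passengers Over Time')
plt.show()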
Output:
Adding Markers to Line Plots
Visualizing Relationships and Patterns with Seaborn
We’ll now look at Seaborn plots for visualizing relationships, distributions and trends across a dataset. These visualizations help reveal hidden patterns and correlations in datasets with multiple variables.
1. Pair Plots
Pair plots are used to explore relationships between several variables by generating scatter plots for every pair of variables in a dataset, along with univariate distributions on the diagonal. This is useful for exploring datasets with multiple variables and spotting potential correlations.
Syntax:
sns.pairplot(data, hue=None)
Parameters:
- data: Dataset to plot.
- hue: (optional) Categorical variable used for color coding data points.
Returns: A PairGrid object containing the grid of scatter plots and the univariate distributions.
Example:
Python
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("iris")
sns.pairplot(data, hue="species")
plt.show()
Output:
Pair Plots
2. Joint Plots
Joint plots combine a scatter plot with the distributions of the individual variables. This allows for a quick visual representation of how the variables are distributed individually and how they relate to one another.
Syntax:
sns.jointplot(data=None, *, x=None, y=None, kind='scatter')
Parameters:
- x, y: Variables to plot.
- data: Dataset to plot.
- kind: Type of plot to display ('scatter', 'kde', 'reg' etc).
Returns: A JointGrid object containing the joint plot (a scatter plot by default) and the distribution plots on the margins.
Example:
Python
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset("tips")
sns.jointplot(x="total_bill", y="tip", data=data, kind="scatter")
plt.show()
Output:
Joint Plots
This creates a scatter plot between total_bill and tip, with histograms of the individual distributions along the margins. The kind parameter can be set to 'kde' for kernel density estimates or 'reg' for regression plots.
3. Grid Plot
Grid plots in Seaborn create multiple subplots in a grid layout. With Seaborn's FacetGrid, we can visualize how variables interact across different categories, which makes it easier to compare groups or conditions within our dataset.
Syntax:
g = sns.FacetGrid(data, col='column_name', row='row_name')
g.map(sns.scatterplot, 'x', 'y')
Parameters:
- data: Dataset to plot.
- col, row: Variables for the columns and rows of the grid (categorical variables).
- sns.scatterplot: The plotting function to apply to each facet.
Returns: A FacetGrid object with the grid of plots.
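Besides rows and columns, FacetGrid also accepts a hue variable, and map_dataframe() passes each facet's subset as data= so column names can be given as keywords. This is a sketch with hypothetical x, y, site and season columns.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements at two sites in two seasons
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 3, 5, 4, 1, 3, 2, 4],
    "site": ["north"] * 4 + ["south"] * 4,
    "season": ["dry", "wet"] * 4,
})

# One column of subplots per site; hue colors the points by season inside each facet
g = sns.FacetGrid(df, col="site", hue="season")
g.map_dataframe(sns.scatterplot, x="x", y="y")
g.add_legend()
plt.show()
```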
Example: To use FacetGrid, we first need to initialize it with a dataset and specify the variables that will form the row, column or hue dimensions of the grid. Here is an example using the tips dataset:
Python
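import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
# One subplot per (sex, time) combination: rows by sex, columns by meal time
g = sns.FacetGrid(tips, col="time", row="sex")
# Draw a scatter plot of total_bill vs tip in every facet
g.map(sns.scatterplot, "total_bill", "tip")
plt.show()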
Output:
Multi-Plot Grids with Seaborn's FacetGrid
Regression Plots: Visualizing Linear Relationships
Seaborn simplifies performing and visualizing regressions, specifically linear regressions, which are important for identifying relationships between variables, detecting trends and making predictions. It provides two primary functions for regression visualization:
- regplot(): This function plots a scatter plot along with a linear regression model fit.
- lmplot(): This function also plots linear models but provides more flexibility in handling multiple facets and datasets.
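The line regplot draws is an ordinary least-squares fit, so it should agree with a straight-line fit from NumPy's polyfit. This sketch checks that on hypothetical x and y values; ci=None skips the bootstrapped confidence band so only the fitted line is drawn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical, roughly linear data
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                   "y": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]})

# Least-squares straight-line fit: polyfit returns [slope, intercept]
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"fit: y = {slope:.3f} * x + {intercept:.3f}")

# regplot draws the scatter points plus its own fitted line
ax = sns.regplot(x="x", y="y", data=df, ci=None)
fit_line = ax.get_lines()[0]
plt.show()
```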
Example: Let’s use the tips dataset to visualize a linear regression between two variables: total_bill (independent variable) and tip (dependent variable).
Python
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
sns.regplot(x='total_bill', y='tip', data=tips, scatter_kws={'s':10}, line_kws={'color':'red'})
plt.show()
Output:
Regression Plots
As we explore Seaborn's functions and techniques, we can create clear, customized and insightful visualizations that help us understand our data better.