Seaborn

Seaborn is a Python data visualization library that simplifies creating statistical graphics and is integrated with Pandas. It offers features like automatic relationship estimation, built-in themes, and various plot types including scatter plots, bar plots, and heatmaps. Users can easily customize plots and visualize complex data relationships, making it a powerful tool for data analysis.

Uploaded by

efavourable
Copyright
© All Rights Reserved

Seaborn: Data Visualization in Python

Introduction to Seaborn

Seaborn is a data visualization library that provides a high-level interface for
creating informative and attractive statistical graphics. It simplifies the process of
creating complex visualizations and is closely integrated with Pandas.

Main Features:

Automatic estimation and plotting of complex relationships (e.g., linear regression).

Built-in themes for better visual aesthetics.

Simplifies common tasks such as visualizing distributions, relationships, and categorical data.

Importing Seaborn

Before we start using Seaborn, we need to import it along with Matplotlib for
plotting.

import seaborn as sns
import matplotlib.pyplot as plt

# Load a sample dataset (downloaded on first use, so this needs an internet connection)
tips = sns.load_dataset('tips')

# Display the first few rows of the dataset
print(tips.head())

Seaborn Themes

Seaborn allows you to customize the appearance of your plots easily with built-in
themes. There are five preset styles: darkgrid, whitegrid, dark, white, and ticks.

# Apply a theme
sns.set_theme(style='darkgrid')

# Example plot using the theme
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.show()
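The preset styles can be compared side by side. The sketch below is an illustration, not part of the original tutorial: it assumes a small synthetic DataFrame named demo (a stand-in for tips, so that nothing has to be downloaded) and uses sns.axes_style as a context manager to apply each style to one subplot only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for two tips columns (assumption: not the real dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"total_bill": rng.uniform(5, 50, 60)})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 1, 60)

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
fig = plt.figure(figsize=(15, 3))
for i, style in enumerate(styles, start=1):
    # The style is picked up by axes that are created inside the with-block
    with sns.axes_style(style):
        ax = fig.add_subplot(1, len(styles), i)
    sns.scatterplot(x="total_bill", y="tip", data=demo, ax=ax)
    ax.set_title(style)
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Unlike sns.set_theme, which changes the defaults for every later plot, the context manager leaves the global settings untouched.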

2. Visualizing Data Distributions

2.1. Histogram (histplot)

The histplot function shows the distribution of a single variable. With kde=True it
overlays a kernel density estimate (KDE) on the histogram. It replaces the older
distplot function, which is deprecated.

# Basic histogram and KDE plot
sns.histplot(tips['total_bill'], kde=True)
plt.show()

You can control the number of bins in the histogram:

# Histogram with specific number of bins
sns.histplot(tips['total_bill'], bins=20, kde=True)
plt.show()

2.2. Kernel Density Plot

You can also just plot the KDE without the histogram to estimate the probability
density function of a continuous variable.
# KDE plot
sns.kdeplot(tips['total_bill'])
plt.show()

3. Visualizing Relationships Between Variables

3.1. Scatter Plot

A scatter plot is used to observe relationships between two variables.

# Basic scatter plot
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.show()

You can use the hue parameter to distinguish data points by a categorical variable.

# Scatter plot with categorical distinction
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips)
plt.show()

3.2. Pairplot

The pairplot function creates pairwise scatter plots of all numerical columns in the
dataset. This is useful for examining relationships across the dataset.
# Pairplot with hue for categorical distinction
sns.pairplot(tips, hue='sex')
plt.show()

4. Categorical Data Plots

4.1. Bar Plot

A bar plot shows the relationship between a categorical variable and a continuous
variable by aggregating the data.
# Bar plot showing the average tip by day
sns.barplot(x='day', y='tip', data=tips)
plt.show()

You can change the estimator from the default (mean) to other statistical functions
such as sum or median. To show the number of observations per category, use
countplot instead.

# Bar plot showing the total bill by day (sum)
sns.barplot(x='day', y='total_bill', estimator=sum, data=tips)
plt.show()
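As a further illustration of the estimator parameter, the sketch below plots the median per category and, next to it, the number of rows per category with countplot. The DataFrame demo is a small hand-made table (an assumption with illustrative values, not the real tips data), chosen so the bar heights are easy to check by hand.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small hand-made table standing in for tips (assumption: illustrative values only)
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 12.0, 30.0, 8.0, 20.0, 15.0, 18.0, 22.0, 40.0],
})
days = ["Thur", "Fri", "Sat"]

fig, (ax_median, ax_count) = plt.subplots(1, 2, figsize=(10, 4))

# Median bill per day: any function that reduces a vector to one number works
sns.barplot(x="day", y="total_bill", estimator=np.median, order=days,
            data=demo, ax=ax_median)

# Number of rows per day: countplot does the counting itself, so no y is given
sns.countplot(x="day", order=days, data=demo, ax=ax_count)
```

Call plt.show() to display both panels. Passing order fixes the left-to-right order of the categories, which otherwise follows the order in which they appear in the data (or the category order for a categorical column).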

4.2. Box Plot

A box plot is a great way to visualize the distribution of data and identify outliers.
# Box plot comparing the distribution of tips by day
sns.boxplot(x='day', y='tip', data=tips)
plt.show()

Box plots can also be split by a second categorical variable using the hue parameter.

# Box plot with hue
sns.boxplot(x='day', y='tip', hue='sex', data=tips)
plt.show()

4.3. Violin Plot

A violin plot combines the features of a box plot and a KDE plot, giving a deeper
understanding of the data distribution.
# Violin plot comparing the distribution of tips by day
sns.violinplot(x='day', y='tip', data=tips)
plt.show()

5. Heatmaps

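A heatmap is useful for visualizing matrix-like data such as correlation matrices.

# Compute the correlation matrix of the numeric columns
# (tips also has non-numeric columns such as 'sex' and 'day', which corr() cannot use)
corr = tips.corr(numeric_only=True)

# Create a heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()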

6. Regression Plots

6.1. Linear Regression with lmplot

The lmplot function allows you to plot linear regressions between variables.
# Linear regression plot between total_bill and tip
sns.lmplot(x='total_bill', y='tip', data=tips)
plt.show()

You can also add a categorical distinction to the regression plot.

# Linear regression plot with hue
sns.lmplot(x='total_bill', y='tip', hue='sex', data=tips)
plt.show()
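To make the "automatic estimation" concrete, the sketch below compares the line Seaborn draws with an ordinary least-squares fit computed by NumPy. It uses regplot, the axes-level function behind lmplot, because it returns a plain Matplotlib Axes that is easy to inspect. The DataFrame demo is synthetic (an assumption standing in for tips), so the numbers say nothing about the real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (assumption): tip is roughly 15% of the bill plus noise
rng = np.random.default_rng(2)
demo = pd.DataFrame({"total_bill": rng.uniform(5, 50, 100)})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 1, 100)

# Ordinary least-squares fit of a straight line, computed directly with NumPy
slope, intercept = np.polyfit(demo["total_bill"], demo["tip"], deg=1)

# regplot draws the scatter points, the fitted line, and a confidence band
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=demo, ax=ax)

# Recover the slope of the line that Seaborn drew
line_x, line_y = ax.lines[0].get_data()
drawn_slope = (line_y[-1] - line_y[0]) / (line_x[-1] - line_x[0])
print(f"NumPy slope: {slope:.4f}, plotted slope: {drawn_slope:.4f}")
```

The two slopes should agree, which shows that the default regression line is a plain least-squares fit. Seaborn does not return the fitted coefficients, so compute them separately (as with np.polyfit here) when you need the numbers.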

7. Customizing Plots

7.1. Titles and Labels

You can add titles, labels, and adjust other elements using Matplotlib functions.
# Add a title and labels
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.title('Scatter Plot of Total Bill vs. Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
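The same labels can also be set on the Axes object that Seaborn returns, which is convenient when a figure contains more than one plot. This is a minimal sketch using a synthetic DataFrame named demo (an assumption standing in for tips).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (assumption)
rng = np.random.default_rng(3)
demo = pd.DataFrame({"total_bill": rng.uniform(5, 50, 50)})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 1, 50)

# Axes-level functions such as scatterplot return the Matplotlib Axes they drew on
ax = sns.scatterplot(x="total_bill", y="tip", data=demo)

# Set several properties in one call on that specific Axes
ax.set(title="Scatter Plot of Total Bill vs. Tip",
       xlabel="Total Bill ($)",
       ylabel="Tip ($)")
```

Call plt.show() to display the plot. The plt.title and plt.xlabel functions act on whichever Axes is current, while ax.set always targets the Axes you name.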

7.2. Changing Plot Size

Axes-level Seaborn plots such as scatterplot can be resized by creating the figure with Matplotlib's plt.figure function before plotting.
# Resize plot
plt.figure(figsize=(10, 6))
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.show()
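Figure-level functions such as lmplot and pairplot create their own figure, so a preceding plt.figure(figsize=...) call does not size them. They take height and aspect parameters instead. The sketch below assumes a synthetic DataFrame named demo (a stand-in for tips) and reads the resulting figure size back from the returned grid.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (assumption)
rng = np.random.default_rng(5)
demo = pd.DataFrame({"total_bill": rng.uniform(5, 50, 50)})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 1, 50)

# height is the height of each facet in inches; aspect * height is its width
g = sns.lmplot(x="total_bill", y="tip", data=demo, height=4, aspect=1.5)

# The grid object exposes the underlying Matplotlib figure
width, height = g.figure.get_size_inches()
print(f"figure size: {width} x {height} inches")
```

Call plt.show() to display the plot. With several facets, the overall figure grows with the number of rows and columns, since height and aspect describe a single facet.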

8. Matrix Plots

Matrix plots are a special type of plot in Seaborn that display data in a grid
format, often used for visualizing relationships between variables in a dataset or
showing correlations between them. Seaborn offers several matrix plot functions,
such as heatmaps and clustermaps, which help with analyzing patterns, trends, and
correlations.

Types of Matrix Plots in Seaborn

1. Heatmap: Displays matrix-like data as a grid of color-coded cells.
2. Clustermap: A heatmap that also clusters data hierarchically to reveal
patterns.
3. Pairplot (though not strictly a matrix plot, it's often used to visualize
pairwise relationships).

1. Heatmap

What is a Heatmap?

A heatmap is a 2D plot where each cell of the matrix contains a color representing
the value of a variable. It's commonly used to visualize correlation matrices, or any
matrix-like data (like a confusion matrix in machine learning).

Basic Syntax

# Calculate the correlation matrix of the numeric columns
corr = tips.corr(numeric_only=True)

# Create a heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()

Explanation:

• tips.corr(numeric_only=True): Calculates the correlation matrix of the numerical columns in the dataset. Without numeric_only=True, recent versions of Pandas raise an error on the non-numeric columns.
• sns.heatmap(): Plots the correlation matrix as a heatmap.
  o annot=True: Displays the correlation values in each cell.
  o cmap='coolwarm': Sets the color map (a gradient from cool blue to warm red).

Heatmap Use Cases:

• Correlation Analysis: Use a heatmap to quickly identify correlations between variables.
• Confusion Matrix Visualization: In machine learning, heatmaps are often used to display confusion matrices.
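The confusion-matrix use case can be sketched as follows. The label lists actual and predicted are made up purely for illustration (an assumption, not output from any real model). pd.crosstab builds the count matrix and sns.heatmap draws it.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up true and predicted labels, purely for illustration (assumption)
actual    = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "bird", "bird", "bird"]
predicted = ["cat", "cat", "dog", "dog", "dog", "dog", "cat", "bird", "bird", "cat"]

# Cross-tabulate: rows are the true class, columns the predicted class
cm = pd.crosstab(pd.Series(actual, name="Actual"),
                 pd.Series(predicted, name="Predicted"))

fig, ax = plt.subplots()
# fmt='d' prints the counts as plain integers
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_title("Confusion matrix (illustrative data)")
```

Call plt.show() to display the heatmap. A sequential color map such as 'Blues' suits counts, whereas a diverging map such as 'coolwarm' suits correlations, which are centered on zero.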

2. Clustermap

What is a Clustermap?

A clustermap is an enhanced heatmap that clusters both rows and columns using
hierarchical clustering algorithms. It helps in identifying patterns in data and
grouping similar rows or columns together.

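Basic Syntax

# Load the dataset
iris = sns.load_dataset('iris')

# Create a clustermap based on the correlation matrix of the numeric columns
g = sns.clustermap(iris.corr(numeric_only=True), annot=True, cmap='coolwarm')

# clustermap builds its own figure with several axes, so set the title on the figure
g.figure.suptitle("Clustermap of Iris Dataset")
plt.show()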

Explanation:

• sns.clustermap(): Automatically clusters rows and columns in the dataset based on their similarity.
  o annot=True: Displays the correlation values in each cell.
  o cmap='coolwarm': Color map for the clustermap.

Clustermap Use Cases:

• Pattern Discovery: Helps in discovering groups of similar features or observations in the dataset.
• Hierarchical Clustering Visualization: Shows how variables are grouped based on their correlation or distance.

3. Pairplot

What is a Pairplot?

Though not strictly a matrix plot, the pairplot is a very common visualization used
to examine pairwise relationships between variables in a dataset. It plots all
possible scatter plots between numerical columns, along with histograms or KDE
plots for the diagonal.

Basic Syntax

# Load the iris dataset
iris = sns.load_dataset('iris')

# Create a pairplot
sns.pairplot(iris, hue='species')
plt.show()

Explanation:

• sns.pairplot(): Creates pairwise scatter plots between all numerical columns.
  o hue='species': Color-codes the points based on the categorical column (in this case, species).
  o The diagonal shows the distribution of each variable.

Pairplot Use Cases:

• Exploratory Data Analysis (EDA): Quickly examine pairwise relationships between numerical variables.
• Visualizing Clusters: Use the hue parameter to see how different categories are
