Bivariate visualizations
INTRODUCTION TO DATA VISUALIZATION WITH PLOTLY IN PYTHON
Alex Scriven
Data Scientist
What are bivariate visualizations?
Bivariate plots are those which display (and can therefore compare) two variables.
Common bivariate plots include:
Scatterplots
Correlation plots
Line charts
Scatterplots
A scatterplot is a plot consisting of:
A y-axis representing one variable
An x-axis representing a different variable
Each point is a dot on the graph at its (x, y) coordinates, e.g., (68, 472)
Scatterplots with plotly.express
Visualizing flipper length and body mass with plotly.express:
import plotly.express as px
# penguins: a pandas DataFrame assumed to be loaded already
fig = px.scatter(
    data_frame=penguins,
    x="Body Mass (g)",
    y="Flipper Length (mm)")
fig.show()
More plotly.express arguments
Useful plotly.express scatterplot arguments:
trendline: add different types of trend lines
symbol: set different marker symbols for different categories
Check the documentation for more!
Line charts in plotly.express
A line chart plots a variable (y-axis) over time (x-axis).
Let's visualize Microsoft's stock price. Here is our simple line chart:
fig = px.line(
data_frame=msft_stock,
x='Date',
y='Open',
title='MSFT Stock Price (5Y)')
fig.show()
Scatterplots and line plots with graph_objects
For more customization, graph_objects uses go.Scatter() for both scatter and line plots:
import plotly.graph_objects as go
Here is the code for our penguins scatterplot using graph_objects:
fig = go.Figure(go.Scatter(
    x=penguins['Body Mass (g)'],
    y=penguins['Flipper Length (mm)'],
    mode='markers'))
Here is the code for our line chart with graph_objects:
fig = go.Figure(go.Scatter(
    x=msft_stock['Date'],
    y=msft_stock['Opening Stock Price'],
    mode='lines'))
Remember to:
Set 'mode'
Pass DataFrame subsets (columns of data) to x and y, not column names
graph_objects vs. plotly.express?
When should we use plotly.express or graph_objects? Largely, it is about customization: graph_objects has many more options!
Correlation plot
A correlation plot is a way to visualize correlations between variables.
The Pearson correlation coefficient summarizes the strength and direction of a linear relationship
It takes values from -1 to 1
1 is perfectly positively correlated
As x increases, y increases
0 is not at all (linearly) correlated
No linear relationship between x and y
-1 is perfectly negatively correlated
As x increases, y decreases
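A small sketch computing Pearson's r by hand on toy data: the sum of products of deviations from the mean, divided by the square root of the product of the sums of squared deviations. The U-shaped series shows why 0 means no *linear* relationship: y clearly depends on x there, yet r comes out at (almost exactly) 0.

```python
import numpy as np
import pandas as pd

def pearson_r(a: pd.Series, b: pd.Series) -> float:
    """Sum of co-deviations over sqrt(product of sums of squared deviations)."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

x = pd.Series([1, 2, 3, 4, 5], dtype=float)
y_up = 2 * x + 1                                      # rises in step with x
y_down = -3 * x + 10                                  # falls in step with x
y_u_shape = pd.Series([2, 1, 0, 1, 2], dtype=float)   # related to x, but not linearly

print(pearson_r(x, y_up))       # 1.0
print(pearson_r(x, y_down))     # -1.0
print(pearson_r(x, y_u_shape))  # close to 0
# pandas should agree, up to floating-point rounding
print(x.corr(y_up, method="pearson"))
```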
Correlation plot setup
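df contains data on bike-sharing rental numbers in Korea, along with various weather variables.
pandas provides a method to create the data needed:
# numeric_only=True skips non-numeric columns (pandas >= 2.0 raises an error on them otherwise)
cr = df.corr(method='pearson', numeric_only=True)
print(cr)
This prints our Pearson correlation table.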
Correlation plot with Plotly
Let's build a correlation plot:
import plotly.graph_objects as go
fig = go.Figure(go.Heatmap(
x=cr.columns,
y=cr.columns,
z=cr.values.tolist(),
colorscale='rdylgn', zmin=-1, zmax=1))
fig.show()
Our correlation plot
Voila!
Let's practice!
Customizing hover information and legends
Alex Scriven
Data Scientist
What do we mean by hover?
Hover information: The text and data that appears when your mouse hovers over a data point
in a Plotly visualization.
By default, you get some hover information already:
Other default hover information
The relevant layout argument is hovermode, which can be set to different values:
x or y: adds a highlight on the x- or y-axis
x unified / y unified: a dotted line appears on the relevant axis (x here), and a single hover box lists the values of every trace at that position
Hover information using plotly.express
Customizing hover data in plotly.express:
hover_name: a specified column whose value appears in bold at the top of the hover box
hover_data: a list of columns to include, or a dictionary to include/exclude columns
{column_name: False} (this will exclude column_name)
No extensive formatting options
Variables in hover information
Hover columns don't need to be in the plot!
For example, plot revenue vs. company size, with each company's age displayed on hover:
fig = px.scatter(revenues,
    x="Revenue",
    y="employees",
    hover_data=['age'])
fig.show()
We can see age in the hover!
Styling hover information
There are two main ways to style hover information:
1. Using the hoverlabel element (in the layout, or per trace)
A dictionary of stylistic properties (background color, border color, font and font size, etc.)
2. Using the hovertemplate trace attribute
An HTML-like string to style the text (beyond this course)
What is a legend?
A legend is an information box that provides a key to the elements inside the plot, particularly
the color or style.
Legends often appear automatically with plotly, for example when we add colors to our bar chart.
Creating and styling the legend
You can turn on and style the legend using update_layout():
showlegend = True shows the default legend
legend = a dictionary specifying the style and position of the legend
x, y: the position as a fraction (0-1) of the plotting area's width or height
Other stylistic elements such as bgcolor (background color), borderwidth, title, and font
As always, check the Plotly documentation for more!
A styled legend
We can create a styled legend and position it:
fig.update_layout({
    'showlegend': True,
    'legend': {
        'title': {'text': 'All Companies'},
        'x': 0.5, 'y': 0.8,  # position as fractions of the plotting area
        'bgcolor': 'rgb(246,228,129)'}
})
Let's practice!
Adding annotations
Alex Scriven
Data Scientist
What are annotations?
Annotations are extra boxes of text and data added to a plot.
Unlike hover information, annotations are always present.
They serve two primary purposes:
1. Data-linked annotations: draw attention to, or add notes on, a particular data point
2. Extra notes added to the plot as a whole
Much like adding a text box in Microsoft Word
Creating annotations
In Plotly you can add annotations in several ways:
1. Using add_annotation()
Adds a single annotation
2. Using update_layout() and the annotations argument
A list of annotation objects
Useful if adding many annotations
For consistency, we'll stick with update_layout()
Important annotation arguments
There are several key elements of an annotation (dictionary) worth highlighting:
showarrow = True / False
Determines whether an arrow will be drawn from the box to the given x / y coordinates
You can style the arrow as well!
text = The actual text to be displayed
You can insert variables into this text too
x and y : coordinates at which to place the annotation
Be careful hard-coding coordinates: if your data changes, annotations may end up overlapping it!
Positioning annotations
By default, the x and y arguments are in the data units of the plot's axes, linking the annotation to a data point.
However, you can position absolutely by:
Setting the arguments xref and yref to 'paper'
Now x and y are fractions (0-1) of the plotting area
A position of ( x=0.5 , y=0.5 ) would be right in the middle of the plot
Data-linked annotations
Let's annotate our company (we know its revenue and employee count) on our previous scatterplot:
my_annotation = {
    'x': 215111, 'y': 449000,
    'showarrow': True, 'arrowhead': 3,
    'text': "Our company is doing well",
    'font': {'size': 10, 'color': 'black'}}
fig.update_layout({'annotations': [my_annotation]})
fig.show()
Nice! We can see our company clearly.
Floating annotation
We can also have a floating annotation, positioned absolutely:
float_annotation = {
    'xref': 'paper', 'yref': 'paper',
    'x': 0.5, 'y': 0.8,
    'showarrow': False,
    'text': "You should <b>BUY</b>",
    'font': {'size': 15, 'color': 'black'},
    'bgcolor': 'rgb(255,0,0)'}
fig.update_layout({'annotations': [float_annotation]})
fig.show()
We get a strong message!
Let's practice!
Editing plot axes
Alex Scriven
Data Scientist
Our dataset
Using the penguins dataset, let's aggregate flipper size by species:
spec       av_flip_length
Adelie     189.953642
Chinstrap  195.823529
Gentoo     217.186992
Those columns aren't labeled well for presentation!
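An aggregation like the table above can be produced with a pandas groupby. This is a sketch only: the raw DataFrame and its column names ("Species", "Flipper Length (mm)") are assumptions standing in for the real penguins data, so the means differ from the table.

```python
import pandas as pd

# Hypothetical stand-in for the raw penguins data (illustrative values only)
penguins = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
    "Flipper Length (mm)": [181, 190, 196, 215, 221],
})

# Mean flipper length per species, renamed to the short column names above
penguin_flippers = (
    penguins.groupby("Species", as_index=False)["Flipper Length (mm)"].mean()
    .rename(columns={"Species": "spec", "Flipper Length (mm)": "av_flip_length"})
)
print(penguin_flippers)
```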
The default axis titles
Let's create a simple bar chart:
fig = px.bar(penguin_flippers,
x='spec',
y='av_flip_length')
fig.show()
This works, but those axis titles aren't great.
Editing axis titles
plotly often has 'shortcut' functions:
fig.update_xaxes(title_text='Species')
fig.update_yaxes(title_text='Average Flipper Length')
Or with the more general update_layout(), which takes a dictionary:
fig.update_layout({'xaxis': {'title': {'text': 'Species'}},
                   'yaxis': {'title': {'text': 'Average Flipper Length'}}})
We will stick with update_layout() for consistency.
Cleaning up our plot
Both methods will produce a more presentation-worthy chart.
Which method to use?
The shortcut method is helpful for quickly changing a single attribute.
To further style axes, the update_layout() method allows you to edit:
Font family, font size
Text angle
Text color
Much more!
See more on the Plotly documentation
Editing axes ranges
Plotly automatically calculates axis ranges from your data, which may not be what you want!
Let's set the y-axis to start at 150 and extend a small buffer (30) past the maximum flipper length:
fig.update_layout({'yaxis':
{'range' : [150,
penguin_flippers['av_flip_length'].max() + 30]}
})
Our new axes ranges
We get our specified axis range:
Data scale issues
What happens when some data points are much larger than others?
Top 10 countries by number of billionaires
Our scale problem
Let's plot without any adjustment:
fig = px.bar(billionaire_data,
x='Country',
y='Number Billionaires')
fig.show()
The log scale
A common scale used to plot data with large differences in value.
It looks like this:
Ticks on our y-axis aren't evenly spaced in value (10, 20, 30, etc.)
Instead, each tick is an order of magnitude bigger than the last (10, 100, 1000, etc.)
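A quick arithmetic check of the idea above in plain Python: a base-10 log axis places each value at its logarithm, so values that differ by a factor of 10 end up equally far apart.

```python
import math

ticks = [10, 100, 1_000, 10_000]

# A base-10 log axis places each value at log10(value)
positions = [math.log10(t) for t in ticks]

# Each tick is 10x the previous one, so the gaps between positions are equal
gaps = [b - a for a, b in zip(positions, positions[1:])]
print(positions)
print(gaps)
```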
Using log with our data
plotly.express functions such as px.bar() have log_x and log_y arguments
fig = px.bar(billionaire_data,
x='Country',
y='Number Billionaires',
log_y=True)
fig.show()
That's better!
Log scale: a word of warning
When visualizing data, you are telling a story.
If your audience doesn't know what a log scale is, there may be miscommunication.
So remember to keep your audience in mind!
Let's practice!