Bivariate Visualizations: Alex Scriven

This document discusses bivariate visualizations and provides examples of scatterplots, line charts, and correlation plots created with Plotly in Python. It explains that bivariate plots display two variables and common types include scatterplots, line charts, and correlation plots. Examples are given of creating scatterplots and line charts with Plotly Express and graph objects, and building a correlation plot from Pandas correlation data. The document also covers customizing hover information and legends.

Uploaded by

Nikol Cotrina

Bivariate visualizations
INTRODUCTION TO DATA VISUALIZATION WITH PLOTLY IN PYTHON

Alex Scriven
Data Scientist
What are bivariate visualizations?

Bivariate plots are those which display (and can therefore compare) two variables.

Common bivariate plots include:

Scatterplots

Correlation plots

Line charts

Scatterplot
A scatterplot is a plot consisting of:

A y-axis representing one variable

An x-axis representing a different variable

Each point is a dot on the graph, e.g., (68, 472)

Scatterplot with plotly.express

Visualizing Flipper Length and Body Mass with plotly.express:

import plotly.express as px
fig = px.scatter(
    data_frame=penguins,
    x="Body Mass (g)",
    y="Flipper Length (mm)")
fig.show()

More plotly.express arguments

Useful plotly.express scatterplot arguments:

trendline : Add different types of trend lines

symbol : Set different symbols for different categories

Check the documentation for more!

Line charts in plotly.express
A line chart is used to plot some variable (y-axis) over time (x-axis).

Let's visualize Microsoft's stock price.

fig = px.line(
    data_frame=msft_stock,
    x='Date',
    y='Open',
    title='MSFT Stock Price (5Y)')
fig.show()

[Figure: our simple line chart]

Scatterplots and line plots with graph_objects
For more customization, graph_objects uses go.Scatter() for both scatter and line plots.

import plotly.graph_objects as go

Here is the code for our penguins scatterplot using graph_objects:

fig = go.Figure(go.Scatter(
    x=penguins['Body Mass (g)'],
    y=penguins['Flipper Length (mm)'],
    mode='markers'))

Here is the code for our line chart with graph_objects:

fig = go.Figure(go.Scatter(
    x=msft_stock['Date'],
    y=msft_stock['Opening Stock Price'],
    mode='lines'))

Remember to set 'mode'

And use DataFrame subsets, not column names

graph_objects vs. plotly.express?

When should we use plotly.express or graph_objects? Largely, it is about customization: graph_objects has many more options!

[Figure: graph_objects vs. express]

Correlation plot

A correlation plot is a way to visualize correlations between variables.

The Pearson correlation coefficient summarizes this relationship:

Has a value from -1 to 1

1 is totally positively correlated: as x increases, y increases

0 is not at all correlated: no linear relationship between x and y

-1 is totally negatively correlated: as x increases, y decreases
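
The three cases can be checked numerically. The following is a small sketch that computes the Pearson coefficient by hand on made-up lists; the pearson helper is written for this example and is not part of pandas or Plotly.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_up = pearson(x, [2, 4, 6, 8, 10])    # y rises with x: close to 1
r_down = pearson(x, [10, 8, 6, 4, 2])  # y falls as x rises: close to -1
r_none = pearson(x, [2, 1, 3, 1, 2])   # no linear trend: close to 0
```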

Correlation plot setup

df contains data on bike sharing rental numbers in Korea with various weather variables

pandas provides a method to create the data needed:

cr = df.corr(method='pearson')
print(cr)

[Table: our Pearson correlation table]

Correlation plot with Plotly
Let's build a correlation plot:

import plotly.graph_objects as go
fig = go.Figure(go.Heatmap(
    x=cr.columns,
    y=cr.columns,
    z=cr.values.tolist(),
    colorscale='rdylgn', zmin=-1, zmax=1))
fig.show()

Our correlation plot
Voila!

Let's practice!
Customizing hover information and legends

What do we mean by hover?
Hover information: The text and data that appear when your mouse hovers over a data point in a Plotly visualization.

By default, you get some hover information already:

Other default hover information

The relevant layout argument is hovermode, which can be set to different values:

x or y : adds a highlight on the x or y axis

x unified / y unified : A dotted line appears on the relevant axis (x here) and the hover-box is formatted

Hover information using plotly.express

Customizing hover data in plotly.express :

hover_name = A specified column that will appear in bold at the top of the hover box

hover_data = A list of columns to include or a dictionary to include/exclude columns

{column_name: False} (this will exclude column_name)

No extensive formatting options

Variables in hover information
Hover columns don't need to be in the plot!

E.g.: Revenue vs. company size with age of company displayed on hover

fig = px.scatter(revenues,
    x="Revenue",
    y="employees",
    hover_data=['age'])
fig.show()

We can see age in the hover!

Styling hover information

There are two main ways to style hover information:

1. Using the hoverlabel layout element

A dictionary of stylistic properties (background colors, borders, font, sizings, etc.)

2. Using the hovertemplate attribute, which is set on the trace rather than the layout

An HTML-like string to style the text (beyond this course)

What is a legend?
A legend is an information box that provides a key to the elements inside the plot, particularly
the color or style.

Legends often automatically appear with plotly.

For example, when adding colors to our bar chart

Creating and styling the legend
You can turn on and style the legend using update_layout()

showlegend = True shows the default legend

legend = a dictionary specifying styles and positioning of the legend


x , y : position of the legend as a fraction (0-1) of the way across the x or y axis

Other stylistic elements such as bgcolor (background color), borderwidth, title, and font

As always, check the documentation for more!

A styled legend

We can create a styled legend and position it:

fig.update_layout({
    'showlegend': True,
    'legend': {
        'title': 'All Companies',
        'x': 0.5, 'y': 0.8,
        'bgcolor': 'rgb(246,228,129)'}
})

Let's practice!
Adding annotations

What are annotations?

Annotations are extra boxes of text and data added to a plot.

Unlike hover information, annotations are always present.

They serve two primary purposes:

1. Data-linked annotations (draw attention, add notes) on a particular point

2. Add extra notes to a plot, much like adding a text-box in Microsoft Word

Creating annotations
In Plotly you can add annotations in several ways:

1. Using add_annotation()
Adds a single annotation

2. Using update_layout() and the annotations argument


A list of annotation objects

Useful if adding many annotations

For consistency, we'll stick with update_layout()

Important annotation arguments

There are several key elements of an annotation (dictionary) worth highlighting:

showarrow = True / False


Determines whether an arrow will be drawn from the box to the given x / y coordinates

You can style the arrow as well!

text = The actual text to be displayed


You can insert variables into this text too

x and y : coordinates at which to place the annotation

Be careful placing annotations absolutely - if your data changes, things may overlap!

Positioning annotations

By default, the x and y arguments will be in the units of the plot to link to a data point.

However, you can position absolutely by:

Setting the arguments xref and yref to paper

Now the x and y parameters are 0-1 positions

A position of ( x=0.5 , y=0.5 ) would be right in the middle of the plot

Data-linked annotations

Let's annotate our company (we know the revenue and employee count) on our previous scatterplot.

my_annotation = {
    'x': 215111, 'y': 449000,
    'showarrow': True, 'arrowhead': 3,
    'text': "Our company is doing well",
    'font': {'size': 10, 'color': 'black'}}
fig.update_layout({'annotations': [my_annotation]})
fig.show()

Nice! We can see our company clearly:

Floating annotation
We can also have a floating annotation, positioned absolutely.

float_annotation = {
    'xref': 'paper', 'yref': 'paper',
    'x': 0.5, 'y': 0.8,
    'showarrow': False,
    'text': "You should <b>BUY</b>",
    'font': {'size': 15, 'color': 'black'},
    'bgcolor': 'rgb(255,0,0)'}

We get a strong message!

Let's practice!
Editing plot axes

Our dataset
Using the penguins dataset, let's aggregate flipper size by species:

spec       av_flip_length
Adelie     189.953642
Chinstrap  195.823529
Gentoo     217.186992

Those columns aren't labeled well for presentation!

The default axis titles
Let's create a simple bar chart:

fig = px.bar(penguin_flippers,
    x='spec',
    y='av_flip_length')
fig.show()

This works, but those axis titles aren't great.

Editing axis titles
plotly often has 'shortcut' functions:

fig.update_xaxes(title_text='Species')
fig.update_yaxes(title_text='Average Flipper Length')

Or with the more general update_layout()

fig.update_layout({
    'xaxis': {'title': {'text': 'Species'}},
    'yaxis': {'title': {'text': 'Average Flipper Length'}}})

We will stick with update_layout() for consistency

Cleaning up our plot
Both methods will produce a more presentation-worthy chart.

Which method to use?

The shortcut method is helpful to quickly change just that one attribute.

To further style axes, the update_layout() method allows you to edit:

Font family, font size

Text angle

Text color

Much more!

See more in the Plotly documentation

Editing axes ranges

Plotly automatically calculates axes ranges from your data - this may not be desired!

Let's set the y-axis to start at 150 and go up to a small buffer (30) past the maximum flipper length

fig.update_layout({'yaxis':
    {'range': [150,
               penguin_flippers['av_flip_length'].max() + 30]}
})

Our new axes ranges
We get specific axes:

Data scale issues
What happens when some data points are much larger than others?

Top 10 countries by number of billionaires

Our scale problem

Let's plot without any adjustment:

fig = px.bar(billionaire_data,
    x='Country',
    y='Number Billionaires')
fig.show()

The log scale
A common scale used to plot data with large value differences.

It looks like this:

Ticks on our y-axis aren't uniform (10, 20, 30, etc.)

Each tick is an order of magnitude bigger (10, 100, 1000, etc.)

Using log with our data
Plotly has log_y and log_x arguments

fig = px.bar(billionaire_data,
    x='Country',
    y='Number Billionaires',
    log_y=True)
fig.show()

That's better!

Log scale: a word of warning

When visualizing data, you are telling a story.

If your audience doesn't know what a log scale is, there may be miscommunication.

So remember to keep your audience in mind!

Let's practice!