0% found this document useful (0 votes)
87 views

Density - Contour Plot

The document discusses various methods for visualizing three-dimensional data using two-dimensional plots in Matplotlib, including contour plots, filled contour plots, and heatmaps. It also covers customizing plots, histograms, two-dimensional histograms, kernel density estimation, and legends.

Uploaded by

kidal14
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
87 views

Density - Contour Plot

The document discusses various methods for visualizing three-dimensional data using two-dimensional plots in Matplotlib, including contour plots, filled contour plots, and heatmaps. It also covers customizing plots, histograms, two-dimensional histograms, kernel density estimation, and legends.

Uploaded by

kidal14
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Density and Contour Plots

Sometimes it's helpful to show three-dimensional information as a two-dimensional map,


complete with contours and colored regions. The Matplotlib functions plt.contour (for contour
plots), plt.contourf (for filled contour plots), and plt.imshow (for displaying images) can all be
useful in this regard. Several applications of these are discussed below. In order to begin
plotting, we will first import the necessary functions into the notebook and configure it for use.

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np

Visualizing a Three-Dimensional Function


We'll start by demonstrating a contour plot using a function z=f(x,y), using the following
particular choice for f

def f(x, y):


return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

A contour plot can be created with the plt.contour function. It takes three arguments: a grid
of x values, a grid of y values, and a grid of z values. The x and y values represent positions on
the plot, and the z values will be represented by the contour levels. Perhaps the most
straightforward way to prepare such data is to use the np.meshgrid function, which builds
two-dimensional grids from one-dimensional arrays:

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

Now let's look at this with a standard line-only contour plot:

plt.contour(X, Y, Z, colors='black');
Notice that by default when a single color is used, negative values are represented by dashed
lines, and positive values by solid lines. Alternatively, the lines can be color-coded by
specifying a colormap with the cmap argument. Here, we'll also specify that we want more lines
to be drawn—20 equally spaced intervals within the data range:

plt.contour(X, Y, Z, 20, cmap='RdGy');

Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered
data. Matplotlib has a wide range of colormaps available, which you can easily browse in
IPython by doing a tab completion on the plt.cm module:

plt.cm.<TAB>

Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can
change this by switching to a filled contour plot using the plt.contourf() function (notice
the f at the end), which uses largely the same syntax as plt.contour().

Additionally, we'll add a plt.colorbar() command, which automatically creates an


additional axis with labeled color information for the plot:

plt.contourf(X, Y, Z, 20, cmap='RdGy')


plt.colorbar();

The colorbar makes it clear that the black regions are "peaks," while the red regions are
"valleys."

This plot has the potential to have problems due to the fact that it is quite "splotchy." In other
words, the color stages are discrete rather than continuous, which is not always preferable.
Setting the number of contours to a very high value might fix this, but it would produce a plot
that is highly expensive since Matplotlib would have to draw a new polygon for each
increment. To solve this problem, we may make advantage of plt.imshow(), a useful method
that converts a two-dimensional data grid into a picture.
The following code shows this:

plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',


cmap='RdGy')
plt.colorbar()
plt.axis(aspect='image');

There are a few potential gotchas with imshow(), however:

 plt.imshow() doesn't accept an x and y grid, so you must manually specify the extent
[xmin, xmax, ymin, ymax] of the image on the plot.
 plt.imshow() by default follows the standard image array definition where the origin
is in the upper left, not in the lower left as in most contour plots. This must be changed
when showing gridded data.
 plt.imshow() will automatically adjust the axis aspect ratio to match the input data;
this can be changed by setting, for example, plt.axis(aspect='image') to make x
and y units match.

Finally, it can sometimes be useful to combine contour plots and image plots. For example,
here we'll use a partially transparent background image (with transparency set via the alpha
parameter) and overplot contours with labels on the contours themselves (using the
plt.clabel() function):

contours = plt.contour(X, Y, Z, 3, colors='black')


plt.clabel(contours, inline=True, fontsize=8)

plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',


cmap='RdGy', alpha=0.5)
plt.colorbar();
The combination of these three functions—plt.contour, plt.contourf, and plt.imshow—
gives nearly limitless possibilities for displaying this sort of three-dimensional data within a
two-dimensional plot. For more information on the options available in these functions, refer
to their docstrings. If you are interested in three-dimensional visualizations of this type of
data, see Three-dimensional Plotting in Matplotlib.
Histograms, Binning’s, and Density
In many cases, a basic histogram is the best place to start when trying to make sense of a dataset.
We showed earlier how, when the standard boilerplate imports are made, Matplotlib's
histogram function can produce a simple histogram in a single line:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')

data = np.random.randn(1000)
plt.hist(data);

The hist() function has many options to tune both the calculation and the display; here's an
example of a more customized histogram:

plt.hist(data, bins=30, normed=True, alpha=0.5,


histtype='stepfilled', color='steelblue',
edgecolor='none');

The plt.hist docstring has more information on other customization options available. I find
this combination of histtype='stepfilled' along with some transparency alpha to be very
useful when comparing histograms of several distributions:

x1 = np.random.normal(0, 0.8, 1000)


x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

kwargs = dict(histtype='stepfilled', alpha=0.3, normed=True, bins=40)

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
If you would like to simply compute the histogram (that is, count the number of points in a
given bin) and not display it, the np.histogram() function is available:

counts, bin_edges = np.histogram(data, bins=5)


print(counts)
[ 12 190 468 301 29]

Two-Dimensional Histograms and Binnings


Histograms can be made in two dimensions by assigning each point to one of several different
bins, just as they can be made in one dimension by segmenting the number line. In this article,
we will examine, briefly, a number of potential approaches. To begin, let's define some
information, which will be an x and y array sampled from a multivariate Gaussian distribution:

mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T

plt.hist2d: Two-dimensional histogram

One straightforward way to plot a two-dimensional histogram is to use Matplotlib's


plt.hist2d function:

plt.hist2d(x, y, bins=30, cmap='Blues')


cb = plt.colorbar()
cb.set_label('counts in bin')

Just as with plt.hist, plt.hist2d has a number of extra options to fine-tune the plot and the
binning, which are nicely outlined in the function docstring. Further, just as plt.hist has a
counterpart in np.histogram, plt.hist2d has a counterpart in np.histogram2d, which can
be used as follows:
counts, xedges, yedges = np.histogram2d(x, y, bins=30)

For the generalization of this histogram binning in dimensions higher than two, see the
np.histogramdd function.

plt.hexbin: Hexagonal binnings

The two-dimensional histogram creates a tesselation of squares across the axes. Another natural
shape for such a tesselation is the regular hexagon. For this purpose, Matplotlib provides the
plt.hexbin routine, which will represents a two-dimensional dataset binned within a grid of
hexagons:

plt.hexbin(x, y, gridsize=30, cmap='Blues')


cb = plt.colorbar(label='count in bin')

plt.hexbin has a number of interesting options, including the ability to specify weights for
each point, and to change the output in each bin to any NumPy aggregate (mean of weights,
standard deviation of weights, etc.).

Kernel density estimation

Another common method of evaluating densities in multiple dimensions is kernel density


estimation (KDE). One extremely quick and simple KDE implementation exists in the
scipy.stats package. Here is a quick example of using the KDE on this data:

from scipy.stats import gaussian_kde

# fit an array of size [Ndim, Nsamples]


data = np.vstack([x, y])
kde = gaussian_kde(data)

# evaluate on a regular grid


xgrid = np.linspace(-3.5, 3.5, 40)
ygrid = np.linspace(-6, 6, 40)
Xgrid, Ygrid = np.meshgrid(xgrid, ygrid)
Z = kde.evaluate(np.vstack([Xgrid.ravel(), Ygrid.ravel()]))

# Plot the result as an image


plt.imshow(Z.reshape(Xgrid.shape),
origin='lower', aspect='auto',
extent=[-3.5, 3.5, -6, 6],
cmap='Blues')
cb = plt.colorbar()
cb.set_label("density")
KDE has a smoothing length that effectively slides the knob between detail and smoothness
(one example of the ubiquitous bias–variance trade-off). The literature on choosing an
appropriate smoothing length is vast: gaussian_kde uses a rule-of-thumb to attempt to find a
nearly optimal smoothing length for the input data.
Customizing Plot Legends

An illustration's plot legend explains the data visualization's significance by labeling the chart's
symbols. After learning the basics of legend creation, we'll go into Matplotlib's aesthetic and
positioning customization options.
The plt.legend() tool generates a minimal legend for all named plot objects with a single
argument:

import matplotlib.pyplot as plt


plt.style.use('classic')
%matplotlib inline
import numpy as np
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')
leg = ax.legend();

But there are many ways we might want to customize such a legend. For example, we can
specify the location and turn off the frame:

ax.legend(loc='upper left', frameon=False)


fig

We can use the ncol command to specify the number of columns in the legend:

ax.legend(frameon=False, loc='lower center', ncol=2)


fig
We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value)
of the frame, or change the padding around the text:

ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)


fig

For more information on available legend options, see the plt.legend docstring.

Choosing Elements for the Legend


As we have already seen, the legend includes all labeled elements by default. If this is not what
is desired, we can fine-tune which elements and labels appear in the legend by using the objects
returned by plot commands. The plt.plot() command is able to create multiple lines at once,
and returns a list of created line instances. Passing any of these to plt.legend() will tell it
which to identify, along with the labels we'd like to specify:

y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))


lines = plt.plot(x, y)

# lines is a list of plt.Line2D instances


plt.legend(lines[:2], ['first', 'second']);
I generally find in practice that it is clearer to use the first method, applying labels to the plot
elements you'd like to show on the legend:

plt.plot(x, y[:, 0], label='first')


plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True);

Notice that by default, the legend ignores all elements without a label attribute set.

Legend for Size of Points


In certain cases, the default legend entries won't do the trick for a specific visualization. You
may wish to make a legend that accounts for the fact that you'll be utilizing point sizes to denote
certain characteristics of the data. Here, we'll see how point sizes may be used to convey the
population of several California communities. To create a legend that describes the relative
sizes of the points, we'll display some empty labeled data.

import pandas as pd
cities = pd.read_csv('data/california_cities.csv')

# Extract the data we're interested in


lat, lon = cities['latd'], cities['longd']
population, area = cities['population_total'], cities['area_total_km2']

# Scatter the points, using size and color but no label


plt.scatter(lon, lat, label=None,
c=np.log10(population), cmap='viridis',
s=area, linewidth=0, alpha=0.5)
plt.axis(aspect='equal')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.colorbar(label='log$_{10}$(population)')
plt.clim(3, 7)

# Here we create a legend:


# we'll plot empty lists with the desired size and label
for area in [100, 300, 500]:
plt.scatter([], [], c='k', alpha=0.3, s=area,
label=str(area) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False, labelspacing=1, title='City
Area')

plt.title('California Cities: Area and Population');


Because the legend always refers to anything that appears on the plot, plotting the desired form
is necessary for displaying it. Here, because the target objects (gray circles) are absent from
the plot, we resort to a trick: we plot empty lists to make it seem like there are more items there.
It's also worth noting that the legend only includes labels that were explicitly added to story
components.

Since the legend can now pick up on named plot objects thanks to our charting of empty lists,
we can utilize it to convey important information. A more complex graphic may be created
using this method.

Multiple Legends
Sometimes when designing a plot you'd like to add multiple legends to the same axes.
Unfortunately, Matplotlib does not make this easy: via the standard legend interface, it is only
possible to create a single legend for the entire plot. If you try to create a second legend using
plt.legend() or ax.legend(), it will simply override the first one. We can work around this
by creating a new legend artist from scratch, and then using the lower-level ax.add_artist()
method to manually add the second artist to the plot:

fig, ax = plt.subplots()

lines = []
styles = ['-', '--', '-.', ':']
x = np.linspace(0, 10, 1000)

for i in range(4):
lines += ax.plot(x, np.sin(x - i * np.pi / 2),
styles[i], color='black')
ax.axis('equal')

# specify the lines and labels of the first legend


ax.legend(lines[:2], ['line A', 'line B'],
loc='upper right', frameon=False)

# Create the second legend and add the artist manually.


from matplotlib.legend import Legend
leg = Legend(ax, lines[2:], ['line C', 'line D'],
loc='lower right', frameon=False)
ax.add_artist(leg);
This is a peek into the low-level artist objects that comprise any Matplotlib plot. If you examine
the source code of ax.legend() (recall that you can do this with within the IPython notebook
using ax.legend??) you'll see that the function simply consists of some logic to create a
suitable Legend artist, which is then saved in the legend_ attribute and added to the figure
when the plot is drawn.
Customizing Colour bars
The labels of each dots on a graph are named in the legend. A labeled colorbar is helpful for
assigning continuous labels depending on the color of points, lines, or regions. A colorbar is
an additional axis in Matplotlib that may explain what each color means in a given plot. This
Study includes a companion online supplement where you may see the figures in full color In
order to begin plotting, we will first load the necessary functions into the notebook and
configure it for usage.

import matplotlib.pyplot as plt


plt.style.use('classic')
%matplotlib inline
import numpy as np

As we have seen several times throughout this section, the simplest colorbar can be created
with the plt.colorbar function:

x = np.linspace(0, 10, 1000)


I = np.sin(x) * np.cos(x[:, np.newaxis])

plt.imshow(I)
plt.colorbar();

We'll now discuss a few ideas for customizing these colorbars and using them effectively in
various situations.

Customizing Colorbars
The colormap can be specified using the cmap argument to the plotting function that is creating
the visualization:

plt.imshow(I, cmap='gray');
All the available colormaps are in the plt.cm namespace; using IPython's tab-completion will
give you a full list of built-in possibilities:

plt.cm.<TAB>

But being able to choose a colormap is just the first step: more important is how to decide
among the possibilities! The choice turns out to be much more subtle than you might initially
expect.

Choosing the Colormap

A full treatment of color choice within visualization is beyond the scope of this book, but for
entertaining reading on this subject and others, see the article "Ten Simple Rules for Better
Figures". Matplotlib's online documentation also has an interesting discussion of colormap
choice.

Broadly, you should be aware of three different categories of colormaps:

 Sequential colormaps: These are made up of one continuous sequence of colors (e.g.,
binary or viridis).
 Divergent colormaps: These usually contain two distinct colors, which show positive
and negative deviations from a mean (e.g., RdBu or PuOr).
 Qualitative colormaps: these mix colors with no particular sequence (e.g., rainbow or
jet).

The jet colormap, which was the default in Matplotlib prior to version 2.0, is an example of a
qualitative colormap. Its status as the default was quite unfortunate, because qualitative maps
are often a poor choice for representing quantitative data. Among the problems is the fact that
qualitative maps usually do not display any uniform progression in brightness as the scale
increases.

We can see this by converting the jet colorbar into black and white:

from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(cmap):
"""Return a grayscale version of the given colormap"""
cmap = plt.cm.get_cmap(cmap)
colors = cmap(np.arange(cmap.N))

# convert RGBA to perceived grayscale luminance


# cf. http://alienryderflex.com/hsp.html
RGB_weight = [0.299, 0.587, 0.114]
luminance = np.sqrt(np.dot(colors[:, :3] ** 2, RGB_weight))
colors[:, :3] = luminance[:, np.newaxis]

return LinearSegmentedColormap.from_list(cmap.name + "_gray", colors,


cmap.N)

def view_colormap(cmap):
"""Plot a colormap with its grayscale equivalent"""
cmap = plt.cm.get_cmap(cmap)
colors = cmap(np.arange(cmap.N))
cmap = grayscale_cmap(cmap)
grayscale = cmap(np.arange(cmap.N))

fig, ax = plt.subplots(2, figsize=(6, 2),


subplot_kw=dict(xticks=[], yticks=[]))
ax[0].imshow([colors], extent=[0, 10, 0, 1])
ax[1].imshow([grayscale], extent=[0, 10, 0, 1])
view_colormap('jet')

Notice the bright stripes in the grayscale image. Even in full color, this uneven brightness
means that the eye will be drawn to certain portions of the color range, which will potentially
emphasize unimportant parts of the dataset. It's better to use a colormap such as viridis (the
default as of Matplotlib 2.0), which is specifically constructed to have an even brightness
variation across the range. Thus it not only plays well with our color perception, but also will
translate well to grayscale printing:

view_colormap('viridis')

If you favor rainbow schemes, another good option for continuous data is the cubehelix
colormap:

view_colormap('cubehelix')

For other situations, such as showing positive and negative deviations from some mean, dual-
color colorbars such as RdBu (Red-Blue) can be useful. However, as you can see in the
following figure, it's important to note that the positive-negative information will be lost upon
translation to grayscale!

view_colormap('RdBu')

We'll see examples of using some of these color maps as we continue.


There are a large number of colormaps available in Matplotlib; to see a list of them, you can
use IPython to explore the plt.cm submodule. For a more principled approach to colors in
Python.

Color limits and extensions

Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an
instance of plt.Axes, so all of the axes and tick formatting tricks we've learned are applicable.
The colorbar has some interesting flexibility: for example, we can narrow the color limits and
indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the
extend property. This might come in handy, for example, if displaying an image that is subject
to noise:

# make noise in 1% of the image pixels


speckles = (np.random.random(I.shape) < 0.01)
I[speckles] = np.random.normal(0, 3, np.count_nonzero(speckles))

plt.figure(figsize=(10, 3.5))

plt.subplot(1, 2, 1)
plt.imshow(I, cmap='RdBu')
plt.colorbar()

plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);

Notice that in the left panel, the default color limits respond to the noisy pixels, and the range
of the noise completely washes-out the pattern we are interested in. In the right panel, we
manually set the color limits, and add extensions to indicate values which are above or below
those limits. The result is a much more useful visualization of our data.
Discrete Color Bars

Colormaps are by default continuous, but sometimes you'd like to represent discrete values.
The easiest way to do this is to use the plt.cm.get_cmap() function, and pass the name of a
suitable colormap along with the number of desired bins:

plt.imshow(I, cmap=plt.cm.get_cmap('Blues', 6))


plt.colorbar()
plt.clim(-1, 1);

The discrete version of a colormap can be used just like any other colormap.

You might also like