Density - Contour Plot
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
import numpy as np
A contour plot can be created with the plt.contour function. It takes three arguments: a grid
of x values, a grid of y values, and a grid of z values. The x and y values represent positions on
the plot, and the z values will be represented by the contour levels. Perhaps the most
straightforward way to prepare such data is to use the np.meshgrid function, which builds
two-dimensional grids from one-dimensional arrays:
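def f(x, y):
    # Example surface to contour (an assumed test function; any z = f(x, y) works)
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');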
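To make the grid construction concrete, here is a small sketch of what np.meshgrid returns. The input values are arbitrary and chosen only to keep the output readable.

```python
import numpy as np

# Small illustrative inputs (values are arbitrary)
x = np.array([0, 1, 2])
y = np.array([10, 20])

X, Y = np.meshgrid(x, y)

# With the default indexing='xy', both grids have shape (len(y), len(x))
print(X.shape)  # (2, 3)
print(X)        # every row is a copy of x
print(Y)        # every column is a copy of y
```

Each pair (X[i, j], Y[i, j]) is one point of the grid, which is why a function evaluated on X and Y yields a z value for every grid position.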
Notice that by default when a single color is used, negative values are represented by dashed
lines, and positive values by solid lines. Alternatively, the lines can be color-coded by
specifying a colormap with the cmap argument. Here, we'll also specify that we want more lines
to be drawn—20 equally spaced intervals within the data range:
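A minimal sketch of such a call follows. The surface f is an assumed test function, not something prescribed by the text; the integer argument asks Matplotlib to choose roughly that many evenly spaced levels automatically.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Hypothetical test surface; any function of two grids works here
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# The integer requests about 20 automatically chosen contour levels;
# cmap colors each line according to its level
cs = plt.contour(X, Y, Z, 20, cmap='RdGy')
```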
Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered
data. Matplotlib has a wide range of colormaps available, which you can easily browse in
IPython by doing a tab completion on the plt.cm module:
plt.cm.<TAB>
Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can
change this by switching to a filled contour plot using the plt.contourf() function (notice
the f at the end), which uses largely the same syntax as plt.contour().
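As a sketch, the filled version might look like the following. The surface f is again an assumed test function, and a colorbar is added so that colors can be read back as values.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Hypothetical test surface; any function of two grids works here
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Same arguments as plt.contour, but the regions between levels are filled
cf = plt.contourf(X, Y, Z, 20, cmap='RdGy')

# Attach a colorbar that maps fill colors to z values
cbar = plt.colorbar(cf)
```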
The colorbar makes it clear that the black regions are "peaks," while the red regions are
"valleys."
One potential issue with this plot is that it is a bit "splotchy": the color steps are discrete
rather than continuous, which is not always what is desired. Setting the number of contours to
a very high value could remedy this, but the resulting plot would be inefficient, because
Matplotlib must render a new polygon for each step in the level. A better option is
plt.imshow(), which interprets a two-dimensional grid of data as an image.
The following code shows this:
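A minimal sketch of such a call, using an assumed test surface f; the extent and origin arguments are set explicitly so the image lines up with the data coordinates.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Hypothetical test surface; any function of two grids works here
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# extent gives [xmin, xmax, ymin, ymax]; origin='lower' puts row 0 at the bottom
im = plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy')
plt.colorbar(im)
```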
There are a few caveats to keep in mind when using plt.imshow():
plt.imshow() doesn't accept an x and y grid, so you must manually specify the extent
[xmin, xmax, ymin, ymax] of the image on the plot.
plt.imshow() by default follows the standard image array definition where the origin
is in the upper left, not in the lower left as in most contour plots. This must be changed
(for example with origin='lower') when showing gridded data.
plt.imshow() by default (aspect='equal') sets the axes aspect ratio so that one unit in x
matches one unit in y; pass aspect='auto' to let the image stretch to fill the axes
instead.
Finally, it can sometimes be useful to combine contour plots and image plots. For example,
here we'll use a partially transparent background image (with transparency set via the alpha
parameter) and overplot contours with labels on the contours themselves (using the
plt.clabel() function):
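One way this combination might look is sketched below. The surface f, the number of contour levels, and the font size are illustrative choices rather than values taken from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Hypothetical test surface; any function of two grids works here
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Draw a few labeled contour lines on top
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)

# Draw the same data as a half-transparent background image
im = plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
                cmap='RdGy', alpha=0.5)
plt.colorbar(im)
```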
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)
plt.hist(data);
The hist() function has many options to tune both the calculation and the display; here's an
example of a more customized histogram:
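As a sketch, a more customized call could look like this. The particular option values (30 bins, the color, the transparency) are illustrative assumptions, and a seeded generator is used only to make the example reproducible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# density=True normalizes the bar areas to sum to 1;
# histtype='stepfilled' draws one filled outline instead of separate bars
counts, bin_edges, patches = plt.hist(data, bins=30, density=True,
                                      alpha=0.5, histtype='stepfilled',
                                      color='steelblue', edgecolor='none')
```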
The plt.hist docstring has more information on other customization options available. I find
this combination of histtype='stepfilled' along with some transparency alpha to be very
useful when comparing histograms of several distributions:
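# Illustrative inputs and shared options (assumed values)
x1, x2, x3 = (np.random.normal(mu, sigma, 1000) for mu, sigma in [(0, 0.8), (-2, 1), (3, 2)])
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);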
If you would like to simply compute the histogram (that is, count the number of points in a
given bin) and not display it, the np.histogram() function is available:
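A short sketch of this, with an arbitrary choice of five bins and a seeded generator for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Returns the count in each bin and the bin edges (one more edge than bins)
counts, bin_edges = np.histogram(data, bins=5)
print(counts)
print(bin_edges)
```

Every data point falls into exactly one bin, so the counts add up to the number of samples.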
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 10000).T
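One straightforward way to plot a two-dimensional histogram of such data is plt.hist2d. The sketch below regenerates the sample with a seeded generator; the bin count, colormap, and colorbar label are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = rng.multivariate_normal(mean, cov, 10000).T

# hist2d returns the counts, both sets of bin edges, and the drawn image
counts, xedges, yedges, image = plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar(image)
cb.set_label('counts in bin')
```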
Just as with plt.hist, plt.hist2d has a number of extra options to fine-tune the plot and the
binning, which are nicely outlined in the function docstring. Further, just as plt.hist has a
counterpart in np.histogram, plt.hist2d has a counterpart in np.histogram2d, which can
be used as follows:
counts, xedges, yedges = np.histogram2d(x, y, bins=30)
For the generalization of this histogram binning in dimensions higher than two, see the
np.histogramdd function.
The two-dimensional histogram creates a tessellation of squares across the axes. Another natural
shape for such a tessellation is the regular hexagon. For this purpose, Matplotlib provides the
plt.hexbin routine, which represents a two-dimensional dataset binned within a grid of
hexagons:
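A minimal sketch of a hexagonal binning, again on a seeded multivariate normal sample; the gridsize and colormap are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

# gridsize sets the number of hexagons along the x direction
hb = plt.hexbin(x, y, gridsize=30, cmap='Blues')
cb = plt.colorbar(hb, label='count in bin')
```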
plt.hexbin has a number of interesting options, including the ability to specify weights for
each point, and to change the output in each bin to any NumPy aggregate (mean of weights,
standard deviation of weights, etc.).
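These options can be sketched with the C and reduce_C_function arguments of plt.hexbin. The per-point weights below are hypothetical random values invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 10000).T

# Hypothetical per-point weights between 0 and 1
weights = rng.uniform(0, 1, size=x.size)

# Each hexagon shows the mean of the weights of the points that fall in it
hb = plt.hexbin(x, y, C=weights, reduce_C_function=np.mean,
                gridsize=30, cmap='Blues')
cb = plt.colorbar(hb, label='mean weight in bin')
```

Swapping np.mean for another aggregate such as np.std or np.sum changes what each hexagon reports without changing the binning.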
Plot legends give meaning to a visualization by labeling the elements it contains. After covering
the basics of creating a legend, we'll look at Matplotlib's options for customizing its appearance
and placement.
The simplest legend is created with plt.legend(), which, called with no arguments,
automatically builds a legend for every labeled plot element:
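A minimal sketch, with two illustrative curves and labels chosen for the example:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()

# The label keyword is what the legend picks up
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.axis('equal')

leg = ax.legend()
```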
But there are many ways we might want to customize such a legend. For example, we can
specify the location and turn off the frame:
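For instance, using the loc and frameon arguments (the curves and labels are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')

# Place the legend in the upper-left corner and drop the surrounding box
leg = ax.legend(loc='upper left', frameon=False)
```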
We can use the ncol argument to specify the number of columns in the legend:
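A short sketch, laying two illustrative entries side by side at the bottom of the axes:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')

# Two columns put both entries on a single row
leg = ax.legend(loc='lower center', frameon=False, ncol=2)
```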
For more information on available legend options, see the plt.legend docstring.
Notice that by default, the legend ignores all elements without a label attribute set.
import pandas as pd
cities = pd.read_csv('data/california_cities.csv')
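A legend only lists objects that exist on the plot and carry a label. By plotting empty lists
with the desired marker sizes and labels, we create such labeled objects without drawing
anything, and the legend can then use them to convey important information. The same
strategy can be used to build more sophisticated visualizations.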
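The empty-list technique can be sketched as follows. Random stand-in values are used here instead of the CSV file, so the coordinates, areas, legend sizes, and title are all hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for city longitude, latitude, and area
rng = np.random.default_rng(1)
lon = rng.uniform(-124, -114, 200)
lat = rng.uniform(32, 42, 200)
area = rng.uniform(20, 500, 200)

fig, ax = plt.subplots()
ax.scatter(lon, lat, s=area, c='gray', alpha=0.5, linewidth=0)

# Plot empty lists: nothing is drawn, but each call creates a labeled
# object with the marker size that the legend should display
for size in [100, 300, 500]:
    ax.scatter([], [], c='k', alpha=0.3, s=size,
               label=str(size) + ' km$^2$')

leg = ax.legend(scatterpoints=1, frameon=False,
                labelspacing=1, title='City Area')
```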
Multiple Legends
Sometimes when designing a plot you'd like to add multiple legends to the same axes.
Unfortunately, Matplotlib does not make this easy: via the standard legend interface, it is only
possible to create a single legend for the entire plot. If you try to create a second legend using
plt.legend() or ax.legend(), it will simply override the first one. We can work around this
by creating a new legend artist from scratch, and then using the lower-level ax.add_artist()
method to manually add the second artist to the plot:
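from matplotlib.legend import Legend

fig, ax = plt.subplots()
lines = []
styles = ['-', '--', '-.', ':']
x = np.linspace(0, 10, 1000)
for i in range(4):
    # ax.plot returns a list of Line2D objects, so += collects them
    lines += ax.plot(x, np.sin(x - i * np.pi / 2),
                     styles[i], color='black')
ax.axis('equal')
# First legend, created through the standard interface (labels are illustrative)
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)
# Second legend, built as a separate artist and added to the axes manually
leg = Legend(ax, lines[2:], ['line C', 'line D'], loc='lower right', frameon=False)
ax.add_artist(leg);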
As we have seen several times throughout this section, the simplest colorbar can be created
with the plt.colorbar function:
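x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])  # example image (assumed; any 2D array works)
plt.imshow(I)
plt.colorbar();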
We'll now discuss a few ideas for customizing these colorbars and using them effectively in
various situations.
Customizing Colorbars
The colormap can be specified using the cmap argument to the plotting function that is creating
the visualization:
plt.imshow(I, cmap='gray');
All the available colormaps are in the plt.cm namespace; using IPython's tab-completion will
give you a full list of built-in possibilities:
plt.cm.<TAB>
But being able to choose a colormap is just the first step: more important is how to decide
among the possibilities! The choice turns out to be much more subtle than you might initially
expect.
A full treatment of color choice within visualization is beyond the scope of this book, but for
entertaining reading on this subject and others, see the article "Ten Simple Rules for Better
Figures". Matplotlib's online documentation also has an interesting discussion of colormap
choice.
Broadly, it helps to distinguish three categories of colormaps:
Sequential colormaps: These are made up of one continuous sequence of colors (e.g.,
binary or viridis).
Divergent colormaps: These usually contain two distinct colors, which show positive
and negative deviations from a mean (e.g., RdBu or PuOr).
Qualitative colormaps: These mix colors with no particular sequence (e.g., rainbow or
jet).
The jet colormap, which was the default in Matplotlib prior to version 2.0, is an example of a
qualitative colormap. Its status as the default was quite unfortunate, because qualitative maps
are often a poor choice for representing quantitative data. Among the problems is the fact that
qualitative maps usually do not display any uniform progression in brightness as the scale
increases.
We can see this by converting the jet colorbar into black and white:
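from matplotlib.colors import LinearSegmentedColormap

def grayscale_cmap(cmap):
    """Return a grayscale version of the given colormap"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    # Approximate perceived luminance from RGB (weights are a common convention)
    luminance = np.sqrt(np.dot(colors[:, :3] ** 2, [0.299, 0.587, 0.114]))
    colors[:, :3] = luminance[:, np.newaxis]
    return LinearSegmentedColormap.from_list(cmap.name + '_gray', colors, cmap.N)

def view_colormap(cmap):
    """Plot a colormap with its grayscale equivalent"""
    cmap = plt.get_cmap(cmap)
    colors = cmap(np.arange(cmap.N))
    grayscale = grayscale_cmap(cmap)(np.arange(cmap.N))
    fig, ax = plt.subplots(2, figsize=(6, 2), subplot_kw=dict(xticks=[], yticks=[]))
    ax[0].imshow([colors], extent=[0, 10, 0, 1])
    ax[1].imshow([grayscale], extent=[0, 10, 0, 1])
view_colormap('jet')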
Notice the bright stripes in the grayscale image. Even in full color, this uneven brightness
means that the eye will be drawn to certain portions of the color range, which will potentially
emphasize unimportant parts of the dataset. It's better to use a colormap such as viridis (the
default as of Matplotlib 2.0), which is specifically constructed to have an even brightness
variation across the range. Thus it not only plays well with our color perception, but also will
translate well to grayscale printing:
view_colormap('viridis')
If you favor rainbow schemes, another good option for continuous data is the cubehelix
colormap:
view_colormap('cubehelix')
For other situations, such as showing positive and negative deviations from some mean, dual-
color colorbars such as RdBu (Red-Blue) can be useful. However, as you can see in the
following figure, it's important to note that the positive-negative information will be lost upon
translation to grayscale!
view_colormap('RdBu')
Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply an
instance of plt.Axes, so all of the axes and tick formatting tricks we've learned are applicable.
The colorbar has some interesting flexibility: for example, we can narrow the color limits and
indicate the out-of-bounds values with a triangular arrow at the top and bottom by setting the
extend property. This might come in handy, for example, if displaying an image that is subject
to noise:
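# Add noise to about 1% of the image pixels (noise level is illustrative)
speckles = (np.random.random(I.shape) < 0.01)
I[speckles] = np.random.normal(0, 3, np.count_nonzero(speckles))

plt.figure(figsize=(10, 3.5))
plt.subplot(1, 2, 1)
plt.imshow(I, cmap='RdBu')
plt.colorbar()
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);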
Notice that in the left panel, the default color limits respond to the noisy pixels, and the range
of the noise completely washes out the pattern we are interested in. In the right panel, we
manually set the color limits and add extensions to indicate values that are above or below
those limits. The result is a much more useful visualization of our data.
Discrete Colorbars
Colormaps are by default continuous, but sometimes you'd like to represent discrete values.
The easiest way to do this is to use the plt.get_cmap() function (the older plt.cm.get_cmap()
spelling was removed in Matplotlib 3.9), passing the name of a suitable colormap along with
the number of desired bins:
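A minimal sketch, assuming a smooth example image and an arbitrary choice of six bins from the Blues colormap:

```python
import numpy as np
import matplotlib.pyplot as plt

# Example image (assumed; any 2D array works)
x = np.linspace(0, 10, 1000)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# The second argument resamples the colormap to six discrete colors
cmap = plt.get_cmap('Blues', 6)

im = plt.imshow(I, cmap=cmap)
plt.colorbar(im)
im.set_clim(-1, 1)
```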
The discrete version of a colormap can be used just like any other colormap.