DSF - Unit IV Notes
SYLLABUS
UNIT IV DATA VISUALIZATION
Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density and
contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three
dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn.
Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. It can be used to create a wide range of plots and charts, including line
plots, bar plots, histograms, scatter plots, and more. Here's a basic overview of using Matplotlib
for plotting:
Installing Matplotlib:
• You can install Matplotlib using pip:
pip install matplotlib
Importing Matplotlib:
• Import the matplotlib.pyplot module, which provides a MATLAB-like plotting interface.
import matplotlib.pyplot as plt
Creating a Simple Plot:
• Use the plot() function to create a simple line plot.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.show()
Saving Plots:
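• Use savefig() to save your plot as an image file (e.g., PNG, PDF, SVG). In a script, call it
before plt.show(), because the figure may already be closed once show() returns.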
plt.savefig('plot.png')
Other Types of Plots:
Matplotlib supports many other types of plots, including bar plots, histograms, scatter plots,
and more.
plt.bar(x, y)
plt.hist(data, bins=10)
plt.scatter(x, y)
Matplotlib provides a wide range of customization options and is highly flexible, making it a
powerful tool for creating publication-quality plots and visualizations in Python.
Axes Limits
The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim()
functions.
Example (listing the limits in reverse order displays the axis reversed)
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
• The plt.axis() method allows you to set the x and y limits with a single call, by passing a
list that specifies [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5]);
• Setting an equal aspect ratio with plt.axis('equal') makes one unit in x the same length
on screen as one unit in y.
Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel()
plt.ylabel()
Legend - plt.legend()
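As a minimal sketch of the limit and labeling calls listed above, the following plot sets a title, axis labels, axis limits, and a legend. The sine and cosine curves and the label strings are illustrative choices, not part of the notes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)

fig, ax = plt.subplots()
# label= gives each line the text that plt.legend() will display
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

plt.title('Sine and cosine')      # title above the axes
plt.xlabel('x')                   # x-axis label
plt.ylabel('value')               # y-axis label
plt.xlim(0, 10)                   # x-axis limits
plt.ylim(-1.5, 1.5)               # y-axis limits
plt.legend(loc='upper right')     # legend built from the label= arguments
plt.show()
```

The pyplot functions act on the current axes, so here they modify the ax object created by plt.subplots().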
Line style:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
Scatter plot with edge color, face color, size, and width of marker. (Scatter plot with line)
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-o', color='gray', markersize=15, linewidth=4,
markerfacecolor='yellow', markeredgecolor='red',
markeredgewidth=4)
plt.ylim(-1.5, 1.5);
Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');
Continuous Errors
In some situations it is desirable to show errorbars on continuous quantities.
Though Matplotlib does not have a built-in convenience routine for this type of
application, it’s relatively easy to combine primitives like plt.plot and
plt.fill_between for a useful result.
Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-
Learn API. This is a method of fitting a very flexible nonparametric function to data
with a continuous measure of the uncertainty.
Visualizing errors in Matplotlib can be done using error bars or shaded regions to represent
uncertainty or variability in your data. Here are two common ways to visualize errors:
1. Error Bars:
Use the errorbar() function to plot data points with error bars
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
yerr = [0.5, 0.3, 0.7, 0.4, 0.8] # Error values
plt.errorbar(x, y, yerr=yerr, fmt='o', capsize=5)
R.GAYATHRI / AP-CSE UNIT-IV NOTES Data Science Fundamentals
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Error Bar Plot')
plt.show()
2. Shaded Regions:
Use the fill_between() function to plot shaded regions representing errors or uncertainties.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
y = np.sin(x)
error = 0.1 # Error value
plt.plot(x, y)
plt.fill_between(x, y - error, y + error, alpha=0.2)
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Shaded Error Region')
plt.show()
These examples demonstrate how to visualize errors in your data using Matplotlib. You
can adjust the error values and plot styles to suit your specific needs and data.
Finally, it can sometimes be useful to combine contour plots and image plots. For example, we
can use a partially transparent background image (with transparency set via the alpha parameter)
and over-plot contours with labels on the contours themselves (using the plt.clabel() function).
Density and contour plots are useful for visualizing the distribution and density of data
points in a 2D space. Matplotlib provides several functions to create these plots, such as
imshow() for density plots and contour() for contour plots. Here's how you can create them:
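1. Density Plot (hist2d / imshow):
A density plot needs a density estimate on a grid, which can come from a 2D histogram or a
kernel density estimation (KDE). The example below uses plt.hist2d(), which computes the 2D
histogram and draws it in one call; a precomputed grid can instead be displayed with imshow().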
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
# Create density plot
plt.figure(figsize=(8, 6))
plt.hist2d(x, y, bins=30, cmap='Blues')
plt.colorbar(label='Density')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Density Plot')
plt.show()
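The imshow() and contour() functions mentioned above can be combined on a grid of values. The sketch below assumes an illustrative function f(x, y) (any function evaluated on a regular grid would do) and draws it as a semi-transparent image with labeled contour lines on top.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    # Illustrative surface, chosen only for this example
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)      # 2D coordinate grids, shape (40, 50)
Z = f(X, Y)

fig, ax = plt.subplots()
# Background image; origin='lower' puts Z[0, 0] at the bottom-left corner
image = ax.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
                  cmap='RdGy', alpha=0.5)
# A few contour lines drawn in black, then labeled in place
contours = ax.contour(X, Y, Z, 3, colors='black')
ax.clabel(contours, inline=True, fontsize=8)
fig.colorbar(image, ax=ax)
plt.show()
```

The extent argument is needed because imshow() does not take x and y grids; it only knows the array indices unless told which data range the image covers.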
HISTOGRAMS IN MATPLOTLIB:
A histogram is a simple plot for summarizing a large data set. It is a graph of a
frequency distribution: it shows the number of observations that fall within each given
interval.
Parameters:
plt.hist() is used to plot a histogram. It takes an array of numbers as its first
argument and builds the histogram from it.
bins - A histogram displays numerical data by grouping data into "bins" of equal
width. Each bin is plotted as a bar whose height corresponds to how many data
points are in that bin. Bins are also sometimes called "intervals", "classes", or
"buckets".
density (called normed in older Matplotlib versions) - If True, the histogram is
normalized so that the total area of the bars equals 1, giving a probability density
instead of raw counts.
x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array
or a sequence of arrays, which are not required to be of the same length.
histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional. The type of histogram to
draw. Default is 'bar'.
'bar' is a traditional bar-type histogram. If multiple data are given the bars
are arranged side by side.
'barstacked' is a bar-type histogram where multiple data are stacked on top
of each other.
'step' generates a lineplot that is by default unfilled.
'stepfilled' generates a lineplot that is by default filled.
align - {'left', 'mid', 'right'}, optional. Controls how the histogram is plotted.
Default is 'mid'.
'left': bars are centered on the left bin edges.
'mid': bars are centered between the bin edges.
'right': bars are centered on the right bin edges.
Default is None
label - str or None, optional. Default is None
Other parameter:
**kwargs - Additional keyword arguments, passed on as Patch properties (for example color,
edgecolor, or alpha). In Python, the ** prefix lets a function accept a variable number of
keyword arguments.
Example
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')  # named 'seaborn-white' before Matplotlib 3.6
data = np.random.randn(1000)
plt.hist(data);
The hist() function has many options to tune both the calculation and the display; here’s an
example of a more customized histogram.
The plt.hist docstring has more information on other customization options available. The
combination of histtype='stepfilled' along with some transparency alpha is very useful when
comparing histograms of several distributions.
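A sketch of the customized histogram described above, assuming three illustrative normal samples. It uses density=True (the current name of the older normed option) together with histtype='stepfilled' and alpha so that overlapping distributions stay visible.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # seeded for reproducibility
x1 = rng.normal(0, 0.8, 1000)
x2 = rng.normal(-2, 1, 1000)
x3 = rng.normal(3, 2, 1000)

# Shared keyword arguments for all three histograms
kwargs = dict(histtype='stepfilled', alpha=0.3, density=True, bins=40)

fig, ax = plt.subplots()
for sample, name in [(x1, 'x1'), (x2, 'x2'), (x3, 'x3')]:
    # hist returns the bin heights, the bin edges, and the drawn patches
    counts, edges, patches = ax.hist(sample, label=name, **kwargs)
ax.legend()
plt.show()
```

With density=True each histogram has unit area, so samples of different sizes can be compared on the same vertical scale.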
Histograms are a useful way to visualize the distribution of a single numerical variable.
Matplotlib provides the hist() function to create histograms. Here's a basic example:
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
data = np.random.normal(loc=0, scale=1, size=1000)
# Create a histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Random Data')
# Display the plot
plt.show()
In this example, data is a NumPy array containing random data sampled from a normal
distribution. The hist() function creates a histogram with 30 bins, colored in sky blue with black
edges. The x-axis represents the values, and the y-axis represents the number of values that
fall in each bin.
You can customize the appearance of the histogram by adjusting parameters such as bins, color,
edgecolor, and adding labels and a title to make the plot more informative.
LEGENDS IN MATPLOTLIB:
Plot legends give meaning to a visualization, assigning labels to the various plot elements.
We previously saw how to create a simple legend; here we’ll take a look at customizing the
placement and aesthetics of the legend in Matplotlib.
Multiple legends
With the standard legend interface, it is only possible to create a single legend for the
entire plot. If you try to create a second
legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around
this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist()
method to manually add the second artist to the plot.
Example
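import matplotlib.pyplot as plt
import numpy as np
plt.style.use('classic')
x = np.linspace(0, 10, 1000)
# Create the figure and axes that the legend will be attached to
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), '-b', label='Sine')
ax.plot(x, np.cos(x), '--r', label='Cosine')
ax.legend(loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
plt.show()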
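The multiple-legend workaround described above can be sketched as follows. The four sine curves and their labels are illustrative; the key steps are building a Legend artist directly and attaching it with ax.add_artist(), so that it does not replace the legend created by ax.legend().

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.legend import Legend

x = np.linspace(0, 10, 1000)
styles = ['-', '--', '-.', ':']

fig, ax = plt.subplots()
lines = []
for i, style in enumerate(styles):
    # ax.plot returns a list of lines; collect the single line it created
    lines += ax.plot(x, np.sin(x - i * np.pi / 2), style, color='black')
ax.axis('equal')

# First legend: created the usual way for the first two lines
ax.legend(lines[:2], ['line A', 'line B'], loc='upper right', frameon=False)

# Second legend: built from scratch as a Legend artist, then added by hand
leg = Legend(ax, lines[2:], ['line C', 'line D'], loc='lower right', frameon=False)
ax.add_artist(leg)
plt.show()
```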
COLORS IN MATPLOTLIB
In Matplotlib, a color bar is a separate axes that can provide a key for the meaning of
colors in a plot. For continuous labels based on the color of points, lines, or regions, a labeled
color bar can be a great tool.
The simplest colorbar can be created with the plt.colorbar() function.
In Matplotlib, you can specify colors in several ways, including using predefined color
names, RGB or RGBA tuples, hexadecimal color codes, and more. Here's how you can specify
colors in Matplotlib:
1. Predefined Color Names:
• Matplotlib provides a set of predefined color names, such as 'red', 'blue', 'green', etc.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color='red') # Plot with red color
plt.show()
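Besides named colors, the other formats mentioned above can be passed to the same color argument. A short sketch follows; the specific color values are arbitrary examples.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, [1, 2, 3, 4], color='red')                  # predefined name
ax.plot(x, [2, 3, 4, 5], color='g')                    # single-letter code
ax.plot(x, [3, 4, 5, 6], color='#FFDD44')              # hex code (RRGGBB)
ax.plot(x, [4, 5, 6, 7], color=(1.0, 0.2, 0.3))        # RGB tuple, values 0 to 1
ax.plot(x, [5, 6, 7, 8], color=(0.0, 0.0, 1.0, 0.5))   # RGBA tuple with alpha
ax.plot(x, [6, 7, 8, 9], color='0.75')                 # grayscale between 0 and 1

# to_rgba converts any accepted color specification to an RGBA tuple
print(to_rgba('red'))   # (1.0, 0.0, 0.0, 1.0)
plt.show()
```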
Discrete colorbars
Colormaps are by default continuous, but sometimes you’d like to represent discrete
values. The easiest way to do this is to use the plt.cm.get_cmap() function, and pass the name of a
suitable colormap along with the number of desired bins.
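A sketch of a discrete colorbar, assuming an illustrative image of sin(x)cos(y). It uses plt.get_cmap(name, n), which takes the same arguments as the plt.cm.get_cmap() call named above; that older spelling is deprecated in recent Matplotlib releases.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])   # 200 x 200 grid of values in [-1, 1]

# Colormap resampled to 6 discrete colors
cmap = plt.get_cmap('Blues', 6)

fig, ax = plt.subplots()
image = ax.imshow(I, cmap=cmap, vmin=-1, vmax=1)
fig.colorbar(image, ax=ax)                  # colorbar shows 6 color bands
plt.show()
```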
SUBPLOTS IN MATPLOTLIB
Matplotlib has the concept of subplots: groups of smaller axes that can exist
together within a single figure.
These subplots might be insets, grids of plots, or other more complicated layouts.
We’ll explore four routines for creating subplots in Matplotlib.
plt.axes: Subplots by Hand
plt.subplot: Simple Grids of Subplots
plt.subplots: The Whole Grid in One Go
plt.GridSpec: More Complicated Arrangements
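Of the four routines listed above, plt.GridSpec is the one intended for layouts whose panels span several rows or columns. A minimal sketch, with an arbitrary 2 x 3 grid chosen for illustration:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 5))
# 2 rows x 3 columns, with spacing between the cells
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

ax_a = fig.add_subplot(grid[0, 0])    # top-left cell
ax_b = fig.add_subplot(grid[0, 1:])   # top row, columns 1 and 2 merged
ax_c = fig.add_subplot(grid[1, :2])   # bottom row, columns 0 and 1 merged
ax_d = fig.add_subplot(grid[1, 2])    # bottom-right cell
plt.show()
```

GridSpec does not create any axes itself; it only describes the grid, and slicing it tells fig.add_subplot() which cells a panel should cover.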
Subplots in Matplotlib allow you to create multiple plots within the same figure. You can
arrange subplots in a grid-like structure and customize each subplot independently. Here's a
basic example of creating subplots:
import matplotlib.pyplot as plt
import numpy as np
# Data for plotting
x = np.linspace(0, 2*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create a figure and a grid of subplots
fig, axs = plt.subplots(2, 1, figsize=(8, 6))
# Plot data on the first subplot
axs[0].plot(x, y1, label='sin(x)', color='blue')
axs[0].set_title('Plot of sin(x)')
axs[0].legend()
# Plot data on the second subplot
axs[1].plot(x, y2, label='cos(x)', color='red')
axs[1].set_title('Plot of cos(x)')
axs[1].legend()
# Adjust layout and display the plot
plt.tight_layout()
plt.show()
In this example, plt.subplots(2, 1) creates a figure with 2 rows and 1 column of subplots.
The axs variable is a NumPy array containing the axes objects for each subplot. You can then use
these axes objects to plot data and customize each subplot independently. You can customize the
arrangement of subplots by changing the arguments to plt.subplots() (e.g., plt.subplots(2, 2) for a
2x2 grid) and by adjusting the layout using plt.tight_layout() to prevent overlapping subplots.
We now have two axes (the top with no tick labels) that are just touching: the
bottom of the upper panel (at position 0.5) matches the top of the lower panel (at
position 0.1+ 0.4).
If the bottom position of the second axes is changed, the two plots become separated from
each other, for example:
ax2 = fig.add_axes([0.1, 0.01, 0.8, 0.4])
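The two touching panels described above can be sketched with fig.add_axes(), which takes [left, bottom, width, height] in figure coordinates. The positions follow the text (upper panel starting at 0.5, lower panel at 0.1 with height 0.4); the plotted curves are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
# Upper panel: bottom edge at 0.5
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4], ylim=(-1.2, 1.2))
ax1.tick_params(labelbottom=False)   # hide the x tick labels of the upper panel
# Lower panel: bottom edge at 0.1, top edge at 0.1 + 0.4 = 0.5
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4], ylim=(-1.2, 1.2))

x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x))
plt.show()
```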
TEXT AND ANNOTATION IN MATPLOTLIB
Note that by default, the text is aligned above and to the left of the specified coordinates; here the
“.” at the beginning of each string will approximately mark the given coordinate location.
The transData coordinates give the usual data coordinates associated with the x- and y-axis
labels. The transAxes coordinates give the location from the bottom-left corner of the axes (here
the white box) as a fraction of the axes size.
The transFigure coordinates are similar, but specify the position from the bottom left of the
figure (here the gray box) as a fraction of the figure size.
Notice now that if we change the axes limits, it is only the transData coordinates that will be
affected, while the others remain stationary.
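A sketch of the three coordinate systems described above. The axes limits and text positions are illustrative; each string starts with "." so that the dot marks the anchor point.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])

# transData is the default: coordinates are data values on the x and y axes
t_data = ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
# transAxes: fractions of the axes size, measured from its bottom-left corner
t_axes = ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
# transFigure: fractions of the figure size, measured from its bottom-left corner
t_fig = ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure)

# Changing the limits moves only the transData text on screen
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)
plt.show()
```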
CUSTOMIZATION IN MATPLOTLIB
Customization in Matplotlib allows you to control various aspects of your plots, such as
colors, line styles, markers, fonts, and more. Here are some common customization options:
Changing Figure Size:
• Use figsize in plt.subplots() or plt.figure() to set the size of the figure
fig, ax = plt.subplots(figsize=(8, 6))
Changing Line Color, Style, and Width:
• Use color, linestyle, and linewidth parameters in plot functions to customize the lines.
plt.plot(x, y, color='red', linestyle='--', linewidth=2)
Changing Marker Style and Size:
• Use the marker, s (marker size), and c (color) parameters to customize markers in scatter
plots. (plt.plot() uses markersize and markerfacecolor instead.)
plt.scatter(x, y, marker='o', s=100, c='blue')
Setting Axis Limits:
• Use xlim() and ylim() to set the limits of the x and y axes.
plt.xlim(0, 10)
plt.ylim(0, 20)
THREE DIMENSIONAL PLOTTING IN MATPLOTLIB
Matplotlib provides a toolkit called mplot3d for creating 3D plots. You can create 3D
scatter plots, surface plots, wireframe plots, and more.
Three-Dimensional Points and Lines
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z)
triples.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');
plt.show()
Notice that by default, the scatter points have their transparency adjusted to give a sense
of depth on the page.
In analogy with the more common two-dimensional plots discussed earlier, we can create
these using the ax.plot3D and ax.scatter3D functions
Here's a basic example of creating a 3D scatter plot:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate random data
x = np.random.normal(size=500)
y = np.random.normal(size=500)
z = np.random.normal(size=500)
# Create a 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='b', marker='o')
# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Scatter Plot')
# Show plot
plt.show()
In this example, fig.add_subplot(111, projection='3d') creates a 3D subplot, and
ax.scatter(x, y, z, c='b', marker='o') creates a scatter plot in 3D space. You can customize the
appearance of the plot by changing parameters such as c (color), marker, and adding labels and a
title.
You can also create surface plots and wireframe plots using the plot_surface() and
plot_wireframe() functions, respectively. Here's an example of a 3D surface plot:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
# Create a 3D surface plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Surface Plot')
# Show plot
plt.show()
These examples demonstrate how to create basic 3D plots in Matplotlib. You can explore
the mplot3d toolkit and its functions to create more advanced 3D visualizations.
Three-Dimensional Contour Plots
mplot3d contains tools to create three-dimensional relief plots using the same
inputs as the two-dimensional contour plots.
Like two-dimensional ax.contour plots, ax.contour3D requires all the input data to
be in the form of two-dimensional regular grids, with the Z data evaluated at each
point.
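import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# Draw 50 contour levels of Z in three dimensions
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
plt.show()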
Two other types of three-dimensional plots that work on gridded data are
wireframes and surface plots.
These take a grid of values and project it onto the specified three-dimensional
surface, and can make the resulting three-dimensional forms quite easy to
visualize.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
plt.show()
Surface Triangulations
For some applications, the evenly sampled grids required by the preceding
routines are overly restrictive and inconvenient.
In these situations, the triangulation-based plots can be very useful.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5)
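The scatter plot above only shows the sampled points. A triangulation-based surface can be drawn from the same kind of unevenly sampled data with ax.plot_trisurf(), which first triangulates the (x, y) points. A self-contained sketch, assuming the same illustrative function f(x, y) = sin(sqrt(x**2 + y**2)) used earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

rng = np.random.default_rng(0)
theta = 2 * np.pi * rng.random(1000)   # random angles
r = 6 * rng.random(1000)               # random radii
x = r * np.sin(theta)
y = r * np.cos(theta)
z = f(x, y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# plot_trisurf triangulates the (x, y) points and fills the triangles
surface = ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
plt.show()
```

The result is less clean than a surface drawn from a regular grid, but it works with scattered samples that cannot be arranged into one.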
GEOGRAPHIC DATA WITH BASEMAP
Basemap is a toolkit for Matplotlib that allows you to create maps and plot geographic
data. It provides various map projections and features for customizing maps. Here's a basic
example of plotting geographic data using Basemap:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# Create a map
plt.figure(figsize=(10, 6))
m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='lightgray',lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
# Plot cities
lons = [-77.0369, -122.4194, 120.9660, -0.1276]
lats = [38.9072, 37.7749, 14.5995, 51.5074]
cities = ['Washington, D.C.', 'San Francisco', 'Manila', 'London']
x, y = m(lons, lats)
m.scatter(x, y, marker='o', color='r')
# Add city labels
for city, xpt, ypt in zip(cities, x, y):
    plt.text(xpt+50000, ypt+50000, city, fontsize=10, color='blue')
# Add a title
plt.title('Cities Around the World')
# Show the map
plt.show()
In this example, we first create a Basemap instance with the desired projection and map extent.
We then draw coastlines, countries, continents, and a map boundary. Next, we plot cities on the
map using the scatter() method and add labels for each city using plt.text(). Finally, we add a title
to the plot and display the map. Basemap offers a wide range of features for working with
geographic data, including support for various map projections, drawing political boundaries,
and plotting points, lines, and shapes on maps. You can explore the Basemap documentation for
more advanced features and customization options.
Map Projections
The Basemap package implements several dozen map projections, all referenced by a
short format code. Here we’ll briefly demonstrate some of the more common ones.
Cylindrical projections
Pseudo-cylindrical projections
Perspective projections
Conic projections
Cylindrical projection
The simplest of map projections are cylindrical projections, in which lines of
constant latitude and longitude are mapped to horizontal and vertical lines,
respectively.
This type of mapping represents equatorial regions quite well, but results in
extreme distortions near the poles.
The spacing of latitude lines varies between different cylindrical projections,
leading to different conservation properties, and different distortion near the
poles.
Other cylindrical projections are the Mercator (projection='merc') and the
cylindrical equal-area (projection='cea') projections.
The additional arguments to Basemap for this view specify the latitude (lat) and
longitude (lon) of the lower-left corner (llcrnr) and upper-right corner (urcrnr) for
the desired map, in units of degrees.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
# draw_map() is a user-defined helper (not part of Basemap, not shown in these notes)
draw_map(m)
Pseudo-cylindrical projections
Pseudo-cylindrical projections relax the requirement that meridians (lines of constant
longitude) remain vertical. The Mollweide projection (projection='moll') is one common
example.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)
draw_map(m)
Perspective projections
Perspective projections are constructed using a particular choice of perspective
point, similar to if you photographed the Earth from a particular point in space (a
point which, for some projections, technically lies within the Earth!).
One common example is the orthographic projection (projection='ortho'), which
shows one side of the globe as seen from a viewer at a very long distance.
Thus, it can show only half the globe at a time.
Other perspective-based projections include the gnomonic projection
(projection='gnom') and stereographic projection (projection='stere').
These are often the most useful for showing small portions of the map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m);
Political boundaries
drawcountries() - Draw country boundaries
drawstates() - Draw US state boundaries
drawcounties() - Draw US county boundaries
Map features
drawgreatcircle() - Draw a great circle between two points
drawparallels() - Draw lines of constant latitude
drawmeridians() - Draw lines of constant longitude
drawmapscale() - Draw a linear scale on the map
Whole-globe images
bluemarble() - Project NASA’s blue marble image onto the map
shadedrelief() - Project a shaded relief image onto the map
etopo() - Draw an etopo relief image onto the map
warpimage() - Project a user-provided image onto the map
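VISUALIZATION WITH SEABORN
The examples in this section assume import seaborn as sns and the example dataset
tips = sns.load_dataset('tips').
Relational Plots:
• Seaborn provides functions for visualizing relationships between variables, such as
sns.relplot(), sns.scatterplot(), and sns.lineplot().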
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter')
Heatmaps:
• Seaborn can create heatmaps to visualize matrix-like data using sns.heatmap().
flights = sns.load_dataset('flights').pivot(index='month', columns='year', values='passengers')
sns.heatmap(flights, annot=True, fmt='d')
Pair Plots:
• Pairplots are useful for visualizing pairwise relationships in a dataset using sns.pairplot().
sns.pairplot(tips, hue='sex')
Styling and Themes:
• Seaborn allows you to customize the appearance of plots using styling functions
(sns.set(), sns.set_style(), sns.set_context()) and themes (sns.set_theme()).
Other Plots:
• Seaborn offers many other types of plots and customization options. The official Seaborn
documentation provides detailed examples and explanations for each type of plot.
Seaborn is built on top of Matplotlib and integrates well with Pandas, making it a powerful tool
for visualizing data in Python.
Faceted histograms
Sometimes the best way to view data is via histograms of subsets. Seaborn’s
FacetGrid makes this extremely simple.
We’ll take a look at some data that shows the amount that restaurant staff receive
in tips based on various indicator data.
Joint distributions
Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution
between different datasets, along with the associated marginal distributions.
Bar plots
Time series can be plotted with sns.catplot (named sns.factorplot before Seaborn 0.9).