DSF - Unit IV Notes

Data science fundamentals notes Anna University


P.S.N.A.

COLLEGE OF ENGINEERING & TECHNOLOGY


(An Autonomous Institution affiliated to Anna University, Chennai)
Kothandaraman Nagar, Muthanampatti (PO), Dindigul – 624 622.
Phone: 0451-2554032, 2554349 Web Link: www.psnacet.org
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Subject Code / Name : OCS353 / DATA SCIENCE FUNDAMENTALS
Year / Semester : IV/ VII ‘A’

SYLLABUS
UNIT IV DATA VISUALIZATION
Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density and
contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three
dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn.

Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. It can be used to create a wide range of plots and charts, including line
plots, bar plots, histograms, scatter plots, and more. Here's a basic overview of using Matplotlib
for plotting:
Installing Matplotlib :
• You can install Matplotlib using pip:
pip install matplotlib

Importing Matplotlib:
• Import the matplotlib.pyplot module, which provides a MATLAB-like plotting interface.
import matplotlib.pyplot as plt
Creating a Simple Plot:
• Use the plot() function to create a simple line plot.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.show()

Adding Labels and Title :


• Use xlabel(), ylabel(), and title() to add labels and a title to your plot.
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')
plt.title('Title')

Customizing Plot Appearance :


• Use various formatting options to customize the appearance of your plot.
plt.plot(x, y, color='red', linestyle='--', marker='o', label='data')
plt.legend()
Creating Multiple Plots:
• Use subplot() to create multiple plots in the same figure.
plt.subplot(2, 1, 1)
plt.plot(x, y)
plt.subplot(2, 1, 2)
plt.scatter(x, y)

R.GAYATHRI / AP-CSE UNIT-IV NOTES Data Science Fundamentals

Saving Plots:
• Use savefig() to save your plot as an image file (e.g., PNG, PDF, SVG).
plt.savefig('plot.png')
Other Types of Plots:
Matplotlib supports many other types of plots, including bar plots, histograms, scatter plots,
and more.
plt.bar(x, y)
plt.hist(data, bins=10)
plt.scatter(x, y)
Matplotlib provides a wide range of customization options and is highly flexible, making it a
powerful tool for creating publication-quality plots and visualizations in Python.

SIMPLE LINE PLOTS IN MATPLOTLIB


Creating a simple line plot in Matplotlib involves specifying the x-axis and y-axis values
and then using the plot() function to create the plot.
The simplest of all plots is the visualization of a single function y = f(x). Here we will take a
first look at creating a simple plot of this type. The figure (an instance of the class plt.Figure) can
be thought of as a single container that contains all the objects representing axes, graphics, text,
and labels. The axes (an instance of the class plt.Axes) is what we see above: a bounding box with
ticks and labels, which will eventually contain the plot elements that make up our visualization.
Here's a basic example:
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Create a simple line plot
plt.plot(x, y)
# Add labels and title
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Simple Line Plot')
# Display the plot
plt.show()
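
The figure and axes objects described above can also be handled directly. The following is a minimal sketch, assuming the non-interactive Agg backend so that it runs without a display; the variable names fig, ax and line are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook or GUI session
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# plt.subplots() returns the figure (the container) and one axes (the plotting area)
fig, ax = plt.subplots()
line, = ax.plot(x, y, color='red', linestyle='--', marker='o', label='data')
ax.set_xlabel('X-axis label')
ax.set_ylabel('Y-axis label')
ax.set_title('Simple Line Plot')
ax.legend()

# The axes is registered inside the figure, and the line inside the axes
print(ax in fig.axes)    # True
print(line in ax.lines)  # True
```

The plt.xlabel(), plt.ylabel() and plt.title() calls used earlier act on the current axes; the set_xlabel(), set_ylabel() and set_title() methods do the same on an explicit axes object.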

This code will create a simple line plot with the given x and y values, and display it with labeled
axes and a title. You can customize the appearance of the plot further by using additional
arguments in the plot() function, such as color, linestyle, and marker.
Line Colors and Styles
• The first adjustment you might wish to make to a plot is to control the line colors and
styles.
• To adjust the color, use the color keyword, which accepts a string argument
representing virtually any imaginable color. The color can be specified in a variety of
ways.
• If no color is specified, Matplotlib will automatically cycle through a set of default
colors for multiple lines.
Different forms of color representation:
Specify color by name - color='blue'
Short color code (rgbcmyk) - color='g'
Grayscale between 0 and 1 - color='0.75'
Hex code (RRGGBB from 00 to FF) - color='#FFDD44'
RGB tuple, values between 0 and 1 - color=(1.0,0.2,0.3)
All HTML color names supported - color='chartreuse'

• We can adjust the line style using the linestyle keyword.


Different line styles
linestyle='solid'
linestyle='dashed'
linestyle='dashdot'
linestyle='dotted'
Short assignment
linestyle='-' # solid
linestyle='--' # dashed
linestyle='-.' # dashdot
linestyle=':' # dotted
• linestyle and color codes can be combined into a single non-keyword argument to the
plt.plot() function:
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red

Axes Limits
• The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim()
functions. Passing the limits in reverse order displays that axis reversed, as in this example.
Example
plt.xlim(10, 0)
plt.ylim(1.2, -1.2);
• The plt.axis() method allows you to set the x and y limits with a single call, by passing a
list that specifies [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5]);
• Setting the aspect ratio to 'equal' makes one unit in x equal to one unit in y:
plt.axis('equal')
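
The effect of the limit functions can be checked by reading the limits back. This is a small sketch, assuming the Agg backend for headless use; the names flipped, xlimits and ylimits are introduced only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
plt.figure()
plt.plot(x, np.sin(x))

# Reversed arguments: the x-axis now runs from 10 down to 0
plt.xlim(10, 0)
flipped = plt.gca().xaxis_inverted()

# plt.axis() sets both ranges at once: [xmin, xmax, ymin, ymax]
plt.axis([-1, 11, -1.5, 1.5])

# Called without arguments, plt.xlim() and plt.ylim() return the current limits
xlimits, ylimits = plt.xlim(), plt.ylim()
print(flipped, xlimits, ylimits)
```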

Labeling Plots
The labeling of plots includes titles, axis labels, and simple legends.
Title - plt.title()
Label - plt.xlabel()
plt.ylabel()
Legend - plt.legend()

Example programs
Line color:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
ax.plot(x, np.sin(x));
plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported
OUTPUT:

Line style:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, linestyle='solid')
plt.plot(x, x + 1, linestyle='dashed')
plt.plot(x, x + 2, linestyle='dashdot')
plt.plot(x, x + 3, linestyle='dotted');
# For short, you can use the following codes:
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted

OUTPUT:

Axis limit with label and legend:


import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = plt.axes()
x = np.linspace(0, 10, 1000)
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5);
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.plot(x, np.cos(x), ':b', label='cos(x)')
plt.title("A Sine Curve")
plt.xlabel("x")
plt.ylabel("sin(x)");
plt.legend();
OUTPUT:

SIMPLE SCATTER PLOTS IN MATPLOTLIB
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot.
Instead of points being joined by line segments, here the points are represented individually
with a dot, circle, or other shape.
Creating a simple scatter plot in Matplotlib involves specifying the x-axis and y-axis
values and then using the scatter() function to create the plot. Here's a basic example:
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Create a simple scatter plot
plt.scatter(x, y)
# Add labels and title
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Simple Scatter Plot')
# Display the plot
plt.show()
This code will create a simple scatter plot with the given x and y values, and display it with
labeled axes and a title. You can customize the appearance of the plot further by using additional
arguments in the scatter() function, such as color, s (size of markers), and alpha (transparency).
Syntax
plt.plot(x, y, 'marker code', color='color name');
Example
plt.plot(x, y, 'o', color='black');
The third argument in the function call is a character that represents the type of symbol
used for the plotting. Just as you can specify options such as '-' and '--' to control the line style,
the marker style has its own set of short string codes.
Example
• Various symbols can be used as markers: ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']
• Shorthand combining line, symbol, and color codes is also allowed: plt.plot(x, y, '-ok');
• Additional arguments in plt.plot()
Other keyword arguments control the appearance of the lines and markers: color, marker size,
linewidth, marker face color, marker edge color, marker edge width, etc.
Example
plt.plot(x, y, '-p', color='gray', markersize=15, linewidth=4, markerfacecolor='white',
markeredgecolor='gray', markeredgewidth=2)
plt.ylim(-1.2, 1.2);

Scatter Plots with plt.scatter


• A second, more powerful method of creating scatter plots is the plt.scatter function,
which can be used very similarly to the plt.plot function:
plt.scatter(x, y, marker='o');
• The primary difference of plt.scatter from plt.plot is that it can be used to create
scatter plots where the properties of each individual point (size, face color, edge
color, etc.) can be individually controlled or mapped to data.
• The color argument (c) is automatically mapped to a color scale (displayed with the
colorbar() command), and the size argument (s) is the marker area in points squared.
• cmap – the colormap used to map the c values to colors. The available colormaps are
grouped below.

Perceptually Uniform Sequential


['viridis', 'plasma', 'inferno', 'magma']
Sequential
['Greys','Purples','Blues','Greens','Oranges','Reds','YlOrBr','YlOrRd',
'OrRd','PuRd','RdPu','BuPu','GnBu','PuBu','YlGnBu','PuBuGn','BuGn', 'YlGn']
Sequential (2)
['binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', 'spring', 'summer',
'autumn','winter','cool','Wistia','hot','afmhot','gist_heat','copper']
Diverging
['PiYG','PRGn','BrBG','PuOr','RdGy','RdBu','RdYlBu','RdYlGn','Spectral', 'coolwarm', 'bwr',
'seismic']
Qualitative
['Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'Set1', 'Set2', 'Set3', 'tab10', 'tab20', 'tab20b',
'tab20c']
Miscellaneous
['flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', 'gnuplot',
'gnuplot2', 'CMRmap', 'cubehelix', 'brg', 'hsv', 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral',
'gist_ncar']

Example programs
Simple scatter plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');

Scatter plot with edge color, face color, size, and width of marker. (Scatter plot with line)
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 20)
y = np.sin(x)
plt.plot(x, y, '-o', color='gray', markersize=15, linewidth=4,
markerfacecolor='yellow', markeredgecolor='red',
markeredgewidth=4)
plt.ylim(-1.5, 1.5);

Scatter plot with random colors, size and transparency
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar()

VISUALIZING ERRORS IN MATPLOTLIB


For any scientific measurement, accurate accounting for errors is nearly as important, if
not more important, than accurate reporting of the number itself. For example, imagine that I am
using some astrophysical observations to estimate the Hubble Constant, the local measurement
of the expansion rate of the Universe. In visualization of data and results, showing these errors
effectively can make a plot convey much more complete information.
Types of errors
• Basic Errorbars
• Continuous Errors

Basic Errorbars
A basic errorbar can be created with a single Matplotlib function call.
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid') # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)
plt.errorbar(x, y, yerr=dy, fmt='.k');

• Here the fmt is a format code controlling the appearance of lines and points, and
has the same syntax as the shorthand used in plt.plot().
• In addition to these basic options, the errorbar function has many options to fine-tune
the outputs. Using these additional options you can easily customize the
aesthetics of your errorbar plot.

plt.errorbar(x, y, yerr=dy, fmt='o', color='black',ecolor='lightgray', elinewidth=3, capsize=0);

Continuous Errors
 In some situations it is desirable to show errorbars on continuous quantities.
Though Matplotlib does not have a built-in convenience routine for this type of
application, it’s relatively easy to combine primitives like plt.plot and
plt.fill_between for a useful result.
 Here we’ll perform a simple Gaussian process regression (GPR), using the Scikit-
Learn API. This is a method of fitting a very flexible nonparametric function to data
with a continuous measure of the uncertainty.
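
The combination of plt.plot and plt.fill_between mentioned above can be sketched as follows. Note the assumptions: this sketch does not use Gaussian process regression or Scikit-Learn; a degree-5 polynomial fit from np.polyfit stands in for the model, and a constant band of two residual standard deviations stands in for the uncertainty. The data and all variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative noisy measurements of sin(x)
rng = np.random.RandomState(1)
xdata = np.linspace(0, 10, 30)
ydata = np.sin(xdata) + 0.3 * rng.randn(30)

# Stand-in model (NOT a Gaussian process): degree-5 polynomial least-squares fit
coeffs = np.polyfit(xdata, ydata, deg=5)
xfit = np.linspace(0, 10, 200)
yfit = np.polyval(coeffs, xfit)

# Assumed uncertainty: two standard deviations of the fit residuals
dyfit = 2 * np.std(ydata - np.polyval(coeffs, xdata))

plt.figure()
plt.plot(xdata, ydata, 'or')             # measurements
plt.plot(xfit, yfit, '-', color='gray')  # fitted curve
# Shaded band between the lower and upper bounds of the fit
plt.fill_between(xfit, yfit - dyfit, yfit + dyfit, color='gray', alpha=0.2)
plt.xlim(0, 10)
```

With a real Gaussian process the band width would vary with x; fill_between accepts arrays for both bounds, so the same call applies.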

Visualizing errors in Matplotlib can be done using error bars or shaded regions to represent
uncertainty or variability in your data. Here are two common ways to visualize errors:
1. Error Bars:
Use the errorbar() function to plot data points with error bars
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
yerr = [0.5, 0.3, 0.7, 0.4, 0.8] # Error values
plt.errorbar(x, y, yerr=yerr, fmt='o', capsize=5)
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Error Bar Plot')
plt.show()

2. Shaded Regions:
Use the fill_between() function to plot shaded regions representing errors or uncertainties.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
y = np.sin(x)
error = 0.1 # Error value
plt.plot(x, y)
plt.fill_between(x, y - error, y + error, alpha=0.2)
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Shaded Error Region')
plt.show()
These examples demonstrate how to visualize errors in your data using Matplotlib. You
can adjust the error values and plot styles to suit your specific needs and data.

DENSITY AND CONTOUR PLOTS IN MATPLOTLIB


Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded
regions. There are three Matplotlib functions that can be helpful for this task:
 plt.contour for contour plots,
 plt.contourf for filled contour plots, and
 plt.imshow for showing images.
Visualizing a Three-Dimensional Function
A contour plot can be created with the plt.contour function. It takes three arguments:
 a grid of x values,
 a grid of y values, and
 a grid of z values.
The x and y values represent positions on the plot,
and the z values will be represented by the contour levels.
The way to prepare such data is to use the np.meshgrid function, which builds two-dimensional
grids from one- dimensional arrays:

Example
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
• Notice that by default when a single color is used, negative values are represented
by dashed lines, and positive values by solid lines.
• Alternatively, you can color-code the lines by specifying a colormap with the cmap
argument.
• We’ll also specify that we want more lines to be drawn: 20 equally spaced
intervals within the data range.
plt.contour(X, Y, Z, 20, cmap='RdGy');
• A filled contour plot uses the same syntax with plt.contourf():
plt.contourf(X, Y, Z, 20, cmap='RdGy');
• One potential issue with the filled plot is that it is a bit “splotchy.” That is, the color steps
are discrete rather than continuous, which is not always what is desired.
• You could remedy this by setting the number of contours to a very high number,
but this results in a rather inefficient plot: Matplotlib must render a new polygon
for each step in the level.
• A better way to handle this is to use the plt.imshow() function, which interprets a
two-dimensional grid of data as an image.
There are a few potential gotchas with imshow().
• plt.imshow() doesn’t accept an x and y grid, so you must manually specify the extent
[xmin, xmax, ymin, ymax] of the image on the plot.
• plt.imshow() by default follows the standard image array definition where the origin is in
the upper left, not in the lower left as in most contour plots. This must be changed when showing
gridded data.
• plt.imshow() will automatically adjust the axis aspect ratio to match the input data; you
can change this by calling, for example, plt.axis('image') to make x and y units match.

Finally, it can sometimes be useful to combine contour plots and image plots. Here we’ll use a
partially transparent background image (with transparency set via the alpha parameter) and
over-plot contours with labels on the contours themselves (using the plt.clabel() function):

contours = plt.contour(X, Y, Z, 3, colors='black')


plt.clabel(contours, inline=True, fontsize=8)
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy', alpha=0.5)
plt.colorbar();
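
The snippet above relies on X, Y and Z being defined already. A self-contained sketch of the same idea is given here, assuming the Agg backend for headless use; it also prints the grid shapes that np.meshgrid produces and passes the image to plt.colorbar() explicitly, which is a choice made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)  # both grids have shape (len(y), len(x))
Z = f(X, Y)
print(X.shape, Z.shape)   # (40, 50) (40, 50)

plt.figure()
# extent gives the data coordinates of the image edges;
# origin='lower' puts row 0 of Z at the bottom, as in a contour plot
im = plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy', alpha=0.5)
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)
plt.colorbar(im)  # colorbar for the image, not for the contour lines
```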

Example Program
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# extent must match the range of x and y (0 to 5)
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy')
plt.colorbar()

Density and contour plots are useful for visualizing the distribution and density of data
points in a 2D space. Matplotlib provides several functions to create these plots, such as
hist2d() and imshow() for density plots and contour() for contour plots. Here's how you can create them:
1. Density Plot (hist2d):
Use the hist2d() function to create a density plot from a two-dimensional histogram. A kernel
density estimation (KDE) is an alternative way to calculate the density.
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
x = np.random.normal(size=1000)
y = np.random.normal(size=1000)
# Create density plot
plt.figure(figsize=(8, 6))
plt.hist2d(x, y, bins=30, cmap='Blues')
plt.colorbar(label='Density')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Density Plot')
plt.show()
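
The same density plot can also be drawn with imshow() by computing the two-dimensional histogram first. This is a sketch under the assumption that np.histogram2d is an acceptable way to do the counting; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Count the points falling into each cell of a 30 x 30 grid
counts, xedges, yedges = np.histogram2d(x, y, bins=30)

plt.figure()
# counts is indexed [x-bin, y-bin]; transpose so that image rows follow y
im = plt.imshow(counts.T, origin='lower',
                extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
                aspect='auto', cmap='Blues')
plt.colorbar(im, label='counts in bin')
```

Having the counts as an array is convenient when the values are needed for further calculation and not only for display.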

2. Contour Plot (contour):


Use the contour() function to create a contour plot. You can specify the number of
contour levels and the colormap.
import numpy as np
import matplotlib.pyplot as plt
# Generate a grid of sample points
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X**2 + Y**2)
# Create contour plot
plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=20, cmap='RdGy')
plt.colorbar(label='Intensity')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Contour Plot')
plt.show()
These examples demonstrate how to create density and contour plots in Matplotlib. You
can customize the plots by adjusting parameters such as the number of bins, colormap, and
contour levels to better visualize your data.

HISTOGRAMS IN MATPLOTLIB:
A histogram is a simple plot for summarizing a large data set. It shows a frequency
distribution: the number of observations that fall within each given interval.
1. Parameters:
• plt.hist() is used to plot a histogram. The hist() function takes an array of
numbers, passed as an argument, and builds the histogram from it.
• bins - A histogram displays numerical data by grouping data into "bins" of equal
width. Each bin is plotted as a bar whose height corresponds to how many data
points are in that bin. Bins are also sometimes called "intervals", "classes", or
"buckets".
• density - If True, the bar heights are normalized so that the total area of the
histogram is 1 (a probability density) instead of showing raw counts. Older Matplotlib
versions called this parameter normed; it has been removed from current releases.
• x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array
or a sequence of arrays which are not required to be of the same length.
• histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional. The type of histogram to
draw. Default is 'bar'.
  • 'bar' is a traditional bar-type histogram. If multiple data are given the bars
  are arranged side by side.
  • 'barstacked' is a bar-type histogram where multiple data are stacked on top
  of each other.
  • 'step' generates a lineplot that is by default unfilled.
  • 'stepfilled' generates a lineplot that is by default filled.
• align - {'left', 'mid', 'right'}, optional. Controls how the histogram is plotted.
Default is 'mid'.
  • 'left': bars are centered on the left bin edges.
  • 'mid': bars are centered between the bin edges.
  • 'right': bars are centered on the right bin edges.
• orientation - {'horizontal', 'vertical'}, optional.
If 'horizontal', barh will be used for bar-type histograms and the bottom kwarg will be the left
edges.
• color - color or array_like of colors or None, optional.
Color spec or sequence of color specs, one per dataset. Default (None) uses the standard line
color sequence.
• label - str or None, optional. Default is None.

Other parameter:
**kwargs - Patch properties. The ** syntax lets a Python function accept a variable number of
keyword arguments; here they are passed on to the patches that draw the bars.

Example
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white') # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)
plt.hist(data);

The hist() function has many options to tune both the calculation and the display; here’s an
example of a more customized histogram.

plt.hist(data, bins=30, alpha=0.5,histtype='stepfilled', color='steelblue',edgecolor='none');

The plt.hist docstring has more information on other customization options available. I find this
combination of histtype='stepfilled' along with some transparency alpha to be very useful when
comparing histograms of several distributions

x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)
kwargs = dict(histtype='stepfilled', alpha=0.3, bins=40)
plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
OUTPUT:

Histograms are a useful way to visualize the distribution of a single numerical variable.
Matplotlib provides the hist() function to create histograms. Here's a basic example:
import numpy as np
import matplotlib.pyplot as plt
# Generate random data
data = np.random.normal(loc=0, scale=1, size=1000)
# Create a histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
# Add labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Random Data')
# Display the plot
plt.show()

In this example, data is a NumPy array containing random data sampled from a normal
distribution. The hist() function creates a histogram with 30 bins, colored in sky blue with black
edges. The x-axis represents the values, and the y-axis represents the number of samples in each bin.
You can customize the appearance of the histogram by adjusting parameters such as bins, color,
edgecolor, and adding labels and a title to make the plot more informative.
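
The normalization option of hist() can be checked numerically. This is a short sketch, assuming the Agg backend for headless use: it confirms that with density=True the bars enclose unit area, and shows np.histogram as a way to get the raw counts without drawing anything. The variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(42)
data = rng.normal(loc=0, scale=1, size=1000)

plt.figure()
# plt.hist returns the bar heights, the bin edges and the drawn patches
heights, edges, patches = plt.hist(data, bins=30, density=True,
                                   histtype='stepfilled', alpha=0.5)

# Area = sum of (height x bin width); equals 1 up to rounding when density=True
area = np.sum(heights * np.diff(edges))

# np.histogram computes the raw counts without plotting
counts, _ = np.histogram(data, bins=30)
print(area, counts.sum())
```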

Two-Dimensional Histograms and Binnings


• We can create histograms in two dimensions by dividing points among two-dimensional
bins.
• We’ll start by defining some data: an x and y array drawn from a multivariate
Gaussian distribution.
• A simple way to plot a two-dimensional histogram is to use Matplotlib’s plt.hist2d()
function.
Example
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 1000).T
plt.hist2d(x, y, bins=30, cmap='Blues')
cb = plt.colorbar()
cb.set_label('counts in bin')

OUTPUT:

LEGENDS IN MATPLOTLIB:
Plot legends give meaning to a visualization, assigning labels to the various plot elements.
We previously saw how to create a simple legend; here we’ll take a look at customizing the
placement and aesthetics of the legend in Matplotlib

plt.plot(x, np.sin(x), '-b', label='Sine')
plt.plot(x, np.cos(x), '--r', label='Cosine')
plt.legend();
Legends in Matplotlib are used to identify different elements of a plot, such as lines,
markers, or
colors, and associate them with labels.
Here's how you can add legends to your plots:
1. Basic Legend:
• Use the legend() function to add a legend to your plot. You can specify the labels for each
element in the legend.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [1, 2, 3, 4, 5]
y2 = [5, 4, 3, 2, 1]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

CUSTOMIZING PLOT LEGENDS


Location and turn off the frame
We can specify the location of the legend and turn off its frame with the loc and frameon parameters.
ax.legend(loc='upper left', frameon=False)
fig
Number of columns
We can use the ncol parameter to specify the number of columns in the legend.
ax.legend(frameon=False, loc='lower center', ncol=2)
fig
Rounded box, shadow and frame transparency
We can use a rounded box (fancybox) or add a shadow, change the transparency (alpha value) of
the frame, or change the padding around the text.
ax.legend(fancybox=True, framealpha=1, shadow=True, borderpad=1)
fig
Customizing Legend Location:
• You can specify the location of the legend using the loc parameter. Common location
values are 'upper left', 'upper right', 'lower left', 'lower right'.
plt.legend(loc='upper left')
Adding Legend Title:
• You can add a title to the legend using the title parameter.
plt.legend(title='Legend Title')
Customizing Legend Labels:
• You can customize the labels in the legend by passing a list of labels to the labels
parameter.
plt.legend(labels=['Label 1', 'Label 2'])
Adding Legend to Specific Elements:
• You can add legends to specific plot elements by passing the label parameter to the plot
functions.
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
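Multiple Legends:
• Calling legend() a second time replaces the first legend instead of adding another one. To
show two legends, keep a reference to the first one and add it back with add_artist().
line1, = plt.plot(x, y1)
line2, = plt.plot(x, y2)
first = plt.legend([line1], ['Line 1'], loc='upper left')
plt.gca().add_artist(first) # keep the first legend on the axes
plt.legend([line2], ['Line 2'], loc='lower right')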

Choosing Elements for the Legend


 The legend includes all labeled elements by default. We can change which elements
and labels appear in the legend by using the objects returned by plot commands.
 The plt.plot() command is able to create multiple lines at once, and returns a list of
created line instances.
 Passing any of these to plt.legend() will tell it which to identify, along with the
labels we’d like to specify.
y = np.sin(x[:, np.newaxis] + np.pi * np.arange(0, 2, 0.5))
lines = plt.plot(x, y) # lines is a list of line instances
plt.legend(lines[:2], ['first', 'second']);
# Alternatively, apply the labels to the individual plot calls:
plt.plot(x, y[:, 0], label='first')
plt.plot(x, y[:, 1], label='second')
plt.plot(x, y[:, 2:])
plt.legend(framealpha=1, frameon=True);

Removing Legend:
• You can remove the legend from your plot by calling remove() on the existing legend object,
for example plt.gca().get_legend().remove().
These are some common ways to add and customize legends in Matplotlib. Legends are useful
for explaining the components of your plot and making it easier for viewers to understand the
data.

Multiple legends
It is only possible to create a single legend for the entire plot. If you try to create a second
legend using plt.legend() or ax.legend(), it will simply override the first one. We can work around
this by creating a new legend artist from scratch, and then using the lower-level ax.add_artist()
method to manually add the second artist to the plot.
Example
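import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
lines = ax.plot(x, np.sin(x), '-', x, np.cos(x), '--')
# first legend, added back by hand so that the second call does not replace it
first = ax.legend(lines[:1], ['sin(x)'], loc='upper right', frameon=False)
ax.add_artist(first)
ax.legend(lines[1:], ['cos(x)'], loc='lower center', frameon=True, shadow=True, borderpad=1, fancybox=True)
fig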

COLORS IN MATPLOTLIB
In Matplotlib, a color bar is a separate axes that can provide a key for the meaning of
colors in a plot. For continuous labels based on the color of points, lines, or regions, a labeled
color bar can be a great tool.
The simplest colorbar can be created with the plt.colorbar() function.
In Matplotlib, you can specify colors in several ways, including using predefined color
names, RGB or RGBA tuples, hexadecimal color codes, and more. Here's how you can specify
colors in Matplotlib:
1. Predefined Color Names:
• Matplotlib provides a set of predefined color names, such as 'red', 'blue', 'green', etc.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color='red') # Plot with red color
plt.show()

RGB or RGBA Tuples:


• You can specify colors using RGB or RGBA tuples, where each value ranges from 0 to 1.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color=(0.1, 0.2, 0.5)) # Plot with RGB color
plt.show()
Hexadecimal Color Codes:
• You can also specify colors using hexadecimal color codes.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color='#FF5733') # Plot with hexadecimal color
plt.show()
Short Color Codes:
• Matplotlib also supports short color codes, such as 'r' for red, 'b' for blue, 'g' for green, etc.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], color='g') # Plot with green color

plt.show()
Color Maps:
• You can use color maps (colormaps) to automatically assign colors based on a range of values.
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.scatter(x, y, c=x, cmap='viridis') # Scatter plot with colormap
plt.colorbar() # Add colorbar to show the mapping
plt.show()
These are some common ways to specify colors in Matplotlib. Using colors effectively can
enhance the readability and visual appeal of your plots.

Customizing Colorbars Choosing color map.


We can specify the colormap using the cmap argument to the plotting function that is creating
the visualization. Broadly, there are three different categories of colormaps:
• Sequential colormaps - These consist of one continuous sequence of colors (e.g., binary
or viridis).
• Divergent colormaps - These usually contain two distinct colors, which show positive
and negative deviations from a mean (e.g., RdBu or PuOr).
• Qualitative colormaps - These mix colors with no particular sequence (e.g., rainbow or
jet).
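
To see what a colormap actually does, it can be called directly on numbers between 0 and 1. The sketch below assumes plt.get_cmap() for looking up a colormap by name and compares one sequential and one diverging map; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit in a notebook
import matplotlib.pyplot as plt

sequential = plt.get_cmap('viridis')
diverging = plt.get_cmap('RdBu')

# A colormap is callable: it maps a number in [0, 1] to an RGBA tuple
for value in (0.0, 0.5, 1.0):
    print(value, sequential(value), diverging(value))

# The diverging map is red at one end, near-white in the middle, blue at the other end
low_rgba = diverging(0.0)
mid_rgb = diverging(0.5)[:3]
high_rgba = diverging(1.0)
```

The near-white midpoint is what makes a diverging map suitable for data that deviates in both directions from a central value.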
Color limits and extensions
• Matplotlib allows for a large range of colorbar customization. The colorbar itself is simply
an instance of plt.Axes, so all of the axes and tick formatting tricks we’ve learned are applicable.
• We can narrow the color limits and indicate the out-of-bounds values with a triangular
arrow at the top and bottom by setting the extend property.
plt.subplot(1, 2, 2)
plt.imshow(I, cmap='RdBu')
plt.colorbar(extend='both')
plt.clim(-1, 1);
OUTPUT:
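The effect of narrowing the color limits can be sketched with a self-contained example. The test image below (a smooth pattern with about 1% noisy pixels) and all variable names are assumptions made for illustration; they are not part of the original notes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical test image: a smooth pattern with about 1% noisy "speckle" pixels
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])
speckles = rng.random(I.shape) < 0.01
I[speckles] = rng.normal(0, 3, np.count_nonzero(speckles))

fig, (ax_full, ax_clipped) = plt.subplots(1, 2, figsize=(10, 3.5))

# Left: default limits, so a few outliers stretch the color scale
im_full = ax_full.imshow(I, cmap='RdBu')
fig.colorbar(im_full, ax=ax_full)

# Right: clip the colors to [-1, 1] and flag out-of-range values with arrows
im_clipped = ax_clipped.imshow(I, cmap='RdBu', vmin=-1, vmax=1)
cbar = fig.colorbar(im_clipped, ax=ax_clipped, extend='both')
```

In a script, call plt.show() at the end to display the figure. Passing vmin and vmax to imshow has the same effect here as calling plt.clim(-1, 1) afterwards.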

Discrete colorbars
Colormaps are continuous by default, but sometimes you’d like to represent discrete
values. The easiest way to do this is to use the plt.get_cmap() function, passing the name of a
suitable colormap along with the number of desired bins. (Older code uses plt.cm.get_cmap(),
which has been removed from recent Matplotlib releases.)

plt.imshow(I, cmap=plt.get_cmap('Blues', 6))
plt.colorbar()
plt.clim(-1, 1)
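As a rough, self-contained sketch of a discrete colorbar, the example below resamples the Blues colormap into six bins. The image I is a hypothetical stand-in, since the notes do not show how it was created.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical smooth test image with values in [-1, 1]
x = np.linspace(0, 10, 200)
I = np.sin(x) * np.cos(x[:, np.newaxis])

# Resample the continuous 'Blues' colormap into 6 discrete bins
discrete_blues = plt.get_cmap('Blues', 6)

fig, ax = plt.subplots()
im = ax.imshow(I, cmap=discrete_blues, vmin=-1, vmax=1)
cbar = fig.colorbar(im, ax=ax)
```

The colorbar then shows six flat color bands instead of a smooth gradient.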

SUBPLOTS IN MATPLOTLIB
• Matplotlib has the concept of subplots: groups of smaller axes that can exist
together within a single figure.
• These subplots might be insets, grids of plots, or other more complicated layouts.
• We’ll explore four routines for creating subplots in Matplotlib:
o plt.axes: Subplots by Hand
o plt.subplot: Simple Grids of Subplots
o plt.subplots: The Whole Grid in One Go
o plt.GridSpec: More Complicated Arrangements
Subplots in Matplotlib allow you to create multiple plots within the same figure. You can
arrange subplots in a grid-like structure and customize each subplot independently. Here's a
basic example of creating subplots:
import matplotlib.pyplot as plt
import numpy as np
# Data for plotting
x = np.linspace(0, 2*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create a figure and a grid of subplots
fig, axs = plt.subplots(2, 1, figsize=(8, 6))
# Plot data on the first subplot
axs[0].plot(x, y1, label='sin(x)', color='blue')
axs[0].set_title('Plot of sin(x)')
axs[0].legend()
# Plot data on the second subplot
axs[1].plot(x, y2, label='cos(x)', color='red')
axs[1].set_title('Plot of cos(x)')
axs[1].legend()
# Adjust layout and display the plot
plt.tight_layout()
plt.show()
In this example, plt.subplots(2, 1) creates a figure with 2 rows and 1 column of subplots.
The axs variable is a NumPy array containing the axes objects for each subplot. You can then use
these axes objects to plot data and customize each subplot independently. You can customize the
arrangement of subplots by changing the arguments to plt.subplots() (e.g., plt.subplots(2, 2) for a
2x2 grid) and by adjusting the layout using plt.tight_layout() to prevent overlapping subplots.

plt.axes: Subplots by Hand

• The most basic method of creating an axes is to use the plt.axes function. As we’ve
seen previously, by default this creates a standard axes object that fills the entire
figure.
• plt.axes also takes an optional argument that is a list of four numbers in the figure
coordinate system.
• These numbers represent [left, bottom, width, height] in the figure coordinate
system, which ranges from 0 at the bottom left of the figure to 1 at the top right of
the figure.
For example,
we might create an inset axes at the top-right corner of another axes by setting the x and y
position to 0.65 (that is, starting at 65% of the width and 65% of the height of the figure) and the
x and y extents to 0.2 (that is, the size of the axes is 20% of the width and 20% of the height of
the figure).
import matplotlib.pyplot as plt
import numpy as np
ax1 = plt.axes() # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])
OUTPUT:

Vertical subplots

The equivalent of the plt.axes() command within the object-oriented interface is
fig.add_axes(). Let’s use this to create two vertically stacked axes.
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4], xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4], ylim=(-1.2, 1.2))
x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x))
OUTPUT:

• We now have two axes (the top with no tick labels) that are just touching: the
bottom of the upper panel (at position 0.5) matches the top of the lower panel (at
position 0.1 + 0.4).
• If the bottom position of the second axes is lowered, a gap appears between the two
plots. For example:
ax2 = fig.add_axes([0.1, 0.01, 0.8, 0.4])

plt.subplot: Simple Grids of Subplots

• Matplotlib has several convenience routines to align columns or rows of subplots.
• The lowest level of these is plt.subplot(), which creates a single subplot within a
grid.
• This command takes three integer arguments: the number of rows, the number of
columns, and the index of the plot to be created in this scheme, which runs from
the upper left to the bottom right.
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha='center')
OUTPUT:

plt.subplots: The Whole Grid in One Go
• The approach just described can become quite tedious when you’re creating a large
grid of subplots, especially if you’d like to hide the x- and y-axis labels on the inner plots.
• For this purpose, plt.subplots() is the easier tool to use (note the s at the end of
subplots).
• Rather than creating a single subplot, this function creates a full grid of subplots in
a single line, returning them in a NumPy array.
• The arguments are the number of rows and number of columns, along with
optional keywords sharex and sharey, which allow you to specify the relationships
between different axes.
• Here we’ll create a 2×3 grid of subplots, where all axes in the same row share their
y-axis scale, and all axes in the same column share their x-axis scale:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')
Note that by specifying sharex and sharey, we’ve automatically removed inner labels on
the grid to make the plot cleaner.
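A short sketch of how the returned array might be used: the axes are indexed as [row, column], starting from zero (unlike plt.subplot, whose index starts at 1). The labels written into each panel are purely illustrative.

```python
import matplotlib.pyplot as plt

# 2x3 grid; columns share x-axes and rows share y-axes
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

# ax is a 2D NumPy array indexed [row, column], starting from 0
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha='center')
```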

plt.GridSpec: More Complicated Arrangements


To go beyond a regular grid to subplots that span multiple rows and columns,
plt.GridSpec() is the best tool. The plt.GridSpec() object does not create a plot by itself; it is
simply a convenient interface that is recognized by the plt.subplot() command.
For example, a gridspec for a grid of two rows and three columns with some specified
width and height space looks like this:

grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)


From this we can specify subplot locations and extents using Python slicing syntax:
plt.subplot(grid[0, 0])
plt.subplot(grid[0, 1:])
plt.subplot(grid[1, :2])
plt.subplot(grid[1, 2])
OUTPUT:
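One common use of this kind of flexible grid is a scatter plot with marginal histograms. The following is a minimal sketch under assumed data (a synthetic correlated sample); the panel names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical correlated 2D sample
rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 1], [1, 2]], 3000).T

fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)

# The main panel spans the top three rows and the right three columns
main_ax = fig.add_subplot(grid[:-1, 1:])
# Marginal panels share the matching axis with the main panel
y_hist = fig.add_subplot(grid[:-1, 0], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], sharex=main_ax)

main_ax.plot(x, y, 'ok', markersize=3, alpha=0.2)
x_hist.hist(x, 40, histtype='stepfilled', color='gray')
x_hist.invert_yaxis()
y_hist.hist(y, 40, histtype='stepfilled', orientation='horizontal', color='gray')
y_hist.invert_xaxis()
```

Slices of the grid object select which cells each axes occupies, so a panel can span several rows or columns.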



TEXT AND ANNOTATION IN MATPLOTLIB
The most basic types of annotation are axes labels and titles; here we look at
some further text and annotation options.
• Text annotation can be done manually with the plt.text/ax.text command, which
places text at a particular x/y value.
• The ax.text method takes an x position, a y position, a string, and then optional
keywords specifying the color, size, style, alignment, and other properties of the
text. For example, ha='right' and ha='center' set the horizontal alignment (ha is
short for horizontal alignment).
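The points above can be sketched as follows. The curve, label strings, and positions are hypothetical and chosen only to show the three horizontal alignments.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker='o')

# Hypothetical labels placed at data coordinates (x, y)
style = dict(size=10, color='gray')
left_label = ax.text(1, 1.5, "start", ha='left', **style)
center_label = ax.text(2.5, 12, "y = x squared", ha='center', **style)
right_label = ax.text(4, 14, "end", ha='right', **style)
```

With ha='right' the text ends at the given x position, with ha='center' it is centered on it, and with ha='left' (the default) it starts there.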

Transforms and Text Position

• So far, text annotations have been anchored to data locations. Sometimes it’s preferable to
anchor the text to a position on the axes or figure, independent of the data. In
Matplotlib, we do this by modifying the transform.
• Any graphics display framework needs some scheme for translating between
coordinate systems.
• Mathematically, such coordinate transformations are relatively straightforward,
and Matplotlib has a well-developed set of tools that it uses internally to perform
them (the tools can be explored in the matplotlib.transforms submodule).
• There are three predefined transforms that can be useful in this situation:
o ax.transData - Transform associated with data coordinates
o ax.transAxes - Transform associated with the axes (in units of axes
dimensions)
o fig.transFigure - Transform associated with the figure (in units of figure
dimensions)
Example
import matplotlib.pyplot as plt
# This style was named 'seaborn-whitegrid' before Matplotlib 3.6
plt.style.use('seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])
# transform=ax.transData is the default, but we'll specify it anyway
ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure)
OUTPUT:

Note that by default the text is left-aligned with its baseline at the specified coordinates, so
it extends above and to the right of them; here the “.” at the beginning of each string
approximately marks the given coordinate location.

The transData coordinates give the usual data coordinates associated with the x- and y-axis
labels. The transAxes coordinates give the location from the bottom-left corner of the axes (here
the white box) as a fraction of the axes size.

The transFigure coordinates are similar, but specify the position from the bottom left of the
figure (here the gray box) as a fraction of the figure size.

Notice that if we change the axes limits, only the transData coordinates are
affected, while the others remain stationary.
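This behavior can be checked numerically. The sketch below converts each text anchor to display (pixel) coordinates before and after changing the axes limits; the helper display_position is a hypothetical name introduced for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])
data_text = ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
axes_text = ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)

def display_position(text):
    """Return the anchor point of a text object in display (pixel) coordinates."""
    return text.get_transform().transform(text.get_position())

before_data = display_position(data_text)
before_axes = display_position(axes_text)

# Change the axes limits: only the data-anchored text should move
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)

data_moved = not np.allclose(before_data, display_position(data_text))
axes_moved = not np.allclose(before_axes, display_position(axes_text))
print("data-anchored text moved:", data_moved)
print("axes-anchored text moved:", axes_moved)
```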

Arrows and Annotation

• Along with tick marks and text, another useful annotation mark is the simple
arrow.
• Drawing arrows in Matplotlib is often harder than expected. Although there is a
plt.arrow() function available, it is usually not the best choice.
• The arrows it creates are SVG (scalable vector graphics) objects that are subject
to the varying aspect ratio of your plots, and the result is rarely what the user
intended.
• The plt.annotate() function is generally a better option: it creates some text together
with an arrow, and the arrow style is controlled through the arrowprops dictionary,
which has numerous options available.
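As an illustrative sketch of the arrowprops options, the example below draws one straight arrow and one curved arrow on a cosine curve. The annotation texts and positions are assumptions chosen for the demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

# Straight arrow; 'shrink' leaves a small gap at both ends of the arrow
local_max = ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
                        arrowprops=dict(facecolor='black', shrink=0.05))

# Curved arrow drawn with a named arrow style and connection style
local_min = ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
                        arrowprops=dict(arrowstyle='->',
                                        connectionstyle='angle3,angleA=0,angleB=-90'))
```

Here xy is the point being annotated and xytext is where the label is placed; the arrow runs from the label to the point.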
Text and annotations in Matplotlib are used to add descriptive text, labels, and
annotations to your plots. Here's how you can add text and annotations:
1. Adding Text:
• Use the text() function to add text at a specific location on the plot.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.text(2, 10, 'Example Text', fontsize=12, color='red')



plt.show()
Adding Annotations:
Use the annotate() function to add annotations with arrows pointing to specific points on the
plot.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.annotate('Example Annotation', xy=(2, 4), xytext=(3, 8), arrowprops=dict(facecolor='black',
shrink=0.05))
plt.show()
Customizing Text Properties:
• You can customize the appearance of text and annotations using various properties like
fontsize, color, fontstyle, fontweight, etc.
plt.text(2, 10, 'Example Text', fontsize=12, color='red', fontstyle='italic', fontweight='bold')
Text Alignment:
• Use the ha and va parameters to specify horizontal and vertical alignment of text.
plt.text(2, 10, 'Example Text', ha='center', va='top')
Adding Mathematical Expressions:
• You can use LaTeX syntax to include mathematical expressions in text and annotations.
plt.text(2, 10, r'$\alpha > \beta$', fontsize=12)
Rotating Text:
• Use the rotation parameter to rotate text.
plt.text(2, 10, 'Example Text', rotation=45)
Adding Background Color:
• Use the bbox parameter to add a background color to text.
plt.text(2, 10, 'Example Text', bbox=dict(facecolor='red', alpha=0.5))
These are some common techniques for adding text and annotations to your plots in Matplotlib.
They can be useful for providing additional information and context to your visualizations.

CUSTOMIZATION IN MATPLOTLIB
Customization in Matplotlib allows you to control various aspects of your plots, such as
colors, line styles, markers, fonts, and more. Here are some common customization options:
1. Changing Figure Size:
• Use figsize in plt.subplots() or plt.figure() to set the size of the figure
fig, ax = plt.subplots(figsize=(8, 6))
Changing Line Color, Style, and Width:
• Use color, linestyle, and linewidth parameters in plot functions to customize the lines.
plt.plot(x, y, color='red', linestyle='--', linewidth=2)
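Changing Marker Style and Size:
• Use the marker, s (marker area in points squared), and c (color) parameters to customize
markers in scatter plots. (For plt.plot(), the corresponding parameters are marker,
markersize, and markerfacecolor.)
plt.scatter(x, y, marker='o', s=100, c='blue')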
Setting Axis Limits:
• Use xlim() and ylim() to set the limits of the x and y axes.
plt.xlim(0, 10)
plt.ylim(0, 20)



Setting Axis Labels and Title:
• Use xlabel(), ylabel(), and title() to set axis labels and plot title.
plt.xlabel('X-axis Label', fontsize=12)
plt.ylabel('Y-axis Label', fontsize=12)
plt.title('Plot Title', fontsize=14)
Changing Tick Labels:
• Use xticks() and yticks() to set custom tick labels on the x and y axes.
plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])
Adding Gridlines:
• Use grid() to add gridlines to the plot.
plt.grid(True)
Changing Font Properties:
• Use fontdict parameter in text functions to set font properties.
plt.text(2, 10, 'Example Text', fontdict={'family': 'serif', 'color': 'blue', 'size': 12})
Adding Legends:
• Use legend() to add a legend to the plot.
plt.legend(['Line 1', 'Line 2'], loc='upper left')
These are some common customization options in Matplotlib. You can combine these options to
create highly customized and visually appealing plots for your data.
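To show how these options combine, here is a minimal sketch that applies several of them to one figure through the object-oriented interface (ax.set_xlim() instead of plt.xlim(), and so on). The data and label strings are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: a straight line sampled at 50 points
x = np.linspace(0, 10, 50)
y = 2 * x

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, color='red', linestyle='--', linewidth=2, label='Line 1')
ax.scatter(x[::5], y[::5], marker='o', s=100, c='blue', label='Samples')

ax.set_xlim(0, 10)
ax.set_ylim(0, 20)
ax.set_xlabel('X-axis Label', fontsize=12)
ax.set_ylabel('Y-axis Label', fontsize=12)
ax.set_title('Plot Title', fontsize=14)
ax.set_xticks([0, 2, 4, 6, 8, 10])
ax.set_xticklabels(['A', 'B', 'C', 'D', 'E', 'F'])
ax.grid(True)
legend = ax.legend(loc='upper left')
```

Passing label to each plotting call and then calling legend() without a list keeps the legend entries tied to the right artists.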

THREE DIMENSIONAL PLOTTING IN MATPLOTLIB


We enable three-dimensional plots by importing the mplot3d toolkit, included with the
main Matplotlib installation.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
fig = plt.figure()
ax = plt.axes(projection='3d')
With this 3D axes enabled, we can now plot a variety of three-dimensional plot types.

Matplotlib provides a toolkit called mplot3d for creating 3D plots. You can create 3D
scatter plots, surface plots, wireframe plots, and more.
Three-Dimensional Points and Lines
The most basic three-dimensional plot is a line or scatter plot created from sets of (x, y, z)
triples.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens')
plt.show()
Notice that by default, the scatter points have their transparency adjusted to give a sense
of depth on the page.

In analogy with the more common two-dimensional plots discussed earlier, these are created
using the ax.plot3D and ax.scatter3D functions.
Here's a basic example of creating a 3D scatter plot:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate random data
x = np.random.normal(size=500)
y = np.random.normal(size=500)
z = np.random.normal(size=500)
# Create a 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='b', marker='o')
# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Scatter Plot')
# Show plot
plt.show()
In this example, fig.add_subplot(111, projection='3d') creates a 3D subplot, and
ax.scatter(x, y, z, c='b', marker='o') creates a scatter plot in 3D space. You can customize the
appearance of the plot by changing parameters such as c (color), marker, and adding labels and a
title.
You can also create surface plots and wireframe plots using the plot_surface() and
plot_wireframe() functions, respectively. Here's an example of a 3D surface plot:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate data
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
# Create a 3D surface plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Surface Plot')
# Show plot
plt.show()
These examples demonstrate how to create basic 3D plots in Matplotlib. You can explore
the mplot3d toolkit and its functions to create more advanced 3D visualizations.
Three-Dimensional Contour Plots
• mplot3d contains tools to create three-dimensional relief plots using the same
inputs as two-dimensional contour plots.
• Like two-dimensional ax.contour plots, ax.contour3D requires all the input data to
be in the form of two-dimensional regular grids, with the Z data evaluated at each
point.
• Here we’ll show a three-dimensional contour diagram of a three-dimensional
sinusoidal function.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Sometimes the default viewing angle is not optimal, in which case we can use the view_init
method to set the elevation and azimuthal angles.
ax.view_init(60, 35)
fig

Wireframes and Surface Plots

• Two other types of three-dimensional plots that work on gridded data are
wireframes and surface plots.
• These take a grid of values and project it onto the specified three-dimensional
surface, and can make the resulting three-dimensional forms quite easy to
visualize.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# X, Y, Z are the gridded arrays from the contour example above
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe')
plt.show()

• A surface plot is like a wireframe plot, but each face of the wireframe is a filled
polygon.
• Adding a colormap to the filled polygons can aid perception of the topology of the
surface being visualized.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# X, Y, Z are the gridded arrays from the contour example above
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis', edgecolor='none')
ax.set_title('surface')
plt.show()

Surface Triangulations
• For some applications, the evenly sampled grids required by the preceding
routines are overly restrictive and inconvenient.
• In these situations, the triangulation-based plots can be very useful.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
# Randomly sampled (non-gridded) points; f is the function from the contour example
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5)
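The scatter plot only shows the sampled points. As a sketch of the triangulation-based approach, Matplotlib's ax.plot_trisurf can build a surface from the same kind of unstructured points by first triangulating them in the x-y plane. The seeded random generator below is an assumption added for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# Randomly sampled (non-gridded) points inside a disc of radius 6
rng = np.random.default_rng(0)
theta = 2 * np.pi * rng.random(1000)
r = 6 * rng.random(1000)
x = r * np.sin(theta)
y = r * np.cos(theta)
z = f(x, y)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Triangulate the (x, y) points and draw the surface through the z values
surface = ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none')
```

The result is not as clean as a surface drawn from a regular grid, but it works for data that cannot easily be placed on one.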



GEOGRAPHIC DATA WITH BASEMAP IN MATPLOTLIB
• One common type of visualization in data science is that of geographic data.
• Matplotlib’s main tool for this type of visualization is the Basemap toolkit, which is
one of several Matplotlib toolkits that live under the mpl_toolkits namespace.
• Basemap is a useful tool for Python users to have in their virtual toolbelts.
• Basemap is installed separately from Matplotlib. Once you have the Basemap toolkit
installed and imported, geographic plots also require the PIL package in Python 2, or the
pillow package in Python 3.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5)

• The globe shown here is not a mere image: it is a fully functioning Matplotlib axes
that understands spherical coordinates and allows us to easily over-plot data on the map.
• We’ll use an etopo image (which shows topographical features both on land and
under the ocean) as the map background.
Program to display a particular area of the map:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6, lat_0=45, lon_0=-100)
m.etopo(scale=0.5, alpha=0.5)
Helper function to draw a map with latitude and longitude lines:
from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # the dictionary values contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='r')

Basemap is a toolkit for Matplotlib that allows you to create maps and plot geographic
data. It provides various map projections and features for customizing maps. Here's a basic
example of plotting geographic data using Basemap:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# Create a map
plt.figure(figsize=(10, 6))
m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='lightgray',lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
# Plot cities
lons = [-77.0369, -122.4194, 120.9660, -0.1276]
lats = [38.9072, 37.7749, 14.5995, 51.5074]
cities = ['Washington, D.C.', 'San Francisco', 'Manila', 'London']
x, y = m(lons, lats)
m.scatter(x, y, marker='o', color='r')
# Add city labels
for city, xpt, ypt in zip(cities, x, y):
    plt.text(xpt+50000, ypt+50000, city, fontsize=10, color='blue')
# Add a title
plt.title('Cities Around the World')
# Show the map
plt.show()
In this example, we first create a Basemap instance with the desired projection and map extent.
We then draw coastlines, countries, continents, and a map boundary. Next, we plot cities on the
map using the scatter() method and add labels for each city using plt.text(). Finally, we add a title
to the plot and display the map. Basemap offers a wide range of features for working with
geographic data, including support for various map projections, drawing political boundaries,
and plotting points, lines, and shapes on maps. You can explore the Basemap documentation for
more advanced features and customization options.

Map Projections
The Basemap package implements several dozen map projections, all referenced by a
short format code. Here we’ll briefly demonstrate some of the more common ones.
• Cylindrical projections
• Pseudo-cylindrical projections
• Perspective projections
• Conic projections
Cylindrical projection
• The simplest of map projections are cylindrical projections, in which lines of
constant latitude and longitude are mapped to horizontal and vertical lines,
respectively.
• This type of mapping represents equatorial regions quite well, but results in
extreme distortions near the poles.
• The spacing of latitude lines varies between different cylindrical projections,
leading to different conservation properties, and different distortion near the
poles.
• The example here uses the equidistant cylindrical projection (projection='cyl').
Other cylindrical projections are the Mercator (projection='merc') and the
cylindrical equal-area (projection='cea') projections.
• The additional arguments to Basemap for this view specify the latitude (lat) and
longitude (lon) of the lower-left corner (llcrnr) and upper-right corner (urcrnr) for
the desired map, in units of degrees.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None, llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
draw_map(m)
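To make the statement about latitude spacing and polar distortion concrete, the sketch below evaluates the standard textbook formulas for two cylindrical projections on a unit sphere using NumPy only. It is an illustration of the underlying math, not Basemap's internal code, and the variable names are assumptions.

```python
import numpy as np

# Latitudes from the equator toward the pole, in degrees
lat_deg = np.array([0.0, 30.0, 60.0, 80.0])
lat = np.radians(lat_deg)

# Equidistant cylindrical ('cyl'): y is proportional to latitude
y_cyl = lat

# Mercator ('merc') on a unit sphere: y = ln(tan(pi/4 + lat/2))
y_merc = np.log(np.tan(np.pi / 4 + lat / 2))

# Both projections stretch east-west distances by 1 / cos(lat)
east_west_stretch = 1 / np.cos(lat)

for row in zip(lat_deg, y_cyl, y_merc, east_west_stretch):
    print("lat=%4.0f  y_cyl=%.3f  y_merc=%.3f  stretch=%.2f" % row)
```

The Mercator y values grow faster than the equidistant ones as latitude increases, and the east-west stretch factor grows without bound toward the poles, which is the distortion described above.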

Pseudo-cylindrical projections
• Pseudo-cylindrical projections relax the requirement that meridians (lines of
constant longitude) remain vertical; this can give better properties near the poles
of the projection.
• The Mollweide projection (projection='moll') is one common example of this, in
which all meridians are elliptical arcs.
• It is constructed so as to preserve area across the map: though there are
distortions near the poles, the area of small patches reflects the true area.
• Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and
Robinson (projection='robin') projections.
• The extra arguments to Basemap here refer to the central latitude (lat_0) and
longitude (lon_0) for the desired map.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None, lat_0=0, lon_0=0)
draw_map(m)

Perspective projections
• Perspective projections are constructed using a particular choice of perspective
point, similar to if you photographed the Earth from a particular point in space (a
point which, for some projections, technically lies within the Earth!).
• One common example is the orthographic projection (projection='ortho'), which
shows one side of the globe as seen from a viewer at a very long distance.
• Thus, it can show only half the globe at a time.
• Other perspective-based projections include the gnomonic projection
(projection='gnom') and stereographic projection (projection='stere').
• These are often the most useful for showing small portions of the map.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0)
draw_map(m)

Conic projections
• A conic projection projects the map onto a single cone, which is then unrolled.
• This can lead to very good local properties, but regions far from the focus point of
the cone may become very distorted.
• One example of this is the Lambert conformal conic projection (projection='lcc').
• It projects the map onto a cone arranged in such a way that two standard parallels
(specified in Basemap by lat_1 and lat_2) have well-represented distances, with
scale decreasing between them and increasing outside of them.
• Other useful conic projections are the equidistant conic (projection='eqdc') and the
Albers equal-area (projection='aea') projection.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)

Drawing a Map Background


The Basemap package contains a range of useful functions for drawing borders of
physical features like continents, oceans, lakes, and rivers, as well as political boundaries such as
countries and US states and counties.
The following are some of the available drawing functions that you may wish to explore
using IPython’s help features:
• Physical boundaries and bodies of water
o drawcoastlines() - Draw continental coast lines
o drawlsmask() - Draw a mask between the land and sea, for use with projecting images on
one or the other
o drawmapboundary() - Draw the map boundary, including the fill color for oceans
o drawrivers() - Draw rivers on the map
o fillcontinents() - Fill the continents with a given color; optionally fill lakes with another
color
• Political boundaries
o drawcountries() - Draw country boundaries
o drawstates() - Draw US state boundaries
o drawcounties() - Draw US county boundaries
• Map features
o drawgreatcircle() - Draw a great circle between two points
o drawparallels() - Draw lines of constant latitude
o drawmeridians() - Draw lines of constant longitude
o drawmapscale() - Draw a linear scale on the map
• Whole-globe images
o bluemarble() - Project NASA’s blue marble image onto the map
o shadedrelief() - Project a shaded relief image onto the map
o etopo() - Draw an etopo relief image onto the map
o warpimage() - Project a user-provided image onto the map

Plotting Data on Maps

• Perhaps the most useful piece of the Basemap toolkit is the ability to over-plot a
variety of data onto a map background.
• There are many map-specific functions available as methods of the Basemap instance. Some
of these map-specific methods are:
o contour()/contourf() - Draw contour lines or filled contours
o imshow() - Draw an image
o pcolor()/pcolormesh() - Draw a pseudocolor plot for irregular/regular meshes
o plot() - Draw lines and/or markers
o scatter() - Draw points with markers
o quiver() - Draw vectors
o barbs() - Draw wind barbs
o drawgreatcircle() - Draw a great circle

VISUALIZATION WITH SEABORN


The main idea of Seaborn is that it provides high-level commands to create a variety of
plot types useful for statistical data exploration, and even some statistical model fitting.
Seaborn is a Python visualization library based on Matplotlib that provides a high-level
interface for creating attractive and informative statistical graphics. It is particularly useful for
visualizing data from Pandas DataFrames and NumPy arrays. Seaborn simplifies the process of
creating complex visualizations such as categorical plots, distribution plots, and relational plots.
Histograms, KDE, and densities
• In statistical data visualization, often all you want is to plot histograms and joint
distributions of variables. We have seen that this is relatively straightforward in
Matplotlib.
• Rather than a histogram, we can get a smooth estimate of the distribution using a
kernel density estimation, which Seaborn does with sns.kdeplot:
import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    sns.kdeplot(data[col], fill=True)  # fill replaces the deprecated shade argument
• Histograms and KDE can be combined using sns.histplot with kde=True (older code uses
sns.distplot, which is deprecated):
sns.histplot(data['x'], kde=True)
sns.histplot(data['y'], kde=True)
• If we pass the full two-dimensional dataset to kdeplot, we will get a two-dimensional
visualization of the data.
• We can see the joint distribution and the marginal distributions together using
sns.jointplot.

Here's a brief overview of some of the key features of Seaborn:


1. Installation:
• You can install Seaborn using pip:
pip install seaborn
Importing Seaborn:
• Import Seaborn as sns conventionally:
import seaborn as sns
Loading Example Datasets:
• Seaborn provides several built-in datasets for practice and exploration:
tips = sns.load_dataset('tips')
Categorical Plots:
• Seaborn provides several functions for visualizing categorical data, such as sns.catplot(),
sns.barplot(), sns.countplot(), and sns.boxplot().
sns.catplot(x='day', y='total_bill', data=tips, kind='box')
Distribution Plots:
• Seaborn offers various functions for visualizing distributions, including sns.histplot(),
sns.kdeplot(), and sns.displot(). (The older sns.distplot() is deprecated.)
sns.histplot(tips['total_bill'], kde=True)

Relational Plots:
• Seaborn provides functions for visualizing relationships between variables, such as
sns.relplot(), sns.scatterplot(), and sns.lineplot().
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter')
Heatmaps:
• Seaborn can create heatmaps to visualize matrix-like data using sns.heatmap().
flights = sns.load_dataset('flights').pivot(index='month', columns='year', values='passengers')
sns.heatmap(flights, annot=True, fmt='d')



Pairplots:
When you generalize joint plots to datasets of larger dimensions, you end up with pair
plots. This is very useful for exploring correlations between multidimensional data, when you’d
like to plot all pairs of values against each other.
We’ll demo this with the Iris dataset, which lists measurements of petals and sepals of
three iris species:
import seaborn as sns
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue='species', height=2.5)  # height was called size in old Seaborn releases

• Pairplots are useful for visualizing pairwise relationships in a dataset using sns.pairplot().
sns.pairplot(tips, hue='sex')
1. Styling and Themes:
• Seaborn allows you to customize the appearance of plots using styling functions
(sns.set(), sns.set_style(), sns.set_context()) and themes (sns.set_theme()).
2. Other Plots:
• Seaborn offers many other types of plots and customization options. The official Seaborn
documentation provides detailed examples and explanations for each type of plot.
Seaborn is built on top of Matplotlib and integrates well with Pandas, making it a powerful tool
for visualizing data in Python.
Faceted histograms
• Sometimes the best way to view data is via histograms of subsets. Seaborn’s
FacetGrid makes this extremely simple.
• We’ll take a look at some data that shows the amount that restaurant staff receive
in tips based on various indicator data.



Factor plots
Factor plots can be useful for this kind of visualization as well. They allow you to view the
distribution of a parameter within bins defined by any other parameter. In current Seaborn
releases, sns.factorplot has been replaced by sns.catplot.

Joint distributions
Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution
between different datasets, along with the associated marginal distributions.

Bar plots
Time series can be plotted as bar plots with sns.catplot, for example with kind='count'
(sns.factorplot in older Seaborn releases).
