
UNIT-5 (1)

This document provides an overview of data visualization techniques using Matplotlib in Python, covering various plot types such as line plots, scatter plots, histograms, and contour plots. It explains the dual interfaces of Matplotlib, customization options for plots, and methods for visualizing errors and density. Additionally, it includes examples of code for creating and customizing different types of visualizations.

Uploaded by

kavya sree bandi

UNIT-5 Data Visualization with Matplotlib

Python for Data Visualization: Visualization with matplotlib – box plot, line plots –
scatter plots – visualizing errors – density and contour plots – histograms, binnings,
and density – three-dimensional plotting – geographic data – data analysis using
StatsModels and seaborn – graph plotting using Plotly – interactive data visualization using
Bokeh.

Visualization with Matplotlib


 Matplotlib is a tool for visualization in Python.
 Matplotlib is a multiplatform data visualization library built on NumPy arrays, and designed
to work with the broader SciPy stack.
 It was conceived by John Hunter in 2002.
 One of Matplotlib’s most important features is its ability to play well with many operating
systems and graphics backends.
 Matplotlib supports dozens of backends and output types.

Two Interfaces of matplotlib:


A potentially confusing feature of Matplotlib is its dual interfaces:
 MATLAB-style interface
 Object-oriented interface.

MATLAB-Style Interface | Object-Oriented Interface
1. A convenient MATLAB-style, state-based interface. | 1. A more powerful object-oriented interface.
2. The MATLAB-style tools are contained in the pyplot (plt) interface. | 2. The tools are methods of explicit Figure and Axes objects.
3. Depends on some notion of an “active” figure or axes. | 3. Does not depend on an active figure or axes; plotting functions are called on explicit Figure and Axes objects.
4. Best choice for simple plots. | 4. Best choice for complicated plots.
5. Plotting is done with the plt.plot() function. | 5. Plotting is done with the ax.plot() method.
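The difference between the two interfaces can be illustrated with a minimal sketch that draws one curve through each. The data (a sine and a cosine curve) and the titles are illustrative choices, not part of the original notes.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# MATLAB-style (state-based) interface: pyplot keeps track of the "active" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("MATLAB-style interface")

# Object-oriented interface: call methods on explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.set_title("Object-oriented interface")

plt.show()
```

Both halves produce an equivalent kind of plot. The object-oriented form tends to pay off once a figure contains several axes, because each call names the axes it acts on.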

Simple Line Plots


 The simplest of all plots is the visualization of a single function y = f(x)
 For all Matplotlib plots, we start by creating a figure and an axes. In their simplest form, a
figure and axes can be created as follows:
Example:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes()

Output:

 In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single
container that contains all the objects representing axes, graphics, text, and labels.
 The axes (an instance of the class plt.Axes) are what we see above: a bounding box with ticks
and labels, which will eventually contain the plot elements that make up our visualization.

Example:
#Simple Line Plots
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 5, 7, 8]
plt.plot(x, y)
plt.show()
Output:

Example:
#Plot a Line Plot Logarithmically in Matplotlib
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 5, 10) # [0, 0.55, 1.11, 1.66, 2.22, 2.77, 3.33, 3.88, 4.44, 5]
y = np.exp(x) # [1, 1.74, 3.03, 5.29, 9.22, 16.08, 28.03, 48.85, 85.15, 148.41]
plt.plot(x, y)
plt.yscale('log') # logarithmic y-axis, so the exponential curve appears as a straight line
plt.show()

Output:

Example:
#Customizing Line Plots in Matplotlib
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(low=1, high=10, size=25)
plt.plot(x, color = 'blue', linewidth=3, linestyle='-.')
plt.show()

Output:

Controlling the appearance of the axes and lines in plots


Adjusting the Plot:
 Line Colors and Styles
 Axes Limits
 Labeling Plots
Adjusting the Plot: Line Colors and Styles
 The plt.plot() function takes additional arguments that can be used to specify line colors and
styles
 To adjust the color, we can use the color keyword, which accepts a string argument
representing virtually any imaginable color.
 The color can be specified in a variety of ways :

Example:

#Adjusting the Plot: Line Colors and Styles


import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 1000)
# Controlling the color of plot elements
plt.plot(x, np.sin(x - 0), color='blue') # specify color by name
plt.plot(x, np.sin(x - 1), color='g') # short color code (rgbcmyk)
plt.plot(x, np.sin(x - 2), color='0.75') # Grayscale between 0 and 1
plt.plot(x, np.sin(x - 3), color='#FFDD44') # Hex code (RRGGBB from 00 to FF)
plt.plot(x, np.sin(x - 4), color=(1.0,0.2,0.3)) # RGB tuple, values between 0 and 1
plt.plot(x, np.sin(x - 5), color='chartreuse'); # all HTML color names supported

Output:

Similarly, we can adjust the line style using the linestyle keyword
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 4, linestyle='-') # solid
plt.plot(x, x + 5, linestyle='--') # dashed
plt.plot(x, x + 6, linestyle='-.') # dashdot
plt.plot(x, x + 7, linestyle=':'); # dotted
plt.show()

Output:

These linestyle and color codes can be combined into a single non-keyword argument to the plt.plot() function.

Example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 1000)
plt.plot(x, x + 0, '-g') # solid green
plt.plot(x, x + 1, '--c') # dashed cyan
plt.plot(x, x + 2, '-.k') # dashdot black
plt.plot(x, x + 3, ':r'); # dotted red
plt.show()

Output:
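Axes limits, listed among the plot adjustments above, can be set with plt.xlim() and plt.ylim(). The following is a minimal sketch; the limit values are arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x))

# Set the visible range of each axis explicitly
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)

# With no arguments, plt.axis() returns (xmin, xmax, ymin, ymax) of the current axes
limits = plt.axis()
print(limits)
plt.show()
```

In the object-oriented interface the corresponding methods are ax.set_xlim() and ax.set_ylim().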

Labeling Plots
 Labeling of plots includes titles, axis labels, and simple legends.
 Titles and axis labels are the simplest such labels.

Example:
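#Labeling Plots
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x), label='sin(x)') # label is used by the legend
plt.title("Labeling Plots")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")
plt.legend()
plt.show()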
Output:

Scatter Plots
 Another commonly used plot type is the simple scatter plot, a close cousin of the line plot.
 Instead of points being joined by line segments, here the points are represented individually
with a dot, circle, or other shape.

Example:
#Scatter Plots with plt.plot
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, '*', color='blue')
plt.show()
Output:

 The third argument in the function call plt.plot is a character that represents the type of
symbol used for the plotting.
 Additional keyword arguments to plt.plot specify a wide range of properties of the lines and
markers.

Example:
#Properties of lines and markers
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, '-s', color='blue',
markersize=15, linewidth=2,
markerfacecolor='red',
markeredgecolor='green',
markeredgewidth=2)
plt.ylim(-1.2, 1.2)
plt.show()
Output:

Scatter Plots with plt.scatter


 A second, more powerful method of creating scatter plots is the plt.scatter function, which
can be used very similarly to the plt.plot function
Example:
#Scatter plot with plt.scatter
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.scatter(x, y,marker='s')
plt.show()

Output:

Plot Versus Scatter:


 The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots
where the properties of each individual point (size, face color, edge color, etc.) can be
individually controlled or mapped to data.
 While it doesn’t matter as much for small amounts of data, as datasets get larger than a few
thousand points, plt.plot can be noticeably more efficient than plt.scatter and for this reason,
plt.plot should be preferred over plt.scatter for large datasets.
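The per-point control described above can be sketched as follows. The random data, the seed, and the viridis colormap are assumptions made for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)      # fixed seed so the figure is reproducible
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)              # one color value per point
sizes = 1000 * rng.rand(100)        # one marker area (in points squared) per point

# c and s accept arrays, so color and size can each encode a data column
points = plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
plt.colorbar(points)                # show how the color values map to colors
plt.show()
```

Here the color and the size of each marker carry two extra variables, which plt.plot cannot do because it draws every marker with the same properties.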

Visualizing Errors
 For any scientific measurement, accurate accounting for errors is nearly as important, if not
more important, than accurate reporting of the number itself.
 In visualization of data and results, showing these errors effectively can make a plot convey
much more complete information.
Basic Error bars
 Error bars indicate the uncertainty of each data point in a plot, that is, how far the measured value may deviate from the actual value.
 Error bars often show the standard deviation of the measurements, while the plotted points show the measured values themselves.
 For graphs in research papers it is necessary to display the error along with the data points, so that the reader can see the amount of uncertainty present in the data.
 The function matplotlib.pyplot.errorbar() plots the given x values and y values and draws an error bar on each point, with the sizes passed in xerr and yerr.

Example:
#Visualizing Errors
#Basic Error Bars
import matplotlib.pyplot as plt
import numpy as np
x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
plt.plot(x,y)
plt.errorbar(x, y, xerr=0.3,yerr=0.2, fmt='s', color='red',ecolor='blue');
plt.show()

Output:

Example:
#Visualizing errors
import numpy as np
import matplotlib.pyplot as plt
x=[1,2,3,4,5]
y=[2,4,6,8,10]

plt.errorbar(x,y,xerr=0.3,yerr=0.5,fmt='o',color='red',ecolor='blue')
plt.show()

Output:
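The examples above use the same error for every point. As a hedged sketch of the standard-deviation case, the code below computes a separate error for each point from repeated measurements and passes it to yerr as an array. The measurements are synthetic (20 noisy samples of y = 2x), invented purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
x = np.arange(1, 6)
# Hypothetical data: 20 repeated noisy measurements of y = 2x at each x
samples = 2 * x + rng.normal(scale=0.8, size=(20, x.size))

y_mean = samples.mean(axis=0)          # plotted value at each x
y_std = samples.std(axis=0, ddof=1)    # sample standard deviation at each x

# yerr accepts an array, so every point gets its own error bar length
plt.errorbar(x, y_mean, yerr=y_std, fmt='o', color='red', ecolor='blue', capsize=4)
plt.show()
```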

Density and Contour Plots


Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions.
There are three Matplotlib functions that can be helpful for this task:
 plt.contour for contour plots
 plt.contourf for filled contour plots
 plt.imshow for showing images.
Visualizing a Three-Dimensional Function
A contour plot can be created with the plt.contour function. It takes three arguments:
 a grid of x values,
 a grid of y values, and
 a grid of z values.

The x and y values represent positions on the plot, and the z values will be represented by the
contour levels.
The way to prepare such data is to use the np.meshgrid function, which builds two-dimensional
grids from one-dimensional arrays:

Example:
#contour plots
import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X,Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
fig,ax=plt.subplots(1,1)
cp = ax.contourf(X, Y, Z)
ax.set_title('Filled Contours Plot')
ax.set_xlabel('x (cm)')
ax.set_ylabel('y (cm)')
plt.show()

Output:

Example:
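#Contour Plot
import numpy as np
import matplotlib.pyplot as plt
# same grid as in the filled contour example
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
plt.figure()
cp = plt.contour(X, Y, Z, colors='magenta')
plt.title('Contour Plot')
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.show()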

Output:
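The third function listed above, plt.imshow, has no example in these notes. A minimal sketch, reusing the same function of x and y as the contour examples, might look as follows; the colormap and title are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3.0, 3.0, 100)
y = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(x, y)     # both grids have shape (len(y), len(x))
Z = np.sqrt(X**2 + Y**2)

# imshow does not take x and y grids: extent gives [xmin, xmax, ymin, ymax],
# and origin='lower' places the first row of Z at the bottom of the image
image = plt.imshow(Z, extent=[-3, 3, -3, 3], origin='lower', cmap='viridis')
plt.colorbar(image)
plt.title('Image Plot')
plt.show()
```

Unlike plt.contour, plt.imshow colors every grid cell, so it shows the full range of values rather than a fixed set of levels.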

Histograms, Binnings, and Density


A histogram is a simple plot for summarizing a large data set. It is a graph of a frequency
distribution: it shows the number of observations that fall within each given interval.
Parameters:
 plt.hist() is used to plot a histogram. The hist() function takes an array of numbers, passed as an argument, and creates a histogram from it.
 bins - A histogram displays numerical data by grouping it into "bins" of equal width. Each bin is plotted as a bar whose height corresponds to how many data points are in that bin. Bins are also sometimes called "intervals", "classes", or "buckets".
 density - bool, optional. If True, the histogram is normalized so that the total area of the bars equals 1, giving a probability density instead of raw counts. Older Matplotlib releases called this parameter normed.
 x - (n,) array or sequence of (n,) arrays. Input values; this takes either a single array or a sequence of arrays, which are not required to be of the same length.
 histtype - {'bar', 'barstacked', 'step', 'stepfilled'}, optional. The type of histogram to draw. Default is 'bar'.
 'bar' is a traditional bar-type histogram. If multiple data are given the bars are arranged side by side.
 'barstacked' is a bar-type histogram where multiple data are stacked on top of each other.
 'step' generates a lineplot that is by default unfilled.
 'stepfilled' generates a lineplot that is by default filled.
 align - {'left', 'mid', 'right'}, optional. Controls how the histogram is plotted. Default is 'mid'.
 'left': bars are centered on the left bin edges.
 'mid': bars are centered between the bin edges.
 'right': bars are centered on the right bin edges.
 orientation - {'horizontal', 'vertical'}, optional. If 'horizontal', barh will be used for bar-type histograms and the bottom kwarg will be the left edges. Default is 'vertical'.
 color - color or array_like of colors or None, optional. Color spec or sequence of color specs, one per dataset. Default (None) uses the standard line color sequence.
 label - str or None, optional. Default is None.
Other parameters:
 **kwargs - Patch properties. The ** syntax lets a Python function accept a variable number of keyword arguments; here they are passed on to the patches that make up the bars.
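Several of these parameters can be combined in one call. The sketch below is illustrative only: the data are random, and the parameter values (30 bins, a filled step outline, the color) are arbitrary choices. It uses the density keyword, which older Matplotlib releases called normed.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(42)
data = rng.randn(1000)

# 30 equal-width bins, normalized so the bar areas sum to 1, drawn as a filled outline
counts, bin_edges, patches = plt.hist(data, bins=30, density=True,
                                      histtype='stepfilled', alpha=0.5,
                                      color='steelblue', label='randn sample')
plt.legend()
plt.show()
```

plt.hist returns the bin heights, the bin edges, and the drawn patches, so the binned values can be reused without recomputing them. There is always one more bin edge than there are bins.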

Example:
#Histograms
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white') # this style was named 'seaborn-white' before Matplotlib 3.6
data = np.random.randn(1000)
print(plt.hist(data))

Output:

(array([ 6., 27., 73., 168., 234., 227., 167., 65., 25., 8.]),
 array([-3.04715424, -2.43211248, -1.81707073, -1.20202897, -0.58698722, 0.02805453, 0.64309629, 1.25813804, 1.87317979, 2.48822155, 3.1032633 ]),
 <BarContainer object of 10 artists>)

Example:
#Histograms,Binnings and Density
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
plt.hist(data["x"], alpha=0.5)
plt.hist(data["y"], alpha=0.5)
plt.show()

Output:

Kernel density estimation


 Another common method of evaluating densities in multiple dimensions is kernel density
estimation (KDE).
 KDE can be thought of as a way to “smear out” the points in space and add up the result to
obtain a smooth function.
 KDE has a smoothing length that effectively slides the knob between detail and smoothness.

Example:
#Kernel Density Estimation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
sns.kdeplot(data["x"], fill=True) # fill replaces the deprecated shade argument
sns.kdeplot(data["y"], fill=True)
plt.show()

Output:
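To make the "smear out and add up" idea concrete, the following sketch computes a one-dimensional KDE by hand with NumPy rather than with Seaborn. The helper gaussian_kde_1d is a hypothetical function written for this illustration, and the Gaussian kernel, the sample data, and the three bandwidth values are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.RandomState(0)
samples = rng.normal(size=500)
grid = np.linspace(-5, 5, 400)

# A small bandwidth keeps detail (and noise); a large one gives a smoother curve
for bandwidth in (0.1, 0.4, 1.0):
    plt.plot(grid, gaussian_kde_1d(samples, grid, bandwidth),
             label='bandwidth = %.1f' % bandwidth)
plt.legend()
plt.show()
```

The bandwidth plays the role of the smoothing length: it is the knob that trades detail against smoothness.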

Three-Dimensional Plotting
 Matplotlib was initially designed with only two-dimensional plotting in mind. Around the
time of the 1.0 release, some three-dimensional plotting utilities were built on top of
Matplotlib’s two-dimensional display, and the result is a convenient (if somewhat limited) set
of tools for three- dimensional data visualization.
 We enable three-dimensional plots by importing the mplot3d toolkit:
from mpl_toolkits import mplot3d
 We can create a three-dimensional axes by passing the keyword projection='3d' to any of the
normal axes creation routines:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
 Three-dimensional plotting is one of the functionalities that benefits immensely from viewing
figures interactively rather than statically in the notebook when running this code.
 Different three-dimensional plots are:
 Three-Dimensional Points and Lines (three-dimensional line or scatter plot)
 Three-dimensional contour plot
 Wireframe and Surface Plots
 Surface Triangulations (triangulated surface plot)

Example 1:
#Three-Dimensional Plotting
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')

Output:

Example 2:
#3-D line plot
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d
fig = plt.figure()
# Data for a three-dimensional line
ax = plt.axes(projection='3d')
z = np.linspace(0,15,1000)
x = np.sin(z)
y = np.sin(z)
ax.plot3D(x,y,z,'green') # plot3D joins the points with a line
plt.show()

Output:

Example 3:
#3-D Scatter plot
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d
fig = plt.figure()

ax = plt.axes(projection='3d')
z = 15*np.random.randn(100)
x= np.sin(z) +0.1*np.random.randn(100)
y= np.sin(z) +0.1*np.random.randn(100)
ax.scatter3D(x,y,z,color='blue')
plt.show()

Output:

Example 4:
#3D-Contour Plot
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)

y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='Reds')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()

Output:
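Wireframe and surface plots appear in the list of three-dimensional plot types but have no example in these notes. The sketch below draws both for the same function used in the contour example; the figure size, colors, and colormap are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d   # only needed on Matplotlib older than 3.2

def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure(figsize=(10, 4))

# Wireframe: draws only the grid lines of the surface
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_wireframe(X, Y, Z, color='black')
ax1.set_title('wireframe')

# Surface: fills each grid cell with a colored polygon
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
ax2.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax2.set_title('surface')

plt.show()
```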

Geographic Data
 One common type of visualization in data science is that of geographic data.
 Matplotlib’s main tool for geographic data visualization is the Basemap toolkit, which is
one of several Matplotlib toolkits that live under the mpl_toolkits namespace.

 Basemap feels a bit clunky to use, and often even simple visualizations take much longer
to render than you might hope.
 Even so, Basemap is a useful tool for Python users to have in their virtual toolbelts.
Map Projections
 Map projections are used to represent a spherical surface, such as that of the Earth, on a flat
map. It is impossible to do this without somehow distorting the surface or breaking its continuity.
 Depending on the intended use of the map projection, there are certain map features (e.g.,
direction, area, distance, shape, or other considerations) that are useful to maintain.

The Basemap package implements several projections, all referenced by a short format code. Some
of the more common ones are:
Cylindrical projections
Pseudo-cylindrical projections
Perspective projections
Conic projections
Cylindrical projection
 The simplest of map projections are cylindrical projections, in which lines of constant
latitude and longitude are mapped to horizontal and vertical lines, respectively.
 This type of mapping represents equatorial regions quite well, but results in extreme
distortions near the poles.
 The spacing of latitude lines varies between different cylindrical projections, leading to
different conservation properties, and different distortion near the poles.
 The example below uses the equidistant cylindrical projection (projection='cyl'). Other
cylindrical projections are the Mercator (projection='merc') and the cylindrical
equal-area (projection='cea') projections.
 The additional arguments to Basemap for this view specify the latitude (lat) and longitude
(lon) of the lower-left corner (llcrnr) and upper-right corner (urcrnr) for the desired map,
in units of degrees.

Example:
#Cylindrical Projection
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (10,8))
m = Basemap(projection='cyl',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180)
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawcountries(linewidth=1, linestyle='solid', color='k' )
m.drawmapboundary(fill_color='lightblue')
plt.title("Cylindrical Equidistant Projection", fontsize=20)
plt.show()

Output:

Pseudo-cylindrical projections
 Pseudo-cylindrical projections relax the requirement that meridians (lines of constant
longitude) remain vertical; this can give better properties near the poles of the projection.
 The Mollweide projection (projection='moll') is one common example of this, in which all
meridians are elliptical arcs
 It is constructed so as to preserve area across the map: though there are distortions near the
poles, the area of small patches reflects the true area.
 Other pseudo-cylindrical projections are the sinusoidal (projection='sinu') and Robinson
(projection='robin') projections.
 The extra arguments to Basemap here refer to the central latitude (lat_0) and longitude
(lon_0) for the desired map.

Example:
#Pseudo-Cylindrical Projection
import numpy as np
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(num=None, figsize=(8, 8) )
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,False],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mollweide Projection")
plt.show()
Output:

Perspective projections
 Perspective projections are constructed using a particular choice of perspective point,
similar to if you photographed the Earth from a particular point in space (a point which,
for some projections, technically lies within the Earth!).

 One common example is the orthographic projection (projection='ortho'), which shows one
side of the globe as seen from a viewer at a very long distance. Thus, it can show only half
the globe at a time.
 Other perspective-based projections include the
 gnomonic projection (projection='gnom') and
 stereographic projection (projection='stere').
 These are often the most useful for showing small portions of the map.

Example:
#Perspective Projections
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection='ortho',lat_0=0, lon_0=0) # named m to avoid shadowing the built-in map()
m.drawmapboundary(fill_color='aqua')
m.fillcontinents(color='coral',lake_color='aqua')
m.drawcoastlines()
x, y = m(0, 0) # convert (lon, lat) to map coordinates
m.plot(x, y, marker='1',color='m')
plt.show()

Output:

Conic Projections
 A conic projection projects the map onto a single cone, which is then unrolled.
 This can lead to very good local properties, but regions far from the focus point of the cone
may become very distorted.
 One example of this is the Lambert conformal conic projection (projection='lcc').
 It projects the map onto a cone arranged in such a way that two standard parallels (specified
in Basemap by lat_1 and lat_2) have well-represented distances, with scale decreasing
between them and increasing outside of them.
 Other useful conic projections are the equidistant conic (projection='eqdc') and the Albers
equal-area (projection='aea') projection

Example:
#Conic Projections
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(width=12000000,height=9000000,projection='lcc',resolution=None,
            lat_1=45.,lat_2=55,lat_0=50,lon_0=-107.)
m.etopo()
plt.show()

Output:
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for
integers).

Drawing a Map Background


The Basemap package contains a range of useful functions for drawing borders of physical features
like continents, oceans, lakes, and rivers, as well as political boundaries such as countries and US
states and counties.
The following are some of the available drawing functions :
 Physical boundaries and bodies of water
drawcoastlines() - Draw continental coast lines
drawlsmask() - Draw a mask between the land and sea, for use with projecting images on one or the other
drawmapboundary() - Draw the map boundary, including the fill color for oceans
drawrivers() - Draw rivers on the map
fillcontinents() - Fill the continents with a given color; optionally fill lakes with another color
 Political boundaries
drawcountries() - Draw country boundaries
drawstates() - Draw US state boundaries
drawcounties() - Draw US county boundaries
 Map features
drawgreatcircle() - Draw a great circle between two points
drawparallels() - Draw lines of constant latitude
drawmeridians() - Draw lines of constant longitude
drawmapscale() - Draw a linear scale on the map
 Whole-globe images
bluemarble() - Project NASA’s blue marble image onto the map
shadedrelief() - Project a shaded relief image onto the map
etopo() - Draw an etopo relief image onto the map
warpimage() - Project a user-provided image onto the map

Plotting Data on Maps


 One of the most useful features of the Basemap toolkit is the ability to over-plot a variety of data onto a map background.
 There are many map-specific functions available as methods of the Basemap instance.
 Some of these map-specific methods are:
contour()/contourf() - Draw contour lines or filled contours
imshow() - Draw an image
pcolor()/pcolormesh() - Draw a pseudocolor plot for irregular/regular meshes
plot() - Draw lines and/or markers
scatter() - Draw points with markers
quiver() - Draw vectors
barbs() - Draw wind barbs
drawgreatcircle() - Draw a great circle

Data Analysis using Seaborn


 Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color
defaults, defines simple high-level functions for common statistical plot types, and integrates
with the functionality provided by Pandas DataFrames.
 The main idea of Seaborn is that it provides high-level commands to create a variety of plot
types useful for statistical data exploration, and even some statistical model fitting.
 Seaborn has many of its own high-level plotting routines, but it can also overwrite
Matplotlib’s default parameters and in turn get even simple Matplotlib scripts to produce
vastly superior output.
 We can set the style by calling Seaborn’s set() method. By convention, Seaborn is imported
as sns.

import seaborn as sns
sns.set()


Rather than a histogram, we can get a smooth estimate of the distribution using a kernel
density estimation, which Seaborn does with sns.kdeplot

Example:
import numpy as np
import pandas as pd
import seaborn as sns
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    sns.kdeplot(data[col], fill=True) # fill replaces the deprecated shade argument

Output:

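Histograms and KDE can be combined using histplot with kde=True (older Seaborn code used the now-deprecated distplot for this):

sns.histplot(data['x'], kde=True, stat='density')
sns.histplot(data['y'], kde=True, stat='density')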

Data Analysis using StatsModels


 StatsModels is a Python module that provides classes and functions for the estimation of many
different statistical models, as well as for conducting statistical tests, and statistical data
exploration.
 StatsModels supports specifying models using R-style formulas and pandas
DataFrames.
 An extensive list of descriptive statistics, statistical tests, plotting functions, and result
statistics are available for different types of data and each estimator.
 Statsmodels is part of the Python scientific stack that is oriented towards data analysis, data
science and statistics.
 Statsmodels is built on top of the numerical libraries NumPy and SciPy, integrates with Pandas
for data handling, and uses Patsy for an R-like formula interface.
 Graphical functions are based on the Matplotlib library.
 Statsmodels provides the statistical backend for other Python libraries.
 Statsmodels is free software released under the Modified BSD (3-clause) license.
 Sample program: Linear Regression in Python using Statsmodels

Example:
# import packages
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# loading the csv file
df = pd.read_csv('headbrain1.csv')
print(df.head())
# fitting the model
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()
# model summary
print(model.summary())

Output:

Graph Plotting using Plotly


 Python Plotly Library is an open-source library that can be used for data visualization and
understanding data simply and easily.
 Plotly supports various types of plots like line charts, scatter plots, histograms, box plots, etc.
 Plotly has hover tool capabilities that allow us to detect any outliers or anomalies in a large
number of data points.
 It is visually attractive and suits a wide range of audiences.
 It allows endless customization of our graphs, which makes our plots more meaningful
and understandable for others.
 Plotly has some amazing features that make it preferable over other visualization tools or
libraries which are:
 Simplicity: The syntax is simple as each graph uses the same parameters.
 Interactivity: This allows users to interact with graphs on display, allowing for
a better storytelling experience. Zooming in and out, point value display, panning
graphs, and hovering over data values allow us to spot outliers or anomalies in
massive numbers of sample points.
 Attractivity: The visuals are very attractive and can be accepted by a wide range
of audiences.
 Customizability: It allows endless customization of graphs which makes plots
more meaningful and understandable for others. It allows users to personalize
graphs.
Modules of Plotly
1. Plotly Express: The plotly express module is used to produce professional and easy plots.
import plotly.express as px
2. Plotly Graph Objects: This is used to produce more complex plots than Plotly Express.
import plotly.graph_objects as go

Example:
#Data Visualization using plotly
import plotly.express as px
# Creating the Figure instance
fig = px.line(x=[1, 2, 3], y=[1, 2, 3])
# showing the plot
fig.show()

Output:

Interactive data visualization using Bokeh


 Bokeh is a data visualization library in Python that provides high-performance interactive
charts and plots.
 Bokeh output can be obtained in various mediums such as notebooks, HTML files, and a server. It is
possible to embed Bokeh plots in Django and Flask apps.
 Its best feature is highly interactive graphs and plots that target modern web browsers for
presentation.
 Bokeh helps us make elegant and concise charts, with a wide range of chart types available.
 Bokeh works primarily by converting the data source into JSON, which is then used as
input for BokehJS.
 Some of the best features of Bokeh are:
 Flexibility: Bokeh provides simple charts as well as custom charts for complex use cases.
 Productivity: Bokeh works easily with Pandas and Jupyter notebooks.
 Styling: We have control of our chart and can easily modify it using custom JavaScript.
 Open source: Bokeh provides a large number of examples and ideas to start with,
and it is distributed under the Berkeley Software Distribution (BSD) license.

 With bokeh, we can easily visualize large data and create different charts in an attractive and
elegant manner.
Benefits of Bokeh:
 Bokeh allows us to build complex statistical plots quickly and through simple commands.
 Bokeh provides output in various mediums such as HTML, notebook, and server.
 We can also embed Bokeh visualizations in Flask and Django apps.
 Bokeh can transform visualizations written in other libraries such as
matplotlib, seaborn, and ggplot (a feature of older Bokeh releases).
 Bokeh has flexibility for applying interaction, layouts and different styling option to
visualization.
 Bokeh provides two visualization interfaces to users:
 bokeh.models : A low level interface that provides high flexibility to application
developers.
 bokeh.plotting : A high level interface for creating visual glyphs.
Visualization with Bokeh
 Bokeh offers features that are both powerful and flexible, combining simplicity with highly advanced customization.
 It provides multiple visualization interfaces to the user, as shown below:

o Charts: a high-level interface that is used to build complex statistical plots quickly and simply.
o Plotting: an intermediate-level interface that is centered around composing visual glyphs.
o Models: a low-level interface that provides the maximum flexibility to application developers.
Example:

#Data Visualization using Bokeh


from bokeh.plotting import figure, output_file, show
# instantiating the figure object
graph = figure(title = "Bokeh Line Graph")
# the points to be plotted
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
# plotting the line graph
graph.line(x, y)
# displaying the model
show(graph)

Output:

Tutorial Questions:
1. What is Matplotlib? Specify the two interfaces it provides, with example Python code.
2. Write a Python program to draw a line plot using your own data, and explain the various
attributes of a line plot.
3. Write a Python program to visualize your own data using a scatter plot, and explain its
parameters.
4. Demonstrate the creation of a line plot and a scatter plot in Matplotlib using different parameters,
with corresponding Python code.
5. Briefly explain geographic data visualization with Basemap, with example programs.
(or) Demonstrate different common map projections with example programs.
(or) Explain in detail the functions of mpl_toolkits for geographic data visualization.
6. Elaborate on the usage of histograms for data exploration and explain their attributes.
7. Demonstrate the Matplotlib functions that can be helpful for displaying three-dimensional data in
two dimensions.
8. Explore Seaborn plots with example programs.
9. Elaborate on error visualization methods in pyplot.
10. Showcase three-dimensional drawing in Matplotlib with corresponding Python code.
(or) Discuss in detail the three-dimensional plotting facilities of Matplotlib.
11. Appraise the following with appropriate Python code: i) Histogram ii) Binning iii) Density.
12. Outline the data analysis using StatsModels with example programs.
13. Outline graph plotting using Plotly.
14. Describe interactive data visualization using Bokeh.
