Cs3353 Foundations of Data Science Unit V

UNIT V DATA VISUALIZATION

Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots –
Histograms – legends – colors – subplots – text and annotation – customization – three dimensional
plotting – Geographic Data with Basemap – Visualization with Seaborn.

1. Data Visualization
• Data visualization is the practice of translating information into a visual context, such as a
map or graph, to make data easier for the human brain to understand and pull insights
from. The main goal of data visualization is to make it easier to identify patterns, trends and
outliers in large data sets.
  ◦ The process of finding trends and correlations in our data by representing it
  pictorially is called Data Visualization.

Why is data visualization important?


Humans remember pictures better than words, and visuals are commonly said to be processed far
faster than text (a figure of 60,000 times is often quoted).

Data visualization is the graphical representation of information and data in a pictorial or
graphical format (Example: charts, graphs, and maps).

The raw data undergoes different stages within a pipeline, which are:
• Fetching the Data
• Cleaning the Data
• Data Visualization
• Modeling the Data
• Interpreting the Data
• Revision

Data visualization is an easy and quick way to convey concepts to others. It also has some more
specific uses:
• Data visualization can identify areas that need improvement or modification.
• Data visualization can clarify which factors influence customer behaviour.
• Data visualization helps you to understand which products to place where.
• Data visualization can help predict sales volumes.

Merits of using Data Visualization


• To make data easier to understand and remember.
• To discover unknown facts, outliers, and trends.
• To visualize relationships and patterns quickly.
• To make better decisions.
• To perform competitive analysis.
• To improve insights.

General Types of Visualizations


• Chart: Information presented in a tabular, graphical form with data displayed along two
axes. Can be in the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that represents certain
variables in comparison to each other, usually along two axes at a right angle.
Unit V CS3352 Foundations of Data Science 1
• Geospatial: A visualization that shows data in map form using different shapes and colors to
show the relationship between pieces of data and specific locations.
• Infographic: A combination of visuals and words that represent data. Usually uses charts or
diagrams.
• Dashboards: A collection of visualizations and data displayed in one place to help with
analyzing and presenting data.

1.1 Python in Data visualization


Python provides several libraries for visualizing data. Each library comes with its own features
and can support various types of graphs.
• Matplotlib
• Seaborn
• Bokeh
• Plotly

1.2 Matplotlib
Matplotlib is a visualization library in Python for 2D plots of arrays. It is a multi-platform
data visualization library built on NumPy arrays and designed to work with the broader
SciPy stack. It was introduced by John Hunter in the year 2002. One of the greatest benefits of
visualization is that it gives us visual access to huge amounts of data in an easily digestible form.
Matplotlib supports several plot types, such as line, bar, scatter and histogram.

Installation :
Run the following command to install the matplotlib package:
python -m pip install -U matplotlib

import matplotlib
Once Matplotlib is installed, import it in your applications by adding the import module statement:
from matplotlib import pyplot as plt
or
import matplotlib.pyplot as plt

matplotlib Version
The version string is stored under the __version__ attribute.
import matplotlib
print(matplotlib.__version__)
Output
3.4.3

Matplotlib Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule and are usually imported under the
alias plt:
import matplotlib.pyplot as plt
Now the Pyplot package can be referred to as plt.

1.3 Pyplot Simple


The plot() function draws a line from point to point and takes two parameters: plot(x, y). Parameter
1 is an array containing the points on the x-axis and Parameter 2 is an array containing the points on
the y-axis. If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and
[3, 10] to the plot function.
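
The two-point case described above can be sketched as follows. The variable names xpoints and ypoints are chosen here only for illustration.

```python
import matplotlib.pyplot as plt

# x-coordinates of the two end points: 1 and 8
xpoints = [1, 8]
# y-coordinates of the two end points: 3 and 10
ypoints = [3, 10]

# plot() pairs the arrays element-wise: (1, 3) and (8, 10)
line, = plt.plot(xpoints, ypoints)
plt.show()
```

The first array always holds the x-values and the second the y-values, so the point (1, 3) is split across the two arrays.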
/* Python program to plot line using matplotlib */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y)
plt.show()

Note:
Points plotted are {[5,10], [2,5], [9,8], [4,4], [7,2]}

/* Python program to plot line using numpy arrays */ Output :

import matplotlib.pyplot as plt


import numpy as np

x = np.array([0, 6])
y = np.array([0, 25])
plt.plot(x, y)
plt.show()

Markers
You can use the keyword argument marker to emphasize each point with a specified marker, and the
keyword argument markersize to set its size (for example, markersize = 15).
/* Python program to show marker */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,marker ='o', markersize=15)
plt.show()

Linestyle
You can use the keyword argument linestyle, or the shorter ls, to change the style of the plotted line.

/* Python program to show linestyle */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,linestyle='dotted',marker='*')
plt.show()

Note:
For a dashed line, use linestyle = 'dashed' (or the shorter ls = 'dashed'):
plt.plot(x,y,ls ='dashed',marker='*')
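
To compare the named line styles side by side, a small sketch such as the following may help. It reuses the x and y data from above and shifts each copy of the line upwards. The shift of 3 units is an arbitrary choice made only to keep the lines apart.

```python
import matplotlib.pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]

# The four named line styles; each copy of the line is shifted up
# by a different offset so that the styles can be compared.
styles = ['solid', 'dashed', 'dotted', 'dashdot']
lines = []
for offset, style in enumerate(styles):
    shifted = [value + 3 * offset for value in y]
    line, = plt.plot(x, shifted, ls=style, marker='*', label=style)
    lines.append(line)

plt.legend()
plt.show()
```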

Create Labels for a Plot


Use the xlabel() and ylabel() functions to set a label for the x- and y-axis, and the title() function
to set a title for the plot.
/* Python program to show xlabel,ylabel,title */ Output :

import matplotlib.pyplot as plt
x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,ls ='dashed',marker='*')
plt.title('Adhiparasathi Engineering College')
plt.xlabel('This is CSE class')
plt.ylabel('Foundations of Data Science')

plt.show()

Add Grid Lines to a Plot


With Pyplot, you can use the grid() function to add grid lines to the plot.
/* Python program to add Grid Lines */ Output :

import matplotlib.pyplot as plt


x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x,y,marker = 'o')
plt.grid()
plt.show()
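
The grid() function also accepts optional arguments. The following sketch assumes we want only the horizontal grid lines, drawn as thin green dashes. The particular color, style and width are illustrative choices.

```python
import matplotlib.pyplot as plt

x = [5, 2, 9, 4, 7]
y = [10, 5, 8, 4, 2]
plt.plot(x, y, marker='o')

# axis='y' draws only the grid lines that belong to the y-axis
# (the horizontal ones); color, linestyle and linewidth style them
plt.grid(axis='y', color='green', linestyle='--', linewidth=0.5)

ax = plt.gca()   # the current axes, kept here for inspection
plt.show()
```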

Display Multiple Plots


With the subplot() function you can draw multiple plots in one figure. The subplot() function takes
three arguments. First and second arguments are rows and columns and the third argument
represents the index of the current plot.

/* Python program to show multiple plots */ Output :

import matplotlib.pyplot as plt

#plot 1:
x = [0, 1, 2, 3]
y = [3, 8, 1, 10]
plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = [0, 1, 2, 3]
y = [10, 20, 30, 40]
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()

Note:
plt.subplot(2, 1, 1) means 2 rows, 1 column, and this plot is the first plot.
plt.subplot(2, 1, 2) means 2 rows, 1 column, and this plot is the second plot.
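
The same two plots can also be placed side by side by swapping the row and column counts. This is a minimal sketch. The titles "Plot 1" and "Plot 2" are added only to tell the two plots apart.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

# plot 1: first cell of a grid with 1 row and 2 columns (left)
left = plt.subplot(1, 2, 1)
plt.plot(x, [3, 8, 1, 10])
plt.title("Plot 1")

# plot 2: second cell of the same grid (right)
right = plt.subplot(1, 2, 2)
plt.plot(x, [10, 20, 30, 40])
plt.title("Plot 2")

plt.show()
```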

1.4 Matplotlib Scatter


With Pyplot, we can use the scatter() function to draw a scatter plot. The scatter() function plots one
dot for each observation. It needs two arrays of the same length, one for the values on the x-axis and
one for the values on the y-axis. The scatter() method takes the following parameters:
• x_axis_data - an array containing x-axis data
• y_axis_data - an array containing y-axis data
• s - marker size (can be a scalar or an array of the same size as x or y)
• c - color or sequence of colors for the markers
• marker - marker style
• cmap - colormap name
• linewidths - width of the marker border
• edgecolor - marker border color
• alpha - blending value, between 0 (transparent) and 1 (opaque)

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y=[99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y=[99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y, marker='*', c='red', s=200,
edgecolor='black' )

plt.show()

/* Python program to create scatter plots & color each dot */ Output :

import matplotlib.pyplot as plt

x1 = [26, 29, 48, 64, 6]
y1 = [26, 34, 90, 33, 38]
colors=["red","green","blue","yellow","violet"]
plt.scatter(x1, y1, c = colors, s = 200)
plt.show()

/* Python program to create scatter plots*/ Output :

import matplotlib.pyplot as plt


# dataset-1
x1 = [89, 43, 36, 36, 95, 10]
y1 = [21, 46, 3, 35, 67, 95]
plt.scatter(x1, y1, c ="pink", marker ="s", edgecolor ="green", s =50)

# dataset2
x2 = [26, 29, 48, 64, 6]
y2 = [26, 34, 90, 33, 38]
plt.scatter(x2, y2, c ="yellow", marker ="^", edgecolor ="red", s =200)
plt.show()

Add a legend to a scatter plot in Matplotlib


/* Python program to add legends*/ Output :

import matplotlib.pyplot as plt

x1 = [89, 43, 36, 36, 95, 10]


y1 = [21, 46, 3, 35, 67, 95]
plt.scatter(x1, y1, c ="pink", marker
="s", edgecolor ="green", s =50)

x2 = [26, 29, 48, 64, 6]


y2 = [26, 34, 90, 33, 38]
plt.scatter(x2, y2, c ="yellow", marker
="^", edgecolor ="red", s =200)

# apply legend()
plt.legend(["supply" , "sales"])
plt.show()

Note:
The legend can also be laid out in two columns and placed at the lower right:
plt.legend(["supply" , "sales"], ncol = 2 , loc = "lower right")
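
An alternative way to name the legend entries, shown here only as a sketch, is to pass a label to each scatter() call and then call legend() without a list. This keeps each name next to the data it describes.

```python
import matplotlib.pyplot as plt

x1 = [89, 43, 36, 36, 95, 10]
y1 = [21, 46, 3, 35, 67, 95]
x2 = [26, 29, 48, 64, 6]
y2 = [26, 34, 90, 33, 38]

# Attach the legend text to each data set through the label argument
plt.scatter(x1, y1, c="pink", marker="s", edgecolor="green", s=50, label="supply")
plt.scatter(x2, y2, c="yellow", marker="^", edgecolor="red", s=200, label="sales")

# legend() picks up the labels; ncol and loc control the layout
legend = plt.legend(ncol=2, loc="lower right")
plt.show()
```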
/* Python program to add legends*/ Output :

import matplotlib.pyplot as plt


x1 = [26, 29, 48, 64, 6]
y1 = [26, 34, 90, 33, 38]
plt.scatter(x1, y1, c ="yellow", marker
="^", edgecolor ="red", s =200)

x2 = [89, 43, 36, 36, 95, 10]


y2 = [21, 46, 3, 35, 67, 95]
plt.scatter(x2, y2, c ="pink", marker
="s", edgecolor ="green", s =50)
plt.legend(["supply" , "sales"])

plt.title("Scatter Plot Demo ", fontsize=22)
plt.xlabel('FODS',fontsize=20)
plt.ylabel('II-CSE',fontsize=20)
plt.show()

ColorMap
The Matplotlib module has a number of available colormaps. A colormap is like a list of colors,
where each color has a value that ranges from 0 to 100. The colormap used here is called 'viridis';
it ranges from 0, which is a purple color, up to 100, which is a yellow color.
How to Use the ColorMap?
Specify the colormap with the keyword argument cmap with the value of the colormap, in this
case 'viridis', which is one of the built-in colormaps available in Matplotlib. In addition,
create an array with values (from 0 to 100), one value for each point in the scatter plot. Some of the
available colormaps are Accent, Blues, BuPu, BuGn, CMRmap, Greens, Greys, Dark2 etc.
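
To see which color a given value receives, the colormap can be queried directly. This is a sketch under one assumption: the values are scaled from the 0 to 100 range used in these notes into the 0 to 1 range that a colormap object works with, which is done here with Normalize. When c and cmap are passed to scatter(), a comparable scaling is applied automatically from the smallest to the largest value in c.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Look up the built-in 'viridis' colormap by name
cmap = plt.get_cmap('viridis')

# Scale the values 0..100 into the 0..1 range that a colormap expects
norm = Normalize(vmin=0, vmax=100)

for value in [0, 50, 100]:
    # Calling the colormap returns the color as (red, green, blue, alpha)
    red, green, blue, alpha = cmap(float(norm(value)))
    print(value, round(red, 2), round(green, 2), round(blue, 2))
```

The color printed for 0 has more blue than green (a purple tone), while the color for 100 is dominated by red and green (a yellow tone).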

/* Python program to add color maps*/

import matplotlib.pyplot as plt


import numpy as np

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0,10,20,30,40,45,50,55,60,70,80,90,100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.show()

Output :

Adding colorbar()
/* Python program to add color maps & color bar*/

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0,10,20,30,40,45,50,55,60,70,80,90,100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()

Output :

Size
We can change the size of the dots with the s argument. Just like colors, make sure the array for
sizes has the same length as the arrays for the x- and y-axis.

/* Python program to Set your own size for the markers*/ Output :

import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors=["red","green","blue","yellow","violet","purple"]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x, y, c = colors, s=sizes )
plt.show()
Alpha
Adjust the transparency of the dots with the alpha argument, a value between 0 (transparent) and
1 (opaque).
/* Python program to Set alpha*/
Output :
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors=["red","green","blue","yellow","violet","purple"]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x,y,c=colors,s=sizes,alpha=0.5)
plt.show()

Create random arrays with 100 values for x-points, y-points, colors and sizes.
/* Python program to create random arrays, random colors, random sizes*/ Output :
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(100,size=(100))
y = np.random.randint(100,size=(100))
colors = np.random.randint(100,size=(100))
sizes = 10 * np.random.randint(100,size=(100))

plt.scatter(x, y, c = colors, s=sizes, cmap='nipy_spectral', alpha = 0.5 )
plt.colorbar()
plt.show()
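
Because the arrays above are random, the plot differs on every run. If a repeatable plot is wanted, one option is a seeded random generator. This is a sketch using NumPy's default_rng() in place of np.random.randint. The seed value 42 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator: the same seed reproduces the same "random" values
rng = np.random.default_rng(seed=42)   # 42 is an arbitrary choice

# integers(0, 100, ...) draws whole numbers from 0 up to 99
x = rng.integers(0, 100, size=100)
y = rng.integers(0, 100, size=100)
colors = rng.integers(0, 100, size=100)
sizes = 10 * rng.integers(0, 100, size=100)

plt.scatter(x, y, c=colors, s=sizes, cmap='nipy_spectral', alpha=0.5)
plt.colorbar()
plt.show()
```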

1.5 Visualizing errors in Python using Matplotlib

Error bars are a graphical enhancement that visualizes the variability of the plotted
data on a Cartesian graph. Error bars can be applied to graphs to provide an additional layer of
detail on the presented data.

[Figure: error bars drawn on a scatter plot, a dot plot, a bar chart and a line plot]

Error bars indicate estimated error or uncertainty. They are drawn as markers over the original
graph and its data points: lines that extend from the centre of each plotted data point to show
the uncertainty of that data point.
A short error bar shows that the values are concentrated around the plotted value, signaling that
it is reliable, while a long error bar indicates that the values are more spread out and less
reliable. The two bars in each pair tend to be of equal length on both sides; however, if the data
is skewed then the lengths on each side would be unbalanced.

Error bars always run parallel to a quantitative scale axis, so they can be displayed either
vertically or horizontally depending on whether the quantitative scale is on the y-axis or the
x-axis. If both axes have quantitative scales, two pairs of error bars can be used, one for each axis.

/* Python program to create simple graph */ Output :

# importing matplotlib
import matplotlib.pyplot as plt

# making a simple plot
x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]

# plotting graph
plt.plot(x, y)
plt.show()
/* Python program to add some error in y value in the simple graph */ Output :

import matplotlib.pyplot as plt

x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]

# creating error
y_error = 0.2

plt.plot(x, y)
plt.errorbar(x, y,
             yerr = y_error,
             fmt ='o')
plt.show()

Note:
fmt is a format code controlling the appearance of lines and points.
/* Python program to add some error in x value in the simple graph */ Output :

import matplotlib.pyplot as plt

x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
plt.plot(x, y)
plt.errorbar(x, y,
             xerr = x_error,
             fmt ='o')
plt.show()
/* Python program to add various parameters & some error in x value */ Output :

import matplotlib.pyplot as plt

x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
plt.plot(x, y, color = "red")
plt.errorbar(x, y, xerr=x_error,
             fmt='o', color='black',
             ecolor='green', elinewidth=3,
             capsize=10)
plt.show()
/* Python program to add some error in x & y value in the simple graph */ Output :

import matplotlib.pyplot as plt

x =[1, 2, 3, 4, 5, 6, 7]
y =[1, 2, 1, 2, 1, 2, 1]
x_error = 0.5
y_error = 0.3
plt.plot(x, y, color = "red")
plt.errorbar(x, y, yerr = y_error,
             xerr = x_error,
             fmt='o', ecolor="green")
plt.show()

/* Python program to add some error in scatter plot */ Output :

import matplotlib.pyplot as plt


x = [1, 3, 5, 7]
y = [11, -2, 4, 19]
plt.scatter(x, y, marker='*' )
c = [1, 3, 2, 1]
plt.errorbar(x, y, yerr=c, fmt="o",
ecolor= "black")
plt.show()
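
As noted earlier in this section, the two sides of an error bar can have different lengths when the data is skewed. A minimal sketch of such unbalanced bars passes a pair of sequences to yerr, the first for the lower side and the second for the upper side. The lower and upper values here are made up for illustration and are not taken from any data set.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 2, 1, 2, 1]

# Hypothetical error sizes: the lower bars are shorter than the upper ones
lower = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
upper = [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]

# A two-row yerr gives separate lengths below and above each point
container = plt.errorbar(x, y, yerr=[lower, upper], fmt='o', capsize=5)
plt.show()
```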

Bar Plot in Matplotlib


A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose
lengths or heights are proportional to the values they represent. Bar plots can be drawn
horizontally or vertically. A bar chart shows comparisons between discrete categories. One axis
of the plot represents the specific categories being compared, while the other axis represents
the measured values corresponding to those categories. The general form of the call is:
ax.bar(x, height, width, bottom, align)
The function returns a Matplotlib container object with all bars.
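
The call form above can be tried out with the following sketch, which uses an axes object created with plt.subplots() and inspects the returned container. The values given for width, bottom and align are believed to match Matplotlib's default behaviour and are written out only to show where each argument goes.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

categories = ['A', 'B', 'C', 'D']
values = [3, 8, 1, 10]

# width, bottom and align are spelled out to show their positions;
# bottom=0 means every bar starts at zero
bars = ax.bar(categories, values, width=0.8, bottom=0, align='center')

# The returned container holds one rectangle per bar
for category, bar in zip(categories, bars):
    print(category, bar.get_height())

plt.show()
```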

/* Python program to implement Bar Chart */ Output :


import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y, color="red")
plt.show()

Following is a simple example of the Matplotlib bar plot. It shows the number of students enrolled for
various courses offered at an institute.
/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.bar(langs,students, color= "violet")
plt.show()

Bar Width
The bar() function takes the keyword argument width to set the width of the bars. The default width is 0.8.

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.bar(langs,students, color= "violet", width = 0.1)
plt.show()

Bar Height
The barh() function takes the keyword argument height to set the height of the bars.
Note: For horizontal bars, use height instead of width.

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.barh(langs,students, color= "violet")
plt.show()

/* Python program to implement Bar Chart */ Output :

import matplotlib.pyplot as plt


langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
plt.barh(langs,students, color= "violet", height = 0.1)
plt.show()

Plotting multiple bar charts using Matplotlib in Python


A multiple bar chart is also called a grouped bar chart. A bar plot can be customized in many
ways, for example as multiple bar plots, stacked bar plots or horizontal bar charts. Multiple bar
charts are generally used for comparing different entities.
Example 1: Simple multiple bar chart
In this example we plot a multiple bar chart using matplotlib to visualize the number of boys
and girls in each group.

/* Python program to implement Bar Chart */ Output :

import numpy as np
import matplotlib.pyplot as plt

X = ['Group A','Group B','Group C','Group D']


girls = [10,20,20,40]
boys = [20,30,25,30]

X_axis = np.arange(len(X))
width = 0.25

plt.bar(X_axis - 0.2, girls, width, label = 'Girls')


plt.bar(X_axis + 0.2, boys, width, label = 'Boys')

plt.xticks(X_axis, X)
plt.xlabel("Groups")
plt.ylabel("Number of Students")
plt.title("Number of Students in each group")
plt.legend()
plt.show()

• The required libraries are imported: numpy for performing numerical calculations with
arrays and matplotlib for visualization of data.
• The data for plotting the multiple bar chart is stored in lists.
• The np.arange( ) function from the numpy library is used to create a range of values. Here it
creates the X-axis positions, one for each group in our example.
• The bars are plotted using the plt.bar( ) function.
• To avoid overlapping of the bars in each group, the bars are shifted -0.2 units and +0.2 units
from the X-axis positions.
• The width of each bar is taken as 0.25 units (the width variable in the program).
• Finally, the bars for both boys and girls are plotted in each group.
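
Stacked bar plots were mentioned above as another customization. A possible sketch with the same data uses the bottom argument of bar() so that the boys' bars start where the girls' bars end. Converting the lists to NumPy arrays is a choice made here, not something the grouped example requires.

```python
import numpy as np
import matplotlib.pyplot as plt

X = ['Group A', 'Group B', 'Group C', 'Group D']
girls = np.array([10, 20, 20, 40])
boys = np.array([20, 30, 25, 30])

# First layer: the girls' bars start at zero
plt.bar(X, girls, label='Girls')
# Second layer: bottom=girls makes each boys' bar start
# at the top of the girls' bar in the same group
top = plt.bar(X, boys, bottom=girls, label='Boys')

plt.xlabel("Groups")
plt.ylabel("Number of Students")
plt.title("Number of Students in each group (stacked)")
plt.legend()
plt.show()
```

The total height of each stacked bar then shows the number of students in that group.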
