Cs3353 Foundations of Data Science Unit V
1. Data Visualization
Data visualization is the practice of translating information into a visual context, such as a
map or graph, to make data easier for the human brain to understand and pull insights
from. The main goal of data visualization is to make it easier to identify patterns, trends and
outliers in large data sets.
The process of finding trends and correlations in our data by representing it pictorially is called data visualization.
The raw data passes through the following stages within a pipeline:
Fetching the Data
Cleaning the Data
Data Visualization
Modeling the Data
Interpreting the Data
Revision
Note:
Data visualization is the graphical representation of information and data in a pictorial or graphical format (Example: charts, graphs, and maps).
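The pipeline stages listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the in-memory raw_records list stands in for a real data source, the straight-line fit stands in for a real model, and all names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Fetching: a hypothetical in-memory data set of (x, y) records; None marks a missing value
raw_records = [(1, 2.1), (2, None), (3, 6.2), (4, 7.9), (5, 10.1)]

# Cleaning: drop the records that have a missing value
cleaned = [(x, y) for x, y in raw_records if y is not None]
x = np.array([r[0] for r in cleaned], dtype=float)
y = np.array([r[1] for r in cleaned], dtype=float)

# Visualization: scatter plot of the cleaned data
fig, ax = plt.subplots()
ax.scatter(x, y, label="cleaned data")

# Modeling: least-squares straight line (an assumed, very simple model)
slope, intercept = np.polyfit(x, y, 1)
ax.plot(x, slope * x + intercept, label="fitted line")
ax.legend()

# Interpreting: read off the trend found by the model
print(f"y grows by about {slope:.2f} per unit of x")
```

Revision would mean going back to an earlier stage, for example cleaning differently or choosing another model, when the interpretation does not hold up.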
Data visualization is a quick and easy way to convey concepts to others. It also has the following uses:
Data visualization can identify areas that need improvement or modification.
Data visualization can clarify which factors influence customer behaviour.
Data visualization helps you to understand which products to place where.
Data visualization can help to predict sales volumes.
1.2 Matplotlib
Matplotlib is a visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in the year 2002. One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in easily digestible form.
Matplotlib supports several plot types, such as line, bar, scatter and histogram.
Installation :
Run the following command to install the matplotlib package:
python -m pip install -U matplotlib
Once Matplotlib is installed, import it in your applications by adding the import statement:
import matplotlib
The pyplot submodule can be imported in either of these two equivalent ways:
from matplotlib import pyplot as plt
or
import matplotlib.pyplot as plt
Matplotlib Version
The version string is stored under the __version__ attribute.
import matplotlib
print(matplotlib.__version__)
Output :
3.4.3
Matplotlib Pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under the alias plt:
import matplotlib.pyplot as plt
Now the pyplot submodule can be referred to as plt.
Note:
Points plotted are {[5,10], [2,5], [9,8], [4,4], [7,2]}
import matplotlib.pyplot as plt
import numpy as np
# draw a straight line from (0, 0) to (6, 25)
x = np.array([0, 6])
y = np.array([0, 25])
plt.plot(x, y)
plt.show()
Markers
You can use the keyword argument marker to emphasize each point with a specified marker, and the keyword argument markersize (or the shorter ms) to set its size, for example markersize = 15.
/* Python program to show marker */
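A minimal sketch of the marker arguments described above. The y values are illustrative and are not taken from the original listing.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])  # illustrative values

fig, ax = plt.subplots()
# marker='o' draws a circle at each point; markersize (alias ms) sets its size in points
(line,) = ax.plot(ypoints, marker='o', markersize=15)
```

With an interactive backend, plt.show() displays the figure. Other marker codes include '*', '.', 's' and '^'.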
Linestyle
You can use the keyword argument linestyle, or the shorter ls, to change the style of the plotted line.
/* Python program to show linestyle */
# x and y are assumed to be defined as in the earlier line plot example
plt.plot(x, y, ls='dashed', marker='*')
plt.show()
Note:
ls = 'dashed' is the short form of linestyle = 'dashed'.
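As a rough illustration, the sketch below draws the same data with the four named line styles that Matplotlib accepts ('solid', 'dotted', 'dashed' and 'dashdot'). The data values and the vertical offsets are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])  # illustrative values

fig, ax = plt.subplots()
styles = ['solid', 'dotted', 'dashed', 'dashdot']
for offset, style in enumerate(styles):
    # shift each line upwards so the four styles do not overlap
    ax.plot(x, y + 5 * offset, linestyle=style, label=style)
ax.legend()
```

The same styles can also be written with the short codes '-', ':', '--' and '-.'.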
import matplotlib.pyplot as plt
#plot 1:
x = [0, 1, 2, 3]
y = [3, 8, 1, 10]
plt.subplot(2, 1, 1)
plt.plot(x, y)
#plot 2:
x = [0, 1, 2, 3]
y = [10, 20, 30, 40]
plt.subplot(2, 1, 2)
plt.plot(x, y)
plt.show()
Note:
plt.subplot(2, 1, 1)
It means 2 rows, 1 column, and this plot is the first plot.
plt.subplot(2, 1, 2)
It means 2 rows, 1 column, and this plot is the second plot.
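The same numbering scheme works for side-by-side plots. The sketch below reuses the data from the listing above but, as an assumed variation, arranges the plots in 1 row and 2 columns and gives each one a title.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

fig = plt.figure()
# 1 row, 2 columns, first plot (left)
ax1 = plt.subplot(1, 2, 1)
ax1.plot(x, [3, 8, 1, 10])
ax1.set_title("plot 1")
# 1 row, 2 columns, second plot (right)
ax2 = plt.subplot(1, 2, 2)
ax2.plot(x, [10, 20, 30, 40])
ax2.set_title("plot 2")
```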
import matplotlib.pyplot as plt
# basic scatter plot
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
# same data with customised markers: star shape, red fill, black edge, size 200
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y, marker='*', c='red', s=200, edgecolor='black')
plt.show()
# dataset2
x2 = [26, 29, 48, 64, 6]
y2 = [26, 34, 90, 33, 38]
plt.scatter(x2, y2, c="yellow", marker="^", edgecolor="red", s=200)
plt.show()
ColorMap
The Matplotlib module has a number of available colormaps. A colormap is like a list of colors, where each color is associated with a value; in the examples here the values range from 0 to 100. The colormap used in this section is called 'viridis'. It ranges from 0, which is a purple color, up to 100, which is a yellow color.
How to Use the ColorMap?
Specify the colormap with the keyword argument cmap, set to the name of the colormap, in this case 'viridis', which is one of the built-in colormaps available in Matplotlib. In addition, create an array with values (from 0 to 100), one value for each point in the scatter plot. Some of the available colormaps are Accent, Blues, BuPu, BuGn, CMRmap, Greens, Greys, Dark2 etc.
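import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
# one value per point; each value is mapped to a color of the colormap
colors = np.array([0,10,20,30,40,45,50,55,60,70,80,90,100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.show()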
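To see how a value is turned into a color, the sketch below looks up the two ends of 'viridis' directly. It assumes the 0 to 100 value range used in this section; Normalize rescales that range to the 0 to 1 interval that the colormap object itself expects. The variable names are chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap('viridis')
norm = Normalize(vmin=0, vmax=100)  # map the value range 0..100 onto 0..1

low_rgba = cmap(norm(0))     # color for the smallest value (dark purple)
high_rgba = cmap(norm(100))  # color for the largest value (yellow)
print("value 0   ->", low_rgba)
print("value 100 ->", high_rgba)
```

Each result is an (R, G, B, A) tuple with components between 0 and 1. In a scatter plot, calling plt.colorbar() after plt.scatter() adds a scale for this mapping next to the axes.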
Size
We can change the size of the dots with the s argument. Just as with colors, we can pass an array of sizes, one for each point; the array must have the same length as the arrays for the x- and y-axis.
/* Python program to set your own size for the markers */
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors = ["red","green","blue","yellow","violet","purple"]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x, y, c=colors, s=sizes)
plt.show()
Alpha
Adjust the transparency of the dots with the alpha argument, which ranges from 0 (fully transparent) to 1 (fully opaque). As in the previous example, make sure the arrays for colors and sizes have the same length as the arrays for the x- and y-axis.
/* Python program to set alpha */
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60])
colors = ["red","green","blue","yellow","violet","purple"]
sizes = np.array([100,200,300,400,500,600])
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5)
plt.show()
Create random arrays with 100 values for x-points, y-points, colors and sizes.
/* Python program to create random arrays, random colors, random sizes */
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(100, size=(100))
y = np.random.randint(100, size=(100))
colors = np.random.randint(100, size=(100))
sizes = 10 * np.random.randint(100, size=(100))
# the plotting calls complete the truncated listing; the alpha and cmap values are assumed
plt.scatter(x, y, c=colors, s=sizes, alpha=0.5, cmap='viridis')
plt.show()
Error bars are a graphical enhancement that visualizes the variability of the plotted data on a Cartesian graph. They can be applied to graphs to provide an additional layer of detail on the presented data.
Error bars indicate estimated error or uncertainty. They are drawn as markers over the original graph and its data points: lines extend from the centre of each plotted data point to show the uncertainty of that point.
A short error bar shows that the values are concentrated around the plotted value, while a long error bar indicates that the values are more spread out and less reliable. The two error bars of a pair are usually of equal length on both sides; however, if the data is skewed, the lengths on each side will be unbalanced.
Error bars always run parallel to a quantitative scale axis, so they can be displayed either vertically or horizontally, depending on whether the quantitative scale is on the y-axis or the x-axis. If there are two quantitative scales, two pairs of error bars can be used, one for each axis.
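A hedged sketch of the two cases just described: unequal (asymmetric) error bars on the y-axis and a second pair of error bars on the x-axis. All data and error values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2.0, 3.5, 3.0, 5.0, 4.5])

# asymmetric y errors: the first array is the length below each point,
# the second array is the length above it
y_lower = np.array([0.2, 0.3, 0.1, 0.4, 0.2])
y_upper = np.array([0.5, 0.6, 0.3, 0.9, 0.4])
x_error = 0.25  # one symmetric value applied to every point

fig, ax = plt.subplots()
bars = ax.errorbar(x, y, yerr=[y_lower, y_upper], xerr=x_error, fmt='o', capsize=4)
```

Passing a single number gives every point the same symmetric error bar, while a pair of arrays gives each point its own lower and upper length.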
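# importing matplotlib
import matplotlib.pyplot as plt
# sample data (assumed; the original values are not shown)
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 2, 1, 2, 1]
# plotting graph
plt.plot(x, y)
plt.show()
/* Python program to add some error in y value in the simple graph */
# creating error
y_error = 0.2
plt.plot(x, y)
# yerr draws a vertical error bar of +/- y_error at every point
plt.errorbar(x, y, yerr=y_error, fmt='o')
plt.show()
Note:
fmt is a format code controlling the appearance of lines and points.
/* Python program to add some error in x value in the simple graph */
# creating error (assumed value; the original listing is missing)
x_error = 0.5
plt.plot(x, y)
# xerr draws a horizontal error bar of +/- x_error at every point
plt.errorbar(x, y, xerr=x_error, fmt='o')
plt.show()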
Following is a simple example of the Matplotlib bar plot. It shows the number of students enrolled for various courses offered at an institute.
/* Python program to implement Bar Chart */
# x holds the course names and y the number of students enrolled in each course
plt.bar(x, y, color="red")
plt.show()
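A self-contained sketch of such a bar chart follows. The course names and enrolment numbers are hypothetical placeholders, since the original data is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

# hypothetical courses and enrolment counts
courses = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23, 17, 35, 29, 12]

fig, ax = plt.subplots()
# one bar per course; the bar height is the number of students
bars = ax.bar(courses, students, color="red")
ax.set_xlabel("Courses offered")
ax.set_ylabel("Number of students enrolled")
```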
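Bar Width
The bar() function takes the keyword argument width to set the width of the bars. The default width value is 0.8.
import numpy as np
import matplotlib.pyplot as plt
# sample data (assumed; the original values are not shown)
X = ['Group A', 'Group B', 'Group C']
series_a = [10, 20, 30]
series_b = [20, 30, 25]
X_axis = np.arange(len(X))
width = 0.25
# shift each series by half a bar width so the bars sit side by side
plt.bar(X_axis - width / 2, series_a, width=width, label="Series A")
plt.bar(X_axis + width / 2, series_b, width=width, label="Series B")
plt.xticks(X_axis, X)
plt.xlabel("Groups")
plt.ylabel("Number of Students")
plt.title("Number of Students in each group")
plt.legend()
plt.show()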