Lect10 DataViz

Uploaded by

nishu
Copyright
© © All Rights Reserved

Photo: Unsplash

Topic Ten: Data Visualisation


BUSN5101: Programming for Business

Tristan W. Reed
UWA Business School
Acknowledgement of Country
The University of Western Australia acknowledges that its campus is
situated on Noongar land, and that Noongar people remain the
spiritual and cultural custodians of their land, and continue to
practise their values, languages, beliefs and knowledge.

Artist: Dr Richard Barry Walley OAM


What is Data Visualisation?

• Data visualisation is the presentation of data, generally in a graphical format or
representation, used to aid understanding of the data and its distribution:
• For example: trends, patterns, outliers and other features of how the data is distributed;
• It is one very effective (although not always the most appropriate – depends on context)
method for communicating the findings of data analysis that you have undertaken;
• Visualisations need to work with the data – explaining it simply and succinctly, without
introducing any (unintentional) bias to communicate the findings effectively;
• Some examples of visualisations that may be used are tables, charts, maps and graphs.
Data Visualisation Tools

• There are many ways that we can visualise data, not just with Python and its libraries:
• “Desktop” BI platforms such as Tableau and Power BI – connected or imported data sources;
• Online visualisation platforms (very similar to BI, just no analysis) such as Google Charts and Infogram;
• Other programming languages and their packages/libraries (JavaScript, R, MATLAB etc.);
• “General purpose programs” such as Microsoft PowerPoint and Microsoft Excel;
• Pen and paper – drawing charts, maps and graphs!

• Generally we can taxonomise (group) the various types of visualisations as follows:
• Static (a single picture) or dynamic (a video or animation);
• Fixed (unchanging) or interactive (consider Power BI, web pages etc.).
Visualisation with Python

• Unsurprisingly, there are yet more external packages / modules that we can use with
Python to visualise our data – chiefly matplotlib, although others exist:
• matplotlib is a Python-based visualisation package and plotting library built upon numpy;
• There are two ways for us to utilise matplotlib within Python: the object-oriented style
(creating Figure and Axes objects and applying methods to them) and the pyplot (‘automated’)
style, in which pyplot functions act on an implicit ‘current’ figure;
• We’ll be focusing on the object-oriented style as we generally write files of code that should be
easily repeatable and understandable – although most things can be achieved either way.
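
To make the two styles concrete, the sketch below draws the same line twice, once in each style. The data values and the names fig_oo, ax_oo and ax_py are made up for illustration and do not come from the lecture.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]   # illustrative data only
y = [0, 2, 4, 6]

# Object-oriented style: create the Figure and Axes, then call methods on them.
fig_oo = plt.figure()
ax_oo = fig_oo.subplots()
ax_oo.plot(x, y)
ax_oo.set_title("Object-oriented style")

# pyplot style: pyplot functions act on an implicit 'current' Figure and Axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
ax_py = plt.gca()   # fetch the implicit Axes so we can inspect it

# plt.show()  # uncomment to display both figures
```

In the object-oriented version it is always clear which Figure and Axes each call affects, which is why it tends to suit longer scripts.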

Other Python Packages

• Besides matplotlib, there exist many other external packages which can be installed
(using pip) and which extend it to enable more complex visualisations:
• seaborn: additional types of visualisations and a higher-level interface for drawing
(complex) statistical graphics using matplotlib and pandas;
• basemap: plots base maps with administrative and land boundaries and related methods;
• ggplot: a port (translation) of the ggplot2 library from R to Python, utilised similarly.

• Other packages that are not built on matplotlib also exist for interactive (web) plots, and
take the same kinds of data (lists, numpy arrays, pandas objects) that we pass to matplotlib and/or seaborn:
• See bokeh and plotly for some (beautiful) examples of this!
Topic Ten: matplotlib


matplotlib Visualisation Hierarchy

• We have a three-level hierarchy that describes how visualisations are constructed
utilising the matplotlib library:
• Figure: the Figure is the base object of our visualisation, the object that contains the graphic
that will be output as an image;
• Axes: each Figure contains one or more sets of Axes, which represent the (x and y) axes of a
single plot – generally utilised with the subplots() method to define plots within a figure;
• Artists: these are used to create and adjust other components within the Axes (or Figure)
such as titles, labels, colours, shapes and so forth.
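
The hierarchy can be inspected directly in code. The following is a minimal sketch with made-up data; the names the_line and the_title are introduced here only for illustration.

```python
import matplotlib.pyplot as plt

the_fig = plt.figure(figsize=(4, 3))   # Figure: the top-level container
the_ax = the_fig.subplots()            # Axes: one plot inside the Figure

# Artists: everything drawn on the Axes, e.g. a line and the title text
(the_line,) = the_ax.plot([0, 1, 2], [0, 1, 4], label="illustrative data")
the_title = the_ax.set_title("Hierarchy demo")

print(type(the_fig).__name__)     # Figure
print(type(the_line).__name__)    # Line2D
print(the_fig.axes == [the_ax])   # True: the Figure keeps a list of its Axes
print(the_line.axes is the_ax)    # True: each Artist knows which Axes it sits on
```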

Getting Started

• We must first ensure that we have the Matplotlib external package on our system:
• pip install matplotlib

• After that, we can import it at the top of our Python code file to use it:
• import matplotlib.pyplot as plt

matplotlib Figure Components

Figure Setup

• We must therefore set up a Figure (denoted as the_fig) to work on before we do
anything else – within this, we will have our Axes and Artists:
• Must first import the library: import matplotlib.pyplot as plt
• Create the figure: the_fig = plt.figure(figsize = (w, h), dpi = 300)
where w and h are the width and height (in inches!) and dpi is the resolution in dots per inch
(generally 300 is fine);

• Once the Figure is created, the Axes can be created and Artists applied as well.
• Quite often ‘tight layout’ (spacing adjusted so that labels and titles fit) will be implemented: plt.tight_layout()
• We can show a figure (print to the screen) using plt.show()
• We can save a Figure to a file: the_fig.savefig(filename, dpi = 300, format = 'png')
which will save it to the ‘filename’ specified.
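
Putting these steps together, a minimal sketch of creating, laying out and saving a Figure might look as follows. The data and the output file name (example_figure.png, written to a temporary directory) are assumptions made for illustration.

```python
import os
import tempfile
import matplotlib.pyplot as plt

# 6 inches wide by 4 inches high; at 300 dpi that is 1800 x 1200 pixels
the_fig = plt.figure(figsize=(6, 4), dpi=300)
the_ax = the_fig.subplots()
the_ax.plot([1, 2, 3], [2, 4, 8])   # illustrative data only
the_fig.tight_layout()              # same effect as plt.tight_layout() on this figure

# Hypothetical output location: a temporary directory, so nothing is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "example_figure.png")
the_fig.savefig(out_path, dpi=300, format="png")

print(os.path.exists(out_path))
# plt.show()  # uncomment to also display the figure on screen
```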
Axes Setup

• We can add a set of Axes (we generally have just one, but don’t have to) to our Figure
(denoted as the_fig) as follows:
• the_ax = the_fig.subplots(nrows = 1, ncols = 1)

• We can utilise methods on the Axes (the_ax) to set up things such as the minimum and
maximum values, titles, labels and similar (more can be seen in the documentation):
• the_ax.set_title(title): set the title of the axes to title;
• the_ax.set_xlabel(label): set the x-axis label to label (same goes for y);
• the_ax.set_ylim(min, max): set the (visible) limits of the y-axis between min and max (same for x);
• the_ax.legend(): add in a legend describing each labelled data series.
A simple (single plot) example

import matplotlib.pyplot as plt

x = list(range(10))
y = list(range(0, 20, 2))
z = list(range(20, 40, 2))

the_fig = plt.figure(figsize = (3.5, 2.5))
the_ax = the_fig.subplots()

the_ax.plot(x, y, "r", label = "set_1")   # red line
the_ax.plot(x, z, "b", label = "set_2")   # blue line

the_ax.set_title("Number")
the_ax.set_xlabel("X")
the_ax.set_ylabel("Y")
the_ax.set_xlim((0, 12)) # Limits given as one (min, max) tuple
the_ax.set_ylim((0, 50))
the_ax.legend()

plt.show()
A single (dual plot) example

import matplotlib.pyplot as plt

x = list(range(10))
y = list(range(0, 20, 2))
z = list(range(20, 40, 2))

the_fig = plt.figure(figsize = (3.5, 2.5))
the_ax = the_fig.subplots(2, 1, sharex = True)

the_ax[0].plot(x, y, "r--", label = "Y")   # red dashed line
the_ax[1].plot(x, z, "ob", label = "Z")    # blue circle markers

the_ax[0].set_title("set_1")
the_ax[0].set_xlabel("X")
the_ax[0].set_ylabel("Y")
the_ax[0].set_xlim(0, 12) # This is OK!
the_ax[0].set_ylim(0, 25) # Consistency?
the_ax[0].legend()

the_ax[1].set_title("set_2")
the_ax[1].set_xlabel("X")
the_ax[1].set_ylabel("Z")
the_ax[1].set_xlim(0, 12)
the_ax[1].set_ylim(0, 50)
the_ax[1].legend()

plt.tight_layout()
plt.show()
Plotting Methods

• To plot non-line graphics, we need to use the specific method with our Axes:
• Scatter plots: the_ax.scatter()
• Bar charts: the_ax.bar()
• Pie charts: the_ax.pie()
• Histograms: the_ax.hist()

• Other methods also exist, however we will not cover them today (check the docs!)
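
As a quick side-by-side comparison, the sketch below places all four plot types on a 2 x 2 grid of Axes within one Figure. The data is randomly generated or made up for illustration, and the names rng, values, codes and scores are assumptions introduced here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)    # seeded so the sketch is repeatable
values = rng.normal(size=200)     # made-up data for illustration
codes = ["a", "b", "c", "d"]
scores = [20, 4, 10, 15]

the_fig = plt.figure(figsize=(8, 8))
the_ax = the_fig.subplots(nrows=2, ncols=2)   # 2 x 2 array of Axes

the_ax[0, 0].scatter(np.arange(200), values)
the_ax[0, 0].set_title("scatter()")

the_ax[0, 1].bar(codes, scores)
the_ax[0, 1].set_title("bar()")

the_ax[1, 0].pie(scores, labels=codes)
the_ax[1, 0].set_title("pie()")

the_ax[1, 1].hist(values)
the_ax[1, 1].set_title("hist()")

the_fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

With more than one row and column, subplots() returns a two-dimensional array, so each Axes is addressed as the_ax[row, column].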

Scatter Plot

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100)
y = x + 50 * np.random.random(100)   # linear trend plus uniform noise in [0, 50)

the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()

the_ax.scatter(x, y, color = 'r')    # red markers

the_ax.set_xlabel("X")
the_ax.set_ylabel("Y")

plt.show()

Topic Ten: Other Figures


Bar Chart

import matplotlib.pyplot as plt

x = ["a", "b", "c", "d"]
y = [20, 4, 10, 15]

the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()

the_ax.bar(x, y, label = "Score")    # one bar per category in x

the_ax.set_xlabel("Code")
the_ax.set_ylabel("Value")
the_ax.legend()

plt.tight_layout()
plt.show()
Pie Chart – Be Careful!

import matplotlib.pyplot as plt

x = [2, 4, 10, 15]
label = ["x1", "x2", "x3", "x4"]

the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()

# autopct prints each wedge's share as a percentage with one decimal place
the_ax.pie(x, labels = label, autopct = "%.1f%%")

the_ax.legend()

plt.tight_layout()
plt.show()
Histogram

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)    # 1000 draws from a standard normal distribution

the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()

the_ax.hist(x, label = "random data", color = "r")

the_ax.set_xlabel("Value")
the_ax.set_ylabel("Frequency")

the_ax.legend()

plt.tight_layout()
plt.show()
The third way – with pandas?

• We can apply a .plot() method to a DataFrame object we have previously created
to ‘automatically’ plot it utilising matplotlib.
• We will need to specify arguments to it to determine things such as which column of the
DataFrame to use and to configure the previous options that we have with matplotlib.
• Arguably, it makes life easier all round to do it this way compared to the previous
methodology – everything can be supplied in just a single call to create the Figure.
• We can use plt.gca() to get the current Axes or plt.gcf() to get the current
Figure to do the adjustments that we have previously described.
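
As a minimal sketch of these points, the example below selects columns through arguments to .plot() and then adjusts the result with the usual Axes methods. The DataFrame contents (month, sales, costs) are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sales figures, purely for illustration
the_df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "sales": [10, 12, 9, 15],
    "costs": [7, 8, 8, 9],
})

# x and y choose the columns; kind chooses the plot type
the_ax = the_df.plot(x="month", y="sales", kind="line", title="Monthly sales")

# The call returns the matplotlib Axes, so the usual methods still apply
the_ax.set_ylabel("Sales")
the_ax.set_ylim(0, 20)

# plt.gca() and plt.gcf() retrieve the same Axes and its Figure
print(plt.gca() is the_ax)          # True
print(plt.gcf() is the_ax.figure)   # True

# plt.show()  # uncomment to display the figure
```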

pandas Plotting Example

import pandas as pd
import matplotlib.pyplot as plt

the_df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1]
}, index = ['pig', 'rabbit', 'duck', 'chicken', 'horse'])

# .plot() returns the matplotlib Axes; each column becomes one line
the_ax = the_df.plot(title = "DataFrame Plot")

plt.show()
pandas Series Plotting Example

import pandas as pd
import matplotlib.pyplot as plt

the_srs = pd.Series([1, 2, 3, 3])

# kind = 'hist' draws a histogram of the Series values
the_ax = the_srs.plot(kind = 'hist', title = "My plot")

plt.show()
The End: Thank You
Any Questions? Ask via email ([email protected])
