Lect10 DataViz
Tristan W. Reed
UWA Business School
Acknowledgement of Country
The University of Western Australia acknowledges that its campus is
situated on Noongar land, and that Noongar people remain the
spiritual and cultural custodians of their land, and continue to
practise their values, languages, beliefs and knowledge.
• There are many ways that we can visualise data, not just with Python and its libraries:
• “Desktop” BI platforms such as Tableau and Power BI – connected or imported data sources;
• Online visualisation platforms (similar to BI platforms, but without the analysis features) such as Google Charts and Infogram;
• Other programming languages and packages/libraries (JavaScript, Python, R, MATLAB etc.);
• “General purpose programs” such as Microsoft PowerPoint and Microsoft Excel;
• Pen and paper – drawing charts, maps and graphs!
• Unsurprisingly, there are yet more external packages / modules that we can use with Python to visualise our data – chiefly matplotlib, but also others:
• Primarily, it is a Python-based visualisation package and plotting library built upon numpy;
• There are two ways for us to use matplotlib within Python: the object-oriented style (explicitly creating Figure and Axes objects and calling methods on them) and the pyplot (‘implicit’) style, where pyplot automatically creates and manages the Figure and Axes for us;
• We’ll be focusing on the object-oriented style, as we generally write code files that should be repeatable and easy to understand – although most things can be achieved either way.
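A minimal sketch contrasting the two styles, assuming small hypothetical lists x and y. The Agg backend is selected only so the snippet runs without a display; in normal use you would omit that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical example data
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Object-oriented style: create the Figure and Axes explicitly, then call methods on them
the_fig = plt.figure(figsize = (4, 3))
the_ax = the_fig.subplots()
the_ax.plot(x, y)
the_ax.set_title("OO style")

# pyplot (implicit) style: pyplot creates and tracks the "current" Figure and Axes for us
plt.figure(figsize = (4, 3))
plt.plot(x, y)
plt.title("pyplot style")
pyplot_ax = plt.gca()  # Retrieve the Axes that pyplot created behind the scenes
```

Both produce the same kind of line plot; the difference is whether we hold references to the objects ourselves or let pyplot track them.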
Other Python Packages
• Besides matplotlib, there exist many other external packages which can be installed (using pip) and which extend it to enable more complex visualisations:
• seaborn: additional types of visualisations, and a higher-level interface for developers to draw (complex) statistical graphics using matplotlib and pandas;
• basemap: plots base maps with administrative and land boundaries and related methods;
• ggplot: a port (translation) of R’s ggplot2 library to Python, used in a similar way;
• Other packages, not built on matplotlib, also exist for interactive (web) plots and can be used alongside matplotlib and/or seaborn:
• See bokeh and plotly for some (beautiful) examples of this!
matplotlib Visualisation Hierarchy
Getting Started
• We must first ensure that the matplotlib external package is installed on our system:
• pip install matplotlib
• After that, we can import it at the top of our Python code file to use it:
• import matplotlib.pyplot as plt
matplotlib Figure Components
Figure Setup
• Once the Figure is created, the Axes can be created and Artists applied as well.
• Quite often a ‘tight layout’ will be applied, which automatically adjusts the padding around and between subplots so that titles and labels fit: plt.tight_layout()
• We can show a figure (display it on screen) using plt.show()
• We can save a Figure to a file: the_fig.savefig(filename, dpi = 300, format = 'png'), which will save it to the path given by filename.
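A minimal sketch of the full Figure setup, assuming a hypothetical output filename ("example.png") written to a temporary directory. The Agg backend is used so the file can be saved without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Non-interactive backend: no window needed to save a file
import matplotlib.pyplot as plt

# Create the Figure first; figsize is (width, height) in inches
the_fig = plt.figure(figsize = (6, 4))
the_ax = the_fig.subplots()
the_ax.plot([1, 2, 3], [2, 4, 1])
the_ax.set_xlabel("X")
the_ax.set_ylabel("Y")

plt.tight_layout()  # Adjust padding on the current Figure so labels are not clipped

# "example.png" is a hypothetical filename inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "example.png")
the_fig.savefig(out_path, dpi = 300, format = "png")
```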
Axes Setup
• We can add a set of Axes (we generally have just one, but don’t have to) to our Figure
(denoted as the_fig) as follows:
• the_ax = the_fig.subplots(nrows = 1, ncols = 1)
• We can utilise methods on the Axes (the_ax) to set up things such as the minimum and
maximum values, titles, labels and similar (more can be seen in the documentation):
• the_ax.set_title(title): set the title of the axes to title;
• the_ax.set_xlabel(label): set the x-axis label to label (same goes for y);
• the_ax.set_ylim(bottom, top): set the (visible) limits of the y-axis between bottom and top (same for x with set_xlim(left, right));
• the_ax.legend(): add a legend describing each labelled data series (the label is set via label = ... when plotting).
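Since a Figure does not have to hold just one set of Axes, here is a minimal sketch of a 2 × 2 grid, assuming hypothetical data. With more than one row and more than one column, subplots() returns a 2-D NumPy array, so each Axes is indexed as the_ax[row, col].

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

the_fig = plt.figure(figsize = (8, 6))
# Two rows and two columns: subplots() returns a 2-D array of Axes
the_ax = the_fig.subplots(nrows = 2, ncols = 2)

# Configure the top-left Axes using the methods listed above
the_ax[0, 0].plot([0, 1, 2], [0, 1, 4], label = "squares")
the_ax[0, 0].set_title("Top left")
the_ax[0, 0].set_xlabel("X")
the_ax[0, 0].set_ylabel("Y")
the_ax[0, 0].set_ylim(0, 5)  # (bottom, top)
the_ax[0, 0].legend()        # Uses the label given to plot()

# Other Axes are addressed the same way
the_ax[1, 1].set_title("Bottom right")
```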
A simple (single plot) example
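import matplotlib.pyplot as plt
x = list(range(10))
y = list(range(0, 20, 2))
z = list(range(20, 40, 2))
the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots(nrows = 1, ncols = 1)
the_ax.plot(x, y, label = "y")  # label is what legend() displays
the_ax.plot(x, z, label = "z")
the_ax.set_title("Number")
the_ax.set_xlabel("X")
the_ax.set_ylabel("Y")
the_ax.set_xlim((0, 12))  # A (min, max) tuple; two separate values also work
the_ax.set_ylim((0, 50))
the_ax.legend()
plt.show()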
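A simple (dual plot) example
import matplotlib.pyplot as plt
x = list(range(10))
y = list(range(0, 20, 2))
z = list(range(20, 40, 2))
the_fig = plt.figure(figsize = (10, 5))
the_ax = the_fig.subplots(nrows = 1, ncols = 2)  # An array of two Axes
the_ax[0].plot(x, y, label = "y")
the_ax[0].set_title("set_1")
the_ax[0].set_xlabel("X")
the_ax[0].set_ylabel("Y")
the_ax[0].set_xlim(0, 12) # This is OK!
the_ax[0].set_ylim(0, 25) # Consistency?
the_ax[0].legend()
the_ax[1].plot(x, z, label = "z")
the_ax[1].set_title("set_2")
the_ax[1].set_xlabel("X")
the_ax[1].set_ylabel("Z")
plt.show()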
Plotting Methods
• Line graphs use the_ax.plot(); to plot other kinds of graphics, we use the specific method on our Axes:
• Scatter plots: the_ax.scatter()
• Bar charts: the_ax.bar()
• Pie charts: the_ax.pie()
• Histograms: the_ax.hist()
• Other methods also exist, however we will not cover them today (check the docs!)
Scatter Plot
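import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
y = x + 50 * np.random.random(100)  # Add uniform random noise in [0, 50)
the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()
the_ax.scatter(x, y)
plt.show()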
Bar Chart
plt.tight_layout()
plt.show()
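A minimal bar chart sketch, assuming hypothetical category names and values. the_ax.bar() draws one rectangle per category and returns a container of those rectangles.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical example data: one value per category
categories = ["A", "B", "C", "D"]
values = [23, 17, 35, 29]

the_fig = plt.figure(figsize = (5, 4))
the_ax = the_fig.subplots()
bars = the_ax.bar(categories, values, color = "steelblue")  # One rectangle per category
the_ax.set_title("Bar chart")
the_ax.set_xlabel("Category")
the_ax.set_ylabel("Value")
plt.tight_layout()
```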
Pie Chart – Be Careful!
plt.tight_layout()
plt.show()
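A minimal pie chart sketch, assuming hypothetical shares that together make up a meaningful whole. pie() divides each value by the sum of all values, and with autopct it also returns the percentage labels.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical shares of a whole; pie() normalises values by their sum
labels = ["Rent", "Food", "Transport", "Other"]
shares = [40, 30, 20, 10]

the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()
# With autopct set, pie() returns wedges, text labels and percentage labels
wedges, texts, autotexts = the_ax.pie(shares, labels = labels, autopct = "%1.0f%%")
the_ax.set_title("Pie chart")
the_ax.set_aspect("equal")  # Keep the pie circular
plt.tight_layout()
```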
Histogram
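import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(1000)  # 1000 samples from the standard normal distribution
the_fig = plt.figure(figsize = (5, 5))
the_ax = the_fig.subplots()
the_ax.hist(x, label = "randn")  # Without a label, legend() has nothing to show
the_ax.legend()
plt.tight_layout()
plt.show()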
The third way – with pandas?
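pandas plotting is built on matplotlib, so it can be combined with the object-oriented style. A minimal sketch, assuming a hypothetical DataFrame: passing ax = to DataFrame.plot() draws onto an Axes we created ourselves, and the same Axes is returned for further customisation.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data
the_df = pd.DataFrame({"sales": [10, 12, 9, 15], "costs": [7, 8, 8, 9]},
                      index = ["Jan", "Feb", "Mar", "Apr"])

# Object-oriented setup, as before
the_fig = plt.figure(figsize = (6, 4))
the_ax = the_fig.subplots()

# pandas draws one line per column onto the Axes passed via ax= and returns it
returned_ax = the_df.plot(ax = the_ax, title = "Sales vs costs")
the_ax.set_ylabel("Amount")  # Further matplotlib methods still apply
plt.tight_layout()
```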
pandas Plotting Example
import pandas as pd
import matplotlib.pyplot as plt
the_df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1]
}, index = ['pig', 'rabbit', 'duck', 'chicken', 'horse'])
the_ax = the_df.plot.bar(rot = 0)  # pandas draws the plot and returns a matplotlib Axes
plt.show()
pandas Series Plotting Example
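import pandas as pd
import matplotlib.pyplot as plt
the_ser = pd.Series([1, 2, 3, 3])  # Illustrative values
the_ax = the_ser.plot(kind = "hist", title = "Series Plot")
plt.show()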
The End: Thank You
Any Questions? Ask via email ([email protected])