Topic 2-3 OES - Visualisation - Part2
part II-2
Pivot tables and charts for continuous variables
What you know so far
• Translating (quantifying) a business question into analytics
measures (indicators) by defining outcome and explanatory
variables and confounding factors
• Understanding the structure of raw data in EXCEL (variables,
observations, cells)
• Generating new variables (IF, IFS, AND, OR)
• Generating data tables (COUNTIF, COUNTIFS,
AVERAGEIF, etc.)
• Plotting and customising column, bar, pie, doughnut and
treemap charts.
Today
• Generating data tables via pivot tables
• Understanding charts to visualise continuous variables
Generating data tables with Pivot tables
• In addition to functions, many (but not all) data tables can be generated via
Pivot tables
• Prepare the data before using Pivot tables! Define numerical values where
calculations are required
• Go to INSERT > Pivot table and choose the data range (select the columns
and rows)
• A PIVOT TABLE FIELDS window opens
• Populate the ROWS field by dragging and dropping variables; ROWS refers to
the rows of the data table, not of the raw data
• ROWS works well for any categorical variable but not for continuous
variables
• Choose the VALUES field by dragging and dropping variables
• Count is the default value
• Choose the statistic (click on the variable in VALUES and choose
VALUE FIELD SETTINGS), such as sum or average – mind the definition
of your variable
• Choose the COLUMNS field by dragging and dropping variables if you want a
cross-tabulation
• You can also condition the output table on variables and values using the
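For readers who want to see the logic behind ROWS, COLUMNS and VALUES outside Excel, the following is a minimal Python sketch of what a pivot table computes. The data, the field names (`department`, `gender`, `salary`) and the helper `pivot` are hypothetical and only serve as an illustration; this is not how Excel implements pivot tables.

```python
from collections import defaultdict

# Hypothetical raw data: one dict per observation (one row of the raw data).
raw = [
    {"department": "Sales", "gender": "F", "salary": 50000},
    {"department": "Sales", "gender": "M", "salary": 46000},
    {"department": "Sales", "gender": "F", "salary": 54000},
    {"department": "IT", "gender": "M", "salary": 60000},
    {"department": "IT", "gender": "F", "salary": 64000},
]


def pivot(rows, row_field, col_field, value_field, statistic="count"):
    """Cross-tabulate: one table row per row_field category,
    one table column per col_field category."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_field], r[col_field])].append(r[value_field])

    table = {}
    for (row_key, col_key), values in cells.items():
        if statistic == "count":
            result = len(values)
        elif statistic == "sum":
            result = sum(values)
        elif statistic == "average":
            result = sum(values) / len(values)
        else:
            raise ValueError(f"unknown statistic: {statistic}")
        table.setdefault(row_key, {})[col_key] = result
    return table


# ROWS = department, COLUMNS = gender, VALUES = salary
counts = pivot(raw, "department", "gender", "salary")               # count
averages = pivot(raw, "department", "gender", "salary", "average")  # average
```

Note that the categorical variables define the rows and columns of the data table, while the statistic is only meaningful if the value variable is numerical.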
Histograms and Pareto Charts
• For showing the distribution of a continuous variable
• Histograms separate a continuous variable into bins, or
categories of similar length; they are a kind of column chart for
continuous variables with no space between columns
• Histograms sort the bins by group value (from youngest to
oldest, for example)
• Pareto charts sort the bins by outcome value (the number of
observations per bin) and add a cumulative percentage
line
• Examples
• Number of employees per age category
• Test scores distribution
• Length of videos
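The difference between the two sort orders can be sketched in a few lines of Python. The ages and the bin width of 10 years are assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical ages of ten employees (a continuous variable).
ages = [24, 31, 33, 36, 38, 41, 45, 47, 52, 58]

BIN_WIDTH = 10  # assumption: bins of equal width (20-29, 30-39, ...)


def bin_label(value, width=BIN_WIDTH):
    """Return the label of the bin that contains the value."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


counts = Counter(bin_label(a) for a in ages)

# Histogram: bins ordered by group value (youngest to oldest).
histogram = sorted(counts.items(), key=lambda item: int(item[0].split("-")[0]))

# Pareto: bins ordered by number of observations (largest first),
# plus the cumulative percentage of all observations.
pareto = []
running = 0
for label, n in sorted(counts.items(), key=lambda item: -item[1]):
    running += n
    pareto.append((label, n, round(100 * running / len(ages), 1)))
```

Here `histogram` starts with the youngest bin, whereas `pareto` starts with the most frequent bin and ends at a cumulative 100 per cent.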
Scatter plots
• Exploring correlations between two continuous variables
• Scaling options for the axes might be needed to illustrate the
key point
• Consider linear or polynomial trend lines
• Keep in mind: correlation is not causation (more on this in
the correlation and regression session)
• Examples
• Sale price by profit margin
• Sales by tenure
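As a rough illustration of what a linear trend line and a correlation coefficient summarise, the sketch below computes a least-squares line and Pearson's r by hand. The tenure and sales figures are hypothetical, and the helper names are chosen for this example only.

```python
from math import sqrt

# Hypothetical data: tenure in years (explanatory, x-axis)
# and sales (outcome, y-axis).
tenure = [1, 2, 3, 4, 5]
sales = [12, 15, 15, 19, 24]


def mean(values):
    return sum(values) / len(values)


def linear_trend(x, y):
    """Least-squares slope and intercept of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


slope, intercept = linear_trend(tenure, sales)
r = pearson_r(tenure, sales)
```

A strong positive r only describes how closely the points follow a line; it does not show that tenure causes higher sales.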
Data tables for histograms, Pareto charts, and scatter plots
• These chart types use the raw data directly; select only the
columns of the variables to be plotted
• Histogram and Pareto charts plot the distribution of one
continuous variable, which is in most cases your outcome
variable
• Scatter plots show the relationship between two continuous
variables; plot the outcome variable on the y-axis and the
explanatory variable on the x-axis
Line charts
• Visualising trends over time, among others
• The outcome variable is continuous and the explanatory
variable is time (in most cases)
• You can show several trends in one chart; consider scaling
options for the axes
• Growth rates are changes relative to a starting value
• You can add linear or polynomial trendlines
• Examples:
• Sales over time
• Absences over time
• Quits and layoffs over time
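The statement that growth rates are changes relative to a starting value can be made concrete with a short sketch. The monthly sales figures are hypothetical; the example contrasts growth relative to the first month with month-on-month growth, where the starting value is the previous month.

```python
# Hypothetical monthly sales figures.
sales = {"Jan": 200, "Feb": 220, "Mar": 209, "Apr": 250}


def growth_rate(start, end):
    """Change relative to the starting value, e.g. 0.10 means +10%."""
    return (end - start) / start


months = list(sales)
base = sales[months[0]]

# Growth relative to the first month in the series.
since_start = {m: growth_rate(base, sales[m]) for m in months}

# Growth relative to the previous month.
month_on_month = {
    cur: growth_rate(sales[prev], sales[cur])
    for prev, cur in zip(months, months[1:])
}
```

Which starting value is appropriate depends on the business question, so it is worth stating it in the chart title or axis label.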
Area Charts
• Used to show changes in compositions within a variable
over time
• A kind of stacked column chart that technically uses bins of a
continuous variable (showing the composition within each bin),
with each column representing, for example, a point in time and
no space between columns
• Clever colour schemes needed if categories overlap
• Examples
• Sales by department by month
• Absences per department by week
• Profit margins by product type by month
Data tables for line and area charts
• Both types of charts use a data table similar to that of column
charts, but the explanatory variable (rows in the data table,
x-axis in the chart) is typically a time variable
• The number of columns determines the number of lines or
areas
• Area charts might need some tinkering with the colour schemes
to avoid overlapping areas
Month  Dep 1  Dep 2  Dep 3
Jan    3      3      1
Feb    2      1      0
Mar    0      3      1
Apr    1      5      4
May    1      5      2
Jun    0      3      1
Jul    0      4      2
Aug    1      7      2
Sep    5      4      1
Oct    0      0      1
Nov    2      0      0
Dec    7      1      1
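To illustrate how such a data table maps onto an area chart, the sketch below takes the first four months of the table and computes the upper boundary of each area when the series are stacked on top of each other. Stacking is an assumption made for this example (it is one way to avoid overlapping areas, alongside the colour schemes mentioned above), and the helper `stacked_tops` is hypothetical.

```python
# One column per department: the number of columns equals the number of areas.
departments = ["Dep 1", "Dep 2", "Dep 3"]

# Rows of the data table: time variable (month) -> values per department.
table = {
    "Jan": [3, 3, 1],
    "Feb": [2, 1, 0],
    "Mar": [0, 3, 1],
    "Apr": [1, 5, 4],
}


def stacked_tops(values):
    """Running totals across the columns: the upper edge of each stacked area."""
    tops, running = [], 0
    for v in values:
        running += v
        tops.append(running)
    return tops


stacked = {month: stacked_tops(values) for month, values in table.items()}
n_areas = len(departments)
```

In a stacked chart the top edge of the last area equals the monthly total, so the chart shows both the composition and the overall level.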
Bubble charts
• This is a multi-dimensional plot that can add a third
(or even fourth) dimension to a two-dimensional chart
• It is an extension of scatter plots in which a third
dimension is represented by the size of the bubble
• In specific cases, you can use colour to add a fourth
dimension, for example to illustrate a few categories of a
fourth variable
• Examples
• Product price (x) by profit margin (y), showing the market
share of the product (bubble size), maybe adding the
region (Europe vs. America) as the colour of the bubbles
• Department size (x), average age (y) and proportion of
women (bubble size)
Data table for bubble charts
• Technically, bubble charts work like scatter plots and you
can choose the columns of the raw data, but mind that all
three columns must contain continuous variables
• Unfortunately, bubble charts become quite messy if you add too
many data points; hence, aggregated data are easier to
display
• In most cases, you need to redesign the data entry (right
mouse click and then 'Select Data', click on Edit and choose
the x, y and z columns)
• Similarly, the axis scaling frequently needs adjustment
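Because aggregated data are easier to display, it can help to collapse the raw data to one row per group before plotting. The sketch below does this for the department example; the employee records are hypothetical, and placing department size on the x-axis is an assumption made for this illustration.

```python
from collections import defaultdict

# Hypothetical employee-level raw data (one dict per employee).
employees = [
    {"department": "Sales", "age": 30, "female": True},
    {"department": "Sales", "age": 40, "female": False},
    {"department": "Sales", "age": 50, "female": True},
    {"department": "Sales", "age": 40, "female": True},
    {"department": "IT", "age": 28, "female": False},
    {"department": "IT", "age": 36, "female": True},
]

# Group the observations by department.
groups = defaultdict(list)
for e in employees:
    groups[e["department"]].append(e)

# One bubble per department: three continuous values (x, y, bubble size).
bubbles = {}
for dept, members in groups.items():
    size = len(members)
    bubbles[dept] = {
        "x": size,                                         # department size
        "y": sum(m["age"] for m in members) / size,        # average age
        "bubble": sum(m["female"] for m in members) / size,  # share of women
    }
```

The resulting table has one row per department and three numerical columns, which is the layout a bubble chart needs for its x, y and bubble-size values.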
Task 3: Investigating retention policies
The HR director asks you to develop a performance dashboard
visualising the employee retention pattern over the past year in
the company using a dataset of employees who left the firm
during the last year (dataset: retention policies). Based on this,
she also asks you for a concept for an ideal retention dashboard
and thoughts about additional data requirements.
1. Describe outcome variable(s) and explanatory variable(s) for
three charts for a retention dashboard based on the
available data
2. Plot the three dashboards (charts) and interpret the charts
3. Choose the most relevant dashboard (chart) and explain your
choice
4. Describe potential confounding factors for the chart chosen
in step 3 and explain potentially biased or misleading interpretations
5. Sketch/describe an ideal retention dashboard (outcome
variable(s), explanatory variable(s)) and explain