Data Analytics For Decision Making
Welcome to the course! We hope you’re excited to get started. On this first step you’ll find some
important information on the learning outcomes, course design and a brief background for our
educators from Bond University, Dr. Adrian Gepp and James Todd.
Learning Outcomes
Course Design
This course is split into two weeks. In the first, you will learn about methods for describing
important characteristics of data through application of graphical techniques and descriptive
statistics. You will also hear about why ethics is an important topic when performing data
analysis and reporting results. In the second week, you will be introduced to the idea of
probability, random variables, how we can describe them, and how we can make decisions even
in the face of randomness. Finally, you’ll hear about the environment in which data analytics is
happening today from Professor Bruce Vanstone, a Professor of Data Science at Bond
University.
Your Educators
Adrian has more than a decade of experience in teaching at a tertiary level that spans
undergraduate, postgraduate and online education in data analytics, economics and finance.
Adrian’s primary research interest is in applying big data and advanced statistical modelling to
reveal unique insights about problems of economic and social importance, including fraud
detection and business failure prediction. His research has attracted more than $500,000 in
external funding and he has more than 40 peer-reviewed research outputs.
James is a PhD candidate and teacher at Bond University. His previous studies have included a
Bachelors and Honours degree in Actuarial Science, graduating as Valedictorian and with First
Class Honours respectively. He has taught subjects in financial, statistical, and data analytics
areas at both undergraduate and postgraduate levels. Beyond experience as a research assistant in
various areas, his own research focuses on the practical application of data science tools in
healthcare systems and processes, with three peer-reviewed journal articles published.
Often people consider effective data analytics to be complicated, but that is definitely not always
the case. The goal of data analytics in business is to learn about the characteristics of the data to
make decisions. The better we understand it, the better we achieve that goal. The goal isn’t to
make sure that no one understands our analysis. Visualising data is often the most effective way
of describing it for human beings, because sight is such an important part of how we interact
with the world. A good graph can make insights obvious and doesn’t require the viewer to
decipher a complicated calculation. Even if it isn’t always the final result of an analysis, it often
provides an important starting point.
In the next activity, Adrian will talk through several useful graphical techniques. He will discuss
what type of data is appropriate for each technique and how to interpret the visualisations with
examples. You will learn about pie charts, bar charts, histograms, line plots, and scatter plots.
While pie and bar charts are useful for visualising data where only a few values are possible
(such as hair colours or product names), we need a different tool for continuous data where many
values are possible.
This is where histograms are useful. In this video, Adrian shows their use with the example of
phone bills, but I’m sure you can think of other examples as well.
Share in the comments below some other examples of histograms you can think of or are familiar
with.
Next, we’ll look at two final graphical techniques for inspecting two variables and the
relationships between them. These are line plots and scatter plots…
Line Plots and Scatter Plots
We now have our last two plots. Line plots are valuable tools for visualising how a variable
changes across time and are among the most common plots you will see.
We often care not just about the current state of a variable, such as profit or sales, but also about
how it has evolved. Has there been high growth? Is it stable? These are examples of questions
which line plots help us to answer.
Scatter plots are similar in that they visualise two variables at once that might be related to one
another. They are a useful exploratory tool to assess whether there might be relationships
between variables. For example, when I spend more on advertising do I see higher sales?
Hopefully you are comfortable with the ideas behind these and the other graphical techniques
now. If not, feel free to double back and watch a topic video again or ask your fellow learners
any questions you have in the comments below.
Once you’re ready, in the next step we’ll take it one step further and talk about what makes a
“good” graph.
Good Graphs
What makes a good graph?
It’s easy to spot a bad graph.
I’m sure everyone has seen graphs before that only made the data more confusing. A quick
search will turn up many. Sometimes they are simply hard to decipher, due to a bad choice of
graphical technique or because of poor labelling. Other times, they may be misleading and result
in incorrect interpretation of the data. To effectively communicate our results, we need to focus
on making good graphs.
Constructing a good graph can be a bit of an art form, but there are some basic principles.
Applying these principles puts us on the path to a final graph that is effective in communicating
important facts about the data without being misleading. In the above video, Adrian outlines
some of these principles. While we adhere to these principles in this course, others might not by
accident or to be deliberately misleading. When we talk about the role of ethics in analysis
(Activity five) we’ll show some examples of really bad graphs.
You’ve now finished with this first activity! Hopefully you have a greater appreciation for
different types of visualisations, when they should be applied, and what we need to consider to
make a ‘good’ graph. You now have the opportunity to check your understanding with a quick
quiz.
From there, we’ll step through these same graphical techniques and examples to demonstrate
how you can also construct them using Excel.
Now, in this activity, you’ll learn how to construct them yourself. We’ll be revisiting the
examples shown and going through their creation step-by-step. Unfortunately, FutureLearn
doesn’t support the sharing of Excel files within the comments. If you have your own data
available, however, we’d suggest you try experimenting and following along.
You can share your work with other learners and view examples from others on this course using
our Padlet wall here. Please do not share inappropriate materials or any personally identifiable
information; learners who do so will have their posts removed.
Actively doing a task helps to reinforce the lessons and ensures you get the most out of this
activity. If you don’t have access to Microsoft Excel, you could also use the free Google Sheets.
If you’re ready, Mark as Complete and move on to the next step to get started with pie charts and
bar charts.
Remember to follow along with your own data or use the data in the video and share on our
Padlet wall if you wish to join in.
The choice of bins for the histogram has a big impact on the final result. Bins that are too wide
will obscure the finer details of the data, while bins that are too narrow will make the bigger
picture hard to see.
You will normally experiment with different widths to see what happens. For a given problem
you’re unlikely to get it right the first time, but you can easily modify it in Excel until you find
the widths that work best!
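If you’d like to see this effect outside Excel, here is a short, optional Python sketch (the bill amounts below are simulated, not real data) that counts the same sample into wide and then narrow bins:

```python
import random
from collections import Counter

random.seed(0)
# A simulated sample of 200 monthly phone bills (dollars)
bills = [random.gauss(70, 15) for _ in range(200)]

def bin_counts(data, width):
    # Count how many observations fall into each bin of the given width.
    return Counter(int(x // width) * width for x in data)

wide = bin_counts(bills, 50)   # very wide bins: finer details are obscured
narrow = bin_counts(bills, 2)  # very narrow bins: the bigger picture is hard to see
print(len(wide), "wide bins vs", len(narrow), "narrow bins")
```

Re-running this with different widths is the same experimentation you would do with Excel’s histogram settings.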
You should now be comfortable with how to create bar charts, pie charts, histograms, line plots,
and scatter plots using Excel. I hope you’re excited to get to using those!
Descriptive Statistics
Often, we don’t struggle with a lack of data but with too much of it. With too much
information, it can be difficult to identify the important features of the data and then make
data-driven decisions. This is where descriptive statistics come in. Descriptive statistics can be used to
describe characteristics of vast quantities of data in just a few numbers, making them a useful
addition to your data analysis toolbox.
So what can descriptive statistics tell us? What insights can they provide? Well, we might care
about ‘typical’ customers, for which we can use measures of central location like the mean,
median and mode. We might care about how much variety there is in our data, for which we can
use measures of spread like variance or standard deviation. We might care about whether we tend to
have a few really large or really small values, for which we can use measures of shape like
skewness. Finally, we might care about how related two different variables might be, for which
we can use measures of association like correlation.
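As a taste of what’s coming, here is a short Python sketch (using made-up daily sales figures) computing several of these measures; in the course itself you’ll produce them with Excel formulas:

```python
import statistics

sales = [12, 15, 15, 18, 22, 25, 95]  # hypothetical daily sales figures

mean = statistics.mean(sales)          # central location: the average
median = statistics.median(sales)      # central location: the middle value
mode = statistics.mode(sales)          # central location: the most common value
variance = statistics.variance(sales)  # spread: sample variance
stdev = statistics.stdev(sales)        # spread: square root of the variance

# The single large value (95) pulls the mean well above the median,
# an early hint of positive skew.
print(round(mean, 1), median, mode)
```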
In this activity, you’ll learn about all of these measures. You’ll learn what they are, how they are
calculated, how to interpret them, and how to generate them using Excel.
Let’s discuss
When have you used descriptive statistics or seen them used effectively?
Measures of Spread - Range, Variance and Standard
Deviation
Measures of Spread
Variance and standard deviation are the most common measures of how spread out data is.
While the range of the data is also a measure of spread, it ignores a lot of important information
we normally want to capture. From this video, you should understand the relation between
standard deviation and variance as well as how we calculate them.
Discuss with your fellow learners any questions you may still have in the comments below.
In the next step, we’ll talk about how we can use and interpret our measure of spread once we
have calculated it.
For bell-shaped data, the empirical rule tells us that approximately:
68% of the data is within plus or minus one standard deviation around the mean
95% of the data is within plus or minus two standard deviations around the mean
99.7% of the data is within plus or minus three standard deviations around the mean
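You can check these percentages yourself with a quick simulation. The sketch below (in Python, with simulated bell-shaped data) counts how much of the sample falls within one, two, and three standard deviations of the mean:

```python
import random
import statistics

random.seed(1)
data = [random.gauss(50, 10) for _ in range(100_000)]  # bell-shaped data

mean = statistics.fmean(data)
sd = statistics.pstdev(data)

def share_within(k):
    # Fraction of observations within k standard deviations of the mean.
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

print(round(share_within(1), 2))  # close to 0.68
print(round(share_within(2), 2))  # close to 0.95
print(round(share_within(3), 3))  # close to 0.997
```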
This result means that standard deviation is a very descriptive statistic! Now that we have some
tools for measuring spread, let’s look at how we measure shape in the next step.
Measures of Shape
We introduce the idea of ‘skewness’ in this video.
Skewness tells us about the symmetry of the distribution of data. A skewness of 0 means that the
data is perfectly symmetric, so you could draw a vertical line through the centre and both sides of
that line would look identical. Positive and negative skew indicate that the distribution is not
symmetric, with a longer tail to the right (positive skew) or to the left (negative skew).
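To make the idea concrete, here is a small Python sketch (with made-up numbers) that computes a simple skewness measure for a symmetric sample and for samples with tails to the right and to the left:

```python
import statistics

def skewness(data):
    # A simple (population) skewness: the average cubed deviation
    # from the mean, divided by the cube of the standard deviation.
    n = len(data)
    mean = sum(data) / n
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)

symmetric = [1, 2, 3, 4, 5]            # skewness 0: both sides mirror each other
right_tail = [1, 1, 2, 2, 3, 10]       # a few large values: positive skew
left_tail = [-10, -3, -2, -2, -1, -1]  # a few small values: negative skew

print(skewness(symmetric), skewness(right_tail) > 0, skewness(left_tail) < 0)
```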
In the next step, we demonstrate how to calculate the measures introduced so far using Excel.
Computing Descriptive Statistics in Excel
We’ve covered various descriptive statistics, and now you should also be able to calculate
them yourself using Excel.
Note that while most common statistics can be produced automatically using the Analysis
ToolPak in Excel, those values don’t automatically update. By calculating the statistics individually
with the easy-to-use Excel formulas, they will update automatically when you make changes to
your data. If you think there is a chance you will have to change your data, then it’s
probably best to do it this way.
In the next step we’ll introduce the final descriptive statistic, which measures the association
between variables.
Measures of Association
In this video we have provided a new measure, demonstrated its calculation, and summarised our
statistics.
Our new measure of correlation is useful for determining whether there is a linear relationship
between two variables. One common pitfall encountered by analysts reporting correlation is to
confuse it with causation. It is very important to remember that a strong correlation does not
mean that one variable causes the other. Two variables might be correlated while actually having
very little to do with each other. The example given in the above video was of the strong
correlation between the number of films Nicolas Cage has appeared in and the number of deaths by
drowning. Even if the correlation is strong, we don’t believe either of these is causing the other.
Many other examples of strong correlations between unrelated variables can be found here.
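If you want to see correlation in action, here is a brief Python sketch; the yearly figures below are invented purely for illustration, but they show how two unrelated quantities can still produce a correlation near 1:

```python
import statistics

films = [2, 3, 4, 4, 5]               # hypothetical films per year
drownings = [98, 102, 109, 110, 115]  # hypothetical drownings per year

def correlation(xs, ys):
    # Pearson correlation: average product of standardised deviations.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

r = correlation(films, drownings)
print(round(r, 2))  # close to 1, yet neither variable causes the other
```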
Now you have a collection of useful statistics in your data analyst toolbox. You can describe the
central location, spread, shape, and association of data. In addition to the graphical techniques
from activities one and two, you’re quickly gaining the key tools of data analysis.
Check your knowledge of this activity in the next step’s quiz before you move onto our final
activity for this week, where we discuss the importance of ethics in data analysis.
Ethics means being careful both in how you conduct your own analysis and in how you interpret
the results of others. It is increasingly relevant in today’s data-driven world. Relevant issues which
will be discussed in the following videos will include appropriate treatment of data which may
be sensitive, making sure outputs are faithful representations of the underlying data, and ensuring
conclusions from the analysis are suitably justified.
As budding data analysts, it will be your responsibility to ensure your own analysis is ethical. It
will also be your responsibility to draw attention to unethical, misleading and untrustworthy
analysis.
As economist Ronald Coase stated: “if you torture the data long enough, it will confess to
anything”.
Let’s discuss
Can you think of any real life examples where data has been misused or manipulated? Share your
thoughts in the comments below.
In the next step we discuss how data visualisation can also distort data and lead to incorrect
conclusions.
Throughout this course we’ve already talked about what we need to think about when making
good graphs, and here we’ve seen some examples of bad graphs. You’ve probably seen some
bad or misleading graphs outside of our examples. An excellent collection of real-life examples can
be found here and I’d encourage everyone to have a look through them.
Week 1 Reflection
Well done on completing your first week of “Data Analytics for Decision Making: An
Introduction to Using Excel”.
We hope you’ve enjoyed it so far and have added some new skills to your analytical toolbox.
This week we’ve covered a lot. You learned about a variety of different graphs, when to use
them, and how to make them yourself using Excel. You learned about the major descriptive
statistics, where they are important, and again how to calculate them using Excel. Finally, you’ve
heard about why ethics are important to consider for any good data analyst.
Now that we are at the halfway point, it would be great to hear your reflections on the first
week.
Probability in Week 2
Welcome to week 2! Last week you saw graphical techniques, descriptive statistics, and heard
about ethics in analysis. This week will focus on how we can work with probability.
The key idea behind the content for this week is that in business and throughout life, we have to
make decisions under uncertainty. These decisions range from small ones, such as whether to bring an
umbrella in case it rains, to large ones, such as whether to invest in new technologies. Being able to work with this
uncertainty in a structured way is a vital skill. We will start in the next two steps by introducing
the idea of probability and talking about how we interpret probabilities.
In activity two, we’ll talk about how to calculate the probability of combinations of random
events occurring. In activity three, we’ll introduce the idea of discrete random variables and how
we can describe them with some familiar ideas. In activity four, we discuss how we can take our
understanding of probability and random variables to make decisions even when outcomes are
uncertain.
Activities one, two, three and four all relate to the primary goal for this week of being able to
work with uncertainty. We also believe it’s important to finish the course with a broad discussion
of today’s environment for data analytics. We live in an exciting time, where more and more data
is being stored and there is more and more need for data analytics to help make sense of it. In
activity five we will have the opportunity to hear about today’s environment from Bruce
Vanstone, Professor of Data Science at Bond University.
Introduction to Probability and Terminology
In this video, Adrian discusses the basic building blocks of probability as well as why it’s
important to understand the concept of probability.
Some terminology is also introduced so that we can be clearer when talking about the topic. In
the next video, we’ll finish our introduction by talking about how we assign probabilities to
events and how we interpret them.
If there is anything you’re unclear on, ask your fellow learners in the comments below.
You’ll likely most often be working from the empirical approach for assigning probabilities,
though in some examples the classical approach may be appropriate. In general, we try to avoid
the subjective approach unless we cannot apply either of the other two, as it’s harder to defend.
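The difference between the classical and empirical approaches can be seen in a short simulation. The Python sketch below assigns a probability to rolling a six both ways:

```python
import random

random.seed(42)

# Classical approach: six equally likely faces, so P(six) = 1/6.
classical = 1 / 6

# Empirical approach: estimate the same probability from repeated trials.
rolls = [random.randint(1, 6) for _ in range(60_000)]
empirical = rolls.count(6) / len(rolls)

print(round(classical, 3), round(empirical, 3))  # the two agree closely
```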
Now that we have a basic understanding of what probabilities are and how we assign them, we
are now ready to jump into using them. In the next activity we’ll demonstrate how we can work
out the probabilities of different combinations of events. See you there!
Event Relations
In the last activity, we introduced the idea of simple events. Now, we’ll turn our attention to
more complex events and look at how we can calculate probabilities for combinations of simple
events. Specifically, we’ll focus on how to answer questions in the form of:
These types of questions deal with relations between events, and we have a bit more terminology
about how we refer to them. You’ll hear about unions, intersections and complements in the next
video. Click through if you’re ready to get started!
Unions, Intersections and Complements
In the last activity we discussed how we assign probabilities to simple events. In this video we
look at how we can take these simple events and work out probabilities of more complicated
events. Together we have now examined the following event relations:
Unions: where we are interested in whether at least one of several events occur
Intersections: where we are interested in whether multiple events all occur
Complements: where we are interested in whether events do not occur
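These relations map neatly onto set operations. As an illustration, here is a Python sketch using a standard deck of cards as the sample space:

```python
# Sample space: the 52 cards of a standard deck as (rank, suit) pairs,
# with rank 1 representing the ace.
deck = [(rank, suit)
        for rank in range(1, 14)
        for suit in ["hearts", "diamonds", "clubs", "spades"]]

hearts = {card for card in deck if card[1] == "hearts"}  # event A
aces = {card for card in deck if card[0] == 1}           # event B

def p(event):
    # Each card is equally likely, so probability is a simple proportion.
    return len(event) / len(deck)

print(p(hearts | aces))       # union: P(heart or ace) = 16/52
print(p(hearts & aces))       # intersection: P(ace of hearts) = 1/52
print(p(set(deck) - hearts))  # complement: P(not a heart) = 39/52

# The addition rule ties them together:
# P(A or B) = P(A) + P(B) - P(A and B)
assert abs(p(hearts | aces) - (p(hearts) + p(aces) - p(hearts & aces))) < 1e-12
```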
In the next step, Adrian will take us through a more detailed example.
Without using a formula, Adrian can calculate the desired probability through common sense.
The intuition displayed in this example is the basis of our formulas. Often it’s helpful to
approach probability problems involving multiple events this way, as our calculations should
always pass our logic checks.
How do you feel about this example? Can you think of your own example to share in the
comments below?
Now that we’ve got a better understanding of probability for both simple and more complicated
events, let’s move on. In the next activity, we’ll introduce discrete random variables and discuss
how we can work with them.
A random variable is simply a variable where the value is uncertain. For example, we may record
the number of customers entering a store each day. This is a variable. If we are interested in the
number of customers who will enter the store tomorrow, this is a random variable. It’s random
because the value is uncertain. It’s also a discrete random variable because the set of values it
can take is countable, such as whole numbers of customers. We’ll describe the idea of random variables more in the next
video, but why do we need these?
We need to know what they are and how to deal with them because they pop up throughout
everyday life and business. The number of customers entering a store each day is just one
example, but it might be important to consider because it would influence the number of staff to
service them.
So if they are so common, how do we work with them? We’ll explore that question in the next
few videos as well!
It’s just a variable where the outcome or value is uncertain. That isn’t to say that you have no
idea what the value will be, just that it’s not guaranteed. A coin flip is a random variable in that
you don’t know what the outcome will be beforehand, but you do have knowledge of the
‘structure’ of the randomness. Only two outcomes are possible, and each has a 50% chance to
occur (for a fair coin).
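A quick simulation makes this ‘structure of the randomness’ visible. Here is a minimal Python sketch of repeated flips of a fair coin:

```python
import random

random.seed(0)

# Each individual flip is uncertain, but the structure is known:
# two outcomes, each with probability 0.5 for a fair coin.
flips = [random.choice(["heads", "tails"]) for _ in range(10_000)]
share_heads = flips.count("heads") / len(flips)
print(round(share_heads, 2))  # settles near 0.5 over many flips
```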
Now that you’ve heard Adrian talk about discrete random variables and we have a good
foundation, you should be ready for the next step, where we will hear how to calculate and
interpret the expectation of a random variable.
Central location is also important to measure for random variables. In this video Adrian
re-introduces the mean, now called the expectation, for random variables. The expectation
represents the long-run average of the random variable.
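The calculation itself is a weighted average, and the ‘long-run average’ interpretation can be checked by simulation. Here is a Python sketch with a made-up distribution of customers per hour:

```python
import random

random.seed(7)

# A hypothetical discrete random variable: customers arriving per hour,
# with a probability attached to each possible value.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Expectation: each value weighted by its probability.
expectation = sum(v * p for v, p in zip(values, probs))
print(round(expectation, 2))  # 1.7

# The long-run average of simulated outcomes approaches the expectation.
draws = random.choices(values, weights=probs, k=50_000)
print(round(sum(draws) / len(draws), 1))  # 1.7
```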
We now have tools for calculating expectation and variance for discrete random variables, with a
bit of help from Excel. We can also inspect shape by considering the skewness of our data
visually. Now that we have these, we can move onto our final practical topic for this course,
where we use these measures to make a business decision involving random variables.
Is there anything you’re struggling with, or any terms you don’t understand? Try and help other
learners with their questions if you can.
This is an example, but it isn’t a far-fetched one. You can imagine how similar scenarios of
choosing between two projects with uncertain outcomes might arise in businesses. To assess the
two projects, we are computing the expectation, variance and standard deviation of cash flows.
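As a sketch of this kind of comparison, here is some Python with invented cash-flow distributions for two hypothetical projects; the calculations mirror what you would set up in Excel:

```python
import math

# Hypothetical cash-flow outcomes (in $000s) with their probabilities.
project_x = {40: 0.5, 140: 0.5}  # same chance of a low or a high cash flow
project_y = {90: 0.5, 110: 0.5}  # much less spread in outcomes

def describe(dist):
    # Expectation, variance and standard deviation of a discrete distribution.
    exp = sum(x * p for x, p in dist.items())
    var = sum(p * (x - exp) ** 2 for x, p in dist.items())
    return exp, var, math.sqrt(var)

exp_x, var_x, sd_x = describe(project_x)
exp_y, var_y, sd_y = describe(project_y)
print(exp_x, sd_x)  # project X: expectation 90, standard deviation 50
print(exp_y, sd_y)  # project Y: expectation 100, standard deviation 10
```

Here project Y has both the higher expectation and the lower spread, so it is clearly preferred.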
Let’s discuss
What other scenarios are you contemplating right now that you could use your Excel skills to
help decide? Feel free to share with your fellow learners in the comments!
In the next video, Adrian will discuss how we use our calculated figures to determine which project
is more attractive.
To summarise, we prefer higher expected cash flows and lower variance. This example was nice
because one of the options had a higher expectation and lower variance. In the next video,
Adrian will discuss how we make a decision if one project has higher expected cash flows but
also higher variance.
Coefficient of Variation
To Lease or Not To Lease (Continued)
The coefficient of variation is our solution to the scenario where one project has better expected
cash flows but worse variance.
It provides a convenient measure of how much risk is taken on relative to the expected cash
flows. In this example, option X offered a higher expected return relative to its risk: even
though it had a lower expectation, it is the better project because of its substantially lower risk.
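The calculation is just the standard deviation divided by the expectation. Here is a minimal Python sketch with made-up summary figures for a scenario where neither project dominates:

```python
# Hypothetical summary figures: project Y has the higher expected
# cash flow but also carries considerably more risk.
exp_x, sd_x = 100.0, 20.0
exp_y, sd_y = 120.0, 48.0

cv_x = sd_x / exp_x  # risk taken on per dollar of expected cash flow
cv_y = sd_y / exp_y

print(cv_x, cv_y)  # X takes on less risk relative to its expected return
```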
In the next step, we’ll summarise what we have learned so far this week before you start your
Test to check your understanding. If you pass the test and have marked each step as complete
along the way you’ll be eligible for our free Certificate of Achievement.
Next we have a short Test to assess your knowledge before you go into the last activity. In
activity five, we’ll hear from Professor Bruce Vanstone about the broader environment of data
analytics today. This is a fascinating topic in today’s rapidly changing world and it should give
you some idea as to how the skills you’ve learned in this course equip you to better face some
challenges faced by businesses.
Before you go into the Test and last activity, we want to briefly summarise what you’ve learned
in the past week and hear from you what you’ve made of it. This week you encountered the idea
of probability and uncertainty, maybe for the first time. As part of this week’s content you
learned about the following:
What simple events are and the three ways we can assign probabilities to events
How to use the probabilities assigned to simple events to work out probabilities for more
complicated events involving unions, intersections, and complements
What discrete random variables are and how we can describe them with measures of
central location, spread and skewness
How to use our descriptive measures of random variables to make real business decisions
for projects even when outcomes are uncertain
How to use Excel to help us perform these tasks and keep the focus on the process rather
than complicated formulas
Adding all of this to our lessons from the first week, you’re almost ready to start performing your
own analysis or learn about more advanced analytical tools. All that is left is your Test and the
last activity. But before you start those, it would be great to hear from you once more in the
discussion below.
Let’s discuss
Is there a particular task you want to complete with your new skills? How might you apply what
you’ve learnt when you next perform a data-related task in Excel?
Here, we want to take a big step back away from learning specific methods and instead look at
the environment in which we are working as data analysts or working as a manager with data
analysts. The world is changing very fast and as you build your skills, it is important to
understand how you fit in. We hope that the following video helps with that. As a bit of a sneak
peek, the good news is that businesses have increasingly realised their data is a very valuable
resource. Furthermore, with modern technology, businesses now have the ability to store, and the
potential to analyse, massive amounts of data. The current need is for individuals and teams who
are able to make sense of it all. That puts data analysts, data scientists and data-savvy managers
in high demand.
It’s actually the subject that this online course is based on! As a result, you’ll notice that
Professor Vanstone refers to the content being topic 10. We felt that it was important information
to know even if you haven’t gone through the longer version of the course because it still applies
to you. So we’ve included it for you all as a special treat! If you want to learn all the information
in between the introduction to data analytics and this next video, then just contact the Bond
University team as you will benefit from completing the extended Data Analytics for Decision
Making subject offered by Bond.