Summarising and Analysing Data
Market      Sales ($)
Market 1    100,000
Market 2    149,000
Market 3     51,000
Total       300,000
In a pie chart representing the proportion of the sales made in each market,
what would be the angle of the section representing Market 3?
61 degrees
120 degrees
50 degrees
17 degrees
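As a worked check, each section's angle is its share of total sales multiplied by the 360 degrees of the full circle. Using the sales figures from the table:

```latex
\[
\text{Angle for Market 3} = \frac{51{,}000}{300{,}000} \times 360^\circ
= 0.17 \times 360^\circ = 61.2^\circ \approx 61^\circ
\]
```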
a) Describe the two types of data: categorical (nominal and ordinal) and numerical
(continuous and discrete). [S]
Data
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous
Categorical data
Nominal data are a type of data used to label variables without providing any quantitative value. The
data cannot be ordered.
Example: Gender (0 = male, 1 = female)
Ordinal data are a kind of categorical data with a set order or scale.
Example: Consumer satisfaction (not satisfied, somewhat not satisfied, neutral, somewhat satisfied,
satisfied)
Numerical Data
Discrete data are data which can only take on a finite or countable number of values within a given range.
Example: Number of workers (100, 250, 1000)
Inferential analysis: Drawing conclusions and/or making decisions concerning a population based only on sample
data.
Example: Is there any difference between the electricity cost in Factory A and Factory B?
c) Calculate the mean, mode and median for ungrouped data and the mean for
grouped data. [S]
d) Calculate measures of dispersion including the variance, standard
deviation and coefficient of variation for both grouped and ungrouped data.
The number of daily complaints to a local government office has a mean of 12 and a
standard deviation of 3 complaints.
What is the coefficient of variation as a %?
%
The results of a chemistry examination are normally distributed with a mean score of
56 and a standard deviation of 12.
What is the percentage probability that a student will score more than 80?
%
Which TWO of the following are feasible values for the correlation coefficient?
+ 1.40
+ 1.04
0
- 0.94
The following statements relate to the advantages that linear regression analysis has
over the high-low method in the analysis of cost behaviour:
(1) The reliability of the analysis can be statistically tested
(2) It takes into account all of the data
(3) It assumes linear cost behaviour
Which of the above statements are TRUE?
2 and 3 only
1 and 2 only
1 only
1, 2 and 3
h) Describe the five characteristics of big data (volume, variety, velocity, value and
veracity). [K]
i) Explain the three types of big data: structured, semi-structured and unstructured.
[K]
j) Describe the main uses of big data and analytics for organisations.
Identify whether each of the following statements about the uses of big data
analytics in organisations is TRUE or FALSE.
Statement True False
It helps to analyse the efficiency of business processes in real time
It helps to better understand customer behaviour and preferences
Big Data
LO:
a) Describe the five characteristics of big data (volume, variety, velocity, value and veracity)
b) Explain the three types of big data: structured, semi-structured and unstructured
Variety: Big Data includes much more than financial information. It can also cover operational data from
within the organisation, as well as other internal and external information. This data
can be both structured and unstructured in nature.
Structured data – for example, a bank will hold a record of all receipts and payments (date, amount and
source) for a customer.
Unstructured data – can make up 80% of business data but is more difficult to store and analyse.
Velocity: The data must be turned into useful information quickly enough to be of use in decision making and
performance management (in real time if possible). The sheer volume and variety of data make this task
difficult, and sophisticated methods are required to process the huge volumes of non-uniform data quickly.
Value: Having access to big data is only useful if the organisation can turn it into value.
LO:
c) Describe the main uses of big data and analytics for organisations
The processing of Big Data is known as Big Data analytics. For example, Google Analytics tracks many
features of website traffic.
Mean
The arithmetic mean, also known as the ‘average’, is calculated by dividing the sum of the values in question by
the number of values.
A shopkeeper is about to put his shop up for sale. As part of the details of the business, he wishes to quote the
average weekly sales. The sales in each of the last 6 weeks are:
Week 1 2 3 4 5 6
Sales $1,120 $990 $1,040 $1,030 $1,105 $1,015
Determine the mean weekly sales.
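The calculation can be sketched in a few lines of Python. The variable names are illustrative only; the figures are the six weekly sales values given above.

```python
# Weekly sales in $ for weeks 1 to 6, taken from the table above.
weekly_sales = [1120, 990, 1040, 1030, 1105, 1015]

# Mean = sum of the values / number of values.
total_sales = sum(weekly_sales)
mean_sales = total_sales / len(weekly_sales)

print(total_sales)  # 6300
print(mean_sales)   # 1050.0
```

The result of $1,050 agrees with the mean quoted in the median illustration later in these notes.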
The following distribution shows the number of employees absent per day for a company over a 22 day period.
No. of employees absent    No. of days (frequency)
2                          2
3                          4
4                          3
5                          4
6                          3
7                          3
8                          3
Find the arithmetic mean for the above distribution.
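For a frequency distribution, the mean is the sum of (value x frequency) divided by the sum of the frequencies. A minimal sketch using the absence table above (the variable names are illustrative):

```python
# (employees absent, number of days) pairs from the frequency table above.
absences = [(2, 2), (3, 4), (4, 3), (5, 4), (6, 3), (7, 3), (8, 3)]

total_days = sum(f for _, f in absences)          # sum of f
total_absences = sum(x * f for x, f in absences)  # sum of f * x

# Mean = sum of (f * x) / sum of f
mean_absent = total_absences / total_days

print(total_days, total_absences, round(mean_absent, 2))  # 22 111 5.05
```

On average, about 5 employees were absent per day over the 22 days.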
Median
The median is defined as the middle of a set of values, when arranged in ascending (or descending) order. The
median can be used to overcome any issues of skewed data.
In the previous illustration, we saw that a shop's weekly sales were given by the following sample over six weeks.
Week 1 2 3 4 5 6
Sales $1,120 $990 $1,040 $1,030 $1,105 $1,015
The sample has an arithmetic mean of $1,050. A prospective buyer of the business notices that the mean is
higher than the sales in four of the 6 weeks.
Calculate the median for the prospective buyer.
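With an even number of values there is no single middle item, so the median is taken as the average of the two middle values once the data are sorted. A short sketch using Python's standard `statistics` module:

```python
from statistics import median

# Weekly sales in $ for weeks 1 to 6, taken from the table above.
weekly_sales = [1120, 990, 1040, 1030, 1105, 1015]

# Arrange the values in ascending order to see the two middle values.
ordered = sorted(weekly_sales)
print(ordered)  # [990, 1015, 1030, 1040, 1105, 1120]

# For six values, the median is the average of the 3rd and 4th values.
median_sales = median(weekly_sales)
print(median_sales)  # 1035.0
```

The median of $1,035 is below the mean of $1,050, because the two strongest weeks pull the mean upwards.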
Mode
The mode or modal value of a data set is that value that occurs most often.
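As an illustration, the sketch below finds the mode of the employee-absence distribution used in the mean section. The data are expanded from that frequency table, and `multimode` is used because a data set can have more than one mode.

```python
from statistics import multimode

# (employees absent, number of days) pairs from the absence frequency table.
absences = [(2, 2), (3, 4), (4, 3), (5, 4), (6, 3), (7, 3), (8, 3)]

# Expand the table into one observation per day.
daily_absences = [x for x, f in absences for _ in range(f)]

# multimode returns every value that shares the highest frequency.
modes = multimode(daily_absences)
print(modes)  # [3, 5]
```

Here both 3 and 5 absences occur on four days each, so the distribution has two modes.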
Learning outcomes:
b) Calculate measures of dispersion including the variance, standard deviation and coefficient of variation for both
grouped and ungrouped data
Standard deviation
The standard deviation (σ) is a way of measuring how far away, on average, the data points are from the mean.
It measures the average variability about the mean.
Ungrouped data
Grouped data
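The formulas for these two cases are not reproduced in these notes, so the sketch below assumes the population forms (dividing by n rather than n - 1): for ungrouped data, the square root of the average squared deviation from the mean; for grouped data, the square root of (Σfx² / Σf) minus the mean squared. It reuses the weekly sales and absence data from the mean illustrations.

```python
from math import sqrt

# Ungrouped data: weekly sales in $ from the mean illustration.
weekly_sales = [1120, 990, 1040, 1030, 1105, 1015]
n = len(weekly_sales)
mean_sales = sum(weekly_sales) / n
# Variance = average of the squared deviations from the mean (population form).
variance_sales = sum((x - mean_sales) ** 2 for x in weekly_sales) / n
sd_sales = sqrt(variance_sales)

# Grouped data: (employees absent, number of days) from the absence table.
absences = [(2, 2), (3, 4), (4, 3), (5, 4), (6, 3), (7, 3), (8, 3)]
total_f = sum(f for _, f in absences)
mean_absent = sum(f * x for x, f in absences) / total_f
# Variance = (sum of f * x^2 / sum of f) - mean^2 (population form).
variance_absent = sum(f * x ** 2 for x, f in absences) / total_f - mean_absent ** 2
sd_absent = sqrt(variance_absent)

print(round(sd_sales, 2), round(sd_absent, 2))  # 46.99 1.89
```

If a sample standard deviation is required instead, the divisor would be n - 1, which gives slightly larger values.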
Coefficient of variation
The coefficient of variation is a statistical measure of the dispersion of data points in a data series around the
mean.
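The coefficient of variation is usually expressed as the standard deviation as a percentage of the mean. A minimal sketch, using the complaints figures quoted earlier in these notes (mean 12, standard deviation 3); the function name is illustrative:

```python
def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# Daily complaints: mean of 12 and standard deviation of 3.
cv_complaints = coefficient_of_variation(mean=12, std_dev=3)
print(cv_complaints)  # 25.0
```

Because it is a ratio, the coefficient of variation allows the relative spread of data sets with different means or units to be compared.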
Learning outcomes:
c) Calculate expected values for use in decision-making
An expected value is a weighted average of the different possible outcomes from a decision, where the
weightings are based on the probability of each possible outcome.
Expected values indicate what an outcome is likely to be in the long term if the decision can be repeated
many times over.
Example 1:
Merry takes part in a gamble on the toss of a coin. Merry will receive Rp100,000 if the coin lands on heads. However, if the
coin lands on tails, Merry must pay Rp50,000. What is the expected value?
Heads / Tails = 50:50
Expected value?
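Weighting each outcome by its probability of 0.5, and treating the payment on tails as a negative outcome, the calculation can be set out as:

```latex
\[
EV = (0.5 \times 100{,}000) + (0.5 \times -50{,}000)
   = 50{,}000 - 25{,}000 = \text{Rp}25{,}000
\]
```

On average, Merry would expect to gain Rp25,000 per toss if the gamble were repeated many times.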
Example 2:
A company has recorded the following daily sales over the last 200 days:
Daily sales (units) Number of days
100 40
200 60
300 80
400 20
What will be the expected sales level in the future? 240 units
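The 240 units can be reproduced by converting each frequency into a probability (number of days / 200) and weighting the sales levels accordingly. A sketch with illustrative variable names:

```python
# Daily sales (units) mapped to the number of days on which they occurred.
daily_sales = {100: 40, 200: 60, 300: 80, 400: 20}

total_days = sum(daily_sales.values())

# Probability of each sales level = days observed / total days.
probabilities = {units: days / total_days for units, days in daily_sales.items()}

# Expected value = sum of (outcome x probability).
expected_sales = sum(units * p for units, p in probabilities.items())

print(total_days)      # 200
print(expected_sales)  # 240.0
```

This assumes that the pattern of the last 200 days is a reasonable guide to future sales.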
Example 3:
An entity must make a decision between three options, A, B and C. The possible profits and losses are:
1. Option A: a profit of $2,000 with probability 0.5 or otherwise a loss of $500
2. Option B: a profit of $800 with probability 0.3 or otherwise a profit of $500
3. Option C: a profit of $1,000 with probability 0.7, or $500 with probability 0.1 or otherwise a loss of $400
Using EV, which option should be chosen? Option A
Option A
Outcome    Probability              Outcome x probability
2,000      0.5                      1,000
-500       (1 - 0.5) = 0.5          -250
Expected value                      750
Option B
Outcome    Probability              Outcome x probability
800        0.3                      240
500        (1 - 0.3) = 0.7          350
Expected value                      590
Option C
Outcome    Probability              Outcome x probability
1,000      0.7                      700
500        0.1                      50
-400       (1 - 0.7 - 0.1) = 0.2    -80
Expected value                      670
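The comparison of the three options can also be sketched in code. The helper name `expected_value` is illustrative; the outcomes and probabilities are those given in the example, with each remaining probability worked out as 1 minus the others.

```python
def expected_value(outcomes):
    """Sum of outcome x probability over (outcome, probability) pairs."""
    return sum(value * probability for value, probability in outcomes)

# (profit or loss in $, probability) for each option.
options = {
    "A": [(2000, 0.5), (-500, 0.5)],
    "B": [(800, 0.3), (500, 0.7)],
    "C": [(1000, 0.7), (500, 0.1), (-400, 0.2)],
}

# Round to 2 decimal places to avoid floating-point noise in the display.
evs = {name: round(expected_value(outcomes), 2) for name, outcomes in options.items()}
best_option = max(evs, key=evs.get)

print(evs)          # {'A': 750.0, 'B': 590.0, 'C': 670.0}
print(best_option)  # A
```

Option A has the highest expected value, although it is also one of the two options that carry a risk of a loss, which the expected value on its own does not show.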
Learning outcomes:
d) Explain the properties of a normal distribution
Learning outcomes:
e) Interpret normal distribution graphs and tables
RST is a food producer, specialising in dried fruit and nuts. The dried fruit and nuts are prepared within the
factory and packed into small bags which are sold as snacks in supermarkets.
The weights of the snack bags are normally distributed with a mean weight of 70g and a standard deviation of
5g.
RST can use normal distribution to calculate the probabilities that a bag selected at random would be of an
acceptable weight.
The food producer would like to know the probability that the bag selected at random weighs less than 60g
Step 1: Calculate the z score using \( z = \frac{x - \mu}{\sigma} \), where:
z is the z score
x is the value being considered
μ is the mean
σ is the standard deviation
\[ z = \frac{60 - 70}{5} = \frac{-10}{5} = -2 \]
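Step 2: Look up the z score in the normal distribution tables. For z = 2, the tables give 0.4772, which is the
proportion of bags lying between the mean and 2 standard deviations from it.
Therefore, the probability that a bag selected at random weighs less than 60g = (0.5 – 0.4772) = 0.0228 or 2.3%.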
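The table lookup can be cross-checked with Python's standard `statistics.NormalDist` class (available from Python 3.8), which computes the area under the normal curve directly. This is only a sketch of the same calculation, not a replacement for the tables used in the exam.

```python
from statistics import NormalDist

# Snack bag weights: mean 70g, standard deviation 5g.
bag_weights = NormalDist(mu=70, sigma=5)

# Step 1: z score for a 60g bag.
z = (60 - bag_weights.mean) / bag_weights.stdev

# Step 2: cdf(60) is the area to the left of 60g, i.e. P(weight < 60g).
p_under_60 = bag_weights.cdf(60)

print(z, round(p_under_60, 4))  # -2.0 0.0228
```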
Exercise:
A machine produces components with diameter of mean 5 cm and standard deviation 0.1 cm. The production of
this component follows a normal distribution.
What proportion of the components produced will have diameters of the following dimensions?
1) between 5 and 5.2 cm
2) over 5.15 cm
3) between 4.8 and 5.1 cm
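One way to check answers to this exercise is to take differences of cumulative probabilities. The sketch below uses `statistics.NormalDist` with illustrative variable names; results read from printed tables may differ in the fourth decimal place because of rounding.

```python
from statistics import NormalDist

# Component diameters: mean 5 cm, standard deviation 0.1 cm.
diameters = NormalDist(mu=5, sigma=0.1)

# 1) Between 5 and 5.2 cm: area between z = 0 and z = 2.
p_5_to_5_2 = diameters.cdf(5.2) - diameters.cdf(5)

# 2) Over 5.15 cm: area to the right of z = 1.5.
p_over_5_15 = 1 - diameters.cdf(5.15)

# 3) Between 4.8 and 5.1 cm: area between z = -2 and z = 1.
p_4_8_to_5_1 = diameters.cdf(5.1) - diameters.cdf(4.8)

print(round(p_5_to_5_2, 4), round(p_over_5_15, 4), round(p_4_8_to_5_1, 4))
```

These come out at roughly 47.7%, 6.7% and 81.9% of components respectively.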
The Accounting exam scores of a class have a mean of 75 and a variance of 100.
1) What is the upper quartile of this distribution (i.e. what is the minimum score needed to be in the top 25% of the class)?
2) What is the minimum score needed to be in the top 10% of the class?
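These questions work backwards from a probability to a score. A sketch of one approach, assuming the scores are normally distributed (which the question implies but does not state) and using the inverse cumulative function `inv_cdf` from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

# A variance of 100 means a standard deviation of sqrt(100) = 10.
scores = NormalDist(mu=75, sigma=sqrt(100))

# 1) Upper quartile: the score with 75% of the class below it.
upper_quartile = scores.inv_cdf(0.75)

# 2) Top 10%: the score with 90% of the class below it.
top_10_percent = scores.inv_cdf(0.90)

print(round(upper_quartile, 2), round(top_10_percent, 2))  # 81.74 87.82
```

With tables, the same answers come from finding the z scores for areas of 0.25 and 0.40 above the mean (about 0.67 and 1.28) and applying score = 75 + z x 10.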