1 Intro To Stat & Data Presentation
INTRODUCTION TO STATISTICS
[Diagram: STATISTICS at the centre, linked to six activities: collect data, organize data, summarize data, present data, analyse data, interpret data.]
PURPOSES OF STATISTICS
• Statistical techniques are used extensively by marketing managers,
accountants, consumers, educators, politicians, physicians, etc.
• Statistical techniques are used to make many decisions that affect our
lives. Regardless of what your future line of work is, you will make decisions
that involve data.
Reasons for learning statistics:
TYPES OF VARIABLES
Variables measure the characteristics of the population that the researcher wants
to study.
[Diagram: Variable, divided into Discrete and Continuous.]
DATA PRESENTATION
Raw data
• Data collected that have not been organized or processed are called
raw data.
• When every observed value of the random variable is listed, the data are
called ungrouped data.
• Grouping is one of the most common methods of organizing data. When
we group data, we are actually constructing frequency distributions for the
raw data.
Frequency Distribution
• A frequency distribution is a table in which possible values for a variable
are grouped into non-overlapping classes, and the number of observed
values which fall into each class is recorded.
• Data organized in a frequency distribution are called grouped data.
Example
The frequency distribution below represents the number of books read by
500 students in a school for one year:
• Classes (class intervals) should be non-overlapping, so that no observation
is counted twice.
• Commonly, the number of classes is between 5 and 15.
• Use equal class sizes (widths) whenever possible.
• Sometimes, FREQUENCY is replaced by or given as PROPORTION or
PERCENTAGE.
* The exclusive class type is mainly used for continuous data, or for discrete
data that have been rounded to the nearest ten, hundred, thousand, million,
etc.
** The inclusive class type is mainly used for discrete data where there is a gap
between classes.
*** The size of an open-ended class is assumed to be the same as the class size
of the nearest (immediately neighbouring) class.
Example
The following is a record of the number of books borrowed per week in the
library for 30 weeks:
21 47 64 42 89 76 55 100 75 67
89 15 97 25 35 12 92 36 93 34
87 27 74 21 66 25 47 10 89 30
Solution:
The variable is the number of books borrowed per week, which is discrete.
The number of classes is set to 5.
Frequency distribution for the number of books borrowed per week in the
library for 30 weeks:
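The tallying step can be sketched in Python. This is a minimal sketch, not the handout's own method: it assumes a class size of 20 starting at 10 (the range is 100 – 10 = 90, so 5 classes need a size of at least 18), which matches the classes 10 – 29, ..., 90 – 109 used for this data set later in the chapter. The names `class_index` and `frequency_table` are illustrative only.

```python
from collections import Counter

# Raw data: books borrowed per week over 30 weeks (from the example above).
books = [
    21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
    89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
    87, 27, 74, 21, 66, 25, 47, 10, 89, 30,
]

# Assumed class layout: 5 inclusive classes of size 20, starting at 10.
LOWEST_LIMIT = 10
CLASS_SIZE = 20
NUM_CLASSES = 5


def class_index(value):
    """Return the 0-based index of the class that contains value."""
    return (value - LOWEST_LIMIT) // CLASS_SIZE


# Count how many observations fall into each class.
counts = Counter(class_index(v) for v in books)

frequency_table = []
for i in range(NUM_CLASSES):
    lower = LOWEST_LIMIT + i * CLASS_SIZE
    upper = lower + CLASS_SIZE - 1  # inclusive upper class limit
    frequency_table.append((f"{lower} - {upper}", counts[i]))

for label, freq in frequency_table:
    print(f"{label:>9}  {freq}")
```

Every observation lands in exactly one class, so the frequencies should add up to the 30 weeks recorded.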
Example
The amount of rainfall (in cm) for a small town was recorded for the month
of December.
Construct a grouped frequency distribution for the data using a suitable class
size.
Solution:
Basic components of a frequency distribution:
Class limits
the smallest and largest possible measurements in each class, i.e. the
lower and upper class limits.
Class boundaries
the dividing lines (walls) between successive classes.
Class size / class width = upper class boundary – lower class boundary.
$$\text{Class mark} = x = \tfrac{1}{2}\,(\text{lower class limit} + \text{upper class limit})$$
or
$$\text{Class mark} = x = \tfrac{1}{2}\,(\text{lower class boundary} + \text{upper class boundary})$$
Example
Class       Class boundaries    Class size             Class mark, x
10 – 29     9.5 – 29.5          29.5 – 9.5 = 20        19.5
30 – 49     29.5 – 49.5         49.5 – 29.5 = 20       39.5
50 – 69     49.5 – 69.5         69.5 – 49.5 = 20       59.5
70 – 89     69.5 – 89.5         89.5 – 69.5 = 20       79.5
90 – 109    89.5 – 109.5        109.5 – 89.5 = 20      99.5
Example
Class        Class boundaries    Class size       Class mark, x
19 – < 22    19 – 22             22 – 19 = 3      20.5
22 – < 25    22 – 25             25 – 22 = 3      23.5
25 – < 28    25 – 28             28 – 25 = 3      26.5
28 – < 31    28 – 31             31 – 28 = 3      29.5
31 – < 34    31 – 34             34 – 31 = 3      32.5
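The first row of each table above can be reproduced with a short Python sketch. It assumes that a class boundary lies halfway across the gap between neighbouring classes (a gap of 1 for the inclusive classes, no gap for the exclusive ones); that rule is inferred from the tabulated values, not stated in the text. The function name `class_components` and its `gap` parameter are hypothetical.

```python
def class_components(lower_limit, upper_limit, gap=1):
    """Return boundaries, size and mark of one class.

    gap is the distance between this class's upper limit and the next
    class's lower limit (assumed: 1 for inclusive classes such as 10 - 29,
    0 for exclusive classes such as 19 - < 22).
    """
    half_gap = gap / 2
    lower_boundary = lower_limit - half_gap
    upper_boundary = upper_limit + half_gap
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "mark": (lower_limit + upper_limit) / 2,
    }


inclusive = class_components(10, 29)          # class 10 - 29
exclusive = class_components(19, 22, gap=0)   # class 19 - < 22

print(inclusive)
print(exclusive)
```

Computing the class mark from the boundaries instead of the limits gives the same value, because both boundaries are shifted outward by the same half gap.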
Histogram
• A histogram is a graphical representation of a frequency distribution. A bar
is drawn for each class and the area of each bar is proportional to the class
frequency. The bars are drawn adjacent to one another.
• The x-axis shows either the class BOUNDARIES or the class MID-POINTS.
The y-axis shows the frequency.
• For a frequency distribution with equal class sizes, the height of each bar
is drawn proportional to the actual frequency of each class and the width
of each bar extends from the lower class boundary to the upper class
boundary of the class.
Example
Construct a histogram for the frequency distribution of the number of books
borrowed per week in the library for 30 weeks:
Solution:
Example
Construct a histogram for the frequency distribution of the amount of rainfall in
the month of December:
Solution:
Example
Construct a histogram for the frequency distribution of sales of 46 branches
of a company in one week.
Sales (units) No. of branches
0 – 99 10
100 – 199 18
200 – 299 8
300 – 499 6
500 – 699 4
Solution:
Sales (units)    No. of branches (frequency)    Class boundaries    Class size    *Adjusted frequency
0 – 99           10                             -0.5 – 99.5         100           10
100 – 199        18                             99.5 – 199.5        100           18
200 – 299        8                              199.5 – 299.5       100           8
300 – 499        6                              299.5 – 499.5       200           3
500 – 699        4                              499.5 – 699.5       200           2
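The adjusted frequencies in the table are consistent with scaling each frequency by (standard class size ÷ actual class size), with 100 as the standard size. That rule is inferred from the tabulated values, since the note that defines the adjusted frequency is not reproduced here. A minimal Python sketch under that assumption (`STANDARD_SIZE` and `adjusted` are illustrative names):

```python
# (lower limit, upper limit, frequency) from the sales example above.
sales_classes = [
    (0, 99, 10),
    (100, 199, 18),
    (200, 299, 8),
    (300, 499, 6),
    (500, 699, 4),
]

# Assumption: the most common class size (100) is used as the standard size.
STANDARD_SIZE = 100

adjusted = []
for lower, upper, freq in sales_classes:
    # Class size = upper class boundary - lower class boundary.
    class_size = (upper + 0.5) - (lower - 0.5)
    # A class twice as wide gets a bar half as tall, so that area stays
    # proportional to frequency.
    adjusted.append(freq * STANDARD_SIZE / class_size)

print(adjusted)
```

With unequal class sizes, the adjusted frequency is used as the bar height, so that the area of each bar, not its height, stays proportional to the class frequency.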
• The term skewness is used to describe the shape of a frequency
distribution.
Cumulative Frequency Distribution
• Given a frequency distribution, a cumulative frequency distribution can be
derived by the addition of the frequencies of the successive classes.
• There are two types of cumulative frequency distributions: “Less than”
and “More than”. In this course, the “Less than” cumulative frequency
distribution is used.
Example
Number of    Number of        Class            ‘<’ Cum. Freq. table
books        weeks (freq.)    boundaries       No. of books    Cum. freq.
                                               < 9.5           0
10 – 29      8                9.5 – 29.5       < 29.5          8
30 – 49      7                29.5 – 49.5      < 49.5          15
50 – 69      4                49.5 – 69.5      < 69.5          19
70 – 89      7                69.5 – 89.5      < 89.5          26
90 – 109     4                89.5 – 109.5     < 109.5         30
Example
Amount of rainfall    Number of       Class          ‘<’ Cum. Freq. table
(cm)                  days (freq.)    boundaries     Amount of rainfall (cm)    Cum. freq.
                                                     < 19                       0
19 – < 22             4               19 – 22        < 22                       4
22 – < 25             10              22 – 25        < 25                       14
25 – < 28             12              25 – 28        < 28                       26
28 – < 31             4               28 – 31        < 31                       30
31 – < 34             1               31 – 34        < 34                       31
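The running addition behind a "less than" table can be sketched in Python with the rainfall figures above. This is an illustrative sketch only; the variable names are not from the handout.

```python
from itertools import accumulate

# (lower boundary, upper boundary, frequency) for the rainfall example above.
rainfall_classes = [
    (19, 22, 4),
    (22, 25, 10),
    (25, 28, 12),
    (28, 31, 4),
    (31, 34, 1),
]

frequencies = [freq for _, _, freq in rainfall_classes]
running_totals = list(accumulate(frequencies))

# The table starts at the first lower boundary with a cumulative frequency
# of 0, then pairs each upper boundary with the running total up to it.
less_than_table = [(rainfall_classes[0][0], 0)]
less_than_table += [
    (upper, total)
    for (_, upper, _), total in zip(rainfall_classes, running_totals)
]

for boundary, cum_freq in less_than_table:
    print(f"< {boundary:<4} {cum_freq}")
```

The last cumulative frequency always equals the total number of observations, here the 31 days of December.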
The “Less than” Cumulative Frequency Polygon (Ogive)
is a line chart of a cumulative frequency distribution that shows the
cumulative frequency less than the upper class boundary plotted against
the upper class boundary of a class.
Example
The following table shows the output produced by 20 employees in an hour
in a factory.
Output (units) Number of employees
1–5 1
6 – 10 2
11 – 15 3
16 – 20 9
21 – 25 5
Construct a ‘less than’ cumulative frequency distribution and plot a ‘less than’
cumulative frequency polygon. Then, estimate
Solution:
Output     Number of            Class          ‘<’ Cum. Freq. table
(units)    employees (freq.)    boundaries     Output (units)    Cum. freq.
1 – 5      1
6 – 10     2
11 – 15    3
16 – 20    9
21 – 25    5
[Figure: '<' Ogive of output produced by 20 employees. x-axis: Output, marked at the class boundaries 0.5, 5.5, 10.5, 15.5, 20.5, 25.5; y-axis: Cumulative Frequency, from 0 to 20.]
(iii) 90% of the employees are producing more than x units
→ the other 10% of the employees (10% × 20 = 2 employees) are
producing less than x units.
From the ‘<’ cumulative frequency polygon, x = 8 units.
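Reading a value off the ogive amounts to linear interpolation between two plotted points. The sketch below checks the answer to part (iii) numerically. The ogive points are derived here from the output table (class boundaries 0.5, 5.5, ..., 25.5 and the running totals of the frequencies 1, 2, 3, 9, 5), and the function name `value_at_cumulative_frequency` is hypothetical.

```python
# Points of the '<' ogive: (class boundary, cumulative frequency),
# derived from the frequencies 1, 2, 3, 9, 5 in the output table.
ogive = [(0.5, 0), (5.5, 1), (10.5, 3), (15.5, 6), (20.5, 15), (25.5, 20)]


def value_at_cumulative_frequency(points, target):
    """Linearly interpolate the x-value at which the ogive reaches target."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the ogive")


total_employees = 20
target = 0.10 * total_employees  # 10% of 20 = 2 employees
x = value_at_cumulative_frequency(ogive, target)
print(x)
```

A cumulative frequency of 2 lies halfway between the points (5.5, 1) and (10.5, 3), which gives 5.5 + 2.5 = 8 units and agrees with the graphical reading.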
(i) Data collection and pre-processing are always the first steps of a BA
project.
(ii) Data often need to be collected, cleansed and combined with other
sources, because not all stored current and historical data contain all the
information required for a certain analysis.
(iii) Descriptive analytics – data are analyzed and patterns
(insight/information) are found.
(iv) Predictive analytics – insight found in the descriptive phase is used in this
phase to predict what is likely to happen in the future, if the situation
remains the same.
(v) Prescriptive analytics – alternative decisions are determined that change
the situation and lead to desirable outcomes.
(vi) – The decision has to be implemented; this requires various skills such as
knowledge of change management.
– Some of the steps above need to be repeated depending on the
outcome. For example, if predictions are not accurate enough for a particular
application, then extra data are required to improve them.
– Not all BA projects include all the steps above. For example, prescriptive
analytics is not included if the goal of the project is prediction; such a
project finishes after the descriptive or predictive step.
Example:
A hotel chain analyzes its reservations to look for patterns: Which are the busiest
days of the week? What is the impact of events in the city? Is there a seasonal
pattern? The outcomes are used to make a prediction for the revenue in the
upcoming months. By changing the pricing of the rooms in certain situations (such
as sports events or school holidays), the expected revenue can be maximized.
AAMS1773 QUANTITATIVE STUDIES
Tutorial 1: Introduction to Statistics and Data Presentation
(a) Tabulate the above data in the form of a frequency distribution, using 160 -
<165 as the first class, 165 - <170 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a “less than” cumulative frequency distribution.
(d) Draw a “less than” cumulative frequency polygon.
(e) Using the graph in part (d), estimate:
(i) the height which will be exceeded by 25% of the employees.
(ii) the number of employees who have heights less than 175 cm.
(iii) the proportion of employees who have heights exceeding 175 cm.
3. The following table shows the gross profit of a random sample of 500 small
companies in a year.
4. The following data shows the number of rejects from the assembly line of a
local manufacturer recorded for a period of 80 days:
Number of rejects Number of days
0–4 1
5–9 14
10 – 14 23
15 – 19 20
20 – 24 16
25 – 29 6
5. The following cumulative frequency distribution shows the duration of each
telephone call made by an employee recorded for a period of one month:
(a) Draw the cumulative frequency polygon for the above distribution.
(b) Use the graph to estimate:
(i) the number of calls that lasted between 5 and 10 minutes;
(ii) the duration not exceeded by 90% of the calls.
(c) Redraft the above data in the form of frequency distribution and construct a
histogram.
Answers:
2. (e) (i) 185.5 cm. (ii) 23 (iii) 0.7294
3. (c) (i) 100 (ii) 0.865
4. (b) (i) 26.5 days (ii) 24 rejects
5. (b) (i) 68 calls (ii) 14 min.