
Stat 02

The document provides an overview of descriptive statistics, focusing on frequency distributions, their construction, and graphical representations such as histograms, frequency polygons, and pie charts. It includes examples and explanations of concepts like class limits, midpoints, and relative frequencies, as well as instructions for an assignment on data visualization techniques. The aim is to help students understand how to organize and present data effectively.

Uploaded by

sniperaddieee
Copyright
© © All Rights Reserved

Part 2 Statistics for IT

Descriptive Statistics
The Kyoto College of Graduate Studies for Informatics (KCGI)

Kwanjira Kaewfak (k_kaewfak@[Link])
© [Link] 2020
What You Should Learn

• How to construct a frequency distribution including limits,
midpoints, frequencies, relative frequencies, cumulative
frequencies, and boundaries.

• How to construct frequency histograms, frequency polygons,
relative frequency histograms, and ogives.
Descriptive Statistics

Descriptive statistics are statistical
measures used to describe data through
numbers, such as the mean, median, and mode.
Where We’re Going

frequency distribution
histogram
Frequency Distributions

• Important characteristics to look for when organizing and
describing a data set are its
  center
  variability (or spread)
  shape

• Learn how to organize data sets by grouping the data into intervals
called classes and forming a frequency distribution.

• Learn how to use a frequency distribution to construct graphs.

Frequency Distribution

• Frequency distribution: A grouping of data into
categories showing the number of observations in
each mutually exclusive category.

Each class has a lower class limit, which is the least
number that can belong to the class, and an upper class
limit, which is the greatest number that can belong to
the class.

Construction of a Frequency Distribution

question to be addressed → collect data (raw data) → organize data (frequency distribution) → present data (graph) → draw conclusion

Example of Frequency Distribution

Frequency Distribution

• Class mark (midpoint): A point that divides a
class into two equal parts. This is the average
of the upper and lower class limits.
• Class interval: For a frequency distribution having
classes of the same size, the class interval is
obtained by subtracting the lower limit of a class
from the lower limit of the next class.
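
The two definitions above can be checked with a short calculation. The sketch below is only illustrative: it uses the class limits from the hours-studying example in this deck ("7.5 up to 12.5", "12.5 up to 17.5", and so on), and the variable names are made up for this example.

```python
# Class limits taken from the hours-studying example in this deck.
lower_limits = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
upper_limits = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5]

# Class mark (midpoint): average of the lower and upper class limits.
class_marks = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class interval: lower limit of the next class minus lower limit of this class.
class_interval = lower_limits[1] - lower_limits[0]

print(class_marks)     # [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
print(class_interval)  # 5.0
```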

Additional Features of Frequency Distribution

The relative frequency of a class is its frequency divided by the
total number of observations. It can be written as a fraction,
decimal, or percent. The sum of the relative frequencies of all the
classes should be equal to 1, or 100%.
Due to rounding, the sum may be slightly less than or greater
than 1, so values such as 0.99 and 1.01 are acceptable.
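
As a rough illustration of the rounding remark, the sketch below computes relative frequencies for the class frequencies of the hours-studying example in this deck (1, 12, 10, 5, 1, 1). Rounding to two decimals is an assumption made here for display purposes.

```python
# Class frequencies from the hours-studying example (30 students in total).
frequencies = [1, 12, 10, 5, 1, 1]
n = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in frequencies]
rounded = [round(r, 2) for r in relative]

print(rounded)                   # [0.03, 0.4, 0.33, 0.17, 0.03, 0.03]
print(round(sum(relative), 10))  # 1.0 (exact values sum to 1)
print(round(sum(rounded), 2))    # 0.99 (rounded values fall slightly short)
```

Here the unrounded relative frequencies sum to 1, while the two-decimal values sum to 0.99, which is the kind of small discrepancy the slide describes.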


EXAMPLE 1
• Dr. Tillman is the dean of the school of business and
wishes to determine the amount of studying business
school students do. He selects a random sample of
30 students and determines the number of hours
each student studies per week: 15.0, 23.7, 19.7, 15.4,
18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3,
13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0,
17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
• Organize the data into a frequency distribution.

EXAMPLE 1 continued
Consider the first two classes. The class marks are 10 and 15. The class
interval is 5 (12.5 - 7.5).

Hours studying      Frequency, f
7.5 up to 12.5      1
12.5 up to 17.5     12
17.5 up to 22.5     10
22.5 up to 27.5     5
27.5 up to 32.5     1
32.5 up to 37.5     1
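
The table above can be reproduced from the raw data with a few lines of Python. This is a minimal sketch that assumes "a up to b" means the lower limit is included and the upper limit is excluded; the variable names are illustrative only.

```python
# Hours studied per week by the 30 sampled students (Example 1).
hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]

# Six classes of width 5, starting at 7.5: (7.5, 12.5), (12.5, 17.5), ...
classes = [(7.5 + 5 * i, 12.5 + 5 * i) for i in range(6)]

# Assumption: "a up to b" counts values with a <= value < b.
frequency = [sum(1 for h in hours if lo <= h < hi) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequency):
    print(f"{lo} up to {hi}: {f}")

print(frequency)  # [1, 12, 10, 5, 1, 1]
```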
Excel Function of Frequency

=FREQUENCY(data_array, bins_array)

Returns a frequency distribution: a summary table that
shows how many values fall within each range (bin).

Returns multiple values and must be entered as an
array formula with Ctrl+Shift+Enter.
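
For readers working outside Excel, the counting rule behind FREQUENCY can be sketched in Python. The function below is a hypothetical stand-in, not part of Excel or of any library. It assumes bins_array is sorted in ascending order and mirrors the rule that each bin counts values up to and including its bin value, with one extra count at the end for values above the last bin.

```python
from bisect import bisect_left

def frequency(data_array, bins_array):
    """Hypothetical Python analogue of Excel's FREQUENCY (ascending bins assumed)."""
    counts = [0] * (len(bins_array) + 1)
    for x in data_array:
        # bisect_left gives the first bin whose value is >= x,
        # or len(bins_array) when x is above every bin.
        counts[bisect_left(bins_array, x)] += 1
    return counts

# Quiz scores from Example 2 in this deck; the bin values are chosen for illustration.
scores = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]
print(frequency(scores, [69, 79, 89]))  # [1, 2, 6, 3]
```

The result has one more entry than there are bins, which is why the spreadsheet formula returns multiple values.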


Stem-and-Leaf Displays

• Stem-and-Leaf Display: A statistical
technique for displaying a set of data. Each
numerical value is divided into two parts:
the leading digits become the stem and the
trailing digits the leaf.
• Note: An advantage of the stem-and-leaf
display over a frequency distribution is that we
do not lose the identity of each observation.

EXAMPLE 2
• Colin achieved the following scores on his twelve
accounting quizzes this semester: 86, 79, 92, 84,
69, 88, 91, 83, 96, 78, 82, 85. Construct a
stem-and-leaf chart for the data.

stem   leaf
6      9
7      89
8      234568
9      126
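
The same chart can be built programmatically. The sketch below assumes two-digit scores, so the tens digit is the stem and the ones digit is the leaf; the variable names are illustrative.

```python
from collections import defaultdict

# Colin's twelve quiz scores from Example 2.
scores = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]

stems = defaultdict(list)
for score in sorted(scores):          # sorting first keeps the leaves in order
    stem, leaf = divmod(score, 10)    # tens digit is the stem, ones digit the leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "".join(str(leaf) for leaf in stems[stem]))
# 6 9
# 7 89
# 8 234568
# 9 126
```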

Graphic Presentation of a Frequency Distribution
• The three commonly used graphic forms are histograms,
frequency polygons, and cumulative frequency
distributions (ogives).
• Histogram: A graph in which the classes are marked on
the horizontal axis and the class frequencies on the
vertical axis. The class frequencies are represented by
the heights of the bars and the bars are drawn adjacent
to each other.
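
A quick way to preview a histogram without charting software is to print it as text. The sketch below uses the class frequencies from the hours-studying example in this deck. Note that it is rotated relative to the definition above: the classes run down the page and the bar lengths (one "#" per observation) play the role of bar heights.

```python
# Classes and frequencies from the hours-studying example.
labels = ["7.5-12.5", "12.5-17.5", "17.5-22.5",
          "22.5-27.5", "27.5-32.5", "32.5-37.5"]
frequencies = [1, 12, 10, 5, 1, 1]

# One row per class; adjacent rows correspond to adjacent bars.
rows = [f"{label:>9} |{'#' * f} {f}" for label, f in zip(labels, frequencies)]
print("\n".join(rows))
```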

Graphic Presentation of a Frequency Distribution

• A frequency polygon consists of line
segments connecting the points formed by
the class midpoint and the class frequency.
• A cumulative frequency distribution (ogive) is
used to determine how many or what
proportion of the data values are below or
above a certain value.
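
The points that these two graphs connect can be computed directly. The sketch below uses the hours-studying example from this deck. Two conventions in it are assumptions rather than something the slides state: the polygon is closed with a zero-frequency point one class width beyond each end, and the "less than" ogive is plotted at the upper class limits.

```python
from itertools import accumulate

# Hours-studying example: first lower limit, class width, class frequencies.
lower, width = 7.5, 5.0
frequencies = [1, 12, 10, 5, 1, 1]
k = len(frequencies)

# Frequency polygon: (class midpoint, class frequency) points.
midpoints = [lower + width * (i + 0.5) for i in range(k)]
polygon = list(zip(midpoints, frequencies))
# Assumed convention: close the polygon with zero-frequency end points.
closed_polygon = [(midpoints[0] - width, 0)] + polygon + [(midpoints[-1] + width, 0)]

# Less-than ogive: (upper class limit, cumulative frequency) points.
upper_limits = [lower + width * (i + 1) for i in range(k)]
cumulative = list(accumulate(frequencies))
ogive = list(zip(upper_limits, cumulative))

print(polygon[:2])   # [(10.0, 1), (15.0, 12)]
print(cumulative)    # [1, 13, 23, 28, 29, 30]
print(ogive[2])      # (22.5, 23): 23 students studied less than 22.5 hours
```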

Frequency Polygon

A frequency polygon is a line graph that emphasizes the
continuous change in frequencies.

[Figure: Histogram for Hours Spent Studying. Horizontal axis: hours spent studying (10 to 35); vertical axis: frequency (0 to 14).]

[Figure: Frequency Polygon for Hours Spent Studying. Horizontal axis: hours spent studying (10 to 35); vertical axis: frequency (0 to 14).]

[Figure: Less Than Cumulative Frequency Distribution for Hours Studying. Horizontal axis: hours spent studying (10 to 35); vertical axis: frequency (0 to 35).]

Class Boundaries
Class boundaries are the numbers
that separate classes without
forming gaps between them.
If data entries are integers,
subtract 0.5 from each lower limit
to find the lower class boundaries.
To find the upper class boundaries,
add 0.5 to each upper limit.
The upper boundary of a class
will equal the lower boundary of
the next higher class.
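
The rule for integer data can be sketched as follows. The class limits in this example are hypothetical and chosen only to show the arithmetic.

```python
# Hypothetical integer class limits (lower, upper); not taken from the slides.
class_limits = [(7, 18), (19, 30), (31, 42)]

# Subtract 0.5 from each lower limit and add 0.5 to each upper limit.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in class_limits]
print(boundaries)  # [(6.5, 18.5), (18.5, 30.5), (30.5, 42.5)]

# Each upper boundary equals the lower boundary of the next class,
# so the classes meet without gaps.
no_gaps = all(upper == next_lower
              for (_, upper), (next_lower, _) in zip(boundaries, boundaries[1:]))
print(no_gaps)  # True
```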

Example of Frequency Histogram

Bar Chart

• A bar chart can be used to depict any of the levels of
measurement (nominal, ordinal, interval, or ratio).
• EXAMPLE 3: Construct a bar chart for the number of
unemployed people per 100,000 population for selected
cities in 1999.

EXAMPLE 3 continued

City                Number of unemployed per 100,000 population
Atlanta, GA         7300
Boston, MA          5400
Chicago, IL         6700
Los Angeles, CA     8900
New York, NY        8200
Washington, D.C.    8900

[Figure: Bar Chart for the Unemployment Data. Horizontal axis: cities (Atlanta, Boston, Chicago, Los Angeles, New York, Washington); vertical axis: number unemployed per 100,000 (0 to 10,000).]


Pie Chart

• A pie chart is especially useful in displaying a
relative frequency distribution. A circle is
divided proportionally to the relative frequency
and portions of the circle are allocated for the
different groups.
• EXAMPLE 4: A sample of 200 runners was
asked to indicate their favorite type of running
shoe.


EXAMPLE 4 continued

• Draw a pie chart based on the following information.

Type of shoe    # of runners
Nike            92
Adidas          49
Reebok          37
Asics           13
Other           9
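
Before drawing the chart, it helps to convert each count into a relative frequency and a central angle (relative frequency times 360 degrees). The sketch below does this for the table above; the dictionary and variable names are illustrative.

```python
# Favorite running shoe counts from Example 4 (200 runners).
shoes = {"Nike": 92, "Adidas": 49, "Reebok": 37, "Asics": 13, "Other": 9}
total = sum(shoes.values())

# Each sector's share of the circle equals the category's relative frequency.
sectors = {}
for brand, count in shoes.items():
    relative = count / total
    sectors[brand] = (relative, relative * 360)   # (relative frequency, degrees)

for brand, (relative, degrees) in sectors.items():
    print(f"{brand:<7} {relative:6.1%} {degrees:6.1f} degrees")
# Nike is 46.0% of the sample, so its sector spans about 165.6 degrees.
```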

[Figure: Pie Chart for Running Shoes, with sectors for Nike, Adidas, Reebok, Asics, and Other.]

What You Should Learn

• How to graph and interpret quantitative data sets using
stem-and-leaf plots and dot plots

• How to graph and interpret qualitative data sets using pie
charts and Pareto charts

• How to graph and interpret paired data sets using scatter
plots and time series charts

Stem-and-Leaf Plot
A stem-and-leaf plot is a special table
where each data value is split into a
"stem" (the first digit or digits) and a
"leaf" (usually the last digit).
Note:
1. The plot has as many leaves as there are entries in the
original data set.
2. The leaves should be single digits.

Stem "1" Leaf "5" means 15

Properties of a Stem-and-Leaf Plot

A stem-and-leaf plot is similar to a histogram but has the
advantage that the graph still contains the original data
values.
Another advantage of a stem-and-leaf plot
is that it provides an easy way to sort data.

Example of the Stem-and-Leaf Plot

Dot Plot

In a dot plot, each data entry is plotted, using a point, above a
horizontal axis.
A dot plot allows you to see how data are distributed,
determine specific data entries, and identify unusual data
values.
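
A text version of a dot plot can be produced by stacking one mark per data entry above its value. The data set below is hypothetical and was chosen so that one entry (12) stands apart from the rest.

```python
from collections import Counter

# Hypothetical data entries; 12 is an unusual value compared with the others.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7, 12]
counts = Counter(data)
axis = range(min(data), max(data) + 1)

# Build rows from the top of the tallest stack down to the axis.
rows = []
for level in range(max(counts.values()), 0, -1):
    rows.append("".join(" * " if counts[v] >= level else "   " for v in axis))

print("\n".join(rows))
print("".join(f"{v:^3}" for v in axis))   # horizontal axis labels
```

Each "*" is one data entry, so the stack above 5 has three marks and the isolated mark above 12 is easy to spot.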

Graphing Qualitative Data Sets

A pie chart is a circle that is divided
into sectors that represent categories.
The area of each sector is
proportional to the frequency of
each category.

Example of the Pie Chart

Graphing Qualitative Data Sets
A Pareto chart is a vertical bar graph in which the
height of each bar represents frequency or relative
frequency.

The bars are positioned in order of decreasing
height, with the tallest bar positioned at the left.
Such positioning helps highlight important data
and is used frequently in business.
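
The ordering step is the only thing that separates a Pareto chart from an ordinary bar chart. The sketch below reuses the running-shoe counts from Example 4 earlier in this deck, deliberately listed out of order, and sorts them by decreasing frequency. The text bars (one "#" per four runners, rounded) are just a rough stand-in for a real chart.

```python
# Running-shoe counts from Example 4, listed here in arbitrary order.
shoes = {"Asics": 13, "Nike": 92, "Other": 9, "Reebok": 37, "Adidas": 49}

# Pareto order: tallest bar first (it would sit at the left of the chart).
pareto = sorted(shoes.items(), key=lambda item: item[1], reverse=True)

for brand, count in pareto:
    print(f"{brand:<7} {'#' * round(count / 4)} {count}")

print([brand for brand, _ in pareto])
# ['Nike', 'Adidas', 'Reebok', 'Asics', 'Other']
```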

Example of the Pareto Chart

Graphing Paired Data Sets

When each entry in one data set corresponds to one entry in a
second data set, the sets are called paired data sets,
e.g., the cost of an item and the sales amounts for it.

One way to graph paired data sets is to use a scatter plot, where
the ordered pairs are graphed as points in a coordinate plane.
A scatter plot is used to show the relationship between two
quantitative variables.

Example of the Scatter Plot

Graphing Paired Data Sets
A data set that is composed of
quantitative entries taken at regular
intervals over a period of time is called a
time series.
For example, the amount of precipitation
measured each day for one month is
a time series.
Use a time series chart to
graph a time series.

Example of a Time Series Chart

Assignment 2 – Deadline 10/17 (11:59 pm)
Title: Exploring Frequency Distributions and Data Visualization

Objective: The aim of this assignment is to apply concepts learned about frequency distributions, stem-and-leaf plots,
histograms, and other data visualization techniques. You will work with a real dataset to demonstrate your understanding.

Instructions:
Data Collection:
• Choose a dataset with at least 30 numerical entries. This can be from any domain (e.g., hours of study, sales figures, or survey
results).
Frequency Distribution:
• Organize your data into a frequency distribution.
• Clearly identify class limits, midpoints, class boundaries, and relative frequencies.
Graphical Representations:
• Construct one of the following visual representations for your data (pick only 1 type):
  • A Histogram
  • A Frequency Polygon
  • A Stem-and-Leaf Plot
  • A Pie Chart (if you have categorical data)
Data Analysis:
• Summarize your findings by describing the center, variability, and shape of the data distribution.
• Discuss any trends, patterns, or outliers that you notice in the dataset.
Submission:
• Submit a report detailing your frequency distribution, graphs, and analysis.
• Attach the Excel sheet, Word file, or PDF file.
