First Stage Statistics and Probability lecture (2)

Second Semester

Definition 3.1. A frequency distribution is the organization of raw data in
table form, using classes and frequencies.
Definition 3.2. Categorical Frequency Distributions
The categorical frequency distribution is used for data that can be
placed in specific categories, such as nominal- or ordinal-level data. For
example, data such as political affiliation, religious affiliation, or major field
of study would use categorical frequency distributions.
Example 3.3. Distribution of Blood Types. Twenty-five army inductees were
given a blood test to determine their blood type. The data set is

Construct a frequency distribution for the data.


Solution: Since the data are categorical, discrete classes can be used. There are
four blood types: A, B, O, and AB. These types will be used as the classes for
the distribution. The procedure for constructing a frequency distribution for
categorical data is given next.
Step.1 Make a table as shown.

Step.2 Tally the data and place the results in column B.


Step.3 Count the tallies and place the results in column C.

Step.4 Find the percentage of values in each class by using the formula
% = (f / n) × 100%, where f = frequency of the class and n = total number of
values. For example, in the class of type A blood, the percentage is
% = (5 / 25) × 100% = 20%.
Percentages are not normally part of a frequency distribution, but they can be
added since they are used in certain types of graphs such as pie graphs. Also,
the decimal equivalent of a percent is called a relative frequency.
Step.5 Find the totals for columns C (frequency) and D (percent). The
completed table is shown.
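
Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration: the sample list below is hypothetical and is not the lecture's data set, and the function name categorical_distribution is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical sample of blood types (NOT the data set from Example 3.3).
blood_types = ["A", "B", "B", "AB", "O", "O", "O", "B", "AB", "A"]


def categorical_distribution(values):
    """Return {class: (frequency, percent)} with percent = f / n * 100."""
    n = len(values)
    counts = Counter(values)  # Steps 2 and 3: tally and count
    return {cls: (f, f * 100 / n) for cls, f in counts.items()}  # Step 4


table = categorical_distribution(blood_types)

# Step 5: print each class, then the column totals.
for cls in ["A", "B", "O", "AB"]:
    f, pct = table[cls]
    print(f"{cls:<6}{f:>4}{pct:>8.1f}%")
total_f = sum(f for f, _ in table.values())
total_pct = sum(pct for _, pct in table.values())
print(f"{'Total':<6}{total_f:>4}{total_pct:>8.1f}%")
```

Dividing each percent by 100 gives the relative frequency mentioned above.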

3.2. Grouped Frequency Distributions. When the range of the data is large,
the data must be grouped into classes that are more than one unit in width, in
what is called a grouped frequency distribution.
Example 3.4. The data represent the record high temperatures in degrees
Fahrenheit (°F) for each of the 50 states.

The procedure for constructing a grouped frequency distribution for
numerical data follows.
• Determine the classes. Find the highest value and the lowest value: H = 134
and L = 100.
Find the range: R = highest value - lowest value = H - L, so R = 134 - 100 = 34.
Select the number of classes desired (usually between 5 and 20). In this case,
7 is arbitrarily chosen.
Find the class width by dividing the range by the number of classes:
Width = R / number of classes = 34 / 7 ≈ 4.9
Round the answer up to the nearest whole number if there is a remainder:
4.9 → 5. (Rounding up is different from rounding off. A number is rounded up if
there is any decimal remainder when dividing. For example, 85 ÷ 6 ≈ 14.167
and is rounded up to 15. Also, 53 ÷ 4 = 13.25 and is rounded up to 14. Also,
after dividing, if there is no remainder, you will need to add an extra class to
accommodate all the data.)
Select a starting point for the lowest class limit. This can be the smallest data
value or any convenient number less than the smallest data value. In this case,
100 is used. Add the width to the lowest score taken as the starting point to
get the lower limit of the next class. Keep adding until there are 7 classes, as
shown, 100, 105, 110, etc.
Subtract one unit from the lower limit of the second class to get the upper limit
of the first class: 105 - 1 = 104. Then add the width to each upper limit to get
all the upper limits.
The first class is 100-104, the second class is 105-109, etc.
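
The arithmetic above can be checked with a short Python sketch. It assumes whole-number data measured in units of 1, so that class boundaries lie half a unit outside the class limits. The function names class_limits and class_boundaries are hypothetical helpers for this sketch.

```python
import math


def class_limits(lowest, highest, num_classes, start=None):
    """Return (lower limit, upper limit) pairs for whole-number data."""
    data_range = highest - lowest                # R = H - L
    width = math.ceil(data_range / num_classes)  # round UP, not off
    if data_range % num_classes == 0:
        num_classes += 1                         # no remainder: one extra class
    # Starting point: the lowest value unless a convenient smaller one is given.
    start = lowest if start is None else start
    lowers = [start + i * width for i in range(num_classes)]
    # Upper limit = next lower limit minus one unit.
    return [(lo, lo + width - 1) for lo in lowers]


def class_boundaries(limits):
    """Boundaries sit half a unit below and above the class limits."""
    return [(lo - 0.5, hi + 0.5) for lo, hi in limits]


limits = class_limits(100, 134, 7)
boundaries = class_boundaries(limits)
print(limits)      # first class 100-104, second class 105-109, ...
print(boundaries)  # first class 99.5-104.5, ...
```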

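• Tally the data.
• Find the numerical frequencies from the tallies. The completed
frequency distribution is
Class limits    Class boundaries    Frequency
100 – 104       99.5 – 104.5        2
105 – 109       104.5 – 109.5       8
110 – 114       109.5 – 114.5       18
115 – 119       114.5 – 119.5       13
120 – 124       119.5 – 124.5       7
125 – 129       124.5 – 129.5       1
130 – 134       129.5 – 134.5       1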

Sometimes it is necessary to use a cumulative frequency distribution. A
cumulative frequency distribution is a distribution that shows the number of
data values less than or equal to a specific value (usually an upper boundary).
The values are found by adding the frequencies of the classes less than or
equal to the upper class boundary of a specific class. This gives an ascending
cumulative frequency. In this example, the cumulative frequency for the first
class is 0 + 2 = 2; for the second class it is 0 + 2 + 8 = 10; for the third class it
is 0 + 2 + 8 + 18 = 28.
A shorter way to do this is to add the cumulative frequency of the preceding
class to the frequency of the given class. For example, the cumulative
frequency for the number of data values less than 114.5 can be found by
adding 10 + 18 = 28.
The cumulative frequency distribution for the data in this example is as
follows:
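
The running-total shortcut corresponds to a cumulative sum. The sketch below uses the class frequencies of the temperature example (2, 8, 18, 13, 7, 1, 1, as listed with the class boundaries in Example 4.2). The variable names are assumptions made for illustration.

```python
from itertools import accumulate

# Upper class boundaries and frequencies of the temperature example.
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Each cumulative frequency = previous cumulative frequency + class frequency.
cumulative = list(accumulate(frequencies))

print("Less than 99.5: 0")
for upper, cf in zip(upper_boundaries, cumulative):
    print(f"Less than {upper}: {cf}")
```

The last cumulative frequency should equal the total number of data values, which is a quick check on the tallies.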

Remark 3.5. Constructing a grouped frequency distribution
• Determine the classes.
• Find the highest and lowest values.
• Find the range.
• Select the number of classes desired.
• Find the width by dividing the range by the number of classes and
rounding up.
• Select a starting point (usually the lowest value or any convenient
number less than the lowest value), and add the width to get the lower
limits.
• Find the upper class limits.
• Find the class boundaries.
• Tally the data.
• Find the numerical frequencies from the tallies.
• Find the cumulative frequencies.
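
The tally step of this procedure can be sketched as follows. The data values below are hypothetical and chosen only to illustrate the counting, and the function name tally is an assumption of this sketch. Only the first three class limits of the temperature example are used.

```python
def tally(data, limits):
    """Count how many data values fall inside each (lower, upper) class."""
    counts = [0] * len(limits)
    for x in data:
        for i, (lo, hi) in enumerate(limits):
            if lo <= x <= hi:
                counts[i] += 1
                break
        else:
            # No class contained x, so the classes do not cover the data.
            raise ValueError(f"{x} falls outside every class")
    return counts


limits = [(100, 104), (105, 109), (110, 114)]
sample = [100, 103, 107, 112, 114, 105, 110]  # hypothetical values
counts = tally(sample, limits)
print(counts)
```

Raising an error for a value outside every class is a design choice of this sketch: it signals that the starting point or the number of classes needs to be revised.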
4. Histograms, Frequency Polygons, and Ogives
The purpose of graphs in statistics is to convey the data to the viewers
in pictorial form. 1) It is easier for most people to comprehend the meaning
of data presented graphically than data presented numerically in tables or
frequency distributions.
This is especially true if the users have little or no statistical knowledge.
Statistical graphs can be used to describe the data set or to analyze it. 2)
Graphs are also useful in getting the audience's attention in a publication or a
speaking presentation. 3) They can be used to discuss an issue, reinforce a
critical point, or summarize a data set. They can also be used to discover a
trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are:
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).
An example of each type of graph is shown in the following figure. The data for
each graph are the distribution of the miles that 20 randomly selected runners
ran during a given week.

Definition 4.1. Histogram
The histogram is a graph that displays the data by using contiguous
vertical bars (unless the frequency of a class is 0) of various heights to
represent the frequencies of the classes.
Example 4.2. Construct a histogram to represent the data shown for the
record high temperatures for each of the 50 states.
Class boundaries    Frequency
99.5 – 104.5        2
104.5 – 109.5       8
109.5 – 114.5       18
114.5 – 119.5       13
119.5 – 124.5       7
124.5 – 129.5       1
129.5 – 134.5       1
Solution:
Step.1 Draw and label the x and y axes. The x axis is always the horizontal
axis, and the y axis is always the vertical axis.
Step.2 Represent the frequency on the y-axis and the class boundaries on the
x-axis.

Step.3 Using the frequencies as the heights, draw vertical bars for each class
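
As a rough check of the bar heights before drawing, the distribution can be printed as a text histogram. Note that this sketch draws the bars horizontally, one row per class, rather than vertically as in a proper histogram. The function name text_histogram is an assumption made for illustration.

```python
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]


def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text row per class; the bar length equals the frequency."""
    rows = []
    for (lo, hi), f in zip(boundaries, frequencies):
        rows.append(f"{lo:>5.1f} - {hi:>5.1f} | {symbol * f} {f}")
    return rows


rows = text_histogram(boundaries, frequencies)
print("\n".join(rows))
```

The longest bar belongs to the class 109.5 – 114.5, which has the highest frequency (18).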

Definition 4.3. The frequency polygon is a graph that displays the data by
using lines that connect points plotted for the frequencies at the midpoints of
the classes. The frequencies are represented by the heights of the points.
Example 4.4. For the same example
Step.1 Find the midpoints of each class
Class boundaries    Midpoint    Frequency
99.5 – 104.5        102         2
104.5 – 109.5       107         8
109.5 – 114.5       112         18
114.5 – 119.5       117         13
119.5 – 124.5       122         7
124.5 – 129.5       127         1
129.5 – 134.5       132         1

Step.2 Draw the x and y axes. Label the x axis with the midpoint of each class,
and then use a suitable scale on the y axis for the frequencies.
Step.3 Using the midpoints for the x values and the frequencies as the y
values, plot the points.

Step.4 Connect adjacent points with line segments. Draw a line back to the x
axis at the beginning and end of the graph, at the same distance that the
previous and next midpoints would be located.
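
The points of the frequency polygon can be computed as in the sketch below. Each midpoint is the average of the lower and upper class boundaries. The two closing points on the x axis are placed one class width before the first midpoint and one class width after the last, which assumes that all classes have the same width. The function names are hypothetical.

```python
boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]


def midpoints(boundaries):
    """Midpoint = (lower boundary + upper boundary) / 2."""
    return [(lo + hi) / 2 for lo, hi in boundaries]


def polygon_points(boundaries, frequencies):
    """Return the (x, y) points of the polygon, closed on the x axis."""
    mids = midpoints(boundaries)
    width = boundaries[0][1] - boundaries[0][0]  # assumes equal class widths
    points = [(mids[0] - width, 0)]              # where the previous midpoint would be
    points += list(zip(mids, frequencies))
    points.append((mids[-1] + width, 0))         # where the next midpoint would be
    return points


mids = midpoints(boundaries)
points = polygon_points(boundaries, frequencies)
print(mids)
print(points)
```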

Definition 4.5. The Ogive. The third type of graph that can be used represents
the cumulative frequencies for the classes. This type of graph is called the
cumulative frequency graph, or ogive. The cumulative frequency is the sum
of the frequencies accumulated up to the upper boundary of a class in the
distribution.
The ogive is a graph that represents the cumulative frequencies for the
classes in a frequency distribution.
Example 4.6.
Step.1 Find the cumulative frequency for each class.

Step.2 Draw the x and y axes. Label the x axis with the class boundaries. Use
an appropriate scale for the y axis to represent the cumulative frequencies.

(Depending on the numbers in the cumulative frequency columns, scales such
as 0, 1, 2, 3, . . . , or 5, 10, 15, 20, . . . , or 1000, 2000, 3000, . . . can be used.
Do not label the y axis with the numbers in the cumulative frequency column.)
In this example, a scale of 0, 5, 10, 15, . . . will be used.
Step.3 Plot the cumulative frequency at each upper class boundary, as shown
in the figure. Upper boundaries are used since the cumulative frequencies
represent the number of data values accumulated up to the upper boundary of
each class.
Step.4 Starting with the first upper class boundary, 104.5, connect adjacent
points with line segments, as shown in Figure 2. Then extend the graph to the
first lower class boundary, 99.5, on the x axis.
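
The points to plot in Steps 3 and 4 can be listed with a short sketch. It uses the class boundaries and frequencies of the temperature example. The function name ogive_points is an assumption made for illustration.

```python
from itertools import accumulate

boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]


def ogive_points(boundaries, frequencies):
    """Return (x, cumulative frequency) points for the ogive."""
    cumulative = accumulate(frequencies)
    # The graph starts on the x axis at the first lower class boundary.
    points = [(boundaries[0][0], 0)]
    # Each cumulative frequency is plotted at the UPPER class boundary.
    points += [(hi, cf) for (_, hi), cf in zip(boundaries, cumulative)]
    return points


points = ogive_points(boundaries, frequencies)
for x, cf in points:
    print(x, cf)
```

Because cumulative frequencies never decrease, the y values of these points rise (or stay level) from left to right and end at the total number of data values, 50.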
