Creating and Using Frequency Distributions
Definitions
Distribution: The general term for any organized set of data. We organize data so we can see the pattern they form. We need to know the total number of scores sampled (N). We are also concerned with how often each different score occurs in the data. How often a score occurs is its frequency, symbolized f.
We want to know whether chimpanzees are social creatures, so we looked at the work of Jane Goodall. She has recorded the size of the groups of chimpanzees that she sees: each score is the number of chimpanzees seen together at one sighting. As a raw collection of scores, this information is meaningless.
N = 28
6 9 2 3 5 7 8
6 5 6 8 9 7 6
1 9 7 3 7 6 7
9 2 6 8 9 8 7
Score (X)  9  8  7  6  5  4  3  2  1
f          5  4  6  6  2  0  2  2  1

$$\sum f = 5+4+6+6+2+0+2+2+1 = 28 = N$$

From this table we can compute $\sum X$ by multiplying each X score by its frequency of occurrence and adding the products:

$$\sum X = (9 \cdot 5)+(8 \cdot 4)+(7 \cdot 6)+(6 \cdot 6)+(5 \cdot 2)+(4 \cdot 0)+(3 \cdot 2)+(2 \cdot 2)+(1 \cdot 1) = 45+32+42+36+10+0+6+4+1 = 176$$

We could also get the same total by adding together every raw score.
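The frequency table and the two sums can be reproduced with a short Python sketch. It assumes the 28 raw scores are typed in by hand as listed; the helper name `simple_frequency_table` is hypothetical, chosen for illustration. `collections.Counter` returns 0 for a score that never occurs, which is how the score of 4 ends up with f = 0.

```python
from collections import Counter

# Jane Goodall's chimpanzee group sizes, typed in from the raw scores (N = 28)
raw_scores = [
    6, 9, 2, 3, 5, 7, 8,
    6, 5, 6, 8, 9, 7, 6,
    1, 9, 7, 3, 7, 6, 7,
    9, 2, 6, 8, 9, 8, 7,
]


def simple_frequency_table(scores):
    """Hypothetical helper: return (X, f) pairs from the highest score down to
    the lowest, including in-between scores whose frequency is 0."""
    counts = Counter(scores)  # Counter gives 0 for scores that never occur
    return [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]


table = simple_frequency_table(raw_scores)
sum_f = sum(f for _, f in table)      # sum of f, which should equal N
sum_x = sum(x * f for x, f in table)  # sum of X, computed as the sum of f * X

for x, f in table:
    print(f"X = {x}  f = {f}")
print("Sum of f:", sum_f, " N:", len(raw_scores))
print("Sum of X from the table:", sum_x, " from the raw scores:", sum(raw_scores))
```

Both routes to the sum of X agree, which is a quick check that no score was miscounted when building the table.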
A graph of a simple frequency distribution shows the relationship between each score and the frequency with which it occurs. We observe changes in frequency as a function of changes in the scores. We place the scores on the x-axis and the frequency on the y-axis.
Creating Simple Frequency Distributions
Graphing a Simple Frequency Distribution
Bar Graphs
If the different scores are nominal or ordinal, we use a bar graph.

Rank         f
General      2
Colonel      4
Major        7
Captain     12
Lieutenant  18

[Figure: bar graph of f (y-axis, 0 to 20) by Military Rank (x-axis)]
[Figure: Years in College; bar graph of f (y-axis) by Number of Years of College (x-axis: One, Two, Three, Four, Five)]
Frequency Polygons
If we have a large range of interval or ratio scores, we use a frequency polygon. Here is the chimpanzee data:

Score  9  8  7  6  5  4  3  2  1
f      5  4  6  6  2  0  2  2  1

[Figure: frequency polygon; x-axis: Number of Chimpanzees (0 to 10); y-axis: f (0 to 7)]
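A minimal sketch of how the polygon's points can be computed, assuming the same 28 raw scores. It follows a common convention (an assumption here, not stated above) of adding a score with f = 0 just below the lowest score and just above the highest, so the polygon starts and ends on the x-axis. The helper `polygon_points` is hypothetical; the resulting points can be handed to any plotting tool and joined with straight lines.

```python
from collections import Counter

# Chimpanzee group sizes (N = 28)
raw_scores = [
    6, 9, 2, 3, 5, 7, 8,
    6, 5, 6, 8, 9, 7, 6,
    1, 9, 7, 3, 7, 6, 7,
    9, 2, 6, 8, 9, 8, 7,
]


def polygon_points(scores):
    """Hypothetical helper: return (x, f) vertices of a frequency polygon,
    from lowest to highest x, padded with f = 0 at both ends."""
    counts = Counter(scores)
    low, high = min(scores) - 1, max(scores) + 1  # one empty score on each side
    return [(x, counts[x]) for x in range(low, high + 1)]


points = polygon_points(raw_scores)
for x, f in points:
    print(x, f)
```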
The normal distribution, a symmetrical bell-shaped curve, is described by specific mathematical properties. The height of the curve above any score reflects the number of people at that score. Scores far from the middle (in the tails) are relatively infrequent: the farther a score is from the center, the less frequently it appears.
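The claim that the curve gets lower as scores move away from the center can be checked numerically with Python's standard-library `statistics.NormalDist` (Python 3.8+). This sketch assumes a standard normal curve (mean 0, standard deviation 1) purely for illustration; `pdf` gives the height of the curve, which here plays the role of relative frequency.

```python
import math
from statistics import NormalDist

# Standard normal curve, chosen only for illustration (mean 0, SD 1)
curve = NormalDist(mu=0, sigma=1)

# Height of the curve at 0, 1, 2, and 3 standard deviations from the center
distances = [0, 1, 2, 3]
heights = [curve.pdf(d) for d in distances]

for d, h in zip(distances, heights):
    print(f"{d} SD from the center: height = {h:.4f}")

# The curve is symmetrical: a score the same distance below the center
# has the same height as one above it
print(math.isclose(curve.pdf(-2), curve.pdf(2)))
```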
Sometimes two distributions are plotted on the same set of x/y axes. For example, when the heights of females and males are plotted together, males are generally taller than females, but the overlap shows that some females are taller than some males.

[Figure: overlapping height distributions for Females and Males]
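Kurtosis: how peaked or flat a distribution is. A leptokurtic distribution is tall and thin, a mesokurtic distribution has the moderate peak of the normal curve, and a platykurtic distribution is low and flat. All three are symmetrical and bell-shaped, but strictly speaking only the mesokurtic one matches the kurtosis of a normal distribution.

[Figure: Leptokurtic, Mesokurtic, and Platykurtic curves]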
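A skewed distribution is similar to a normal distribution, but it is not symmetrical: one tail extends farther than the other. In a negatively skewed distribution the long tail points toward the low scores (grades are an example); in a positively skewed distribution it points toward the high scores (income is an example).

[Figure: Negative Skew (Grades) and Positive Skew (Income)]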
Sometimes two scores are both very frequent, producing a bimodal distribution. In an unusual case, every score occurs equally often, producing a rectangular distribution.

[Figure: Rectangular and Bimodal distributions]
Simple frequency distributions are adequate for many purposes, but they can be hard to interpret because they lack a frame of reference. If a score has a frequency of 6, is that a lot? It depends on the total number of scores. A score's relative frequency is f/N.
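Applied to the chimpanzee data, the formula answers the question above: a frequency of 6 out of N = 28 is a relative frequency of 6/28 ≈ 0.21, a little over a fifth of all sightings. A minimal Python sketch, assuming the same raw scores:

```python
from collections import Counter

# Chimpanzee group sizes (N = 28)
raw_scores = [
    6, 9, 2, 3, 5, 7, 8,
    6, 5, 6, 8, 9, 7, 6,
    1, 9, 7, 3, 7, 6, 7,
    9, 2, 6, 8, 9, 8, 7,
]

N = len(raw_scores)
counts = Counter(raw_scores)

# Relative frequency of each score, highest score first: f / N
rel_f = {x: counts[x] / N for x in range(max(raw_scores), min(raw_scores) - 1, -1)}

for x, rf in rel_f.items():
    print(f"X = {x}  f = {counts[x]}  rel. f = {rf:.3f}")

# Relative frequencies add up to 1 (apart from floating-point rounding)
print(round(sum(rel_f.values()), 10))
```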
We may want to know not only the frequency of each score but also where each score stands relative to the other scores. Cumulative frequency (cf) is the frequency of all scores at or below a particular score.
Because cumulative frequency only really makes sense when the data are interval or ratio, it should be graphed as a frequency polygon.
[Figure: cumulative frequency polygon; x-axis: Number of Chimpanzees (0 to 9); y-axis: cf (0 to 30)]
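A minimal sketch of the cumulative frequency column for the chimpanzee data, assuming the same raw scores. Scores are listed from lowest to highest so that `itertools.accumulate` can keep a running total of f; the cf of the top score must equal N.

```python
from collections import Counter
from itertools import accumulate

# Chimpanzee group sizes (N = 28)
raw_scores = [
    6, 9, 2, 3, 5, 7, 8,
    6, 5, 6, 8, 9, 7, 6,
    1, 9, 7, 3, 7, 6, 7,
    9, 2, 6, 8, 9, 8, 7,
]

counts = Counter(raw_scores)
scores = list(range(min(raw_scores), max(raw_scores) + 1))  # lowest to highest
freqs = [counts[x] for x in scores]
cum_freqs = list(accumulate(freqs))  # cf = f of this score + f of every lower score

for x, f, cf in zip(scores, freqs, cum_freqs):
    print(f"X = {x}  f = {f}  cf = {cf}")
```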
Percentiles
Finding Percentiles Using the Area under the Normal Curve
Percentile: the percentage of all scores that are at or below a given score. Percentile = cumulative percent = (cf/N) × 100
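For example, in the chimpanzee data a group size of 7 has cf = 19 (19 of the 28 sightings were of 7 chimpanzees or fewer), so its percentile is (19/28) × 100 ≈ 67.9. The sketch below computes this directly from the raw scores; `percentile_rank` is a hypothetical helper name.

```python
# Chimpanzee group sizes (N = 28)
raw_scores = [
    6, 9, 2, 3, 5, 7, 8,
    6, 5, 6, 8, 9, 7, 6,
    1, 9, 7, 3, 7, 6, 7,
    9, 2, 6, 8, 9, 8, 7,
]


def percentile_rank(scores, x):
    """Hypothetical helper: percent of all scores at or below x, i.e. (cf / N) * 100."""
    cf = sum(1 for s in scores if s <= x)  # cumulative frequency of x
    return cf / len(scores) * 100


for x in range(min(raw_scores), max(raw_scores) + 1):
    print(f"X = {x}  percentile = {percentile_rank(raw_scores, x):.1f}")
```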
Dr. Goodall's data lent itself well to a simple frequency distribution because it is easy to make a category for each X score. But what if the scores ranged from 0 to 100? We certainly couldn't make a category for each X score; the frequency distribution would be 101 rows long. Here is a list of 25 exam scores that range from 53 to 94:

82 75 88 93 53 84 87 58 72 94 69 84 61 91 64 87 84 70 76 89 75 80 73 78 60
Rules for constructing grouped frequency distributions:
1. There should be about 10 class intervals. With too many, you aren't really summarizing anything; with too few, you lose too much information.
2. The width of each class interval should be a simple number.
3. The bottom score of each interval should be a multiple of its width.
4. All intervals should be the same width and should cover the full range of scores.
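The four rules can be applied to the 25 exam scores in a short Python sketch. It assumes whole-number scores and a width of 5, chosen as one reasonable reading of rules 1 and 2: the range 94 - 53 = 41 divided into about 10 intervals gives a width of roughly 4, rounded up to the simple number 5. The helper `grouped_frequency` is hypothetical.

```python
# The 25 exam scores listed above
exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]


def grouped_frequency(scores, width):
    """Hypothetical helper: group whole-number scores into equal-width class
    intervals and return (bottom, top, f) rows, highest interval first."""
    bottom = (min(scores) // width) * width  # rule 3: bottom is a multiple of width
    rows = []
    while bottom <= max(scores):             # rule 4: cover the full range
        top = bottom + width - 1             # e.g. 50-54 when width is 5
        f = sum(1 for s in scores if bottom <= s <= top)
        rows.append((bottom, top, f))
        bottom += width                      # rule 4: every interval the same width
    return rows[::-1]


# Assumed width of 5 (rules 1 and 2): 41 / 10 is about 4, rounded up to 5
width = 5
rows = grouped_frequency(exam_scores, width)
for bottom, top, f in rows:
    print(f"{bottom}-{top}\t{f}")
```

With a width of 5 this gives nine intervals, from 50-54 up to 90-94, and the frequencies add up to 25.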