Chapter1 Stats
Lecture Notes for Introductory Statistics
In the first chapter we are introduced to several very important statistical terms
and concepts. Warning: Notice that in the previous sentence, there is no mention
of formulas or calculations. This is not a typical math class. It is really more
of a “critical thinking” course. It is essential to realize that you must approach the
study of statistics much differently than you would approach a class like College
Algebra or Calculus. Be sure to give yourself time to understand the concepts
deeply. Spend time thinking about the concepts and definitions. Read the book
and start the homework early enough that you have time to really understand
each problem. Memorizing definitions and techniques will not guarantee successful
completion of the course.
The ideal sampling method is a simple random sample, in which every
set of n subjects has the same probability of being selected. This is like putting
names in a hat and randomly selecting 5 names, for example. All of the inferential
statistics methods we will study require that the data arise from a simple random
sample, or SRS. The reason for this requirement is that if the data are not represen-
tative of the population, then the conclusions will be meaningless (GIGO: Garbage
In, Garbage Out).
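The "names in a hat" idea can be sketched in a few lines of Python. This is only an illustration: the roster of 20 names and the helper name simple_random_sample are made up for the example, and the seed is there only so the draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n subjects so that every set of n subjects is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like pulling names from a hat
    return rng.sample(population, n)

# Hypothetical roster of 20 names "in the hat" (not real data)
names = [f"Student {i}" for i in range(1, 21)]
chosen = simple_random_sample(names, 5, seed=1)
print(chosen)
```

Running it with a different seed (or with seed=None) gives a different set of 5 names, and no name can appear twice.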
Often a simple random sample is not feasible, or at least not practical, so re-
searchers will do their best to use other sampling methods that are likely to result
in a representative sample. A few different sampling methods that may be used
successfully are:
• stratified sampling: subjects are categorized by similar traits, then sam-
ple subjects are randomly selected from each category in numbers that
are proportional to their numbers in the population. Example: 4 girls are
randomly selected and then 6 boys are randomly selected from a population
that is 40% female. Stratified sampling guarantees a representative sample
relative to the categories that are used.
• cluster sampling: there is already a natural categorization of subjects,
usually by location. A sample of categories is selected randomly, and every
subject in each of the selected categories is part of the sample. Exam-
ple: randomly select 10 elementary schools in Georgia, then sample every
teacher at each of those schools. Cluster sampling is done for the sake of
saving time and/or money.
• systematic sampling: sample every nth subject. Example: sample every
1000th M&M to weigh and measure. Systematic sampling is common in
manufacturing.
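The three methods above might be sketched in Python as follows. All function names and all data (the girls and boys, the schools, the numbered candies) are hypothetical, chosen to mirror the examples in the list. The rounding in the stratified version is an assumption of this sketch; with awkward proportions the rounded counts may not add up to exactly n.

```python
import random

def stratified_sample(strata, n, rng):
    """strata maps each category to its list of subjects.
    Sample from every category in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)  # proportional share, rounded
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, num_clusters, rng):
    """clusters maps each cluster name to its list of subjects.
    Randomly pick whole clusters, then keep every subject in them."""
    picked = rng.sample(sorted(clusters), num_clusters)
    return [subject for name in picked for subject in clusters[name]]

def systematic_sample(subjects, step):
    """Keep every step-th subject: the step-th, the (2*step)-th, and so on."""
    return subjects[step - 1::step]

rng = random.Random(0)

# Stratified: a hypothetical population that is 40% female, sample size 10
strata = {
    "girls": [f"girl {i}" for i in range(1, 41)],
    "boys": [f"boy {i}" for i in range(1, 61)],
}
strat = stratified_sample(strata, 10, rng)

# Cluster: 5 hypothetical schools with 3 teachers each, 2 schools chosen
schools = {f"school {s}": [f"teacher {s}-{t}" for t in range(1, 4)]
           for s in range(1, 6)}
clus = cluster_sample(schools, 2, rng)

# Systematic: every 1000th item off a hypothetical line of 5000 candies
syst = systematic_sample(list(range(1, 5001)), 1000)

print(strat)
print(clus)
print(syst)
```

The stratified sample always contains 4 girls and 6 boys, matching the 40% share, while which individuals are chosen is left to chance.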
Sampling bias occurs when some members of the population are more likely
to be chosen than others, resulting in a sample that is not representative of the
population. The following sampling methods are common causes of sampling bias:
• convenience sample: collect data in a way that is convenient for the
researcher, with little regard for obtaining a representative sample. Ex-
ample: a teacher sampling the students in her own classes to estimate the
percentage of biology majors in the school.
• voluntary response/self-selection: each subject actively chooses to
participate. Example: an internet survey posted on a website. This could also
be less obvious. Example: an observational study of the health benefits of
drinking wheatgrass juice.
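To see how a convenience sample can mislead, here is a small Python sketch of the teacher example. Every number in it is invented for illustration: a school of 1000 students with 100 biology majors, where the teacher's own classes hold 50 students, 40 of them biology majors.

```python
import random

rng = random.Random(0)

# Hypothetical school (made-up numbers): the first 50 entries are the
# biology teacher's own classes, 40 of whom are biology majors.
population = ["bio"] * 40 + ["other"] * 10
# Everyone else in the school: 60 more biology majors and 890 others.
population += ["bio"] * 60 + ["other"] * 890

def percent_bio(sample):
    """Percentage of biology majors in a sample."""
    return 100 * sample.count("bio") / len(sample)

convenience = population[:50]      # only the teacher's own students
srs = rng.sample(population, 50)   # simple random sample of the same size

print("population: ", percent_bio(population))
print("convenience:", percent_bio(convenience))
print("SRS:        ", percent_bio(srs))
```

In this made-up school the true percentage is 10%, but the convenience sample reports 80% because biology majors are far more likely to be in a biology teacher's classes. The simple random sample will vary from draw to draw but has no built-in tilt toward either group.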
Example 6. Suppose you are a reporter for the school newspaper. The university
just announced an institutional name change and you would like to write a story
that includes an analysis of student opinion about the change.
(1) Identify the population of interest.
(2) Describe a sampling design of each type below that should result in a rep-
resentative sample.
(a) simple random sample
(b) stratified sample
(c) cluster sample
(d) systematic sample
(3) Describe a sampling design of each type below. Do you think it is likely
to result in a representative sample of the population for this particular
question?
Chapter 1 Notes Sampling and Data D. Skipper, p 4
for displaying categorical data, though it is often difficult to compare similarly sized
pie slices at a glance. Notice that with a pie chart, the slices must add up to 100%.
The following charts (from Section 1.2 of the textbook) illustrate three different
graphical representations of the same data.
We would like to organize quantitative data in a similar way, but the categories
are not immediately available. Instead, we must split the range of data values into
categories called classes. Then we can sort the data into these classes, treating
the data much like we treated categorical data, finding a frequency and relative
frequency for each class.
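Sorting quantitative data into classes can be sketched in Python as below. The 20 heights (in inches), the starting value of 60, and the class width of 3 are all hypothetical choices for this illustration, not the data or class boundaries used in the examples in these notes.

```python
def frequency_table(data, lower, width, num_classes):
    """Sort data into classes [lower, lower + width), [lower + width, ...), ...
    Return one row per class: (class_low, class_high, frequency, relative_frequency)."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)  # which class x falls into
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[index] += 1
    n = len(data)
    rows = []
    for i, freq in enumerate(counts):
        low = lower + i * width
        rows.append((low, low + width, freq, freq / n))
    return rows

# Hypothetical heights in inches (made-up data)
heights = [60, 61, 62, 63, 64, 64, 65, 65, 66, 66,
           66, 67, 67, 68, 69, 70, 70, 71, 72, 74]

table = frequency_table(heights, lower=60, width=3, num_classes=5)
for low, high, freq, rel in table:
    print(f"{low} to under {high}: frequency {freq}, relative frequency {rel:.2f}")
```

Each data value lands in exactly one class, so the frequencies add up to the number of data values and the relative frequencies add up to 1 (that is, 100%).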
(1) What percentage of students in the class are 64” (5 feet, 4 inches) or shorter?
(2) What percentage of students in the class are taller than 69” (5 feet, 9
inches)?
The next logical step is to draw a histogram of the data, which is a fancy word
for a bar graph of quantitative data that have been sorted into classes. The histogram
is the single most important graph in statistics, and we will use it often.
An important feature of a histogram is that there is a natural order for the classes
(from left to right on the number line), so we never sort a histogram's bars by
height.
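A rough relative frequency histogram can be printed as text with a short Python sketch. The class labels and frequencies below are hypothetical, and the picture is turned on its side: classes run from top to bottom in number-line order instead of left to right, and each bar's length shows the relative frequency.

```python
def text_histogram(class_labels, frequencies, bar_width=40):
    """Build one text line per class, kept in number-line order.
    Bar length is proportional to the class's relative frequency."""
    n = sum(frequencies)
    lines = []
    for label, freq in zip(class_labels, frequencies):
        rel = freq / n
        bar = "#" * round(rel * bar_width)
        lines.append(f"{label:>7} | {bar} {rel:.2f}")
    return lines

# Hypothetical classes (heights in inches) and frequencies, made up for the sketch
labels = ["60-62", "63-65", "66-68", "69-71", "72-74"]
freqs = [3, 5, 6, 4, 2]

hist_lines = text_histogram(labels, freqs)
print("\n".join(hist_lines))
```

Note that the bars stay in the order of the classes. Sorting them by length would destroy the shape of the distribution.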
Example 8. Draw a relative frequency histogram of the class height data from
Example 7.
Example 9. Calculator histogram. Use the height data from Example 7.
(1) Clear the calculator's RAM.
(2) Enter the heights into L1.
(3) Use stat plot to draw a histogram using default class boundaries. Use
trace to see the default class boundaries and frequencies.
(4) Use window then graph to change the histogram so that it uses the class
boundaries we created in Example 7.
Histograms are so important because they display the distribution
of the data: where the data values fall along the range of possible values. They
help us answer important questions such as: Is the data spread out or mostly
concentrated in a small region? Is the data symmetrically distributed? Is the data
skewed left (a few extremely low data values) or skewed right (a few extremely high
data values)?
Example 10. What does the histogram of the height data tell us about the heights
of the people in the sampled statistics class? Sketch a histogram shape for each
class described below.
(1) The class has a lot of tall people.
(2) Everyone in the class is 64” tall.
(3) The students make a nice staircase shape when lined up from shortest to
tallest.
Ethics comes into play in statistical studies in a variety of ways. We have already
seen that self-interest studies can inspire fraudulent data and analysis. Another
very important ethical component is related to the treatment of human subjects.
There are laws that require that studies are safe, that participants understand the
risks associated with a study, that subjects freely decide whether or not to participate,
and that each subject's privacy is protected. Research institutions (including
Augusta University) have Institutional Review Boards (IRB) to oversee the
research at the institution and ensure the safety of all human subjects.