Lesson 1: Introduction to Statistics
Objectives:
Definition of Statistics
1. Organize and summarize the information so that the researcher can see what
happened in the research study and can communicate results to others.
2. Help the researcher to answer the questions that initiated the research by
determining exactly what general conclusions are justified based on the specific
results that were obtained.
Example 1: You want to know whether students learn better through printed materials or
online materials. You must gather information like their grades or activity scores. Once
you have this information, you use statistics to make sense of it. Statistics can help
you decide which set of grades (printed or online) is better. Statistics is a
universal set of techniques understood by the scientific community: a technique that one
researcher uses is known to, and used by, other researchers.
In example 1, before you can start your study, you must first define who among
the students will be included in your study. Are they elementary, high school, or college
students? Will you include only 1st years, grade 8, or kindergarten students? Once you have
determined who you will be studying, you have your population. A population is the set
of all individuals of interest in a particular study. Typically, populations are large because
they include all target individuals. If, for example 1, you decide to study all 1st year
students and find out that there are 1000 freshmen this semester, it will be hard to
include all of them in your study. Because of this, a sample is taken. A sample is a set
of individuals selected from a population. The sample is a smaller, more manageable
group from the population. This sample should be representative of its population.
Example 2: You have decided to study freshmen students. Since there are 1000 of them,
you take a sample of only 286 individuals, which is more manageable. The freshmen
students are divided into 3 colleges: CAS, CBA, and CEA. In your sample, all 3 colleges
should also be present to make your sample representative of your population. Your
population consists only of freshmen who enrolled during the year 2020-2021, so
your sample should also consist only of students who enrolled during the year 2020-2021.
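One way to keep all 3 colleges present in the sample is to draw from each college in proportion to its size. The sketch below is only an illustration: the enrollment counts per college are assumed (the example gives only the total of 1000 and the sample size of 286), and the function name stratified_sample is hypothetical.

```python
import random

def stratified_sample(population, sample_size, seed=0):
    """Draw a proportional stratified sample.

    population: dict mapping a stratum name (here, a college) to a list of
    student IDs. Returns a dict with the same keys holding the sampled IDs.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in population.values())
    # Exact proportional share for each stratum, before rounding.
    shares = {name: len(members) * sample_size / total
              for name, members in population.items()}
    # Start from the rounded-down share, then give the leftover seats
    # to the strata with the largest fractional remainders.
    counts = {name: int(share) for name, share in shares.items()}
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    # Sample without replacement inside each stratum.
    return {name: rng.sample(population[name], counts[name])
            for name in population}

# Assumed enrollment per college; only the total of 1000 comes from Example 2.
enrollment = {"CAS": 450, "CBA": 300, "CEA": 250}
freshmen = {college: [f"{college}-{i}" for i in range(n)]
            for college, n in enrollment.items()}

sample = stratified_sample(freshmen, 286)
for college, students in sample.items():
    print(college, len(students))
```

Simple random sampling from all 1000 freshmen would also be a valid choice. Sampling within each college just guarantees that no college is left out by chance.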
Information from the sample is analyzed through statistics. The results will be
used to answer your initial questions from research like in example 1. This highlights the
relationship between a sample and a population. The goal is to generalize the results
from the sample back to the population. This relationship can be seen in the figure
below.
Once again, variables can be characteristics that differ from one individual to
another, such as height, weight, gender, or personality. Also, variables can be
environmental conditions that change such as temperature, time of day, or the size of
the room in which the research is being conducted.
When describing data, it is necessary to distinguish whether the data come from
a population or a sample. A characteristic that describes a population—for example, the
average score for the population—is called a parameter. A characteristic that describes
a sample is called a statistic. Thus, the average score for a sample is an example of a
statistic. Typically, the research process begins with a question about a population
parameter. However, the actual data come from a sample and are used to compute
sample statistics.
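The difference between a parameter and a statistic can be sketched in a few lines. The scores below are randomly generated for illustration only and are not real data; the population size of 1000 and sample size of 286 are borrowed from Example 2.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: activity scores for 1000 freshmen.
population_scores = [rng.randint(50, 100) for _ in range(1000)]

# A sample of 286 students drawn without replacement from that population.
sample_scores = rng.sample(population_scores, 286)

parameter = statistics.mean(population_scores)  # describes the population
statistic = statistics.mean(sample_scores)      # describes the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In a real study the parameter is usually unknown, because the whole population cannot be measured. It is computed here only because the population is simulated.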
The two main branches of statistics are descriptive and inferential statistics.
Descriptive statistics are statistical procedures used to summarize, organize, and simplify
data. The data are commonly presented in a table or graph. Inferential statistics, in
contrast, use sample data to draw general conclusions about the population, and include
the following:
• Hypothesis testing
• ANOVA
• Regression analysis
• Chi-square
Because populations are typically very large, it usually is not possible to measure
everyone in the population. Therefore, a sample is selected to represent the population.
By analyzing the results from the sample, we hope to make general statements about
the population. Typically, researchers use sample statistics as the basis for drawing
conclusions about population parameters. One problem with using samples, however, is
that a sample provides only limited information about the population. Although samples
are generally representative of their populations, a sample is not expected to give a
perfectly accurate picture of the whole population. There usually is some discrepancy
between a sample statistic and the corresponding population parameter. This discrepancy
is called sampling error, and it creates the fundamental problem inferential statistics
must always address. A sampling error is the naturally occurring discrepancy, or error,
that exists between a sample statistic and the corresponding population parameter. An
example of how sampling error occurs can be seen below.
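Sampling error can also be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated (1000 scores from a normal distribution with mean 75 and standard deviation 10, both chosen arbitrarily), and the helper name sampling_errors is hypothetical.

```python
import random
import statistics

def sampling_errors(population, sample_size, n_samples, seed=0):
    """Return (sample mean - population mean) for several random samples."""
    rng = random.Random(seed)
    parameter = statistics.mean(population)
    errors = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # without replacement
        errors.append(statistics.mean(sample) - parameter)
    return errors

# Simulated population of 1000 scores (assumed values, not real data).
rng = random.Random(1)
population = [rng.gauss(75, 10) for _ in range(1000)]

for size in (10, 100, 1000):
    errors = sampling_errors(population, size, n_samples=5)
    print(size, [round(e, 2) for e in errors])
```

Each sample gives a slightly different statistic, so the error changes from sample to sample. When the sample is the entire population (size 1000 here), the discrepancy disappears.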
Some variables, such as height, weight, and eye color are well-defined, concrete
entities that can be observed and measured directly. On the other hand, many variables
studied by behavioral scientists are internal characteristics that people use to help
describe and explain behavior. For example, we say that a student does well in school
because he or she is intelligent. Or we say that someone is anxious in social situations,
or that someone seems to be hungry. Variables like intelligence, anxiety, and hunger are
called constructs, and because they are intangible and cannot be directly observed, they
are often called hypothetical constructs.
On the other hand, many variables are not discrete. Variables such as time, height,
and weight are not limited to a fixed set of separate, indivisible categories. You can
measure time, for example, in hours, minutes, seconds, or fractions of seconds. These
variables are called continuous because they can be divided into an infinite number of
fractional parts. For a continuous variable, there are an infinite number of possible values
that fall between any two observed values. A continuous variable is divisible into an
infinite number of fractional parts.
Real limits are the boundaries of intervals for scores that are represented on a
continuous number line. The real limit separating two adjacent scores is located exactly
halfway between the scores. Each score has two real limits. The upper real limit is at
the top of the interval, and the lower real limit is at the bottom. The example above
can be summarized by the figure below.
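The halfway rule for real limits can be written as a short function. This is a minimal sketch: the score of 150 and the measurement unit of 1 are assumed values for illustration, and the function name real_limits is hypothetical.

```python
from fractions import Fraction

def real_limits(score, unit=1):
    """Return (lower, upper) real limits of a score measured to the given unit.

    Each limit lies exactly half a unit away from the score.
    """
    half = Fraction(unit) / 2
    score = Fraction(score)
    return score - half, score + half

lower, upper = real_limits(150)
print(float(lower), float(upper))  # 149.5 150.5

# The upper real limit of one score is the lower real limit of the next.
print(real_limits(150)[1] == real_limits(151)[0])  # True
```

Fractions are used instead of floats so that the limits stay exact even for units such as 0.1 (passed as the string "0.1").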
The terms continuous and discrete apply to the variables that are being measured
and not to the scores that are obtained from the measurement. For example, measuring
people’s heights to the nearest inch produces scores of 60, 61, 62, and so on. Although
the scores may appear to be discrete numbers, the underlying variable is continuous.
One key to determining whether a variable is continuous or discrete is that a continuous
variable can be divided into any number of fractional parts. Height can be measured to
the nearest inch, the nearest 0.5 inch, or the nearest 0.1 inch. Similarly, a professor
evaluating students’ knowledge could use a pass/fail system that classifies students into
two broad categories. However, the professor could choose to use a 10-point quiz that
divides student knowledge into 11 categories corresponding to quiz scores from 0 to 10.
Or the professor could use a 100-point exam that potentially divides student knowledge
into 101 categories from 0 to 100. Whenever you are free to choose the degree of
precision or the number of categories for measuring a variable, the variable must be
continuous.
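The idea that the same continuous variable can be recorded at different levels of precision can be sketched as follows. The height of 68.37 inches is a hypothetical value, and measure is an illustrative helper, not a standard function.

```python
from decimal import Decimal, ROUND_HALF_UP

def measure(true_value, unit):
    """Round a value to the nearest multiple of the measurement unit."""
    true_value, unit = Decimal(true_value), Decimal(unit)
    # Count how many whole units fit, rounding halves upward.
    steps = (true_value / unit).to_integral_value(rounding=ROUND_HALF_UP)
    return steps * unit

height = "68.37"  # hypothetical true height in inches
for unit in ("1", "0.5", "0.1"):
    print(f"nearest {unit} inch: {measure(height, unit)}")
```

The underlying height does not change. Only the recorded score changes with the chosen unit, which is why the scores can look discrete even though the variable is continuous.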
Data Structures, Research Methods, and Statistics
Descriptive Research
Correlational Method
Experimental Method
Nonexperimental methods also compare groups, but the researcher does not
have control over the groups. In experimental methods, the researcher can
determine who will be in the groups to be compared; in nonexperimental methods, the
groups are preexisting. Examples of studies using this method include comparing
8-year-old children and 10-year-old children, people with an eating disorder and
those with no disorder, and children from a single-parent home and
those from a two-parent home. Before-and-after studies are also examples, such as
comparing the stress levels of students before exams and after exams. The
researcher cannot control what happens to the students between the “before” and “after”
measurements. Changes in scores after the exams can be due to the exams or
to other factors over which the researcher has no control.
Scales of Measurement
Nominal Scale
Although the categories on a nominal scale are not quantitative values, they are
occasionally represented by numbers. For example, the rooms or offices in a building
may be identified by numbers. You should realize that the room numbers are simply
names and do not reflect any quantitative information. Room 109 is not necessarily
bigger than Room 100 and certainly not 9 points bigger. It also is common to use
numerical values as a code for nominal categories when data are entered into computer
programs. For example, the data from a survey may code males with a 0 and females
with a 1. Again, the numerical values are simply names and do not represent any
quantitative difference. The scales that follow do reflect an attempt to make quantitative
distinctions.
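A short sketch can show why numeric codes on a nominal scale are only names. The survey responses below are made up for illustration, and the 0/1 coding follows the example in the text.

```python
from collections import Counter
import statistics

labels = {0: "male", 1: "female"}
responses = [0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical coded survey data

# Counting how often each category occurs is meaningful for nominal data.
counts = Counter(labels[code] for code in responses)
print(counts["male"], counts["female"])  # 3 5

# Swapping the codes (1 = male, 0 = female) is an equally valid naming.
swapped_labels = {1: "male", 0: "female"}
swapped_responses = [1 - code for code in responses]
swapped_counts = Counter(swapped_labels[code] for code in swapped_responses)

print(counts == swapped_counts)            # True: the counts do not change
print(statistics.mean(responses))          # 0.625
print(statistics.mean(swapped_responses))  # 0.375
```

The frequency counts survive the relabeling, but the "average code" changes with the arbitrary choice of numbers, so it carries no quantitative information about the categories.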
Ordinal Scale
The categories that make up an ordinal scale not only have different names (as
in a nominal scale) but also are organized in a fixed order corresponding to differences
of magnitude. An ordinal scale consists of a set of categories that are organized in an
ordered sequence. Measurements on an ordinal scale rank observations in terms of size
or magnitude. Ordinal scales imply ranks and order, like 1st, 2nd, and 3rd or small, medium,
and large. Since there is ranking or order, there is a directional relationship between the
categories, typically smallest to largest or largest to smallest.
With measurements from an ordinal scale, you can determine whether two
individuals are different, and you can determine the direction of difference. However,
ordinal measurements do not allow you to determine the size of the difference between
two individuals. In a NASCAR race, for example, the first-place car finished faster than
the second-place car, but the ranks do not tell you how much faster. Other examples
of ordinal scales include socioeconomic class (upper, middle, lower) and T-shirt sizes
(small, medium, large). In addition, ordinal scales are often used to measure variables
for which it is difficult to assign numerical scores. For example, people can rank their
food preferences but might have trouble explaining “how much” they prefer chocolate
ice cream to steak.
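The race example can be sketched in code to show what ranks keep and what they lose. The finishing times are hypothetical values chosen for illustration.

```python
# Hypothetical finishing times in seconds.
finish_times = {"Car A": 3600.0, "Car B": 3600.4, "Car C": 3655.2}

# Rank the cars from fastest (1st) to slowest.
ordered = sorted(finish_times, key=finish_times.get)
ranks = {car: place for place, car in enumerate(ordered, start=1)}

# Compare each car with the one that finished right behind it.
for ahead, behind in zip(ordered, ordered[1:]):
    rank_gap = ranks[behind] - ranks[ahead]
    time_gap = finish_times[behind] - finish_times[ahead]
    print(f"{ahead} -> {behind}: rank gap {rank_gap}, time gap {time_gap:.1f} s")
```

Both rank gaps equal 1, yet the time gaps are very different. The ranks give the direction of the difference but not its size.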
Interval and Ratio Scales
Both an interval scale and a ratio scale consist of a series of ordered categories
(like an ordinal scale) with the additional requirement that the categories form a series
of intervals that are all the same size. Thus, the scale of measurement consists of a
series of equal intervals, such as inches on a ruler. Other examples of interval and ratio
scales are the measurement of time in seconds, weight in pounds, and temperature in
degrees Fahrenheit. Note that, in each case, one interval (1 inch, 1 second, 1 pound, 1
degree) is the same size, no matter where it is located on the scale. The fact that the
intervals are all the same size makes it possible to determine both the size and the
direction of the difference between two measurements. For example, you know that a
measurement of 80° Fahrenheit is higher than a measure of 60°, and you know that it
is exactly 20° higher.
An interval scale consists of ordered categories that are all intervals of the same
size. Equal differences between numbers on the scale reflect equal differences in magnitude.
However, the zero point on an interval scale is arbitrary and does not indicate a zero
amount of the variable being measured. A ratio scale is an interval scale with the
additional feature of an absolute zero point. With a ratio scale, ratios of numbers do
reflect ratios of magnitude.
The factor that differentiates an interval scale from a ratio scale is the nature of
the zero point. An interval scale has an arbitrary zero point. That is, the value 0 is
assigned to a particular location on the scale simply as a matter of convenience or
reference. A value of zero does not indicate a total absence of the variable being
measured. For example, a temperature of 0° Fahrenheit does not mean that there is no
temperature, and it does not prohibit the temperature from going even lower. Interval
scales with an arbitrary zero point are relatively rare. The two most common examples
are the Fahrenheit and Celsius temperature scales. Other examples include golf scores
(above and below par) and relative measures such as above and below average rainfall.
A ratio scale is anchored by a zero point that is not arbitrary but rather is a
meaningful value representing none (a complete absence) of the variable being measured.
The existence of an absolute, non-arbitrary zero point means that we can measure the
absolute amount of the variable; that is, we can measure the distance from 0. This
makes it possible to compare measurements in terms of ratios. For example, a gas tank
with 10 gallons (10 more than 0) has twice as much gas as a tank with only 5 gallons
(5 more than 0). Also note that a completely empty tank has 0 gallons. To recap, with
a ratio scale, we can measure the direction and the size of the difference between two
measurements and we can describe the difference in terms of a ratio. Ratio scales are
quite common and include physical measures such as height and weight, as well as
variables such as reaction time or the number of errors on a test.
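One way to check whether ratios are meaningful is to change the unit of measurement and see whether the ratio survives. The sketch below assumes two illustrative temperatures (80° and 40° Fahrenheit) and reuses the 10-gallon and 5-gallon tanks from the text; the function names are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

LITERS_PER_US_GALLON = 3.785411784

def gallons_to_liters(g):
    """Convert US liquid gallons to liters."""
    return g * LITERS_PER_US_GALLON

# Interval scale: the ratio depends on where the arbitrary zero sits.
hot_f, cool_f = 80, 40
ratio_f = hot_f / cool_f
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cool_f)
print(f"Fahrenheit ratio: {ratio_f:.2f}")  # 2.00
print(f"Celsius ratio:    {ratio_c:.2f}")  # 6.00

# Ratio scale: zero means "none" in both units, so the ratio is preserved.
ratio_gal = 10 / 5
ratio_l = gallons_to_liters(10) / gallons_to_liters(5)
print(f"Gallons ratio: {ratio_gal:.2f}")   # 2.00
print(f"Liters ratio:  {ratio_l:.2f}")     # 2.00
```

The same two temperatures give a ratio of 2 in Fahrenheit and 6 in Celsius, so "twice as hot" is not a meaningful statement on an interval scale. The gas tanks keep their 2-to-1 ratio in any unit because both units share the same absolute zero.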