Lesson+1+Introduction+to+Statistics

Uploaded by

Mamamie
Copyright
© All Rights Reserved
Lesson 1: Introduction to Statistics

Objectives:

1. Define relevant terms and describe the relationship between them
2. Identify and differentiate the branches of statistics
3. Explore the concept of sampling error
4. Identify and differentiate the scales of measurement and other types of variables

Definition of Statistics

Statistics is generally known as a field of mathematics. It is associated with
numbers and figures, and the term “statistics” is often used as a shortened version of
“statistical procedures.” Statistics is therefore defined as a set of mathematical
procedures used for organizing, summarizing, and interpreting information. In the
behavioral sciences, statistics is used to assess and evaluate research results, specifically to:

1. Organize and summarize the information so that the researcher can see what
happened in the research study and can communicate results to others.

2. Help the researcher to answer the questions that initiated the research by
determining exactly what general conclusions are justified based on the specific
results that were obtained.

Example 1: You want to know whether students learn better through printed materials or
online materials. You must gather information such as their grades or activity scores. Once
you have this information, you use statistics to make sense of it. Statistics can help
you decide which set of grades (with printed or with online materials) is better. Statistics
is a universal set of techniques understood by the scientific community: a technique that
one researcher uses is known to, and also used by, others.

Population and Samples

In Example 1, before you can start your study, you must first define which students
will be included. Are they elementary, high school, or college students? Will you include
only 1st years, grade 8, or kinder students? Once you have determined who you will be
studying, you have your population. A population is the set of all individuals of interest
in a particular study. Typically, populations are large because they include all target
individuals. If, for Example 1, you decide to study all 1st year students and find out
that there are 1000 freshmen this semester, it will be hard to include all of them in
your study. Because of this, a sample is taken. A sample is a set of individuals selected
from a population. The sample is a smaller, more manageable group from the population.
This sample should be representative of its population.

Example 2: You have decided to study freshmen students. Since there are 1000 of them,
you take a sample of only 286 individuals, which is more manageable. The freshmen
students are divided into 3 colleges: CAS, CBA, and CEA. In your sample, all 3 colleges
should also be present to make your sample representative of your population. Your
population consists only of freshmen who enrolled during the year 2020-2021, so your
sample should also consist only of students who enrolled during the year 2020-2021.
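
The idea in Example 2 can be sketched in Python. This is only an illustration under stated assumptions: the lesson gives the total of 1000 freshmen but not how many belong to each college, so the per-college counts below are hypothetical, and drawing from each college in proportion to its size is one possible way to keep all 3 colleges present in the sample.

```python
import random

# Hypothetical enrollment per college; the lesson only states the total (1000).
population = (
    [("CAS", i) for i in range(400)]
    + [("CBA", i) for i in range(350)]
    + [("CEA", i) for i in range(250)]
)

def stratified_sample(pop, n, seed=0):
    """Draw about n individuals, taking a proportional share from each college."""
    rng = random.Random(seed)
    strata = {}
    for college, student_id in pop:
        strata.setdefault(college, []).append((college, student_id))
    sample = []
    for members in strata.values():
        # Share of the sample proportional to the college's share of the population
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample

sample = stratified_sample(population, 286)
colleges_in_sample = {college for college, _ in sample}
print(len(sample), sorted(colleges_in_sample))
```

Because every college contributes to the sample, no college in the population is left unrepresented.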

Information from the sample is analyzed through statistics. The results will be
used to answer your initial questions from research like in example 1. This highlights the
relationship between a sample and a population. The goal is to generalize the results
from the sample back to the population. This relationship can be seen in the figure
below.

Variables and Data

Typically, researchers are interested in specific characteristics of the individuals in
the population (or in the sample), or they are interested in outside factors that may
influence the individuals. For example, a researcher may be interested in the influence
of the weather on people’s moods. As the weather changes, do people’s moods also
change? Something that can change or have different values is called a variable. A
variable is a characteristic or condition that changes or has different values for different
individuals.

Once again, variables can be characteristics that differ from one individual to
another, such as height, weight, gender, or personality. Also, variables can be
environmental conditions that change such as temperature, time of day, or the size of
the room in which the research is being conducted.

To demonstrate changes in variables, it is necessary to make measurements of
the variables being examined. The measurement obtained for each individual is called a
datum or, more commonly, a score or raw score. The complete set of scores is called the
data set or simply the data. Data (plural) are measurements or observations. A data set
is a collection of measurements or observations. A datum (singular) is a single
measurement or observation and is commonly called a score or raw score.

When we defined populations and samples, we talked about individuals. However, the
terms can also refer to populations or samples of scores. In research, you measure each
individual to obtain a score. Therefore, measuring the individuals in your sample produces
a sample of scores.

Parameters and Statistics

When describing data, it is necessary to distinguish whether the data come from
a population or a sample. A characteristic that describes a population—for example, the
average score for the population—is called a parameter. A characteristic that describes
a sample is called a statistic. Thus, the average score for a sample is an example of a
statistic. Typically, the research process begins with a question about a population
parameter. However, the actual data come from a sample and are used to compute
sample statistics.

A parameter is a value, usually a numerical value, that describes a population. A
parameter is usually derived from measurements of the individuals in the population. A
statistic is a value, usually a numerical value, that describes a sample. A statistic is
usually derived from measurements of the individuals in the sample.

Descriptive and Inferential Statistics

The two main branches of statistics are descriptive and inferential statistics.
Descriptive statistics are statistical procedures used to summarize, organize, and simplify
data. The data are commonly presented in a table or graph. Descriptive statistics includes
the following:

• Measures of central tendency (mean, median, mode)
• Measures of variability (range, standard deviation, variance)
• Measures of relative position (percentile, quartile)
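
As a rough illustration of the measures listed above, the sketch below uses Python's standard `statistics` module. The scores are hypothetical, and they are treated as a sample, so `variance` and `stdev` use the sample formulas (`pvariance` and `pstdev` would be the choices for a whole population).

```python
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]  # hypothetical quiz scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability
score_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance
std_dev = statistics.stdev(scores)      # sample standard deviation

# Measures of relative position: the three cut points that split the data into quartiles
quartiles = statistics.quantiles(scores, n=4)

print(mean, median, mode, score_range, variance, std_dev, quartiles)
```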

Inferential statistics consists of techniques that allow us to study samples and
then generalize about the populations from which they were selected. This includes:

• Hypothesis testing
• ANOVA
• Regression analysis
• Chi-square

Because populations are typically very large, it usually is not possible to measure
everyone in the population. Therefore, a sample is selected to represent the population.
By analyzing the results from the sample, we hope to make general statements about
the population. Typically, researchers use sample statistics as the basis for drawing
conclusions about population parameters. One problem with using samples, however, is
that a sample provides only limited information about the population. Although samples
are generally representative of their populations, a sample is not expected to give a
perfectly accurate picture of the whole population. There usually is some discrepancy
between a sample statistic and the corresponding population parameter. This discrepancy
is called sampling error, and it creates the fundamental problem inferential statistics
must always address. A sampling error is the naturally occurring discrepancy, or error,
that exists between a sample statistic and the corresponding population parameter. An
example of how sampling error occurs can be seen below.
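
Sampling error can also be shown with a small simulation. This is a sketch with made-up numbers, not data from the lesson: a hypothetical population of 1000 scores is generated, its mean is treated as the parameter, and the mean of each random sample of 25 is the statistic.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 1000 scores (values are illustrative only)
population = [rng.gauss(75, 10) for _ in range(1000)]
population_mean = statistics.mean(population)  # the parameter

def sampling_error(sample_size):
    """Return (sample statistic - population parameter) for one random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - population_mean

# Five different samples give five different discrepancies
errors = [sampling_error(25) for _ in range(5)]
for error in errors:
    print(round(error, 2))
```

Each sample is drawn from the same population, yet each sample mean misses the population mean by a different amount. Only a "sample" that includes the whole population has no sampling error.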

Variables and Measurement

Some variables, such as height, weight, and eye color are well-defined, concrete
entities that can be observed and measured directly. On the other hand, many variables
studied by behavioral scientists are internal characteristics that people use to help
describe and explain behavior. For example, we say that a student does well in school
because he or she is intelligent. Or we say that someone is anxious in social situations,
or that someone seems to be hungry. Variables like intelligence, anxiety, and hunger are
called constructs, and because they are intangible and cannot be directly observed, they
are often called hypothetical constructs.

Although constructs such as intelligence are internal characteristics that cannot
be directly observed, it is possible to observe and measure behaviors that are
representative of the construct. For example, we cannot “see” intelligence, but we can
see examples of intelligent behavior. The external behaviors can then be used to create
an operational definition for the construct. An operational definition defines a construct
in terms of external behaviors that can be observed and measured. For example, your
intelligence is measured and defined by your performance on an IQ test, or hunger can
be measured and defined by the number of hours since last eating.

Constructs are internal attributes or characteristics that cannot be directly
observed but are useful for describing and explaining behavior. An operational definition
identifies a measurement procedure (a set of operations) for measuring an external
behavior and uses the resulting measurements as a definition and a measurement of a
hypothetical construct. Note that an operational definition has two components. First, it
describes a set of operations for measuring a construct. Second, it defines the construct
in terms of the resulting measurements.

Discrete and Continuous Variables

A discrete variable consists of separate, indivisible categories. No values can exist
between two neighboring categories. Discrete variables are commonly restricted to whole,
countable numbers, for example, the number of children in a family or the number of
students attending class. If you observe class attendance from day to day, you may
count 18 students one day and 19 students the next day. However, it is impossible ever
to observe a value between 18 and 19. A discrete variable may also consist of
observations that differ qualitatively. For example, people can be classified by gender
(male or female) or by occupation (nurse, teacher, lawyer, etc.), and college students can
be classified by academic major (art, biology, chemistry, etc.). In each case, the variable
is discrete because it consists of separate, indivisible categories.

On the other hand, many variables are not discrete. Variables such as time, height,
and weight are not limited to a fixed set of separate, indivisible categories. You can
measure time, for example, in hours, minutes, seconds, or fractions of seconds. These
variables are called continuous because they can be divided into an infinite number of
fractional parts. For a continuous variable, there are an infinite number of possible values
that fall between any two observed values. A continuous variable is divisible into an
infinite number of fractional parts.

Suppose, for example, that a researcher is measuring weights for a group of
individuals participating in a diet study. Because weight is a continuous variable, it can
be pictured as a continuous line. Note that there are an infinite number of possible
points on the line without any gaps or separations between neighboring points. For any
two different points on the line, it is always possible to find a third value that is between
the two points. Two other factors apply to continuous variables:

1. When measuring a continuous variable, it should be very rare to obtain
identical measurements for two different individuals. Because a continuous
variable has an infinite number of possible values, it should be almost
impossible for two people to have the same score. If the data show a
substantial number of tied scores, then you should suspect that the
measurement procedure is very crude or that the variable is not continuous.
2. When measuring a continuous variable, each measurement category is an
interval that must be defined by boundaries. For example, two people who
both claim to weigh 150 pounds are probably not the same weight. However,
they are both around 150 pounds. One person may weigh 149.6 and the
other 150.3. Thus, a score of 150 is not a specific point on the scale but
instead is an interval. To differentiate a score of 150 from a score of 149
or 151, we must set up boundaries on the scale of measurement. These
boundaries are called real limits and are positioned exactly halfway between
adjacent scores. Thus, a score of X = 150 pounds is an interval bounded
by a lower real limit of 149.5 at the bottom and an upper real limit of
150.5 at the top. Any individual whose weight falls between these real limits
will be assigned a score of X = 150.

Real limits are the boundaries of intervals for scores that are represented on a
continuous number line. The real limit separating two adjacent scores is located exactly
halfway between the scores. Each score has two real limits. The upper real limit is at
the top of the interval, and the lower real limit is at the bottom. The example above
can be summarized by the figure below.
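
The real limits described above can be computed directly. The sketch below is illustrative; the function names are hypothetical, and the choice to count a measurement that falls exactly on a boundary as part of the higher interval is an assumption, since the lesson does not say how such a case is handled.

```python
def real_limits(score, unit=1.0):
    """Return (lower, upper) real limits for a score measured to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

def falls_within(measurement, score, unit=1.0):
    """Check whether an exact measurement would be recorded as `score`."""
    lower, upper = real_limits(score, unit)
    # Assumption: a value exactly on the upper limit belongs to the next score
    return lower <= measurement < upper

lower, upper = real_limits(150)
print(lower, upper)              # 149.5 150.5
print(falls_within(149.6, 150))  # True
print(falls_within(150.3, 150))  # True
print(falls_within(150.7, 150))  # False
```

Both 149.6 and 150.3 pounds fall inside the interval for X = 150, which is why two people recorded at 150 pounds are probably not exactly the same weight.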

The terms continuous and discrete apply to the variables that are being measured
and not to the scores that are obtained from the measurement. For example, measuring
people’s heights to the nearest inch produces scores of 60, 61, 62, and so on. Although
the scores may appear to be discrete numbers, the underlying variable is continuous.
One key to determining whether a variable is continuous or discrete is that a continuous
variable can be divided into any number of fractional parts. Height can be measured to
the nearest inch, the nearest 0.5 inch, or the nearest 0.1 inch. Similarly, a professor
evaluating students’ knowledge could use a pass/fail system that classifies students into
two broad categories. However, the professor could choose to use a 10-point quiz that
divides student knowledge into 11 categories corresponding to quiz scores from 0 to 10.
Or the professor could use a 100-point exam that potentially divides student knowledge
into 101 categories from 0 to 100. Whenever you are free to choose the degree of
precision or the number of categories for measuring a variable, the variable must be
continuous.

Data Structures, Research Methods, and Statistics

Individual (One) Variable

Descriptive Research

Descriptive research is conducted to describe individual variables as they
exist naturally. For example, a college official may conduct a survey to describe
the eating, sleeping, and study habits of a group of college students. The results
consist of numerical scores, such as the number of hours spent studying each
day. As another example, a recent newspaper article reported that 34.9% of Americans
are obese, which is roughly 35 pounds over a healthy weight.

Two or More Variables

Correlational Method

The correlational method examines the relationship between two variables
as they exist naturally for a set of individuals. An example is measuring the
relationship between wake-up time and academic performance. The researchers
record the wake-up time of each individual and check each individual's grades.
They then look for patterns across the times and grades. If there is a consistent
pattern, the two variables are related.
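
One common way to quantify such a pattern is the Pearson correlation coefficient. The lesson does not name a specific statistic, so this choice is an assumption, and the wake-up times and grades below are hypothetical.

```python
import math

# Hypothetical data: one wake-up time (hour of day) and one grade per student
wake_up_hour = [5.5, 6.0, 6.5, 7.0, 8.0, 9.0]
grade = [92, 88, 85, 84, 78, 70]

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of deviations over sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

r = pearson_r(wake_up_hour, grade)
print(round(r, 3))
```

A value near -1 or +1 indicates a consistent pattern, and a value near 0 indicates little or no linear pattern. In these made-up data, later wake-up times go with lower grades, so r is strongly negative. A correlation describes a relationship only; by itself it does not show that one variable causes the other.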

Experimental Method

The experimental method compares groups of scores. In the experimental
method, one variable is manipulated while another variable is observed and
measured. Its goal is to demonstrate a cause-and-effect relationship between the
two variables, that is, to show that changing the value of one variable causes a
change in the second variable. To establish this relationship, an experiment
attempts to control all other variables to prevent them from influencing the
results. To accomplish this goal, the experimental method has two characteristics
that differentiate experiments from other types of research studies:

1. Manipulation: The researcher manipulates one variable by changing its value
from one level to another.
Example 1: In examining the effect of violence in video games, the
researchers manipulate the amount of violence by giving one group of boys
a violent game to play and giving the other group a nonviolent game. A
second variable is observed (measured) to determine whether the
manipulation causes changes to occur.
2. Control: The researcher must exercise control over the research situation
to ensure that other, extraneous variables do not influence the relationship
being examined.

In Example 1, to be able to say that the difference in aggressive behavior
is caused by the amount of violence in the game, the researcher must rule out
any other possible explanation for the difference. That is, any other variables that
might affect aggressive behavior must be controlled. There are two general
categories of variables that researchers must consider:

1. Participant Variables: These are characteristics such as age, gender, and
intelligence that vary from one individual to another. Whenever an
experiment compares different groups of participants (one group in
treatment A and a different group in treatment B), researchers must ensure
that participant variables do not differ from one group to another. In
Example 1, the researchers would like to conclude that the violence in the
video game causes a change in the participants’ aggressive behavior. In the
study, the participants in both conditions were 10-year-old boys. If, for
example, the 1st group were all boys and the 2nd group were all girls, the
results would be confounded, because the researchers could not be sure
whether the differences are due to the amount of violence or to gender.

2. Environmental Variables: These are characteristics of the environment such
as lighting, time of day, and weather conditions. A researcher must ensure
that the individuals in treatment A are tested in the same environment as
the individuals in treatment B. In example 1, suppose that the individuals
in the nonviolent condition were all tested in the morning and the individuals
in the violent condition were all tested in the evening. Again, this would
produce a confounded experiment because the researcher could not
determine whether the differences in aggressive behavior were caused by
the amount of violence or caused by the time of day.

There are two main variables in the experimental method. The variable that is
manipulated by the experimenter is called the independent variable. It can be
identified as the treatment conditions to which participants are assigned. In
Example 1, the amount of violence in the video game is the independent variable.
The variable that is observed and measured to obtain scores within each condition
is the dependent variable. In Example 1, the level of aggressive behavior is the
dependent variable. The independent variable is the variable that is manipulated
by the researcher. In behavioral research, the independent variable usually consists
of the two (or more) treatment conditions to which subjects are exposed. The
dependent variable is the one that is observed to assess the effect of the
treatment.

Nonexperimental Method

Nonexperimental methods also compare groups, but the researcher does not
have control over the groups. In the experimental method, the researcher
determines who will be in each group to be compared; in nonexperimental methods,
the groups already exist. Examples of studies using this method include comparing
8-year-old children and 10-year-old children, comparing people with an eating
disorder and those with no disorder, and comparing children from a single-parent
home and those from a two-parent home. Before-and-after studies are also examples,
such as comparing the stress level of students before and after exams. The
researcher cannot control what happens between the “before” and “after”
measurements. Changes in scores after the exams can be due to the exams or to
other factors over which the researcher has no control.

Scales of Measurement

Data collection requires that we make measurements of our observations.
Measurement involves assigning individuals or events to categories. The categories can
simply be names such as male/female or employed/unemployed, or they can be
numerical values such as 68 inches or 175 pounds. The categories used to measure a
variable make up a scale of measurement, and the relationships between the categories
determine different types of scales.

Scales of measurement are important because they highlight the limitations of
each scale and because they help determine the statistics that are used to evaluate the
data. Specifically, there are certain statistical procedures that are used with numerical
scores from interval or ratio scales and other statistical procedures that are used with
nonnumerical scores from nominal or ordinal scales.

Nominal Scale

The word nominal means “having to do with names.” Measurement on a nominal
scale involves classifying individuals into categories that have different names but are
not related to each other in any systematic way. For example, if you were measuring
the academic majors for a group of college students, the categories would be art,
biology, business, chemistry, and so on. Each student would be classified in one category
according to his or her major. The measurements from a nominal scale allow us to
determine whether two individuals are different, but they do not identify either the
direction or the size of the difference. If one student is an art major and another is a
biology major, we can say that they are different, but we cannot say that art is “more
than” or “less than” biology and we cannot specify how much difference there is between
art and biology. Other examples of nominal scales include classifying people by race,
gender, or occupation.

A nominal scale consists of a set of categories that have different names.
Measurements on a nominal scale label and categorize observations, but do not make
any quantitative distinctions between observations.

Although the categories on a nominal scale are not quantitative values, they are
occasionally represented by numbers. For example, the rooms or offices in a building
may be identified by numbers. You should realize that the room numbers are simply
names and do not reflect any quantitative information. Room 109 is not necessarily
bigger than Room 100 and certainly not 9 points bigger. It also is common to use
numerical values as a code for nominal categories when data are entered into computer
programs. For example, the data from a survey may code males with a 0 and females
with a 1. Again, the numerical values are simply names and do not represent any
quantitative difference. The scales that follow do reflect an attempt to make quantitative
distinctions.
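
The survey coding described above can be sketched as follows. The responses are hypothetical; the point is only that the codes act as names, so counting how often each category occurs is meaningful while arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical survey responses coded as in the text: 0 = male, 1 = female
codes = [0, 1, 1, 0, 1]
labels = {0: "male", 1: "female"}

# Counting category membership is a valid summary for nominal data
counts = Counter(labels[code] for code in codes)
print(counts["male"], counts["female"])  # 2 3

# Swapping the codes (1 = male, 0 = female) would change any average of the
# codes but would leave the category counts above unchanged.
```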

Ordinal Scale

The categories that make up an ordinal scale not only have different names (as
in a nominal scale) but also are organized in a fixed order corresponding to differences
of magnitude. An ordinal scale consists of a set of categories that are organized in an
ordered sequence. Measurements on an ordinal scale rank observations in terms of size
or magnitude. Ordinal scales imply ranks and order, such as 1st, 2nd, and 3rd or small,
medium, and large. Because there is ranking or order, there is a directional relationship
between the categories, typically smallest to largest or largest to smallest.

With measurements from an ordinal scale, you can determine whether two
individuals are different, and you can determine the direction of difference. However,
ordinal measurements do not allow you to determine the size of the difference between
two individuals. In a NASCAR race, for example, the first-place car finished faster than
the second-place car, but the ranks do not tell you how much faster. Other examples
of ordinal scales include socioeconomic class (upper, middle, lower) and T-shirt sizes
(small, medium, large). In addition, ordinal scales are often used to measure variables
for which it is difficult to assign numerical scores. For example, people can rank their
food preferences but might have trouble explaining “how much” they prefer chocolate
ice cream to steak.
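
The race example can be illustrated with a short sketch. The car names and finish times are hypothetical; the code converts the times into ranks to show what information an ordinal scale keeps and what it discards.

```python
# Hypothetical finish times in seconds
finish_times = {"car_a": 3600.2, "car_b": 3600.9, "car_c": 3725.0}

# Ordinal measurement: order the cars from fastest to slowest and assign ranks
ranking = sorted(finish_times, key=finish_times.get)
ranks = {car: position + 1 for position, car in enumerate(ranking)}
print(ranks)  # {'car_a': 1, 'car_b': 2, 'car_c': 3}

# The ranks are equally spaced, but the underlying gaps are not
gap_first_second = finish_times["car_b"] - finish_times["car_a"]
gap_second_third = finish_times["car_c"] - finish_times["car_b"]
print(round(gap_first_second, 1), round(gap_second_third, 1))
```

From the ranks alone, there is no way to tell that the first two cars finished less than a second apart while the third car was about two minutes behind.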

Interval and Ratio Scale

Both an interval scale and a ratio scale consist of a series of ordered categories
(like an ordinal scale) with the additional requirement that the categories form a series
of intervals that are all the same size. Thus, the scale of measurement consists of a
series of equal intervals, such as inches on a ruler. Other examples of interval and ratio
scales are the measurement of time in seconds, weight in pounds, and temperature in
degrees Fahrenheit. Note that, in each case, one interval (1 inch, 1 second, 1 pound, 1
degree) is the same size, no matter where it is located on the scale. The fact that the
intervals are all the same size makes it possible to determine both the size and the
direction of the difference between two measurements. For example, you know that a
measurement of 80° Fahrenheit is higher than a measure of 60°, and you know that it
is exactly 20° higher.

An interval scale consists of ordered categories that are all intervals of the same
size. Equal differences between numbers on the scale reflect equal differences in magnitude.
However, the zero point on an interval scale is arbitrary and does not indicate a zero
amount of the variable being measured. A ratio scale is an interval scale with the
additional feature of an absolute zero point. With a ratio scale, ratios of numbers do
reflect ratios of magnitude.

The factor that differentiates an interval scale from a ratio scale is the nature of
the zero point. An interval scale has an arbitrary zero point. That is, the value 0 is
assigned to a particular location on the scale simply as a matter of convenience or
reference. A value of zero does not indicate a total absence of the variable being
measured. For example, a temperature of 0° Fahrenheit does not mean that there is no
temperature, and it does not prohibit the temperature from going even lower. Interval
scales with an arbitrary zero point are relatively rare. The two most common examples
are the Fahrenheit and Celsius temperature scales. Other examples include golf scores
(above and below par) and relative measures such as above and below average rainfall.

A ratio scale is anchored by a zero point that is not arbitrary but rather is a
meaningful value representing none (a complete absence) of the variable being measured.
The existence of an absolute, non-arbitrary zero point means that we can measure the
absolute amount of the variable; that is, we can measure the distance from 0. This
makes it possible to compare measurements in terms of ratios. For example, a gas tank
with 10 gallons (10 more than 0) has twice as much gas as a tank with only 5 gallons
(5 more than 0). Also note that a completely empty tank has 0 gallons. To recap, with
a ratio scale, we can measure the direction and the size of the difference between two
measurements and we can describe the difference in terms of a ratio. Ratio scales are
quite common and include physical measures such as height and weight, as well as
variables such as reaction time or the number of errors on a test.
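
The difference between the two scales can be checked with a unit conversion. The sketch below is only an illustration: the temperatures are chosen for convenience, and the conversion to liters is added here as an example of changing units on a ratio scale.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def gallons_to_liters(g):
    return g * 3.785411784  # liters per US gallon

# Interval scale: the ratio depends on where the arbitrary zero point sits
ratio_f = 80 / 40                                                 # 2.0
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)   # about 6.0

# Ratio scale: the ratio is the same in any unit because 0 means "none"
ratio_gal = 10 / 5                                                # 2.0
ratio_l = gallons_to_liters(10) / gallons_to_liters(5)            # 2.0

print(ratio_f, round(ratio_c, 2), ratio_gal, round(ratio_l, 2))
```

The temperature "ratio" changes from 2 to about 6 when the same two temperatures are expressed in Celsius, so saying that 80° is twice as hot as 40° has no fixed meaning. The gas tank ratio stays at 2 in either unit.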
