
Educ 502- Educational Measurement

and Statistics

I. Objectives
1. Demonstrate knowledge of statistical terms.
2. Differentiate between the two branches of statistics.
3. Identify types of data.
4. Identify the measurement level for each variable.
5. Identify the four basic sampling techniques.
6. Explain the difference between an observational and an experimental study.

What is statistics?
Statistics is the science of collecting, organizing, summarizing, and analyzing information to
draw conclusions or answer questions. In addition, statistics is about providing a measure of
confidence in any conclusion.

The science of collecting, describing, and interpreting data (Johnson & Kuby, 2007).
The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions
from the data (Bluman, 2008).
The art and science of collecting, analyzing, and interpreting data (Cochran, Anderson, et al., 2013).
Students study statistics for several reasons:
They must read and understand the various statistical studies performed in their fields.
They may need to conduct research in their fields, since statistics is basic to research.

They can use the knowledge gained from studying statistics to become better consumers and citizens.
Descriptive and Inferential Statistics
Data can be used in different ways. The body of knowledge called statistics is sometimes divided
into two main areas, depending on how data are used. The two areas are
1. Descriptive statistics
2. Inferential statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of
data.
Inferential statistics consists of generalizing from samples to populations, performing estimations
and hypothesis tests, determining relationships among variables, and making predictions.
Here, the statistician tries to make inferences from samples to populations. Inferential statistics
uses probability, i.e., the chance of an event occurring. You may be familiar with the concepts of
probability through various forms of gambling.
A population consists of all subjects (human or otherwise) that are being studied.
Most of the time, due to the expense, time, size of population, medical concerns, etc., it is not
possible to use the entire population for a statistical study; therefore, researchers use samples
A sample is a group of subjects selected from a population.
An area of inferential statistics called hypothesis testing is a decision-making process for
evaluating claims about a population, based on information obtained from samples.
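The two branches can be illustrated with a short Python sketch. The exam scores and the use of 1.96 for a rough 95% interval are my own illustrative assumptions, not taken from the text:

```python
import statistics

# Hypothetical sample of 10 exam scores drawn from a class (the population).
sample = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Descriptive statistics: organize and summarize the sample itself.
mean = statistics.mean(sample)      # 80
median = statistics.median(sample)  # 79.5
stdev = statistics.stdev(sample)    # sample standard deviation

# Inferential statistics: use the sample to say something about the population,
# e.g. a rough 95% confidence interval for the population mean.
n = len(sample)
margin = 1.96 * stdev / n ** 0.5
interval = (mean - margin, mean + margin)
print(mean, median, interval)
```

The first three lines of computation only describe the data in hand; the interval generalizes from the sample to the population, which is what makes it inferential.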

Qualitative variables are variables that can be placed into distinct categories according to some
characteristic or attribute. For example, if subjects are classified according to sex, then the
variable sex is qualitative. Other examples of qualitative variables are political affiliation, religious
preference, and geographic location.
Quantitative variables are numerical and can be ordered or ranked. For example, the variable age is
quantitative, and people can be ranked in order according to their age. Other examples of quantitative
variables are height, weight, and body temperature.
Discrete variables assume values that can be counted, for example, the number of children in a family
or the number of students in an online statistics class.
Continuous variables can assume an infinite number of values between any two specific values.
They are obtained by measuring, and they often include fractions and decimals.

The measurement levels of variables are nominal, ordinal, interval, and ratio. The nominal level of
measurement classifies data into mutually exclusive, exhaustive categories in which no order or
ranking can be imposed on the data.

The ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist.

The interval level of measurement ranks data, and precise differences between units of measure do
exist; however, there is no meaningful zero.

The ratio level of measurement possesses all the characteristics of interval measurement, and there
exists a true zero. In addition, true ratios exist when the same variable is measured on two different
members of the population.
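As a quick self-check, the four levels can be encoded in a small Python lookup. The classifications shown are my own illustrative answers for a few of the variables in the list that follows, not part of the original text:

```python
# Illustrative mapping of a few variables to their measurement levels.
levels = {
    "zip code": "nominal",
    "gender": "nominal",
    "letter grade": "ordinal",
    "judging place": "ordinal",
    "temperature (Celsius)": "interval",  # meaningful differences, no true zero
    "IQ": "interval",
    "height": "ratio",                    # a true zero exists
    "salary": "ratio",
}

def describe(variable):
    """Return the measurement level, or 'unknown' if not classified."""
    return levels.get(variable, "unknown")

print(describe("height"))  # ratio
```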
Classify the level of measurement of the following:
Zip code                          Gender
Grade (A, B, C, D, F)             Eye color
Judging (first, second, third)    Rating scale (poor, good, excellent)
SAT score                         Political affiliation
IQ                                Major field (Math, Science)
Temperature                       Ranking of tennis players
Height                            Nationality
Weight                            Age
Time                              Religious affiliation
Salary                            Score in the test

Exercises:
Classify each as nominal-level, ordinal-level, interval level, or ratio-level measurement.
a. Pages in the 25 best-selling mystery novels.
b. Rankings of golfers in a tournament.
c. Temperatures inside 10 pizza ovens.
d. Weights of selected cell phones.
e. Salaries of the coaches in the NFL.
f. Times required to complete a chess game.
g. Ratings of textbooks (poor, fair, good, excellent).
h. Number of amps delivered by battery chargers.
i. Ages of children in a day care center.
j. Categories of magazines in a physician’s office (sports, women’s, health, men’s,
news).
2. In each of these statements, tell whether descriptive or inferential statistics have been
used.
a. By 2040 at least 3.5 billion people will run short of water (World Future Society).
b. Nine out of ten on-the-job fatalities are men (Source: USA TODAY Weekend).
c. Expenditures for the cable industry were $5.66 billion in 1996 (Source: USA TODAY).
d. The median household income for people aged 25–34 is $35,888 (Source: USA TODAY).
e. Allergy therapy makes bees go away (Source: Prevention).
f. Drinking decaffeinated coffee can raise cholesterol levels by 7% (Source: American
Heart Association).
g. The national average annual medicine expenditure per person is $1052 (Source: The
Greensburg Tribune Review).
h. Experts say that mortgage rates may soon hit bottom (Source: USA TODAY).
3. Classify each variable as qualitative or quantitative.
a. Marital status of nurses in a hospital.
b. Time it takes to run a marathon.
c. Weights of lobsters in a tank in a restaurant.
d. Colors of automobiles in a shopping center parking lot.
e. Ounces of ice cream in a large milkshake.
f. Capacity of the NFL football stadiums.
g. Ages of people living in a personal care home.
4. Classify each variable as discrete or continuous.
a. Number of pizzas sold by Pizza Express each day.
b. Relative humidity levels in operating rooms at local hospitals.
c. Number of bananas in a bunch at several local supermarkets.
d. Lifetimes (in hours) of 15 iPod batteries.
e. Weights of the backpacks of first graders on a school bus.
f. Number of students each day who make appointments with a math tutor at a
local college.
g. Blood pressures of runners in a marathon.

Random Sampling
Simple Random Sampling - Every subject in the population has an equal chance of being selected.
Systematic Sampling - The researcher obtains systematic samples by numbering each subject of
the population and selecting every kth subject. For example, suppose there are 2000 subjects in
the population and a sample of 50 subjects is needed. Since 2000 divided by 50 is 40, every
40th subject would be selected; however, the first subject (numbered between 1 and 40) is selected
at random. If 12 were the first subject, the sample would consist of the subjects whose
numbers are 12, 52, 92, and so on, until 50 subjects are obtained.
Stratified Sampling is done by dividing the population into groups (called strata) according to
some characteristic that is important to the study, then sampling from each group. Samples within
each group should be randomly selected.
Cluster Sampling - Here the population is divided into groups called clusters by some means, such
as geographic area or schools in a large district. The researcher then randomly selects some of
these clusters and uses all members of the selected clusters as the subjects of the sample.
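The four techniques can be sketched in Python. The population of 2000 numbered subjects follows the systematic-sampling example above; the strata and cluster definitions are hypothetical:

```python
import random

random.seed(42)  # reproducible illustration
population = list(range(1, 2001))  # 2000 numbered subjects
n = 50

# 1. Simple random sampling: every subject has an equal chance.
simple = random.sample(population, n)

# 2. Systematic sampling: k = 2000 / 50 = 40; random start in 1..40, then every 40th.
k = len(population) // n
start = random.randint(1, k)
systematic = list(range(start, len(population) + 1, k))

# 3. Stratified sampling: sample randomly within each predefined group (stratum).
strata = {"group_1": population[:1000], "group_2": population[1000:]}
stratified = [s for group in strata.values() for s in random.sample(group, n // 2)]

# 4. Cluster sampling: randomly pick whole clusters and keep all their members.
clusters = [population[i:i + 100] for i in range(0, 2000, 100)]  # 20 clusters
chosen = random.sample(clusters, 2)
cluster_sample = [s for c in chosen for s in c]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
```

Note that cluster sampling fixes which clusters are drawn, not the final sample size: here two clusters of 100 yield 200 subjects.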
Checking the comprehension
Read the following information about the transportation industry and answer the questions.
Transportation Safety The chart shows the number of job-related injuries for each of the
transportation industries for 1998.
Industry Number of injuries
Railroad 4520
Intercity bus 5100
Subway 6850
Trucking 7144
Airline 9950
1. What are the variables under study?
2. Categorize each variable as quantitative or qualitative.
3. Categorize each quantitative variable as discrete or continuous.
4. Identify the level of measurement for each variable.
5. The railroad is shown as the safest transportation industry. Does that mean
railroads have fewer accidents than the other industries? Explain.
6. What factors other than safety influence a person’s choice of transportation?
7. From the information given, comment on the relationship between the
variables
Observational and Experimental Studies
There are several different ways to classify statistical studies. This section
explains two types of studies: observational studies and experimental studies.
In an observational study, the researcher merely observes what is happening or what has
happened in the past and tries to draw conclusions based on these observations
In an experimental study, the researcher manipulates one of the variables and tries to
determine how the manipulation influences other variables.
Statistical studies usually include one or more independent variables and one dependent
variable. The independent variable in an experimental study is the one that is being manipulated by
the researcher. The independent variable is also called the explanatory variable. The resultant
variable is called the dependent variable or the outcome variable.
In an experimental study, the group that receives a specific treatment is called the treatment
group, while the group that does not is called the control group. For example, a treatment group
might receive special instruction for improvement while the control group does not.
A confounding variable is one that influences the dependent or outcome variable but
cannot be separated from the independent variable.

Example.
As the evidence on the adverse effects of cigarette smoke grew, people tried many different ways
to quit smoking. Some people tried chewing tobacco or, as it was called, smokeless tobacco. A
small amount of tobacco was placed between the cheek and gum. Certain chemicals from the
tobacco were absorbed into the bloodstream and gave the sensation of smoking cigarettes. This
prompted studies on the adverse effects of smokeless tobacco. One study in particular used 40
university students as subjects. Twenty were given smokeless tobacco to chew, and twenty were given
a substance that looked and tasted like smokeless tobacco but did not contain any of the harmful
substances. The students were randomly assigned to one of the groups. The students’ blood pressure
and heart rate were measured before they started chewing and 20 minutes after they had been
chewing. A significant increase in heart rate occurred in the group that chewed the smokeless
tobacco.
Answer the following questions.
1. What type of study was this (observational, quasi-experimental, or
experimental)?
(This was an experiment, since the researchers imposed a treatment on each of the
two groups involved in the study)

2. What are the independent and dependent variables?


(The independent variable is whether the participant chewed tobacco or
not. The dependent variables are the students’ blood pressures and heart
rates.)

3. Which was the treatment group?


(The treatment group is the tobacco group—the other group was used as a
control)
4. Could the students’ blood pressures be affected by knowing that they are
part of a study?
(A student’s blood pressure might not be affected by knowing that
he or she was part of a study. However, if the student’s blood pressure were affected
by this knowledge, all the students (in both groups) would be affected similarly.
This might be an example of the placebo effect )

5. List some possible confounding variables.


(Answers will vary. One possible answer is that confounding variables
might include the way that the students chewed the tobacco, whether or not the
students smoked (although this would hopefully have been evened out with the
randomization), and that all the participants were university students)
6. Do you think this is a good way to study the effect of smokeless tobacco?
(Answers will vary. One possible answer is that the study design was fine, but that
it cannot be generalized beyond the population of university students (or people
around that age).

Exercises:
1. Identify each study as being either observational or experimental.
a. Subjects were randomly assigned to two groups, and one group was given an herb and the
other group a placebo. After 6 months, the numbers of respiratory tract infections each
group had were compared.
b. A researcher stood at a busy intersection to see if the color of the automobile that a
person drives is related to running red lights.
c. A researcher finds that people who are more hostile have higher total cholesterol
levels than those who are less hostile.
d. Subjects are randomly assigned to four groups. Each group is placed on one of four
special diets—a low-fat diet, a high-fish diet, a combination of low-fat diet and high-
fish diet, and a regular diet. After 6 months, the blood pressures of the groups are
compared to see if diet has any effect on blood pressure.
2. Classify each sample as random, systematic, stratified, or cluster.
a. In a large school district, all teachers from two buildings are interviewed to
determine whether they believe the students have less homework to do now than
in previous years.

b. Every seventh customer entering a shopping mall is asked to select her or his
favorite store.

c. Nursing supervisors are selected using random numbers to determine annual


salaries.
d. Every 100th hamburger manufactured is checked to determine its fat content.

e. Mail carriers of a large city are divided into four groups according to gender
(male or female) and according to whether they walk or ride on their routes.
Then 10 are selected from each group and interviewed to determine whether
they have been bitten by a dog in the last year.

Frequency Distributions and Graphs


Objectives
1. Organize data using a frequency distribution.
2. Represent data in frequency distributions graphically using histograms, frequency
polygons, and ogives.
3. Represent data using bar graphs, Pareto charts, time series graphs, and pie graphs.
4. Draw and interpret a stem and leaf plot.
Introduction:
When conducting a statistical study, the researcher must gather data for the particular
variable under study. For example, if a researcher wishes to study the performance of CTE
graduates in the Licensure Examination for Teachers (LET), the researcher gathers information
related to LET performance, such as the scores, the number of passers, etc. To describe situations,
draw conclusions, or make inferences about events, the researcher must organize the data in some
meaningful way. The most convenient method of organizing data is to construct a frequency
distribution. After organizing the data, the researcher must present them so they can be understood
by those who will benefit from reading the study. The most useful method of presenting the data is
by constructing statistical charts and graphs. There are many different types of charts and graphs,
and each one has a specific purpose.
This chapter explains how to organize data by constructing frequency distributions and
how to present the data by constructing charts and graphs.

Organizing Data
A frequency distribution is the organization of raw data in table form, using classes and
frequencies.
A frequency distribution consists of classes and their corresponding frequencies. Each raw
data value is placed into a quantitative or qualitative category called a class. The frequency of a
class is then the number of data values contained in that class. A frequency distribution is
constructed in the example that follows.
Two types of frequency distributions that are most often used are the categorical frequency
distribution and the grouped frequency distribution. The categorical frequency distribution is
used for data that can be placed in specific categories, such as nominal- or ordinal-level data. For
example, data such as political affiliation, religious affiliation, or major field of study would use
categorical frequency distributions
Example.
Distribution of Blood Types Twenty-five army inductees were given a blood test
to determine their blood type. The data set is
A  B  B  AB O
O  O  B  AB B
B  B  O  A  O
A  O  O  O  AB
AB A  O  B  A
Construct a frequency distribution for the data.
Solution:
Since the data are categorical, discrete classes can be used. There are four blood types: A,
B, O, and AB. These types will be used as the classes for the distribution. The procedure for
constructing a frequency distribution for categorical data is given next.
Step 1 Make a table as shown.

Class   Tally   Frequency   Percent
A
B
O
AB

Step 2 Tally the data and place the results in the Tally column.
Step 3 Count the tallies and place the results in the Frequency column.
Step 4 Find the percentage of values in each class by using the formula

% = (f / n) × 100

where f = frequency of the class and n = total number of values. For example, in the
class of type A blood, the percentage is (5 / 25) × 100 = 20%.
Percentages are not normally part of a frequency distribution, but they can be added
since they are used in certain types of graphs, such as pie graphs. Also, the decimal
equivalent of a percent is called a relative frequency.

Step 5 Find the totals for the Frequency and Percent columns. The completed
table is shown.

Class    Tally        Frequency   Percent
A        ||||-        5           20
B        ||||-||      7           28
O        ||||-||||    9           36
AB       ||||         4           16
Total                 25          100
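The tally-and-count procedure can be reproduced with Python's `collections.Counter`, using the 25 blood types from the example:

```python
from collections import Counter

# The 25 blood types from the example above.
data = "A B B AB O  O O B AB B  B B O A O  A O O O AB  AB A O B A".split()

counts = Counter(data)
n = len(data)  # 25

# Frequency and percent for each class, in a fixed class order.
table = {c: (counts[c], 100 * counts[c] / n) for c in ["A", "B", "O", "AB"]}
for cls, (freq, pct) in table.items():
    print(cls, freq, pct)
```

The output matches the completed table: A 5 (20%), B 7 (28%), O 9 (36%), AB 4 (16%).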

Grouped Frequency Distributions
When the range of the data is large, the data must be grouped into classes that are more than one
unit in width, in what is called a grouped frequency distribution. In the distribution below, the
first class contains the values 24, 25, 26, 27, 28, 29, and 30.

Class limits   Class boundaries   Tally        Frequency   cf<   cf>   rf
24–30          23.5–30.5          |||
31–37          30.5–37.5          |
38–44          37.5–44.5          ||||-
45–51          44.5–51.5          ||||-||||-   10
52–58          51.5–58.5
59–65          58.5–65.5
Total
In this distribution, the values 24 and 30 of the first class are called class limits. The lower class
limit is 24; it represents the smallest data value that can be included in the class. The upper class
limit is 30; it represents the largest data value that can be included in the class. The numbers in the
second column are called class boundaries. These numbers are used to separate the classes so that
there are no gaps in the frequency distribution. The gaps are due to the limits; for example, there is
a gap between 30 and 31. Students sometimes have difficulty finding class boundaries when given
the class limits. The basic rule of thumb is that the class limits should have the same decimal place
value as the data, but the class boundaries should have one additional place value and end in a 5.
For example, if the values in the data set are whole numbers, such as 24, 32, and 18, the limits for
a class might be 31–37, and the boundaries are 30.5–37.5. Find the boundaries by subtracting 0.5
from 31 (the lower class limit) and adding 0.5 to 37 (the upper class limit)
Lower limit -0.5 31 - 0.5 = 30.5 lower boundary
Upper limit +0.5 37 + 0.5 = 37.5 upper boundary
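The boundary rule can be written as a small helper, a sketch assuming whole-number data so that the measurement unit is 1:

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries sit half a measurement unit beyond the class limits."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

print(class_boundaries(31, 37))  # (30.5, 37.5)
print(class_boundaries(24, 30))  # (23.5, 30.5)
```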
To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes. Although there is no hard-and-fast rule for
the number of classes contained in a frequency distribution, it is of the utmost importance
to have enough classes to present a clear description of the collected data.
2. It is preferable but not absolutely necessary that the class width be an odd number. This
ensures that the midpoint of each class has the same place value as the data. The class
midpoint Xm is obtained by adding the lower and upper boundaries and dividing by 2, or
adding the lower and upper limits and dividing by 2:

Xm = (lower boundary + upper boundary) / 2

or

Xm = (lower limit + upper limit) / 2

For example, the midpoint of the first class in the example is (24 + 30) / 2 = 27.
3. The classes must be mutually exclusive. Mutually exclusive classes have non overlapping class
limits so that data cannot be placed into two classes.

4. The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no gaps in a frequency distribution.
The only exception occurs when the class with a zero frequency is the first or last class. A
class with a zero frequency at either end can be omitted without affecting the distribution

5. The classes must be exhaustive. There should be enough classes to accommodate all the data.
6. The classes must be equal in width. This avoids a distorted view of the data.

Sometimes it is necessary to use a cumulative frequency distribution. A cumulative


frequency distribution is a distribution that shows the number of data values less than or equal to
a specific value (usually an upper boundary). The values are found by adding the frequencies of
the classes less than or equal to the upper class boundary of a specific class. This gives an ascending
cumulative frequency.

Constructing a Grouped Frequency Distribution

Step 1: Determine the classes.

Find the highest and lowest values.


Find the range. Select the number of classes desired.

Find the width by dividing the range by the number of classes and rounding up.

Select a starting point (usually the lowest value or any convenient number less than the
lowest value); add the width to get the lower limits.

Find the upper class limits. Find the boundaries

Step 2 Tally the data.

Step 3 Find the numerical frequencies from the tallies, and find the cumulative frequencies.
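The steps above can be sketched in Python. One hedge: to guarantee the largest value falls inside the last class when the range divides evenly, this sketch rounds up from range + 1, a common convention for whole-number data; the data set is hypothetical:

```python
import math

def grouped_frequency(data, num_classes):
    """Follow the steps above: find the range, compute the class width,
    set the limits, tally frequencies, and accumulate cumulative frequencies."""
    low, high = min(data), max(data)
    # range + 1, rounded up, so the largest whole-number value fits
    # inside the last class.
    width = math.ceil((high - low + 1) / num_classes)
    classes = []
    cumulative = 0
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        classes.append((lower, upper, freq, cumulative))
    return classes

# Hypothetical data set of 12 whole-number scores.
data = [24, 32, 18, 45, 51, 29, 37, 44, 58, 25, 33, 40]
for lower, upper, f, cf in grouped_frequency(data, 5):
    print(f"{lower}-{upper}: f={f}, cf<={cf}")
```

With this data the range is 40, so the width is ceil(41 / 5) = 9 and the classes run 18–26 through 54–62.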

All the different types of distributions are used in statistics and are helpful when one is
organizing and presenting data. The reasons for constructing a frequency distribution are as
follows: 1. To organize the data in a meaningful, intelligible way.

2. To enable the reader to determine the nature or shape of the distribution. 3. To

facilitate computational procedures for measures of average and spread 4. To

enable the researcher to draw charts and graphs for the presentation of data 5. To

enable the reader to make comparisons among different data sets.

Exercises I- Presentation of Data

Let's try a practice problem. Given these 90 scores, construct a frequency distribution of
grouped scores.

112 68 55 33 72 80 35 55 62

102 65 104 51 100 74 45 60 58

92 44 122 73 65 78 49 61 65

83 76 95 55 50 82 51 138 73

82 72 89 37 63 95 109 93 65

75 24 60 43 130 107 72 86 71

128 90 48 22 67 76 57 86 114

33 54 64 82 47 81 28 79 85

42 62 86 94 52 106 30 117 98
58 32 68 77 28 69 46 53 38

Histograms, Frequency Polygons, and Ogives

After you have organized the data into a frequency distribution, you can present them in
graphical form. The purpose of graphs in statistics is to convey the data to the viewers in pictorial
form. It is easier for most people to comprehend the meaning of data presented graphically than
data presented numerically in tables or frequency distributions. This is especially true if the users
have little or no statistical knowledge. Statistical graphs can be used to describe the data set or to
analyze it. Graphs are also useful in getting the audience’s attention in a publication or a speaking
presentation. They can be used to discuss an issue, reinforce a critical point, or summarize a data
set. They can also be used to discover a trend or pattern in a situation over a period of time.

The three most commonly used graphs in research are

1. The histogram.

2. The frequency polygon.

3. The cumulative frequency graph, or ogive (pronounced o-jive)

The histogram is a graph that displays the data by using contiguous vertical bars (unless the
frequency of a class is 0) of various heights to represent the frequencies of the classes

The frequency polygon is a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are represented by the heights
of the points

The Ogive The third type of graph that can be used represents the cumulative frequencies
for the classes. This type of graph is called the cumulative frequency graph, or ogive. The
cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of a
class in the distribution

The ogive is a graph that represents the cumulative frequencies for the classes in a
frequency distribution

Constructing Statistical Graphs

Step 1 Draw and label the x and y axes.

Step 2 Choose a suitable scale for the frequencies or cumulative frequencies, and

label it on the y axis.

Step 3 Represent the class boundaries for the histogram or ogive, or the midpoints for the
frequency polygon, on the x axis.
Step 4 Plot the points and then draw the bars or lines.
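Before drawing an ogive, its points (each upper boundary paired with the cumulative frequency) can be computed as below. The frequencies for the classes left blank in the earlier grouped table are hypothetical fill-ins:

```python
# Grouped distribution as (lower boundary, upper boundary, frequency);
# the 4 and 2 are hypothetical values for the classes left blank earlier.
classes = [(23.5, 30.5, 3), (30.5, 37.5, 1), (37.5, 44.5, 5),
           (44.5, 51.5, 10), (51.5, 58.5, 4), (58.5, 65.5, 2)]

# Ogive points: cumulative frequency plotted at each upper boundary,
# starting from 0 at the first lower boundary.
points = [(classes[0][0], 0)]
total = 0
for lower, upper, freq in classes:
    total += freq
    points.append((upper, total))
print(points)
```

Plotting these points and connecting them with line segments gives the ogive; the last point carries the total frequency.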

Distribution Shapes

When one is describing data, it is important to be able to recognize the shapes of the
distribution values. In later chapters you will see that the shape of a distribution also determines
the appropriate statistical methods used to analyze the data. A distribution can have many shapes,
and one method of analyzing a distribution is to draw a histogram or frequency polygon for the
distribution. Several of the most common shapes are the bell-shaped or mound-shaped, the uniform-shaped,
the J-shaped, the reverse J-shaped, the positively or right-skewed shape, the negatively or
left-skewed shape, the bimodal-shaped, and the U-shaped.

Distributions are most often not perfectly shaped, so it is not necessary to have an exact
shape but rather to identify an overall pattern. A bell-shaped distribution shown in Figure (a) has a
single peak and tapers off at either end. It is approximately symmetric; i.e., it is roughly the same
on both sides of a line running through the center
A uniform distribution is basically flat or rectangular. See Figure b. A J-shaped distribution is
shown in Figure (c), and it has a few data values on the left side and increases as one moves to the
right. A reverse J-shaped distribution is the opposite of the J-shaped distribution. See Figure (d).

When the peak of a distribution is to the left and the data values taper off to the right, a distribution
is said to be positively or right-skewed. See Figure (e). When the data values are clustered to the
right and taper off to the left, a distribution is said to be negatively or left-skewed. See Figure(f)

When a distribution has two peaks of the same height, it is said to be bimodal. See Figure (g).
Finally, the graph shown in Figure (h) is a U-shaped distribution

Exercises.

1. Construct a histogram, frequency polygon, and ogive for the data using your prepared
frequency distribution in the previous exercises
2. Number of College Faculty The number of faculty listed for a variety of private colleges that
offer only bachelor’s degrees is listed below. Use these data to construct a frequency
distribution with 7 classes, a histogram, a frequency polygon, and an ogive. Discuss the
shape of this distribution. What proportion of schools have 180 or more faculty?

165 221 218 206 138 135 224 204


70 210 207 154 155 82 120 116

176 162 225 214 93 389 77 135

221 161 128 310

Source: World Almanac and Book of Facts

3. Using the histogram shown here, do the following.

a. Construct a frequency distribution; include class limits, class frequencies,
midpoints, and cumulative frequencies.
b. Construct a frequency polygon.
c. Construct an ogive.

5. Using the results from Exercise 4, answer these questions.

a. How many values are in the class 27.5–30.5?
b. How many values fall between 24.5 and 36.5?
c. How many values are below 33.5?
d. How many values are above 30.5?
Source: Bureau of Labor Statistics.

Bar Graphs

When the data are qualitative or categorical, bar graphs can be used to represent the data.
A bar graph can be drawn using either horizontal or vertical bars.

A bar graph represents the data by using vertical or horizontal bars whose heights or
lengths represent the frequencies of the data

Example
College Spending for First-Year Students The table shows the average money
spent by first-year college students. Draw a horizontal and vertical bar graph for the data.

Electronics $728

Dorm decor 344

Clothing 141

Shoes 72

Solution

1. Draw and label the x and y axes. For the horizontal bar graph, place the frequency
scale on the x axis, and for the vertical bar graph, place the frequency scale on the y
axis.
2. Draw the bars corresponding to the frequencies. See Figure 2–10.

The graphs show that first-year college students spend the most on electronic equipment,
including computers.

A Pareto chart is used to represent a frequency distribution for a categorical variable, and the
frequencies are displayed by the heights of vertical bars, which are arranged in order from highest
to lowest.
The Time Series Graph When data are collected over a period of time, they can be represented by
a time series graph.

A time series graph represents data that occur over a specific period of time.

Example:

Workplace Homicides The number of homicides that occurred in the workplace for the
years 2003 to 2008 is shown. Draw and analyze a time series graph for the data.

Year ’03 ’04 ’05 ’06 ’07 ’08

Number 632 559 567 540 628 517

Solution

Step 1 Draw and label the x and y axes.

Step 2 Label the x axis for years and the y axis for the number.

Step 3 Plot each point according to the table.
Step 4 Draw line segments connecting adjacent points. Do not try to fit a smooth curve through
the data points.

From the figure it can be seen that there was a slight decrease in the years ’04, ’05, and
’06, compared to ’03, and again an increase in ’07. The largest decrease occurred in ’08.
The Pie Graph Pie graphs are used extensively in statistics. The purpose of the pie graph is to show the relationship of the parts to the whole by visually comparing the sizes of the sections. Percentages or proportions can be used. The variable is nominal or categorical. A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
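The wedge sizes of a pie graph can be computed directly: each category's frequency is converted to a proportion of the total, and that proportion of 360 degrees gives the central angle of its wedge. A minimal sketch, reusing the spending data from the bar-graph example (variable names are hypothetical):

```python
# Frequencies from the first-year college spending example.
frequencies = {"Electronics": 728, "Dorm decor": 344, "Clothing": 141, "Shoes": 72}
total = sum(frequencies.values())  # 1285

# Central angle of each wedge: (frequency / total) * 360 degrees.
angles = {cat: f / total * 360 for cat, f in frequencies.items()}
```

The angles necessarily sum to 360 degrees, which is a useful check when constructing the graph by hand.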

Stem and Leaf Plots The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphical form.

A stem and leaf plot is a data plot that uses part of the data value as the stem and part of
the data value as the leaf to form groups or classes.
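For two-digit values, the tens digit serves as the stem and the units digit as the leaf. A minimal sketch of building the plot (the sample values are a few of the achievement scores from the exercise below):

```python
data = [52, 66, 69, 62, 61, 65, 57, 59]  # a few of the math scores below

# Group each sorted value into stem (tens digit) and leaf (units digit).
plot = {}
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    plot.setdefault(stem, []).append(leaf)

# plot maps each stem to its ordered list of leaves.
print(plot)
```

Each dictionary entry corresponds to one row of the stem and leaf plot.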
Exercises.

1. Math and Reading Achievement Scores The math and reading achievement scores from the National Assessment of Educational Progress for selected states are listed below. Construct a back-to-back stem and leaf plot with the data and compare the distributions.

Math Reading

52 66 69 62 61 65 76 76 66 67 63 57 59 59 55 71 70 70 66 61 55 59 74 72 73

61 69 78 76 77 68 76 73 77 77 80

Source: World Almanac


Objectives

1. Summarize data, using measures of central tendency, such as the mean, median, mode, and
midrange.
2. Describe data, using measures of variation, such as the range, variance, and standard deviation.
3. Identify the position of a data value in a data set, using various measures of position, such
as percentiles, deciles, and quartiles.
4. Use the techniques of exploratory data analysis, including boxplots and five-number
summaries, to discover various aspects of data.

Measures found by using all the data values in the population are called parameters. Measures obtained by using the data values from samples are called statistics; hence, the average of the sales from a sample of representatives is a statistic, and the average of sales obtained from the entire population is a parameter.

A statistic is a characteristic or measure obtained by using the data values from a sample.

A parameter is a characteristic or measure obtained by using all the data values from a

specific population.

The mean is the sum of the values, divided by the total number of values. The symbol X̄ (read "X bar") represents the sample mean:

X̄ = ΣX / n

where n represents the total number of values in the sample. For a population, the Greek letter μ (mu) is used for the mean:

μ = ΣX / N

where N represents the total number of values in the population.


Example

Find the mean of the sample:

110 76 29 38 105 31

Solution

Mean = (110 + 76 + 29 + 38 + 105 + 31) / 6 = 389 / 6 ≈ 64.83

When the data are given in a frequency distribution, make a table, multiply each value by its frequency, find the sum of the products, and divide the sum by n to get the mean.
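The sample mean above can be verified with a short sketch:

```python
# Sample data from the example above.
data = [110, 76, 29, 38, 105, 31]

# Mean = sum of the values divided by the number of values (389 / 6).
mean = sum(data) / len(data)
print(mean)
```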
The Median
The median is the halfway point in a data set. Before you can find this point, the data

must be arranged in order. When the data set is ordered, it is called a data array. The

median is the midpoint of the data array. The symbol for the median is MD.

Example:

Find the median of the data: 713, 300, 618, 595, 311, 401, and 292.

Solution:

Step 1

Arrange the data in order.

292, 300, 311, 401, 595, 618, 713

Step 2

Select the middle value.


292, 300, 311, 401, 595, 618, 713

Hence, the median is 401.

The Mode
The third measure of average is called the mode. The mode is the value that occurs most often in the data set. It
is sometimes said to be the most typical case.

The value that occurs most often in a data set is called the mode.

A data set that has only one value that occurs with the greatest frequency is said to be unimodal. If a data set has two values that occur with the same greatest frequency, both values are considered to be the mode and the data set is said to be bimodal. If a data set has more than two values that occur with the same greatest frequency, each value is used as the mode, and the data set is said to be multimodal. When no data value occurs more than once, the data set is said to have no mode. A data set can have more than one mode or no mode at all.
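The median and mode rules above can be sketched as two small functions (an illustrative sketch; the function names are hypothetical):

```python
from collections import Counter

def median(values):
    """Midpoint of the ordered data array; for an even count,
    the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values sharing the greatest frequency; an empty list
    means no value repeats, i.e. the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(median([713, 300, 618, 595, 311, 401, 292]))  # the example above
```

A two-element result from `modes` indicates a bimodal data set; more than two indicates a multimodal one.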

The Weighted Mean

Properties and Uses of Central Tendency


The Mean
1. The mean is found by using all the values of the data.

2. The mean varies less than the median or mode when samples are taken from the same

population and all three measures are computed for these samples.

3. The mean is used in computing other statistics, such as the variance.

4. The mean for the data set is unique and not necessarily one of the data values.

5. The mean cannot be computed for the data in a frequency distribution that has an open-ended class.

6. The mean is affected by extremely high or low values, called outliers, and may not be the

appropriate average to use in these situations.

The Median

1. The median is used to find the center or middle value of a data set.

2. The median is used when it is necessary to find out whether the data values fall into the

upper half or lower half of the distribution.

3. The median is used for an open-ended distribution.

4. The median is affected less than the mean by extremely high or extremely low values.

The Mode

1. The mode is used when the most typical case is desired.

2. The mode is the easiest average to compute.

3. The mode can be used when the data are nominal or categorical, such as religious

preference, gender, or political affiliation.

4. The mode is not always unique. A data set can have more than one mode, or the

mode may not exist for a data set.

The Midrange

1. The midrange is easy to compute.

2. The midrange gives the midpoint.

3. The midrange is affected by extremely high or low values in a data set.


Computing Central Tendency of the Grouped Data

Class limit   Frequency (f)   Class boundaries   cf<
13-19         2               12.5-19.5          2
20-26         7               19.5-26.5          9
27-33         12              26.5-33.5          21
34-40         15              33.5-40.5          36
41-47         9               40.5-47.5          45
48-54         3               47.5-54.5          48
55-61         5               54.5-61.5          53
62-68         2               61.5-68.5          55
Total         55

1. Computing the mean

Class limit   f    Midpoint (Xm)   f·Xm
13-19         2    16              32
20-26         7    23              161
27-33         12   30              360
34-40         15   37              555
41-47         9    44              396
48-54         3    51              153
55-61         5    58              290
62-68         2    65              130
Total         55                   2077

Mean = Σf·Xm / n = 2077 / 55 ≈ 37.76

Interpretation: The average performance is 37.76. The 15 students in the class 34-40 perform averagely, 21 students are below average, and 19 students are above average in their performance.

Other method of computing the mean (coded or deviation method)

The formula is

Mean = AM + (Σfd / n) · i

Class limit   f    d    fd
13-19         2    -3   -6
20-26         7    -2   -14
27-33         12   -1   -12
34-40         15   0    0
41-47         9    1    9
48-54         3    2    6
55-61         5    3    15
62-68         2    4    8
Total         55        6

Σfd = (-6) + (-14) + (-12) + 0 + 9 + 6 + 15 + 8 = 6

Mean = 37 + (6/55)(7) ≈ 37.76

Steps:
1. The zero deviation is placed at the class containing the assumed mean.
2. AM is the assumed mean, which is the midpoint of the class with zero deviation (here, 37).
3. Multiply the frequency by the deviation to fill the fd column.
4. Then add the fd column.
5. Substitute into the formula.
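Both the direct method and the coded (deviation) method can be checked with a short sketch of the frequency distribution above (an illustrative sketch, not part of the original module):

```python
# (lower limit, upper limit, frequency) for each class.
classes = [(13, 19, 2), (20, 26, 7), (27, 33, 12), (34, 40, 15),
           (41, 47, 9), (48, 54, 3), (55, 61, 5), (62, 68, 2)]

n = sum(f for _, _, f in classes)                         # 55
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * midpoint = 2077
mean = sum_fx / n

# Coded method: assumed mean 37 (midpoint of 34-40), class width 7.
AM, width = 37, 7
devs = [-3, -2, -1, 0, 1, 2, 3, 4]
sum_fd = sum(f * d for (_, _, f), d in zip(classes, devs))  # 6
mean_coded = AM + sum_fd / n * width
```

The two methods agree, since the coded formula is an algebraic rearrangement of the direct one.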

2. Computing the Median

Class limit   f    cf<
13-19         2    2
20-26         7    9
27-33         12   21
34-40         15   36
41-47         9    45
48-54         3    48
55-61         5    53
62-68         2    55
Total         55

Median = LCB + ((n/2 - cf<) / f) · i

The median class is 34-40, since it contains the (n/2 = 27.5)th value; the cumulative frequency below it is 21.

LCB = 33.5
f = 15
n = 55
i = 7

Median = 33.5 + ((27.5 - 21) / 15)(7) ≈ 36.53

Interpretation: The median of 36.53 divides the distribution into the upper 50% and the lower 50%.
3. Computing the Mode

Mode = LCB + (d1 / (d1 + d2)) · i

The modal class is 34-40, the class with the highest frequency (15). Here LCB = 33.5, d1 = 15 - 12 = 3 (difference with the frequency of the class before), d2 = 15 - 9 = 6 (difference with the frequency of the class after), and i = 7.

Mode = 33.5 + (3 / (3 + 6))(7) ≈ 35.83
Measure of Variation
Consider the two sets of scores:

SET A   SET B
10      35
60      45
50      30
30      35
40      40
20      25

Since the means are equal (both 35), you might conclude that the two sets are equally consistent. However, when the data sets are examined graphically, a somewhat different conclusion might be drawn. Even though the means are the same for both sets, the spread, or variation, is quite different: Set B is more consistent; it is less variable. For the spread or variability of a data set, three measures are commonly used: range, variance, and standard deviation. Each measure will be discussed in this section.
The variance is the average of the squares of the distance each value is from the mean. The symbol for the population variance is σ² (σ is the Greek lowercase letter sigma). The formula for the population variance is

σ² = Σ(X - μ)² / N

where
X = individual value
μ = population mean
N = population size

The standard deviation is the square root of the variance. The symbol for the population standard deviation is σ. The corresponding formula for the population standard deviation is

σ = √(Σ(X - μ)² / N)

Example
Find the variance and standard deviation of:
35, 45, 30, 35, 40, 25
Solution
1. Find the mean: μ = (35 + 45 + 30 + 35 + 40 + 25) / 6 = 210 / 6 = 35

2. Subtract the mean from each value, and place the result in the second column (X - μ).

3. Square each result and place the squares in column C of the table.

X     X - μ   (X - μ)²
35    0       0
45    10      100
30    -5      25
35    0       0
40    5       25
25    -10     100
4. Find the sum of the squares in column C: 0 + 100 + 25 + 0 + 25 + 100 = 250

5. Divide the sum by N to get the variance: 250 / 6 ≈ 41.67

6. Take the square root to get the standard deviation: √41.67 ≈ 6.5

Hence, the standard deviation is approximately 6.5.
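The six steps above translate directly into a short sketch:

```python
import math

data = [35, 45, 30, 35, 40, 25]
N = len(data)

mu = sum(data) / N                               # step 1: mean = 35
variance = sum((x - mu) ** 2 for x in data) / N  # steps 2-5: 250 / 6
std_dev = math.sqrt(variance)                    # step 6
```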

Variance and Standard Deviation for Grouped Data

The formula for the sample variance of grouped data, denoted by s², is

s² = Σf(Xm - X̄)² / (n - 1)

Compute the standard deviation of the grouped data below.

Class    f
6-10     1
11-15    2
16-20    3
21-25    5
26-30    4
31-35    3
36-40    2
Total    20

1. Find the mean

To find the mean, find the midpoint of each class and multiply the frequency by the midpoint, as in column D. Then add column D, and divide the sum by the total frequency, which is 20.

Class    f    Midpoint (Xm)   f·Xm
6-10     1    8               8
11-15    2    13              26
16-20    3    18              54
21-25    5    23              115
26-30    4    28              112
31-35    3    33              99
36-40    2    38              76
Total    20                   490

Mean = 490 / 20 = 24.5

2. Subtract the mean from each midpoint

Class    f    Xm    Xm - mean
6-10     1    8     -16.5
11-15    2    13    -11.5
16-20    3    18    -6.5
21-25    5    23    -1.5
26-30    4    28    3.5
31-35    3    33    8.5
36-40    2    38    13.5

3. Square column D

Class    f    Xm    Xm - mean   (Xm - mean)²
6-10     1    8     -16.5       272.25
11-15    2    13    -11.5       132.25
16-20    3    18    -6.5        42.25
21-25    5    23    -1.5        2.25
26-30    4    28    3.5         12.25
31-35    3    33    8.5         72.25
36-40    2    38    13.5        182.25

4. Multiply the square of the deviation by the frequency, then add.

Class    f    Xm - mean   (Xm - mean)²   f·(Xm - mean)²
6-10     1    -16.5       272.25         272.25
11-15    2    -11.5       132.25         264.50
16-20    3    -6.5        42.25          126.75
21-25    5    -1.5        2.25           11.25
26-30    4    3.5         12.25          49.00
31-35    3    8.5         72.25          216.75
36-40    2    13.5        182.25         364.50
Total                                    1305.00

5. Compute now the variance and standard deviation:

s² = 1305 / (20 - 1) ≈ 68.68        s = √68.68 ≈ 8.29


Other method of computing the standard deviation (deviation method)

1. Find the deviation d of each class from the class containing the assumed mean.
2. Solve for fd, and then find its sum.
3. Solve for fd², and find its sum.
4. Substitute into the formula.
Steps in solving the variance and standard deviation for grouped data (shortcut method)

1. Make a table as shown, and find the midpoint of each class.

Class    f    Xm
6-10     1    8
11-15    2    13
16-20    3    18
21-25    5    23
26-30    4    28
31-35    3    33
36-40    2    38

2. Multiply the frequency by the midpoint for each class, and place the products in column D.

Class    f    Xm    f·Xm
6-10     1    8     8
11-15    2    13    26
16-20    3    18    54
21-25    5    23    115
26-30    4    28    112
31-35    3    33    99
36-40    2    38    76

3. Multiply the frequency by the square of the midpoint, and place the products in column E.

Class    f    Xm    f·Xm   f·Xm²
6-10     1    8     8      64
11-15    2    13    26     338
16-20    3    18    54     972
21-25    5    23    115    2645
26-30    4    28    112    3136
31-35    3    33    99     3267
36-40    2    38    76     2888

4. Find the sums of columns B, D, and E. The sum of column B is n, the sum of column D is Σf·Xm, and the sum of column E is Σf·Xm². The completed table is shown.
Class    f    Xm    f·Xm   f·Xm²
6-10     1    8     8      64
11-15    2    13    26     338
16-20    3    18    54     972
21-25    5    23    115    2645
26-30    4    28    112    3136
31-35    3    33    99     3267
36-40    2    38    76     2888
Total    20         490    13310

5. Substitute in the formula and solve to get the variance:

s² = [n·Σf·Xm² - (Σf·Xm)²] / [n(n - 1)] = [20(13310) - 490²] / [20(19)] = 26100 / 380 ≈ 68.68
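The shortcut computation can be sketched directly from the class limits and frequencies (a sketch assuming the sample formula with n - 1, which matches the worked totals 490 and 13310):

```python
# (lower limit, upper limit, frequency) for each class.
classes = [(6, 10, 1), (11, 15, 2), (16, 20, 3), (21, 25, 5),
           (26, 30, 4), (31, 35, 3), (36, 40, 2)]

n = sum(f for _, _, f in classes)                                 # 20
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)          # 490
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)  # 13310

# Shortcut (sample) formula: s^2 = [n*sum(f*Xm^2) - (sum(f*Xm))^2] / [n(n-1)]
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = variance ** 0.5
```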

Measure of other location


Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an
individual in a group.
Percentiles divide the data set into 100 equal groups.
Quartile Measure
This measure divides the distribution into four equal parts.

Consider the data below


82, 72, 64, 71, 72,
54, 66, 56, 80, 72,
64, 64, 96, 58, 66,
64, 84, 82, 70, 74,
86, 90, 88, 90, 90,
94, 68, 90, 82, 80

To determine the measure of other location of the ungrouped data


1. Arrange the data in descending or ascending order.

https://docs.google.com/spreadsheets/u/3/d/1OwAzwVgqMORVrRBl40Klix0wji2BGuMnFcdefsPlBT4/edit

Formula

Set 1 94
90
90
90
90
86
84
84
82
82
80
80
74
72
72
72
72
71
70
68
68
66
66
66
64
64
64
64
58
54

SET 2
97
98
96
94
90
89
87
86
84
80
78
75
70
68
65
53
48
46
42
40
35
33
33
32
28
26
25
22
20
29

Sample computation will be shown in the class.
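A sketch of locating a percentile in ungrouped data is given below. One common textbook convention is assumed (position c = n·k/100, rounded up when fractional, averaged with the next value when whole); other texts use slightly different rules, so results can vary at the boundaries. The data here are hypothetical:

```python
import math

def percentile_value(data, k):
    """Locate the kth percentile of ungrouped data under the
    convention described in the lead-in above."""
    ordered = sorted(data)
    n = len(ordered)
    c = n * k / 100
    if c != int(c):
        # Fractional position: round up to the next whole value.
        return ordered[math.ceil(c) - 1]
    c = int(c)
    # Whole-number position: average the cth and (c+1)th values.
    return (ordered[c - 1] + ordered[c]) / 2

data = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]  # hypothetical scores
q1 = percentile_value(data, 25)   # Q1 is the 25th percentile
```

Quartiles and deciles are special cases: Q1 = P25, Q3 = P75, D9 = P90.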


Other Location of the Grouped Data
For the grouped data:

Quartiles:    k = 1, 2, 3
Percentiles:  k = 1, 2, 3, ..., 99
Deciles:      k = 1, 2, 3, ..., 9

Consider the grouped data below.

Class    f    LCB    UCB    cf<
6-10     1    5.5    10.5   1
11-15    7    10.5   15.5   8
16-20    8    15.5   20.5   16
21-25    9    20.5   25.5   25
26-30    8    25.5   30.5   33
31-35    5    30.5   35.5   38
36-40    2    35.5   40.5   40
Total    40

Activity 4 Measure of Variation


Activity 5 Measure of Other Location
I. From the data, compute the following. Include the interpretation (ungrouped).
1. Q1
2. Q3
3. P45
4. P75
5. P90
II. Compute the following using the grouped frequency distribution.
6. Q1
7. Q3
8. P45
9. P75
10. D9
Part II- Educational Measurement
Measurement
This part deals with item analysis, validity, and reliability of a test.
ITEM ANALYSIS
After you have given a test to your students and graded it, you are not necessarily finished with
the test. You may choose to use the test or items from the test again sometime in the future.
Therefore, it is useful to know how well the individual test items measure the students’
knowledge of the material. By analyzing the test, you can gain information on how well you
taught the concepts to your students. Test items can tell you many things about student
performance.

Item analysis is a process of examining the students' responses to individual items in the test. It consists of different procedures for assessing the quality of the test items given to the students. Through item analysis we can identify which of the given test items are good and which are defective. Good items are to be retained, and defective items are to be improved, revised, or rejected.

There are three common types of quantitative item analysis which provide the teacher with three different types of information about individual test items. These are the difficulty index, the discrimination index, and the response option analysis.

The difficulty index refers to the proportion of the number of students in the upper and lower groups who answered an item correctly.

The discrimination index is the power of the item to distinguish between low-performing students and high-performing students.

To determine the level of difficulty of an item, use the range given below.

Index range   Difficulty level
0.00-0.20     Very difficult
0.21-0.40     Difficult
0.41-0.60     Moderately difficult
0.61-0.80     Easy
0.81-1.00     Very easy

To determine the level of discrimination, Ebel and Frisbie (1985) recommend the use of the range below.

0.19 and below   Poor item, should be eliminated or revised
0.20-0.29        Marginal item, needs some revision
0.30-0.39        Reasonably good item, but possible for improvement
0.40 and above   Very good item

Steps in solving the difficulty index and discrimination index


1. Arrange the test papers from highest to lowest test score.
2. Separate the scores into upper and lower groups. There are different methods to do this: if a class consists of 30 students who took the examination, arrange the scores from highest to lowest, then divide them into two groups. The highest scores belong to the upper group and the lowest scores belong to the lower group. Other literature suggests the use of the top and bottom 27%, 30%, or 33%.
3. Count the number of those who chose each alternative in the upper group and lower group, and record the information using the template below. Put a mark (say *) next to the correct answer.

Options       A   B   C   D   E

Upper Group

Lower Group

4. Compute the difficulty index and discrimination index, and also analyze each response among the distracters.
5. Make an analysis.
Computing the difficulty index and discrimination index

1.
Option        A   B*   C   D   E
Upper group   3   10   4   0   3
Lower Group   4   4    8   0   4

Difficulty index = (10 + 4) / 40 = 0.35 (difficult)
Discrimination index = (10 - 4) / 20 = 0.30 (reasonably good)
Remarks: revise option D

Example of Analysis
a. Only 35 percent of the examinees got the answer correct; hence, the test item is difficult.
b. More students from the upper group got the answer correct; hence, it has positive discrimination.
c. Retain options A, C, and E, because most of the students in the lower group selected them. Those options attract most students in the lower group. These options are plausible but incorrect.

Conclusion: Retain the test item but change option D to make it effective (plausible but incorrect) for the upper and lower groups. At least 5% of the examinees should choose an incorrect option.
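The two indices can be computed with a short sketch, using the counts from item 1 above (the function names are hypothetical):

```python
def difficulty_index(correct_upper, correct_lower, n_total):
    """Proportion of all examinees (upper + lower) answering correctly."""
    return (correct_upper + correct_lower) / n_total

def discrimination_index(correct_upper, correct_lower, n_per_group):
    """Difference between upper- and lower-group correct counts,
    divided by the size of one group."""
    return (correct_upper - correct_lower) / n_per_group

# Item 1 above: 10 of the 20 upper-group and 4 of the 20 lower-group
# students answered the keyed option B*.
diff = difficulty_index(10, 4, 40)        # 0.35 -> difficult
disc = discrimination_index(10, 4, 20)    # 0.30 -> reasonably good
```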
2.
Option        A   B   C   D*   E
Upper group   3   1   2   6    2
Lower Group   5   0   4   4    1

Difficulty index = (6 + 4) / 28 ≈ 0.36 (difficult)
Discrimination index = (6 - 4) / 14 ≈ 0.14 (poor)
Remarks: revise

3.
Option        A   B   C   D   E*
Upper group   2   3   2   2   5
Lower Group   2   2   1   1   8

Difficulty index = (5 + 8) / 28 ≈ 0.46 (moderately difficult)
Discrimination index = (5 - 8) / 14 ≈ -0.21 (poor)
Remarks: reject

4. Potentially Miskeyed Item

Option        A*   B   C   D    E
Upper group   1    2   3   10   4
Lower Group   3    4   4   4    5

Most students in the upper group chose option D rather than the keyed answer A, so the item is potentially miskeyed and the key should be checked.

5. Ambiguous Item (This happens when students from the upper group choose an incorrect option about equally with the keyed answer.)

Option        A   B   C   D   E*
Upper group   7   1   1   2   8
Lower Group   6   2   3   3   6

The upper group is split almost equally between option A and the keyed answer E, so the item is ambiguous.

6. Guessing Item (Students from the upper group have an equal spread of choices among the given alternatives.)

Option        A   B   C*   D   E
Upper group   4   3   4    3   6
Lower Group   3   4   3    4   5


II.
The first 13 students are the upper group and the next 13 students are the lower group.
Students / Item Number   1   2   3   4   5   6   8   9   10   11   12   13   15

A 1 0 1 1 111111 0 1 1

B 1 1 1 0 111110 1 1 1

C 1 0 1 1 111110 1 1 1

D 1 1 1 0 111110 0 1 1

E 1 0 1 1 110110 0 1 1

F 1 0 1 0 111101 0 0 1

G 0 1 1 1 011111 1 0 1

H 1 1 0 0 111101 1 0 1

I 0 1 0 1 111100 0 0 1

J 0 1 0 1 110110 0 1 1

K 0 1 1 1 011111 0 1 1
L 0 1 0 1 110101 110110 110101 110110 1 1 1
M 0 1 1 1 010111 110110 1 0 1
N 0 1 0 0 1 0 1
O 0 0 0 0 1 0 1
P 0 0 0 0 1 0 1
Q 1 0 0 0 1 0 1

R 0 0 1 0 010111 1 1 1

S 0 1 1 0 110110 011101 1 1 1
T 1 0 0 0 1 1 1

U 0 0 1 1 110110 1 1 1

V 0 0 1 0 011101 1 0 1

W 0 0 0 0 110110 1 0 1

X 1 0 1 1 011101 1 0 1

Y 1 1 0 1 110111 1 0 1

Z 0 0 0 1 011101 1 0 0

U (number correct, upper group)   7     9     9
L (number correct, lower group)   4     3     5

Difficulty index:   (7+4)/26 ≈ 0.42   (9+3)/26 ≈ 0.46   (9+5)/26 ≈ 0.54
Interpretation:     moderately difficult for all three items

Discrimination index:   (7-4)/13 ≈ 0.23   (9-3)/13 ≈ 0.46   (9-5)/13 ≈ 0.31
Interpretation:         marginal           very good          reasonably good

Remark: revise the item with marginal discrimination; the others may be retained.
ESTIMATING RELIABILITY
I. Test–Retest Reliability
If you are primarily concerned about a test’s stability over time, then administering the same test
to the same group of students at two different times would appear to be the way to go. We simply
compute a correlation between the scores from two testing situations and we have an estimate of
test–retest reliability. This procedure is most appropriate for tests that measure stable traits, such
as intelligence.
However, there are two major drawbacks with this technique
1. The first has to do with memory. Unless the tests are separated by a substantial amount of time (let's say a year or more), those taking the test the second time may remember how they answered the items the first time they took it and simply repeat the same answers.
2. The second drawback is the lack of stability of human characteristics over time. Most
psychological traits, such as personality characteristics, are not extremely stable over time.
If the test–retest reliability coefficient is not acceptably high, we will not be able to tell if
the differences in scores are because of the low reliability of the test or individual changes
in the psychological characteristic. We can reduce the stability of the characteristic-over-
time issue by administering the second test soon after the first administration. However,
doing so increases the memory problem—the likelihood that the students simply remember
how they answered the questions the last time they took the test and answer them in the
same way.

II. Alternate Form Reliability


When we use alternate forms of the same test, we would like to be able to demonstrate that
if a student obtained a certain score on one form of the test then he or she would obtain a
very similar score on any other form of the test. If the test is brief, the students can complete both forms the same day. If the test is longer, the students could take the tests on successive
estimate.

III Internal Consistency Reliability


The third way of estimating reliability is with internal consistency: Is each item on the test essentially measuring the same general skill? This approach has an advantage over the other two approaches in that it requires only one administration of the test. There are basically two ways to measure internal consistency reliability. You can use either split-half reliability or a variety of internal consistency formulas such as KR-20.

Administer the test once. Score the total test and apply the Kuder-Richardson formula. The Kuder-Richardson 20 (KR-20) formula is applicable only in situations where students' responses are scored dichotomously, and therefore is most useful with traditional test items that are scored as right or wrong, true or false, or yes or no. This estimate of reliability provides information on the degree to which the items in the test measure the same characteristic.

Another formula for testing the internal consistency of a test is the KR-21 formula, a simplified version of KR-20 that additionally assumes all items are of equal difficulty.

Split-half reliability resembles alternate form reliability in that you separate the test into halves and then compare each half. For example, let's say that we have a 50-item test. The most common approach is to split the test into two tests by separating the odd-numbered and even-numbered items. First, compute the score that each student obtained on the 25 odd-numbered items (items 1, 3, 5, 7, ..., 49). Then, compute the score that each student obtained on the 25 even-numbered items (items 2, 4, 6, 8, ..., 50). Now you have two scores for each student. Next, you compute the correlation between the two sets of scores.

Comparison of methods for estimating reliability

Method          Type of Reliability    Procedure

Test–retest     Stability over time    Give a group the test and repeat several months later.

Alternate form  Alternate forms        Develop two parallel forms of the same test. Give each form to the same group.

Split-half      Internal consistency   Give the test to a group. Divide the items into odd and even. Correlate the scores for each half test.

KR-20 and Cronbach's α   Internal consistency   Give the test to a group. Compute the KR-20 or KR-21 reliability or Cronbach's α.

Level of Reliability Coefficient

Reliability Coefficient   Interpretation

Above 0.90     Excellent reliability / very high reliability

0.81-0.90      Very good for a classroom test

0.71-0.80      Good for a classroom test. There are probably a few items that need to be improved.

0.61-0.70      Somewhat low. The test needs to be supplemented by other measures (more tests) to determine the grades.

0.51-0.60      Suggests a need for revision of the test, unless it is quite short (ten or fewer items). Needs to be supplemented by other measures (more tests) for grading.

0.50 and below Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

Computing the Reliability Coefficient


For Test–Retest
Pearson's product moment correlation coefficient is used to compute the reliability of the test through test–retest:

r = [nΣxy - (Σx)(Σy)] / sqrt{[nΣx² - (Σx)²][nΣy² - (Σy)²]}

A test was conducted with a group of 10 students in Mathematics I twice, one day apart. The test conducted after one day is exactly the same test given on the first day. The data are shown below. Find the reliability coefficient, then interpret.
Students   First   Second
1 36 38

2 26 34

3 38 38

4 15 27

5 17 25

6 28 26

7 32 35

8 35 36

9 12 19

10 35 38

Find Σx, Σy, Σxy, Σx², and Σy²:

Students   First (x)   Second (y)   xy   x²   y²

1 36 38 1368 1296 1444

2 26 34 884 676 1156

3 38 38 1444 1444 1444


4 15 27 405 225 729

5 17 25 425 289 625

6 28 26 728 784 676

7 32 35 1120 1024 1225

8 35 36 1260 1225 1296

9 12 19 228 144 361

10 35 38 1330 1225 1444

sum 274 316 9192 8332 10400

Then plug in to the formula:

r = [10(9192) - (274)(316)] / sqrt{[10(8332) - 274²][10(10400) - 316²]} = 5336 / sqrt{(8244)(4144)} ≈ 0.91

The reliability coefficient using the Pearson r is 0.91, which means the test has very high reliability. The scores of the ten students tested twice with a one-day interval are consistent. Hence the test has very high reliability.
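The computation above can be checked with a short sketch of the Pearson r formula:

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Test-retest scores of the 10 students from the table above.
first  = [36, 26, 38, 15, 17, 28, 32, 35, 12, 35]
second = [38, 34, 38, 27, 25, 26, 35, 36, 19, 38]

r = pearson_r(first, second)  # about 0.91: very high reliability
```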

Alternate Form
Use also Pearson's product moment correlation coefficient.

Two forms of a test were administered to ten students. Is the test reliable?

Students   Form I   Form II

1 12 20

2 20 22
3 19 23

4 17 20

5 25 25

6 22 20

7 15 19

8 16 18

9 23 25

10 21 24

Solution

Find Σx, Σy, Σxy, Σx², and Σy², then substitute into the Pearson r formula.

Split-Half Test
Reliability of the half test is computed by using Pearson's product moment correlation. The reliability of the whole test is then obtained with the Spearman-Brown formula:

r_whole = 2·r_half / (1 + r_half)

Student   Odd   Even

1 15 20

2 19 17

3 20 24

4 25 21

5 20 23
6 18 22

7 19 25

8 26 24

9 20 18

10 18 17
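The split-half data above can be checked with a sketch. The Spearman-Brown correction is assumed as the whole-test formula, since the half-test correlation understates the reliability of the full-length test:

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Odd-item and even-item half scores of the 10 students above.
odd  = [15, 19, 20, 25, 20, 18, 19, 26, 20, 18]
even = [20, 17, 24, 21, 23, 22, 25, 24, 18, 17]

r_half  = pearson_r(odd, even)
r_whole = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
```

The low whole-test value here would, by the table above, suggest the test needs revision or supplementation.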

Internal Consistency

KR-20:

r = [k / (k - 1)] · [1 - Σpq / s²]

where
k is the number of items
p is the proportion of students who got the item correct (index of difficulty)
q = 1 - p
s² is the variance of the total scores

KR-21:

r = [k / (k - 1)] · [1 - M(k - M) / (k·s²)]

where
k is the number of items
M is the mean of the total scores
s² is the variance of the total scores
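A minimal sketch of the KR-20 computation is given below. The response matrix is hypothetical, and the population variance of the total scores is used; some texts use the sample variance, which changes the value slightly:

```python
def kr20(matrix):
    """KR-20 for a students-by-items matrix of 1 (correct) / 0 (incorrect)
    responses; population variance of the total scores is used here."""
    n_students = len(matrix)
    k = len(matrix[0])

    # Variance of each student's total score.
    totals = [sum(row) for row in matrix]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

    # Sum of p*q over the items (p = proportion correct per item).
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in matrix) / n_students
        sum_pq += p * (1 - p)

    return k / (k - 1) * (1 - sum_pq / var_total)

# Hypothetical 4-student, 3-item response matrix.
scores = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]]
r = kr20(scores)
```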

A 40-item test was administered to 15 students. Find the reliability of the test using the Kuder-Richardson (KR-21) formula.

Students   Score

1 16

2 25
3 35

4 39

5 25

6 18

7 19

8 22

9 33

10 36

11 20

12 17

13 26

14 35

15 39
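The KR-21 formula needs only the mean and variance of the total scores. A sketch for the 15 scores above, assuming the sample variance (dividing by n - 1; using the population variance gives a slightly lower coefficient):

```python
scores = [16, 25, 35, 39, 25, 18, 19, 22, 33, 36, 20, 17, 26, 35, 39]
k = 40            # number of items
n = len(scores)   # 15 students

mean = sum(scores) / n                                 # 27
var = sum((x - mean) ** 2 for x in scores) / (n - 1)   # sample variance

# KR-21: r = [k/(k-1)] * [1 - M(k - M) / (k * s^2)]
kr21 = k / (k - 1) * (1 - mean * (k - mean) / (k * var))
```

The resulting coefficient of about 0.90 would fall in the "excellent reliability" band of the table above.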

Teacher A administered a 20-item true-or-false test to his Math class. Below are the test scores of 40 students. Find the reliability coefficient using the KR-20 formula, interpret the computed value, and solve also for the coefficient of determination.

Item Number   x (number of students answering correctly)
1 25

2 36

3 28

4 23
5 25

6 33

7 38

8 15

9 23
10 25
11 36
12 35
13 19

14 39
15 28

16 33

17 19

18 37

19 36

20 25

QUESTIONS

stu 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
de
nts

1 0 1 0 1 1 1 1 111 0100 01 1 0 1 1

2 1 0 0 1 0 0 1 101 1010 10 1 0 1 1

3 1 0 1 1 1 1 1 111 0111 01 1 1 1 1
4 1 1 0 1 1 1 0 111 1100 01 1 1 1 1

5 1 1 1 1 1 1 1 111 1101 11 1 1 1 1

6 1 0 0 0 0 1 0 010 0000 10 1 0 1 1

7 1 0 0 0 0 1 1 011 0010 00 0 0 1 0

8 1 0 0 0 1 0 0 011 0100 00 0 1 1 0

9 1 0 1 1 1 0 0 100 1010 11 0 0 1 0

10 1 1 0 1 1 0 0 111 0111 10 0 1 1 1

11 1 1 1 1 1 1 1 000 1010 10 1 1 0 1

12 1 1 0 0 1 0 0 101 0010 10 1 0 0 1

13 1 0 0 1 0 1 0 101 1110 10 1 0 1 0

14 1 0 0 1 0 0 1 100 0110 10 0 0 1 0

1. Compute the difficulty index.

2. Test the reliability of the test.

Validity
The most important characteristic of a test is validity. Does a test actually measure what we think it is measuring?

Three ways to look at validity.


1. content-related evidence of validity,
2. criterion-related evidence of validity,
3. and construct-related evidence of validity

Content-Related Evidence of Validity

Content-related evidence of validity refers to the match between the test items
and the content that was taught

One way to look at validity is to use the content-related approach. We will be examining content-related evidence of validity as a general concept. We will also look at three specific types of content-related evidence of validity: instructional validity, curricular validity, and face validity.

Instructional validity refers to the match between the items on the test and the
material that was taught

Curricular validity refers to the match between the items on the test and the official
curriculum

If the items on a test appear to be measuring the appropriate skills, and appear to be
appropriate for the students taking the test, the test is said to have face validity

Criterion-Related Evidence of Validity


A test has criterion-related evidence of validity if the test scores correlate well with
another method of measuring the same behavior or skill.

To establish criterion-related validity, you actually have to follow three steps. You give a
test to a group of students. Next, each student is required to perform a series of tasks that
also measure the same skill. Finally, you correlate the test scores with the scores that the
students obtain with the alternative assessment. This correlation coefficient can be
interpreted as a validity coefficient.

Let’s say that you are teaching your 7th-grade students a variety of math skills. At some
point, you will want to see how well they can perform those skills. If you give them
problems to solve demonstrating all of the skills that were taught, it might take them
several hours to be able to demonstrate all of those skills. Instead, you can develop a test
that samples the skills that were taught and can be administered in 40 minutes. Hopefully,
the briefer test will be as effective in measuring their skills as would the longer procedure.
Essentially, you are using the test score as an estimate of how the students would do using
all of the skills taught. In this case the longer procedure is the criterion and, hopefully, the
shorter test demonstrates criterion-related evidence of validity.

Concurrent Validity
Criterion-related evidence of validity comes in two forms, the first of which is known as
concurrent validity. Concurrent validity is demonstrated when a test is correlated with another
measure of the same behavior or skill that is taken at about the same time as the test is given. The
test measures the student’s current skill level

A test has concurrent validity if it displays a positive correlation with another method of
measuring the same behavior or skill given at about the same time.

For another example of concurrent validity, imagine that you are a driver’s education
teacher and that you have developed a paper-and-pencil test of driving skill. After you
administer the written test to your students you also evaluate them on a driving course
where they actually have to deal with simulated driving challenges. If the scores on the
written driver’s test correlate well with the scores that the students received in negotiating
the driving course, then you have demonstrated evidence that your written test has
concurrent validity

Predictive Validity
The other type of criterion-related evidence of validity is known as predictive validity.
We can sometimes use a test to predict future performance. Good examples of this
include the SATs and the ACTs.

A test is said to have predictive validity if it is positively correlated with some future
behavior or skill

Intelligence tests are also expected to have predictive validity. When Alfred Binet
developed the first intelligence test in France in the early 1900s, it was expected to be able
to predict those children who would be successful in school and those who would struggle
with the academic demands of the classroom. Since intelligence tests do predict future school performance, that is the primary reason for using them in school.

With criterion-related evidence of validity we are correlating a test with some other measure. Therefore, the correlation coefficient is frequently interpreted as a validity coefficient.
Construct-Related Evidence of Validity
A measurement device is said to display construct-related evidence of validity if it measures what
the appropriate theory says that it should be measuring.

Sometimes, it is difficult to measure construct-related evidence of validity because you


must have another way (besides the test) to measure the construct. Finding another way to
measure the construct can require creativity. In other cases, we must rely on a number of
ways to measure the construct. For example, if you develop a new intelligence test, you
have to relate it to the way you define intelligence. If it is defined as the ability to solve
problems, you would need to find other ways to measure problem-solving skills. You might actually need to have a group of people solve problems in a
variety of contexts and then correlate their success in this with the scores they received on
the intelligence test. A high correlation would provide evidence of construct-related
validity. In other situations, it is easier to demonstrate construct-related evidence of
validity. If there is already a test available that has a reputation for displaying construct-
related evidence of validity, then you only have to administer your test and the established test
to the same group of people and correlate the test results. Since the Stanford-Binet and
Wechsler intelligence tests have such good reputations, most new intelligence tests
are compared to them. A high correlation with one of those tests is often one source of
adequate evidence of construct-related validity.

Part III Testing Hypothesis


This chapter introduces the basic concepts of hypothesis testing. A statistical
hypothesis is a conjecture about a population. There are two types of statistical
hypotheses: the null and the alternative hypotheses. The null hypothesis states that
there is no difference, and the alternative hypothesis specifies a difference. To test
the null hypothesis, researchers use a statistical test. Many test values are computed
by using the general formula

Test value = (observed value − expected value) / standard error

Researchers compute a test value from the sample data to decide whether the null
hypothesis should be rejected. Statistical tests can be one-tailed or two-tailed,
depending on the hypotheses.
The null hypothesis is rejected when the difference between the population
parameter and the sample statistic is said to be significant. The difference is
significant when the test value falls in the critical region of the distribution. The
critical region is determined by the level of significance of the test. The level is
the probability of committing a type I error. This error occurs when the null
hypothesis is rejected when it is true. Three generally agreed upon significance
levels are 0.10, 0.05, and 0.01.

A statistical hypothesis is a conjecture about a population parameter. This


conjecture may or may not be true.
There are two types of statistical hypotheses for each situation: the null hypothesis
and the alternative hypothesis.

1. The null hypothesis, symbolized by H0, is a statistical hypothesis that states


that there is no difference between a parameter and a specific value, or that
there is no difference between two parameters.
2. The alternative hypothesis, symbolized by H1, is a statistical hypothesis that
states the existence of a difference between a parameter and a specific value,
or states that there is a difference between two parameters.

The level of significance is the maximum probability of committing a type I error.
This probability is symbolized by α (the Greek letter alpha). That is, P(type I error) = α.

Solving Hypothesis-Testing Problems


Step 1 State the hypotheses and identify the claim.
Be sure to state both the null and the alternative hypotheses.

Step 2 Find the critical value(s).

Step 3 Compute the test value.
Step 4 Make the decision to reject or not reject the null hypothesis.
Step 5 Summarize the results.

Testing the Difference Between Two Means of Independent Samples: Using the t Test

The formula

t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

follows the format

Test value = (observed value − expected value) / standard error

where (x̄1 − x̄2) is the observed difference between the sample means, (μ1 − μ2) is
the expected difference (which is equal to zero under the null hypothesis), and
√(s1²/n1 + s2²/n2) is the standard error of the difference between two means.

Assumptions for the t test for independent means when the variances of the populations
are unknown:

1. The samples are random samples.
2. The sample data are independent of one another.
3. When the sample sizes are less than 30, the populations must be normally or
approximately normally distributed.

Example:
Can it be concluded that the means of the two groups below are different at the 0.05
level of significance?

Groups                      A      B
Sample mean                 191    199
Sample standard deviation   38     12
Sample size                 8      10

Assume the populations are normally distributed.

Solution
Step 1. State the hypotheses and identify the claim.
H0: There is no significant difference between group A and group B.
H1: There is a significant difference between group A and group B (claim).

Step 2. Find the critical values. Since the test is two-tailed, α = 0.05, and the
variances are unequal, the degrees of freedom are the smaller of n1 − 1 or n2 − 1. In
this case, the degrees of freedom are 8 − 1 = 7. Hence, from Table F, the critical
values are +2.365 and −2.365.

Step 3. Compute the test value. Since the variances are unequal, use the formula above:

t = (191 − 199) / √(38²/8 + 12²/10) = −8 / √194.9 ≈ −0.57

Step 4. Make the decision. Do not reject the null hypothesis, since −0.57 > −2.365.

Step 5. Summarize the results. There is not enough evidence to support the claim that
the means of the two groups are different.
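The arithmetic in Steps 2 through 4 can be checked with a short script (a sketch in Python; the figures are the summary statistics from the table above):

```python
import math

# Summary statistics from the example (Group A vs. Group B)
mean_a, sd_a, n_a = 191, 38, 8
mean_b, sd_b, n_b = 199, 12, 10

# Standard error of the difference between the two means
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# Test value: observed difference minus expected difference (0 under H0)
t = (mean_a - mean_b - 0) / se

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1
df = min(n_a, n_b) - 1

print(round(t, 2))   # -0.57
print(df)            # 7
```

Since −0.57 lies between the critical values ±2.365, the null hypothesis is not rejected, matching Step 4.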

Try This
1. Hours Spent Watching Television According to Nielsen Media Research, children
(ages 2–11) spend an average of 21 hours 30 minutes watching television per week
while teens (ages 12–17) spend an average of 20 hours 40 minutes. Based on the
sample statistics obtained below, is there sufficient evidence to conclude a difference
in average television watching times between the two groups? Use α = 0.01.
Children Teens

Sample mean 22.45 18.50

Sample Variance 16.4 18.2

Sample size 15 15

2. Test the claim that the means are different. Test at the 0.05 level of significance.
Group I Group II

10.4 12.6 10.6 10.2 8.8

11.1 14.7 9.6 9.5 9.5


10.8 12.9 10.1 11.2 9.3

11.7 13.3 9.4 10.3 9.5

12.8 14.5 9.8 10.3 11.0

Testing the Difference Between Two Means: Dependent Samples

Samples are considered to be dependent samples when the subjects are paired or matched in some
way.
Examples:
1. A researcher may want to design an SAT preparation course to help students raise
their test scores the second time they take the SAT. Hence, the differences
between the two exams are compared.
2. Students might be matched or paired according to some variable that is pertinent to the
study; then one student is assigned to one group, and the other student is assigned to a
second group. For instance, in a study involving learning, students can be selected and
paired according to their IQs. That is, two students with the same IQ will be paired. Then
one will be assigned to one sample group (which might receive instruction by
computers), and the other student will be assigned to another sample group (which might
receive instruction by the lecture-discussion method). These assignments will be done
randomly. Since a student's IQ is important to learning, it is a variable that should be
controlled.

When the samples are dependent, a special t test for dependent means is used.
This test employs the differences in values of the matched pairs. The hypotheses
are as follows:

H0: μD = 0    H1: μD ≠ 0 (or μD > 0 or μD < 0 for a one-tailed test)

where μD is the mean of the population of differences. The test value is

t = (D̄ − μD) / (sD / √n)

with degrees of freedom d.f. = n − 1, where D̄ is the mean of the sample differences,
sD is their standard deviation, and n is the number of pairs.

Try this….

1. Retention Test Scores A sample of non-English majors at a selected college was used in
a study to see if students retained more from reading a 19th-century novel or from
watching it in DVD form. Each student was assigned one novel to read and a different
one to watch, and then they were given a 20-point written quiz on each novel. The test
results are shown below. At α = 0.05, can it be concluded that the book scores are higher
than the DVD scores?
BOOK 90 80 90 75 80 90 84

DVD 85 72 80 80 70 75 80

2. Improving Study Habits As an aid for improving students’ study habits, nine students
were randomly selected to attend a seminar on the importance of education in life. The
table shows the number of hours each student studied per week before and after the
seminar. At α = 0.10, did attending the seminar increase the number of hours
the students studied per week?
Before 9 12 6 15 3 18 10 13 7

After 9 17 9 20 2 21 15 22 6
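Problem 2 can be checked by computing the paired-differences t value directly (a Python sketch using the Before/After data above and the t formula for dependent means, with D = before − after):

```python
import math

before = [9, 12, 6, 15, 3, 18, 10, 13, 7]
after  = [9, 17, 9, 20, 2, 21, 15, 22, 6]

# Differences for each matched pair
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = sum(d) / n                                           # mean of the differences
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # their standard deviation

# t = (D-bar - mu_D) / (s_D / sqrt(n)), with mu_D = 0 under H0
t = d_bar / (s_d / math.sqrt(n))

print(round(t, 2))   # -2.8
```

Since −2.8 falls in the critical region of a left-tailed test at α = 0.10 with d.f. = 8 (critical value −1.397), there is evidence that the seminar increased study hours.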

Correlation and Regression


Correlation is a statistical method used to determine whether a linear relationship
between variables exists. Regression is a statistical method used to describe the nature of
the relationship between variables, that is, positive or negative, linear or nonlinear.
The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

To answer the first two questions, statisticians use a numerical measure to determine whether two
or more variables are linearly related and to determine the strength of the relationship between or
among the variables. This measure is called a correlation coefficient.

Scatter Plots and Correlation


In simple correlation and regression studies, the researcher collects data on two
numerical or quantitative variables to see whether a relationship exists between the
variables.

For example, if a researcher wishes to see whether there is a relationship between


number of hours of study and test scores on an exam, she must select a random sample
of students, determine the hours each studied, and obtain their grades on the exam. A
table can be made for the data, as shown here. Test at the 0.05 level of significance.

Student   Hours of study (x)   Grade (y)
A         6                    82
B         2                    63
C         1                    57
D         5                    88
E         2                    68
F         3                    75

A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the
independent variable x and the dependent variable y.
The scatter plot is a visual way to describe the nature of the relationship between the
independent and dependent variables. The scales of the variables can be different, and
the coordinates of the axes are determined by the smallest and largest data values of
the
variables.

The range of the correlation coefficient is from −1 to +1. If there is a strong positive
linear relationship between the variables, the value of r will be close to +1. If there is a
strong negative linear relationship between the variables, the value of r will be close
to −1. When there is no linear relationship between the variables or only a weak
relationship, the value of r will be close to 0.

The Significance of the Correlation Coefficient
The range of the correlation coefficient is between −1 and +1. When the value of r is near +1 or
−1, there is a strong linear relationship.
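For instance, r for the hours-of-study data above can be computed with the usual raw-score formula (a Python sketch; the formula is the standard one, n∑xy − ∑x∑y divided by the square root of the product of the two corrected sums of squares):

```python
import math

# Hours of study (x) and grades (y) for students A-F
x = [6, 2, 1, 5, 2, 3]
y = [82, 63, 57, 88, 68, 75]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# r = [n*sum(xy) - sum(x)*sum(y)] /
#     sqrt([n*sum(x^2) - (sum x)^2] * [n*sum(y^2) - (sum y)^2])
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r, 3))   # 0.922
```

A value this close to +1 indicates a strong positive linear relationship between hours studied and grade.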

To make this decision, you use a hypothesis-testing procedure. The traditional


method is similar to the one used in previous chapters.
Step 1 State the hypotheses.
Step 2 Find the critical values.
Step 3 Compute the test value.
Step 4 Make the decision.
Step 5 Summarize the results.
The population correlation coefficient is computed by taking all possible (x, y) pairs; it is designated by the
Greek letter ρ (rho). The sample correlation coefficient r can then be used as an estimator of ρ if the following
assumptions are valid.
1. The variables x and y are linearly related.
2. The variables are random variables.
3. The two variables have a bivariate normal distribution.

A bivariate normal distribution means that for the pairs of (x, y) data values, the corresponding y
values have a bell-shaped distribution for any given x value, and the x values for any given y
value have a bell-shaped distribution.

Formally defined, the population correlation coefficient ρ is the correlation computed by using
all possible pairs of data values (x, y) taken from a population.

In hypothesis testing, one of these is true:


H0: ρ = 0    This null hypothesis means that there is no correlation between the x and y
variables in the population.
H1: ρ ≠ 0    This alternative hypothesis means that there is a significant
correlation between the variables in the population.

When the null hypothesis is rejected at a specific level, it means that there is a significant
difference between the value of r and 0. When the null hypothesis is not rejected, it means that
the value of r is not significantly different from 0 (zero) and is probably due to chance.
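One common way to carry out this test is a t statistic with n − 2 degrees of freedom, t = r·√((n − 2)/(1 − r²)). A Python sketch, applied to the r value from the hours-of-study example earlier:

```python
import math

# r = 0.922 from the hours-of-study example, with n = 6 pairs
r, n = 0.922, 6

# t test for the significance of r, with d.f. = n - 2
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(round(t, 2))   # 4.76
```

Since 4.76 exceeds the critical value 2.776 (two-tailed, α = 0.05, d.f. = 4), the null hypothesis ρ = 0 is rejected: the correlation is significant.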

An educator wants to see how the number of absences for a student in her class affects the
student’s final grade. The data obtained from a sample are shown.
Number of absences(x) 10 12 2 0 8 5
Final grade (y) 70 65 96 94 75 82
What is the final grade if a student has 4 absences? 6 absences?

Determination of the Regression Line Equation

a = [(∑y)(∑x²) − (∑x)(∑xy)] / [n(∑x²) − (∑x)²]   or   a = ȳ − b x̄

b = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]

where a is the y′ intercept and b is the slope of the line:

y′ = a + bx
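Applied to the absences data above, the two formulas give the intercept and slope directly, and the resulting line answers the prediction questions (a Python sketch; the rounded values in the comments were computed with these same formulas):

```python
# Number of absences (x) and final grades (y) from the example
x = [10, 12, 2, 0, 8, 5]
y = [70, 65, 96, 94, 75, 82]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

denom = n * sum_x2 - sum_x ** 2

# a = [(sum y)(sum x^2) - (sum x)(sum xy)] / [n(sum x^2) - (sum x)^2]
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
# b = [n(sum xy) - (sum x)(sum y)] / [n(sum x^2) - (sum x)^2]
b = (n * sum_xy - sum_x * sum_y) / denom

def predict(absences):
    return a + b * absences   # y' = a + bx

print(round(a, 2), round(b, 2))                    # 96.78 -2.67
print(round(predict(4), 1), round(predict(6), 1))  # 86.1 80.8
```

The negative slope matches the data: each additional absence lowers the predicted final grade by about 2.67 points.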

Activity
SAT Scores Educational researchers desired to find if a relationship exists between the average
SAT verbal score and the average SAT mathematical score. Several states were randomly
selected, and their SAT average scores are recorded below. Is there sufficient evidence to
conclude a relationship between the two scores?

Verbal x
526 504 594 585 503 589
Math y
530 522 606 588 517 589

The Chi-Square

The chi-square distribution can be used to test the independence of two variables and to test the
homogeneity of proportions (for example, whether the proportion of high school seniors who attend
college immediately after graduation is the same for the northern, southern, eastern, and western
parts of the United States).

Formula for the chi-square test:

χ² = ∑ (O − E)² / E

where O is the observed frequency and E is the expected frequency.

Two assumptions for the chi-square goodness-of-fit test:

1. The data are obtained from random samples.
2. The expected frequency in each category must be 5 or more.

For example:
Suppose a market analyst wished to see whether consumers have any preference among five
flavors of a new fruit soda. A sample of 100 people provided these data:

Cherry   Strawberry   Orange   Lime   Grape
32       28           16       14     10

If there were no preference, one would expect each flavor to be selected with equal frequency.
The degrees of freedom are defined as the number of categories minus 1. Is there enough
evidence to reject the claim that there is no preference in the selection of fruit soda flavors?

Solution
Step 1 State the hypotheses and identify the claim.
H0: Consumers show no preference for flavors (claim).
H1: Consumers show a preference.

Step 2 Find the critical value. The degrees of freedom are 5 − 1 = 4, and α = 0.05. Hence, the
critical value from Table G in Appendix C is 9.488.

Step 3 Compute the test value by subtracting the expected value from the corresponding
observed value, squaring the result and dividing by the expected value, and finding the sum. The
expected value for each category is 100/5 = 20:

χ² = (32 − 20)²/20 + (28 − 20)²/20 + (16 − 20)²/20 + (14 − 20)²/20 + (10 − 20)²/20 = 18.0

Step 4 Make the decision. Since 18.0 > 9.488, reject the null hypothesis.

Step 5 Summarize the results. There is enough evidence to reject the claim that consumers
show no preference for the flavors.
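The chi-square sum in Step 3 can be verified in a few lines (a Python sketch using the observed frequencies from the table, with the expected frequency of 20 per flavor):

```python
observed = [32, 28, 16, 14, 10]   # cherry, strawberry, orange, lime, grape

# Under H0 (no preference), every flavor has the same expected frequency
e = sum(observed) / len(observed)   # 100 / 5 = 20

# Chi-square: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o in observed)

print(chi_sq)   # 18.0
```

Since 18.0 exceeds the critical value 9.488 with 4 degrees of freedom, the claim of no preference is rejected.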

Tests Using Contingency Tables


When data can be tabulated in table form in terms of frequencies, several types of hypotheses can
be tested by using the chi-square test. Two such tests are the independence of variables test and the
homogeneity of proportions test. The test of independence of variables is used to determine
whether two variables are independent of or related to each other when a single sample is selected.
The test of homogeneity of proportions is used to determine whether the proportions for a variable
are equal when several samples are selected from different populations.
Relationship for Ordinal Data: Spearman Rho

The Spearman rank correlation coefficient rS measures the relationship between two
ranked (ordinal) variables:

rS = 1 − [6∑d²] / [n(n² − 1)]

where d is the difference between the ranks of each pair and n is the number of pairs.

Individual   Test x   Test y
1            18       24
2            17       28
3            14       30
4            13       26
5            12       22
6            18       18
7            8        15
8            8        12
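A sketch in Python that ranks the two score columns (giving tied scores the average of their ranks, the usual textbook convention) and applies the rS formula to the eight pairs above:

```python
x = [18, 17, 14, 13, 12, 18, 8, 8]
y = [24, 28, 30, 26, 22, 18, 15, 12]
n = len(x)

def ranks(values):
    # Highest score gets rank 1; tied scores share the average of their ranks
    order = sorted(values, reverse=True)
    return [(2 * order.index(v) + order.count(v) + 1) / 2 for v in values]

# Squared rank differences for each individual
d2 = [(rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y))]

# Spearman rho: r_s = 1 - 6*sum(d^2) / [n(n^2 - 1)]
r_s = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))

print(r_s)   # 0.5
```

With tied ranks this d²-based formula is a common approximation; rS = 0.5 indicates a moderate positive relationship between the two sets of test ranks.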

F- test

One-Way Analysis of Variance


Use the one-way ANOVA technique to determine if there is a significant
difference among three or more means.
When an F test is used to test a hypothesis concerning the means of three or more
populations, the technique is called analysis of variance (commonly abbreviated as
ANOVA).

With the F test, two different estimates of the population variance are made. The first
estimate is called the between-group variance, and it involves finding the variance of
the means. The second estimate, the within-group variance, is made by computing the
variance using all the data and is not affected by differences in the means.

If there is no difference in the means, the between-group variance estimate will be


approximately equal to the within-group variance estimate, and the F test value will be
approximately equal to 1. The null hypothesis will not be rejected.

However, when the means differ significantly, the between-group variance will be much
larger than the within-group variance; the F test value will be significantly greater than 1;
and the null hypothesis will be rejected. Since variances are compared, this procedure is
called analysis of variance (ANOVA).

For a test of the difference among three or more means, the following hypotheses
should be used:
H0: μ1 = μ2 = μ3 = . . . = μk
H1: At least one mean is different from the others.
The degrees of freedom for this F test are d.f.N. = k − 1, where k is the number of groups, and
d.f.D. = N − k, where N is the sum of the sample sizes of the groups,
N = n1 + n2 + . . . + nk.
The sample sizes need not be equal. The F test to compare means is always right-tailed.

A researcher wishes to try three different techniques to lower the blood pressure of individuals
diagnosed with high blood pressure. The subjects are randomly assigned to three groups; the
first group takes medication, the second group exercises, and the third group follows a special
diet. After four weeks, the reduction in each person’s blood pressure is
recorded. At α = 0.05, test the claim that there is no difference among the means. The data are
shown.
Medication Exercise Diet
10 6 5
12 8 9
9 3 12
15 0 8
13 2 4

Mean 11.8 3.8 7.6


Variance 5.7 10.2 10.3
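The F test value for this example can be computed from the definitions above (a Python sketch; the between-group and within-group mean squares follow the standard one-way ANOVA formulas):

```python
groups = {
    "medication": [10, 12, 9, 15, 13],
    "exercise":   [6, 8, 3, 0, 2],
    "diet":       [5, 9, 12, 8, 4],
}

k = len(groups)                               # number of groups
data = [v for g in groups.values() for v in g]
N = len(data)                                 # total sample size
grand_mean = sum(data) / N

# Between-group sum of squares: n_i * (group mean - grand mean)^2
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())

# Within-group sum of squares: deviations from each group's own mean
ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
                for g in groups.values())

ms_between = ss_between / (k - 1)   # d.f.N. = k - 1 = 2
ms_within = ss_within / (N - k)     # d.f.D. = N - k = 12

F = ms_between / ms_within
print(round(F, 2))   # 9.17
```

With d.f.N. = 2 and d.f.D. = 12, the critical F value at α = 0.05 is 3.89, so 9.17 falls in the critical region and the claim of no difference among the means is rejected.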
