
STAT 101 Chapter 2 PPT


Chapter 2: Data Collection

University of the Philippines Cebu


Mathematics and Statistics Programs

February 15, 2024

(STAT101- Elementary Statistics) Chapter 2: Data Collection February 15, 2024 1 / 35


Statistical Inquiry

Outline

1 Measurement and Levels of Measurement


2 Methods of Data Collection
3 Sampling Methods
4 Questionnaire Construction

2. Data Collection

2.1 Measurement and Levels of Measurement


Measurement
Measurement is the process of assigning a number or a numerical value to
a characteristic of the object that is being measured.

Properties of Numbers
Identity - the property of a number that enables a person to distinguish one number
from another; numbers with only this property are used for classification purposes only.
Order - the property that the numbers can be arranged in a meaningful sequence.
Additivity - the property that allows us to add numbers measured on equal scales.


Levels of Measurement
1 Nominal Scale - possesses only the property of identity (e.g. religion, gender).
2 Ordinal Scale - possesses the properties of both identity and order (e.g. education
level, customer satisfaction rating).
3 Interval Scale - possesses the properties of identity, order, and equality of scale but
does not have the property of absolute zero (e.g. temperature in degrees Celsius).
4 Ratio Scale - possesses all the properties of identity, order, equality of scale, and
absolute zero (e.g. height, weight).
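
The cumulative structure of the four scales can be sketched in Python. This is only an illustrative encoding: the names SCALE_PROPERTIES and has_property are assumptions made for this sketch and are not part of the course material.

```python
# Properties possessed by each level of measurement, as listed in the slide.
# SCALE_PROPERTIES and has_property are hypothetical names for illustration.
SCALE_PROPERTIES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "order"},
    "interval": {"identity", "order", "equality of scale"},
    "ratio": {"identity", "order", "equality of scale", "absolute zero"},
}


def has_property(scale: str, prop: str) -> bool:
    """Return True if the given scale possesses the given property."""
    return prop in SCALE_PROPERTIES[scale]


# Each scale keeps every property of the scale below it and adds one more.
levels = ["nominal", "ordinal", "interval", "ratio"]
for lower, higher in zip(levels, levels[1:]):
    assert SCALE_PROPERTIES[lower] < SCALE_PROPERTIES[higher]  # proper subset

print(has_property("interval", "absolute zero"))  # False
print(has_property("ratio", "absolute zero"))     # True
```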


Exercise 2.1: Identify the most appropriate level of measurement of the given variables.

1 IQ scores
2 Education Level
3 Income
4 Ethnicity
5 Nationality
6 Allowance of a student (in pesos)
7 Marital status
8 Job Ranking


Exercise 2.1 Answers: Identify the most appropriate level of measurement of the given variables.

1 IQ scores - Interval
2 Education Level - Ordinal
3 Income - Ratio
4 Ethnicity - Nominal
5 Nationality - Nominal
6 Allowance of a student (in pesos) - Ratio
7 Marital status - Nominal
8 Job Ranking - Ordinal

2.2 Methods of Data Collection

1 Use of Documented Data - recorded data from either a primary or secondary source.
• Primary Data - documented by the primary source, that is, by the data
collectors themselves (e.g. Central Bank, PSA, Pulse Asia, DOH).
• Secondary Data - documented by a secondary source, that is, an agency or
entity other than the original data collectors (e.g. data that a medical student
documents for his/her research paper but that were originally collected by the DOH).
2 Survey - a method of collecting data on the variable of interest by asking people
questions. The people who answer the questions in a survey are the respondents.
• Census - data come from asking all the people in the population (e.g.
Philippines Population Census).
• Sample Survey - data come from asking a sample of people selected from a
well-defined population (e.g. Labor Force Survey).
3 Experiment - a method of collecting data where there is direct human
intervention on the conditions that may affect the values of the variable of interest
(e.g. an experiment on the effect of sunlight on the growth of monggo seeds).

4 Observation - a method of collecting data on the phenomenon of interest by
recording the observations made about the phenomenon as it actually happens
(e.g. studying the culture and norms of a particular ethnic group).
5 Interview - a mostly face-to-face, personal approach to data collection.
It is mostly used in qualitative research, where the inquiry starts with a set
of questions but can be expanded to other sets based on the responses of the
interviewee.
6 Focus Group - a set of people (usually six to twelve participants) whom you
interview together to gather responses that will add value to your inquiry.
This allows interaction among your respondents, which may give more
valuable inputs. You can serve as the moderator while documenting information
from their interaction, or you can have another moderator while you observe and
document the interaction.

2.3 Sampling Methods


Sampling methods can be classified as Probability Sampling and Non-Probability Sampling.

Probability Sampling
Probability sampling is based on the principle that every element in the
population has a non-zero chance of being chosen.

Non-probability Sampling
Non-probability sampling methods do not use any randomization
mechanism in identifying the sampling units included in the sample.
Rather, they allow the researcher to choose the units in the sample
subjectively.

Probability Sampling
1 Simple Random Sampling (SRS)
• Simple Random Sampling With Replacement (SRSWR)
• Simple Random Sampling Without Replacement (SRSWOR)
2 Systematic Random Sampling
3 Stratified Random Sampling
4 Cluster Sampling
5 Multistage Sampling

Non-probability Sampling
1 Haphazard or Convenience Sampling
2 Judgment or Purposive Sampling
3 Quota Sampling
4 Snowball Sampling

Simple Random Sampling (SRS)

SRS is a probability sampling method wherein all possible subsets
consisting of n elements selected from the N elements of the population
have the same chances of selection.

Simple Random Sampling
Simple random sampling can be done with replacement (with repetition) or
without replacement (without repetition).
SRSWR
In SRSWR, the n elements in the sample need not be distinct, that is, an
element can be selected more than once to be a part of the sample.

SRSWOR
In SRSWOR, all of the n elements in the sample must be distinct from
each other.

Probability of Inclusion
The probability that an element will be included in the selected sample is
$1 - \left(\frac{N-1}{N}\right)^{n}$ (in SRSWR) and $\frac{n}{N}$ (in SRSWOR).
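
The reasoning behind the two inclusion probabilities can be checked numerically. Under SRSWR, a given element is missed on any single draw with probability (N − 1)/N, so over n independent draws it is included at least once with probability 1 − ((N − 1)/N)^n. Under SRSWOR, each element has probability n/N of being in the sample. The function names below are assumptions made for this sketch.

```python
def inclusion_prob_srswr(N: int, n: int) -> float:
    """Probability that a given element appears at least once
    in n draws made with replacement from N elements."""
    miss_one_draw = (N - 1) / N      # element not picked on a single draw
    return 1 - miss_one_draw ** n    # complement of missing all n draws


def inclusion_prob_srswor(N: int, n: int) -> float:
    """Probability that a given element is among the n distinct
    elements drawn without replacement from N elements."""
    return n / N


# Example with N = 50 and n = 13 (hypothetical values for illustration).
print(inclusion_prob_srswr(50, 13))
print(inclusion_prob_srswor(50, 13))  # 0.26
```

For the same N and n, the SRSWR probability is never larger than the SRSWOR probability, because a sample drawn with replacement may spend some draws on repeats.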

How to draw/get a random sample of size n from a population of N elements?


By random selection mechanism.


Simple Random Sampling (SRS)

Randomization Mechanism
1 The Lottery Method - drawing n slips of paper (or balls) from an opaque bowl (or any
nontransparent container) containing N slips of paper.
2 Generating random numbers using a calculator - using the syntax RanInt#(1, N),
where N is the population size.
3 Generating random numbers using Microsoft Excel - using the syntax
= RANDARRAY(n, 1, 1, N, TRUE) for SRSWR and
= INDEX(UNIQUE(RANDARRAY(n^2, 1, 1, N, TRUE)), SEQUENCE(n)) for
SRSWOR, where n is the sample size and N is the population size.
4 The use of a Table of Random Numbers - a table of computer-generated random numbers.
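
As a fifth option alongside the mechanisms above, the same draws can be sketched with Python's standard random module. This is a minimal sketch, assuming the population elements are labeled 1 to N; the function names srswr and srswor are chosen here for illustration.

```python
import random


def srswr(N: int, n: int, seed=None) -> list[int]:
    """Draw n labels from 1..N with replacement (repeats allowed)."""
    rng = random.Random(seed)
    return [rng.randint(1, N) for _ in range(n)]  # randint includes both ends


def srswor(N: int, n: int, seed=None) -> list[int]:
    """Draw n distinct labels from 1..N without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, N + 1), n)


# Example: a sample of n = 13 from a population of N = 50.
print(srswr(50, 13, seed=1))   # may contain repeated labels
print(srswor(50, 13, seed=1))  # all labels are distinct
```

Passing a seed makes the draw reproducible, which helps when the selection has to be documented; leave it as None for a fresh draw each time.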


SRS Illustration:

Systematic Random Sampling

Systematic sampling is a probability sampling method wherein the selection
of the first element is at random and the selection of the other elements in
the sample is systematic by subsequently taking every kth element from
the random start r, where k is the sampling interval and is given by the
formula k = N/n.

Systematic sampling is
• reliable when the arrangement of the elements in a population is
according to the magnitude of the variable of interest.
• less reliable when there are periodic regularities.
Example: Suppose we wish to conduct a survey on the opinions of senior BA Com
students on the computerized registration system. Moreover, suppose the list contains
the names of N = 50 seniors arranged alphabetically, and our sample size is n = 13. Use
systematic random sampling to obtain the sample of 13 seniors.

1 Calculate k = N/n:
Given N = 50 and n = 13, k = 50/13 ≈ 3.846. The greatest integer
(greatest integer function or floor function) of 3.846 is 3, therefore
k = 3.
2 Use a randomization mechanism to generate a number r from 1 to 50,
the positions in the list of 50 seniors:
Suppose the result is r = 15. Then the first element in our sample is
the 15th student in the list (r + 0k = r = 15).

3 Generate the other 12 students, wrapping around to the start of the list
whenever r + jk exceeds N = 50:
r + 1k = 15 + 1(3) = 18
r + 2k = 15 + 2(3) = 21
r + 3k = 15 + 3(3) = 24
r + 4k = 15 + 4(3) = 27
r + 5k = 15 + 5(3) = 30
r + 6k = 15 + 6(3) = 33
r + 7k = 15 + 7(3) = 36
r + 8k = 15 + 8(3) = 39
r + 9k = 15 + 9(3) = 42
r + 10k = 15 + 10(3) = 45
r + 11k = 15 + 11(3) = 48
r + 12k = 15 + 12(3) = 51, which wraps around to 51 − 50 = 1

Therefore, the students included in the sample survey are the 15th, 18th, 21st,
24th, 27th, 30th, 33rd, 36th, 39th, 42nd, 45th, 48th, and 1st students in the list.


Stratified Random Sampling


Stratified sampling is a probability sampling method where we divide the
population into nonoverlapping subpopulations or strata, and then select
one sample from each stratum. The sample consists of all the samples in
the different strata.


Stratified Random Sampling Illustration:


Example: Suppose we want to get the opinion of 200 college students in
University X, which has 500 students, regarding a particular topic. University X
has 4 colleges with the following numbers of students:

College                                      No. of Students
College of Science                                       186
College of Social Sciences                               115
College of Communication, Art, and Design                 97
School of Management                                     102
Total                                                    500

Determine the sample size in each stratum.
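
One common way to answer this is proportional allocation, where each stratum receives a share of the sample equal to its share of the population. The slide does not state which allocation rule is intended, so proportional allocation is an assumption here, as is the rounding rule (largest remainders first) and the function name.

```python
def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split a total sample size n across strata in proportion to stratum
    size, rounding so that the allocations add up to exactly n."""
    N = sum(stratum_sizes.values())
    alloc = {}
    remainders = {}
    for name, size in stratum_sizes.items():
        # Integer part and remainder of the exact share n * size / N.
        alloc[name], remainders[name] = divmod(n * size, N)
    leftover = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


colleges = {
    "College of Science": 186,
    "College of Social Sciences": 115,
    "College of Communication, Art, and Design": 97,
    "School of Management": 102,
}
print(proportional_allocation(colleges, 200))
# {'College of Science': 74, 'College of Social Sciences': 46,
#  'College of Communication, Art, and Design': 39, 'School of Management': 41}
```

The exact shares are 74.4, 46, 38.8, and 40.8, so rounding to 74, 46, 39, and 41 keeps the total at 200.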


Cluster Sampling
Cluster sampling is a probability sampling method wherein we divide the
population into nonoverlapping groups or clusters consisting of one or
more elements, and then select a sample of clusters. The sample will
consist of all the elements in the selected clusters.


Simple One-stage Cluster Sampling Illustration:


Example: Suppose we wish to conduct an opinion poll survey of
households in Metro Manila. Suppose we decide to include n = 5 cities in
our study. Given the list of all cities in Metro Manila below, select a sample
of households using simple one-stage cluster sampling.

Caloocan Navotas
Las Piñas Parañaque
Makati Pasay
Malabon Pasig
Mandaluyong Quezon City
Manila San Juan
Marikina Taguig
Muntinlupa Valenzuela
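
The selection step of this example can be sketched in Python: treat each city as a cluster, draw 5 of the 16 cities by simple random sampling without replacement, and then survey every household in the selected cities. The function name and the seed value are assumptions made for this sketch.

```python
import random

# The 16 cities listed in the example; each city is one cluster.
CITIES = [
    "Caloocan", "Las Piñas", "Makati", "Malabon",
    "Mandaluyong", "Manila", "Marikina", "Muntinlupa",
    "Navotas", "Parañaque", "Pasay", "Pasig",
    "Quezon City", "San Juan", "Taguig", "Valenzuela",
]


def one_stage_cluster_sample(clusters: list[str], n_clusters: int, seed=None) -> list[str]:
    """Select n_clusters whole clusters by SRS without replacement."""
    rng = random.Random(seed)
    return rng.sample(clusters, n_clusters)


selected = one_stage_cluster_sample(CITIES, 5, seed=2024)
print(selected)  # all households in these 5 cities form the sample
```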


Multistage Sampling
Multistage Sampling is an extension of one-stage cluster sampling.

• It is more cost-efficient than cluster sampling when the clusters are
large and the elements are homogeneous with respect to the
characteristic under study.
• Since there is more than one stage of sampling, there will also be
more than one source of sampling errors.
• The more stages there are, the more sources of sampling variations
there will be. Having more sources of sampling variations makes the
estimation procedure more difficult.
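
A two-stage design, the simplest multistage case, can be sketched as follows: first select clusters at random, then select elements at random within each selected cluster instead of taking all of them. The sampling frame, the barangay and household labels, and the function name are all hypothetical and serve only as an illustration.

```python
import random


def two_stage_sample(frame: dict[str, list[str]], n_clusters: int,
                     n_per_cluster: int, seed=None) -> dict[str, list[str]]:
    """Stage 1: SRS of clusters. Stage 2: SRS of elements in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(frame), n_clusters)  # stage 1
    return {
        cluster: rng.sample(frame[cluster], min(n_per_cluster, len(frame[cluster])))
        for cluster in chosen                         # stage 2
    }


# Hypothetical frame: 6 barangays with 20 households each.
frame = {f"Barangay {b}": [f"{b}-HH{i}" for i in range(1, 21)] for b in "ABCDEF"}
sample = two_stage_sample(frame, n_clusters=3, n_per_cluster=4, seed=1)
for barangay, households in sample.items():
    print(barangay, households)
```

Each stage uses its own random draw, which is why every added stage contributes another source of sampling variation.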


Non-probability Sampling
• does not make use of any randomization mechanism
• subjectively chooses the units as part of the sample

Haphazard or Convenience Sampling


In haphazard or convenience sampling, the sample consists of elements that
are most accessible or easiest to contact.

Judgment or Purposive Sampling


In purposive sampling, respondents are chosen based on the
judgment or opinion of the researcher or upon the advice of certain
experts. The method relies solely on the researcher's expertise in
identifying the criteria of a representative sample.

Quota Sampling
Quota sampling is like stratified random sampling but without
randomization.
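
The contrast with stratified random sampling can be sketched in Python: the groups and their quotas are fixed in advance, but units are taken in whatever order they happen to be encountered until each quota is filled, with no random draw. The student labels, the groups, and the function name are hypothetical.

```python
def quota_sample(arrivals: list[tuple[str, str]], quotas: dict[str, int]) -> list[str]:
    """Accept units in the order encountered until every group's quota is filled.
    No randomization is involved, unlike stratified random sampling."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for unit_id, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(unit_id)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample


# Hypothetical students in the order the interviewer meets them.
arrivals = [
    ("s1", "Science"), ("s2", "Science"), ("s3", "Management"),
    ("s4", "Science"), ("s5", "Management"), ("s6", "Management"),
]
print(quota_sample(arrivals, {"Science": 2, "Management": 2}))
# ['s1', 's2', 's3', 's5']
```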

Snowball Sampling
In snowball sampling, a few initial samples are taken by SRS. The
sample is then expanded through referrals. Sometimes, referrals are done
through social networks, so this method is also called network sampling.

• cost-effective in developing a sample, especially when a sampling frame
is difficult to construct
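
The referral process can be sketched in Python as a walk through a referral network: start from a few initial respondents, add each person they refer, then the people those referrals name, until the target sample size is reached. The names, the referral dictionary, and the function name are hypothetical.

```python
from collections import deque


def snowball_sample(seeds: list[str], referrals: dict[str, list[str]],
                    target_size: int) -> list[str]:
    """Expand an initial set of respondents through their referrals,
    in order of referral, until target_size people are collected."""
    sample = []
    seen = set()
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already interviewed; skip repeated referrals
        seen.add(person)
        sample.append(person)
        queue.extend(referrals.get(person, []))  # ask this person for referrals
    return sample


# Hypothetical referral network: each person lists whom they would refer.
referrals = {
    "Ana": ["Ben", "Cora"],
    "Ben": ["Dino"],
    "Cora": ["Ana", "Eli"],
    "Dino": [],
}
print(snowball_sample(["Ana"], referrals, target_size=4))
# ['Ana', 'Ben', 'Cora', 'Dino']
```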


Non-probability Sampling

• less costly and easier to implement than probability sampling
• sample may not entirely represent the larger population
• nonprobability sample may be biased
• sampling error cannot be determined
• no objective way of assessing the reliability

Group Activity 2

Instruction: Search on the internet for any data from a reliable source and
identify the data collection method and sampling method they used.
Determine also the levels of measurement on each variable included or
defined in the data.
