Unit IV - Sampling and Data Analysis
Probability and Non-Probability Sampling
Sampling Method
• Probability Sampling: Simple Random, Systematic, Stratified, Cluster, Multistage
• Non-Probability Sampling: Convenience, Quota, Judgmental, Snowball
• All the items under consideration
in any field of inquiry constitute a
‘universe’ or ‘population’.
• A sample design is a definite plan
determined before any data are
actually collected for obtaining a
sample from a given population.
• With probability samples each …
Simple Random
• This type of sampling is also known as
chance sampling or probability sampling
where each and every item in the
population has an equal chance of
inclusion in the sample and each one of
the possible samples, in case of finite
universe, has the same probability of
being selected.
• For example,
– if we have to select a sample of 300 …
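The equal-chance idea described above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the population of 5,000 numbered items and the function name `simple_random_sample` are assumptions made for the example, and only the sample size of 300 comes from the slide.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of inclusion."""
    rng = random.Random(seed)          # seed only makes the sketch repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: items numbered 1..5000 (the size is an assumption).
population = list(range(1, 5001))
sample = simple_random_sample(population, 300, seed=42)
print(len(sample), "items drawn,", len(set(sample)), "distinct")
```

Because the draw is without replacement, no item can appear in the sample twice.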
Systematic Sampling
• In some instances the most
practical way of sampling is to
select every 15th name on a list,
every 10th house on one side of
a street and so on. Sampling of
this type is known as systematic
sampling. An element of
randomness is usually …
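Selecting every 15th name on a list can be sketched as follows. The list of 150 names and the function name `systematic_sample` are hypothetical, and choosing the starting position at random is one common way of introducing the element of randomness, assumed here rather than taken from the slide.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # assumed source of randomness: the starting point
    return items[start::k]

# Hypothetical list of 150 names; k=15 mirrors "every 15th name on a list".
names = [f"name_{i:03d}" for i in range(1, 151)]
sample = systematic_sample(names, k=15, seed=1)
print(len(sample), "names selected")
```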
Stratified Sampling
• If the population from which a sample is to
be drawn does not constitute a
homogeneous group, then stratified
sampling technique is applied so as to
obtain a representative sample. In this
technique, the population is stratified into
a number of non-overlapping
subpopulations or strata and sample
items are selected from each stratum. If
the items selected from each stratum is …
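A minimal sketch of the stratified technique described above: the population is split into non-overlapping strata and items are drawn from each stratum. The urban/rural records, the 10% sampling fraction and the use of a simple random draw inside each stratum are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Group records into strata, then draw the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        n = max(1, round(len(members) * fraction))  # at least one item per stratum
        sample.extend(rng.sample(members, n))       # random draw within the stratum
    return sample

# Hypothetical population: 600 urban and 400 rural records.
records = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(records, stratum_of=lambda r: r[0], fraction=0.1, seed=7)
counts = {s: sum(1 for r in sample if r[0] == s) for s in ("urban", "rural")}
print(counts)
```

Every stratum is represented in the sample, which is the point of stratifying a non-homogeneous population.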
Cluster Sampling
• Cluster sampling involves grouping the
population and then selecting the groups or
the clusters rather than individual elements
for inclusion in the sample. Suppose a
department store wishes to sample its
credit card holders. It has issued its cards to
15,000 customers. The sample size is to be
kept at, say, 450. For cluster sampling this list of
15,000 card holders could be formed into
100 clusters of 150 card holders each. Three
clusters might then be selected for the
sample randomly. The sample size must
often be larger than the simple random
sample to ensure the same level of accuracy
because in cluster sampling the procedural
potential for order bias and other sources of
error is usually accentuated.
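The card holder example can be worked through in a short Python sketch. The numbers (15,000 card holders, clusters of 150, three clusters) come from the slide. Representing card holders as consecutive integers and forming clusters from consecutive list positions are assumptions made for illustration.

```python
import random

def cluster_sample(units, cluster_size, n_clusters, seed=None):
    """Split units into equal clusters, pick whole clusters at random."""
    rng = random.Random(seed)
    clusters = [units[i:i + cluster_size] for i in range(0, len(units), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [u for cluster in chosen for u in cluster]

card_holders = list(range(15000))          # hypothetical IDs 0..14999
sample = cluster_sample(card_holders, cluster_size=150, n_clusters=3, seed=3)
print(len(sample), "card holders in the sample")
```

Three clusters of 150 give the target sample size of 450, and every member of a chosen cluster is included.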
Multistage Sampling
• This is a further development of the
idea of cluster sampling. This
technique is meant for big inquiries
extending to a considerably large
geographical area like an entire
country. Under multi-stage
sampling the first stage may be to
select large primary sampling units
such as states, then districts, then …
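A two-stage version of this idea can be sketched as below. Only the first two stages named on the slide (states, then districts) are shown. The sampling frame, the state and district names, and the numbers selected at each stage are all hypothetical.

```python
import random

def multistage_sample(frame, n_states, n_districts, seed=None):
    """Stage 1: pick states at random. Stage 2: pick districts within each chosen state."""
    rng = random.Random(seed)
    picked = []
    for state in rng.sample(sorted(frame), n_states):
        for district in rng.sample(frame[state], n_districts):
            picked.append((state, district))
    return picked

# Hypothetical frame: 4 states with 5 districts each.
frame = {f"State {s}": [f"{s}-District {d}" for d in range(1, 6)] for s in "ABCD"}
sample = multistage_sample(frame, n_states=2, n_districts=2, seed=5)
print(sample)
```

Further stages would repeat the same pattern inside each selected district.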
Convenience Sampling
• Deliberate sampling is also known as purposive
or non-probability sampling. This sampling
method involves purposive or deliberate
selection of particular units of the universe for
constituting a sample which represents the
universe. When population elements are
selected for inclusion in the sample based on
the ease of access, it can be called convenience
sampling. If a researcher wishes to secure data
from, say, gasoline buyers, he may select a fixed
number of petrol stations and may conduct
interviews at these stations. This would be an
example of convenience sample of gasoline
buyers. At times such a procedure may give
very biased results particularly when the
population is not homogeneous.
Quota sampling
• In stratified sampling the cost of taking
random samples from individual strata is
often so expensive that interviewers are
simply given quotas to be filled from
different strata, the actual selection of
items for sample being left to the
interviewer’s judgment. This is called
quota sampling.
• The size of the quota for each stratum is
generally proportionate to the size of that …
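Proportionate quotas can be computed with a short sketch like the one below. It assumes the quota for each stratum is proportional to that stratum's size in the population. The age groups, their sizes and the total of 200 interviews are hypothetical, and the function only sets quota sizes: the choice of actual respondents is still left to the interviewer.

```python
def proportional_quotas(stratum_sizes, total_quota):
    """Split total_quota across strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    exact = {s: total_quota * n / total for s, n in stratum_sizes.items()}
    quotas = {s: int(x) for s, x in exact.items()}
    leftover = total_quota - sum(quotas.values())
    # Give any remaining interviews to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)
    for s in by_fraction[:leftover]:
        quotas[s] += 1
    return quotas

# Hypothetical strata sizes in the population.
sizes = {"18-29": 3000, "30-49": 4500, "50+": 2500}
quotas = proportional_quotas(sizes, total_quota=200)
print(quotas)
```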
Judgemental Sampling
• In judgmental sampling the
researcher’s judgement is used for
selecting items which he considers
as representative of the
population.
• For example,
– a judgement sample of college
students might be taken to secure
reactions to a new method of …
Concept of Measurement
• Definition: Metrology is the name given to the
science of pure measurement. Engineering
metrology is restricted to measurements of
length and angle. Measurement is defined as the
process of numerical evaluation of a dimension
or the process of comparison with standard
measuring instruments.
• Quantifying the dependent variable
Importance or Need of measurement
• Research conclusions are only as good as the data on which they
are based
• Observations must be quantifiable in order to subject them to
statistical analysis
• The dependent variable(s) must be measured in any quantitative
study.
• The more precise and sensitive the method of measurement, the
better.
• To establish standards and interchangeability.
• To check customer satisfaction.
• To validate the design.
• To convert a physical parameter into a meaningful number.
• To determine the true dimension.
• To evaluate the performance.
Direct measures
• Physiological measures
– heart rate, blood pressure, galvanic skin response,
eye movement, magnetic resonance imaging, etc.
• Behavioral measures
– in a naturalistic setting.
• example: videotaping leave-taking behavior
(how people say goodbye) at an airport.
– in a laboratory setting
• example: videotaping married couples’
interactions in a simulated environment
Indirect measures
• Relying on observers’ estimates or perceptions
– indirect questioning
• example: asking executives at advertising firms if they think their
competitors use subliminal messages.
• example: asking subordinates, rather than managers, what
managerial style they perceive their supervisors employ.
• Unobtrusive measures.
– measures of accretion, erosion, etc.
• example: studying discarded trash for clues about lifestyles, eating
habits, consumer purchases, etc.
Levels of data
• Nominal
• Ordinal
• Interval (Scale in SPSS)
• Ratio (Scale in SPSS)
Nominal data
• a more “crude” form of data: limited possibilities for statistical analysis
• categories, classifications, or groupings
– “pigeon-holing” or labeling
• merely measures the presence or absence of something
– gender: male or female
– immigration status: documented, undocumented
– zip codes: 90210, 92634, 91784.
• nominal categories aren’t hierarchical, one category isn’t “better” or “higher” than another
• assignment of numbers to the categories has no mathematical meaning.
• nominal categories should be mutually exclusive and exhaustive.
Nominal data-continued
…in science…
“The conclusion of the study was not valid”
(Nelson 1997)
Types of Experimental Validity
• Internal.
• External.
Components of Attitude: Cognitive, Affective, Intention or Action
Attitude
• Cognitive component: Represents an
individual’s information and knowledge about an
object. Example: remembering facts about
Tupperware.
• Affective component: Summarizes a person’s
overall feeling or emotions towards the object.
Example: liking the tasty food cooked in a pressure cooker.
• Intention or Action component: Reflects a
person’s expected future behavior towards an
object. Example: the intention to buy things
in future.
Scaling
• Scaling describes the procedures of assigning
numbers to various degrees of opinion,
attitude and other concepts.
• It may be stated here that a scale is a
continuum, consisting of the highest point (in
terms of some characteristic e.g., preference,
favourableness, etc.) and the lowest point
along with several intermediate points
between these two extreme points.
Classification of Scales
Scaling Techniques
• Paired Comparison
• Rank Order
• Graphic Rating Scale
• Itemized Rating Scale
• Semantic Differential
Comparative Scale – Paired Comparison
       A  B  C  D  E
A      -  1  0  1  0
B      0  -  0  1  0
C      1  1  -  1  0
D      0  0  0  -  0
E      1  1  1  1  -
Total  2  3  1  4  0
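The Total row of the matrix above is the sum of each column, which the following sketch reproduces. The matrix values are copied from the slide. Reading a higher column total as "preferred more often" is an assumption about the coding convention, which the slide does not state.

```python
brands = ["A", "B", "C", "D", "E"]

# Rows and columns follow the slide; None marks the diagonal (a brand vs itself).
matrix = [
    [None, 1, 0, 1, 0],
    [0, None, 0, 1, 0],
    [1, 1, None, 1, 0],
    [0, 0, 0, None, 0],
    [1, 1, 1, 1, None],
]

def column_totals(matrix):
    """Sum each column, skipping the diagonal cells."""
    size = len(matrix)
    return [sum(row[j] for row in matrix if row[j] is not None) for j in range(size)]

totals = dict(zip(brands, column_totals(matrix)))
# Assumed interpretation: a larger total means the brand was chosen more often.
ranking = sorted(brands, key=lambda b: totals[b], reverse=True)
print(totals)
print(ranking)
```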
Comparative Scale – Rank Order Scale
• In Rank order scaling, respondents are presented with several objects simultaneously and asked to order or rank them according to some criterion. Consider, for example the following question:
Soft Drink  Rank
Coke
Pepsi
Limca
Sprite
Mirinda
Seven up
Fanta
Comparative Scale – Constant Sum Rating Scale
• In this the respondents are asked to allocate a total of 100 points between various objects and brands. The respondent distributes the points to the various objects in the order of his preference.
School                Points
DPS
Jagran Public School
Mount Litera
DAV Public School
Jaipuria
Subeam International
Atulanand
Heritage
Total                 100
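A constant sum response is only usable if the points add up to the required total, which can be checked with a sketch like this. The school names come from the slide. The points given to each school are invented for illustration and are not survey data.

```python
def check_constant_sum(allocation, total=100):
    """Validate one respondent's allocation and return objects in order of preference."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"points must add up to {total}")
    return sorted(allocation, key=allocation.get, reverse=True)

# Hypothetical answer from one respondent (the numbers are made up).
points = {
    "DPS": 25,
    "Jagran Public School": 10,
    "Mount Litera": 15,
    "DAV Public School": 20,
    "Jaipuria": 10,
    "Subeam International": 5,
    "Atulanand": 10,
    "Heritage": 5,
}
ranking = check_constant_sum(points)
print(ranking[0], "received the most points")
```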
Comparative Scale – Q-Sort
• The Q-sort technique was developed to discriminate
among a large number of objects quickly. This
technique makes use of the rank order procedure in
which objects are sorted into different piles based on
their similarity with respect to a certain criterion.
• Example: Data can be sorted into five piles
– Strongly agree
– Somewhat agree
– Neutral
– Somewhat disagree
– Strongly disagree
Non Comparative Scales
• In this the respondents do not make use of any frame of reference before answering the questions. The resulting data is generally assumed to be interval or ratio scale.
• The respondent may be asked to evaluate the quality of food in a restaurant on a five point scale. (1=very poor, 2=poor, and 5=very good)
• Types of non-comparative scales
– Graphic Rating Scales
– Itemised Rating Scale
• Likert Rating Scale
• Semantic Differential Rating Scale
• Stapel Rating Scale
Graphic Rating Scale
• This is a continuous scale, also called a graphic
rating scale. In the graphic rating scale the
respondent is asked to place a tick mark on a line
in response to the following question:
Least Preferred ______________ Most Preferred
Itemized Rating Scale
• The respondents are provided with a scale that
has a number or description associated with
each of the response categories.
• Issues related to the Itemized Rating Scale
– Number of categories to be used.
– Odd or even number of categories.
– Balanced versus unbalanced scales.
– Nature and degree of verbal description.
– Forced versus non-forced scales.
– Physical form.
I. Analysis of Data
• “in the process of analysis,
– relationships or differences supporting or conflicting
– with original or new hypotheses
– should be subjected to statistical tests of significance
– to determine with what validity data can be said to
indicate any conclusions”
• Analysis serves the purpose of
– Giving proper results.
– Making data available for interpretation.
– Establishing relationships between different variables.
II. Processing Operations
1. Editing
2. Coding
3. Classification
4. Tabulation
i) Editing
• Editing is done to assure that the data are
accurate, consistent with other facts
gathered, uniformly entered, as complete as
possible and have been well arranged to
facilitate coding and tabulation.
a. Field Editing and Central Editing.
I. The interviewer, editor and
coder should remain in constant touch.
II. The objective of editing should remain the
same throughout the editing
process.
III. Apart from numerical values, we
may use colour for the same.
IV. They should record every
change made in answers.
V. Editor’s initials and the date of
editing should be placed on each
completed form or schedule.
ii) Coding
• Coding refers to the process of
assigning numerals to responses.
• Coding is necessary for efficient
analysis and through it the
several replies may be reduced
to a small number of classes
which contain the critical
information required for
analysis.
• Computer based and manual
iii) Classification
• Classification according to attributes
– can either be descriptive (such as
literacy, sex, honesty, etc.) or numerical
(such as weight, height, income, etc.).
• Classification according to class-intervals
– Data relating to income, production,
age, weight, etc.
– class-intervals
• How many classes should there be?
• How to choose class limits?
– Exclusive/Inclusive.
• How to determine the frequency of each class?
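The questions above (how many classes, which limits, what frequency) can be made concrete with a small sketch. It uses the exclusive method, in which an item equal to the upper limit of a class is counted in the next class. The income values, the class width of 10 and the four classes are hypothetical.

```python
def frequency_by_class(values, lower, width, n_classes):
    """Count values per class interval using exclusive upper limits."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)   # class number of this value
        if 0 <= idx < n_classes:          # values outside all classes are ignored
            counts[idx] += 1
    return [(lower + i * width, lower + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical incomes (in thousands).
incomes = [12, 20, 25, 30, 38, 40, 41, 49, 10, 19]
table = frequency_by_class(incomes, lower=10, width=10, n_classes=4)
for low, high, count in table:
    print(f"{low}-{high}: {count}")
```

The value 20 falls in the class 20-30, not 10-20, because the upper limit is excluded.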
iv) Tabulation.
• Tabulation is the process of summarizing raw
data and displaying the same in compact
form (i.e., in the form of statistical tables) for
further analysis.
• Tabulation is an orderly arrangement of data
in columns and rows.
Tabulation is essential because of the following
reasons.
1. It conserves space and reduces explanatory
and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and
the detection of errors and omissions.
4. It provides a basis for various statistical
computations.
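As a rough illustration of tabulation, the sketch below arranges raw responses into rows and columns with row totals. The gender and Yes/No responses and the function name `cross_tabulate` are hypothetical.

```python
from collections import Counter

def cross_tabulate(records):
    """Count (row, column) pairs and add a total for each row."""
    counts = Counter(records)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(counts[(r, c)] for c in cols)
    return cols, table

# Hypothetical raw data: (gender, answer) for six respondents.
responses = [("Male", "Yes"), ("Female", "No"), ("Male", "No"),
             ("Female", "Yes"), ("Female", "Yes"), ("Male", "Yes")]
cols, table = cross_tabulate(responses)

print("\t".join(["", *cols, "Total"]))
for row, cells in table.items():
    print("\t".join([row, *(str(cells[c]) for c in cols), str(cells["Total"])]))
```

The row totals make it easy to check that no response was lost, which is the error-detection benefit listed above.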
III. Various Kinds of Charts and Diagrams Used in
Data Analysis
Types of Graph
I. Bar Graph
II. Line Graph
III. Stacked Graph
IV. Pie Graph
V. Pictograph
I. Bar Graph
• It is used to make comparisons about groups of data
[Bar chart: Production by factory]
            Factory A  Factory B  Factory C  Factory D
Production  200        150        145        220
[Clustered bar chart: Series 1, Series 2 and Series 3 for Category 1 to Category 4]
[Bar chart: Wheat by year]
       2011  2012  2013  2014
Wheat  150   175   200   250
I. Bar Graph
      Wheat  Rice  Cereals
2011  150    100   50
2012  175    125   75
2013  200    150   100
2014  225    175   125
[Clustered bar chart: Wheat, Rice and Cereals by year, 2011 to 2014]
II. Line Graph
      Wheat
2011  150
2012  175
2013  200
2014  225
[Line chart: Wheat by year, 2011 to 2014]
II. Line Graph
      Wheat  Rice  Cereals
2011  150    100   50
2012  175    125   75
2013  200    150   100
2014  225    175   125
[Line chart: Wheat, Rice and Cereals by year, 2011 to 2014]
III. Stacked Graph
      Wheat  Rice  Cereals
2011  150    100   50
2012  175    125   75
2013  200    150   100
2014  225    175   125
[Stacked bar chart: Wheat, Rice and Cereals stacked by year, 2011 to 2014]
IV. Pie Graph
[Pie chart: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr]
V. Pictograph
IV. Bar and Pie Diagrams and their
Significance
• A bar chart is particularly useful when one or two categories 'dominate' results.
– It can be very clear and easy to read.
– Most people understand what is presented without having to have detailed statistical knowledge.
– It can represent data expressed as actual numbers, percentages and frequencies.
– A bar chart can represent either discrete or continuous data.
– If the data is discrete there should be a gap between the bars (as in the diagram above).
– If the data is continuous there should be no gap between the bars.
[Bar chart: Series 1]
          Category 1  Category 2  Category 3  Category 4
Series 1  4.3         2.5         3.5         4.5
• A pie chart shows data in terms of proportions of a whole. The 'pie' is divided into segments that represent this proportion. This is done by dividing the angles at the centre.
[Pie chart: Sales]
1st Qtr 58%, 2nd Qtr 23%, 3rd Qtr 10%, 4th Qtr 9%
• It is best used to present the proportions of a sample.
• It is most useful where one or two results dominate the findings.
• It can represent data expressed as actual numbers or percentages.
• Do not use where there are a large number of categories, or where each has a small, fairly equal share, as this can be unclear.
Calculation
No. of Siblings  No. of Students  Conversion  No. of Degrees
0                4                4/30 × 360  48
1                12
2                8
3                3
4                2
More             1
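The conversion shown in the first row (4/30 × 360 = 48) can be applied to every row with a short sketch. The student counts are taken from the table. The function name `pie_angles` is an assumption.

```python
def pie_angles(frequencies):
    """Convert each frequency to its share of the 360 degrees at the centre."""
    total = sum(frequencies.values())
    return {label: count * 360 / total for label, count in frequencies.items()}

# Number of students by number of siblings, as listed in the table (total 30).
siblings = {"0": 4, "1": 12, "2": 8, "3": 3, "4": 2, "More": 1}
angles = pie_angles(siblings)
for label, degrees in angles.items():
    print(f"{label}: {degrees:.0f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no category was left out.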
Test of Hypothesis
• Hypothesis is generally considered the most
important instrument in research. Its main
function is to suggest new functions and
ideas. In social sciences, where direct
knowledge of population parameters is rare,
hypothesis testing is often used for
deciding whether sample data support our
purpose.
WHAT IS A HYPOTHESIS
• Generally, a hypothesis is considered an
assumption or a supposition which has to be
proved or disproved.
• For a researcher, however, a hypothesis is a formal
question that he has to resolve.
• A hypothesis may be defined as a proposition
or a set of propositions set forth as an
explanation for the occurrence of some
specific group of phenomena. A research
hypothesis is a predictive statement capable
of being tested by scientific methods that
relates an independent variable to some
dependent variable.
• Example
• The mileage of automobile A is as
good as that of automobile B.
• The customer loyalty of brand A is better than
that of brand B.
• These hypotheses are capable of being objectively
verified and tested.
Characteristics of a hypothesis
• Hypothesis should be clear and precise
• Hypothesis should be capable of being tested
• Hypothesis should be able to relate to a variable.
• Hypothesis must be limited in scope and must be
specific
• Hypothesis must be stated in very simple terms.
• Hypothesis must be consistent with most known facts.
• Hypothesis must be testable within a reasonable time
• Hypothesis must explain the facts which most need
explaining
Steps for Hypothesis Testing
1. Formulate H0 and H1
2. Select Appropriate Test
3. Choose Level of Significance
4. Calculate Test Statistic (TSCAL)
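The four steps listed above can be walked through with a one-sample, two-sided z-test. This is a sketch under stated assumptions: the z-test is just one possible choice of test, the figures (hypothesised mean 50, known standard deviation 10, sample of 100 with mean 52) are invented, and the final comparison against a critical value is a common continuation that is not shown in the steps above.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))       # calculated test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)    # cut-off for the chosen alpha
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical

# Hypothetical data: H0 says the population mean is 50.
z, critical, p_value, reject_h0 = one_sample_z_test(
    sample_mean=52, mu0=50, sigma=10, n=100, alpha=0.05
)
print(f"z = {z:.2f}, critical = {critical:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up figures the calculated statistic exceeds the critical value, so H0 would be rejected at the 5% level.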