Lecture 4.bb - Measurement - Part2
We prefer multi-item scales for the same reason that sporting contests employ multiple judges. Each judge's score
contains some error, and averaging across judges cancels much of it out, so the average is more likely to reflect the true value.
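The judging analogy can be checked with a small simulation. This is only a sketch under assumed conditions: the true score of 8.0, the normally distributed judge error, and the function names are all made up for illustration and are not from the lecture.

```python
import random
import statistics

def judge_score(true_value, noise_sd, rng):
    """One judge's score: the true value plus that judge's random error."""
    return true_value + rng.gauss(0, noise_sd)

def mean_abs_error(n_judges, true_value=8.0, noise_sd=1.0, trials=5000, seed=0):
    """Average distance between the panel's mean score and the true value."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        panel = [judge_score(true_value, noise_sd, rng) for _ in range(n_judges)]
        errors.append(abs(statistics.fmean(panel) - true_value))
    return statistics.fmean(errors)

# Hypothetical comparison: one judge versus a panel of five.
single_judge_error = mean_abs_error(n_judges=1)
five_judge_error = mean_abs_error(n_judges=5)
print(f"1 judge:  typical error {single_judge_error:.2f}")
print(f"5 judges: typical error {five_judge_error:.2f}")
```

Under these assumptions the five-judge average lands noticeably closer to the true value than a single judge does. The same logic applies to averaging several items on a multi-item scale.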
Validity
• The extent to which an instrument measures what it is supposed to
measure.
• Validity requires reliability (minimizing error) plus an estimate of the
construct that is true to the construct’s form.
• Example: what does the SAT actually measure?
• intelligence? college-preparedness? success in life? the ability to “pass tests”?
• Example: what does GPA actually measure?
• It is a valid measure of “success in school,” but any interpretation beyond that is invalid.
• Can a measure be reliable but not valid?
• Yes! A measure can give consistent results while consistently missing what it is supposed to measure. See next slide.
Reliability vs. Validity
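The "reliable but not valid" case can be made concrete with a bathroom scale. The sketch below is a made-up illustration, not something from the lecture: the true weight of 150, the 5-unit bias, and the noise level are all assumed values.

```python
import random
import statistics

TRUE_WEIGHT = 150.0  # assumed true value, for illustration only

def simulate_readings(bias, noise_sd, n=1000, seed=1):
    """Repeated readings from a hypothetical scale with a fixed bias and random noise."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT + bias + rng.gauss(0, noise_sd) for _ in range(n)]

def summarize(readings):
    """Return (spread, offset): spread reflects consistency, offset reflects accuracy."""
    spread = statistics.stdev(readings)
    offset = statistics.fmean(readings) - TRUE_WEIGHT
    return spread, offset

# Scale A: very consistent readings, but always about 5 units too heavy.
spread_a, offset_a = summarize(simulate_readings(bias=5.0, noise_sd=0.2))
# Scale C: equally consistent readings that are also centered on the true weight.
spread_c, offset_c = summarize(simulate_readings(bias=0.0, noise_sd=0.2))

print(f"Scale A: spread {spread_a:.2f}, offset {offset_a:+.2f}")
print(f"Scale C: spread {spread_c:.2f}, offset {offset_c:+.2f}")
```

Both scales are reliable, because their readings barely vary. Only Scale C is also valid, because Scale A is consistently off target.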
Kinds of validity we care about
1. Face validity
• You should be able to infer the construct being measured by reading the test
questions.
• As constructs become complex and abstract, face validity loses its appeal.
• Ex: I want to measure “how likely is it that you are lying to me?” without you knowing
that I am measuring your likelihood of lying.
2. Content validity
• Items should assess the construct as broadly, and from as many angles, as the
complexity of the construct dictates.
• Ex: Stony Brook asks all graduating seniors, before they leave: “how satisfied are you
with your experience at Stony Brook University?” on a 1-5 scale.
• Content validity is low, because a single question cannot cover the many different factors that should be
assessed (e.g., housing, cost of tuition, safety, food, fees, faculty, social life, quality of
education, etc.).
3. Construct validity
• The instrument should behave consistently with its underlying theory.
• Essentially, a valid measure should relate to other variables in the patterns that the
theory predicts.
• This kind of validity is the most academic: it is used for theory-building more than for
common research applications.
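One way to picture "relating in predictable patterns" is to check correlations against what the theory expects. The sketch below uses simulated data only. The variables (`trait`, `new_scale`, `related_measure`, `unrelated_measure`) are hypothetical, and the assumed theory is that the new scale should track the related measure and ignore the unrelated one.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)
n_people = 2000

# Simulated underlying trait that both the new scale and the related measure tap into.
trait = [rng.gauss(0, 1) for _ in range(n_people)]
new_scale = [t + rng.gauss(0, 0.5) for t in trait]
related_measure = [t + rng.gauss(0, 0.5) for t in trait]
# A variable the assumed theory says should have nothing to do with the trait.
unrelated_measure = [rng.gauss(0, 1) for _ in range(n_people)]

r_related = pearson_r(new_scale, related_measure)
r_unrelated = pearson_r(new_scale, unrelated_measure)
print(f"new scale vs. related measure:   r = {r_related:.2f}")
print(f"new scale vs. unrelated measure: r = {r_unrelated:.2f}")
```

A strong correlation where the theory predicts one, and a near-zero correlation where it predicts none, is the kind of pattern that supports construct validity.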
Continuous vs. Discrete Data
• Continuous: an infinite number of possible values between any two
points on the measurement scale.
• Many ratio scales are continuous, but not all.
• Discrete: the variable can only take on a limited number of values.
• Nominal and ordinal scales are, by definition, discrete variables.
• Interval scales are usually discrete, but we bypass this by using multi-item
scales and averaging individual items to create a composite score per person.
• The best way to think about interval vs. ratio and continuous vs.
discrete is on the next slide.
Don’t think of the distinction as a defining characteristic.
Think of it as an additional layer.
Ratio, discrete:
• number of drugs
• number of sales from repeat customers
• number of correct responses on Exam 1
Ratio, continuous:
• milligrams of a drug
• percentage of market share
• your Exam 1 score as a percentage of 100
Interval, discrete:
• Likert scale ratings
• 1-10 pain scale
• SAT scores
Interval, continuous:
• Nasdaq composite index
• composite variables from multi-item scales
• your overall GPA
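The step from discrete item ratings to a composite score can be sketched in a few lines. The responses below are invented for illustration, and a four-item scale with 1-5 ratings is an assumption, not something specified in the lecture.

```python
from itertools import product
from statistics import fmean

# Hypothetical data: each person answers four items on a 1-5 Likert scale.
responses = {
    "person_1": [4, 5, 3, 4],
    "person_2": [2, 3, 3, 1],
    "person_3": [5, 4, 4, 4],
}

# Composite score per person: the average of that person's item ratings.
composites = {person: fmean(items) for person, items in responses.items()}
for person, score in composites.items():
    print(person, score)

# Count how many distinct values a single item and a four-item composite can take.
single_item_values = set(range(1, 6))
composite_values = {fmean(combo) for combo in product(range(1, 6), repeat=4)}
print(len(single_item_values), "possible values for one item")
print(len(composite_values), "possible values for a four-item composite")
```

A single item can take only 5 values, while the four-item average can take 17. Adding more items makes the composite finer-grained still, which is why it is treated as approximately continuous.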