
Northern Luzon Adventist College

Artacho, Sison, Pangasinan


Department of Psychology
Psychological Assessment

PSYCHOLOGICAL TEST DEVELOPMENT

Integration of Faith & Learning/Values


And let us not grow weary of doing good, for in due season we will reap, if we do not give up.
Galatians 6:9

Learning Objectives:
A. Discuss the process of test design and construction.
B. Differentiate and discuss the various scaling techniques.
C. Discuss how to prepare and construct test items.

Lesson Proper
Category of the test according to domain
Maximum Performance Tests (Ability Tests)     Typical Performance Tests
1. Intelligence                               1. Personality
2. Aptitude                                   2. Interest
3. Achievement                                3. Attitude

TEST DESIGN AND CONSTRUCTION


The design and construction of a test vary with the type of test and the purposes for which
it is intended. Tests of ability and personality designed by specialists in psychological
measurement usually require the efforts of many individuals working for extended periods of
time.

1. Planning the Test Content


• Constructing a test requires careful consideration of its specific purposes. Tests serve
many different functions and the process of construction varies to some extent with their
particular purposes.
➢ Ideally, the construction of any test or psychometric device begins by defining the
variables or constructs to be measured and outlining the proposed content. This process is
called conceptualization.
Conceptualization- the process through which we specify precisely what we mean by a
particular term or construct. It is our definition of the term or the construct. For example:
What do you mean by verbal intelligence, neuroticism, sociability, or aggression?
Operationalization- the process of specifying concrete empirical procedures resulting in
the measurement of the concept. Behavioral indicators that are reflective of the concept
are identified.

Module 6 | Psychological Test Development | Prepared by: A. Fiaroque


2. Scaling
➢ Scaling is the process of setting rules for assigning numbers in measurement. It is the
process by which a measuring device is designed and calibrated, and by which numbers
(scale values) are assigned to different amounts of the trait, attribute, or characteristic
being measured.
Historically, the prolific L. L. Thurstone is credited with being at the forefront of efforts to
develop methodologically sound scaling methods.
➢ There are different scaling methods. Some of them are:
a. Likert Scale – one of the most popular scaling methods, especially for measuring
attitudes and in typical performance tests such as personality tests. It is a summative
scale because the final score is obtained by summing all the item ratings. Each item is
usually presented with five (5) alternatives, sometimes seven (7), usually in the form of
an agree-disagree or approve-disapprove continuum.
b. Rating Scale – a grouping of words, statements, or symbols on which judgments of
the strength of a particular trait, attitude, or emotion are indicated. Ratings may be
excellent, good, poor, etc. Facial expressions may also be used to indicate judgment.
Example: Cheating on taxes if you have a chance is:
1 2 3 4 5 6 7 8 9 10
(1 = never justified, 10 = always justified)

c. Semantic Differential- this format asks the examinee to choose a position between two
opposite poles (polarities), usually on a 5- or 7-point scale, such as boring on one
end of the scale and interesting on the other.
How I feel when I am out and among other people
Warm __:__:__:__:__:__:__ Cold
Tense __:__:__:__:__:__:__ Relaxed
Weak __:__:__:__:__:__:__ Strong
Brooks Brothers suit __:__:__:__:__:__:__ Hawaiian shirt

d. Method of Paired Comparison – the process of presenting the test taker with pairs of
stimuli (e.g., statements, photographs, etc.) which they are asked to compare and then
select one based on a set of rules or criteria. E.g., EPI, EPPS, Love Language
Select the behavior that you think would be more justified:
a. cheating on taxes if one has a chance
b. accepting a bribe in the course of one’s duties
e. Guttman Scale - a scaling method that yields ordinal-level measures. Items are
arranged from weaker to stronger expressions of the attitude, belief, or feeling being
measured. All respondents who agree with the stronger items will also agree with the
weaker or milder items.

Do you agree or disagree with each of the following:
a. All people should have the right to decide whether they wish to end their lives.
b. People who are terminally ill and in pain should have the option to have a doctor
assist them in ending their lives.
c. People should have the option to sign away the use of artificial life-support
equipment before they become seriously ill.
d. People have the right to a comfortable life.
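
Two of the scaling methods above lend themselves to a short sketch: the summative scoring of a Likert scale and the ordering property of a Guttman scale. The function names, the sample ratings, and the handling of reverse-keyed items are assumptions made for illustration only; the module itself does not prescribe them.

```python
def likert_total(ratings, reverse_keyed=(), points=5):
    """Sum the item ratings of a Likert scale (summative scoring).

    ratings       -- one rating per item, each from 1 to `points`
    reverse_keyed -- positions of items worded in the opposite direction
                     (an assumption; the module does not discuss them)
    """
    total = 0
    for position, rating in enumerate(ratings):
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
        # A reverse-keyed rating is flipped (5 becomes 1, 4 becomes 2, ...).
        total += (points + 1 - rating) if position in reverse_keyed else rating
    return total


def is_guttman_consistent(agreements):
    """Check a response pattern against the Guttman ordering property.

    agreements -- True/False per item, ordered from the STRONGEST statement
                  to the WEAKEST. Once a respondent agrees with an item,
                  they should also agree with every weaker item after it.
    """
    has_agreed = False
    for agrees in agreements:
        if has_agreed and not agrees:
            return False
        has_agreed = has_agreed or agrees
    return True


# Hypothetical respondent on a five-item, five-point Likert scale.
print(likert_total([4, 5, 2, 1, 3]))                        # 15
print(likert_total([4, 5, 2, 1, 3], reverse_keyed={2, 3}))  # 21

# Items a-d of the example above, from strongest (a) to weakest (d).
print(is_guttman_consistent([False, True, True, True]))     # True
print(is_guttman_consistent([True, False, True, True]))     # False
```

A pattern such as agreeing with statement a but disagreeing with statement b violates the ordering, which is why the second pattern is flagged.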

3. Item Writing
• An item pool is a reservoir or a group of items from which the items in the final
test will be drawn.
E.g., if a test called “Philippine History: 1950 to 1990” is to have 30 questions in its
final version, it would be useful to have as many as 60 items in the item pool.
• Item pooling may be done as follows:
a. Intelligence Tests
A pool of items that presumably measures some aspect of the construct
“intelligence” is assembled. The items may be constructed according to a specific theory
of intelligent behavior or simply with reference to the kinds of tasks that highly
intelligent people presumably perform more effectively than those of lower intelligence.
b. Aptitude Tests
Constructing an aptitude test to be used in the industrial setting requires a task analysis or
job analysis which consists of specifying components of the job so that the test items can
be devised to predict employee performance. The specifications may include critical
incidents- behaviors that are critical to successful or unsuccessful performance. In
school, items that are able to predict school achievement or academic success must be
considered in the test construction.
c. Achievement Tests
Items are constructed based on the rationale and purpose of the test. The items that meet
the objectives are then written.
d. Personality Inventories and Scales
Personality inventories are constructed by combining theoretical, rational, and empirical
approaches.

4. Preparing Test Items


➢ Involves preparation of a detailed outline (e.g., table of specifications) to serve as a guide
in constructing items. Once an outline or table of specifications is prepared, the next step
is to construct the test items.
➢ All test items represent procedures for obtaining information about the examinee but the
amount and kind of information vary with the nature of the task posed by the different
types of items.

➢ Item format- refers to the form, plan, structure, arrangement and layout of the individual
test items. There are two types:
1. Selected-response format – test takers are required to select their response from a set
of alternatives such as multiple-choice, matching, and true or false.
2. Constructed-response format – test takers are required to supply or create the
correct answer not merely to select it. This format is evident in a sentence completion
test and other projective techniques and also in some individual intelligence tests.
• In a multiple-choice item, the statement or question is called the stem, whereas the
incorrect options are called distracters or foils.

➢ One popular classification of test items is the objective type. The crucial feature of an
objective test is not the form of the response, but rather how objectively the items can be
scored.
➢ Multiple-choice type is common among mental ability tests intended for group testing.
The test item is composed of the stem and the response options. The response options
include the correct answer and the distracters.
➢ Some forms of multiple-choice items are Classification, If-Then Conditions, Multiple
Conditions, Oddity, and Relations and Correlates.

1. Classification – the examinee classifies a person, object or condition into one of
several categories designated in the stem.
Example: Jean Piaget is best characterized as a _____________psychologist.
a. Clinical c. Psychometric
b. Developmental d. Social
2. If-Then Conditions- the examinee must determine the correct consequence of one or
more conditions being present.

Example: If the true variance of a test increases and the error variance decreases,
which of the following will occur?
a. Reliability will increase c. The observed variance will decrease
b. Reliability will decrease d. Neither reliability nor observe

3. Multiple Conditions- the examinee uses two or more conditions or statements listed
in the stem to draw a conclusion.
Example: Given that John’s score on a test is 60, the test mean is 59, and the standard
deviation is 2, what is John’s z score?
a. -2.00 c. .50
b. -.50 d. 2.00
4. Oddity- the examinee indicates which option does not belong with the others.
Example: Which of the following names does not belong with the others?
a. Alfred Adler c. Carl Jung
b. Sigmund Freud d. Carl Rogers

5. Relations and Correlates- the examinee determines the relationship between
concepts 1 & 2 and then indicates which of the concepts listed in the option is related
to concept 2 in the same way that concepts 1 & 2 are related.
Example: Mean is to standard deviation as the median is to:
a. Average deviation c. Semi-interquartile Range
b. Inclusive Range d. variance
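
The two computational examples in the list above (the If-Then item on variance and the Multiple Conditions item on John's z score) can be checked with a short sketch. It assumes the classical test theory definitions that observed variance is the sum of true variance and error variance, and that reliability is the ratio of true variance to observed variance. The variance figures are made-up illustrative values; only the z-score inputs come from the item itself.

```python
def reliability(true_variance, error_variance):
    """Reliability as the proportion of observed variance that is true variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance


def z_score(raw_score, mean, standard_deviation):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw_score - mean) / standard_deviation


# Hypothetical variances: true variance rises from 8 to 9, error falls from 2 to 1.
before = reliability(true_variance=8, error_variance=2)
after = reliability(true_variance=9, error_variance=1)
print(before, after)  # 0.8 0.9

# Values from the Multiple Conditions item: score 60, mean 59, SD 2.
print(z_score(60, 59, 2))  # 0.5
```

Under these assumptions reliability increases when true variance rises and error variance falls, and John's z score is .50.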

• Multiple-choice type is also used in personality testing.


Examples:
1. I prefer people who:
a. are reserved b. (are) in between c. make friends easily.
2. Money cannot bring happiness.
a. yes (true) b. in between c. no (false)
3. I am bothered by fears of being inadequate.
a. Disagree b. Somewhat Disagree c. Somewhat Agree d. agree
Some personality tests also use the True or False and Paired-Comparison types.
• Example: True or False
1. T | F: I feel close to members of my family.
2. T | F: My sleep is fitful and disturbed.
• Example: Paired-Comparison
1. Which of these statements is more characteristic of you?
A. I like to solve puzzles and problems that other people have difficulty with.
B. I like to follow instructions and to do what is expected of me.

There are three types of constructed-response items: the completion item, the short-answer item, and the essay item.

1. A completion item requires the examinee to provide a word or phrase that completes a
sentence, as in the following example:
The standard deviation is generally considered the most useful measure of __________.

2. Short-answer item. It is desirable for completion or short-answer items to be written clearly
enough that the testtaker can respond succinctly, that is, with a short answer. There are no
hard-and-fast rules for how short an answer must be to be considered a short answer; a word, a
term, a sentence, or a paragraph may qualify.

3. Beyond a paragraph or two, the item is more properly referred to as an essay item. We may
define an essay item as a test item that requires the testtaker to respond to a question by writing a
composition, typically one that demonstrates recall of facts, understanding, analysis, and/or
interpretation.

e.g. Compare and contrast definitions and techniques of classical and operant
conditioning. Include examples of how principles of each have been applied in clinical as
well as educational settings.

5. Assembling the Test


• After preparing the test items, it is advisable to have them reviewed and edited by
another knowledgeable person.
• Other matters must be decided upon before assembling the test such as:
➢ Test length
➢ Arrangement of items
➢ Answer Sheets
• Test directions in terms of administration, scoring, and interpretation must be clear.

6. Conducting Test Tryout


• Conduct a first trial run by administering the initial draft to a relatively large sample of
intended examinees (pre-test) and conducting item analysis. Revise test based on item
analysis.
• Conduct a second trial run and do reliability and validity testing.
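
The module does not say which statistics the item analysis should include. As a minimal sketch, two commonly used ones are shown here: item difficulty (the proportion of examinees who answer the item correctly) and a discrimination index (the difference in that proportion between the highest-scoring and lowest-scoring examinees). The function names, the group size, and the tryout data are hypothetical.

```python
def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (1) rather than not (0)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, group_size):
    """Difference in item difficulty between the top and bottom scorers.

    item_scores  -- 1/0 per examinee for the item being analysed
    total_scores -- total test score per examinee, in the same order
    group_size   -- number of examinees in each extreme group (an assumption)
    """
    # Examinee positions sorted from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower_group = order[:group_size]
    upper_group = order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper_group) / group_size
    p_lower = sum(item_scores[i] for i in lower_group) / group_size
    return p_upper - p_lower


# Hypothetical tryout data for one item across eight examinees.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [28, 25, 24, 20, 18, 12, 10, 8]

print(item_difficulty(item))                  # 0.5
print(discrimination_index(item, totals, 3))  # 1.0
```

An item that high scorers pass and low scorers fail, as in this made-up data, discriminates well; an index near zero or below would flag the item for revision.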

7. Write the Test Manual


• Among others, included in the test manual are detailed procedures for administration,
scoring, and interpretation of test results.

SCORING MODELS
Perhaps the most commonly used model—owing, in part, to its simplicity and logic—is
the cumulative model. Typically, the rule in a cumulatively scored test is that the higher the
score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the
test purports to measure. For each testtaker response to targeted items made in a particular way,
the testtaker earns cumulative credit with regard to a particular construct.

In tests that employ class or category scoring, testtaker responses earn credit toward placement
in a particular class or category with other testtakers whose pattern of responses is presumably
similar in some way. This approach is used by some diagnostic systems wherein individuals
must exhibit a certain number of symptoms to qualify for a specific diagnosis.
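
The first two scoring models can be sketched as follows. The item labels, the scoring key, the symptom names, and the cutoff are hypothetical and serve only to show the logic: cumulative scoring adds one credit per response made in the keyed direction, while class scoring checks whether enough criteria are met for placement in a category.

```python
def cumulative_score(responses, scoring_key):
    """Count the responses that match the keyed direction for the construct."""
    return sum(
        1 for item, answer in responses.items()
        if scoring_key.get(item) == answer
    )


def meets_category(symptoms_present, criterion_symptoms, minimum_count):
    """Class scoring: is the required number of criterion symptoms present?"""
    matched = set(symptoms_present) & set(criterion_symptoms)
    return len(matched) >= minimum_count


# Hypothetical three-item true/false scale keyed "True" on every item.
key = {"q1": True, "q2": True, "q3": True}
answers = {"q1": True, "q2": False, "q3": True}
print(cumulative_score(answers, key))  # 2

# Hypothetical category requiring at least 3 of 5 listed symptoms.
criteria = {"s1", "s2", "s3", "s4", "s5"}
print(meets_category({"s1", "s2", "s3"}, criteria, minimum_count=3))  # True
print(meets_category({"s1", "other"}, criteria, minimum_count=3))     # False
```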

A third scoring model, ipsative scoring, departs radically in rationale from either cumulative or
class models. A typical objective in ipsative scoring is comparing a testtaker’s score on one scale
within a test to another scale within that same test.
E.g., a personality test called the Edwards Personal Preference Schedule (EPPS) is designed
to measure the relative strength of different psychological needs. The EPPS ipsative scoring
system yields information on the strength of various needs in relation to the strength of other
needs of the testtaker. The test does not yield information on the strength of a testtaker’s need
relative to the presumed strength of that need in the general population. Edwards constructed his
test of 210 pairs of statements in a way such that respondents were “forced” to answer true or
false or yes or no to only one of two statements. Prior research by Edwards had indicated that the
two statements were equivalent in terms of how socially desirable the responses were. Here is a
sample of an EPPS-like forced-choice item, to which the respondents would indicate which is
“more true” of themselves:
I feel depressed when I fail at something.
I feel nervous when giving a talk before a group.
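
Ipsative scoring of forced-choice pairs can be sketched as a tally of how often each need was chosen. This is not the actual EPPS scoring procedure; the need labels and the choices are illustrative assumptions. The sketch shows why such scores are relative: every pair awards exactly one credit, so the totals always add up to the number of pairs, and a higher count for one need necessarily means a lower count for another.

```python
from collections import Counter


def ipsative_profile(chosen_needs):
    """Rank needs by how often their statement was chosen in forced-choice pairs.

    chosen_needs -- one label per pair: the need represented by the statement
                    the respondent picked as "more true" of themselves
    """
    return Counter(chosen_needs).most_common()


# Hypothetical choices across six forced-choice pairs.
choices = ["achievement", "order", "achievement",
           "affiliation", "achievement", "order"]

profile = ipsative_profile(choices)
print(profile)  # [('achievement', 3), ('order', 2), ('affiliation', 1)]

# The credits always sum to the number of pairs answered.
print(sum(count for _, count in profile))  # 6
```

The profile says that, for this respondent, achievement outranks order and affiliation. It says nothing about how the respondent compares with other people on any of these needs.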

***Prepare for a Quiz on this module.
