
Thesis Summary

Item response models in psychological assessment

Al. I. Cuza University
Faculty of Psychology and Educational Sciences

PhD Student: Cristian Opariuc-Dan
Coordinator: PhD
A badly constructed, wrongly adjusted or invalid test in psychology is like a rusty scalpel in surgery: even if it does not kill you, it can leave permanent scars.

General structure of the thesis


The thesis has two parts, both including a practical section, followed by conclusions and discussion. The first part, covered in Chapters I-IV, presents the theoretical background of the thesis. We addressed issues related to general aspects of psychological testing, the presentation of item response models, and how to build tests based on item response theory, including self-adaptive tests. The second part contains the results of adjusting the BigFive Plus personality inventory to item response models, together with comparisons between evaluation with conventional tests and evaluation through item response models.

Chapter I
General aspects related to psychological
assessment. Historical perspective
We considered it necessary to start the thesis with a short history of psychological testing in the first chapter, marking the main stages in the evolution of psychological tests: the first experiments in psychology in the 19th century, the materialization of the mental test notion, the Alfred Binet moment, the emergence of non-verbal tests and collective tests, as well as the transition from abilities testing to personality assessment.
Since the aim of the thesis was to compare the two theories, we recognized the value of an overview of the developments of the classical theory of psychological tests, from the Spearman moment to the Gaussian distribution, Pearson's contributions and Kelly's criticisms. We ended our overview by describing the main postulates of classical test theory.
Item response theory was addressed in the same manner in the next section. We illustrated the two major schools: the one initiated by Lord and Novick, the American school of thought, and the European one, stemming from Richardson and Rasch. The merger of the two was carried out by Professor Wright, contributing to the competitiveness of a number of contemporary researchers (Ace, Embretson, Reise, Hambleton, Van der Linden and many others). We did not fail to mention several Romanian researchers in this arena, among whom Prof. Albu, Prof. Rusu, Prof. Pitariu, Prof. Balázsi and Prof. Dobrean from the Babeș-Bolyai University of Cluj-Napoca. The schools from Iași and Timișoara are also represented by the following Romanian researchers: Prof. Constantin, Prof. Havârneanu, Prof. Sava, Prof. Măricuțoiu and others.
In the last section we present the main distinctions between item response theory and classical test theory, as formulated by Susan Embretson and Paul Reise, adding our own views as well. Summing up: in classical test theory, the standard error of measurement is unique and applies to all scores, while in item response theory it varies at each level of the latent factor continuum. In classical test theory, the more items a test has, the more reliable it is; in item response models we show that a short test can be more reliable than a long one. The classical theory claims that comparing scores is ideal if the forms are parallel; item response theory states that comparing scores is ideal when the coverage levels of the latent trait vary between individuals. To build a classical test we need representative samples; if we use item response models, item analysis can be performed without representative sampling, even on simulated data. Classical tests grant significance only if the raw scores are compared with a norm; item response models do not require norms, the meaning of the raw scores being given by comparing their distance to the items. Furthermore, the interval scale properties of conventional tests are achieved somewhat forcedly through the normal distribution, while response models acquire these properties by applying an appropriate measurement model. In classical test theory, mixed items determine an unbalanced total score, while their use in item response theory leads to an optimal model. Demonstrations of these differences are detailed in the paper, concluding that item response theory is not an extension of classical test theory but a theory radically different from it.

We have also shown that in the item response model the focus is not on the test but on the item, thus solving the circular dependency issue found in conventional tests (subjects' results depend on the sample of items, and item properties depend on the sample of subjects).
The first chapter ends with a summary of
the main differences between the two theories,
adapted from Hambleton and Jones.

Chapter II
Item response models
The second chapter aims to be an introduction to item response models, both theoretical and applied. Classical test theory is very simple, its measurement model being unique: the observed score is the sum of the true score and the measurement error. Item response theory no longer provides the same simplicity, being a multi-model theory. The quality of the measurements depends heavily on the model chosen, as the one that best approximates the observed data, and on the fulfillment of a number of assumptions.
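As a notational anchor (our addition, in standard textbook form, not reproduced from the thesis itself): classical test theory decomposes the observed score, while an item response model specifies the probability of a keyed response as a function of the latent trait,

$$X = T + E, \qquad P_i(\theta) = \Pr(U_i = 1 \mid \theta).$$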
We began by describing the concept of latent trait, the characteristics and significance of a measurement model, and the item characteristic function. We felt it vital to present the assumptions of item response theory: unidimensionality, local independence and the model of measurement. The applicative character of the chapter is given by the presentation of techniques for checking these assumptions: Yen's Q3 test for local independence and, for unidimensionality, a series of heuristic techniques (scree-plot and eigenvalue analysis) and statistical ones (the Stout test of essential unidimensionality, the Martin-Löf test, cluster and NOHARM methods). We detailed the calculation formulas; most of the procedures described are implemented in the Psihosoft CATS system.
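As an illustration, here is a minimal sketch of Yen's Q3 statistic under a 2PL model (our own sketch; the function names and the 2PL choice are assumptions, and the Psihosoft CATS implementation may differ):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function: probability of a keyed response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def yen_q3(responses, theta, a, b):
    """Yen's Q3: correlations between item residuals after removing theta.

    responses : (n_persons, n_items) 0/1 matrix
    theta     : (n_persons,) ability estimates
    a, b      : (n_items,) discrimination and coverage (location) parameters
    """
    expected = p_2pl(theta[:, None], a[None, :], b[None, :])
    residuals = responses - expected
    return np.corrcoef(residuals, rowvar=False)  # (n_items, n_items) Q3 matrix
```

Item pairs whose Q3 value is far from zero would be flagged as candidates for local dependence.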
The paper continues by presenting the dichotomous and unidimensional models, which are the most easily understood. Several models were presented: 1PL (Rasch), 2PL (Lord) and 3PL (Birnbaum), providing the item characteristic functions, curves, descriptions and applicability. In order not to remain within the traditional approach, we present other models of this type: the ogival models, unused in practical applications but useful for understanding the reasons for the move to logistic models, the linear logistic model with latent trait (Fischer), the four-parameter logistic model with response time, and the model for repeated-trials items. Although the number of item response models of this type is much larger, we did not continue the presentation because we would have exceeded the estimated volume of the thesis.
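For reference, the standard logistic forms of these models (textbook notation, our addition) can be written as a single expression,

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

with discrimination $a_i$, coverage level (difficulty) $b_i$ and pseudo-guessing $c_i$; the 2PL model sets $c_i = 0$, and the 1PL (Rasch) model additionally constrains all $a_i$ to be equal.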
Unidimensional polytomous item response models are the subject of a separate section. We defined the concepts of response categories and categorical intervals, as well as the response function of the categorical interval and the category response function. Even if they are more complex than the dichotomous ones, we were able to synthesize the functions and characteristic curves of models such as: the nominal response model (Bock), the partial credit model (Masters), the generalized partial credit model (Muraki), the rating scale model (Andersen), the graded response model (Samejima) and the modified graded response model (Muraki). We avoided, wherever possible, the use of sophisticated mathematical concepts and summarized the mathematics in terms of features, usability and applicability.
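As one example, Samejima's graded response model (standard form, our addition) defines a cumulative probability for each category threshold $k$ and obtains the category probabilities by differencing:

$$P^{*}_{ik}(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_{ik})}}, \qquad P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta).$$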
Although item response theory stipulates unidimensionality, this assumption cannot be satisfied every time. Therefore, there are multidimensional item response models, with limited use and not sufficiently studied, but usable in the case of items that saturate more than one factor. First we distinguished between compensatory and non-compensatory models, with the partially compensatory variant of the latter. Then we treated the multidimensional dichotomous models, namely the multidimensional extensions of the 2PL and 3PL models, showing the response surfaces of the items and their mathematical functions. A number of partially compensatory extensions of the dichotomous models were also discussed. In the case of multidimensional polytomous models, we presented the multidimensional generalized partial credit model, the multidimensional partial credit model and the multidimensional graded response model. Towards the end, we mentioned other item response models, but without going into details.
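For instance, the compensatory multidimensional extension of the 2PL model (standard form, our addition) replaces the single latent trait with a vector $\boldsymbol{\theta}$:

$$P_i(\boldsymbol{\theta}) = \frac{1}{1 + e^{-(\mathbf{a}_i^{\top}\boldsymbol{\theta} + d_i)}},$$

where a high level on one dimension can compensate for a low level on another.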
The chapter ends by presenting some selection criteria for item response models, including a general decision-making scheme. Overall, we addressed a number of methods for studying the adequacy of the model to the data; they will be detailed in the next chapter.

Chapter III
Construction of tests based on item response theory
The third chapter is highly applicative and refers to the construction of tests based on item response theory. The section starts with the presentation of general and universal aspects concerning the construction of psychological tests. We showed how to prepare construct maps, defining the constructs, mapping them and operationalizing them. We then approached the elements of item design, presenting descriptive decisions and construct decisions, as well as the expert panel. The response space was then defined, together with the concepts of response (active) pole and distracter pole, mentioning a number of techniques for developing response spaces: phenomenography, the SOLO taxonomy and the Guttman scale.
The last stage of design, the choice of the measurement model, occupies the rest of this chapter. We showed the significance and properties of the measurement scales in IRT, describing the anchoring system, the logit scale, the scale in probabilistic units and the real scores scale. The conclusion is that measurement in item response models differs radically from measurement using classical tests.
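The logit scale mentioned here is the natural logarithm of the odds of a keyed response (standard definition, our addition):

$$\operatorname{logit}(p) = \ln\frac{p}{1-p},$$

so that, for example, $p = 0.5$ corresponds to 0 logits and $p \approx 0.73$ to about 1 logit.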


The item calibration section describes and intends to provide practical guidelines on the main techniques for the initial calibration of items. A series of heuristic techniques, unused in applications, were presented only for the understanding of the concepts; the text then continues with what is really important: methods based on maximum likelihood estimation. The techniques for the simultaneous estimation of item and person parameters (JMLE), the maximum likelihood estimation method (MLE), the marginal maximum likelihood method (MMLE) and Bayesian methods, which include an empirical distribution in the estimates, were described in detail. Even if the mathematics is extremely complex, being able to support a thesis in the field by itself, we tried to make it understandable by presenting and explaining the relationships and by providing a concrete, clearly working algorithm. Thus, we wanted to equip a reader having a minimum knowledge of mathematics to understand and build their own tests based on item response theory.
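In MMLE, for example, the person parameters are integrated out of the likelihood over an assumed population distribution $g(\theta)$ (standard formulation, our addition; this is the approach behind the EM algorithm of Bock and Aitkin (1981) cited in the references):

$$L = \prod_{j=1}^{N} \int \prod_{i} P_i(\theta)^{u_{ij}}\,\bigl[1 - P_i(\theta)\bigr]^{\,1-u_{ij}}\, g(\theta)\, d\theta.$$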
Similarly, we described the methods for estimating the latent trait level of persons, basically the scoring system of item response theory. We detailed the easiest scoring method, maximum likelihood (ML), and two scoring systems used in professional applications, the maximum a posteriori method (MAP) and the expected a posteriori method (EAP), perhaps the most used at present. We did not omit a non-iterative scoring method, the Owen method, nor the description of the role and place the item and test information functions have in assessing the quality and accuracy of the assessment.
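A minimal sketch of EAP scoring under a 2PL model with a standard-normal prior (our own sketch; the function names and the quadrature grid are assumptions):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_score(u, a, b, n_quad=61):
    """EAP estimate of theta for one 0/1 response pattern u,
    with a standard-normal prior approximated on a quadrature grid."""
    theta = np.linspace(-4.0, 4.0, n_quad)              # quadrature nodes
    prior = np.exp(-0.5 * theta**2)                     # N(0, 1), up to a constant
    p = p_2pl(theta[:, None], a[None, :], b[None, :])   # (n_quad, n_items)
    like = np.prod(np.where(u == 1, p, 1.0 - p), axis=1)
    post = like * prior
    post /= post.sum()
    theta_hat = np.sum(theta * post)                    # posterior mean (EAP)
    se = np.sqrt(np.sum((theta - theta_hat) ** 2 * post))
    return theta_hat, se
```

The posterior standard deviation returned alongside the estimate plays the role of the person-specific standard error discussed in Chapter I.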

Chapter IV
Construction of self-adaptive tests

The next chapter considers applications of item response theory, especially through self-adaptive tests. The existence of numerous computerized tests on the market, some of questionable quality, is the reason for the decision to mention some principles for constructing instruments for computer-assisted psychological assessment. Thus, we established the requirements of the human-computer interface, detailing the stimuli presentation system, the response system and the requirements of the data management system. We also briefly presented the main evaluation methods using computerized tests built on item response theory: assessments with fixed items and with adaptive items.
The development of the item bank forms a separate section, because the result of the psychological assessment depends on its quality. We defined a number of characteristics of an item bank and of its design, showing how to build a table of item classification, how to specify a set of constraints, how to calculate the objective function of the item bank and how to determine how many items are needed at each level of the latent factor in order to obtain an effective item bank. We also showed a number of methods for optimizing the item bank so as to obtain the maximum of the information function.
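The information function referred to here determines the precision of measurement; in the 2PL case (standard result, our addition) it is

$$I(\theta) = \sum_i a_i^2\, P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr], \qquad SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}},$$

so optimizing the bank means accumulating information where it is needed on the latent continuum.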


Initial and online calibrations are the next issues addressed. We showed that the items are not immutable; their parameters can change as a result of the overexposure effect. This process, referred to as parameter drift, can be monitored and attenuated by a number of techniques described in detail in the paper. Perhaps one of the curiosities of self-adaptive tests is the way in which items are selected and how they adapt to the subject's answers. This curiosity is satisfied in the section on the automatic selection of items. We showed some strategies for entering a test, a series of methods and techniques for selecting the next item (the most common of which, maximum-information selection, is sketched below), the advantages and disadvantages of each, and some methods for completing the assessment.
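A minimal sketch of maximum-information item selection under a 2PL model (our own sketch; the thesis discusses several strategies, of which this is only the most common):

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of 2PL items at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Pick the not-yet-administered item that is most informative
    at the current ability estimate."""
    info = item_information_2pl(theta_hat, a, b)
    info[list(administered)] = -np.inf  # mask items already shown
    return int(np.argmax(info))
```

Exposure control methods such as those mentioned next exist precisely because this greedy rule, used alone, overexposes the most informative items.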
A number of techniques for controlling exposure and balancing the items were also given, in order to reduce the obsolescence of the item bank, as well as methods for identifying aberrant response patterns, item response theory having strong mathematical mechanisms to control façade (faking) tendencies or random responses.


The role of the first four chapters was to create the conceptual and theoretical base for the construction of tests based on item response theory and to provide a set of practical methods for achieving this. This was done in a total of about 190 pages; the important aspects are dealt with in detail. Some elements (such as response models important for psychology or certain techniques) were covered briefly and perhaps deserved more attention. The latest research published in recent issues of the journal Psychometrika was not included in the thesis; the reasons were its abstract nature and the thesis's word limit. There was the risk of presenting a too voluminous paper, containing elements aimed at the few specialists in this field, without interest for a general audience.

Chapter V
Influence of the psychological assessment model on the accuracy and reliability of results
Chapter five contains over 200 pages and deals strictly with the practical adaptation of a personality inventory (BigFive Plus) to item response theory and with the study of the relations between classical assessment and assessment based on item response theory. Originally, we intended to analyze two instruments: the BigFive Plus personality inventory and the EVIQ intelligence test. We chose not to analyze EVIQ for the following reasons. First, the analysis would have doubled the volume of the chapter. Second, most research on item response theory was performed on aptitude tests, creating the false impression that item response models can be applied only to such instruments. We extended the scope, proposing the term "coverage level of the latent trait" instead of difficulty, and showed that item response theory can be used without problems in the case of personality tests as well. The EVIQ implementation in Psihosoft CATS will be carried out separately, as a result of further study.
The overall objective of this thesis is to investigate the compatibility between evaluations based on item response models and those based on the classical theory. In order to achieve this general objective, we followed several steps: first, we analyzed the test to study the assumptions of the item response models; second, we built a computerized evaluation system based on item response theory; third, we examined the degree of compatibility between the scores on the items of the classic test and of the IRT one, and whether we can speak of a relationship between the psychometric properties of classical and IRT tests.
The research design involves two studies. In the first study we use classical techniques for the construction of psychological tests and for item calibration. Thus, the 240 personality items of the inventory are analyzed at item and scale level, studying the normality of the distributions, the internal consistency and the factorial structure. We then proceed to the analysis of unidimensionality and to the initial calibration using item response models of 2PL or 3PL type. Ideally, a 3PL model would be used, but this was not possible every time. The second study involves the administration of the classical test and of the one based on item response theory to the same group of subjects, separated by a certain period of time, and the study of the relationships between the scores, the discrimination parameters and the coverage levels of the latent trait.
The research hypotheses are simple, clear and precise, in line with the relatively few studies on this subject. We note Lawson's (1991) and Xitao's (1998) research, which used ability tests and found linear relationships. The latest research of Embretson and Hambleton (2009 and 2011), conducted on simulated data, predicts the type of relationship found in our studies. The null hypothesis states that there is no relationship between the results of classical tests and those of the IRT tests. Rejecting the null hypothesis would indicate relations on two levels: at the level of scores, with significant links between the scores achieved on the classical version and on the IRT version, and at the level of the item parameters, discrimination and coverage of the latent trait. That is why we did not use a Rasch-type measurement model: its discrimination parameter could not have been studied. The methods of analysis in the second study are not sophisticated. We investigate the linear nature of the relationship through Bravais-Pearson r bivariate correlations and the existence of differences through the Student t test for paired samples. Since we can assume that the relationship can exist without having a linear character, we also regress the variables on one another through procedures such as curve estimation; the principle of minimal residuals indicates the best relationship. The data analysis programs used are IBM SPSS for Windows and Psihosoft CATS, the latter being used especially in tasks that require item response theory techniques.
The research samples differ: the first study used 4647 subjects, the second a group of 323 students; their characteristics are described in each study.
The analysis of the normality of the distributions was performed at the level of all 30 facets and of the five factors of the instrument. We used tests comparing the observed distribution with the theoretical normal distribution (Kolmogorov-Smirnov), the analysis of the skewness and kurtosis (excess) coefficients, and the analysis of the distance between the observed data and the regression line in relation to the normal distribution. The results, presented in detail in the paper, indicate distributions that deviate significantly from the normal distribution.
Scale consistency analyses were performed similarly to the univariate normality analyses, both on dimensions and factors. We also studied the presence of multiplicative interactions using the Tukey test of nonadditivity, and the multivariate normality of the distributions using the Hotelling T² test. It is noted that a relatively small number of factors rise above the threshold of .70 required for a consistent scale. Most factors have an internal consistency between .60 and .70, which is acceptable for research purposes but questionable for diagnostic use. Also, 8 factors have small levels of consistency; normally they should be excluded from the analysis. The multivariate normal distribution criterion was reached for all variables analyzed, indicating the relevance of the method. The items are non-additive but multiplicative; this, however, is not an error, being caused by their dichotomous nature and the small number of items in the facets.

We did not limit ourselves to the analysis of consistency, but proceeded to investigate the internal structure of the 30 facets as well. Since classical factor analysis cannot be applied in optimal conditions, due to the lack of normality, the presence of multiplicative interactions and the low consistency, we used a nonparametric method based on the analysis of vector and centroid coordinates. This method, called categorical principal components analysis (CATPCA), is described in detail in the article "Principal components analysis for categorical data" in the journal Psychology of Human Resources, Volume 10, no. 2/2012, pages 103-117. This study deals with a significant amount of data and is accompanied by a critical analysis of the items of each factor. The result was three factors to be completely excluded, 16 items to be removed, and only 3 factors having a purely one-dimensional nature. Most factors present a dimensional-axial structure or a two-dimensional structure. The presence of an axis means that the second dimension does not have the specificity of a component, but orientates the main dimension. The axis-dimension distinction was made following the analysis of item saturations, the investigation of centroid coordinates and a critical analysis of the clusters of items.

The results of this analysis will be published in the journal "Annals of the Alexandru Ioan Cuza University", Psychology series; the article is currently under review.
We hypothetically suggested the exclusion of those items or factors. After studying the internal structure of the factors, dimensionality analysis and item calibration followed. Unidimensionality was verified with DIMTEST: the items that most strongly saturate the factor were included in the partitioning set, the other items in the evaluation set. Following this analysis, the problematic items were effectively removed, the unidimensionality checks being made with NOHARM. The results strongly support the CATPCA analysis. Indeed, three factors were completely eliminated, most of the others losing one or two items in order to reach a clearly one-dimensional structure. Calibration considered the 3PL model, the fit of the model being tested through the likelihood ratio statistic. For some latent traits, calibration under the 3PL model failed and we used the Lord (2PL) model. Unfortunately, a single latent trait, the morality factor, strictly complies with the requirements of the 2PL model. For all other factors, the distribution of the observed data at the item level deviates significantly from the model characteristic curve. The items also showed a tendency to concentrate in the middle area of the latent trait continuum for every factor. Both biases result from the origin of the items in classic tests and relate both to the construction mode of the instrument and to the data collection. Even if the results clearly indicate the presence of errors, they have nevertheless led to useful results.
The second study aims to verify the following hypotheses. The first hypothesis supports a link between the latent factor levels of the subjects assessed with the classical test and those of the subjects evaluated through IRT. We note that the classic test was administered with all 240 items, in paper-and-pencil format, while the computerized test had a smaller number of items, lacked the three eliminated factors, and presented the items randomly. Between the two administrations there was a period of 4-5 months. The variables were the z score of each subject on each factor and the estimated theta for each factor; the comparison is possible because both distributions are standardized and strongly compatible. The analyses comprised descriptive, differential and regression techniques. For the item response models, the estimated averages concentrate on the middle of the latent trait continuum, reflecting the origin of the items. The standard errors are very small, as are the standard deviations, and the amplitudes of the distributions are consistent with this orientation toward the average of the latent trait. For the classical items, the amplitudes of the distributions are much larger; the sample dependence is obvious. Assessing subjects with an IRT test, we would conclude average levels of the latent factor, without singling out most people. Using a classic test and a norm built on the 323 subjects, some people would appear to present very high or very low levels of the latent factors, which in reality is wrong.

Significant differences resulted between the results obtained with classical tests and with IRT tests on all latent factors, which once again supports the sample dependence. There is, however, a number of significant linear correlations, with only 4 factors showing no significant relationship between the variables. Nevertheless, the best explanatory models are not linear, but cubic and logistic. Cubic models are characterized by a third-degree equation, while the logistic ones correspond to the inverse of an exponential equation. In our research, along with the nature of the relationship between the scores obtained with tests built on the two theories, we provided the equations for transforming scores based on the cubic and logistic models. These results do not invalidate the research conducted by the authors mentioned, but complement it, as their cubic models resulted from simulated studies. Despite the biases present, we can argue that between an assessment using a classical test and an evaluation with an IRT variant the results are consistent even under a linear relationship, but the best model is not linear, being a cubic or a logistic one, depending on the nature of the latent factor measured.
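Generically (our notation; the thesis reports the fitted coefficients themselves, which are not reproduced here), the two families of transformation equations have the forms

$$\hat{\theta} = b_0 + b_1 z + b_2 z^2 + b_3 z^3 \quad \text{(cubic)}, \qquad \hat{\theta} = \frac{u}{1 + b_0\, e^{-b_1 z}} \quad \text{(logistic, one common parameterization)},$$

where $z$ is the standardized classical score and $\hat{\theta}$ the IRT estimate.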

The second level of analysis focused on the psychometric properties of the items: the coverage level of the latent trait and the discrimination. The second hypothesis states that there are differences between the discrimination of classic items and that of items built on item response theory.


The discrimination of classic items can be evaluated with a point-biserial correlation between item and scale. However, this indicator cannot be compared directly with the discrimination parameter of item response models because of the different scales. Therefore, the common denominator is the logistic scale, the Fisher transformation of the point-biserial correlation coefficient bringing the data to a common metric. The analysis was performed for each dimension and for the entire test, using the same techniques.
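For reference, a one-line sketch of the Fisher transformation (ours; it maps correlations from (-1, 1) onto an unbounded scale before they are compared with IRT discrimination parameters):

```python
import numpy as np

def fisher_z(r):
    """Fisher transformation: z = 0.5 * ln((1 + r) / (1 - r)) = artanh(r)."""
    return np.arctanh(r)

# Example: point-biserial discriminations of three classic items.
print(fisher_z(np.array([0.20, 0.45, 0.70])))
```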
The amplitude of the discrimination parameter for classic items is much lower than that of the IRT items, and the discrimination mean of the latter is also higher, with very low standard errors of the estimates. The same pattern holds for the standard deviations, elements that lead us to the idea of a superior discriminative capacity of IRT items compared to the classic ones. The relationship between these variables also has a linear character, but the best explanatory model is the cubic one, corresponding to an equation of the third degree. This model is also preserved at the level of the dimensions, not only for the whole test.
Clearly, significant differences exist between the two parameters, the tests having different discrimination capabilities, higher for the item response models.

The third hypothesis considers the same model of analysis, only we no longer refer to discrimination but to the coverage of the latent trait. For the classical tests, the coverage level is given by the proportion of active responses. This proportion, however, cannot be directly compared with the corresponding parameter of the IRT items, requiring the one-tailed z score of the proportion under the normal distribution. The result is a probit indicator, comparable with the logit scale of the IRT item parameter. To ensure strict compatibility between scales, the coverage level of the IRT items was also transformed into probits. The amplitude of the distribution for IRT items is much higher in comparison with the classic items, the averages hovering around the middle of the latent trait continuum, slightly toward higher values, with generally no significant differences between means. The standard errors of estimate are small; the standard deviations were, again, higher for the IRT items. This shows that the items assess overall in the same area, so the results can be compared. The fact is supported by the existence of significant and strong linear correlations, without significant differences. The cubic model is required again, the relationship between the two variables having the characteristics of an equation of the third degree.
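A sketch of the scale alignment described here (ours; the exact transformation used in the thesis is not reproduced, so the scaling constant below is an assumption based on the usual logistic-normal conversion):

```python
import numpy as np
from scipy.stats import norm

D = 1.702  # usual constant linking the logistic and normal-ogive metrics

def classical_coverage_probit(p_active):
    """One-tailed z (probit) of a classical item's proportion of active
    responses; rarely endorsed items receive higher coverage values."""
    return norm.ppf(1.0 - np.asarray(p_active))

def irt_coverage_probit(b_logit):
    """Approximate rescaling of an IRT coverage (location) parameter
    from the logistic metric to the probit metric."""
    return np.asarray(b_logit) / D
```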

Despite the difficulties, we have supported with real data what some researchers have shown through simulation studies. Item response models are superior, their estimators are more precise, and many of the limits of the classical theory are overcome. The answer to the original question is positive: yes, we can estimate the amount of latent factor of a subject. This level of accuracy comes with a price, however: the rigors are higher, the mathematical mechanism is complicated, paper-and-pencil assessments cannot be performed, and the item bank needs to be extremely well designed.

The chapter concludes with the limits of the research and the development perspectives. Since in the second study the tests contained different numbers of items, we noted the possible errors this difference can cause, influencing the results. The origin of the items in classic tests leads to average levels of coverage of the latent trait, which is another possible limitation. In the same category falls the impossibility, for most factors, of complying with the measurement model assumption: there are differences between the characteristic curve of the model and the observed data. The development perspectives consider several directions, from the study of multidimensional and polytomous models to the design of a strong mechanism that identifies façade (faking) tendencies and controls random responses.

Chapter VI
Conclusions and discussion
The last chapter proposes a synthesis of the theoretical, practical and research components of the thesis, identifying the main elements presented in the paper. Our intention was to provide a comprehensible summary encompassing the entire approach and to mark the main results and concepts used.

References
1. Andersen, E. (1997). The rating scale model. In W. van der Linden & R. Hambleton (Eds.), Handbook of modern item response theory (pp. 67-84). New York: Springer.
2. Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.
3. Baker, F. B. (2001). The basics of item response theory. Wisconsin: ERIC Clearinghouse on Assessment and Evaluation.
4. Bock, R. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
5. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
6. Bock, R., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
7. Bock, R., & Mislevy, R. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
8. Boekkooi-Timminga, E. (1991). A method for designing Rasch model-based item banks. Paper presented at the annual meeting of the Psychometric Society, Princeton, NJ.
9. Constantin, T., & Macarie, A. (2012). Chestionarul BigFive Plus - Manualul probei. Draft, Universitatea Al. I. Cuza, Facultatea de Psihologie și Științe ale Educației, Iași.
10. Constantin, T., Macarie, A., Gheorghiu, A., Iliescu, M., Fodorea, A., & Caldare, L. (2008). Chestionarul Big Five PLUS - rezultate preliminare. In M. Milcu, Cercetarea Psihologică Modernă: Direcții și perspective (pp. 46-58). București: Editura Universitară.
11. Costa, P., & McCrae, R. (2003). Personality in Adulthood: A Five-Factor Theory Perspective. New York: Guilford Press.
12. Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.
13. Davey, T., & Parshall, C. (1995). New algorithms for item selection and exposure control with computerized adaptive testing. Annual meeting of the American Educational Research Association, San Francisco, CA.
14. DeMars, C. (2010). Item Response Theory. New York: Oxford University Press.
15. Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. New Jersey: Lawrence Erlbaum Associates.
16. Finch, H., & Habing, B. (2007). Performance of DIMTEST- and NOHARM-based statistics for testing unidimensionality. Applied Psychological Measurement, 31, 292-307.
17. Flaugher, R. (1990). Item pools. In H. Wainer, Computerized adaptive testing: A primer (pp. 41-64). Hillsdale, NJ: Lawrence Erlbaum Associates.
18. Glas, C. A. W. (2002). Item calibration and parameter drift. In W. J. van der Linden & C. A. W. Glas, Computerized Adaptive Testing: Theory and Practice (pp. 183-199). New York: Kluwer Academic Publishers.
19. Goldberg, L. (1999). A broad-bandwidth, public-domain personality inventory measuring the lower-level facets of several five-factor models. Personality Psychology in Europe, 7, 7-28.
20. Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47.
21. Hambleton, R., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. London: Sage Publications.
22. Keller, L. A. (2000). Ability estimation procedures in Computerized Adaptive Testing. American Institute of Certified Public Accountants.
23. Kingsbury, G., & Zara, A. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.
24. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
25. McDonald, R. (1967). Nonlinear factor analysis. Psychometric Monographs, 15.
26. Ostini, R., & Nering, M. L. (2006). Polytomous Item Response Theory Models. Thousand Oaks, CA: Sage Publications.
27. Reckase, M. D. (2009). Multidimensional Item Response Theory. New York: Springer.
28. Reckase, M. D. (2010). Designing item pools to optimize the functioning of a computerized adaptive test. Psychological Test and Assessment Modeling, 52(2), 127-141.
29. Samejima, F. (1996). The graded response model. In W. van der Linden & R. Hambleton, Handbook of modern item response theory. New York: Springer.
30. Stan, A. (2002). Testul psihologic. Evoluție, construcție, aplicații. Iași: Polirom.
31. Stocking, M., & Swanson, L. (1998). Optimal design of item banks for computerized adaptive tests. Applied Psychological Measurement, 22, 271-280.
32. van der Linden, W. J., & Pashley, P. J. (2002). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas, Computerized Adaptive Testing (pp. 1-25). New York: Kluwer Academic Publishers.
33. Van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New York: Springer.
34. Wesman, A. (1971). Writing the test item. In R. Thorndike, Educational measurement. Washington, D.C.: American Council on Education.
35. Wilson, M. (2005). Constructing Measures: An Item Response Modeling Approach. New Jersey: Lawrence Erlbaum Associates.
36. Xitao, F. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-373.
