JJP PsychAssessment Handouts
PSYCHOLOGICAL ASSESSMENT
(PSYCH 111/111L)
COURSE FACILITATOR:
Dr. Jose J. Pangngay, RPm, RPsy, LPT
A. ANCIENT ROOTS
• Chinese Civilization – testing was instituted as a means of selecting who, of the many applicants, would obtain
government jobs
• Greek Civilization – tests were used to measure intelligence and physical skills
• European Universities – these universities relied on formal exams in conferring degrees and honors
B. INDIVIDUAL DIFFERENCES
• Charles Darwin – believed that despite our similarities, no two humans are exactly alike. Some of these individual
differences are more “adaptive” than others, and these differences lead to more complex, intelligent organisms over
time.
• Francis Galton – established the testing movement; introduced anthropometric records of students;
pioneered the application of the rating-scale and questionnaire methods and the free association technique; and
pioneered the use of statistical methods for the analysis of psychological tests. He used the Galton bar (visual
discrimination of length) and the Galton whistle (determining the highest audible pitch). He also noted that
persons with mental retardation tend to have diminished ability to discriminate among heat, cold, and pain.
E. WORLD WAR I
• Robert Yerkes – pioneered the first group intelligence test known as the Army Alpha (for literate) and Army Beta
(for functionally illiterate)
• Arthur S. Otis – introduced multiple choice and other “objective” item type of tests
• Robert S. Woodworth – devised the Personal Data Sheet (known as the first personality test) which aimed to
identify soldiers who are at risk for shell shock
F. PERSONALITY TESTERS
• Hermann Rorschach – slow rise of projective testing; Rorschach Inkblot Test
• Henry Murray & Christina Morgan – Thematic Apperception Test
• Early 1940s – structured tests were being developed based on their better psychometric properties
• Raymond B. Cattell – 16 Personality Factors
• McCrae & Costa – Big 5 Personality Factors
A. OBJECTIVES OF PSYCHOMETRICS
1. To measure behavior (overt and covert)
2. To describe and predict behavior and personality (traits, states, personality types, attitudes, interests, values,
etc.)
3. To determine signs and symptoms of dysfunctionality (for case formulation, diagnosis, and basis for
intervention/plan for action)
1. Base Rate - An index, usually expressed as a proportion, of the extent to which a particular trait, behavior,
characteristic, or attribute exists in a population
2. Hit Rate - The proportion of people a test or other measurement procedure accurately identifies as possessing
or exhibiting a particular trait, behavior, characteristic, or attribute
a. Sensitivity - percentage of occurrences that are correctly predicted
b. Specificity - percentage of non-occurrences that are correctly predicted
3. Miss Rate - The proportion of people a test or other measurement procedure fails to identify accurately with
respect to the possession or exhibition of a trait, behavior, characteristic, or attribute; a "miss" in this context is
an inaccurate classification or prediction and can be classified as:
a. False Positive (Type I error) - an inaccurate prediction or classification indicating that a testtaker did
possess a trait or other attribute being measured when in reality the testtaker did not
b. False Negative (Type II error) - an inaccurate prediction or classification indicating that a testtaker did not
possess a trait or other attribute being measured when in reality the testtaker did
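The rates defined above can be computed from the four possible outcomes of a classification. The following is a minimal sketch, not a prescribed procedure: the function name and the counts are hypothetical, sensitivity is taken in its standard sense (proportion of actual occurrences correctly identified), specificity as the proportion of actual non-occurrences correctly identified, and the hit rate is assumed here to mean the overall proportion of correct classifications (so that hit rate + miss rate = 1).

```python
def classification_rates(true_pos, false_pos, true_neg, false_neg):
    """Return the basic accuracy indices for a screening test.

    All four arguments are counts of testtakers (hypothetical data).
    """
    total = true_pos + false_pos + true_neg + false_neg
    return {
        # proportion of the group that really has the attribute
        "base_rate": (true_pos + false_neg) / total,
        # assumption: hit rate = all correct classifications / total
        "hit_rate": (true_pos + true_neg) / total,
        # misses = false positives (Type I) + false negatives (Type II)
        "miss_rate": (false_pos + false_neg) / total,
        # occurrences correctly predicted
        "sensitivity": true_pos / (true_pos + false_neg),
        # non-occurrences correctly predicted
        "specificity": true_neg / (true_neg + false_pos),
    }


# Hypothetical screening of 200 testtakers
rates = classification_rates(true_pos=40, false_pos=10, true_neg=140, false_neg=10)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, 50 of 200 testtakers actually have the attribute, so the base rate is 0.25, and the test misclassifies 20 of them in total.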
H. CROSS-CULTURAL TESTING
1. Parameters where cultures vary
– Language
– Test Content
– Education
– Speed (Tempo of Life)
2. Culture Free Tests
– An attempt to eliminate culture so nature can be isolated
– Impossible to develop such because culture is evident in its influence since the birth of an individual
– The interaction between nature and nurture is cumulative and not relative
3. Culture Fair Tests
– These tests were developed because of the non-success of culture-free tests
– Nurture is not removed but parameters are common and fair to all
– Can be done using three approaches such as follows:
✓ Fair to all cultures
✓ Fair to some cultures
✓ Fair only to one culture
4. Culture Loadings
– The extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings
associated with a particular culture
Saint Louis College
City of San Fernando, La Union
2. Professional Regulatory Board of Psychology Resolution No. 11, series of 2017: Adoption and Promulgation of the
Code of Ethics and Professional Standards for Psychology Practitioners in the Philippines
• Ethical Principles
i. Respect for Dignity of Persons and Peoples
– Respect for the unique worth and inherent dignity of all human beings;
– Respect for the diversity among persons and peoples;
– Respect for the customs and beliefs of cultures.
ii. Competent caring for the well-being of persons and peoples
– Maximizing benefits, minimizing potential harm, and offsetting or correcting harm.
– Application of knowledge and skills that are appropriate for the nature of a situation as well as
social and cultural context.
– Adequate self-knowledge of how one’s values, experiences, culture, and social context might
influence one’s actions and interpretations.
– Active concern for the well-being of individuals, families, groups, and communities;
– Taking care to do no harm to individuals, families, groups, and communities;
– Developing and maintaining competence.
iii. Integrity
– Integrity is based on honesty, and on truthful, open and accurate communications.
– Maximizing impartiality and minimizing biases
– It includes recognizing, monitoring, and managing potential biases, multiple relationships, and
other conflicts of interest that could result in harm and exploitation of persons and peoples.
– Avoiding incomplete disclosure of information unless complete disclosure is culturally
inappropriate, or violates confidentiality, or carries the potential to do serious harm to individuals,
families, groups, or communities
– Not exploiting persons or peoples for personal, professional, or financial gain
– Complete openness and disclosure of information must be balanced with other ethical
considerations, including the need to protect the safety or confidentiality of persons and peoples,
and the need to respect cultural expectations.
– Avoiding conflicts of interest and declaring them when they cannot be avoided or are inappropriate
to avoid.
iv. Professional and Scientific responsibilities to society
– We shall undertake continuing education and training to ensure our services continue to be
relevant and applicable.
– Generate research
• General Ethical Standards and Procedures
i. Resolving Ethical Issues
ii. Standards of Professional Competence
iii. Human Relations
iv. Confidentiality
v. Advertisement and Public Statements
vi. Records and Fees
• Ethical Standards and Procedures in Specific Functions
v. Assessment
vi. Therapy
vii. Education and Training
viii. Research
3. Professional Regulatory Board of Psychology Resolution No. 12, series of 2017: Endorsement, Adoption and
Ratification of the International Declaration of Core Competences in Professional Psychology as Part of the IRR
Governing the Practice of Psychology and Psychometrics in the Philippines
LESSON 3: PSYCHOLOGICAL ASSESSMENT TECHNIQUES AND TOOLS
I. TECHNIQUES IN PSYCHOLOGICAL ASSESSMENT
1. Traditional Clinical Assessment Modality
Step 1: Referral: Deciding what is being assessed
Step 2: Determining the goals for assessment
Step 3: Collecting standards for making decisions
Step 4: Collecting assessment data
Step 5: Making decisions and judgments
Step 6: Communicating results
2. Tele-Assessment
a. TECHNICIAN ASSISTED:
– Location: Examiner (Exr) is remote; client (Ct) is in the clinic with a facilitator (faci) OR at home with a facilitator
– Restrictions to Test Selection: Minimal
– Travel Requirements: Faci and/or Ct; Exr if remote site
– Social Distancing?: No
– Client Demands: Minimal
– Tech cost to clinician: Yes
– PPE and other Health requirements: Yes
b. HYBRID:
– Location: Exr and Ct in clinic but in separate rooms or barriers
– Restrictions to Test Selection: Some
– Travel Requirements: Exr and Ct
– Social Distancing?: Yes
– Client Demands: Some
– Tech cost to clinician: No
– PPE and other Health requirements: Yes
c. MODIFIED F2F:
– Location: Exr and Ct in clinic for specific tests; Remote for other procedures
– Restrictions to Test Selection: Minimal
– Travel Requirements: Exr and Ct
– Social Distancing?: Yes (if with barriers)
– Client Demands: Some
– Tech cost to clinician: No
– PPE and other Health requirements: None
d. DIRECT-TO-HOME:
– Location: Exr and Ct are both remote
– Restrictions to Test Selection: Most
– Travel Requirements: None
– Social Distancing?: Yes
– Client Demands: Significant
– Tech cost to clinician: No
– PPE and other Health requirements: None
d. Ratings Recording
2. Self-Report/Rating Scales
3. Analogue Studies: research investigation in which one or more variables are similar or analogous to the
real variable that the investigator wishes to examine. This definition is admittedly very broad, and the
term analogue study has been used in various ways. It has been used, for example, to describe
research conducted with white rats when the experimenter really wishes to learn about humans. It has
been used to describe research conducted with full-time students when the experimenter really wishes
to learn about people employed full-time in business settings. It has been used to describe research on
aggression defined as the laboratory administration of electric shock when the experimenter really
wishes to learn about real-world aggression outside the laboratory.
4. Situational Performance Measure: procedure that allows for observation and evaluation of an individual
under a standard set of circumstances. A situational performance measure typically involves
performance of some specific task under actual or simulated conditions. The road test you took to
obtain your driver’s license was a situational performance measure that involved an evaluation of your
driving skills in a real car on a real road in real traffic. On the other hand, situational performance
measures used to assess the skills of prospective space-traveling astronauts are done in rocket
simulators in laboratories firmly planted on Mother Earth.
5. Role Play: acting an improvised or partially improvised part in a simulated situation; it can be used in
teaching, therapy, and assessment.
6. Psychophysiological Methods: The search for clues to understanding and predicting human behavior
has led researchers to the study of physiological indices such as heart rate and blood pressure. These
and other indices are known to be influenced by psychological factors—hence the term
psychophysiological to describe these variables as well as the methods used to study them. Whether
these methods are properly regarded as behavioral in nature is debatable. Still, these techniques do
tend to be associated with behaviorally oriented clinicians and researchers.
7. Unobtrusive measures: a type of measure quite different from the others listed here is the
nonreactive or unobtrusive variety (Webb et al., 1966). In many instances, an unobtrusive measure is a
telling physical trace or record. In one study, it was garbage—literally (Cote et al., 1985). Because of
their nature, unobtrusive measures do not necessarily require the presence or cooperation of
respondents when measurements are being conducted.
C. There are some errors that may potentially occur in behavioral observations, such as:
• Reactivity – being evaluated increases performance; also called the Hawthorne Effect
• Drift – moving away from what one has learned toward idiosyncratic definitions of behavior; this
suggests that observers should be retrained periodically.
• Contrast Effect – A behavioral rating may be excessively positive (or negative) because a prior rating
was excessively negative (or positive)
M. PSYCHOLOGICAL TESTS AS PSYCHOLOGICAL ASSESSMENT TOOLS
Psychological Tests – a standardized measuring device or procedure used to describe the ability, knowledge, skills
or attitude of the individual
• Measurement – the process of quantifying the amount or number of a particular occurrence of event,
situation, phenomenon, object or person
• Assessment – the process of synthesizing the results of measurement with reference to some norms and
standards
• Evaluation – the process of judging the worth of any occurrence of event, situation, phenomenon, object or
person which concludes with a particular decision
d. Q-Sort Technique: This is a comparative scale that uses a rank order procedure to sort objects based on similarity
with respect to some criterion. The important characteristic of this methodology is that it is more important to
make comparisons among different responses of a respondent than the responses between different
respondents. Therefore, it is a comparative method of scaling rather than an absolute rating scale. In this method
the respondent is given a large number of statements describing the characteristics of a product or a large
number of brands of products.
Example: The bag given to you contains pictures of 90 magazines. Please choose 10 magazines you prefer most,
20 magazines you like, 30 magazines about which you are neutral (neither like nor dislike), 20 magazines you dislike,
and 10 magazines you prefer least.
Prefer Most Like Neutral Dislike Prefer Least
(10) (20) (30) (20) (10)
Strongly Agree 10 9 8 7 6 5 4 3 2 1 Strongly Disagree
b. Itemized Rating Scale: an itemized rating scale is a scale having numbers or brief descriptions associated with each
category. The categories are ordered in terms of scale position and the respondents are required to select one
of the limited numbers of categories that best describes the product, brand, company or product attribute being
rated. Itemized rating scales are widely used in marketing research. This can take the graphic, verbal or numerical
form.
c. Likert Scale: the respondents indicate their own attitudes by checking how strongly they agree or disagree with
carefully worded statements that range from very positive to very negative towards the attitudinal object.
Respondents generally choose from five alternatives (say strongly agree, agree, neither agree nor disagree,
disagree, strongly disagree). A Likert scale may include a number of items or statements. A disadvantage of the Likert
scale is that it takes longer to complete than other itemized rating scales because respondents have to read
each statement. Despite this disadvantage, the scale has several advantages: it is easy to construct,
administer, and use.
Example: I believe that ecological questions are the most important issues facing human beings today.
1 = Strongly Disagree   2 = Disagree   3 = Neutral   4 = Agree   5 = Strongly Agree
d. Semantic Differential Scale: This is a seven-point rating scale with end points associated with bipolar labels (such
as good and bad, complex and simple) that have semantic meaning. It can be used to find whether a respondent
has a positive or negative attitude towards an object. It has been widely used in comparing brands and company
images. It has also been used to develop advertising and promotion strategies and in a new product development
study.
Example: Please indicate your attitude towards work using the scale below:
Attitude towards work
Boring : : : : : : : Interesting
Unnecessary : : : : : : : Necessary
e. Stapel Scale: The Stapel scale was originally developed to measure the direction and intensity of an attitude
simultaneously. Modern versions of the Stapel scale place a single adjective as a substitute for the semantic
differential when it is difficult to create pairs of bipolar adjectives. The modified Stapel scale places a single
adjective in the center of an even number of numerical values.
Example: Select a plus number for words that you think describe the personal banking of a bank accurately. The
more accurately you think the word describes the bank, the larger the plus number you should choose. Select a
minus number for words you think do not describe the bank accurately. The less accurately you think the word
describes the bank, the larger the minus number you should choose.
+3 +3
+2 +2
+1 +1
Friendly Personnel Competitive Loan Rates
-1 -1
-2 -2
-3 -3
B. DESCRIPTIVE STATISTICS
1. Frequency Distributions – distribution of scores by frequency with which they occur
2. Measures of Central Tendency – a statistic that indicates the average or midmost score between the extreme scores
in a distribution
a. Mean – formula: $\bar{X} = \frac{\Sigma X}{N}$ (for ungrouped distribution); $\bar{X} = \frac{\Sigma (fX)}{N}$ (for grouped distribution)
b. Median – the middle score in a distribution
c. Mode – the most frequently occurring score in a distribution
***Appropriate use of central tendency measure according to type of data being used:
Type of Data                      Measure
Nominal Data                      Mode
Ordinal Data                      Median
Interval / Ratio Data (Normal)    Mean
Interval / Ratio Data (Skewed)    Median
3. Measures of Variability – a statistic that describes the amount of variation in a distribution
a. Range – the difference between the highest and the lowest scores
b. Interquartile range – the difference between Q3 and Q1 (Q3 − Q1)
c. Semi-Interquartile range – interquartile range divided by 2
d. Standard Deviation – the square root of the averaged squared deviations about the mean
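The measures of central tendency and variability listed above can be illustrated with Python's standard `statistics` module. This is a minimal sketch on made-up scores. Two choices are assumptions rather than part of the handout: quartiles are computed with the module's default ("exclusive") convention, and the standard deviation is the population form (dividing by N), which matches the "averaged squared deviations" wording.

```python
import statistics

# Hypothetical raw scores of seven examinees
scores = [70, 75, 75, 80, 85, 90, 95]

# Measures of central tendency
mean = statistics.mean(scores)        # sum of scores / N
median = statistics.median(scores)    # middle score
mode = statistics.mode(scores)        # most frequently occurring score

# Measures of variability
score_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # the three quartile points
iqr = q3 - q1                                   # interquartile range
semi_iqr = iqr / 2                              # semi-interquartile range
sd = statistics.pstdev(scores)                  # population standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={score_range} IQR={iqr} semi-IQR={semi_iqr} SD={sd:.2f}")
```

Other quartile conventions exist and can give slightly different Q1 and Q3 values on small data sets, so the convention used should be reported alongside the result.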
4. Measures of Location
a. Percentiles – an expression of the percentage of people whose score on a test or measure falls below a particular
raw score
Formula for Percentile: $\text{Percentile} = \frac{\text{Number of students beaten}}{\text{Total number of students}} \times 100$
b. Quartiles – one of the three dividing points between the four quarters of a distribution, each typically labelled Q1,
Q2 and Q3
c. Deciles – the nine dividing points that divide a distribution into 10 equal parts
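The percentile formula above (number of students beaten, divided by the total number of students, times 100) can be sketched as follows. The function name and the class scores are hypothetical, and "beaten" is assumed to mean strictly lower scores.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in the group that fall below the given score."""
    beaten = sum(1 for s in all_scores if s < score)
    return 100 * beaten / len(all_scores)


# Hypothetical class of ten students
class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank_of_85 = percentile_rank(85, class_scores)
print(rank_of_85)  # six of ten students scored below 85
```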
5. Skewness - a measure of the asymmetry of the probability distribution of a real-valued random variable about its
mean
a. Positive skew
– relatively few scores fall at the positive end
– reflects a very difficult type of test
b. Negative skew
– relatively few scores fall at the negative end
– reflects a very easy type of test
6. Kurtosis - the sharpness of the peak of a frequency-distribution curve.
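The direction of skew described in item 5 can be checked numerically. The sketch below uses the moment-based coefficient of skewness (third central moment divided by the second central moment raised to 1.5); this particular formula is an assumption, since the handout does not specify one, and both score sets are made up.

```python
def skewness(scores):
    """Moment-based skewness: positive = tail toward high scores."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5


# Very difficult test: most scores are low, a few are high (positive skew)
hard_test = [10, 12, 12, 14, 15, 15, 16, 18, 30, 45]
# Very easy test: most scores are high, a few are low (negative skew)
easy_test = [55, 70, 82, 84, 85, 85, 86, 88, 88, 90]

print(f"hard test skewness: {skewness(hard_test):+.2f}")
print(f"easy test skewness: {skewness(easy_test):+.2f}")
```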
D. INFERENTIAL STATISTICS
1. Parametric vs. Non-Parametric Tests
                      Parametric Test                          Non-Parametric Test
Requirements          • Normal Distribution                    • Normal Distribution is not required
                      • Homogenous Variance                    • Homogenous Variance is not required
                      • Interval or Ratio Data                 • Nominal or Ordinal Data
Common Statistical    • Pearson’s Correlation                  • Spearman’s Correlation
Tools                 • Independent Measures t-test            • Mann-Whitney U test
                      • One-way, independent-measures ANOVA    • Kruskal-Wallis H test
                      • Paired t-test                          • Wilcoxon Signed-Rank test
                      • One-way, repeated-measures ANOVA       • Friedman’s test
2. Measures of Correlation
a. Pearson’s Product Moment Correlation – parametric test for interval data
b. Spearman’s Rho Correlation – non-parametric test for ordinal data
c. Kendall’s Coefficient of Concordance – non-parametric test for ordinal data
d. Phi Coefficient – non-parametric test for dichotomous nominal data
e. Lambda – non-parametric test for 2 groups (dependent and independent variable) of nominal data
***Correlation Ranges:
1.00 : Perfect relationship
0.75 – 0.99 : Very strong relationship
0.50 – 0.74 : Strong relationship
0.25 – 0.49 : Weak relationship
0.01 – 0.24 : Very weak relationship
0.00 : No relationship
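As an illustration of Pearson's product moment correlation and the ranges above, the sketch below computes r from its definition and then labels its absolute value. The helper names and the paired scores are hypothetical. Rounding r to two decimals before labelling is an assumption made so that every value falls into one of the listed ranges.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two paired score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def describe_strength(r):
    """Label the size of r (ignoring its sign) using the handout's ranges."""
    size = round(abs(r), 2)
    if size == 1.00:
        return "Perfect relationship"
    if size >= 0.75:
        return "Very strong relationship"
    if size >= 0.50:
        return "Strong relationship"
    if size >= 0.25:
        return "Weak relationship"
    if size >= 0.01:
        return "Very weak relationship"
    return "No relationship"


# Hypothetical paired scores on two tests for five examinees
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]
r = pearson_r(test_a, test_b)
print(f"r = {r:.2f}: {describe_strength(r)}")
```

The sign of r gives the direction of the relationship; the ranges describe only its magnitude, which is why the absolute value is labelled.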
3. Measures of Prediction
a. Biserial Correlation – predictive test for artificially dichotomized and categorical data as criterion with continuous
data as predictors
b. Point-Biserial Correlation – predictive test for genuinely dichotomized and categorical data as criterion with
continuous data as predictors
c. Tetrachoric Correlation – predictive test for dichotomous data with categorical data as criterion and categorical
data as predictors
d. Logistic Regression – a predictive test which involves one criterion that is nominal in nature with only one
predictor that is continuous
e. Multinomial Regression – a predictive test which involves one criterion that is nominal in nature with two or
more predictors that are continuous
f. Simple Linear Regression – a predictive test which involves one criterion that is continuous in nature with only
one predictor that is continuous
g. Multiple Linear Regression – a predictive test which involves one criterion that is continuous in nature with more
than one continuous predictor
h. Ordinal Regression – a predictive test which involves a criterion that is ordinal in nature with more than one
predictor that is continuous in nature
3. Chi-Square Test
a. Goodness of Fit – used to measure differences and involves nominal data and only one variable with 2 or more
categories
b. Test of Independence – used to measure correlation and involves nominal data and two variables with two or
more categories
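For the goodness-of-fit case, the chi-square statistic is the sum of (observed − expected)² / expected across the categories of the single nominal variable. The sketch below is only an illustration with made-up frequencies; the function name is hypothetical, and the resulting value would still need to be compared against a chi-square critical value for the appropriate degrees of freedom (number of categories minus 1).

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic for one nominal variable."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 examinees choosing among three answer formats,
# with equal preference (20 each) expected under the null hypothesis
observed = [18, 22, 20]
expected = [20, 20, 20]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1
print(f"chi-square = {chi2:.2f}, df = {df}")
```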
4. Comparison of Two Groups
a. Paired t-test – a parametric test for paired groups with normal distribution
b. Unpaired t-test – a parametric test for unpaired groups with normal distribution
c. Wilcoxon Signed-Rank Test – a non-parametric test for paired groups with non-normal distribution
d. Mann-Whitney U test – a non-parametric test for unpaired groups with non-normal distribution
5. Comparison of Three or More Groups
a. Repeated measures ANOVA – a parametric test for matched groups with normal distribution
b. One-way/Two-Way ANOVA – a parametric test for unmatched groups with normal distribution
c. Friedman F test – a non-parametric test for matched groups with non-normal distribution
d. Kruskal-Wallis H test – a non-parametric test for unmatched groups with non-normal distribution
6. Factor Analysis
a. Principal Component Factor Analysis
b. Common Factor Analysis
o Exploratory Factor Analysis
o Confirmatory Factor Analysis
Benefits
Profits, gains, advantages
Ex.) more stringent hiring policy- more productive employees
Ex.) maintaining successful and academic environment of university
Utility Analysis
- a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about
the usefulness and/or practical value of a tool of assessment.
Utility analysis: An illustration. What is the company’s goal?
• Limit the cost of selection
o Don’t use FERT
• Ensure that qualified candidates are not rejected
o Set a cut score that yields the lowest false negative rate
• Ensure that all candidates selected will prove to be qualified
o Set a cut score that yields the lowest false positive rate
• Ensure, to the extent possible, that qualified candidates will be selected and unqualified candidates will
be rejected
o False positives are no better or worse than false negatives
o Set a cut score that yields the highest hit rate and lowest miss rate
Compensatory model of selection: assumption is made that high scores on one attribute can compensate for low scores
on another attribute
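A compensatory model can be illustrated with a weighted composite score, in which a high score on one attribute offsets a low score on another. This is only a sketch: the attribute names, the equal weights, and the applicant scores are all hypothetical.

```python
def composite_score(scores, weights):
    """Weighted sum of attribute scores (compensatory model)."""
    return sum(scores[name] * weight for name, weight in weights.items())


# Hypothetical weights and applicants
weights = {"interview": 0.5, "aptitude": 0.5}
applicant_a = {"interview": 90, "aptitude": 50}  # strong interview, weak aptitude
applicant_b = {"interview": 70, "aptitude": 70}  # average on both

score_a = composite_score(applicant_a, weights)
score_b = composite_score(applicant_b, weights)
print(score_a, score_b)
```

Both applicants end up with the same composite, because applicant A's high interview score compensates for the low aptitude score.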
3. IRT-Based Method
- based on test taker’s performance across all items on a test. Some portion of test items must be correct in
order to pass the test.
1. Item-mapping method: arrangement of items in a histogram, with each column in the histogram containing
items deemed to be of equivalent value
- difficulty level is set as the cut score
- involves several rounds of judgments in which experts may receive feedback regarding how their ratings
compare to ratings made by other experts. Ex. licensing exam
2. Book-Mark method
- begins with the training of experts with regard to the minimal knowledge, skills, and/or abilities that test
takers should possess in order to “pass.”
- Subsequent to this training, the experts are given a book of items, with one item printed per page, such that
items are arranged in an ascending order of difficulty.
- Then, an expert places a bookmark to mark the divide that separates test takers who have acquired
minimal knowledge, skills, or abilities from those who have not. Bookmarks serve as the cut score,
decided upon by the test developers.
Problems include training of experts, possible floor and ceiling effects, and the optimal length of item booklets.
Other Methods
- method of predictive yield- a technique for setting cut scores which takes into account the number of positions to be
filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores.
- discriminant analysis- a family of statistical techniques used to shed light on the relationship between certain variables
(such as scores on a battery of tests) and two or more naturally occurring groups (such as persons judged to be
successful at a job and persons judged unsuccessful at a job)
C. STANDARDIZATION
1. When to decide to standardize a test?
a. No test exists for a particular purpose
b. The existing tests for a certain purpose are not adequate for one reason or another
2. Basic Premises of standardization
– The independent variable is the individual being tested
– The dependent variable is his behavior
– Behavior = person x situation
– In psychological testing, we make sure that it is the person factor that will ‘stand out’ and the situation
factor is controlled
– Control of extraneous variables = standardization
3. What should be standardized?
a. Test Conditions
– There should be uniformity in the testing conditions
– Physical condition
– Motivational condition
b. Test Administration Procedure
– There should be uniformity in the instructions and administration proper. Test administration includes
carefully following standard procedures so that the test is used in the manner specified by the test
developers. The test administrator should ensure that test takers work within conditions that maximize
opportunity for optimum performance. As appropriate, test takers, parents, and organizations should be
involved in the various aspects of the testing process
– Sensitivity to Disabilities: try to help a test taker with a disability overcome the disadvantage, such as by increasing
voice volume or referring to other available tests
– Desirable Procedures of Group Testing: Be careful about time, clarity, physical conditions (illumination,
temperature, humidity, writing surface, and noise), and guessing.
c. Scoring
– There should be a consistent mechanism and procedure in scoring. Accurate measurement necessitates
adequate procedures for scoring the responses of test takers. Scoring procedures should be audited as
necessary to ensure consistency and accuracy of application.
d. Interpretation
– There should be common interpretations among similar results. Many factors can impact the valid and
useful interpretations of test scores. These can be grouped into several categories including psychometric,
test taker, and contextual, as well as others.
i. Psychometric Factors: Factors such as the reliability, norms, standard error of measurement, and validity
of the instrument are important when interpreting test results. Responsible test use considers these basic
concepts and how each impacts the scores and hence the interpretation of the test results.
ii. Test Taker Factors: Factors such as the test taker’s group membership and how that membership may
impact the results of the test is a critical factor in the interpretation of test results. Specifically, the test
user should evaluate how the test taker’s gender, age, ethnicity, race, socioeconomic status, marital
status, and so forth, impact on the individual’s results.
iii. Contextual Factors: The relationship of the test to the instructional program, opportunity to learn, quality
of the educational program, work and home environment, and other factors that would assist in
understanding the test results are useful in interpreting test results. For example, if the test does not align
to curriculum standards and how those standards are taught in the classroom, the test results may not
provide useful information.
4. Tasks of test developers to ensure uniformity of procedures in test administration:
– Prepare a test manual containing the ff:
i. Materials needed (test booklets & answer sheets)
ii. Time limits
iii. Oral instructions
iv. Demonstrations/examples
v. Ways of handling queries of examinees
5. Tasks of examiners/test users/psychometricians
– Ensure that test user qualifications are strictly met (training in selection, administration, scoring and
interpretation of tests as well as the required license)
– Advance preparations
i. Familiarity with the test/s
ii. Familiarity with the testing procedure
iii. Familiarity with the instructions
iv. Preparation of test materials
v. Orient proctors (for group testing)
6. Standardization sample
– A random sample of the test takers used to evaluate the performance of others
– Considered a representative sample if the sample consists of individuals that are similar to the group to be
tested
D. OBJECTIVITY
1. Time-Limit Tasks – every examinee gets the same amount of time for a given task
2. Work-Limit Tasks – every examinee has to perform the same amount of work
3. Issue of Guessing
F. ITEM ANALYSIS
– Measures and evaluates the quality and appropriateness of test questions
– How well the items could measure ability/trait
1. Classical Test Theory
– Analyses are the easiest and the most widely used form of analyses
– Often called the “true-score model” which involves the true score formula:
$X_{te} = r_{xx}(X - \bar{X}) + \bar{X}$
Where:
$X_{te}$ = True Score (estimated)
$r_{xx}$ = Correlation Coefficient (reliability of the test)
$X$ = Raw Score
$\bar{X}$ = Mean Score
– Assumes that a person’s test score is comprised of their “true score” plus some measurement error (X = T +
e)
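The true score formula above can be worked through with a short example. The sketch below assumes that the coefficient in the formula is the test's reliability coefficient; the function name and the numbers are hypothetical.

```python
def estimated_true_score(raw_score, mean_score, reliability):
    """Estimate a true score: reliability * (raw - mean) + mean."""
    return reliability * (raw_score - mean_score) + mean_score


# Hypothetical examinee: raw score 120 on a test with mean 100 and reliability 0.90
estimate = estimated_true_score(raw_score=120, mean_score=100, reliability=0.90)
print(round(estimate, 2))  # the estimate is pulled from 120 toward the mean
```

The lower the reliability, the more the estimate moves toward the group mean; with a reliability of 1.00 the estimate equals the raw score.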
2. Item-Response Theory (Latent Trait Theory)
– Sometimes referred to as “modern psychometrics”
– Latent trait models aim to look beyond observed test performance to the underlying traits that produce it
– Employs the following statistics
a. Item difficulty
– The proportion of examinees who answered the item correctly
– The higher the item mean, the easier the item is for the group; the lower the item mean, the more
difficult the item is for the group
– Formula: $\text{Item difficulty} = \frac{N_u + N_l}{N}$
where: $N_u$ = number of students from the upper group who answered the item correctly
$N_l$ = number of students from the lower group who answered the item correctly
$N$ = total number of examinees
– 0.00-0.20 : Very Difficult : Unacceptable
– 0.21-0.40 : Difficult : Acceptable
– 0.41-0.60 : Moderate : Highly Acceptable
– 0.61-0.80 : Easy : Acceptable
– 0.81-1.00 : Very Easy : Unacceptable
b. Item discrimination
– measure of how well an item is able to distinguish between examinees who are knowledgeable and
not
– how well is each item related to the trait
– The discrimination index range is between -1.00 to +1.00
– The closer the index to +1, the more effectively the item distinguishes between the two groups of
examinees
– The acceptable index is 0.30 and above
– Formula: $\text{Item discrimination} = \frac{N_u - N_l}{\frac{1}{2}N}$
where: $N_u$ = number of students from the upper group who answered the item correctly
$N_l$ = number of students from the lower group who answered the item correctly
$N$ = total number of examinees
– 0.40-above : Very Good Item : Highly Acceptable
– 0.30-0.39 : Good Item : Acceptable
– 0.20-0.29 : Reasonably Good Item : For Revision
– 0.10-0.19 : Difficult Item : Unacceptable
– Below 0.10 : Very Difficult Item : Unacceptable
c. Item reliability index - the higher the index, the greater the test’s internal consistency
d. Item validity index - the higher the index, the greater the test’s criterion-related validity
e. Distractor Analysis
– All of the incorrect options, or distractors, should be equally distracting
– preferably, each distractor should be equally selected by a greater proportion of the lower scorers
than of the top group
f. Overall Evaluation of Test Items
DIFFICULTY LEVEL : DISCRIMINATIVE POWER : ITEM EVALUATION
Highly Acceptable : Highly Acceptable : Very Good Item
Highly Acceptable/Acceptable : Acceptable : Good Item
Highly Acceptable/Acceptable : Unacceptable : Revise the Item
Unacceptable : Highly Acceptable/Acceptable : Discard the Item
Unacceptable : Unacceptable : Discard the Item
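The item statistics above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a standard scoring program: the function names are hypothetical, indices are rounded to two decimals before they are compared with the rating ranges (because the ranges are stated to two decimals), and any combination of ratings that the overall evaluation table does not list is reported as "Not covered by the table" rather than guessed.

```python
def difficulty_index(n_upper, n_lower, n_total):
    """Difficulty index = (Nu + Nl) / N."""
    return (n_upper + n_lower) / n_total


def discrimination_index(n_upper, n_lower, n_total):
    """Discrimination index = (Nu - Nl) / (N / 2)."""
    return (n_upper - n_lower) / (n_total / 2)


def rate_difficulty(p):
    """Map a difficulty index to the acceptability ratings in the handout."""
    p = round(p, 2)  # assumption: compare at two-decimal precision
    if p <= 0.20 or p >= 0.81:
        return "Unacceptable"       # very difficult or very easy
    if 0.41 <= p <= 0.60:
        return "Highly Acceptable"  # moderate
    return "Acceptable"             # difficult (0.21-0.40) or easy (0.61-0.80)


def rate_discrimination(d):
    """Map a discrimination index to the acceptability ratings in the handout."""
    d = round(d, 2)  # assumption: compare at two-decimal precision
    if d >= 0.40:
        return "Highly Acceptable"
    if d >= 0.30:
        return "Acceptable"
    if d >= 0.20:
        return "For Revision"
    return "Unacceptable"


def evaluate_item(difficulty_rating, discrimination_rating):
    """Look up the overall evaluation; unlisted combinations are not guessed."""
    ok = ("Highly Acceptable", "Acceptable")
    if difficulty_rating == "Highly Acceptable" and discrimination_rating == "Highly Acceptable":
        return "Very Good Item"
    if difficulty_rating in ok and discrimination_rating == "Acceptable":
        return "Good Item"
    if difficulty_rating in ok and discrimination_rating == "Unacceptable":
        return "Revise the Item"
    if difficulty_rating == "Unacceptable" and discrimination_rating in ok + ("Unacceptable",):
        return "Discard the Item"
    return "Not covered by the table"


# Hypothetical item: 40 examinees in the two groups combined,
# 16 correct answers in the upper group and 8 in the lower group.
p = difficulty_index(16, 8, 40)
d = discrimination_index(16, 8, 40)
print(p, d, evaluate_item(rate_difficulty(p), rate_discrimination(d)))
```

For this hypothetical item, the difficulty index is 24/40 = 0.60 (moderate) and the discrimination index is 8/20 = 0.40, so the lookup returns "Very Good Item".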
Saint Louis College
City of San Fernando, La Union
A psychological report is an abstract of a sample of a patient’s or client’s behavior, derived from the results of
psychological tests. It is a very brief sample of one’s behavior.
A psychological test report reflects a process that starts with a referral source. A psychological testing referral is usually
made when a specific problem appears in a person’s behavior. Such behavioral or experiential symptoms call attention to the
fact that something disturbing has happened and a personality conflict or disorder has appeared. The behavioral difficulty
that the person displays is usually the point at which a psychologist may be called upon to utilize psychodiagnostic expertise
to clarify and localize the underlying cause of the problem.
Since the referral source can originate from different professional areas and levels of expertise, the psychologist needs
to keep in mind that the final report must be written in a manner that is understandable to the person who will be reading it.
The problem of the patient may be critical, and the referral person helping with the problem must be able to utilize the
psychologist’s input. Thus, the psychologist responsible for the testing and report must always respond to the needs of the
patient as well as to the needs of the particular referral source.
In addition, a psychological test report is a communication. Therefore, it must be written in a way that corresponds to
the reader’s level of understanding and training. The report must meet the criteria of clarity, meaningfulness, and
synthesis.
Referral to a psychologist for psychodiagnostic testing represents a profound moment in the process of help. This
referral becomes a pivotal event in the life of the person who displays the symptoms as well as in the lives of those
intimately related to the person.
Context of Referral
The reason for the referral is the symptomatic behavior that the subject displays. This may be acting-out behavior in
school, at home, or on the job; grossly bizarre behavior; or behavior reflecting anxiety conditions. The point is that either the
problematic behavior may be causing personal difficulty, or its effects may be disturbing a larger system such as the
classroom, family, or workplace. The psychologist must constantly focus on the nature and extent of the tension that is
involved in the symptom. Thus, a psychological report is requested so that relevant information can be marshaled. This
information leads to the implementation of therapeutic helping procedures or further diagnostic measures.
• STYLE
The style or “flavor” of a report will be influenced primarily by the training and orientation of the psychologist. Ownby
(1987) stresses that the most important style to use in report writing is what he refers to as a “professional style.” This is
characterized by short words that are of common usage and that have precise meanings. The paragraphs should be
short and should focus on a single concept. Similar concepts should be located close to one another in the report. The
result should be a report that combines accuracy, clarity, integration, and readability.
Once these general guidelines have been taken into account, the next step is to focus on and organize the information
derived from the tests. A further general rule is that information should focus on the client’s unique method of
psychological functioning. A reader is concerned not so much with how the client is similar to the average person as with
the ways in which he or she is different.
A common error in psychological reports is the inclusion of generalized statements that are so vague they could apply to
the majority of the population.
• TERMINOLOGY
Several arguments have been made in determining whether to use technical or nontechnical language in psychological
reports. It might be argued that technical terminology is precise and economical, increases the credibility of the writer, and
can communicate concepts that are impossible to convey through nontechnical language. However, a number of potential
difficulties are often encountered with the use of technical language.
One of the most frequent problems involves the varying backgrounds and levels of the persons reading the report. Even
among readers who have the proper background to understand technical terms, many prefer a more straightforward
presentation. In addition, technical terms run the danger of becoming nominalisms in which, by merely naming the
phenomenon, persons develop an illusory sense of understanding more than is actually the case.
• CONTENT OVERLOAD
There are no specific rules to follow in determining how much information to include in a report. A general guideline is to
estimate how much information a reader can realistically be expected to assimilate. If too many details are given, the
information may begin to become poorly defined and vague, and therefore, lack impact or usefulness. The psychologist
should focus on and discuss only those areas that are most relevant to the purpose of the report.
• FEEDBACK
During the earlier days of psychological assessment, examiners often kept the results of psychological assessments
carefully concealed from the client. There was often an underlying belief that the results were too complex and mysterious
for the client to adequately understand. In contrast, current practices are to provide the client with clear, direct, and
accurate feedback regarding the results of an evaluation.
Level 1
There is a minimal amount of any sort of interpretation. The information gathered is as simply and directly related to the outcome
decision as possible and there is minimal concern with intervening processes. Data are primarily treated in a sampling or
correlating way, never as “signs” for there is no concern with underlying constructs to explain why “input” and “output” events
are related. Little or no skilled clinical data collection or interpretation is needed. (E.g. large scale selection testing - people
are given a validated aptitude test and jobs are offered to those above a critical score and denied to those who fall below it.)
Level 2
The clinician can deductively arrive at decisions as to the further needs and treatment of the patient.
Two kinds:
1. Descriptive generalizations: From the particular behaviors observed, we generalize to more inclusive, although still
largely behavioral and descriptive categories. Thus, they note, a clinician might observe instances of slow bodily
movements and excessive delays in answering questions and from this infer that the patient is “retarded motorically.”
With the further discovery that the patient eats and sleeps poorly, cries easily, reports a constant sense of futility and
discouragement and shows characteristic test behaviors, the generalization is now broadened to
“depressed.”
2. Hypothetical Construct: Assumption of an inner state which goes logically beyond description of visible behavior.
Such constructs imply causal conditions, related personality traits and behaviors and allow prediction of future events.
It is the movement from description to construction which is the sense of clinical interpretation.
Level 3
Goal is to develop a coherent and inclusive theory of the individual life (theory of the person-situation) or a working image of
the patient. The clinician attempts a full-scale exploration of the individual’s personality, psychosocial situation and
developmental history; in all, the various facets of the individual which were earlier described in the outline of the case
study. At the fullest, the output would be a psycho-biography of a sort that would make clear what the patient is, how he
came to be, how he might act under specific conditions and how he might change, particularly in terms of available clinical
interventions.
2) CLINICAL
a) Personal information
b) Referral question
c) Tests administered
d) Behavioral observation (Test and Interview)
e) Test results and interpretation
f) Summary formulation
g) Diagnostic impression
h) Recommendation