Psyass Midterm Notes (2)
Errors – collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement.

A psychology professional often deals with two types of data:
Quantitative Data – numerical data like age, weight, grades, etc.
Qualitative Data – categorical data like name, section, religion, subjects, etc.

Data can be used in various ways. There is a need to classify data by level of measurement to be able to identify how to treat the data.

Remember that, as counsellors, the data's level of measurement dictates the appropriate statistical tool to use in order to arrive at a meaningful interpretation.

Stanley Smith Stevens – developed the four scales of measurement (nominal, ordinal, interval, ratio) in 1946.

Scales/Levels of Measurement
Nominal – non-numeric group labels that do not reflect quantitative information. Simplest form of measurement; involves classification based on one or more distinguishing characteristics.
Ordinal – reflects order, ranking and hierarchy. No absolute zero point.
Interval – no absolute zero point. Equal intervals between numbers. Each unit on the scale is exactly equal to any other …

Frequency Distribution Graphs
Bar Graphs – represent data using rectangular bars of uniform width with equal spacing between the bars.
Histograms – graphical representation of data using rectangular bars of different heights. In a histogram, there is no space between the bars.
Pie Chart – visually displays data in a circular chart. The circle is divided into sectors, each showing a particular part of the data out of the whole.
Frequency Polygon – drawn by joining the midpoints of the tops of the bars in a histogram.

Measures of Central Tendency
1. Mean
❖ Most commonly used measure of the center of data.
❖ Also referred to as the “arithmetic average”.
❖ Can be influenced by extreme scores.
To find the mean, add all the observations, then divide the sum by the total number of observations.
Sample Mean: $\bar{x} = \frac{\sum x}{n}$    Population Mean: $\mu = \frac{\sum x}{N}$
where ∑x is the sum of all data values, n is the number of observations in the sample, and N is the number of cases in the population.
[Table residue: column headings "Valid | Nominal | Ordinal | Interval | Ratio"; table body not recoverable]
What is the most frequently used scale of measurement in Psychology?

Describing Data:
o Distribution – set of test scores arrayed for recording or study.

The Normal Curve
o Standard Deviation – measure of variability equal to the square root of the average squared deviations about the mean. It is equal to the square root of the variance.
o Variance – equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
❖ Based on the squared deviations from the mean
$s^2 = \frac{\sum x^2}{n} - \bar{x}^2$

Ex. The grades of Student A in 5 subjects are 78, 88, 89, 90 and 95. What …

2. Median
Ex. The ages of the patients of Hospital X in the pediatric ward are: 2, 5, 5, 6, 8, 9, 9, 10
(6 + 8) / 2 = 14 / 2 = 7 (the median is 7)

3. Mode – observation which appears the most number of times in a distribution. A distribution can be unimodal, bimodal, polymodal, or have no mode.
Ex. What is the mode of the test scores of the new students in a Statistics test?
12, 13, 12, 11, 10, 20, 24, 25, 10, 22, 20, 13, 16, 18, 20, 20, 20, 20
Mode: 20 (appears six times)
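The three measures of central tendency above can be checked with a short Python sketch. It is a minimal illustration, not part of the original notes: `mean`, `median`, and `modes` are hypothetical helper names written to mirror the definitions, and `modes` returns an empty list for the "no mode" case (no score repeats). Python's own `statistics.mean`, `statistics.median`, and `statistics.multimode` could be used instead, though `multimode` returns every value when none repeats.

```python
from collections import Counter


def mean(xs):
    """Arithmetic average: add all observations, then divide by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle score of the sorted data; for an even n, the average of the two middle scores."""
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All scores tied for the highest frequency (one value: unimodal, two: bimodal, ...).

    Returns an empty list when no score repeats, i.e. the distribution has no mode.
    """
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


grades = [78, 88, 89, 90, 95]            # Student A's five grades
ages = [2, 5, 5, 6, 8, 9, 9, 10]         # pediatric ward ages
scores = [12, 13, 12, 11, 10, 20, 24, 25, 10, 22,
          20, 13, 16, 18, 20, 20, 20, 20]  # Statistics test scores

print(mean(grades))   # 88.0
print(median(ages))   # 7.0
print(modes(scores))  # [20]
```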
Measures of Variability
o Variability – indication of how scores in a distribution are scattered or dispersed.
o Range – the range of a distribution is equal to the difference between the highest and the lowest scores.
❖ The range is based entirely on the values of the lowest and highest scores; one extreme score (if it …

Ex. (SD) Scores: 85, 100, 90, 95, 80
Mean = (85 + 100 + 90 + 95 + 80) / 5 = 450 / 5 = 90
(85 − 90)² = (−5)² = 25
(100 − 90)² = (10)² = 100
(90 − 90)² = (0)² = 0
…
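The variance and standard deviation of the five scores in the worked example can be finished with the sketch below, assuming (as the notes' variance formula does) division by n, i.e. population formulas. It computes the variance both as the mean of squared deviations and with the computational form s² = Σx²/n − x̄², which should agree. The function names are illustrative only; for a sample estimate that divides by n − 1, Python's `statistics.stdev` would be used instead of `statistics.pstdev`.

```python
import math


def variance(xs):
    """Population variance: mean of the squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_raw_formula(xs):
    """Computational form from the notes: s^2 = (sum of x^2) / n - mean^2."""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m ** 2


def std_dev(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


def score_range(xs):
    """Range: highest score minus lowest score."""
    return max(xs) - min(xs)


scores = [85, 100, 90, 95, 80]
print(variance(scores))              # 50.0  (25 + 100 + 0 + 25 + 100 = 250; 250 / 5)
print(variance_raw_formula(scores))  # 50.0  (40750 / 5 - 90^2)
print(round(std_dev(scores), 2))     # 7.07
print(score_range(scores))           # 20
```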
Non-Normal Distribution
o Skewness – distribution’s lack of symmetry.
Positively skewed examination results may indicate that the test was too difficult. More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores. A distribution has a negative skew when relatively few of the scores fall at the low end of the distribution. Negatively skewed examination results may indicate that the test was too easy.
Quartile – a specific point in the distribution
Quarter – an interval
o Kurtosis – flatness or peakedness of the distribution.
o The Normal Curve – bell-shaped, smooth, mathematically defined curve that is highest at its center.

o Average Deviation (AD) – describes the amount of variability in a distribution.
Formula: $AD = \frac{\sum |x|}{n}$, where x is the deviation of each score from the mean
Ex. Solve for the average deviation of: 85, 100, 90, 95, 80
Mean = (85 + 100 + 90 + 95 + 80) / 5 = 450 / 5 = 90
85 − 90 = −5
100 − 90 = 10
90 − 90 = 0
95 − 90 = 5
80 − 90 = −10
Sum of absolute deviations = 5 + 10 + 0 + 5 + 10 = 30
AD = 30 / 5 = 6

How to solve:
✓ Get the arithmetic mean (add all the scores, then divide by n).
✓ Calculate the deviation from the mean. After calculating the mean, you can calculate the deviation from the mean for each value in the data set.
✓ Calculate the sum of the absolute values of all deviations.
✓ Calculate the average deviation (divide that sum by n).
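The four steps above map directly onto a few lines of Python. This is a minimal sketch with a hypothetical helper name; it also shows why absolute values are needed: the signed deviations from the mean always sum to zero, so averaging them without absolute values would give 0.

```python
def average_deviation(xs):
    """AD = sum(|x - mean|) / n, following the four steps in the notes."""
    m = sum(xs) / len(xs)                      # step 1: arithmetic mean
    deviations = [x - m for x in xs]           # step 2: deviation of each score from the mean
    total = sum(abs(d) for d in deviations)    # step 3: sum of absolute deviations
    return total / len(xs)                     # step 4: average deviation


data = [85, 100, 90, 95, 80]
print(average_deviation(data))     # 6.0
print(sum(x - 90 for x in data))   # 0  (signed deviations cancel out)
```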
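[Figure: normal curve; area within ±1 SD = 68.26%, ±2 SD = 95.44%, ±3 SD = 99.72%, ±4 SD = 99.98%]

The Descriptive Use of the Normal Curve (Gaussian Curve)
o Normalization – transformation of scores so that they can assume the same meaning and can be compared relative to one another.

Ex. … $SEM = \frac{s}{\sqrt{n}} = \frac{2.2}{\sqrt{55}} = \frac{2.2}{7.42} = 0.30$; confidence interval: 35.4 − 36.6 inches

Ex. Your study is all about the mean academic performance of all grade 11 students in Region 6. With initial data gathering, you have a mean of 84.8 and a standard deviation of 3 from 93 students. Compute the SEM and provide the 99% confidence interval.
$SEM = \frac{s}{\sqrt{n}} = \frac{3}{\sqrt{93}} = \frac{3}{9.64} = 0.31$
99% confidence interval (±3 SEM, the ≈99.72% band of the normal curve): 84.8 ± 3(0.31) = 84.8 ± 0.93 = 83.87 − 85.73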
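The SEM calculations in these notes can be reproduced with the sketch below. It assumes the notes' convention of building the interval as mean ± k·SEM with k = 1, 2, 3 taken from the normal-curve bands (≈68%, ≈95%, ≈99.7%); an exact 99% interval would instead use k ≈ 2.576. The helper names `sem` and `confidence_interval` are illustrative, and the figures come from the height and Region 6 examples.

```python
import math


def sem(sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, sd, n, k):
    """mean +/- k * SEM, with k = 1, 2, 3 for the ~68%, ~95%, ~99.7% bands of the normal curve."""
    margin = k * sem(sd, n)
    return mean - margin, mean + margin


# Heights of 50 Filipinos: mean 64 in., SD 4 in.
print(round(sem(4, 50), 3))                    # 0.566
low, high = confidence_interval(64, 4, 50, k=1)
print(round(low, 3), round(high, 3))           # 63.434 64.566

# SD 2.2, n = 55
print(round(sem(2.2, 55), 2))                  # 0.3

# Grade 11 academic performance, Region 6: mean 84.8, SD 3, n = 93
print(round(sem(3, 93), 2))                    # 0.31
low, high = confidence_interval(84.8, 3, 93, k=3)
print(round(low, 2), round(high, 2))           # 83.87 85.73
```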
The Inferential Use of the Normal Curve
o Estimating Population Parameter – valid and convenient estimate of the characteristic of the population from a sample statistic.
o Testing Hypothesis About Differences – basis for rejecting or not rejecting the null hypothesis formulated.
o Inferences – deduction

Estimating Population Parameter
Ex. Given, for example, that you want to know the average height of all Filipinos. You have acquired sample data from 50 Filipinos and found out that the average height is 64 inches with a standard deviation of 4 inches.
o Standard Error (SE) – lies on the premise that a sample statistic can be used to estimate the population parameter.
o Given the sample mean, we can now estimate the possible location of the population mean by using the formula of the Standard Error of the Mean (SEM):
$SEM = \frac{s}{\sqrt{n}}$
Where: s – standard deviation, n – number of cases
$SEM = \frac{4}{\sqrt{50}} = \frac{4}{7.07} = 0.566$ or 0.57
The SEM functions as a standard deviation that sets the confidence interval. Given that your SEM is 0.566, which is by nature a standard deviation, it would now be ±0.566.
The sample mean was 64 inches. The confidence interval, which will pave the way for the population parameter estimation, would be 64 ± 0.566.
In other words, we are 68% sure that the mean height of Filipinos is between 63.434 and 64.566 inches.

STANDARD SCORES
Standard Score – raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
Why convert raw scores?
o Easy interpretation
o The position of a testtaker’s performance relative to other testtakers is readily apparent.
In converting raw scores to standard scores, two different transformations can take place:
Linear Transformation – one that retains a direct numerical relationship to the original raw score.
Non-Linear Transformation – required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made.
Z-Score – results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
$Z = \frac{X - \bar{x}}{s}$
Where:
X – raw score from a test
x̅ – mean score of the class
s – standard deviation
… perform in the test?
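A z-score and a linear transformation onto another standard-score scale can be sketched as follows. The raw score 85, class mean 70, and SD 10 are hypothetical values, and the T-score (mean 50, SD 10) and deviation IQ (mean 100, SD 15) scales are common conventions assumed here rather than stated in these notes. Converting everything back to z is what makes scores from different scales comparable.

```python
def z_score(raw, mean, sd):
    """Z = (X - mean) / s: SD units above (+) or below (-) the mean."""
    return (raw - mean) / sd


def from_z(z, new_mean, new_sd):
    """Linear transformation of a z-score onto a scale with a chosen mean and SD."""
    return new_mean + z * new_sd


# Assumed standard-score conventions (not given in the notes)
T_MEAN, T_SD = 50, 10      # T-scores
IQ_MEAN, IQ_SD = 100, 15   # deviation IQ

z = z_score(85, 70, 10)           # hypothetical raw score, class mean, and SD
print(z)                          # 1.5
print(from_z(z, T_MEAN, T_SD))    # 65.0
print(from_z(z, IQ_MEAN, IQ_SD))  # 122.5

# Comparing scores from different scales: T = 60 vs IQ = 85
print(z_score(60, T_MEAN, T_SD), z_score(85, IQ_MEAN, IQ_SD))  # 1.0 -1.0
```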
The correlation between two variables can be positive or negative.
There are several tools that would aid us in measuring correlation:
Pearson r – statistical tool of choice when the variables are continuous. (Pearson correlation coefficient or Pearson product-moment coefficient of correlation) Karl Pearson
… when the predictors are continuous variables while the criterion is also a continuous variable.
❖ For example, the predictors are interview and IQ scores, and the criterion is GPA.
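Pearson r, the tool of choice for continuous variables, can be sketched as the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. This is a minimal illustration with a hypothetical function name; the paired data (`hours`, `scores`) are made up for the example and are not from the notes. The two short checks show a perfect positive and a perfect negative correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-products
    sxx = sum((x - mx) ** 2 for x in xs)                     # sum of squared deviations in x
    syy = sum((y - my) ** 2 for y in ys)                     # sum of squared deviations in y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 3))  # 0.775

print(pearson_r([1, 2, 3], [2, 4, 6]))     # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))     # -1.0 (perfect negative)
```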
Spearman’s Rho – statistical tool of choice when the respondents are few, and scores are being viewed in the ordinal level of measurement. (Rank-order correlation coefficient or rank-difference correlation coefficient) Charles Spearman

Sometimes we also correlate variables which appear to be dichotomous, or have two levels only.

Comparing standard scores (convert each to a z-score first; T-scores: mean 50, SD 10; deviation IQ: mean 100, SD 15):
Z = 3 > T = 30 (T = 30 corresponds to Z = −2)
IQ = 85 < T = 60
T = 75 > IQ = 130
Z = −1.5 < IQ = 85

RELIABILITY AND VALIDITY OF TESTS
Assumptions of Testing and Assessment:
1. Psychological Traits and States Exist
2. Psychological Traits and States Can Be Quantified and Measured
3. Test-Related Behavior Predicts Non-Test-Related Behavior
4. Tests and Other Measurement Techniques Have Strengths and Weaknesses
5. Various Sources of Error Are Part of the Assessment Process
6. Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
7. Testing and Assessment Benefit Society

❖ The error associated with split-half reliability is item sampling error.

4. Inter-Item Consistency – refers to the degree of correlation among all the items on a scale; it is estimated from a single administration of a single form of a test.
❖ The issue of test homogeneity or heterogeneity should be carefully considered.

The KR-20 formula is used to solve for the reliability coefficient of a test that contains dichotomous items.
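The notes name KR-20 without giving the formula. A minimal sketch, assuming the standard form r = (k / (k − 1)) × (1 − Σpq / σ²), where k is the number of items, p the proportion answering an item correctly, q = 1 − p, and σ² the variance of total scores. The population form of the variance and the 0/1 response data are assumptions for illustration; the function name is hypothetical.

```python
def kr20(responses):
    """KR-20 reliability for dichotomously scored (0/1) items.

    responses: one list per testtaker, each holding 0/1 item scores (1 = correct).
    """
    n_people = len(responses)
    k = len(responses[0])                      # number of items
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_people
    # Variance of total scores (population form, dividing by the number of testtakers; an assumption)
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if k < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least 2 items and some spread in total scores")
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n_people  # proportion answering item i correctly
        sum_pq += p * (1 - p)                                   # p * q, with q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical answers of 5 testtakers to a 4-item test
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```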
In order for a test to be considered “good”, there are technical qualities that it should possess. The first two important technical qualities are validity and reliability.
In the trinitarian view, there are only three major types of validity.
Construct
Minimum CVR values by number of panelists (Source: Lawshe, 1975):
Panelists – Minimum CVR
20 – .42
25 – .37
30 – .33
35 – .31
40 – .29

o An index of utility can tell us something about the practical value of the information derived from scores on the test.
o Test scores are said to have utility if their use in a particular situation helps us to make better decisions (better, that is, in the sense of being more cost-effective).

Costs of Tests
A test has good concurrent validity if scores from a new test agree with the
scores from a well-established test.
$\text{selection ratio} = \frac{\text{number of hires}}{\text{number of applicants}}$
The lower the selection ratio, the greater the potential usefulness of the
test.
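The selection ratio is a single division; the sketch below only makes the direction of the rule concrete. The hiring figures are hypothetical and the function name is illustrative.

```python
def selection_ratio(hires, applicants):
    """Number of hires divided by number of applicants."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    if not 0 <= hires <= applicants:
        raise ValueError("hires must be between 0 and the number of applicants")
    return hires / applicants


# Hypothetical hiring figures
print(selection_ratio(5, 100))   # 0.05 -> very selective; the test can add more value
print(selection_ratio(90, 100))  # 0.9  -> almost everyone is hired; little room for the test to help
```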
Taylor-Russell Tables
o provide an estimate of the percentage of total new hires who
will be successful employees if a test is adopted (organizational
success)
o A series of tables based on the selection ratio, base rate, and
test validity that yield information about the percentage of future
employees who will be successful if a particular test is used.
Lawshe Tables
o Tables that use the base rate, test validity, and applicant percentile on a test to determine the probability of future success for that applicant.

Determining Proportion of Correct Decisions
o A utility method that compares the percentage of times a selection decision was accurate with the percentage of successful employees.
The resulting number represents the percentage of time that we expect to be accurate in making a selection decision in the future. To determine whether this is an improvement, we use the following formula:
$\frac{\text{points in quadrants I and II}}{\text{total points in all quadrants}}$

There are 5 data points in quadrant I, 10 in quadrant II, 4 in quadrant III, and 11 in quadrant IV. The percentage of time we expect to be accurate in the future would be:
$\frac{II + IV}{I + II + III + IV} = \frac{10 + 11}{5 + 10 + 4 + 11} = \frac{21}{30} = 0.70$
To compare this figure with the test we were previously using to select employees, we compute the satisfactory performance baseline:
$\frac{I + II}{I + II + III + IV} = \frac{5 + 10}{5 + 10 + 4 + 11} = \frac{15}{30} = 0.50$
Using the new test would result in a 40% increase in selection accuracy [.70 − .50 = .20; .20 ÷ .50 = .40] over the selection methods previously used.

Naylor-Shine Tables
o Entails obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or some other tool of assessment) is adding to already established procedures.
o Determines the increase in average score on some criterion measure.

$\text{utility gain} = (N)(T)(r_{xy})(SD_y)(\bar{z}_m) - (N)(C)$
To use this formula, five items of information must be known:
Number of employees hired per year (N) – this number is easy to determine: it is simply the number of employees who are hired for a given position in a year.
Average tenure (T) – the average amount of time that employees in the position tend to stay with the company. The number is computed by using information from company records to identify the time that each employee in that position stayed with the company. The number of years of tenure for each employee is then summed and divided by the total number of employees.
Test validity (rxy) – this figure is the criterion validity coefficient that was obtained through either a validity study or validity generalization.
Standard deviation of performance in dollars (SDy) – for many years, this number was difficult to compute. Research has shown, however, that for jobs in which performance is normally distributed, a good estimate of the difference in performance between an average and a good worker (one standard deviation away in performance) is 40% of the employee’s annual salary (Hunter & Schmidt, 1982). The 40% rule yields results similar to more complicated methods and is preferred by managers (Hazer & Highhouse, 1997). To obtain this, the total salaries of current employees in the position in question should be averaged.
Mean standardized predictor score of selected applicants (z̄m) – this number is obtained in one of two ways. The first method is to obtain the average score on the selection test for both the applicants who are hired and the applicants who are not hired. The average test score of the nonhired applicants is subtracted from the average test score of the hired applicants. This difference is divided by the standard deviation of all the test scores.

… both minorities and nonminorities, but predicts significantly better for one of the two groups.

TEST DEVELOPMENT
The Five Stages in Test Development
Test Conceptualization
Test Construction
Test Tryout
Item Analysis
Test Revision

The challenge can always be found in the beginning. To guide us in conceptualizing the test that we are making, we should pay attention to the following:
✓ What is the test designed to measure?
✓ Is there a need for this test?
✓ Who will use and take this test?
✓ What will the test cover?
✓ What is the ideal format for the test?

Search for Content Domain
o Grounded theory (ask people informally and in general)
o Pattern analysis (key informants)
- Interview

Considerations
Pool of applicants
Complexity of job

Cut Scores – reference point derived as a result of a judgement and used to divide a set of data into two or more classifications.
… considerations (norm-related score)
Fixed Cut Score – reference point in the distribution that … minimum level of proficiency required to be included (absolute …
Multiple Cut Scores – two or more cut scores with reference to one predictor for the purpose of categorizing testtakers.

Methods for Setting Cut Scores
Angoff Method
o Presence or absence of a particular trait, attribute or ability.
o Provides an estimate of how testtakers with at least minimal competence should answer the items correctly.
o The judgements of the experts are averaged to yield the cut score.
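The averaging step of the Angoff method can be sketched as follows, assuming the common procedure in which each expert rates, for every item, the probability that a minimally competent testtaker answers it correctly. The notes only state that expert judgements are averaged, so this rating format, the three experts, and their ratings are hypothetical. Each expert's sum is that expert's implied cut score; the test's cut score is their average.

```python
def angoff_cut_score(ratings):
    """Average of the experts' implied cut scores.

    ratings[j][i]: expert j's estimated probability that a minimally competent
    testtaker answers item i correctly.
    """
    expert_cuts = [sum(expert) for expert in ratings]  # expected raw score per expert
    return sum(expert_cuts) / len(expert_cuts)


# Hypothetical ratings from 3 experts for a 5-item test
ratings = [
    [0.6, 0.7, 0.5, 0.8, 0.4],  # Expert A (sums to 3.0)
    [0.5, 0.8, 0.6, 0.7, 0.4],  # Expert B (sums to 3.0)
    [0.7, 0.6, 0.5, 0.9, 0.5],  # Expert C (sums to 3.2)
]
print(round(angoff_cut_score(ratings), 2))  # 3.07 items out of 5
```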
5-point scales
Satisfaction: 1. Very dissatisfied, 2. Dissatisfied, 3. Neither satisfied nor dissatisfied, 4. Satisfied, …
Likelihood: 1. Very unlikely, 2. Unlikely, 3. Neutral, 4. Likely, 5. Very likely
Level of concern: 1. Very unconcerned, 2. Unconcerned, 3. Neutral, 4. Concerned, 5. Very concerned
Agreement: … 4. Agree, 5. Strongly agree
Frequency: … 5. Always
Awareness: … 4. Aware, 5. Very aware
Familiarity: 1. Very unfamiliar, 2. Unfamiliar, 3. Somewhat familiar, 4. Familiar, 5. Very familiar
Quality: 1. Very poor, 2. Poor, 3. Acceptable, 4. Good, 5. Very good
Importance: 1. Very unimportant, 2. Unimportant, 3. Neutral, 4. Important, 5. Very important

Do you agree or disagree with each of the following:
a. All people should have the right to decide whether they wish to end their lives.
b. People who are terminally ill and in pain should have the option to have a doctor assist them in ending their lives.
c. People should have the option to sign away the use of artificial life-…

…
2. Always state it in a first-person point of view to make it more personal. Example: I always feel excited about new toys.
3. Watch your language.
4. Keep your statements specific.
5. Always include negatively worded items to test the consistency of responses.
6. Include synonymous items.

Item-Format
o In constructing test items, we should also take into consideration the format of the item we will be utilizing.
o Item format could be selected-response or constructed-response.
➢ Selected-response format requires testtakers to select a response from a set of alternative responses.
Examples: Multiple Choice, True or False, Matching Type
➢ Constructed-response format requires testtakers to supply or to create the correct answer, not merely to select it.
Examples: Essay, Short Answer Discussion, and Identification
o The selected-response format is mostly seen in ability tests, where there are right or wrong answers. However, personality tests also use this response format, with a different way of interpreting the responses.

Semantic Differential Scale – a survey or questionnaire rating scale that asks people to rate a product, company, brand, or any ‘entity’ within the frames of a multi-point rating option.

Essay – a test item that requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.

Writing Items for Computer Administration
Two advantages of using digital media:
Item Bank – ability to store items
Item Branching – ability to individualize testing

Ceiling Effects – diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured.
o The three types of selected-response formats are multiple choice, matching, and true/false.

Multiple-Choice Format
A multiple-choice format has three elements:
1. A stem
2. A correct alternative
3. Several distractors

A psychological test, an interview, and a case study are: ----- Stem
a. Psychological assessment tools ----- Correct Alternative
b. Standardized behavioral samples ----- Distractor
c. Reliable assessment instruments ----- Distractor
d. Theory-linked measures ----- Distractor

A good multiple-choice item has the following characteristics:
a. Has one correct alternative
b. Has grammatically parallel alternatives
c. Has alternatives of similar length
d. Has alternatives that fit grammatically with the stem
e. Includes as much of the item as possible in the stem to avoid unnecessary repetition
f. Avoids ridiculous distractors

Matching Type
o An item format that consists of premises and responses.
o Premises are in the left column, responses in the right column. Both should be homogeneous.
o Matching type is very susceptible to guessing. To minimize this, the number of responses should be greater than the number of premises.

Scoring Items
Cumulative Model – the higher the score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the test purports to measure.
Class Scoring (Category Scoring) – testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way.
Ipsative Scoring – comparing a testtaker’s score on one scale within a test to another scale within that same test.

Test Tryout – the act of trying out items on people who are similar in critical respects to the people for whom the test was designed.
The most critical question is how many people the test should be tried out on.
The informal rule of thumb says that there should be no fewer than 5 subjects, and preferably as many as 10 subjects, for each item on a test. The more participants there are, the lower the chance that the test will turn out psychometrically unsound.
The tryout of constructed test items should replicate the ideal conditions under which the standardized test will be administered.

What is a good item?
A good item is an item that passes the 4-way test:
Test for validity
Test for reliability
Test for item difficulty
Test for item discrimination
True/False Format
o Also called the binary-choice item, for it only includes 2 choices.
o A good true/false item contains only a single idea, is not excessively long, and is not subject to debate.

Completion Item
o Also called a short-answer item, or popularly known as identification.
o A good completion item requires the examinee to provide a word or phrase that completes a sentence.
o A good completion item should be worded in a way that the correct answer is specific.

Item Analysis – a general term for a set of methods used to evaluate the items. Generally, this refers to the assessment of item difficulty and item discrimination.

Item Difficulty – defined as the proportion of people who get a particular item correct.
For example, if 84% of the students taking a particular test get item 1 correct, then the item difficulty index for that item is 0.84.
How do we know, then, if that item is a good item? We make use of the optimal item difficulty index.
According to experts, the ideal item difficulty index that an item should possess should be from 0.3 up to the optimal item difficulty index