TOEFL iBT®
Test Framework and
Test Development
VOLUME 1
TOEFL® Research Insight Series, Volume 1:
TOEFL iBT® Test Framework and Test Development
Preface
The TOEFL iBT® test is the world’s most widely respected English language assessment and is used for admissions
purposes in more than 150 countries, including Australia, Canada, New Zealand, the United Kingdom, and the
United States (see test review in Alderson, 2009). Since its initial launch in 1964, the TOEFL® test has undergone
several major revisions motivated by advances in theories of language ability and changes in English teaching
practices. The most recent revision, the TOEFL iBT test, was launched in 2005. It contains a number of
innovative design features, including integrated tasks that engage multiple skills to simulate language use in
academic settings and test materials that reflect the reading, listening, speaking, and writing demands of
real-world academic environments.
In addition to the TOEFL iBT test, the TOEFL® Family of Assessments was expanded to provide high-quality
English proficiency assessments for a variety of academic uses and contexts. The TOEFL® Young Students
Series features the TOEFL Primary® and TOEFL Junior® tests, which are designed to help teachers and learners
of English in school settings. In addition, the TOEFL ITP® program offers colleges, universities, and others
affordable tests for placement and progress monitoring within English programs as a pathway to eventual
degree programs.
At ETS, we understand that scores from the TOEFL Family of Assessments are used to help make important
decisions about students, and we would like to keep score users and test takers up to date about the research
results that help assure the quality of these scores. Through the publication of the TOEFL® Research Insight
Series, we wish to communicate to the institutions and English teachers who use the TOEFL tests the strong
research and development base that underlies the TOEFL Family of Assessments and demonstrate our
continued commitment to research.
Since the 1970s, the TOEFL test has had a rigorous, productive, and far-ranging research program. But why
should test score users care about the research base for a test? In short, it is only through a rigorous program
of research that a testing company can substantiate claims about what test takers know or can do based on
their test scores, as well as provide support for the intended uses of assessments and minimize potential
negative consequences of score use. Beyond demonstrating this critical evidence of test quality, research is
also important for enabling innovations in test design and addressing the needs of test takers and test score
users. This is why ETS established a strong research base as a fundamental feature underlying the evolution of
the TOEFL Family of Assessments.
This portfolio is designed, produced, and supported by a world-class team of test developers, educational
measurement specialists, statisticians, and researchers in applied linguistics and language testing. Our test
developers have advanced degrees in fields such as English, language education, and applied linguistics. They
also possess extensive international experience, having taught English around the globe. Our
research, measurement, and statistics teams include some of the world’s most distinguished scientists and
internationally recognized leaders in diverse areas such as test validity, language learning and assessment,
and educational measurement.
To date, more than 300 peer-reviewed TOEFL Family of Assessments research reports, technical reports, and
monographs have been published by ETS, and many more studies on the TOEFL tests have appeared in
academic journals and book volumes. In addition, over 20 TOEFL test-related research projects are conducted
by ETS’s Research & Development staff each year and the TOEFL Committee of Examiners — comprising
language learning and testing experts from the global academic community — funds an annual program of
TOEFL Family of Assessments research by independent external researchers from all over the world.
The purpose of the TOEFL Research Insight Series is to provide a comprehensive, yet user-friendly account of
the essential concepts, procedures, and research results that assure the quality of scores for all products
in the TOEFL Family of Assessments. Topics covered in these volumes feature issues of core interest to test
users, including how tests were designed; evidence for the reliability, validity, and fairness of test scores; and
research-based recommendations for best practices.
The close collaboration with TOEFL score users, English language learning and teaching experts, and
university scholars in the design of all TOEFL tests has been a cornerstone of their success and worldwide
acceptance. Therefore, through this publication, we hope to foster an ever-stronger connection with our test
users by sharing the rigorous measurement and research base, as well as solid test development, that
continues to help ensure the quality of the TOEFL Family of Assessments.
The following individuals contributed to the second edition (2018) and the third edition (2020) by providing careful reviews and revisions as well as editorial suggestions (in alphabetical order): Terry
Axe, Ian Blood, Michelle Hampton, Marcel Ionescu, Susan Nissan, Spiros Papageorgiou, Eileen Tyson, Jennifer Wain, and Yuan Wang.
TOEFL iBT Test Framework and Test Development
The TOEFL iBT test design is the result of years of research—both investigation of the language-related
knowledge, skills, and abilities (KSAs) that English language learners need to succeed in academic environments
where English is the medium of instruction and research to identify the most effective methods of assessing
these KSAs (described in Chapelle, Enright, & Jamieson, 2008). Leading experts from both inside and outside ETS
in the fields of educational measurement, language testing, and language teaching contributed to the design of
the TOEFL iBT test using an assessment design methodology known as evidence-centered design (ECD), originally
developed at ETS by Mislevy, Steinberg, and Almond (2003) and now applied in a wide range of testing contexts
across the globe. ECD is a process that requires explicit definitions of measurement claims and close examination
and questioning of the strength of the evidence that supports them. As part of the ECD process, a team of ETS
assessment specialists and statisticians reviewed a series of working papers defining the language use domains
of the TOEFL iBT test along with evidence gathered through developmental research, resulting in the
TOEFL iBT test framework (Pearlman, 2008). This framework established the test’s format, structure, and content.
Test Structure
The TOEFL iBT test is administered via computer from a secure, worldwide, internet-based testing network. Some
tasks on the test require the use of two or more language skills. Test takers wear noise-reducing headphones and
speak into a microphone to record their responses to Speaking tasks and type their responses to Writing tasks.
The spoken and written responses are digitally recorded and sent to the ETS online scoring network (for details,
see Scoring the Speaking and Writing Sections below).
As Table 1 illustrates, each test form includes four sections: Reading, Listening, Speaking, and Writing. Each
section is scored on a scale of 0–30, for a total score range of 0–120. The test takes about 3 hours to complete.
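To make this structure concrete, the short Python sketch below represents the four-section score report and computes the total from the four 0–30 section scores. It is purely illustrative; the class and names are not part of any ETS scoring system.

from dataclasses import dataclass

SECTIONS = ("Reading", "Listening", "Speaking", "Writing")

@dataclass
class SectionScores:
    """Illustrative container for the four TOEFL iBT section scores (0-30 each)."""
    reading: int
    listening: int
    speaking: int
    writing: int

    def __post_init__(self) -> None:
        # Each section is reported on a 0-30 scale.
        for name in SECTIONS:
            score = getattr(self, name.lower())
            if not 0 <= score <= 30:
                raise ValueError(f"{name} score {score} is outside the 0-30 scale")

    @property
    def total(self) -> int:
        # The total score is the sum of the four section scores (0-120).
        return self.reading + self.listening + self.speaking + self.writing

report = SectionScores(reading=27, listening=25, speaking=23, writing=26)
print(report.total)  # 101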
Test Content
Reading
The Reading section measures test takers’ ability to understand university-level academic texts. TOEFL iBT
test takers read three or four passages of approximately 700 words each and answer 10 questions about each
passage. The passages represent a variety of academic areas and contain all of the information needed to
answer the questions; they require no special background knowledge. The questions are intended to assess
the test takers’ ability to comprehend factual information, infer information from the passage, and understand
vocabulary and the author’s purpose. Other types of questions assess the test taker’s ability to recognize
relationships among facts and ideas in different parts of a passage.
Listening
The Listening section measures test takers’ ability to understand spoken English in an academic setting. Test
takers listen to three or four lectures representing different academic areas, each about five minutes long, and
two or three conversations representing typical campus interactions with faculty, staff, and fellow students,
each about three minutes long. Each listening passage is associated with a set of questions intended to assess
test takers’ ability to comprehend main ideas or important details, recognize a speaker’s attitude or function,
understand the organization of the information presented and relationships between the ideas presented,
and make inferences or connections among pieces of information.
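As an illustration of how the Reading and Listening parameters above (number of passages and lectures, approximate lengths, questions per passage) could be written down, the following Python sketch records them as a simple configuration and checks a draft form against it. The structure and names are hypothetical and are not ETS test specifications.

# Illustrative only: the Reading and Listening parameters described above,
# expressed as a simple configuration. Names are hypothetical.
BLUEPRINT = {
    "Reading": {
        "passages": range(3, 5),           # 3 or 4 passages
        "words_per_passage": 700,          # approximate length
        "questions_per_passage": 10,
    },
    "Listening": {
        "lectures": range(3, 5),           # 3 or 4 lectures, each about 5 minutes
        "lecture_minutes": 5,
        "conversations": range(2, 4),      # 2 or 3 conversations, each about 3 minutes
        "conversation_minutes": 3,
    },
}

def check_form(section: str, counts: dict) -> bool:
    """Return True if a draft form's counts fall within the blueprint ranges."""
    spec = BLUEPRINT[section]
    return all(
        counts[key] in spec[key]
        for key in counts
        if isinstance(spec.get(key), range)
    )

# Example: a draft Listening section with 4 lectures and 2 conversations.
print(check_form("Listening", {"lectures": 4, "conversations": 2}))  # True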
Speaking
The Speaking section measures test takers’ ability to use spoken English effectively in educational environments,
both inside and outside the classroom. There are four tasks in the Speaking section, one “independent” and three
“integrated” tasks. The independent task requires the test taker to draw on personal experiences and opinions
to answer. The integrated tasks require the test taker to first listen to some information or listen to and read about
some information and then respond. The three integrated tasks are as follows:
• Read/Listen/Speak (campus situation). Test takers read a short passage communicating a typical campus
situation or policy and then listen to a conversation in which a speaker expresses an opinion about the
situation or policy. Test takers are then asked to give an oral summary of the speaker’s opinion. A full
response will require the test taker to combine and convey key information from both the reading and
the listening input.
• Read/Listen/Speak (academic course topic). Test takers read a passage that broadly defines a term,
process, or idea from an academic subject. They then listen to a lecture that provides specific
examples to illustrate the term, process, or idea expressed in the reading passage. Finally, they are
asked to explain how the illustration presented in the lecture supports the broader concept defined in
the reading. A full response will require test takers to combine and convey key information from both
the reading and the listening input.
• Listen/Speak (academic course topic). Test takers listen to an excerpt from a lecture that explains a term
or concept (often by explaining two aspects or perspectives) and gives concrete examples to illustrate
it. Test takers must then demonstrate understanding of the concept by providing a brief oral summary
of the explanation and the related examples.
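The sketch below simply records the four Speaking task types described above as a small Python data structure, making explicit which skills each task integrates. The field names are illustrative rather than ETS terminology.

# Illustrative only: the four Speaking task types and the skills each combines.
SPEAKING_TASKS = [
    {"task": 1, "type": "independent", "input": [],                       "topic": "personal experience or opinion"},
    {"task": 2, "type": "integrated",  "input": ["reading", "listening"], "topic": "campus situation"},
    {"task": 3, "type": "integrated",  "input": ["reading", "listening"], "topic": "academic course topic"},
    {"task": 4, "type": "integrated",  "input": ["listening"],            "topic": "academic course topic"},
]

# Every task requires a spoken response, so an integrated task combines its
# input skills with speaking.
for task in SPEAKING_TASKS:
    skills = task["input"] + ["speaking"]
    print(f"Task {task['task']} ({task['type']}): {' + '.join(skills)}")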
Writing
The Writing section measures test takers’ ability to write in an academic environment and includes two tasks—
one independent and one integrated.
For both the Speaking and the Writing sections, test developers carefully design integrated tasks to ensure
that a successful response will consider information from both the listening and reading materials.
ETS carefully selects and trains outside item writers (who have experience teaching English as a second or
foreign language or other academic content areas) to develop an initial draft of test questions. ETS considers
item writers’ experience and backgrounds so that the pool of item writers reflects, to the greatest degree
possible, the diversity of the TOEFL iBT test’s international test-taking population.
Item Writing
To ensure that test content is as comparable as possible from one TOEFL iBT administration to another, test
developers follow detailed guidelines when selecting material for reading passages and lectures, and when
writing test questions. They consider, for example, whether the passages or lectures (and the questions based on them):
• align with ETS fairness guidelines (discussed below).
These considerations are fundamental to the TOEFL iBT test development process.
Content Review
At this stage, assessment specialists conduct multiple reviews of stimuli and items for both language and
content, considering questions such as these:
• Is the language in the test materials clear? Is it accessible to a nonnative speaker of English who is
preparing to study or is studying at a university where English is a medium of instruction?
• Is the content of the stimulus accessible to nonnative speakers who lack specialized knowledge in a
given field (e.g., geology, business, or literature)?
For multiple-choice questions, reviewers also consider additional factors specific to that item format.
For constructed-response items (tasks in the Speaking and Writing sections), the process is similar but not
identical. Reviewers tend to focus on accessibility, on the clarity of the language used, and on how well they believe
the particular Speaking or Writing item will generate a fair and scorable response. It is also essential that
reviewers judge each Speaking or Writing item to be comparable with others in terms of difficulty. Expert
judgment, then, plays a major role in deciding whether a Speaking or Writing item is acceptable and can be
included in an operational test (see also Item Tryouts for Speaking and Writing below).
Fairness Review
The ETS Standards for Quality and Fairness (ETS, 2014a) mandate fairness reviews, which must take place before any materials are used in a test.
All assessment specialists undergo fairness training—in addition to item writing training—soon after their
arrival at ETS. As part of their training, item writers become familiar with the ETS Guidelines for Fairness Review of
Assessments (ETS, 2016a) and the ETS International Principles for Fairness Review of Assessments (ETS, 2016b) and
use them when developing and reviewing test stimuli and items. Fairness issues are thus considered at each
stage of the development process.
In addition, specially trained and periodically calibrated fairness reviewers conduct a separate and
independent review of all TOEFL test materials. TOEFL assessment specialists may not perform this official
fairness review of TOEFL test materials; the official fairness reviewer is typically an assessment specialist who
works on other ETS tests. In this way, the fairness review is more objective and the reviewer brings no sense of
ownership of the test to the review. When fairness reviewers find unacceptable content in the test materials,
they issue a fairness challenge. The content reviewer assigned to the review step immediately after the fairness
reviewer must resolve the challenge to the satisfaction of both reviewers. For rare cases in which the reviewers
cannot reach agreement, a panel that includes the content and fairness reviewers adjudicates the issues at
hand and comes to a resolution.
Editorial Review
All TOEFL test materials receive an editorial review. The purpose of this review is to ensure that language in
the test materials (e.g., usage, punctuation, spelling, style, and format) is as clear, concise, and consistent as
possible. Editors ensure that established ETS test style is followed. In addition, when warranted, editors check facts
in stimuli to confirm that they are accurate and current; in areas such as physics or
geography, for example, knowledge advances periodically.
Item Tryouts for Speaking and Writing
Before new Speaking and Writing questions appear on an operational test form, they are tried out with participants who are representative of the target population. Assessment specialists review and evaluate spoken or written responses to these tryout
questions. These specialists use expert judgment to determine which prompts are likely to elicit scorable responses
from test takers across the range of proficiency levels; these viable prompts are the ones that appear in operational
test forms.
ETS developed the scoring rubrics for the Speaking and Writing sections through a process that included steps such as:
• having groups of experts rank order speaking or writing samples to identify features that differentiate
performance at high, middle, and low levels of proficiency.
Rubric developers created 4-point rubrics for the Speaking section and 5-point rubrics for the Writing section (ETS, 2014b, 2014c); all of
the rubrics are holistic, meaning that they require the rater to consider the overall quality of the response.
Scoring Processes
Constructed-response scoring presents challenges that multiple-choice testing does not. Assessment specialists
and psychometricians—experts in the design and statistical quality of standardized tests—are fundamentally
concerned with the difficulty of constructed-response items as well as raters’ scoring consistency. ETS supports
scoring quality for the TOEFL test Speaking and Writing sections in a number of ways:
• The scoring process is centralized, and it is performed separately from the test center administration
in order to ensure that test data are not compromised. Through centralized, separate scoring, each
scoring step is closely monitored to ensure its security, fairness, and integrity.
• ETS uses its patented Online Network for Evaluation to distribute test takers’ responses to raters,
record ratings, and monitor rating quality constantly.
• Raters must be qualified. In general, they must be experienced teachers or ESL or EFL specialists, or they must have
other relevant experience. In addition to teaching experience, ETS prefers raters who have master’s
degrees and experience assessing spoken and written language.
• If they have the formal qualifications, raters are then trained. ETS trains raters using a web-based
system. Following their training, raters must pass a certification test in order to be eligible to score.
To assure reliability of constructed-response scoring, ETS monitors raters continuously as they score.
• Nonnative speakers of English may be raters, and, in fact, contribute a much needed perspective to
the rater pool, but they must pass the same certification test as native-speaking raters.
At the beginning of each rating session, raters must pass a calibration test for the specific task type they
will rate before they proceed to operational scoring. Scoring leaders—the scoring session supervisors—
monitor raters in real time, throughout the day. These supervisors also regularly work as raters on different
scoring shifts and are subject to the same monitoring. No rater, no matter how experienced, scores without
supervision. ETS assessment specialists also monitor rating quality and communicate with scoring leaders
during rating sessions.
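A common way to monitor rater consistency is to compare a rater's scores with reference scores on the same responses and to track exact and adjacent (within one point) agreement. The Python sketch below illustrates that general idea; the statistics and the 80 percent threshold are assumptions made for illustration and are not ETS's published monitoring rules.

from typing import Sequence

def agreement_rates(rater: Sequence[int], reference: Sequence[int]) -> tuple[float, float]:
    """Return (exact agreement, exact-plus-adjacent agreement) as proportions."""
    pairs = list(zip(rater, reference))
    exact = sum(r == ref for r, ref in pairs) / len(pairs)
    adjacent = sum(abs(r - ref) <= 1 for r, ref in pairs) / len(pairs)
    return exact, adjacent

def needs_review(rater: Sequence[int], reference: Sequence[int], min_adjacent: float = 0.80) -> bool:
    """Flag a rater for follow-up if agreement drops below an assumed threshold."""
    _, adjacent = agreement_rates(rater, reference)
    return adjacent < min_adjacent

# Example with hypothetical Speaking ratings on the 0-4 scale.
rater_scores = [3, 2, 4, 3, 1, 2, 3, 4]
leader_scores = [3, 3, 4, 2, 1, 2, 4, 4]
print(agreement_rates(rater_scores, leader_scores))  # (0.625, 1.0)
print(needs_review(rater_scores, leader_scores))     # False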
For each administration, ETS’s online scoring network sends Speaking and Writing responses to multiple
independent raters for scoring. Each test taker’s responses are scored by more than a single rater.
The e-rater® automated scoring system (https://www.ets.org/erater/about) is a second rater on
TOEFL test Writing tasks and the SpeechRater® automated scoring engine is a second rater on TOEFL test
Speaking tasks. When a discrepancy between the human rater and automated scoring engine arises, it is
resolved by a second human rater. Information about the e-rater and SpeechRater engines can be found at
https://www.ets.org/accelerate/ai-portfolio/. Details on the use of these engines for scoring TOEFL iBT
Speaking and Writing tasks can be found in Volume 3 of the TOEFL Research Insight Series, Reliability and
Comparability of TOEFL iBT Scores (ETS, 2020).
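The routing described above, in which an automated engine provides a second rating and a discrepancy is resolved by an additional human rater, can be sketched in Python as follows. The one-point discrepancy threshold and the averaging rules are assumptions made for illustration, not ETS's published scoring rules.

from statistics import mean
from typing import Callable

def score_task(
    human_score: float,
    automated_score: float,
    request_second_human: Callable[[], float],
    max_discrepancy: float = 1.0,
) -> float:
    """Combine a human and an automated rating; escalate when they disagree."""
    if abs(human_score - automated_score) <= max_discrepancy:
        # Ratings agree closely: use their average for the task score (assumed rule).
        return mean([human_score, automated_score])
    # Discrepancy: a second human rater resolves it, and the task score is
    # based on the human ratings only (assumed resolution rule).
    second_human = request_second_human()
    return mean([human_score, second_human])

# Example with a hypothetical Writing task scored on the 0-5 rubric.
print(score_task(4.0, 3.5, request_second_human=lambda: 4.0))  # 3.75
print(score_task(4.0, 2.0, request_second_human=lambda: 3.0))  # 3.5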
Ongoing Oversight
Ongoing oversight is essential to the TOEFL program. As with all ETS tests, the TOEFL test undergoes an
internal audit every 3 years. The auditors report directly to the ETS Board of Trustees.
The TOEFL Committee of Examiners (COE) consists of 12 individuals from around the world, each of whom has
achieved professional recognition in an academic field related to ESL or EFL. The COE provides guidance and
oversight for research and development related to the TOEFL test.
The TOEFL Board consists of renowned professionals involved in international education, including admissions
officers, graduate deans, international student advisors, and specialists in the fields of language testing,
teaching, learning, and research. The TOEFL Board advises on the policies under which ETS administers the
TOEFL test.
References
Alderson, J. C. (2009). Test review: Test of English as a Foreign Language™: Internet-based Test (TOEFL iBT®). Language
Testing, 26(4), 621–631. https://doi.org/10.1177/0265532209346371
Chapelle, C. A., Enright, M. K., & Jamieson, J. (Eds.). (2008). Building a validity argument for the Test of English as a
Foreign Language. New York, NY: Routledge.
Educational Testing Service. (2014a). ETS standards for quality and fairness.
Educational Testing Service. (2014b). TOEFL iBT scoring guides (rubrics) for speaking responses.
Educational Testing Service. (2014c). TOEFL iBT scoring guides (rubrics) for writing responses.
Educational Testing Service. (2016a). ETS guidelines for fairness review of assessments.
Educational Testing Service. (2016b). ETS international principles for fairness review of assessments.
Educational Testing Service. (2020). Reliability and comparability of TOEFL iBT scores. TOEFL Research Insight
Series (Vol. 3, 3rd ed.).
Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., & Taylor, C. (1999). TOEFL 2000 framework: A working paper
(TOEFL Monograph No. 16). Princeton, NJ: Educational Testing Service.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessment.
Measurement: Interdisciplinary Research and Perspectives, 1, 3–62. https://doi.org/10.1207/
S15366359MEA0101_02
Pearlman, M. (2008). Finalizing the test blueprint. In C. A. Chapelle, M. K. Enright, & J. M. Jamieson (Eds.),
Building a validity argument for the Test of English as a Foreign Language (pp. 227–258). New York, NY:
Routledge.
Copyright © 2020 by Educational Testing Service. All rights reserved. ETS, the ETS logo, E-RATER, SPEECHRATER, TOEFL, TOEFL iBT, TOEFL ITP, TOEFL JUNIOR, and TOEFL PRIMARY are registered trademarks of Educational Testing Service (ETS).