
TOEFL RESEARCH

Evaluation of English Language Qualifications: How TOEFL iBT® Meets Key English Assessment Standards

May 18, 2026


In the UK, BUILA (the British Universities’ International Liaison Association) and BALEAP (the British Association of Lecturers in English for Academic Purposes) are two professional organizations that collaborate on issues related to international student recruitment and the use of English in academic settings.

Recently, BUILA, which represents UK universities’ international offices, and BALEAP, a group that supports teachers and researchers, joined forces to offer a set of recommendations on measuring the effectiveness of a test of academic English.

This publication, English Language Good Practices Guide: Testing Qualifications and English for Academic Purposes in Higher Education, offers practical guidance for universities looking to ensure that international students have sufficient academic English skills to succeed in modern classrooms.

Section 1 of the Guide, “Evaluation of English Language Qualifications,” raises a handful of key questions about overall test quality that higher education institutions should consider before accepting an English test. In this article, we share how TOEFL iBT addresses each of the key questions raised in this valuable BUILA-BALEAP guidance.

Does the test or qualification content reflect the linguistic and communicative demands of academic or professional contexts, rather than general or everyday English?

TOEFL iBT draws on more than six decades of English assessment research to measure the core communication abilities students must use in today’s academic environments. Examples of how TOEFL content reflects the linguistic and communicative demands of modern academic and professional contexts include:

  • Reading passages drawn from sources relevant in academic settings, such as textbooks, newspapers, and magazines
  • A sampling of website articles and social media posts, which provide a relevant framework for testing comprehension of implied meanings, opinions, and other pragmatic aspects of communication that are vital in today’s classrooms
  • Academic talks and lectures (monologic input), as well as group interactions (dialogic input) relevant in dynamic classroom environments and experiential learning settings
  • Written responses for common situations such as writing an email to a professor or colleague, as well as writing for an academic online discussion (requiring the synthesis of input from both a professor and fellow students)
  • An interview with a simulated interlocutor, set within the context of an academic interaction

Are the tasks representative of real-life academic or professional communication that students will encounter?

Every individual task on TOEFL iBT works in concert with the other tasks in each section to address a variety of relevant skills, allowing us to obtain precise and useful measurement of reading, listening, writing, and speaking language ability in each test section.

For example, independent research has shown that the Speaking tasks on TOEFL iBT are strong indicators of performance on typical academic speaking tasks. This, of course, remains the core goal of a high-stakes English assessment like TOEFL.

Another example of this holistic measurement approach comes from the Listening section, where a variety of task types do different jobs and feature different types and lengths of input. Together, these tasks allow for the measurement of many skills (e.g., the ability to understand implied meanings), contexts (e.g., professional and academic interactions), and genres (e.g., conversations and lectures).

A closer look at individual item types also shows how well they represent the academic communication students will encounter in real-world environments. More examples are offered below.

  • In the Reading section, Read an Academic Passage provides insight into a student’s ability to obtain information and understand the meaning of complex texts, as is commonly expected in academic study.
  • In the Writing section, Write for an Academic Discussion, which occurs in the context of a class discussion prompted by an instructor, asks a student to express their own views, supported with relevant reasoning, knowledge, or experience. Students are also expected to respond to peers’ contributions.
  • Take an Interview, one of the two Speaking tasks, asks students to participate in a simulated conversation with a pre-recorded interviewer. The interview takes place within a variety of academic situations, like participating in a research study. Initial questions focus on factual information and personal experience; later questions ask students to express and support opinions regarding broader issues.
  • In the Listening section, our Academic Talk task measures a student’s ability to understand a monologic lecture. This task is used in tandem with tasks like Listen to a Conversation to measure a student’s ability to succeed in modern classroom environments, where actively participating in group discussions is often as important as one-way listening in a lecture hall.

Is there a sufficient range of cognitive processes across the tasks, not just surface-level comprehension or recall? And do the tasks require the types of cognitive operations (e.g., analysis, synthesis, critical thinking) expected in UK academic settings?

Engaging a broad range of cognitive processes – not just surface-level comprehension or recall – remains at the heart of TOEFL’s design. TOEFL iBT’s tasks also test cognitive operations, like analysis, synthesis, and critical thinking, that are expected in rigorous academic settings – globally and in the UK.

To start, the Writing tasks in today’s TOEFL engage a wide range of cognitive processes, including micro-planning, macro-planning, monitoring, and synthesizing information.

TOEFL’s Reading tasks measure cognitive processes like understanding academic vocabulary, integrating textual information across sentences, inferring the situation implied in a text, understanding an author’s point of view, and inferring the meaning of figurative language.

TOEFL Listening tests a student’s ability to identify main ideas and supporting details, derive relationships among ideas, draw inferences, understand a speaker’s purpose and attitude, and process extended speech and organizational devices. Students also must make use of phonological information, lexical and grammatical meaning, and pragmatic information.

And TOEFL Speaking measures the ability to rapidly process and produce spoken language, plan and organize a spoken response, evaluate and form an opinion, and create a structured argument. It also requires metacognitive strategies (like monitoring pronunciation while speaking) and discourse management (controlling pace and intonation; using transitions).

Does the test assess all four skills (listening, speaking, reading, writing), either as separate components or in integrated tasks?

Yes. The test evaluates the four language skills of reading, listening, writing, and speaking as separate components. At the same time, test tasks require test takers to combine multiple English-language skills, such as using what they have heard or read to provide effective spoken or written responses.

This integration across language skills makes these tasks vital tools for measuring test takers’ English proficiency.

Is there independent, transparent evidence that the test or qualification reliably assesses language ability at CEFR B2 level or higher?

From item and task development to score development and evaluation, the TOEFL iBT test is tightly aligned to each of the CEFR levels. First, the CEFR levels are integral to the targeted development of items and to their alignment, as evidence, with the claims and can-do statements articulated in the Test Specifications.

The banded score scale (1-6) was also developed to reflect each of the six CEFR levels (A1-C2). The scale offers the added benefit of consistency and ease of interpretation, because the same scale is used for all four section scores and for the overall score (which is the rounded average of the section scores). For example, a score of 4 aligns with CEFR level B2 for Reading, Listening, Writing, and Speaking, as well as for the overall score.
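The score arithmetic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not ETS scoring code. It assumes whole-number section bands from 1 to 6 that map in order to A1 through C2 (the article states only that the six bands reflect the six CEFR levels and that a 4 aligns with B2), and it assumes the overall score is the mean of the four section bands rounded half up to a whole band. The official rounding rule is not described here and may differ. The names `BAND_TO_CEFR`, `overall_band`, and `cefr_level` are invented for this sketch.

```python
import math

# Hypothetical mapping of whole bands to CEFR levels, assumed to run in order.
# The article states only that the six bands reflect the six CEFR levels
# and that a band of 4 aligns with B2.
BAND_TO_CEFR = {1: "A1", 2: "A2", 3: "B1", 4: "B2", 5: "C1", 6: "C2"}

SECTIONS = ("Reading", "Listening", "Writing", "Speaking")


def overall_band(section_bands: dict[str, int]) -> int:
    """Return the mean of the four section bands, rounded half up to a whole band.

    Whole-band, round-half-up rounding is an illustrative assumption;
    the official rounding rule may differ.
    """
    missing = set(SECTIONS) - set(section_bands)
    if missing:
        raise ValueError(f"missing section scores: {sorted(missing)}")
    for name in SECTIONS:
        if section_bands[name] not in BAND_TO_CEFR:
            raise ValueError(f"{name} band must be a whole number from 1 to 6")
    mean = sum(section_bands[name] for name in SECTIONS) / len(SECTIONS)
    # floor(mean + 0.5) rounds halves upward; Python's round() would
    # round 4.5 down to 4 (round-half-to-even).
    return math.floor(mean + 0.5)


def cefr_level(band: int) -> str:
    """Look up the CEFR level for a whole band under the assumed mapping."""
    return BAND_TO_CEFR[band]


if __name__ == "__main__":
    scores = {"Reading": 4, "Listening": 5, "Writing": 4, "Speaking": 4}
    band = overall_band(scores)
    print(band, cefr_level(band))
```

Under these assumptions, section bands of 4, 5, 4, and 4 average to 4.25, which rounds to an overall band of 4 (B2), while bands of 4, 5, 5, and 4 average to 4.5, which this sketch rounds up to 5 (C1). The sketch uses `math.floor(mean + 0.5)` rather than the built-in `round()` because `round(4.5)` returns 4 under Python's round-half-to-even rule.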

The scale development was multifaceted and drew on several mapping methodologies, including vertical linking, content evaluation and alignment between tasks and CEFR descriptors, and standard setting. Because the task types and the number of items vary across test sections, the methodology for mapping test scores to CEFR levels also differed between the sections that evaluate receptive language skills (Reading and Listening) and those that evaluate productive language skills (Writing and Speaking).

These methods are described in a research paper that further documents how the TOEFL iBT test reflects the skills associated with English language proficiency at each CEFR level, from A1 to C2. Independent research supporting these efforts for the latest edition of the TOEFL iBT test is planned and will draw on operational data as verification evidence.
