International Journal of Language Testing

International Journal of Language Testing, Volume 13, Special Issue, March 2023

Articles

1.

A Cognitive Diagnostic Assessment Study of the Reading Comprehension Section of the Preliminary English Test (PET)

Keywords: B1 Preliminary English test, reading attributes, G-DINA, compensatory, non-compensatory

Views: 178, Downloads: 177
Cognitive diagnostic models (CDMs) have received much interest within the field of language testing over the last decade due to their great potential to provide diagnostic feedback to all stakeholders and ultimately improve language teaching and learning. A large number of studies have demonstrated the application of CDMs to advanced large-scale English proficiency exams, such as IELTS, TOEFL, MELAB, and ECPE. However, too little attention has been paid to the utility of CDMs for elementary and intermediate high-stakes English exams. The current study aims to diagnose the reading ability of test takers in the B1 Preliminary test, previously known as the Preliminary English Test (PET), using the generalized deterministic input, noisy, “and” gate (G-DINA; de la Torre, 2011) model. The G-DINA is a general, saturated model that allows attributes to combine in both compensatory and non-compensatory relationships and allows the best-fitting model to be selected for each item. To achieve the purpose of the study, an initial Q-matrix based on the theory of reading comprehension and the consensus of content experts was constructed and validated. Item responses of 435 test takers to the reading comprehension section of the PET were analyzed using the “GDINA” package in R. The attribute profiles suggested that lexico-grammatical knowledge is the most difficult attribute and that making an inference is the easiest one.
2.

Examining Attribute Relationship Using Diagnostic Classification Models: A Mini Review

Keywords: Diagnostic Classification Models, Attribute Relationship, GDINA, DINO, DINA

Views: 122, Downloads: 101
Diagnostic classification models (DCMs) have recently become very popular both for research purposes and for real testing endeavors for student assessment. A plethora of DCMs gives researchers and practitioners a wide range of options for student diagnosis and classification. One intriguing option that some DCMs offer is the possibility of examining the nature of the interactions among the attributes underlying a skill. Attributes in second/foreign language (L2) skills may interact with each other in a compensatory or non-compensatory manner. Subskill/attribute relationships have been studied using diagnostic classification models. The present study provides a mini review of the DCM studies on attribute relationships in L2 reading, listening, and writing. The criteria on which interactions among the attributes have been inferred are reviewed. The results showed that the majority of DCM studies have investigated reading comprehension and that more studies are required on the productive skills of writing and speaking. Furthermore, suggestions for future studies are provided.
3.

The Construction and Validation of a Q-matrix for Cognitive Diagnostic Analysis: The Case of the Reading Comprehension Section of the IAUEPT

Keywords: Cognitive Diagnostic Models (CDMs), GDINA, Islamic Azad University English Proficiency Test (IAUEPT), Q-Matrix, Reading comprehension attributes

Views: 211, Downloads: 159
Cognitive diagnostic models (CDMs) have received sustained attention in educational settings because they can be used to operationalize formative assessment to provide diagnostic feedback and inform instruction. A large number of CDMs have been developed over the past few years. An important component of all CDMs is a Q-matrix that specifies a particular hypothesis about the relationship between each test item and its required attributes. The purpose of this study was to construct and validate a Q-matrix for the reading comprehension section of the Islamic Azad University English Proficiency Test (IAUEPT), an advanced English placement test designed to measure the language ability of Ph.D. candidates who intend to pursue their studies at the IAU. To achieve this, using item responses of 1152 candidates to twenty items of the reading section of the test, an initial Q-matrix was constructed based on theories and models of second/foreign language (L2) reading comprehension, previous applications of CDMs to L2 reading comprehension, and brainstorming and consensus among five content experts. Then, the initial Q-matrix was empirically validated using the method proposed by de la Torre and Chiu (2016) and by inspecting mesa plots and a heatmap plot. Five attributes were derived for the reading comprehension section: vocabulary, grammar, making an inference, understanding specific information, and identifying explicit information. Finally, the analysis of the Generalized Deterministic Inputs, Noisy “and” Gate (GDINA) model regarding absolute fit at item and test level, as well as three residual-based statistics, showed the accuracy of the Q-matrix and a perfect model-data fit.
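
To make the role of a Q-matrix concrete, the following is a minimal sketch in Python, not the study's actual matrix or analysis. The 4-item by 3-attribute matrix, the mastery profile, and the function name `ideal_responses` are all hypothetical. It uses the conjunctive DINA rule, a special case of the GDINA model, under which an examinee is expected to answer an item correctly only when every attribute the item requires has been mastered.

```python
import numpy as np

# Hypothetical Q-matrix: 4 items (rows) x 3 attributes (columns).
# A 1 means the item is assumed to require that attribute.
Q = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
])

def ideal_responses(alpha, q_matrix):
    """DINA-style ideal response: 1 only if all required attributes are mastered."""
    alpha = np.asarray(alpha)
    # alpha >= q holds for a whole row exactly when no required attribute is missing.
    return np.all(alpha >= q_matrix, axis=1).astype(int)

# Hypothetical examinee who has mastered attributes 1 and 2 but not attribute 3.
profile = [1, 1, 0]
eta = ideal_responses(profile, Q)
print(eta)
```

Observed responses would deviate from these ideal responses through slipping and guessing, and empirical Q-matrix validation asks whether a different row of the matrix would explain the data better.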
4.

Multidimensional IRT Analysis of Reading Comprehension in English as a Foreign Language

Keywords: Bifactor model, Multidimensional IRT, Reading Comprehension, Unidimensional IRT

Views: 139, Downloads: 90
Unidimensionality is an important assumption of measurement, but it is violated very often. Most of the time, tests are deliberately constructed to be multidimensional to cover all aspects of the intended construct. In such situations, the application of unidimensional item response theory (IRT) models is not justified because it leads to poor model fit and misleading results. Multidimensional IRT (MIRT) models can handle several dimensions simultaneously and yield person ability parameters on several dimensions, which is also helpful for diagnostic purposes. Furthermore, MIRT models use the correlation between the dimensions to enhance the precision of the measurement. In this study, a reading comprehension test is modelled with the multidimensional Rasch model. The findings showed that a correlated 2-dimensional model has the best fit to the data. The bifactor model revealed some interesting information about the structure of reading comprehension and the reading curriculum. Implications of the study for the testing and teaching of reading comprehension are discussed.
5.

Psychometric Modelling of Reading Aloud with the Rasch Model

Keywords: Rasch partial credit model, Reading aloud, speaking test, Validation

Views: 137, Downloads: 105
Reading aloud is recommended as a simple technique to measure speaking ability (Hughes & Hughes, 2020; Madsen, 1983). Reading aloud is currently used in the Pearson Test of English and a couple of other international English as a second language proficiency tests. Due to the simplicity of the technique, it can be used in conjunction with other techniques to measure foreign and second language learners’ speaking ability. One issue in reading aloud as a testing technique is its psychometric modelling. Because of the peculiar structure of reading aloud tasks, analysing them with item response theory models is not straightforward. In this study, the Rasch partial credit model (PCM) is suggested and used to model examinees’ reading aloud scores. The performances of 196 foreign language learners on five reading aloud passages were analysed with the PCM. Findings showed that the data fit the PCM well and that the scores are highly reliable. Implications of the study for psychometric evaluation of reading aloud or oral reading fluency are discussed.
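
As an illustration of how the partial credit model assigns probabilities to ordered score categories, here is a minimal Python sketch. The function name `pcm_probabilities`, the ability value, and the threshold values are hypothetical and are not taken from the study. The sketch uses the standard PCM form, in which the probability of score k is proportional to the exponential of the sum of (theta - delta_j) over the first k thresholds.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities under the Rasch partial credit model.

    theta: person ability (logits).
    thresholds: step difficulties delta_1..delta_m for one item or passage.
    Returns a list of m + 1 probabilities for scores 0..m.
    """
    # Cumulative sums of (theta - delta_j); the sum for score 0 is defined as 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical passage scored 0-3, with illustrative thresholds.
probs = pcm_probabilities(0.0, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the dichotomous Rasch model, which is one quick way to check the sketch.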
6.

Distractor Analysis in Multiple-Choice Items Using the Rasch Model

Keywords: Distractor analysis, Item response theory, Multiple-choice items, Rasch model

Views: 151, Downloads: 115
The multiple-choice (MC) item format is commonly used in educational assessments due to its economy and effectiveness across a variety of content domains. However, numerous studies have examined the quality of MC items in high-stakes and higher education assessments and found many flawed items, especially in terms of distractors. These faulty items lead to misleading insights about the performance of students and to flawed final decisions. The analysis of distractors is typically conducted in educational assessments with multiple-choice items to ensure that high-quality items are used as the basis of inference. Item response theory (IRT) and Rasch models have received little attention for analyzing distractors. For that reason, the purpose of the present study was to apply the Rasch model to a grammar test in order to analyze the distractors of its items. To achieve this, the current study investigated the quality of 10 instructor-written MC grammar items used in an undergraduate final exam, using the item responses of 310 English as a foreign language (EFL) students who had taken part in an advanced grammar course. The results showed acceptable fit to the Rasch model and high reliability. Malfunctioning distractors were identified.
7.

Structural Equation Modeling in L2 Research: A Systematic Review

Keywords: L2 journals, L2 research, Multivariate data analysis, Structural Equation Modeling, Systematic review

Views: 166, Downloads: 140
Structural equation modeling (SEM), a flexible and versatile multivariate statistical technique, has been increasingly used since its introduction in the 1970s. This article presents a methodological synthesis of the use of SEM in L2 research, examining reporting practices in light of the current SEM literature in order to provide empirically grounded recommendations for future research. A total of 722 instances of SEM found in 145 empirical reports published in 16 leading L2 journals across two periods, 1981-2008 and 2009-2020, were systematically reviewed. Each study was coded for a wide range of analytic and reporting practices. The results indicate that despite the growing popularity of SEM in L2 research, there was wide variation and inconsistency in its use and reporting within and across the two periods with regard to the underlying assumptions, variables and models, model specification and estimation, and fit statistics. Drawing on the current SEM literature, we discuss the findings and their implications for the future use and reporting of SEM in L2 research.
8.

A comparison of the added value of subscores across two subscore augmentation methods

Keywords: subscore augmentation, subscore distinctness, subscore variability, Wainer, Yen

Views: 151, Downloads: 120
Testing organizations are faced with increasing demand to provide subscores in addition to the total test score. However, psychometricians argue that most subscores do not have enough added value to be worth reporting. To have added value, subscores need to meet a number of criteria: they should be reliable, distinctive, and distinct from each other and from the total score. In this study, the quality of subscores from two subscore augmentation methods (Wainer and Yen) was compared in terms of distinctness and variability. The reliabilities of the Wainer-augmented subscores were also examined. The methods were applied to a high-stakes English language proficiency test in Iran. The results of the study showed that Yen better satisfied subscore distinctness, while Wainer better preserved variability and yielded highly reliable subscores. In other words, Yen-augmented subscores had lower correlations, while Wainer-augmented subscores better discriminated examinees with different ability levels. Thus, neither of the examined subscoring methods satisfied all criteria. The results of the study are discussed and suggestions for future research are provided.
9.

Detecting Measurement Disturbance: Graphical Illustrations of Item Characteristic Curves

Keywords: Graphical displays, item characteristic curves, measurement disturbances, model-data fit

Views: 146, Downloads: 92
Measurement disturbances refer to any conditions that affect the measurement of some psychological latent variables, which result in an inaccurate interpretation of item or person estimates derived from a measurement model. Measurement disturbances are mainly attributed to the characteristics of the person, the properties of the items, and the interaction between the characteristics of the person and the features of the items. Although numerous researchers have detected measurement disturbances in different contexts, too little attention has been devoted to exploring measurement disturbances within the context of language testing and assessment, especially using graphical displays. This study aimed to show the utility of graphical displays, which surpass numeric values of infit and outfit statistics given by the Rasch model, to explore measurement disturbances in a listening comprehension test. Results of the study showed two types of outcomes for examining graphical displays and their corresponding numeric fit values: congruent and incongruent associations. It turned out that graphical displays can provide diagnostic information about the performance of test items which might not be captured through numeric values.
10.

Evaluating Measurement Invariance in the IELTS Listening Comprehension Test

Keywords: Differential Item Functioning, IELTS, measurement invariance, Rasch model

Views: 123, Downloads: 98
Measurement invariance (MI) refers to the degree to which a measurement instrument or scale produces consistent results across different groups or populations. It basically shows whether the same construct is measured in the same way across different groups, such as different cultures, genders, or age groups. If MI is established, it means that scores on the test can be compared meaningfully across different groups. MI is mostly established using confirmatory factor analysis methods. In this study, we aim to examine MI using the Rasch model. The responses of 211 EFL learners to the listening section of the IELTS were examined for MI across gender and randomly selected subsamples. The item difficulty measures were compared graphically using the Rasch model. Findings showed that, except for a few items, the IELTS listening items exhibit MI. Therefore, score comparisons across gender and other unknown subgroups are valid with the IELTS listening scores.
11.

Analysis of C-Tests with the Equidistance and the Dispersion Models

Views: 67, Downloads: 63
C-tests are commonly used as measures of second language reading comprehension and general language proficiency. Analysis of C-tests with item response theory models is problematic due to the interdependent structure of C-test items or gaps. One approach to facilitating item response theory (IRT) analysis of C-tests involves treating each passage as a polytomous super-item. This approach allows polytomous IRT models for ordered response data to be applied to the C-test passages. Usually, Andrich’s rating scale model or Masters’ (1982) partial credit model is used to analyse the data. In this study, we aim to apply two alternative modelling techniques, namely the equidistance model (Andrich, 1982) and the dispersion model (Rost, 1988), to C-test data. Our findings showed that the C-test data fit both models well and that both models can be used for psychometric analysis of C-test passages. Information criteria showed that the dispersion model has a better fit than the equidistance model.
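
The super-item approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `passage_scores`, the response vector, and the number of gaps per passage are hypothetical. Each passage's dichotomous gap responses are summed into one ordered score, and those scores are what a polytomous IRT model would then take as input.

```python
def passage_scores(gap_responses, gaps_per_passage):
    """Collapse 0/1 gap responses into one polytomous score per passage.

    gap_responses: flat list of 0/1 responses, ordered passage by passage.
    gaps_per_passage: number of gaps in each passage, in the same order.
    """
    if sum(gaps_per_passage) != len(gap_responses):
        raise ValueError("gap counts do not match the number of responses")
    scores = []
    start = 0
    for n_gaps in gaps_per_passage:
        # The super-item score is the count of correctly restored gaps.
        scores.append(sum(gap_responses[start:start + n_gaps]))
        start += n_gaps
    return scores

# Hypothetical examinee: two passages with four gaps each.
responses = [1, 1, 1, 0, 0, 1, 0, 0]
scores = passage_scores(responses, [4, 4])
print(scores)
```

Summing within passages absorbs the dependence among gaps in the same text, so the passages, rather than the individual gaps, become the units that are assumed to be locally independent.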