The Construction and Validation of a Q-matrix for a High-stakes Reading Comprehension Test: A G-DINA Study

Keywords: Cognitive Diagnostic Assessment Test Reading Comprehension Q-Matrix construction Q-matrix Validation

Investigating the processes underlying test performance is a major source of data for supporting the explanation inference in the validity argument (Chapelle, 2021). One way of modeling these cognitive processes is to construct a Q-matrix, which summarizes the attributes that explain test takers’ response behavior. The current study documents the construction and validation of a Q-matrix for a high-stakes reading test within the generalized deterministic inputs, noisy “and” gate (G-DINA) model framework. To this end, the attributes underlying the 20 items of the reading comprehension test were specified through retrospective verbal reports and a Delphi procedure with domain experts. In the next stage, the resulting Q-matrix, together with item response data from 2,625 test takers, was subjected to empirical analysis using the procedure suggested by de la Torre and Chiu (2016). Item-level results showed that, for all but one item, the underlying processes were captured by compensatory and additive models. This finding has significant implications for model selection by DCM practitioners.
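As a minimal illustration of what a Q-matrix encodes, the sketch below uses an invented three-item, two-attribute Q-matrix (the attribute names and entries are hypothetical and are not taken from the study). It contrasts a conjunctive (DINA-style) rule with a disjunctive, compensatory (DINO-style) rule for the noise-free "ideal" response; G-DINA generalizes such rules by estimating attribute effects per item, which is not shown here.

```python
# Hypothetical Q-matrix: rows are items, columns are attributes.
# An entry of 1 means the item is assumed to require that attribute.
# These names and values are invented for illustration only.
ATTRIBUTES = ["vocabulary", "inference"]
Q_MATRIX = [
    [1, 0],  # item 1 requires vocabulary only
    [0, 1],  # item 2 requires inference only
    [1, 1],  # item 3 requires both attributes
]


def conjunctive_ideal_response(profile, q_row):
    """DINA-style rule: 1 only if every required attribute is mastered."""
    return int(all(a >= q for a, q in zip(profile, q_row)))


def disjunctive_ideal_response(profile, q_row):
    """DINO-style rule: 1 if at least one required attribute is mastered."""
    return int(any(a == 1 and q == 1 for a, q in zip(profile, q_row)))


# A test taker who has mastered vocabulary but not inference.
profile = [1, 0]
conjunctive = [conjunctive_ideal_response(profile, row) for row in Q_MATRIX]
disjunctive = [disjunctive_ideal_response(profile, row) for row in Q_MATRIX]
print("conjunctive:", conjunctive)
print("disjunctive:", disjunctive)
```

The two rules differ only on item 3, which requires both attributes: under the conjunctive rule, mastering one attribute is not enough, whereas under the compensatory rule it is. Slip and guessing parameters are deliberately left out of this sketch.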

Diagnostic Test Construction: Insights from Cognitive Diagnostic Modeling

Keywords: Cognitive Diagnostic Assessment Diagnostic Classification Models model fit indices Q-Matrix construction

Although Diagnostic Classification Models (DCMs) were introduced to education decades ago, they seem not to have been used for the purposes for which they were designed. DCMs have mostly been used to analyze large-scale non-diagnostic tests and have rarely been used to develop Cognitive Diagnostic Assessment (CDA) from scratch. Despite the prevalence of retrofitting CDA studies, true applications of CDA are believed to be rare because, first, a coherent framework for conducting such studies has not been available and, second, researchers have not been able to evaluate different DCMs against the same model fit indices and criteria. This paper presents a summary of different types of DCMs and reviews true and retrofitting CDA studies. Having examined the limitations of previous CDA studies, the present study argues for applying Ravand and Baghaei’s (2019) framework to conduct true CDA studies. This framework is important because it not only fits into prominent frameworks in educational assessment, such as the Cognitive Design System and the Assessment Triangle, but also provides test developers with practical steps for constructing valid cognitive diagnostic tests.

CAPT and its Effect on English Language Pronunciation Enhancement: Evidence from Bilinguals and Monolinguals (Ministry of Science scholarly article)

Keywords: bilinguals CAPT monolinguals pronunciation perception pronunciation production

English teachers and researchers currently face several challenges regarding how to teach foreign language pronunciation more effectively. The current study aimed to explore the effect of computer-assisted pronunciation teaching (CAPT) on the pronunciation production and perception of Persian monolinguals and of Turkmen-Persian and Baloch-Persian bilinguals. A sample of 48 female monolingual and bilingual 7th-grade students participated in this study and formed the experimental and comparison groups. All the participants took the Oxford Placement Test and were accordingly at the beginner level of English language proficiency (95.83% of the participants’ scores ranged from 0 to 15). The experimental group received technology-based instruction, while the comparison group was taught with the traditional listen-and-repeat method of pronunciation teaching. Two two-way between-groups ANOVAs were used to determine the influence of CAPT on the pronunciation production and perception of the monolingual and bilingual participants. The results indicated that CAPT had a significant effect on pronunciation production, while pronunciation perception was comparatively more enhanced through the traditional method. Regarding monolingualism and bilingualism, bilinguals significantly outperformed monolinguals in pronunciation production in both groups, while there was no significant difference between them in pronunciation perception. There were also no interaction effects for pronunciation perception or production scores. The results generally showed that CAPT can be beneficial, specifically when it is used along with traditional methods at schools at beginner levels.

Investigating Gender and Major DIF in the Iranian National University Entrance Exam Using Multiple-Indicators Multiple-Causes Structural Equation Modelling (Ministry of Science scholarly article)

Keywords: Differential Item Functioning (DIF) multiple-indicators multiple-causes (MIMIC) structural equation modeling (SEM)

The generalizability aspect of construct validity, as proposed by Messick (1989), requires that a test measure the same trait across different samples from the same population. Differential item functioning (DIF) analysis is a key component in the fairness evaluation of educational tests. The university entrance exam for candidates seeking admission to master's English programs (MEUEE) at Iranian state universities is a very high-stakes test whose fairness is a promising line of research. The current study explored gender and major DIF in the general English (GE) section of the MEUEE using multiple-indicators multiple-causes (MIMIC) structural equation modelling. The data of all the test takers (n=21,642) who took the GE section of the MEUEE in 2012 were analyzed with Mplus. To determine whether an item should be flagged for DIF, both practical and statistical significance were considered. The results indicated that 12 items were flagged for DIF in terms of statistical significance. However, only 5 items showed practical significance. The items flagged for DIF alert test developers and users to potential sources of construct-irrelevant variance in the test scores, which may call into question comparisons of test takers’ performance, especially when the tests are used for selection purposes.

Differential Item Functioning Based on a Diagnostic Classification Approach in the Reading Comprehension Test of the Language-Specific University Entrance Exam (Ministry of Science scholarly article)


Keywords: differential item functioning DIF attribute profile diagnostic classification models Mantel-Haenszel method

Differential item functioning (DIF) occurs when test takers with equal ability on the construct being measured perform differently on individual items of a test. The present study investigates DIF in the reading comprehension items of the language-specific entrance exam for Iranian universities. In addition, the study addresses a DIF method based on diagnostic classification models alongside the Mantel-Haenszel method. To this end, the responses of 10,000 candidates who took the exam were analyzed using the CDM and difR packages in R. The results showed that one item under the diagnostic classification method and two items under the traditional Mantel-Haenszel method were identified as showing moderate DIF, which does not appear to threaten the construct validity of the test. It can also be concluded that when the attribute profile is used as the matching variable, fewer items are identified as showing DIF.
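The Mantel-Haenszel procedure named in this abstract can be sketched as follows. This is an illustrative Python sketch, not the difR implementation used in the study, and the counts are invented. For one item, test takers are grouped into strata by a matching variable (total score in the traditional approach, or attribute profile in the diagnostic classification approach), and the common odds ratio is pooled across strata. The conversion to the ETS delta scale, -2.35 ln(alpha), is the standard one.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Pool a common odds ratio across matched strata for a single item.

    Each stratum is a tuple of counts:
    (reference_correct, reference_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def ets_delta(alpha_mh):
    """Convert the common odds ratio to the ETS delta (MH D-DIF) scale."""
    return -2.35 * math.log(alpha_mh)


# Invented counts for one item across two matching strata.
example_strata = [
    (10, 10, 10, 10),  # both groups perform equally in this stratum
    (20, 10, 10, 20),  # the reference group performs better in this stratum
]
alpha = mantel_haenszel_odds_ratio(example_strata)
delta = ets_delta(alpha)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {delta:.3f}")
```

An odds ratio of 1 (delta of 0) indicates no DIF; values above 1 favor the reference group and give a negative delta. Swapping the matching variable only changes how the strata are formed, not the pooling formula.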

On the Factor Structure (Invariance) of the PhD UEE Using Multigroup Structural Equation Modeling (Ministry of Science scholarly article)

Keywords: factor structure invariance language proficiency multigroup confirmatory factor analysis university entrance examination

The aim of the current study was twofold: (1) to validate the internal structure of the general English (GE) section of the university entrance examination for Ph.D. applicants to English programs at state universities in Iran (Ph.D. UEE), and (2) to examine the factor structure invariance of the Ph.D. UEE across two proficiency levels. Structural equation modeling (SEM) was used to analyze the responses of a random sample of participants (N=1009) who took the test in 2014 to seek admission to English programs at Iranian state universities. First, four models (unitary, uncorrelated, correlated, and higher-order) were estimated and compared to find the model that best represented the data. Then, the factor structure invariance of the test across two proficiency levels was explored using multigroup confirmatory factor analysis. The higher-order and correlated three-factor models showed the best fit to the data. The results also showed that the structure of the test remained invariant across both proficiency levels. These results supported the multi-componential view of language proficiency. It was found that there is no relationship between levels of language proficiency and the structure of the test. However, the results called into question the score-reporting policy for the Ph.D. UEE and led to the conclusion that a single total score does not reflect the structure of the test.

Investigating the Effect of Self-, Peer-, and Teacher Assessment in Second Language Writing over Time: A Multifaceted Rasch Approach (Ministry of Science scholarly article)

Keywords: Self-assessment Peer-assessment EFL Writing Multifaceted Rasch Measurement Teacher Assessment

This study investigated the accuracy of scores assigned by self-, peer-, and teacher assessors over time. Thirty-three English majors who were taking a paragraph development course at Vali-e-Asr University of Rafsanjan and two instructors who had been teaching essay writing at university for at least two years participated in the study. After receiving instruction on paragraph development, participants were trained for one session on how to rate the paragraphs. For three sessions, the students were given topics to write about and were asked to rate their own paper and one of their peers’ papers for mechanics, grammar and choice of words, content development, and organization. The teachers also rated the paragraphs according to the same criteria. Multifaceted Rasch measurement was employed to analyze the data. The results showed different patterns of performance for the subjects rated by different raters at the beginning of the experiment. However, rater bias showed a significant decrease over time. The results of the study have useful implications for language teachers, especially in portfolio assessment, where self- and peer assessment provide invaluable help.

