Student evaluation apprehension, a detrimental factor in English as a foreign language (EFL) contexts, reduces and gradually diminishes student participation in classroom activities, since learners become preoccupied with how others (the teacher and classmates) evaluate and judge their performance. Because studies addressing the role of student evaluation apprehension are scarce, this study was conducted to validate a newly designed questionnaire via exploratory and confirmatory factor analyses (EFA, CFA) and to examine the relationship between student evaluation apprehension and the academic achievement, gender, and educational level of 258 EFL students. The results of the EFA, CFA, and reliability analyses revealed that the new questionnaire is a valid and reliable instrument for measuring EFL students’ evaluation apprehension. Moreover, a significant negative correlation was observed between student evaluation apprehension and academic achievement. In addition, females were found to experience evaluation apprehension more than males, and BA students more than their MA counterparts.
Although Diagnostic Classification Models (DCMs) were introduced to the education system decades ago, these models have seldom been employed for the purposes for which they were originally designed. DCMs have mostly been used to analyze large-scale non-diagnostic tests and have rarely been used to develop Cognitive Diagnostic Assessment (CDA) from scratch. Despite the prevalence of retrofitting CDA studies, true applications of CDA are believed to be rare because, first, a coherent framework for conducting such studies had not been available and, second, researchers were not able to evaluate various DCMs against the same model fit indices and criteria. This paper presents a summary of different types of DCMs and reviews true and retrofitting CDA studies. Having examined the limitations of previous CDA studies, the present study argues for the application of Ravand and Baghaei’s (2019) framework for conducting true CDA studies. This framework is of importance because not only does it fit into prominent frameworks in educational assessment such as the Cognitive Design System and the Assessment Triangle, but it also provides test developers with practical steps for constructing valid cognitive diagnostic tests.
The unification of assessment and instruction has recently been realized in the form of purposeful assessment scenarios: Assessment x̄ Scenarios (analogous to Noam Chomsky's x̄ Theory). Here, x̄ refers to any of the three assessment scenarios, Assessment for Learning (AFL), Assessment as Learning (AAL), and Assessment of Learning (AOL), as well as pairings of any two or the integration of all three (i.e., the Integrated Assessment Scenario). A comparative investigation of the effect of each scenario on developing language skills, particularly listening, appears to be an unexplored area. To fill this gap, 100 conveniently sampled Iranian female EFL learners aged 13-19 were randomly divided into three experimental groups and one control group. Prior to the treatment, their listening ability was measured through a pre-test. Each experimental group (AFL, AAL, and Integrated Assessment) then received listening instruction based on the principles of its specific scenario, while the control group was treated according to AOL principles. Their listening ability was then measured with a post-test identical to the pre-test. An ANOVA comparing the performance of all groups showed that the AFL and AAL groups significantly outperformed the AOL group, and the Integrated Assessment group significantly outperformed the other experimental groups. While the findings lend support to the unification approach, they also open up areas for further research.
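The four-group comparison described above can be sketched with a one-way ANOVA. The sketch below uses SciPy with entirely hypothetical listening post-test scores (the study's actual data are not reproduced here); a significant omnibus F would then be followed by post-hoc pairwise comparisons (e.g., Tukey's HSD) to locate which scenario groups differ.

```python
# One-way ANOVA across the four assessment-scenario groups.
# Score vectors below are hypothetical, for illustration only.
from scipy.stats import f_oneway

afl = [78, 82, 75, 80, 77, 84, 79, 81]        # Assessment for Learning
aal = [76, 80, 74, 79, 78, 82, 77, 80]        # Assessment as Learning
integrated = [85, 88, 83, 87, 86, 90, 84, 89] # Integrated scenario
aol = [65, 70, 62, 68, 66, 71, 64, 69]        # Assessment of Learning (control)

# Omnibus test: are the group means equal?
f_stat, p_value = f_oneway(afl, aal, integrated, aol)
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")
```

With clearly separated group means like these, the omnibus test is significant; in practice the decision of which groups differ requires a post-hoc procedure, not the F test alone.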
Investigating the processes underlying test performance is a major source of evidence supporting the explanation inference in a validity argument (Chapelle, 2021). One way of modeling the cognitive processes underlying test performance is the construction of a Q-matrix, which summarizes the attributes explaining test takers’ response behavior. The current study documents the construction and validation of a Q-matrix for a high-stakes test of reading within the generalized deterministic inputs, noisy “and” gate (G-DINA) model framework. To this end, the attributes underlying the 20 items of the reading comprehension test were specified through retrospective verbal reports and a Delphi technique with domain experts. In the ensuing stage, the Q-matrix thus developed, along with the item response data of 2625 test takers, was subjected to empirical analysis using the procedure suggested by de la Torre and Chiu (2016). Item-level results showed that, except for one item, the processes underlying the items were captured by compensatory and additive models. This finding has significant implications for model selection by DCM practitioners.
Placing non-native speakers of English into appropriate classes involves mapping placement test scores onto proficiency levels based on predetermined cut scores. However, studies on how to set boundaries between proficiency levels have been lacking in the language testing literature. Standard setting has typically been dominated by a top-down approach in which a panel of experts sets cut scores. A less utilized approach proceeds bottom-up by clustering learners on the basis of their test scores. The purpose of this study was to fill this gap by examining Educational Testing Service (ETS)’s mapping of TOEFL® iBT test scores to the Common European Framework of Reference (CEFR) levels. The study examined TOEFL® iBT score data from ICNALE (the International Corpus Network of Asian Learners of English) and conducted optimal kernel density estimation to find peaks in the distribution of test scores. Beyond identifying the number of peaks, the local minima of the resulting distribution were chosen as cut-score boundaries delineating different ability groups. This method of separating scores, also known as the contrasting-groups method, finds clusters of test takers based on maximum differences in scores. The results showed that ETS’s guide for cut scores linked to CEFR levels was largely comparable to the kernel density estimates: two out of three cut scores were found to be similar. Implications are discussed in terms of test-centered versus examinee-centered methods of standard setting and the need to consider the demographics of the examinee population when determining cut scores.
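The bottom-up procedure described here (estimate the score density, treat the modes as ability groups and the local minima between them as candidate cut scores) can be sketched as follows. The scores are simulated for illustration, not the ICNALE/TOEFL® iBT data, and the bandwidth is SciPy's default rule of thumb rather than the "optimal" estimator used in the study.

```python
# Kernel-density-based cut-score detection on a simulated bimodal score sample.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

rng = np.random.default_rng(42)
# Two simulated ability groups standing in for real score data
scores = np.concatenate([rng.normal(55, 5, 300), rng.normal(85, 5, 300)])

kde = gaussian_kde(scores)            # bandwidth via Scott's rule (SciPy default)
grid = np.linspace(scores.min(), scores.max(), 500)
density = kde(grid)

peaks = argrelextrema(density, np.greater)[0]   # modes = candidate ability groups
valleys = argrelextrema(density, np.less)[0]    # local minima = candidate cut scores

print("number of peaks:", len(peaks))
print("cut score(s):", grid[valleys])
```

For this simulated sample the density has two modes, and the single valley between them falls between the two group centers; on real placement data the number of detected modes (and hence cut scores) is sensitive to the bandwidth choice, which is why the study's emphasis on optimal bandwidth selection matters.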
The present research aimed to conceptualize the construct of Teacher Assessment Identity (TAI) by designing and validating a questionnaire in the Iranian EFL context. To this end, a tentative scale with 96 items was piloted on 340 novice and experienced Iranian EFL teachers using Exploratory and Confirmatory Factor Analysis (EFA, CFA). The analyses led to the removal of 33 items, leaving a 61-item questionnaire on a five-point Likert scale. Moreover, the results revealed that the TAI construct comprises 12 factors: assessment “knowledge”, “beliefs”, “attitudes”, “skills and confidence”, “practices”, “use assurance”, “feedback”, “rubric/criteria”, “consistency and consequence”, “grading/scoring”, “question-types”, and “roles”. Likewise, the convergent validity and reliability of the instrument were statistically confirmed (p>.05). The findings have various implications for EFL teachers, teacher trainers, course designers, and language researchers by raising their awareness of assessment identity and its underlying components.