Articles related to the keyword:
Rater training
Although the use of verbal protocols is growing in oral assessment, research on raters’ verbal protocols remains rare, and the few existing studies did not use a mixed-methods design. This study therefore investigated the impact of rater training on novice and experienced raters’ application of a specified set of rating standards. To meet this objective, the study analyzed, both qualitatively and quantitatively, the verbal protocols produced by 20 raters who scored 300 test takers’ oral performances. The results demonstrated that after the training program the raters concentrated more on linguistic, discourse, and phonological features, and their agreement increased, particularly among the inexperienced raters. The analysis of the verbal protocols also revealed that training raters in how to apply a well-defined rating scale can foster its valid and reliable use. Different groups of raters approach the rating task in different ways that cannot be uncovered through purely statistical analysis; think-aloud verbal protocols can therefore illuminate these less visible aspects of rating and add to the validity of oral language assessment. Finally, since the results showed that inexperienced raters can produce protocols of higher quality and quantity in their use of macro- and micro-strategies when evaluating test takers’ performances, there is no evidence on which decision makers should exclude inexperienced raters solely for their lack of experience.
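As an illustrative aside, inter-rater agreement of the kind reported above is commonly quantified with a chance-corrected index such as Cohen’s kappa. The sketch below is a minimal, self-contained example for two raters on a band scale; the scores and variable names are invented for illustration and are not taken from the study.

```python
# Minimal sketch: Cohen's kappa for two raters on a 1-6 band scale.
# The scores below are hypothetical, for illustration only.
import numpy as np

def cohens_kappa(r1, r2, n_categories):
    """Chance-corrected agreement between two raters' category labels."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    # Confusion matrix of rater 1's bands against rater 2's bands.
    conf = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        conf[a - 1, b - 1] += 1
    conf /= conf.sum()
    p_observed = np.trace(conf)                       # actual agreement
    p_expected = conf.sum(axis=1) @ conf.sum(axis=0)  # agreement by chance
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = [3, 4, 4, 5, 2, 3, 6, 4, 3, 5]  # hypothetical band scores
rater_b = [3, 4, 5, 5, 2, 3, 5, 4, 4, 5]
print(f"kappa = {cohens_kappa(rater_a, rater_b, 6):.2f}")
```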
A Study of Raters’ Behavior in Scoring L2 Speaking Performance: Using Rater Discussion as a Training Tool (Ministry of Science accredited scholarly article)
Source: Issues in Language Teaching (ILT), Vol. 8, No. 1, June 2019, pp. 195-224
The studies conducted so far on the effectiveness of resolution methods, including the discussion method, in resolving rating discrepancies have yielded mixed results. What has gone unnoticed in the literature is the potential of discussion to serve as a training tool rather than a resolution method. The present study addresses this gap by exploring data from the rating behavior of five Iranian raters of English. Qualitative analysis indicated that the discussion method can serve the function of training raters: it helped raters rate more easily, quickly, and confidently, improved their understanding and application of the rating criteria, and enabled them to justify their scoring decisions. Many-facet Rasch analysis also supported the beneficial effects of discussion in terms of improvements in raters’ severity, consistency in scoring, and use of scale categories. The findings provide insight into the potential of discussion as a training tool, especially in EFL contexts where lack of access to expert raters can be an obstacle to holding training programs. The author calls for future studies on how discussion may function depending on the rating scale used.
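For readers unfamiliar with many-facet Rasch measurement, the rating-scale form of the model underlying analyses like this one is commonly written as below. This is the standard Linacre-style formulation, offered for orientation; it is not reproduced from the article itself.

```latex
% Many-facet Rasch model (rating scale form): log-odds that examinee n
% receives category k rather than k-1 from rater j on criterion i.
\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
% B_n : ability of examinee n
% D_i : difficulty of criterion (or task) i
% C_j : severity of rater j
% F_k : difficulty of the step from category k-1 to category k
```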
Fairness in Oral Language Assessment: Training Raters and Considering Examinees’ Expectations
This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and examiners’ consideration of examinees’ expectations affect test takers’ perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian intermediate EFL learners’ oral performance on the speaking module of the IELTS in two stages (pre- and post-training). Furthermore, following Kunnan’s (2004) Test Fairness Framework, a questionnaire on fairness in oral language assessment was developed and, after piloting and validation, administered to the examinees at both stages. The examinees’ expectations were taken into account in the second round of the speaking test. The results indicated that rater training is likely to promote inter-rater reliability and, in turn, enhance the fairness of decisions made on the basis of test scores. It was also concluded that considering students’ expectations of a fair test improves their overall perception of being fairly evaluated. These results offer second language teachers, oral test developers, and oral examiners and raters useful insights into addressing fairness-related issues in oral assessment.
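A quick way to see the kind of pre/post comparison reported here is to compute an inter-rater reliability index for each stage and compare them. The sketch below uses the average pairwise Spearman correlation across four raters as a simple index; the score matrices are simulated for illustration and do not come from the study.

```python
# Minimal sketch: compare inter-rater reliability before and after training.
# Rows are examinees, columns are raters; all numbers are hypothetical.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

def mean_pairwise_spearman(scores):
    """Average Spearman correlation over all rater pairs (columns)."""
    n_raters = scores.shape[1]
    rhos = [spearmanr(scores[:, a], scores[:, b])[0]
            for a, b in combinations(range(n_raters), 2)]
    return float(np.mean(rhos))

rng = np.random.default_rng(0)
true_ability = rng.normal(6, 1.5, size=31)                   # 31 examinees
pre = true_ability[:, None] + rng.normal(0, 1.2, (31, 4))    # noisier raters
post = true_ability[:, None] + rng.normal(0, 0.5, (31, 4))   # less noise

print(f"pre-training  reliability: {mean_pairwise_spearman(pre):.2f}")
print(f"post-training reliability: {mean_pairwise_spearman(post):.2f}")
```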
Development and Validation of a Training-Embedded Speaking Assessment Rating Scale: A Multifaceted Rasch Analysis in Speaking Assessment
Performance testing, including the use of rating scales, has become widespread in the assessment of second/foreign language oral performance. However, no previous study has applied Multifaceted Rasch Measurement (MFRM) to the facets of test takers’ ability, raters’ severity, group expertise, and scale category together. In this study, 20 EFL teachers scored the speaking performance of 200 test takers before and after a rater training program, using an analytic rating scale consisting of fluency, grammar, vocabulary, intelligibility, cohesion, and comprehension categories. The outcome demonstrated that the categories remained at different levels of difficulty even after the training program. This by no means indicated that the training was useless: the analysis showed its constructive influence in producing consistency in raters’ ratings of each scale category at the post-training phase, which indicated that raters could discriminate among the various categories of the rating scale. The outcomes also indicated that MFRM can enhance rater training and validate the functioning of the rating scale descriptors. The training helped raters use the scale’s band descriptors more efficiently, resulting in a reduced halo effect. The findings suggest that stakeholders should establish training programs to help raters use rating scale categories of varying difficulty appropriately. Further research could compare the outcome of this study with one using a holistic rating scale in oral assessment.
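One common way to operationalize a halo effect with an analytic scale is to check how strongly a rater’s scores on the separate categories correlate: near-identical profiles across categories suggest a global impression rather than distinct judgments. The sketch below computes a crude average inter-category correlation for one simulated rater; the six category names follow the abstract, but the data and interpretation threshold are hypothetical.

```python
# Minimal sketch: a crude halo-effect proxy for one rater's analytic scores.
# High average correlation between categories can signal halo (one global
# impression rather than six distinct judgments).
import numpy as np

CATEGORIES = ["fluency", "grammar", "vocabulary",
              "intelligibility", "cohesion", "comprehension"]

rng = np.random.default_rng(1)
overall = rng.normal(4, 1, size=50)                  # 50 test takers
# Hypothetical rater: category scores hug the overall impression closely.
scores = overall[:, None] + rng.normal(0, 0.3, (50, len(CATEGORIES)))

corr = np.corrcoef(scores, rowvar=False)             # 6 x 6 category matrix
off_diag = corr[~np.eye(len(CATEGORIES), dtype=bool)]
print(f"mean inter-category correlation: {off_diag.mean():.2f}")
# A value near 1.0 would be consistent with a strong halo effect.
```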
Facet Variability in the Light of Rater Training in Measuring Oral Performance: A Multifaceted Rasch Analysis (Ministry of Science accredited scholarly article)
Because of the subjectivity of oral assessment, much attention has been paid to obtaining a satisfactory measure of consistency among raters. However, consistency alone might not yield valid decisions, and one matter at the core of both reliability and validity in oral performance assessment is rater training. Recently, Multifaceted Rasch Measurement (MFRM) has been adopted to address rater bias and inconsistency; however, no research has incorporated the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale criterion category, and test version, together with their two-way interactions, in a single study. Moreover, little research has investigated how long the effects of rater training last. Consequently, this study explored the influence of training and feedback by having 20 raters score the oral production of 300 test takers, as measured by the CEP (Community English Program) test, in three phases: before, immediately after, and long after the training program. The results indicated that training can lead to higher inter-rater reliability and to diminished severity/leniency and bias. However, it will not lead raters to total unanimity, beyond making them more self-consistent: although rater training might raise internal consistency among raters, it cannot eradicate individual differences. Experienced raters, because of their idiosyncratic characteristics, did not benefit as much as inexperienced ones. The study also showed that the outcome of training might not endure long after training; ongoing training is therefore required to let raters regain consistency.
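As a complement to the model shown earlier, rater-by-occasion bias of the kind examined here is typically studied in MFRM by adding an interaction term to the basic model. The formulation below follows the usual FACETS-style convention and is offered for orientation, not as the article’s own specification; the phase facet and symbol names are assumptions.

```latex
% MFRM with a bias (interaction) term between rater j and phase t:
\[
\ln\!\left(\frac{P_{njtk}}{P_{njt(k-1)}}\right)
  = B_n - C_j - A_t - F_k - \beta_{jt}
\]
% B_n     : ability of examinee n
% C_j     : severity of rater j
% A_t     : difficulty of testing phase (occasion) t
% F_k     : step difficulty of category k
% beta_jt : bias of rater j at phase t; estimates far from zero flag
%           rater-by-phase interactions worth investigating.
```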