Rater training

1.

Investigating the Effect of the Training Program on Raters’ Oral Performance Assessment: A Mixed-Methods Study on Raters’ Think-Aloud Verbal Protocols (Scientific article, Ministry of Science)

Authors: Houman Bijani, Mona Khabiri

Source: Iranian Journal of Applied Linguistics (IJAL), Vol. 20, No. 1, March 2017, pp. 113-150

Keywords: bias; Oral performance assessment; Rater training; Think-aloud verbal protocols

Subject areas:

Linguistics subject areas

Views: 711 Downloads: 336

Although the use of verbal protocols is growing in oral assessment, research on raters’ verbal protocols is rather rare. Moreover, the few existing studies did not use a mixed-methods design. Therefore, this study investigated the possible impacts of rater training on novice and experienced raters’ application of a specified set of standards in rating. To meet this objective, the study made use of verbal protocols produced by 20 raters who scored 300 test takers’ oral performances and analyzed the data both qualitatively and quantitatively. The outcomes demonstrated that, after the training program, the raters were able to concentrate more on linguistic, discourse, and phonological features; therefore, the extent of their agreement increased, specifically among the inexperienced raters. The analysis of verbal protocols also revealed that training raters to apply a well-defined rating scale can foster its valid and reliable use. Various groups of raters approach the task of rating in different ways, which cannot be explored through pure statistical analysis. Thus, think-aloud verbal protocols can shed light on the vague sides of the issue and add to the validity of oral language assessment. Moreover, since the results of this study showed that inexperienced raters can produce protocols of higher quality and quantity in the use of macro and micro strategies to evaluate test takers’ performances, there is no evidence on which decision makers could base the exclusion of inexperienced raters solely because of their lack of adequate experience.

2.

A Study of Raters’ Behavior in Scoring L2 Speaking Performance: Using Rater Discussion as a Training Tool (Scientific article, Ministry of Science)

Author: Alireza Ahmadi

Source: Issues in Language Teaching (ILT), Vol. 8, No. 1, June 2019, pp. 195-224

Keywords: discussion; Rater training; L2 speaking; many-faceted Rasch analysis; resolution method

Subject areas:

Linguistics subject areas

Views: 752 Downloads: 546

The studies conducted so far on the effectiveness of resolution methods, including the discussion method, in resolving discrepancies in rating have yielded mixed results. What has gone unnoticed in the literature is the potential of discussion to be used as a training tool rather than a resolution method. The present study addresses this research gap by exploring data on the rating behaviors of 5 Iranian raters of English. Qualitative analysis of the data indicated that the discussion method can serve the function of training raters. It helped raters rate more easily, quickly, and confidently. Furthermore, it helped them improve their understanding and application of the rating criteria and enabled them to justify their scoring decisions. Many-faceted Rasch analysis also supported the beneficial effects of discussion in terms of improvement in raters’ severity, consistency in scoring, and the use of scale categories. The findings provide insight into the potential of discussion to be used as a training tool, especially in EFL contexts in which lack of access to expert raters can be an obstacle to holding training programs. The author argues that future studies should focus on how discussion may function depending on the rating scale used.

3.

Fairness in Oral Language Assessment: Training Raters and Considering Examinees’ Expectations

Authors: Mehdi Doosti, Mohammad Ahmadi Safa

Source: International Journal of Language Testing, Volume 11, Issue 2, Summer and Autumn 2021, pp. 64-90

Keywords: Inter-rater reliability; oral language assessment; Rater training; Test fairness

Subject areas:

Linguistics subject areas

Views: 256 Downloads: 267

This study examined the effect of rater training on promoting inter-rater reliability in oral language assessment. It also investigated whether rater training and the examiners’ consideration of the examinees’ expectations have any effect on test-takers’ perceptions of being fairly evaluated. To this end, four raters scored 31 Iranian intermediate EFL learners’ oral performance on the speaking module of the IELTS in two stages (i.e., a pre-training and a post-training stage). Furthermore, following Kunnan’s (2004) Test Fairness Framework, a questionnaire on fairness in oral language assessment was developed, and after pilot testing and validation, it was administered to the examinees at both stages. The examinees’ expectations were taken into account in the second round of the speaking test. The results indicated that rater training is likely to promote inter-rater reliability and, in turn, enhance the fairness of the decisions made based on the test scores. It was also concluded that considering students’ expectations of a fair test would improve their overall perceptions of being fairly evaluated. The results offer second language teachers, oral test developers, and oral examiners and raters useful insights into addressing fairness-related issues in oral assessment.

4.

Development and Validation of a Training-Embedded Speaking Assessment Rating Scale: A Multifaceted Rasch Analysis in Speaking Assessment

Authors: Houman Bijani, Bahareh Hashempour, Salim Said Bani Orabah

Source: Research in English Education, Volume 7, Issue 3 (2022), pp. 32-45

Keywords: bias; Interrater consistency; Intrarater consistency; multifaceted Rasch measurement (MFRM); Rater training; rating scale

Subject areas:

Linguistics subject areas

Views: 440 Downloads: 229

Performance testing, including the use of rating scales, has become widespread in second/foreign language oral assessment. However, no single study has used Multifaceted Rasch Measurement (MFRM) with the facets of test takers’ ability, raters’ severity, group expertise, and scale category together. Twenty EFL teachers scored the speaking performance of 200 test takers before and after a rater training program using an analytic rating scale consisting of fluency, grammar, vocabulary, intelligibility, cohesion, and comprehension categories. The outcome demonstrated that the categories were at different levels of difficulty even after the training program. However, this outcome by no means indicated that the training program was useless, since data analysis reflected the constructive influence of training in providing enough consistency in raters’ rating of each category of the rating scale at the post-training phase. Such an outcome indicated that raters could discriminate among the various categories of the rating scale. The outcomes also indicated that MFRM can enhance rater training and the validation of the functionality of the rating scale descriptors. The training helped raters use the rating scale’s various band descriptors more efficiently, resulting in a reduced halo effect. The findings suggest that stakeholders should establish training programs to help raters make appropriate use of rating scale categories that differ in difficulty. Further research could compare the outcome of this study with that of a study using a holistic rating scale in oral assessment.

5.

Facet Variability in the Light of Rater Training in Measuring Oral Performance: A Multifaceted Rasch Analysis (Scientific article, Ministry of Science)

Authors: Houman Bijani, Salim Said Bani Orabah

Source: Issues in Language Teaching (ILT), Vol. 11, No. 2, December 2022, pp. 255-290

Keywords: bias; Interrater consistency; multifaceted Rasch measurement (MFRM); Rater training; Severity/leniency

Subject areas:

Linguistics subject areas

Views: 259 Downloads: 219

Due to subjectivity in oral assessment, much attention has been paid to obtaining a satisfactory measure of consistency among raters. However, obtaining consistency might not result in valid decisions. One matter that is at the core of both reliability and validity in oral performance assessment is rater training. Recently, Multifaceted Rasch Measurement (MFRM) has been adopted to address the problem of rater bias and inconsistency; however, no research has incorporated the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale criterion category, and test version together in a single study along with their two-sided impacts. Moreover, little research has investigated how long rater training effects last. Consequently, this study explored the influence of the training program and feedback by having 20 raters score the oral production of 300 test takers, as measured by the CEP (Community English Program) test, in three phases, i.e., before, immediately after, and long after the training program. The results indicated that training can lead to higher degrees of interrater reliability and diminished measures of severity/leniency and bias. However, it will not bring the raters to total unanimity, although it makes them more self-consistent. Although rater training might result in higher internal consistency among raters, it cannot eradicate individual differences. That is, experienced raters, due to their idiosyncratic characteristics, did not benefit as much as inexperienced ones. This study also showed that the outcome of training might not endure in the long run; thus, ongoing training is required to let raters regain consistency.

6.

Construct Validation of a Rating Scale through a Training Program: A Multifaceted Rasch Analysis in Speaking Assessment (Scientific article, Ministry of Science)

Authors: Wander Lowie, Houman Bijani, Mohammad Reza Oroji, Zeinab Khalafi, Pouya Abbasi

Source: Iranian Journal of Applied Linguistics (IJAL), Vol. 26, No. 2, September 2023, pp. 48-80

Keywords: bias; Interrater consistency; Intrarater consistency; multifaceted Rasch measurement (MFRM); Rater training; rating scale

Subject areas:

Linguistics subject areas

Views: 22 Downloads: 13

Performance testing, including the use of rating scales, has become highly widespread in second/foreign language oral assessment. However, few studies have used a pre- and post-training design to investigate the impact of a training program on reducing raters’ biases toward the rating scale categories and on the resulting increase in their consistency measures. Besides, no study has used MFRM with the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale category, and test version all in a single study. Twenty EFL teachers rated the oral performances produced by 200 test takers before and after a training program using an analytic rating scale including fluency, grammar, vocabulary, intelligibility, cohesion, and comprehension categories. The outcome of the study indicated that MFRM can be used to investigate raters’ scoring behavior and can enhance rater training and the validation of the functionality of the rating scale descriptors. Training can also result in higher levels of interrater consistency and reduced levels of severity/leniency; however, it cannot turn raters into duplicates of one another, although it can make them more self-consistent. Training helped raters use the rating scale’s various band descriptors more efficiently, resulting in a reduced halo effect. Finally, the raters improved in consistency and showed reduced rater-scale category biases after the training program. The remaining differences in bias measures could probably be attributed to different ways of interpreting the scoring rubrics, which stem from raters’ confusion about the accurate application of the scale.

7.

Raters’ Perception and Expertise in Evaluating Second Language Compositions (Research article, Azad University)

Author: Houman Bijani

Source: Journal of Applied Linguistics, Vol. 3, No. 7, Fall 2010, pp. 67-87

Keywords: Experienced Raters; Inexperienced Raters; Interrater Reliability; Rater training; writing assessment

Subject areas:

Linguistics subject areas

Views: 14 Downloads: 7

The consideration of rater training is very important in construct validation of a writing test because it is through training that raters are adapted to the use of students’ writing ability instead of their own criteria for assessing compositions (Charney, 1984). However, although training has been discussed in the literature on writing assessment, there is little research regarding raters’ perceptions and understandings of the training program. Although some studies have looked at the differences between trained and untrained raters in writing assessment (Cumming, 1990; Huot, 1990), few have used a pre- and post-training design. The purpose of this study is to investigate the effectiveness of the training program on experienced and inexperienced raters using a pre- and post-training design. Twelve EFL raters scored 45 benchmark essay compositions that had been pre-rated by an authorized IELTS trainer. These essay compositions were scored before, during, and after the training program. The comparison across raters showed that inexperienced raters had a wider range of inconsistency before training but became more consistent than experienced raters after training.