The score reliability of language performance tests has attracted increasing interest. Classical Test Theory cannot disentangle multiple sources of measurement error simultaneously. Generalizability theory extends Classical Test Theory by providing a practical framework for identifying and estimating the multiple factors that contribute to the total variance of a measurement. Using analysis of variance, generalizability theory partitions the total variance into its constituent sources and estimates their interactions. This study used generalizability theory as a theoretical framework to investigate the effect of raters’ gender on the assessment of EFL students’ writing. Thirty Iranian university students participated in the study. They were asked to respond to an independent writing task and an integrated writing task. The essays were scored holistically by 14 raters, and a rater training session was held before the writing samples were scored. The data were analyzed with the GENOVA software program. The results indicated that the male raters’ scores were as reliable as those of the female raters on both writing tasks. A large rater variance component revealed low score generalizability when only one rater was used. The implications of the results for educational assessment are elaborated.
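To make the variance partitioning concrete, a standard generalizability-theory formulation is sketched below, assuming a fully crossed persons × raters × tasks (p × r × t) random-effects design; this is a textbook decomposition, not a restatement of the study's own estimates. The observed score of person p from rater r on task t has total variance

\[
\sigma^2(X_{prt}) \;=\; \sigma^2_p + \sigma^2_r + \sigma^2_t + \sigma^2_{pr} + \sigma^2_{pt} + \sigma^2_{rt} + \sigma^2_{prt,e},
\]

and the index of dependability for absolute decisions with \(n_r\) raters and \(n_t\) tasks is

\[
\Phi \;=\; \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_r}{n_r} + \dfrac{\sigma^2_t}{n_t} + \dfrac{\sigma^2_{pr}}{n_r} + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{rt}}{n_r n_t} + \dfrac{\sigma^2_{prt,e}}{n_r n_t}}.
\]

Because the rater components \(\sigma^2_r\) and \(\sigma^2_{pr}\) enter the error variance divided by \(n_r\), a large rater variance component produces low generalizability when \(n_r = 1\), which is consistent with the single-rater pattern reported above.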