Ege University, TURKEY.
Purpose: This study intended to examine the generalizability and reliability of essay ratings within the scope of the generalizability (G) theory. Specifically, the effect of raters on the generalizability and reliability of students’ essay ratings was examined. Furthermore, variations of the generalizability and reliability coefficients with respect to the number of raters and optimal number of raters for obtaining optimal reliability of the rating of the writing ability of a student, which is considered to be an implicit trait as a whole and in its subdimensions of wording/writing, paragraph construction, and title selection, were determined.
Research Methods: The student sample of the study comprised 443 students who were selected via random cluster sampling, and rater sample of this study comprised four Turkish teachers. All the essays written by the students in the sample were independently rated on a writing skill scale (WSS), which is an ordinal scale comprising 20 items, by four trained teachers. In this study, data analysis was performed using the multivariate 𝑝•𝑥 𝑖∘𝑥 𝑟• design of the G theory. Finding: In the G studies that were performed, variances of the rater (r) as well as item and rater (ixr) were low in all sub-dimensions; however, variance of the object of measurement and rater (pxr) was relatively high. The presence of trained raters increased the reliability of the ratings.
Implications for Research and Practice: In the decision (D) study analyses of the original study conducted using four raters, the G and Phi coefficients for the combined measurement were observed to be .95 and .94, respectively. Further, the G and Phi coefficients were .91 and .90, respectively, for the alternative D studies that were conducted by two trained raters. Thus, rating of essays by two trained raters may be considered to be satisfactory.
Keywords: Generalizability Theory, generalizability, reliability, essay rating, essay rater reliability, writing ratings