Using Generalizability Theory to Examine Different Concept Map Scoring Methods

Bayram CETIN*, Nese GULER** and Rabia SARICA***
*Faculty of Education, Gazi University, Ankara, Turkey
**Faculty of Education, Sakarya University, Sakarya, Turkey
***Ministry of Education, Ankara, Turkey

ABSTRACT
Problem Statement: In addition to being teaching tools, concept maps can be used as effective assessment tools. The use of concept maps for assessment has raised the issue of scoring them. Concept maps generated and used in different ways can be scored via various methods. Holistic and relational scoring methods are two of them.
Purpose of the Study: This study uses generalizability (G) theory to examine the reliability of scores for concept maps that were created by students and scored by different teachers using two scoring methods (holistic and relational).
Methods: The research was performed during the fall semester of the 2010-2011 academic year, between December and January. Concept maps created by thirty-six students were scored by three teachers acting as raters. Data were obtained from four different concept maps generated by each student.
Findings and Results: Under the holistic scoring method, the student component (the object of measurement) accounts for one of the largest percentages of the total variance (20%), while the main effects of the task and the raters account for about 14% and almost 0% of the total variance, respectively. The difficulty level of the tasks did not differ greatly from student to student, and the raters agreed in their scoring. Using the holistic scoring method, the G and Φ coefficients were calculated as 0.63 and 0.57, respectively, based on four tasks and three raters. Under the relational scoring method, the student component (the object of measurement) accounts for 10% of the variance, the main effect of the task accounts for a very large percentage of the variance (56%), and the main effect of the raters accounts for no variance. The G and Φ coefficients calculated over the four tasks and three raters were 0.63 and 0.34, respectively.
Conclusions and Recommendations: According to the results of this study, the Φ coefficient was higher when the holistic scoring method was used. Tasks represented a substantial variance component for both scoring methods. This may be interpreted to mean that, under both methods, the difficulty of the tasks differed across students. For each scoring method, the variance related to the raters was found to be zero, which suggests that the raters scored the maps consistently.
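The G and Φ coefficients reported above are functions of the estimated variance components and of the numbers of tasks and raters. The following is a minimal sketch of that calculation, assuming a fully crossed students × tasks × raters random-effects design; the abstract does not state the design explicitly or report the interaction components, so the function name, dictionary keys, and variance values below are hypothetical placeholders rather than the study's estimates.

```python
def g_and_phi(var, n_tasks, n_raters):
    """Return (G, Phi) for a crossed person x task x rater design.

    `var` maps component names to variance estimates:
    'p' (students), 't' (tasks), 'r' (raters),
    'pt', 'pr', 'tr' (two-way interactions),
    'ptr' (three-way interaction confounded with residual error).
    """
    # Relative error: only components that interact with students.
    rel_error = (
        var["pt"] / n_tasks
        + var["pr"] / n_raters
        + var["ptr"] / (n_tasks * n_raters)
    )
    # Absolute error: relative error plus task and rater main effects
    # and their interaction.
    abs_error = (
        rel_error
        + var["t"] / n_tasks
        + var["r"] / n_raters
        + var["tr"] / (n_tasks * n_raters)
    )
    g = var["p"] / (var["p"] + rel_error)
    phi = var["p"] / (var["p"] + abs_error)
    return g, phi


# Hypothetical variance components (NOT the values from this study).
example = {"p": 0.50, "t": 0.20, "r": 0.00,
           "pt": 0.30, "pr": 0.05, "tr": 0.05, "ptr": 0.60}

# Four tasks and three raters, matching the study's measurement conditions.
g, phi = g_and_phi(example, n_tasks=4, n_raters=3)
print(f"G = {g:.2f}, Phi = {phi:.2f}")
```

Because the task main effect enters only the absolute error term, a large task variance lowers Φ while leaving G unchanged, which matches the pattern described for the relational method. Calling the function with other values of `n_tasks` and `n_raters` gives a simple decision (D) study.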

Keywords: Generalizability theory, rater effect, scoring concept maps, scoring methods.