Problem Statement: The most significant disadvantage of open-ended items, which allow the valid measurement of upper-level cognitive behaviours such as synthesis and evaluation, is scoring. The difficulty of objectively scoring the answers to such items reduces the reliability of the scores, and other sources of error affect reliability as well. When measurement involves more than one source of error, as in the scoring of open-ended items, item response theory, which removes the restrictions of classical test theory, is preferred.
Purpose of Study: The purpose of this study is to assess the infit-outfit statistics and reliability coefficients of the scores from a statistics exam composed of open-ended items, using many-facet Rasch model (MFRM) analysis for each source of variability (i.e., students, items, and raters), and to interpret the reliability of the scores.
Methods: In this study, the MFRM was used to analyse the answers given to 10 open-ended items in a Statistics I course; the answers were provided by 55 third-year graduate students of the Psychological Counselling and Guidance Department of the Faculty of Education in the fall semester of the 2010-2011 academic year. The scoring was performed by three raters who were experts in statistics and worked as academic staff at the university. This study therefore contains three sources of variability (facets): students, items, and raters. Measurement reports, including infit and outfit statistics, separation indices, and reliability coefficients, were calculated for each facet using the FACET computer package programme.
Findings and Results: According to the MFRM analysis, the reliability coefficients for the student and item facets were .79 and .90, respectively, and the separation indices of the student and item facets were 1.95 and 2.95, respectively. Additionally, complete consistency was found among the raters in this study.
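The reported figures are internally consistent: in Rasch measurement, a facet's separation index G and its separation reliability R are linked by R = G² / (1 + G²). A minimal Python sketch (not part of the original analysis, which was run in the FACET programme) checking the values reported above:

```python
# Relationship between the Rasch separation index (G) and separation
# reliability (R): R = G^2 / (1 + G^2), equivalently G = sqrt(R / (1 - R)).
import math

def separation_to_reliability(g: float) -> float:
    """Convert a separation index G into a separation reliability R."""
    return g ** 2 / (1 + g ** 2)

def reliability_to_separation(r: float) -> float:
    """Convert a separation reliability R back into a separation index G."""
    return math.sqrt(r / (1 - r))

# Values reported above: students G = 1.95, R = .79; items G = 2.95, R = .90.
print(round(separation_to_reliability(1.95), 2))  # 0.79 (student facet)
print(round(separation_to_reliability(2.95), 2))  # 0.9  (item facet)
```

Rounded to two decimals, the separation indices of 1.95 and 2.95 reproduce the reported reliability coefficients of .79 and .90.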
Conclusions and Recommendations: The MFRM makes important contributions to the analysis of measurement results, the development of measurement tools, the organization of appropriate measurement conditions, and the provision of effective training for raters. Because it is believed to provide important information, the use of the MFRM is recommended when analysing the results of exams that use open-ended items and that inform important decisions about students’ futures.
Keywords: Open-ended questions, reliability, many-facet Rasch model