Problem Statement: Reliability, the degree to which measurement results are free from measurement error, and its estimation are central issues in psychometrics. Various psychometric theories have proposed methods for estimating reliability, and one of these is generalizability theory (G theory). G theory yields two distinct reliability coefficients: the generalizability coefficient (G coefficient) for relative decisions and the index of dependability (Phi coefficient) for absolute decisions. As with all reliability estimation methods, G and Phi coefficients are estimated from data obtained by administering the instrument to a sample. Determining the sample size needed to estimate population characteristics reliably is therefore a critical issue.
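For illustration, in the simplest crossed persons × items (p × i) design with n_i items, the two coefficients can be written in terms of the person variance σ²_p, the item variance σ²_i, and the residual (interaction plus error) variance σ²_{pi,e}. This univariate form is an assumption made for exposition only; the study itself uses a multivariate design.

$$
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pi,e}/n_i},
\qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \left(\sigma^2_i + \sigma^2_{pi,e}\right)/n_i}
$$

Because the absolute error variance adds σ²_i/n_i to the relative error variance, Φ never exceeds the G coefficient Eρ².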
Purpose of Study: The purpose of this study is to determine the sample size required for the G and Phi coefficients obtained from a sample to estimate the population G and Phi coefficients without bias.
Methods: The population of the study consisted of 480,691 students who took Form A of the 6th-grade SBS test in 2008. Using a bootstrap method, a total of 1,200 samples were drawn from this population: 100 replications for each of 12 sample sizes (n = 30, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000). Because the test battery contained five subtests differing in content and number of items, and all participants responded to all items, a multivariate G theory design was used. G and Phi coefficients were estimated for the population and for every sample at each of the 12 sample sizes. The relative root mean square error (R-RMSE) was used as the error index to evaluate how closely the sample G and Phi coefficients matched the G and Phi parameters estimated for the population.
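The sampling-and-comparison procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the study's implementation: it uses a univariate persons × items (p × i) design instead of the multivariate five-subtest design, estimates variance components from ANOVA expected mean squares, draws bootstrap samples of persons with replacement, and defines R-RMSE as the RMSE of the sample estimates divided by the population parameter (a common definition; the abstract does not state the exact formula). The functions `g_phi`, `r_rmse`, and `bootstrap_r_rmse` and the simulated population are hypothetical.

```python
import numpy as np

# Sketch assumptions: univariate p x i design, ANOVA variance components,
# bootstrap resampling of persons with replacement, R-RMSE = RMSE / parameter.


def g_phi(scores):
    """Return (G, Phi) for a crossed persons x items (p x i) score matrix.

    Variance components come from two-way ANOVA mean squares with one
    observation per cell, so the p x i interaction is confounded with error.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)
    resid = x - person_means[:, None] - item_means[None, :] + grand
    ms_p = n_i * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_i = n_p * np.sum((item_means - grand) ** 2) / (n_i - 1)
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))
    # Solve the expected-mean-square equations; truncate negative estimates at 0.
    var_pi_e = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    rel_err = var_pi_e / n_i              # relative error variance
    abs_err = (var_i + var_pi_e) / n_i    # absolute error variance
    return var_p / (var_p + rel_err), var_p / (var_p + abs_err)


def r_rmse(estimates, parameter):
    """Relative RMSE: RMSE of the estimates divided by the population value."""
    est = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((est - parameter) ** 2)) / parameter


def bootstrap_r_rmse(population, n, reps=100, seed=None):
    """Draw `reps` samples of n persons (with replacement); return R-RMSE of G and Phi."""
    rng = np.random.default_rng(seed)
    g_pop, phi_pop = g_phi(population)
    g_hat, phi_hat = [], []
    for _ in range(reps):
        rows = rng.choice(population.shape[0], size=n, replace=True)
        g, phi = g_phi(population[rows])
        g_hat.append(g)
        phi_hat.append(phi)
    return r_rmse(g_hat, g_pop), r_rmse(phi_hat, phi_pop)


# Hypothetical simulated population (not SBS data): 5000 persons x 20 binary items.
rng = np.random.default_rng(0)
ability = rng.normal(size=(5000, 1))
difficulty = rng.normal(size=(1, 20))
p_correct = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
population = (rng.random((5000, 20)) < p_correct).astype(float)

for n in (30, 50, 100, 400, 1000):
    g_err, phi_err = bootstrap_r_rmse(population, n, reps=100, seed=n)
    print(f"n={n:5d}  R-RMSE(G)={g_err:.4f}  R-RMSE(Phi)={phi_err:.4f}")
```

Because the population here is simulated, the printed values only illustrate the procedure and say nothing about the SBS results.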
Findings and Results: G and Phi coefficients estimated from samples of 30 tended to be smaller than the G and Phi parameters, and their R-RMSE values exceeded .01. For sample sizes of 50 or more, R-RMSE values were below .01; thus, at these sizes, G and Phi coefficients can be regarded as robust estimators of the G and Phi parameters. Moreover, R-RMSE values stabilized at sample sizes of 400 or greater. A sample size of 400 yielded more precise and robust estimates of the G and Phi parameters, and increasing the sample size beyond 400 did not substantially improve their unbiased estimation.
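The decision rule used in these findings (R-RMSE below .01) can be expressed as a small helper that returns the smallest sample size meeting the criterion for both coefficients. As an added assumption, the sketch also requires every larger sample size to meet the criterion, so that an isolated pass is not mistaken for adequacy; the function name `smallest_adequate_n` and its input format are hypothetical.

```python
def smallest_adequate_n(results, threshold=0.01):
    """Smallest sample size from which R-RMSE of both G and Phi stays below threshold.

    `results` maps each sample size to a tuple (r_rmse_g, r_rmse_phi).
    Assumption: every larger sample size must also pass the criterion.
    Returns None if even the largest sample size fails.
    """
    adequate = None
    # Walk from the largest sample size down; stop at the first failure.
    for n in sorted(results, reverse=True):
        g_err, phi_err = results[n]
        if g_err < threshold and phi_err < threshold:
            adequate = n
        else:
            break
    return adequate
```

Applied to R-RMSE values from a bootstrap sweep over the 12 sample sizes, this reproduces the threshold reading reported above; it does not capture the separate judgment about where R-RMSE values stabilize.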
Conclusions and Recommendations: A sample size of 30 does not provide adequately unbiased estimates of G and Phi coefficients. Sample sizes of 50 to 300 can be recommended as adequate for robust estimation of G and Phi coefficients; however, more precise and robust estimation requires a sample size of 400. Future research could examine sample size requirements for facets in other G theory designs.
Keywords: Generalizability theory, sample size, generalizability coefficient, Phi coefficient