A Study on Detecting Differential Item Functioning of PISA 2006 Science Literacy Items in Turkish and American Samples

Prof. Dr., Department of Measurement and Evaluation, Ankara University, Ankara
DOI: 10.14689/ejer.2015.58.3


Problem Statement: Item bias occurs when individuals from different groups (different genders, cultural backgrounds, etc.) have different probabilities of responding correctly to a test item despite having the same skill level. Tests and items must be free of bias to ensure the accuracy of decisions based on test scores. Thus, items should be tested for bias during test development and adaptation. Items used in testing programs such as the Program for International Student Assessment (PISA), whose results inform educational policies throughout the participating countries, should be reviewed for bias. The study examines whether items of the PISA 2006 science literacy test are biased between the Turkish and American samples.

Purpose of the Study: The aim of this study is to analyze the measurement equivalence of the PISA 2006 science literacy test across Turkish and American groups in terms of structural invariance, and to determine whether the science literacy items contain intercultural bias.

Methods: The study included data for 757 Turkish and 856 American 15-year-old students. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were performed to determine whether the PISA science literacy test measured an equivalent construct in both groups; multi-group confirmatory factor analysis (MCFA) was used to identify differences in the factor structure between cultures. Differential item functioning (DIF) was detected via the Mantel–Haenszel (MH), Simultaneous Item Bias Test (SIBTEST), and Item Response Theory Likelihood-Ratio Analysis (IRT-LR) procedures.
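As an illustration of the first of these procedures, the following is a minimal sketch of the Mantel–Haenszel common odds ratio and its delta-scale transformation (MH D-DIF) for a single dichotomous item. It is not the analysis code used in the study: the function name, the group labels "ref" and "focal", and the use of the total score as the matching variable are assumptions made for the example, and no significance test is computed.

```python
import math
from collections import defaultdict


def mantel_haenszel_ddif(scores, groups, responses):
    """Return (alpha_MH, MH D-DIF) for one dichotomous item.

    scores    -- matching variable per examinee (assumed: total test score)
    groups    -- "ref" or "focal" per examinee (hypothetical labels)
    responses -- 1 for a correct answer, 0 for an incorrect one
    """
    # One 2x2 table per score level: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, resp in zip(scores, groups, responses):
        offset = 0 if group == "ref" else 2
        strata[score][offset + (0 if resp == 1 else 1)] += 1

    num = 0.0  # sum over strata of A*D/N
    den = 0.0  # sum over strata of B*C/N
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den
    # Delta-scale transformation; negative values indicate that the item
    # is harder for the focal group than for matched reference examinees.
    return alpha, -2.35 * math.log(alpha)


# Toy data: a single score level with four examinees per group.
alpha, d_dif = mantel_haenszel_ddif(
    scores=[5] * 8,
    groups=["ref"] * 4 + ["focal"] * 4,
    responses=[1, 1, 1, 0, 1, 0, 0, 0],
)
print(alpha, d_dif)
```

In practice the matching variable would have many score levels, and the statistic would be accompanied by the MH chi-square test before any item is flagged.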

Findings and Results: According to the MCFA results, the PISA 2006 science literacy test showed equivalent measurement constructs in the Turkish and American groups. Moreover, the three analysis methods agreed at B and C levels for 15 items in the Turkish sample and 25 items in the American sample in terms of DIF. According to expert opinions, common sources of item bias were familiarity with item content and differing skill levels between cultures.
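The abstract does not define the B and C levels. Assuming they refer to the widely used ETS classification of MH D-DIF values (A negligible, B moderate, C large), a simplified magnitude-only version of that rule could be sketched as follows. The function name is hypothetical, and the full ETS rule also involves significance tests, which are omitted here.

```python
def classify_ets(d_dif):
    """Classify an MH D-DIF value by magnitude only (simplified ETS rule).

    Assumption: the significance conditions of the full ETS rule are
    ignored, so this is only a rough screen, not the complete procedure.
    """
    size = abs(d_dif)
    if size < 1.0:
        return "A"  # negligible DIF
    if size < 1.5:
        return "B"  # moderate DIF
    return "C"      # large DIF


# Example: label a few illustrative (made-up) D-DIF values.
for value in (0.4, -1.2, 1.8):
    print(value, classify_ets(value))
```

SIBTEST and IRT-LR use their own effect-size measures and cut-offs, so agreement "at B and C levels" across the three methods would require a separate rule for each.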

Conclusions and Recommendations: The 38 items that showed DIF under each of the three methods were accepted as having DIF. The finding that some items contain a possible source of bias will not change the average level of student performance in participating countries. However, it will be beneficial to review item content before test administration in order to reduce the number of items showing DIF across different language and cultural groups in international comparative studies.

Keywords: PISA, DIF, Mantel–Haenszel, SIBTEST, IRT-LR.