Catalogue Search | MBRL

Assessing the fairness of mathematical literacy test in Indonesia: Evidence from gender-based differential item function analysis

by Kartianom, Kartianom; Retnawati, Heri; Hidayati, Kana in classical test theory, cognitive diagnostic model, differential item function

2024

Conducting a fair test is important for educational research. Unfair assessments can lead to gender disparities in academic achievement, ultimately resulting in disparities in opportunities, wages, and career choice. Differential Item Function [DIF] analysis provides evidence of whether a test is truly fair, that is, whether it neither harms nor benefits particular groups of students. For this reason, this study aims to assess the fairness of mathematics literacy tests from a gender perspective using three DIF analysis approaches, namely the Cognitive Diagnostic Model [CDM], Classical Test Theory [CTT], and Item Response Theory [IRT], and to compare the results of the three approaches to examine how well they agree in identifying DIF effects. This is a quantitative descriptive study, and for the CDM approach, a retrofitting method (post-hoc analysis) was used. The sample consists of Indonesian students who participated in the administration of PISA 2012 and were tested on Booklet 1, Booklet 3, Booklet 4, and Booklet 6. The Q-matrix used in this study consisted of 12 items and 11 attributes. The results show that, across the 12 items analyzed, the findings of the CTT, IRT, and CDM approaches differ; the item with the largest DIF was found using the Raju Unsigned Area Measures method in IRT and the Wald test from the CDM approach, while the item with the lowest DIF was found using the LRT method from the CDM approach; and three items were simultaneously identified as DIF by the CTT, IRT, and CDM methods, namely PM923Q01, PM923Q03, and PM924Q02. Items PM923Q01 and PM923Q03 favor male students, while item PM924Q02 favors female students.

Journal Article


The Arabic Version of the Impact of Event Scale-Revised: Psychometric Evaluation among Psychiatric Patients and the General Public within the Context of COVID-19 Outbreak and Quarantine as Collective Traumatic Events

by Samah M. Taha, Mohammad Yousef Saleh, Amira Mohammed Ali in Anxiety, Cognitive ability, Coronavirus Disease-19; COVID-19; the Impact of Event Scale-Revised (IES-R); post-traumatic stress disorder; psychiatric patients; the general public; healthy individuals; quarantine; gender differences; confirmatory factor analysis; measurement invariance; differential item functioning; psychometric evaluation; concurrent validity; convergent validity; discriminant validity/known-group validity; Arabic/Saudi Arabia

2022

The Coronavirus Disease-19 (COVID-19) pandemic has provoked the development of negative emotions in almost all societies since it first broke out in late 2019. The Impact of Event Scale-Revised (IES-R) is widely used to capture emotions, thoughts, and behaviors evoked by traumatic events, including COVID-19 as a collective and persistent traumatic event. However, there is less agreement on the structure of the IES-R, signifying a need for further investigation. This study aimed to evaluate the psychometric properties of the Arabic version of the IES-R among individuals in Saudi quarantine settings, psychiatric patients, and the general public during the COVID-19 outbreak. Exploratory factor analysis revealed that the items of the IES-R present five factors with eigenvalues > 1. Examination of several competing models through confirmatory factor analysis resulted in a best fit for a six-factor structure, which comprises avoidance, intrusion, numbing, hyperarousal, sleep problems, and irritability/dysphoria. Multigroup analysis supported the configural, metric, and scalar invariance of this model across groups of gender, age, and marital status. The IES-R significantly correlated with the Depression Anxiety Stress Scale-8, perceived health status, and perceived vulnerability to COVID-19, denoting good criterion validity. HTMT ratios of all the subscales were below 0.85, denoting good discriminant validity. The values of coefficient alpha in the three samples ranged between 0.90 and 0.93. In path analysis, correlated intrusion and hyperarousal had direct positive effects on avoidance, numbing, sleep, and irritability. Numbing and irritability mediated the indirect effects of intrusion and hyperarousal on sleep and avoidance. This result signifies that cognitive activation is the main factor driving the dynamics underlying the behavioral, emotional, and sleep symptoms of collective COVID-19 trauma. 
The findings support the robust validity of the Arabic IES-R, indicating it as a sound measure that can be applied to a wide range of traumatic experiences.

Journal Article


Do measures of depressive symptoms function differently in people with spinal cord injury versus primary care patients: the CES-D, PHQ-9, and PROMIS®-D

by Kim, Jiseon; Bombardier, Charles; Kallen, Michael A. in Depression - diagnosis, Female, Humans

2017

Purpose: To evaluate whether items of three measures of depressive symptoms function differently in persons with spinal cord injury (SCI) than in persons from a primary care sample. Methods: This study was a retrospective analysis of responses to the Patient Health Questionnaire depression scale, the Center for Epidemiological Studies Depression scale, and the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS®) version 1.0 eight-item depression short form 8b (PROMIS-D). The presence of differential item function (DIF) was evaluated using ordinal logistic regression. Results: No items of any of the three target measures were flagged for DIF based on standard criteria. In a follow-up sensitivity analysis, the criterion was changed to make the analysis more sensitive to potential DIF. Scores were corrected for DIF flagged under this criterion. Minimal differences were found between the original scores and those corrected for DIF under the sensitivity criterion. Conclusions: The three depression screening measures evaluated in this study did not perform differently in samples of individuals with SCI compared to general and community samples. Transdiagnostic symptoms did not appear to spuriously inflate depression severity estimates when administered to people with SCI.

Journal Article


A comparison of children and adolescent's self-report and parental report of the PedsQL among those with and without autism spectrum disorder

by Kornienko, L.; Koot, H. M.; Stokes, M. A. in Adolescent, Adult, Autism

2017

Purpose: Children and adolescents with autism spectrum disorders (ASD) are understood to experience a reduced quality of life compared to typically developing (TD) peers. The evidence to support this has largely been derived from proxy reports, which in turn have been evaluated by Cronbach's alpha and interrater reliability, neither of which demonstrates unidimensionality of scales or that raters use the instruments consistently. To redress this, we undertook an evaluation of the Pediatric Quality of Life Inventory™ (PedsQL), a widely used measure of children's quality of life. Three questions were explored: (1) do TD children or adolescents and their parents use the PedsQL differently; (2) do children or adolescents with ASD and their parents use the PedsQL differently; and (3) do children or adolescents with ASD and TD children or adolescents use the PedsQL differently? By using the scales differently, we mean whether respondents endorse items differently contingent on group. Methods: We recruited 229 children and adolescents with ASD who had an IQ greater than 70, and one of their parents, as well as 74 TD children or adolescents and one of their parents. Children and adolescents with ASD (aged 6-20 years) were recruited from special primary and secondary schools in the Amsterdam region. Children and adolescents were included based on an independent clinical diagnosis established prior to recruitment according to DSM-IV-TR criteria by psychiatrists and/or psychologists qualified to make the diagnosis. Children or adolescents and parents completed their respective version of the PedsQL. Results: Data were analysed for unidimensionality and for differential item functioning (DIF) across respondents for TD children and adolescents and their parents, for children and adolescents with ASD and their parents, and, last, children and adolescents with ASD were compared to TD children and adolescents for DIF.
After recoding the data, the unidimensional model was found to fit all groups. We found that TD children and adolescents and their parents do not use the PedsQL differently $\left( \chi_{(46)}^2 = 64.86,\ p = \mathrm{ns} \right)$, consistent with the literature that children and adolescents with ASD and TD children and adolescents use the PedsQL similarly $\left( \chi_{(69)}^2 = 92.22,\ p = \mathrm{ns} \right)$, though their score levels may differ. However, children and adolescents with ASD and their parents respond to the PedsQL differently $\left( \chi_{(115)}^2 = 190.22,\ p < 0.001 \right)$ and contingently upon features of the child or adolescent. Conclusions: We suggest this is due to children or adolescents with ASD being less forthcoming with their parents about their lives. This, however, will require additional research to confirm. Consequently, we conclude that parents of high-functioning children with ASD are unable to act as reliable proxies for their children with ASD.

Journal Article


Bias Geographic Location of Math National Examination in Junior High School: Analysis of Differential Item Functioning (DIF)

by Kartowagiran, Badrun; Retnawati, Heri; Sainuddin, Syamsir in Access to education, Assessment centers, Bias

2023

Diversity is a widely discussed issue in education. Given its diversity, Indonesia offers great potential for studying such matters as geographical diversity and the equalization factors that affect the quality of education. National examinations (NE), used as a benchmark and standard from primary to secondary education, are administered under different conditions in each location, such as Daerah Istimewa Yogyakarta (DIY), representing the western region of Indonesia, and Nusa Tenggara Timur (NTT), representing the eastern region. The focus of this research is to find out how much the ability of junior high school students in Indonesia differs by geographical location. This study applies five DIF detection methods to the mathematics NE of the 2013/2014 school year to analyze differences in students' abilities. The results show that, in general, the NE questions for the 2013/2014 academic year favor the focal group (NTT) on Algebra, Geometry, and Statistics/Probability material, although that group has lower ability than the reference group (DIY). Based on this analysis, policymakers can take corrective steps that focus on fixing problems, facilities, and teaching resources so that geographical inequality does not recur in Indonesia in the future.

Journal Article


Examining differential responses of youth with and without autism on a measure of everyday activity performance

by Liljenquist, Kendra; Coster, Wendy J.; Ni, Pengsheng in Activities of Daily Living - psychology, Autism, Autistic children

2015

Purpose: This study further investigated items with differential item function (DIF) in the Social/Cognitive domain of a measure of everyday activity performance, the Pediatric Evaluation of Disability Inventory-Computer Adapted Test version for Autism, "PEDI-CAT (ASD)," to understand possible sources of response variation in a heterogeneous sample of youth with autism compared to the national standardization sample. Methods: Cross-sectional design. A convenience sample of parents who identified they had a child between 3 and 21 years (M = 11.9 years, SD = 4.67 years) with autism (n = 365) completed an online survey that included the PEDI-CAT (ASD) and descriptive measures. For 28 items previously identified as having DIF, the PEDI-CAT (ASD) expected item score curves for the autism sample were compared to those of the original PEDI-CAT standardization sample. The weighted area between expected score curves (wABC) was also calculated; values >0.24 indicate significant DIF. Results: All items had wABC that exceeded the criterion. Compared with peers without disabilities at the same ability level, 11 items were significantly more difficult for the youth with autism and 16 items were significantly easier. One item demonstrated non-uniform DIF. Conclusion: Differential responses could indicate that: (1) children with autism have a different developmental pattern of skill acquisition for everyday activities in the Social/Cognitive domain, or (2) parents of children with autism utilize a unique appraisal process when assessing their children's functional performance of everyday activities. Further research is required to better understand the factors leading to differential responses on the targeted items. The study illustrates the value of in-depth analysis of DIF to gain insight into the impact of a clinical condition on functional performance.

Journal Article


Application of a method of estimating DIF for polytomous test items

by P. Congdon, G. Camilli in Differential item function (DIF), Item bias, Item response theory

1999

In this paper, a method for studying DIF is demonstrated that can be used with either dichotomous or polytomous items. The method is shown to be valid for data that follow a partial credit IRT model. It is also shown that logistic regression gives results equivalent to those of the proposed method. In a simulation study, positively biased type 1 error rates of the method are shown to be in accord with results from previous studies; however, the size of the bias in the log odds is moderate. Finally it is demonstrated how these statistics can be used to study DIF variability with the method of Longford, Holland, and Thayer (1993).

Journal Article


Examining Differential Item Functions of Different Item Ordered Test Forms According to Item Difficulty Levels

by Gül, Emrah; Dogan-Gül, Çilem; Çokluk, Ömay in Ability, Achievement Tests, Correlation

2016

The study aims to examine whether differential item function is displayed in three different test forms that have item orders of random and sequential versions (easy-to-hard and hard-to-easy), based on Classical Test Theory (CTT) and Item Response Theory (IRT) methods and bearing item difficulty levels in mind. In the correlational research, the data from a total of 578 seventh graders were gathered using an Atomic Structures Achievement Test. The R programming language and the "difR" package were employed for all the analyses. As a result of the analyses, it was concluded that a comparison of IRT- and CTT-based methods indicates a greater number of items with distinctively significant differential item functioning. Different item ordering leads students at the same ability levels to display different performances on the same items. As a result, it is found that item order differentiates the probability of correct response to the items for those at the same ability levels. A test form of sequential easy-to-hard questions brings more advantages than that of a hard-to-easy sequence or a random version. The findings show that it is essential to arrange tests that are employed to make decisions about people in accordance with psychometric principles.

Journal Article


Stability of Differential Item Functioning Over a Single Population in Survey Data

by Dodeen, Hamzeh in Attitude Measures, Attitude surveys, Attitudes

2004

This study investigates the stability of differential item functioning (DIF) in survey data. Surveys are conducted periodically, and their results are often reported by aggregating responses. Estimating the stability of DIF across subsets of a survey population can be an important indicator in determining the likelihood of DIF stability over different populations. Data from four previously administered surveys were used in the analysis of the stability of gender-related DIF. The surveys were the Family Survey, Economic Expectations and Attitudes, Ministry With Young Adults Initiative, and Attitudes Toward the Environment Survey. Two samples of 500 participants each were randomly selected and used from each survey. The Mantel-Haenszel and the logistic regression procedures were used separately to detect DIF. Results showed that DIF in survey data is highly stable over subsamples from a single population.
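For readers unfamiliar with the Mantel-Haenszel procedure named in this abstract, the following is a minimal sketch of the statistic, not the author's implementation. It assumes dichotomously scored responses; the function names, the record layout `(group, matching_score, correct)`, and the toy data are hypothetical. Respondents are stratified by a matching score, a 2x2 table (group by correct/incorrect) is formed in each stratum, and the common odds ratio is pooled across strata. The ETS delta rescaling (-2.35 times the natural log of the odds ratio) is included as one common way of reporting the result.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Pooled Mantel-Haenszel odds ratio for one item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1 (hypothetical layout).
    """
    # Per stratum counts: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1

    num = 0.0
    den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return num / den


def mh_d_dif(alpha):
    """ETS delta scale; negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Toy data, one stratum: reference 8 of 10 correct, focal 5 of 10 correct.
toy = (
    [("ref", 1, 1)] * 8 + [("ref", 1, 0)] * 2
    + [("focal", 1, 1)] * 5 + [("focal", 1, 0)] * 5
)
alpha = mantel_haenszel_odds_ratio(toy)  # (8*5/20) / (2*5/20) = 4.0
delta = mh_d_dif(alpha)                  # negative: favors the reference group
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF for the item; the sketch omits the accompanying chi-square significance test.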

Journal Article


Score Equivalence is at the Heart of International Measures of Physical Activity

by Zhu, Weimo in Academic Achievement, Basic Skills, Bias

2000

Zhu comments on an article by Booth (2000) that discussed the importance and need to develop international measures of physical activity (PA). Zhu focuses on the questionnaire approach, but many measurement concepts and statistical methods described are applicable to other measurement approaches.

Journal Article

