Search Results

31,836 result(s) for "item"
Developing, Analyzing, and Using Distractors for Multiple-Choice Tests in Education: A Comprehensive Review
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education, focused specifically on the development, analysis, and use of the incorrect options, also known as distractors. Despite a vast body of literature on multiple-choice testing, the task of creating distractors has received much less attention. In this study, we provide an overview of what is known about developing distractors for multiple-choice items and evaluating their quality. Next, we synthesize the existing guidelines on how to use distractors and summarize earlier research on the optimal number and ordering of distractors. Finally, we use this comprehensive review to provide up-to-date recommendations regarding distractor development, analysis, and use, and in the process we highlight important areas where further research is needed.
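The review itself contains no code; as a rough illustration of the kind of option-level statistics typically used when evaluating distractor quality, the sketch below computes, for each response option, the proportion of examinees choosing it and its point-biserial correlation with the total score. Function and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def distractor_analysis(responses, key, scores):
    """Classical distractor analysis for one multiple-choice item.

    responses: chosen option labels (e.g. 'A'-'D') for one item
    key:       the correct option label
    scores:    total test scores for the same examinees
    Returns, per option, the proportion choosing it and the point-biserial
    correlation between choosing that option and the total score.
    """
    responses = np.asarray(responses)
    scores = np.asarray(scores, dtype=float)
    summary = {}
    for option in np.unique(responses):
        chose = (responses == option).astype(float)
        prop = chose.mean()
        # Point-biserial: Pearson correlation between the 0/1 choice indicator and total score.
        pbis = np.corrcoef(chose, scores)[0, 1] if 0 < prop < 1 else float("nan")
        summary[option] = {"proportion": prop, "point_biserial": pbis, "is_key": option == key}
    return summary

# A functioning distractor attracts mainly low scorers (negative point-biserial),
# while the keyed option should show a positive point-biserial.
resp = ["A", "B", "A", "C", "A", "D", "B", "A"]
total = [18, 9, 20, 11, 17, 8, 10, 19]
print(distractor_analysis(resp, key="A", scores=total))
```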
A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation
Automatic Item Generation (AIG) refers to the process of using cognitive models to generate test items with computer modules. It is a new but rapidly evolving research area in which cognitive and psychometric theory are combined into a digital framework. However, assessment of the item quality, usability, and validity of AIG relative to traditional item development methods lacks clarification. This paper takes a top-down, strong-theory approach to evaluate AIG in medical education. Two studies were conducted. In Study I, participants with different levels of clinical knowledge and item-writing experience developed medical test items both manually and through AIG, and both item types were compared in terms of quality and usability (efficiency and learnability). In Study II, automatically generated items were included in a summative exam in the content area of surgery, and a psychometric analysis based on Item Response Theory examined the validity and quality of the AIG items. Items generated by AIG were of good quality, showed evidence of validity, and were adequate for testing students' knowledge. The time spent developing the content for item generation (cognitive models) and the number of items generated did not vary with the participants' item-writing experience or clinical knowledge. AIG produces numerous high-quality items in a fast, economical, and easy-to-learn process, even for item writers who are inexperienced or lack clinical training. Medical schools may benefit from a substantial improvement in the cost-efficiency of developing test items by using AIG. Item-writing flaws can be significantly reduced through the application of AIG's cognitive models, generating test items capable of accurately gauging students' knowledge.
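The abstract does not describe the generation machinery itself. As a hedged illustration of the template-based workflow commonly associated with AIG (a cognitive model feeding an item model whose variable slots are expanded by a generation engine), the sketch below is a minimal, hypothetical example and not the authors' implementation; the template and slot values are invented.

```python
from itertools import product

# Hypothetical item model: a stem template with variable slots, plus the constrained
# values each slot may take (a tiny stand-in for the cognitive model the abstract mentions).
item_model = {
    "stem": "A {age}-year-old patient presents with {symptom}. What is the most likely diagnosis?",
    "slots": {
        "age": ["25", "68"],
        "symptom": ["acute chest pain radiating to the left arm",
                    "sudden unilateral facial weakness"],
    },
}

def generate_items(model):
    """Expand every admissible combination of slot values into a test item stem."""
    names = list(model["slots"])
    for values in product(*(model["slots"][n] for n in names)):
        yield model["stem"].format(**dict(zip(names, values)))

for stem in generate_items(item_model):
    print(stem)
```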
Parameters and Models of Item Response Theory (IRT): A Review of Literature
Item response theory (IRT) has received much attention in the validation of assessment instruments because it allows students' ability to be estimated from any set of items. Item response theory allows the difficulty and discrimination level of each item on the test to be estimated. In the IRT framework, item characteristics are independent of the sample, and a person's latent traits are independent of the test, provided that the selected model fits the data. Therefore, scores that describe examinee performance are independent of test difficulty. An examinee's scores may be lower on a difficult test and higher on an easier one, but the examinee's ability level remains the same across tests at the time of testing. The IRT model allows the estimation of item parameters, yet the distinction between the models and the parameters of IRT is not clear to many students of assessment. This paper reviews the parameters that are estimated using IRT and the models available, highlights the difference between parameters and models, and describes the models applicable to each type of data. The literature on IRT parameters and models is reviewed. Four parameters can be estimated with IRT, but the number of models is not four; the applicable models depend on the type of data. Dichotomous data have four models corresponding to the four parameters, whereas polytomous models use two parameters: item difficulty and item discrimination.
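For reference (this formulation is standard IRT background rather than something quoted from the paper), the four parameters mentioned correspond to the four-parameter logistic model for dichotomous items, with the common 3PL, 2PL, and 1PL/Rasch models as nested special cases:

```latex
% Four-parameter logistic (4PL) model for a dichotomous item i:
%   a_i = discrimination, b_i = difficulty,
%   c_i = lower asymptote (guessing), d_i = upper asymptote (slipping).
P_i(\theta) \;=\; c_i + (d_i - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
% Nested special cases:
%   3PL: d_i = 1;   2PL: d_i = 1, c_i = 0;   1PL/Rasch: d_i = 1, c_i = 0, a_i fixed.
```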
Optimizing the Use of Response Times for Item Selection in Computerized Adaptive Testing
Despite its common operationalization, the measurement efficiency of computerized adaptive testing should be assessed not only in terms of the number of items administered but also in terms of the time it takes to complete the test. To this end, a recent study introduced a novel item selection criterion that maximizes Fisher information per unit of expected response time (RT), which was shown to effectively reduce the average completion time for a fixed-length test with minimal decrease in the accuracy of ability estimation. As this method also resulted in extremely unbalanced exposure of items, however, a-stratification with b-blocking was recommended as a means for counterbalancing. Although exceptionally effective in this regard, it comes at substantial costs of attenuating the reduction in average testing time, increasing the variance of testing times, and further decreasing estimation accuracy. Therefore, this article investigated several alternative methods for item exposure control, of which the most promising was a simple modification of maximizing Fisher information per unit of centered expected RT. The key advantage of the proposed method is the flexibility in choosing a centering value according to a desired distribution of testing times and level of exposure control. Moreover, the centered expected RT can be exponentially weighted to calibrate the degree of measurement precision. The results of extensive simulations, with item pools and examinees that are both simulated and real, demonstrate that optimally chosen centering and weighting values can markedly reduce the mean and variance of both testing times and test overlap, all without much compromise in estimation accuracy.
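The paper's exact selection criterion is not reproduced in the abstract. The sketch below is a minimal illustration in its spirit, assuming a 2PL information function and treating the centering value c and exponential weight w as free tuning constants; it is not the authors' precise formulation, and all names are illustrative.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of 2PL items at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def select_item(theta, a, b, expected_rt, administered, c=0.0, w=1.0):
    """Pick the unadministered item maximizing information per unit of
    (centered, exponentially weighted) expected response time.

    c: centering value subtracted from the expected RT (c = 0 recovers
       plain information-per-time selection).
    w: weight on the time term (w = 0 recovers maximum-information selection).
    """
    info = fisher_info_2pl(theta, a, b)
    denom = np.maximum(expected_rt - c, 1e-6) ** w   # keep the denominator positive
    crit = np.where(administered, -np.inf, info / denom)
    return int(np.argmax(crit))

# Toy pool: five items with discriminations, difficulties, and expected RTs in seconds.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-0.5, 0.0, 0.3, 1.0, -1.2])
rt = np.array([45.0, 30.0, 80.0, 60.0, 25.0])
used = np.zeros(5, dtype=bool)
print(select_item(theta=0.2, a=a, b=b, expected_rt=rt, administered=used, c=20.0, w=1.0))
```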
Is it quality, is it redundancy, or is model inadequacy? Some strategies for judging the appropriateness of high-discrimination items
When developing new questionnaires, it is traditionally assumed that items should be as discriminative as possible, as if this were always indicative of their quality. However, in some cases these high discriminations may be masking problems such as redundancies, shared residuals, biased distributions, or model limitations, which can inflate the discrimination estimates. Inspection of these indices may therefore lead to erroneous decisions about which items to keep or eliminate. To illustrate this problem, two different scenarios with real data are described. The first focuses on a questionnaire that contains an item that is apparently highly discriminant but redundant. The second focuses on a clinical questionnaire administered to a community sample, which gives rise to highly right-skewed item response distributions and inflated discrimination indices even though the items do not discriminate well among the majority of participants. We propose some strategies and checks to identify these situations, so that inappropriate items can be identified and removed. This article thus seeks to promote a critical attitude, which may involve going against established routine principles when they are not appropriate.
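One concrete check of the kind this abstract argues for is to examine residual correlations between item pairs after fitting the model; Yen's Q3 statistic is the usual choice, and high residual correlations flag local dependence (e.g. near-duplicate items) that can inflate discrimination estimates. The sketch below assumes a fitted 2PL model and is a generic illustration, not the authors' own procedure.

```python
import numpy as np

def q3_residual_correlations(responses, theta, a, b):
    """Yen's Q3: correlations between item residuals after removing the model-implied
    probability of a correct/keyed response.

    responses: (n_persons, n_items) 0/1 matrix
    theta:     ability estimates, length n_persons
    a, b:      2PL item parameters, length n_items
    High |Q3| for an item pair suggests local dependence (e.g. redundant items)."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    residuals = responses - p
    return np.corrcoef(residuals, rowvar=False)

# Usage idea: flag pairs whose |Q3| clearly exceeds the average off-diagonal value,
# then inspect whether those items are near-duplicates in wording.
```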
Polytomous explanatory item response models for item discrimination: Assessing negative-framing effects in social-emotional learning surveys
Modeling item parameters as a function of item characteristics has a long history but has generally focused on models for item location. Explanatory item response models for item discrimination are available but rarely used. In this study, we extend existing approaches for modeling item discrimination from dichotomous to polytomous item responses. We illustrate our proposed approach with an application to four social-emotional learning surveys of preschool children to investigate how item discrimination depends on whether an item is positively or negatively framed. Negative framing predicts significantly lower item discrimination on two of the four surveys, and a plausibly causal estimate from a regression discontinuity analysis shows that negative framing reduces discrimination by about 30% on one survey. We conclude with a discussion of potential applications of explanatory models for item discrimination.
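The abstract does not spell out the model. As a simplified, dichotomous analogue of an explanatory model for discrimination (an assumption for illustration only; the paper itself works with polytomous responses), the log of the discrimination parameter can be regressed on an item covariate such as an indicator for negative framing:

```latex
% Illustrative dichotomous analogue of an explanatory model for item discrimination:
P(X_{pi} = 1 \mid \theta_p) \;=\; \frac{1}{1 + e^{-a_i(\theta_p - b_i)}},
\qquad
\log a_i \;=\; \gamma_0 + \gamma_1\,\mathrm{negframe}_i
% gamma_1 < 0 would mean that negatively framed items discriminate less.
```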
Program FACTOR at 10: Origins, development and future directions
We aim to provide a conceptual view of the origins, development, and future directions of FACTOR, a popular free program for fitting the factor analysis (FA) model. The study is organized into three parts. In the first part we discuss FACTOR in its initial period (2006-2012) as a traditional FA program with many new and cutting-edge features. The second part discusses the present period (2013-2016), in which FACTOR has developed into a comprehensive program embedded in the framework of structural equation modelling and item response theory. The third part discusses expected future developments. At present, FACTOR has attained a degree of technical development comparable to commercial software and offers options not available elsewhere. We discuss several shortcomings as well as points that require changes or improvements. We also discuss the functioning of FACTOR within its community of users.
A bias-corrected RMSD item fit statistic
Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit, outfit, and S-X2) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests using more lenient cut-off criteria.
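In the usual formulation, the RMSD compares observed and model-implied item characteristic curves over the ability distribution. The sketch below shows that computation on a quadrature grid together with a generic parametric-bootstrap p-value loop; it illustrates the general idea under these assumptions rather than the paper's exact implementation, and the callable passed in for simulation is hypothetical.

```python
import numpy as np

def rmsd_item_fit(p_observed, p_model, weights):
    """RMSD between observed and model-implied correct-response probabilities,
    evaluated at quadrature points over the ability distribution.

    p_observed, p_model: probabilities at each quadrature point
    weights:             quadrature weights (densities) summing to 1."""
    return float(np.sqrt(np.sum(weights * (p_observed - p_model) ** 2)))

def parametric_bootstrap_pvalue(observed_rmsd, simulate_rmsd, n_boot=1000, rng=None):
    """Parametric bootstrap: repeatedly simulate data from the fitted model, recompute
    the RMSD, and take the proportion of simulated values at least as large as the
    observed one as the p-value. simulate_rmsd is a user-supplied callable that
    draws one replicate RMSD given a random generator."""
    rng = np.random.default_rng(rng)
    boot = np.array([simulate_rmsd(rng) for _ in range(n_boot)])
    return float((np.sum(boot >= observed_rmsd) + 1) / (n_boot + 1))
```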