Catalogue Search | MBRL
Explore the vast range of titles available.
2,806 result(s) for "Adaptive Testing"
Variable-Length Stopping Rules for Multidimensional Computerized Adaptive Testing
2019
In computerized adaptive testing (CAT), a variable-length stopping rule ends item administration once a pre-specified measurement precision standard has been satisfied. The goal is to provide equal measurement precision for all examinees regardless of their true latent trait level. Several stopping rules have been proposed in unidimensional CAT, such as the minimum information rule and the maximum standard error rule. These rules have also been extended to multidimensional CAT and cognitive diagnostic CAT, and they all share the same idea of monitoring measurement error. Recently, Babcock and Weiss (J Comput Adapt Test, 2012. https://doi.org/10.7333/1212-0101001) proposed an “absolute change in theta” (CT) rule, which is useful when an item bank has been exhausted of good items for one or more ranges of the trait continuum. Choi, Grady, and Dodd (Educ Psychol Meas 70:1–17, 2010) also argued that a CAT should stop when the standard error no longer changes, implying that the item bank is likely exhausted. Although these stopping rules have been evaluated and compared in different simulation studies, the relationships among them remain unclear, and there is therefore no clear guideline on when to use which rule. This paper presents analytic results showing the connections among various stopping rules in both unidimensional and multidimensional CAT. In particular, it is argued that the CT rule alone can be unstable and can end the test prematurely; however, it can be a useful secondary rule for monitoring the point of diminishing returns. To provide further empirical evidence, three simulation studies are reported using both the 2PL model and the multidimensional graded response model.
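The two families of rules the abstract contrasts can be sketched in a few lines. This is an illustrative sketch only, assuming a unidimensional 2PL model; the function names and cutoff values here are ours, not the paper's.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, administered):
    """SE of the ability estimate = 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in administered)
    return 1.0 / math.sqrt(info) if info > 0 else float("inf")

def should_stop(se_history, theta_history,
                se_cutoff=0.3, ct_cutoff=0.02, min_items=5):
    """Primary SE rule, with the change-in-theta (CT) rule as a secondary check."""
    if len(theta_history) < min_items:
        return False
    if se_history[-1] <= se_cutoff:  # precision standard met
        return True
    # CT rule: the estimate has stopped moving (point of diminishing returns)
    return abs(theta_history[-1] - theta_history[-2]) < ct_cutoff
```

The sketch reflects the paper's argument: the CT check alone could fire early by chance, so it serves only as a backstop once the primary precision standard has failed to trigger.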
Journal Article
Psychometrics Behind Computerized Adaptive Testing
2015
The paper provides a survey of 18 years’ progress that my colleagues, students (both former and current), and I have made in a prominent research area in psychometrics: computerized adaptive testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived within the framework of martingale theory, a highly theoretical branch of probability theory that may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful in supporting individualized instruction on a mass scale, and we show that even paper-and-pencil tests can be made adaptive to support classroom teaching.
Journal Article
Online Computerized Adaptive Tests of Children's Vocabulary Development in English and Mexican Spanish
by Dale, Philip S.; Mankewitz, Jessica; Kachergis, George
in Adaptive Testing, American English, Analysis
2022
Purpose: Measuring the growth of young children's vocabulary is important for researchers seeking to understand language learning as well as for clinicians aiming to identify early deficits. The MacArthur-Bates Communicative Development Inventories (CDIs) are parent report instruments that offer a reliable and valid method for measuring early productive and receptive vocabulary across a number of languages. However, CDI forms typically include hundreds of words, so the burden of completion is significant. We address this limitation by building on previous work using item response theory (IRT) models to create computer adaptive test (CAT) versions of the CDIs. We created CDI-CATs for both comprehension and production vocabulary, for both American English and Mexican Spanish. Method: Using a data set of 7,633 English-speaking children ages 12-36 months and 1,692 Spanish-speaking children ages 12-30 months, across three CDI forms (Words & Gestures, Words & Sentences, and CDI-III), we found that a 2-parameter logistic IRT model fits well for a majority of the 680 pooled vocabulary items. We conducted CAT simulations on this data set, assessing simulated tests of varying length (25-400 items). Results: Even very short CATs recovered participant abilities very well with little bias across ages. An empirical validation study with N = 204 children ages 15-36 months showed a correlation of r = 0.92 between language ability estimated from the full CDI versus the CDI-CAT forms. Conclusion: We provide our item bank along with fitted parameters and other details, offer recommendations for how to construct CDI-CATs in new languages, and suggest when this type of assessment may or may not be appropriate.
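The adaptive step at the heart of a CAT like the CDI-CAT can be sketched as maximum-information item selection under the 2-parameter logistic model the authors fit. A minimal sketch with made-up item parameters (the real CDI-CAT item bank and estimator are not reproduced here):

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_hat, bank, administered):
    """Select the unused item that is most informative at the current estimate."""
    unused = [i for i in range(len(bank)) if i not in administered]
    return max(unused, key=lambda i: item_information(theta_hat, *bank[i]))
```

With equal discriminations, the rule reduces to picking the item whose difficulty is closest to the current ability estimate, which is why even very short CATs can track ability well when the bank covers the trait range.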
Journal Article
Enhancing the Efficiency of Confrontation Naming Assessment for Aphasia Using Computer Adaptive Testing
2019
Purpose: In this study, we investigated the agreement between the 175-item Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) and a 30-item computer adaptive PNT (PNT-CAT; Fergadiotis, Kellough, & Hula, 2015; Hula, Kellough, & Fergadiotis, 2015) created using item response theory (IRT) methods. Method: The full PNT and the PNT-CAT were administered to 47 participants with aphasia in counterbalanced order. Latent trait naming ability estimates for the 2 PNT versions were analyzed in a Bayesian framework, and the agreement between them was evaluated using correlation and measures of constant, variable, and total error. We also evaluated the extent to which individual pairwise differences were credibly greater than 0 and whether the IRT measurement model provided an adequate indication of the precision of individual score estimates. Results: The agreement between the PNT and the PNT-CAT was strong, as indicated by a high correlation (r = 0.95, 95% CI [0.92, 0.97]), negligible bias, and low variable and total error. The number of statistically robust pairwise score differences did not credibly exceed the Type I error rate, and the precision of individual score estimates was reasonably well predicted by the IRT model. Discussion: The strong agreement between the full PNT and the PNT-CAT suggests that the latter is a suitable measure of anomia in group studies. The relatively robust estimates of score precision also suggest that the PNT-CAT can be useful for the clinical assessment of anomia in individual cases. Finally, the IRT methods used to construct the PNT-CAT provide a framework for additional development to further reduce measurement error.
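The agreement measures named in the abstract (constant, variable, and total error) have simple sample analogues. A sketch of those analogues only, not the authors' Bayesian implementation:

```python
import math

def agreement_errors(x, y):
    """Constant error = mean difference; variable error = SD of differences;
    total error = RMS difference (total^2 = constant^2 + variable^2 when the
    SD uses the population formula, as here)."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    constant = sum(d) / n
    variable = math.sqrt(sum((di - constant) ** 2 for di in d) / n)
    total = math.sqrt(sum(di * di for di in d) / n)
    return constant, variable, total
```

Low constant error corresponds to the "negligible bias" reported, while variable error captures how much individual short-form scores scatter around the full-form scores.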
Journal Article
Adjusted Residuals for Evaluating Conditional Independence in IRT Models for Multistage Adaptive Testing
by
van Rijn, Peter W.
,
Ali, Usama S.
,
Joo, Sean-Hwane
in
Adaptive Testing
,
Assessment
,
Behavioral Science and Psychology
2024
The key assumption of conditional independence of item responses given latent ability in item response theory (IRT) models is addressed for multistage adaptive testing (MST) designs. Routing decisions in MST designs can cause patterns in the data that are not accounted for by the IRT model. This phenomenon relates to quasi-independence in log-linear models for incomplete contingency tables and impacts certain types of statistical inference based on assumptions about observed and missing data. We demonstrate that generalized residuals for item pair frequencies under IRT models, as discussed by Haberman and Sinharay (J Am Stat Assoc 108:1435–1444, 2013. https://doi.org/10.1080/01621459.2013.835660), are inappropriate for MST data without adjustments. The adjustments depend on the MST design and can quickly become nontrivial as the complexity of the routing increases. However, the adjusted residuals are found to have satisfactory Type I error rates in a simulation study and are illustrated with an application to real MST data from the Programme for International Student Assessment (PISA). Implications and suggestions for statistical inference with MST designs are discussed.
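The quantity being tested is, in essence, the discrepancy between observed and model-implied item-pair frequencies. A toy sketch of the model-implied piece under a 2PL model with a standard normal ability distribution; the quadrature grid and item parameters are ours for illustration, and the paper's MST routing adjustments are not shown:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def normal_quadrature(lo=-4.0, hi=4.0, n=81):
    """Simple grid approximation to a standard normal ability distribution."""
    step = (hi - lo) / (n - 1)
    pts = [lo + i * step for i in range(n)]
    w = [math.exp(-t * t / 2.0) for t in pts]
    total = sum(w)
    return [(t, wi / total) for t, wi in zip(pts, w)]

def expected_pair_freq(item1, item2, quad):
    """Model-implied proportion answering both items correctly, marginalized over ability."""
    a1, b1 = item1
    a2, b2 = item2
    return sum(w * p_2pl(t, a1, b1) * p_2pl(t, a2, b2) for t, w in quad)
```

A raw residual is then the observed pair proportion minus this expectation. Under MST routing, the observed proportions are systematically distorted by who gets routed to which module, which is why unadjusted residuals mislead.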
Journal Article
Computer adaptive testing, big data and algorithmic approaches to education
2017
This article critically considers the promise of computer adaptive testing (CAT) and digital data to provide better and quicker data that will improve the quality, efficiency and effectiveness of schooling. In particular, it uses the case of the Australian NAPLAN test that will become an online, adaptive test from 2016. The article argues that CATs are specific examples of technological ensembles which are producing, and working through, new subjectivities. In particular, CATs leverage opportunities for big data and algorithmic approaches to education that are symptomatic of what Deleuze saw as the shift from disciplinary to control institutions and societies.
Journal Article
Advances in CD-CAT: The General Nonparametric Item Selection Method
by Chiu, Chia-Yi; Chang, Yuan-Pei
in Adaptive Testing, Assessment, Behavioral Science and Psychology
2021
Computerized adaptive testing (CAT) is characterized by its high estimation efficiency and accuracy, in contrast to the traditional paper-and-pencil format. CAT specifically for cognitive diagnosis (CD-CAT) carries the same advantages and has been seen as a tool for advancing the use of cognitive diagnosis (CD) assessment in educational practice. A powerful item selection method is the key to the success of a CD-CAT program, and to date, various parametric item selection methods have been proposed and well researched. However, these parametric methods all require large samples to secure high-precision calibration of the items in the item bank. Thus, at present, implementation of parametric methods in small-scale educational settings, such as classrooms, remains challenging. In response to this issue, Chang, Chiu, and Tsai (Appl Psychol Meas 43:543–561, 2019) proposed the nonparametric item selection (NPS) method, which does not require parameter calibration and outperforms the parametric methods in settings with only small or no calibration samples. Nevertheless, the NPS method is not without limitations; extra assumptions are required to guarantee a consistent estimator of the attribute profiles when data conform to complex models. To remedy this shortcoming, this study proposes the general nonparametric item selection (GNPS) method, which incorporates the newly developed general NPC (GNPC) method (Chiu et al. in Psychometrika 83:355–375, 2018) as the classification vehicle. The inclusion of the GNPC method relaxes the assumptions imposed on the NPS method. As a result, the GNPS method can be used with any model, or multiple models, without abandoning the advantage of being a small-sample technique. The legitimacy of using the GNPS method in the CD-CAT system is supported by Theorem 1, proposed in the study. The efficiency and effectiveness of the GNPS method are confirmed by a simulation study showing that it outperforms the compared parametric methods when calibration samples are small.
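The nonparametric core these methods share is classification by Hamming distance to ideal response patterns. The sketch below uses the conjunctive (DINA-type) ideal response of the basic NPC method; the GNPC method generalizes this binary ideal response to a weighted one, which is not shown. The Q-matrix and responses are illustrative:

```python
from itertools import product

def ideal_response_dina(alpha, q_row):
    """DINA-type ideal response: 1 iff the profile has every attribute the item requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def npc_classify(responses, Q):
    """Nonparametric classification: return the attribute profile whose ideal
    response vector has minimum Hamming distance to the observed responses."""
    k = len(Q[0])
    best, best_d = None, None
    for alpha in product((0, 1), repeat=k):
        ideal = [ideal_response_dina(alpha, row) for row in Q]
        d = sum(r != e for r, e in zip(responses, ideal))
        if best_d is None or d < best_d:
            best, best_d = alpha, d
    return best
```

Because no item parameters appear anywhere, the procedure needs no calibration sample, which is exactly the small-sample advantage the abstract emphasizes.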
Journal Article
A Computerized Adaptive Approach to Measuring Faculty Assessment Literacy
by Aksu Dünya, Beyza; Wind, Stefanie A.; Demir, Mehmet Can
in Adaptive Testing, Assessment Literacy, Calibration
2025
The purpose of this study was to generate an item bank for assessing faculty members’ assessment literacy and to examine the applicability and feasibility of a computerized adaptive test (CAT) approach to monitoring assessment literacy among faculty members. In developing this assessment using a sequential mixed-methods research design, our goal was to create a simple, quick, and precise screening tool for assessing assessment literacy in higher education. After defining the construct of assessment literacy within the higher education context, we developed the test blueprint and items and subjected them to a series of expert reviews. Following a pilot administration to confirm feasibility, we conducted item parameter calibration using a representative sample of faculty members (n = 211) selected through a convenience sampling approach. We evaluated the items for evidence of adequate psychometric quality, including fit, targeting, and unidimensionality under the Rasch framework. We concluded that developing an adaptive test for measuring assessment literacy is possible even with a small item pool and a small calibration sample.
Plain language summary
Generating an item bank for assessing faculty members’ assessment literacy and examining the applicability and feasibility of a computerized adaptive test approach
The primary aim was to develop a simple, quick, and precise screening tool for assessing assessment literacy in higher education. Despite having a small item pool and calibration sample, the study found that developing an adaptive test for measuring assessment literacy was feasible. The study also addresses the practical implications of integrating adaptive assessment tools into assessment-related feedback systems, faculty development programs, and ongoing evaluation of assessment literacy.
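Under the Rasch framework the authors use, each item carries only a difficulty parameter, and item fit can be screened with a mean-square statistic. A minimal sketch of both pieces; the parameters and the specific fit statistic shown (infit mean square) are illustrative, not the study's calibration:

```python
import math

def p_rasch(theta, b):
    """Rasch model: success probability depends only on theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def infit_mnsq(thetas, b, responses):
    """Information-weighted mean-square fit for one item; values near 1.0
    indicate responses consistent with the Rasch model."""
    num = den = 0.0
    for theta, x in zip(thetas, responses):
        p = p_rasch(theta, b)
        num += (x - p) ** 2
        den += p * (1.0 - p)
    return num / den
```

With only 211 respondents, such per-item fit screening is one of the few checks that remains informative, which is consistent with the study's focus on fit, targeting, and unidimensionality.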
Journal Article
Generalizing computerized adaptive testing for problematic mobile phone use from Chinese adults to adolescents
The number of mobile phone users worldwide has increased in recent years. As people spend more time on their phones, negative effects such as problematic mobile phone use (PMPU) have become more pronounced. Many researchers have dedicated efforts to developing questionnaires and revising tools to evaluate PMPU more accurately. Previous studies demonstrated that a CAT-PMPU for adults could significantly enhance measurement accuracy and efficiency. However, most of its items were developed for adults, and notable differences between adults and adolescents make some items potentially unsuitable for the latter. Thus, this study aimed to generalize the adult version of the CAT-PMPU to make it suitable for both adult and adolescent populations. A total of 740 Chinese adolescents and 980 Chinese adults participated in this study, completing online or paper-and-pencil questionnaires. The empirical data were then used to simulate the CAT-PMPU, and measurement efficiency, accuracy, and reliability were compared between adults and adolescents under different stopping rules. The results showed that the generalized CAT-PMPU had promising measurement efficiency and accuracy, consistent with the adult version. In conclusion, the CAT-PMPU developed in this study not only exhibited satisfactory test reliability but also offers novel technical support for evaluating PMPU in both adolescent and adult populations, demonstrating its potential applicability in practice.
Journal Article
The feasibility of computerized adaptive testing of the national benchmark test: A simulation study
by Ndlovu, Mdutshekelwa; Ayanwale, Musa Adekunle
in Adaptive Testing, Algorithms, Computer Assisted Testing
2024
The COVID-19 pandemic has had a significant impact on high-stakes testing, including the National Benchmark Tests (NBTs) in South Africa. Current linear testing formats have been criticized for their limitations, leading to a shift towards computerized adaptive testing (CAT). Assessments with CAT are more precise and take less time. Evaluating CAT programs requires simulation studies. To assess the feasibility of implementing CAT in the NBTs, SimulCAT, a simulation tool, was utilized. The simulation involved creating 10,000 examinees with a normal ability distribution with a mean of 0 and a standard deviation of 1. A pool of 500 test items was employed, and specific parameters were established for the item selection algorithm, CAT administration rules, item exposure control, and termination criteria. The termination criterion required a standard error of less than 0.35 to ensure accurate ability estimation. The findings demonstrated that fixed-length tests provided higher testing precision without systematic error, as indicated by measurement statistics such as CBIAS, CMAE, and CRMSE. However, fixed-length tests exhibited a higher item exposure rate, which could be mitigated by selecting items with fewer dependencies on specific item parameters (a-parameters). Variable-length tests, on the other hand, demonstrated increased redundancy. Based on these results, CAT is recommended as an alternative approach for conducting the NBTs because of its capability to accurately measure individual abilities while reducing testing duration. For high-stakes assessments like the NBTs, fixed-length tests are preferred as they offer superior testing precision while minimizing item exposure rates.
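The kind of simulation described can be approximated in plain Python: draw examinees from N(0, 1), administer maximum-information items from a 2PL bank, and stop when the standard error falls below 0.35. This is a bare-bones sketch, not SimulCAT itself; the item-parameter ranges and the grid-search estimator are our assumptions.

```python
import math
import random

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def ml_theta(items, responses):
    """Grid-search maximum-likelihood ability estimate."""
    def loglik(t):
        return sum(math.log(p_2pl(t, a, b)) if x else math.log(1.0 - p_2pl(t, a, b))
                   for (a, b), x in zip(items, responses))
    return max(GRID, key=loglik)

def run_cat(theta_true, bank, rng, se_target=0.35, max_items=40):
    """Variable-length CAT: max-information selection, SE < se_target termination."""
    used, responses, theta_hat = [], [], 0.0
    while len(used) < max_items:
        j = max((k for k in range(len(bank)) if k not in used),
                key=lambda k: info(theta_hat, *bank[k]))
        used.append(j)
        responses.append(rng.random() < p_2pl(theta_true, *bank[j]))
        theta_hat = ml_theta([bank[k] for k in used], responses)
        se = 1.0 / math.sqrt(sum(info(theta_hat, *bank[k]) for k in used))
        if se < se_target:
            break
    return theta_hat, len(used)
```

Running this over many simulees and comparing estimated to true abilities yields bias and RMSE figures of the kind (CBIAS, CRMSE) used to compare fixed- and variable-length designs in the study.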
Journal Article