Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
5,107 result(s) for "Test Length"
Making sense of Cronbach's alpha
by Tavakol, Mohsen; Dennick, Reg
in Attitude Measures; Data Interpretation, Statistical; Error of Measurement
2011
[...] a more rigorous view of alpha is that it cannot simply be interpreted as an index for the internal consistency of a test.5, 15, 17 Factor Analysis can be used to identify the dimensions of a test.18 Other reliable techniques have been used, and we encourage the reader to consult the paper "Applied Dimensionality and Test Structure Assessment with the STARTM Mathematics Test" and to compare methods for assessing the dimensionality and underlying structure of a test.19 Alpha, therefore, does not simply measure the unidimensionality of a set of items, but can be used to confirm whether or not a sample of items is actually unidimensional.5 On the other hand, if a test has more than one concept or construct, it may not make sense to report alpha for the test as a whole, as the larger number of questions will inevitably inflate the value of alpha. In principle, therefore, alpha should be calculated for each of the concepts rather than for the entire test or scale.
More importantly, alpha is grounded in the 'tau equivalent model', which assumes that each test item measures the same latent trait on the same scale. [...] if multiple factors/traits underlie the items on a scale, as revealed by Factor Analysis, this assumption is violated and alpha underestimates the reliability of the test.17 If the number of test items is too small, it will also violate the assumption of tau-equivalence and will underestimate reliability.20 When test items meet the assumptions of the tau-equivalent model, alpha approaches a better estimate of reliability.
Journal Article
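For readers who want to compute the coefficient discussed in the abstract above, the sketch below applies the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), to a person-by-item score matrix. It is a minimal Python illustration with made-up data, not the authors' code, and it does not address the tau-equivalence caveats raised in the abstract.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (n_persons, n_items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Toy example: 100 respondents answering 4 dichotomous items (random data,
# so alpha will be near zero; real item sets should be far more consistent).
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(100, 4)).astype(float)
print(round(cronbach_alpha(scores), 3))
```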
Evaluation of the Accuracy of Different Apex Locaters with 45° Apical Root Resorption
by I Al-Jobory, Ahmed; Jabbar Atyiah, Mohammed; Mohammed Ayad Taha
in 45° Root Resorption; Apex Locater Devices; Electronic Working Length Test
2024
Aim: This study aimed to calculate the working length (WL) of permanent teeth with simulated 45° resorption of the apical part of the root using four electronic apex locaters (EALs): NSK, Woodpecker III, Woodpecker V, and Eighteeth. Methods: Twenty maxillary anterior single-rooted teeth were extracted. Following tooth preparation to provide access to the root canal and replicate the 45° apical root resorption, each tooth underwent microscope-assisted working length determination. Electronic working lengths were then measured for each tooth with the four apex locaters. Results: No discernible difference existed among the four apex locater devices. Conclusion: NSK, Woodpecker III, Woodpecker V, and Eighteeth provide virtually the same measurement for 45° apical root resorption in single-rooted teeth.
Journal Article
Proof of Reliability Convergence to 1 at Rate of Spearman–Brown Formula for Random Test Forms and Irrespective of Item Pool Dimensionality
2024
It is shown that psychometric test reliability, based on any true-score model with randomly sampled items and uncorrelated errors, converges to 1 as the test length goes to infinity, with probability 1, under some general regularity conditions. The asymptotic rate of convergence is given by the Spearman–Brown formula, and for this it is not required that the items be parallel, or latent unidimensional, or even finite dimensional. Simulations with the 2-parameter logistic item response theory model reveal that the reliability of short multidimensional tests can be positively biased, meaning that applying the Spearman–Brown formula in these cases would lead to overprediction of the reliability that results from lengthening a test. However, constructors of short tests generally aim for tests that measure just one attribute, so the bias problem may have little practical relevance. For short unidimensional tests under the 2-parameter logistic model, reliability is almost unbiased, meaning that applying the Spearman–Brown formula in these cases of greater practical utility leads to predictions that are approximately unbiased.
Journal Article
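The Spearman–Brown formula referenced above predicts the reliability of a test lengthened by a factor k as k * rho / (1 + (k - 1) * rho), where rho is the reliability of the original test. A minimal sketch with illustrative values (not the paper's simulations) shows the convergence toward 1 as k grows:

```python
def spearman_brown(rho_1: float, k: float) -> float:
    """Predicted reliability when a test is lengthened by factor k."""
    return k * rho_1 / (1 + (k - 1) * rho_1)

# Start from a single-unit reliability of 0.2 and lengthen the test.
for k in (1, 2, 5, 10, 50, 200):
    print(f"k = {k:>3}: predicted reliability = {spearman_brown(0.2, k):.3f}")
```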
The effect of item pool and selection algorithms on computerized classification testing (CCT) performance
2022
The purpose of this research was to evaluate the effect of the item pool and the item selection algorithm on computerized classification testing (CCT) performance in terms of several classification evaluation metrics. For this purpose, 1,000 examinees' response patterns were generated using the R package, and eight item pools with 150, 300, 450, and 600 items having different distributions were formed. A total of 100 iterations were performed for each research condition. The results indicated that average classification accuracy (ACA) was somewhat lower, but average test length (ATL) was higher, in item pools having a broad distribution. The observed differences were more apparent in the item pool with 150 items, and the item selection methods gave similar results in terms of ACA and ATL. The Sympson-Hetter method showed advantages in terms of test efficiency, while the item eligibility method offered an improvement in item exposure control. The modified multinomial model, on the other hand, was more effective for content balancing.
Journal Article
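The abstract does not name the R package or the exact generating model, so the sketch below is only a hypothetical reconstruction of one piece of such a study: simulating dichotomous response patterns for item pools whose difficulty distributions differ in spread, here under an assumed 2-parameter logistic (2PL) model.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_examinees: int, n_items: int, b_sd: float) -> np.ndarray:
    """Dichotomous response patterns under a 2PL IRT model.

    b_sd controls how widely item difficulties are spread, mimicking
    'narrow' versus 'broad' item pool distributions.
    """
    theta = rng.normal(0.0, 1.0, n_examinees)            # abilities
    a = rng.lognormal(0.0, 0.3, n_items)                 # discriminations
    b = rng.normal(0.0, b_sd, n_items)                   # difficulties
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # P(correct)
    return (rng.random((n_examinees, n_items)) < p).astype(int)

# 1,000 examinees and a 150-item pool with a broad difficulty distribution.
responses = simulate_2pl(n_examinees=1000, n_items=150, b_sd=1.5)
print(responses.shape, responses.mean().round(3))
```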
Optimizing a national examination for medical undergraduates via modern automated test assembly approaches
2024
Background
Automated test assembly (ATA) is a modern methodology that employs data-science optimization on computer platforms to automatically create test forms, thereby significantly improving the efficiency and accuracy of test assembly procedures. In the realm of medical education, large-scale high-stakes assessments often necessitate lengthy tests, leading to elevated costs in various dimensions (such as examinee fatigue and expenses associated with item development). This study aims to improve the design of medical education assessments by leveraging modern ATA approaches.
Methods
To achieve this objective, a four-step process employing psychometric methodologies was used to calibrate and analyze the item pool of the Standardized Competence Test for Clinical Medicine Undergraduates (SCTCMU), a nationwide summative test comprising 300 multiple-choice questions (MCQs) in China. Subsequently, two modern ATA approaches were employed to determine the optimal item combination, accounting for both the statistical and the content requirements specified in the test blueprint. The quality of the test forms assembled with these approaches was then evaluated.
Results
An exploration of the psychometric properties of the SCTCMU, as a foundational step, revealed commendable quality in the item properties. Furthermore, the evaluation of the assembled test forms indicated that modern ATA approaches can identify the optimal test length within the predefined measurement precision. Specifically, this investigation demonstrates that modern ATA approaches can substantially reduce the length of the assembled test forms while maintaining the statistical and content standards specified in the test blueprint.
Conclusions
This study harnessed modern ATA approaches to automatically construct test forms, thereby significantly enhancing the efficiency and precision of test assembly procedures. Modern ATA approaches offer medical educators a valuable tool for improving the efficiency and cost-effectiveness of medical education assessment.
Journal Article
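ATA problems such as the one above are usually posed as mixed-integer optimization; the sketch below is only a simplified greedy stand-in that conveys the core idea of selecting items to maximize statistical information at a target ability while honoring a content blueprint. The item parameters, content areas, and quotas are hypothetical, not the SCTCMU's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 300-item pool with 2PL parameters and four content areas.
n_pool = 300
a = rng.lognormal(0.0, 0.3, n_pool)        # discriminations
b = rng.normal(0.0, 1.0, n_pool)           # difficulties
content = rng.integers(0, 4, n_pool)       # content area of each item
blueprint = {0: 10, 1: 10, 2: 10, 3: 10}   # items required per content area
theta_target = 0.0                         # ability point the form should target

def info_2pl(a, b, theta):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

info = info_2pl(a, b, theta_target)
form = []
for area, quota in blueprint.items():
    candidates = np.where(content == area)[0]
    best = candidates[np.argsort(info[candidates])[::-1][:quota]]
    form.extend(best.tolist())

print(f"{len(form)} items selected; total information at theta=0: "
      f"{info[form].sum():.2f}")
```

A production ATA system would instead solve a constrained optimization over the whole pool (for example, meeting a test-information target subject to blueprint constraints), which is what allows the optimal test length itself to be determined rather than fixed in advance.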
Closed formula of test length required for adaptive testing with medium probability of solution
by Széll, Krisztián; Takács, Szabolcs; T. Kárász, Judit
in Academic Achievement; Accuracy; Adaptive Testing
2023
Purpose
Based on the general formula, which depends on the length and difficulty of the test, the number of respondents and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) that determines the accuracy of the parameters for each item and, in the case of calibrated items, the required test length for a given number of respondents.
Design/methodology/approach
Empirical results have been obtained for computerized and multistage adaptive implementations. Simulation studies and classroom/experimental results show that adaptive tests can measure test subjects' ability to the same quality with half the test length of linear versions. Due to the complexity of the problem, the authors discuss a closed mathematical formula for the relationship between the length of the tests, the difficulty of the items, the number of respondents and the levels of ability.
Findings
The authors present a closed formula that provides a lower bound for the minimum test length of adaptive tests. They also present example calculations using the formula, based on the assessment frameworks of several student assessments, to show the similarity between the theoretical calculations and the empirical results.
Originality/value
With this formula, we can form a connection between theoretical and simulation results.
Journal Article
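The closed formula itself is not reproduced in the abstract, so it is not restated here. As background for why items with a medium probability of solution are efficient in adaptive testing, note that under a Rasch-type model the Fisher information of a dichotomous item is P(1 - P), which is maximized when the probability of solution is 1/2:

```latex
% Background only; this is not the paper's closed formula.
I(\theta) = P(\theta)\,\bigl(1 - P(\theta)\bigr),
\qquad
\max_{P \in (0,1)} P\,(1 - P) = \tfrac{1}{4}
\quad \text{at } P = \tfrac{1}{2}.
```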
Reliability Theory for Measurements with Variable Test Length, Illustrated with ERN and Pe Collected in the Flanker Task
by Groenen, Patrick J. F.; de Groot, Kristel; Sijtsma, Klaas
in Assessment; Bayes Theorem; Bayesian analysis
2024
In psychophysiology, an interesting question is how to estimate the reliability of event-related potentials collected by means of the Eriksen Flanker Task or similar tests. A special problem presents itself if the data represent neurological reactions that are associated with some responses (in the case of the Flanker Task, responding incorrectly on a trial) but not others (such as providing a correct response), inherently resulting in unequal numbers of observations per subject. The general trend in reliability research here is to use generalizability theory and Bayesian estimation. We show that a new approach based on classical test theory and frequentist estimation can do the job as well, and in a simpler way, and even provides additional insight into matters that were unsolved in the generalizability approach. One of our contributions is the definition of a single, overall reliability coefficient for an entire group of subjects with unequal numbers of observations. The two methods have slightly different objectives. We argue in favor of the classical approach, but without rejecting the generalizability approach.
Journal Article
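As classical test theory background for the abstract above (this is the textbook decomposition, not the authors' new group-level coefficient): reliability is the share of observed-score variance due to true scores, and averaging k parallel observations per subject shrinks the error term, which is how the number of observations enters the picture.

```latex
% Classical test theory: observed score X = T + E, with Cov(T, E) = 0.
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
\qquad
\rho_{\bar{X}_k} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2 / k}.
```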
A Bayesian-Inspired Item Response Theory–Based Framework to Produce Very Short Versions of MacArthur–Bates Communicative Development Inventories
by Chai, Jun Ho; Lo, Chang Huan; Mayor, Julien
in Adaptive Testing; American English; Bayesian analysis
2020
Purpose: This study introduces a framework to produce very short versions of the MacArthur-Bates Communicative Development Inventories (CDIs) by combining the Bayesian-inspired approach introduced by Mayor and Mani (2019) with item response theory-based computerized adaptive testing that adapts to the ability of each child, in line with Makransky et al. (2016). Method: We evaluated the performance of our approach, which dynamically selects maximally informative words from the CDI and combines parental responses with prior vocabulary data, by conducting real-data simulations using four CDI versions with varying sample sizes on Wordbank, the online repository of digitalized CDIs: American English (a very large data set), Danish (a large data set), Beijing Mandarin (a medium-sized data set), and Italian (a small data set). Results: Real-data simulations revealed that correlations exceeding 0.95 with full CDI administrations were reached with as few as 15 test items, with high levels of reliability, even when languages (e.g., Italian) possessed few digitalized administrations on Wordbank. Conclusions: The current approach establishes a generic framework that produces very short (fewer than 20 items) adaptive early vocabulary assessments, hence considerably reducing their administration time. This approach appears to be robust even when CDIs have smaller samples in online repositories, for example, around 50 samples per month-age.
Journal Article
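A minimal sketch of the kind of adaptive loop the abstract describes: a standard-normal prior over ability is updated on a grid after each parental response, and the next word is the one with the highest expected Fisher information under an assumed 2PL model. The word parameters, prior, stopping rule, and simulated responses are all illustrative assumptions, not the authors' calibrated CDI items.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 2PL parameters for a 500-word CDI item bank.
n_words = 500
a = rng.lognormal(0.0, 0.3, n_words)   # discriminations
b = rng.normal(0.0, 1.0, n_words)      # difficulties

grid = np.linspace(-4, 4, 161)         # ability grid for the posterior
posterior = np.exp(-0.5 * grid**2)     # standard-normal prior (unnormalized)
asked: list[int] = []
true_theta = 0.8                       # simulated child ability

def p_know(a_j, b_j, theta):
    """2PL probability that the child knows word j at ability theta."""
    return 1.0 / (1.0 + np.exp(-a_j * (theta - b_j)))

for _ in range(15):                    # 15 items, as in the abstract
    p = p_know(a[:, None], b[:, None], grid)            # (n_words, n_grid)
    info = a[:, None] ** 2 * p * (1.0 - p)              # Fisher information
    expected_info = info @ (posterior / posterior.sum())
    expected_info[asked] = -np.inf                      # never re-ask a word
    j = int(np.argmax(expected_info))
    asked.append(j)
    resp = rng.random() < p_know(a[j], b[j], true_theta)  # simulated answer
    likelihood = p_know(a[j], b[j], grid) if resp else 1.0 - p_know(a[j], b[j], grid)
    posterior = posterior * likelihood                  # Bayesian grid update

theta_hat = float((grid * posterior).sum() / posterior.sum())
print(f"Administered {len(asked)} words; estimated ability = {theta_hat:.2f}")
```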
Variational Estimation for Multidimensional Generalized Partial Credit Model
2024
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.
Journal Article
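For reference, a common parameterization of the multidimensional generalized partial credit model behind the abstract above gives the probability of a score in category k of item j as shown below; the exact specification used in the paper may differ in notation or constraints.

```latex
% MGPCM category probabilities: a_j is the item's loading vector, theta_i the
% examinee's latent trait vector, and b_{jv} the step parameters (empty sum = 0).
P(X_{ij} = k \mid \boldsymbol{\theta}_i)
  = \frac{\exp \sum_{v=1}^{k} \bigl( \mathbf{a}_j^{\top} \boldsymbol{\theta}_i - b_{jv} \bigr)}
         {\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} \bigl( \mathbf{a}_j^{\top} \boldsymbol{\theta}_i - b_{jv} \bigr)},
  \qquad k = 0, 1, \ldots, m_j .
```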