5,107 result(s) for "Test Length"
Making sense of Cronbach's alpha
[...] a more rigorous view of alpha is that it cannot simply be interpreted as an index of the internal consistency of a test.[5,15,17] Factor Analysis can be used to identify the dimensions of a test.[18] Other reliable techniques have been used, and we encourage the reader to consult the paper "Applied Dimensionality and Test Structure Assessment with the STARTM Mathematics Test" and to compare methods for assessing the dimensionality and underlying structure of a test.[19] Alpha, therefore, does not simply measure the unidimensionality of a set of items, but can be used to confirm whether or not a sample of items is actually unidimensional.[5] On the other hand, if a test has more than one concept or construct, it may not make sense to report alpha for the test as a whole, as the larger number of questions will inevitably inflate the value of alpha. In principle, therefore, alpha should be calculated for each of the concepts rather than for the entire test or scale. More importantly, alpha is grounded in the 'tau-equivalent model', which assumes that each test item measures the same latent trait on the same scale. [...] if multiple factors/traits underlie the items on a scale, as revealed by Factor Analysis, this assumption is violated and alpha underestimates the reliability of the test.[17] If the number of test items is too small, it will also violate the assumption of tau-equivalence and will underestimate reliability.[20] When test items meet the assumptions of the tau-equivalent model, alpha approaches a better estimate of reliability.
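The alpha the excerpt discusses is the standard Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / total-score variance). A minimal Python sketch, using simulated two-construct data purely for illustration, shows how alpha can be computed per construct, as the excerpt recommends, rather than for the pooled test:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Standard Cronbach's alpha; scores is an (n_persons, k_items) array."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated data: two constructs, five items each (illustrative only).
rng = np.random.default_rng(0)
construct_a = rng.normal(size=(200, 1)) + rng.normal(scale=0.8, size=(200, 5))
construct_b = rng.normal(size=(200, 1)) + rng.normal(scale=0.8, size=(200, 5))

# Per the excerpt's advice, report alpha per construct rather than for all items pooled.
print("alpha, construct A:", round(cronbach_alpha(construct_a), 3))
print("alpha, construct B:", round(cronbach_alpha(construct_b), 3))
print("alpha, all 10 items pooled:", round(cronbach_alpha(np.hstack([construct_a, construct_b])), 3))
```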
Evaluation of the Accuracy of Different Apex Locaters with 45° Apical Root Resorption
Aim: This study aimed to determine the working length (WL) of permanent teeth with simulated 45° resorption of the apical part of the root using four electronic apex locators (EALs): NSK, Woodpecker III, Woodpecker V, and Eighteeth. Methods: Twenty maxillary anterior single-rooted teeth were extracted. Following tooth preparation to provide access to the root canal and to replicate 45° apical root resorption, each tooth underwent microscope-assisted working length determination. Each tooth was then measured with the four apex locators to obtain electronic working lengths. Results: No discernible difference was found among the four apex locator devices. Conclusion: NSK, Woodpecker III, Woodpecker V, and Eighteeth provide virtually the same measurements for 45° apical root resorption in single-rooted teeth.
Proof of Reliability Convergence to 1 at Rate of Spearman–Brown Formula for Random Test Forms and Irrespective of Item Pool Dimensionality
It is shown that the psychometric test reliability, based on any true-score model with randomly sampled items and uncorrelated errors, converges to 1 as the test length goes to infinity, with probability 1, assuming some general regularity conditions. The asymptotic rate of convergence is given by the Spearman–Brown formula, and for this it is not necessary that the items be parallel, or latent unidimensional, or even finite dimensional. Simulations with the 2-parameter logistic item response theory model reveal that the reliability of short multidimensional tests can be positively biased, meaning that applying the Spearman–Brown formula in these cases would lead to overprediction of the reliability that results from lengthening a test. However, test constructors of short tests generally aim for short tests that measure just one attribute, so the bias problem may have little practical relevance. For short unidimensional tests under the 2-parameter logistic model, reliability is almost unbiased, meaning that application of the Spearman–Brown formula in these cases of greater practical utility leads to predictions that are approximately unbiased.
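The Spearman–Brown prophecy formula referenced in the abstract predicts the reliability of a test lengthened by a factor k from its current reliability rho as k*rho / (1 + (k - 1)*rho). A minimal numeric sketch; the starting reliability of 0.70 for a 10-item test is an arbitrary illustration, not a value from the paper:

```python
def spearman_brown(rho: float, k: float) -> float:
    """Classical Spearman-Brown prophecy: reliability after lengthening a test by factor k."""
    return k * rho / (1.0 + (k - 1.0) * rho)

# Assume a 10-item test with reliability 0.70 and predict reliability as it is lengthened.
rho_10 = 0.70
for n_items in (10, 20, 40, 80, 160):
    k = n_items / 10
    print(n_items, "items ->", round(spearman_brown(rho_10, k), 4))
# The predicted reliability approaches 1 as the test grows, matching the
# convergence-to-1 result described in the abstract.
```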
The effect of item pool and selection algorithms on computerized classification testing (CCT) performance
The purpose of this research was to evaluate the effect of item pool and selection algorithms on computerized classification testing (CCT) performance in terms of several classification evaluation metrics. For this purpose, 1000 examinees' response patterns were generated using the R package, and eight item pools with 150, 300, 450, and 600 items having different distributions were formed. A total of 100 iterations were performed for each research condition. The results indicated that average classification accuracy (ACA) was partially lower, but average test length (ATL) was higher, in item pools with a broad distribution. The observed differences were more apparent in the item pool with 150 items, and the item selection methods gave similar results in terms of ACA and ATL. The Sympson-Hetter method showed advantages in terms of test efficiency, while the item eligibility method offered an improvement in item exposure control. The modified multinomial model, on the other hand, was more effective for content balancing.
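As an illustration of the kind of procedure being compared, here is a minimal sketch of a sequential probability ratio test (SPRT) classification under a 2PL model with maximum-information item selection. The item pool, cut score, error rates, indifference region, and stopping bounds are assumptions made for the example, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative 150-item pool and an SPRT classification around a cut score of 0.0.
a = rng.lognormal(mean=0.0, sigma=0.3, size=150)
b = rng.normal(loc=0.0, scale=1.0, size=150)
theta_true = 0.8
cut, delta, alpha, beta = 0.0, 0.3, 0.05, 0.05
lower, upper = np.log(beta / (1 - alpha)), np.log((1 - beta) / alpha)

log_lr, administered = 0.0, []
for _ in range(150):
    # Select the unused item that is most informative at the cut score.
    info = a**2 * p_2pl(cut, a, b) * (1 - p_2pl(cut, a, b))
    info[administered] = -np.inf
    j = int(np.argmax(info))
    administered.append(j)
    u = rng.random() < p_2pl(theta_true, a[j], b[j])   # simulated response
    p_hi = p_2pl(cut + delta, a[j], b[j])
    p_lo = p_2pl(cut - delta, a[j], b[j])
    log_lr += np.log(p_hi / p_lo) if u else np.log((1 - p_hi) / (1 - p_lo))
    if log_lr <= lower or log_lr >= upper:              # SPRT stopping rule
        break

decision = "pass" if log_lr >= upper else "fail" if log_lr <= lower else "undecided"
print(f"decision={decision}, test length={len(administered)}")
```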
Optimizing a national examination for medical undergraduates via modern automated test assembly approaches
Background: Automated test assembly (ATA) is a modern methodology that employs data science optimization on computer platforms to automatically create test forms, thereby significantly improving the efficiency and accuracy of test assembly procedures. In medical education, large-scale high-stakes assessments often necessitate lengthy tests, leading to elevated costs in several dimensions (such as examinee fatigue and the expense of item development). This study aims to improve the design of medical education assessments by leveraging modern ATA approaches. Methods: To achieve this objective, a four-step process employing psychometric methodologies was used to calibrate and analyze the item pool of the Standardized Competence Test for Clinical Medicine Undergraduates (SCTCMU), a nationwide summative test comprising 300 multiple-choice questions (MCQs) in China. Two modern ATA approaches were then employed to determine the optimal item combination, accounting for both the statistical and the content requirements specified in the test blueprint. The quality of the test forms assembled with these approaches was then evaluated. Results: The psychometric evaluation of the SCTCMU revealed good item properties. Evaluation of the assembled test forms showed that the optimal test length could be determined within the predefined measurement precision. Specifically, the study demonstrates that modern ATA approaches can substantially reduce the length of the assembled test forms while maintaining the statistical and content standards specified in the test blueprint. Conclusions: This study used modern ATA approaches to construct test forms automatically, significantly enhancing the efficiency and precision of test assembly. Modern ATA approaches offer medical educators a valuable tool to improve the efficiency and cost-effectiveness of medical education assessment.
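Automated test assembly is typically formulated as a mixed-integer optimization over binary item-selection variables. The following sketch, using the open-source PuLP solver and a made-up item pool with hypothetical content areas, illustrates the general idea; it is not the study's implementation, solver, or blueprint constraints:

```python
import numpy as np
import pulp  # illustrative open-source MIP solver; the study's ATA software is not specified here

rng = np.random.default_rng(2)
n_items = 100
a = rng.lognormal(0.0, 0.3, n_items)       # hypothetical 2PL discriminations
b = rng.normal(0.0, 1.0, n_items)          # hypothetical 2PL difficulties
content = rng.integers(0, 3, n_items)      # three hypothetical content areas

theta0 = 0.0                                # target ability at which to maximize information
p = 1.0 / (1.0 + np.exp(-a * (theta0 - b)))
info = a**2 * p * (1 - p)                   # item information at theta0

prob = pulp.LpProblem("test_assembly", pulp.LpMaximize)
x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n_items)]
prob += pulp.lpSum(float(info[i]) * x[i] for i in range(n_items))   # maximize test information
prob += pulp.lpSum(x) == 30                                          # fixed test length
for c in range(3):                                                   # at least 8 items per content area
    prob += pulp.lpSum(x[i] for i in range(n_items) if content[i] == c) >= 8

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [i for i in range(n_items) if x[i].value() == 1]
print("selected items:", selected)
print("test information at theta=0:", round(float(sum(info[i] for i in selected)), 2))
```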
Closed formula of test length required for adaptive testing with medium probability of solution
Purpose: Based on the general formula, which depends on the length and difficulty of the test, the number of respondents, and the number of ability levels, this study aims to provide a closed formula for adaptive tests of medium difficulty (probability of solution p = 1/2) that determines the accuracy of the parameters for each item and, in the case of calibrated items, the required test length for a given number of respondents. Design/methodology/approach: Empirical results have been obtained from computerized and multistage adaptive implementations. Simulation studies and classroom/experimental results show that adaptive tests can measure test takers' ability with the same quality at half the test length of linear versions. Because of the complexity of the problem, the authors discuss a closed mathematical formula relating the length of the tests, the difficulty of the items, the number of respondents, and the ability levels. Findings: The authors present a closed formula that provides a lower bound on the minimum test length for adaptive tests. They also present example calculations using the formula, based on the assessment frameworks of several student assessments, to show the agreement between the theoretical calculations and the empirical results. Originality/value: With this formula, a connection can be drawn between theoretical and simulation results.
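The paper's closed formula is not reproduced in the abstract. As a rough illustration of the underlying idea, a standard Rasch-model calculation bounds the number of medium-difficulty items (p = 1/2) needed to reach a target standard error, since each such item contributes Fisher information p(1 - p) = 0.25; this generic calculation is an assumption-labeled stand-in, not the authors' formula:

```python
import math

# Under the Rasch model, an item answered with probability p contributes Fisher
# information p*(1 - p); at medium difficulty (p = 1/2) this is 0.25 per item.
# A standard lower bound on test length for a target standard error s is therefore
#   n >= 1 / (0.25 * s**2) = 4 / s**2.
# This is a generic IRT calculation offered as an illustration, not the paper's formula.
for target_se in (0.5, 0.4, 0.3, 0.2):
    n_items = math.ceil(4 / target_se**2)
    print(f"target SE(theta) = {target_se}: at least {n_items} medium-difficulty items")
```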
Reliability Theory for Measurements with Variable Test Length, Illustrated with ERN and Pe Collected in the Flanker Task
In psychophysiology, an interesting question is how to estimate the reliability of event-related potentials collected by means of the Eriksen Flanker Task or similar tests. A special problem presents itself when the data represent neurological reactions that are associated with some responses (in the case of the Flanker Task, responding incorrectly on a trial) but not others (such as providing a correct response), inherently resulting in unequal numbers of observations per subject. The general trend in reliability research here is to use generalizability theory and Bayesian estimation. We show that a new approach based on classical test theory and frequentist estimation can do the job as well and in a simpler way, and even provides additional insight into matters that were unsolved in the generalizability approach. One of our contributions is the definition of a single, overall reliability coefficient for an entire group of subjects with unequal numbers of observations. The two methods have slightly different objectives. We argue in favor of the classical approach, but without rejecting the generalizability approach.
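The paper's specific overall coefficient is not given in the abstract. As a generic illustration of a classical-test-theory treatment of unequal trial counts, the sketch below estimates between- and within-subject variance components and derives per-subject and pooled reliabilities; the variance decomposition, weighting, and simulated data are assumptions for the example, not the authors' definition:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated ERN-like data: subjects differ in true amplitude, and each subject
# contributes a different number of error trials (unequal observations).
n_subjects = 40
true_scores = rng.normal(0.0, 2.0, n_subjects)            # between-subject SD = 2
n_trials = rng.integers(5, 40, n_subjects)                 # unequal trial counts
data = [t + rng.normal(0.0, 4.0, n) for t, n in zip(true_scores, n_trials)]  # within-subject SD = 4

# Method-of-moments style estimates of the variance components from observed data.
subject_means = np.array([d.mean() for d in data])
within_var = np.mean([d.var(ddof=1) for d in data])        # pooled error variance
between_var = max(subject_means.var(ddof=1) - within_var * np.mean(1.0 / n_trials), 0.0)

# Reliability of each subject's mean, then one overall (trial-count-weighted) summary.
rel_i = between_var / (between_var + within_var / n_trials)
overall = np.average(rel_i, weights=n_trials)
print("per-subject reliability range:", round(rel_i.min(), 2), "-", round(rel_i.max(), 2))
print("overall (weighted) reliability:", round(overall, 2))
```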
A Bayesian-Inspired Item Response Theory–Based Framework to Produce Very Short Versions of MacArthur–Bates Communicative Development Inventories
Purpose: This study introduces a framework to produce very short versions of the MacArthur-Bates Communicative Development Inventories (CDIs) by combining the Bayesian-inspired approach introduced by Mayor and Mani (2019) with item response theory-based computerized adaptive testing that adapts to the ability of each child, in line with Makransky et al. (2016). Method: We evaluated the performance of our approach, which dynamically selects maximally informative words from the CDI and combines parental responses with prior vocabulary data, by conducting real-data simulations using four CDI versions with varying sample sizes on Wordbank, the online repository of digitalized CDIs: American English (a very large data set), Danish (a large data set), Beijing Mandarin (a medium-sized data set), and Italian (a small data set). Results: Real-data simulations revealed that correlations exceeding 0.95 with full CDI administrations were reached with as few as 15 test items, with high levels of reliability, even when languages (e.g., Italian) possessed few digitalized administrations on Wordbank. Conclusions: The current approach establishes a generic framework that produces very short (fewer than 20 items) adaptive early vocabulary assessments, hence considerably reducing their administration time. This approach appears to be robust even when CDIs have smaller samples in online repositories, for example, around 50 samples per month of age.
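A minimal sketch of the kind of adaptive selection described, using a 2PL model, a normal prior over ability, and maximum-information word selection with a grid posterior; the word parameters, prior, and 15-item stopping rule are illustrative assumptions rather than the article's calibrated CDI items:

```python
import numpy as np

rng = np.random.default_rng(4)

def p_2pl(theta, a, b):
    """2PL probability that the child knows a word."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative "word bank": 400 words with made-up 2PL parameters (not calibrated CDI items).
a = rng.lognormal(0.0, 0.3, 400)
b = rng.normal(0.0, 1.2, 400)
theta_child = 0.5                           # simulated child ability
theta_grid = np.linspace(-4, 4, 161)
posterior = np.exp(-0.5 * theta_grid**2)    # N(0, 1) prior over ability
posterior /= posterior.sum()

asked = []
for _ in range(15):                         # 15-item short form, as in the abstract's result
    theta_hat = float(np.sum(theta_grid * posterior))
    info = a**2 * p_2pl(theta_hat, a, b) * (1 - p_2pl(theta_hat, a, b))
    info[asked] = -np.inf
    j = int(np.argmax(info))                # ask about the most informative remaining word
    asked.append(j)
    knows_word = rng.random() < p_2pl(theta_child, a[j], b[j])   # simulated parental response
    like = p_2pl(theta_grid, a[j], b[j]) if knows_word else 1 - p_2pl(theta_grid, a[j], b[j])
    posterior *= like                        # Bayesian update of the ability posterior
    posterior /= posterior.sum()

print("estimated ability after 15 items:", round(float(np.sum(theta_grid * posterior)), 2))
```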
Variational Estimation for Multidimensional Generalized Partial Credit Model
Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.
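The abstract does not spell out the estimation algorithm, but the response model it targets is the generalized partial credit model. A small sketch of GPCM category probabilities in the unidimensional case, with made-up item parameters, shows the model whose likelihood the variational algorithm approximates:

```python
import numpy as np

def gpcm_probs(theta, a, thresholds):
    """Generalized partial credit model category probabilities (unidimensional case).

    thresholds[k] is the step parameter for moving from category k to k+1;
    P(X = k) is proportional to exp(sum_{v<=k} a * (theta - threshold_v)),
    with the empty sum for category 0 equal to zero.
    """
    steps = np.concatenate(([0.0], a * (theta - np.asarray(thresholds))))
    logits = np.cumsum(steps)
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    return probs / probs.sum()

# A 4-category item with made-up parameters, evaluated at a few ability levels.
for theta in (-1.0, 0.0, 1.0):
    print(theta, np.round(gpcm_probs(theta, a=1.2, thresholds=[-0.5, 0.2, 0.9]), 3))
```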