38,757 results for "Test Theory"
TechCheck: Development and Validation of an Unplugged Assessment of Computational Thinking in Early Childhood Education
There is a need for developmentally appropriate Computational Thinking (CT) assessments that can be implemented in early childhood classrooms. We developed a new instrument called TechCheck for assessing CT skills in young children that does not require prior knowledge of computer programming. TechCheck is based on developmentally appropriate CT concepts and uses a multiple-choice “unplugged” format that allows it to be administered to whole classes, or in online settings, in under 15 min. This design allows assessment of a broad range of abilities and avoids conflating coding with CT skills. We validated the instrument in a cohort of 5–9-year-old students (N = 768) participating in a research study involving a robotics coding curriculum. TechCheck showed good reliability and validity according to measures of classical test theory and item response theory. Discrimination between skill levels was adequate. Difficulty was suitable for first graders and low for second graders. The instrument showed differences in performance related to race/ethnicity. TechCheck scores correlated moderately with a previously validated CT assessment tool (TACTIC-KIBO). Overall, TechCheck has good psychometric properties, is easy to administer and score, and discriminates between children of different CT abilities. Implications, limitations, and directions for future work are discussed.
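For readers unfamiliar with the classical-test-theory item statistics invoked above: item difficulty is typically the proportion of examinees answering correctly, and discrimination the corrected item-total correlation. A minimal sketch with made-up data (the function, sample sizes, and simulated responses are illustrative only, not the TechCheck materials):

```python
import numpy as np

def item_analysis(scores: np.ndarray):
    """Classical item statistics for a 0/1 item-score matrix.

    scores: (n_persons, n_items) array of dichotomous responses.
    Returns per-item difficulty (proportion correct) and
    discrimination (corrected item-total correlation).
    """
    n_items = scores.shape[1]
    difficulty = scores.mean(axis=0)  # proportion correct per item
    discrimination = np.empty(n_items)
    total = scores.sum(axis=1)
    for i in range(n_items):
        rest = total - scores[:, i]  # total score excluding item i
        discrimination[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return difficulty, discrimination

# Illustrative fake data: 768 examinees, 15 items of varying difficulty.
rng = np.random.default_rng(0)
ability = rng.normal(size=(768, 1))
item_loc = np.linspace(-1.5, 1.5, 15)
scores = (ability - item_loc + rng.normal(size=(768, 15)) > 0).astype(int)
diff, disc = item_analysis(scores)
print(diff.round(2), disc.round(2))
```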
Recognize the Value of the Sum Score, Psychometrics’ Greatest Accomplishment
The sum score on a psychological test is, and should continue to be, a central tool in psychometric practice. This position runs counter to several psychometricians’ belief that the sum score represents a pre-scientific conception that must be abandoned in favor of latent variables. First, we reiterate that the sum score stochastically orders the latent variable in a wide variety of much-used item response models. In fact, item response theory provides a mathematically based justification for the ordinal use of the sum score. Second, because discussions about the sum score often involve its reliability and estimation methods as well, we show that, based on very general assumptions, classical test theory provides a family of lower bounds, several of which are close to the true reliability under reasonable conditions. Finally, we argue that sum scores ultimately derive their value from the degree to which they enable predicting practically relevant events and behaviors. None of our discussion is meant to discredit modern measurement models; they have merits of their own that classical test theory cannot attain. But the latter provides impressive contributions to psychometrics based on very few assumptions, contributions that seem to have become obscured in the past few decades; their generality and practical usefulness add to the accomplishments of more recent approaches.
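To make the "family of lower bounds" concrete: under the usual CTT assumptions, estimators such as Cronbach's α and Guttman's λ2 never exceed the true reliability of the sum score, and λ2 ≥ α always holds. A minimal sketch of the two standard formulas (our own implementation, not the authors' code):

```python
import numpy as np

def cronbach_alpha(x: np.ndarray) -> float:
    """Coefficient alpha: (k/(k-1)) * (1 - sum of item variances / variance of sum)."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def guttman_lambda2(x: np.ndarray) -> float:
    """Guttman's lambda-2, a uniformly higher lower bound than alpha."""
    k = x.shape[1]
    c = np.cov(x, rowvar=False)
    off = c - np.diag(np.diag(c))  # off-diagonal (inter-item) covariances
    total_var = c.sum()            # variance of the sum score
    term = np.sqrt(k / (k - 1) * (off ** 2).sum())
    return (off.sum() + term) / total_var

# Example: essentially tau-equivalent items; true reliability is about 0.91.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
x = t + rng.normal(size=(500, 10))
print(cronbach_alpha(x), guttman_lambda2(x))
```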
Reliability, Population Classification and Weighting in Multidimensional Poverty Measurement
In poverty measurement, differential weighting aims to take into account the unequal importance of the diverse dimensions and aspects of poverty and to add valuable information that improves the classification of the poor and the not-poor. This practice, however, is in tension with both classical test theory and modern measurement theories, which hold that high reliability is a necessary condition for consistent population classification, whereas differential weighting is not. The literature needs a clear numerical illustration of the relationship between high/low reliability and good/poor population classification to dissolve this tension and to assist applied researchers in assessing multidimensional poverty indexes using different reliability statistics. This paper uses a Monte Carlo study based on factor mixture models to draw up a series of uni- and multidimensional poverty measures with different reliabilities and predefined groups. The article shows that low reliability results in a high proportion of the poor group being erroneously classified as part of the not-poor group. Reliability inspections should therefore be a systematic practice in poverty measurement. The article provides guidelines for interpreting the effects of unreliability on population classification and suggests that the classification error of current unreliable multidimensional indexes is above 10%.
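The paper's central point, that unreliability inflates misclassification of the poor, can be illustrated with a far simpler simulation than its factor-mixture design: draw a latent poverty score, degrade it to a chosen reliability, and classify by a cutoff. Everything below (the normal latent score, the -1.0 cutoff) is our own toy setup, not the paper's:

```python
import numpy as np

def misclassification_rate(reliability: float, cutoff: float = -1.0,
                           n: int = 100_000, seed: int = 1) -> float:
    """Share of the truly poor classified as not poor when the observed
    score has the given reliability (reliability = var(T) / var(X))."""
    rng = np.random.default_rng(seed)
    true = rng.normal(size=n)  # latent poverty score T
    error_sd = np.sqrt((1 - reliability) / reliability)
    # Rescale so the observed score is standard normal; the same cutoff applies.
    observed = (true + rng.normal(scale=error_sd, size=n)) * np.sqrt(reliability)
    poor = true < cutoff                   # truly poor
    missed = poor & (observed >= cutoff)   # poor but classified not-poor
    return missed.mean() / poor.mean()

for rel in (0.5, 0.7, 0.9):
    print(rel, round(misclassification_rate(rel), 3))
```

Lower reliability visibly pushes the conditional misclassification rate up, which is the pattern the article reports for its unreliable multidimensional indexes.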
Part II: On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha: Discussing Lower Bounds and Correlated Errors
Prior to discussing and challenging two criticisms of coefficient α, the well-known lower bound to test-score reliability, we discuss classical test theory and the theory of coefficient α. The first criticism expressed in the psychometrics literature is that coefficient α is only useful when the model of essential τ-equivalence is consistent with the item-score data. Because this model is highly restrictive, coefficient α is smaller than the test-score reliability and, the critics argue, one should not use it. We argue that lower bounds are useful when they assess product quality features, such as a test score’s reliability. The second criticism is that coefficient α incorrectly ignores correlated errors. If correlated errors were to enter the computation of coefficient α, its theoretical values could exceed the test-score reliability. Because quality measures that are systematically too high are undesirable, critics dismiss coefficient α. We argue that introducing correlated errors is inconsistent with the derivation of the lower bound theorem and that the properties of coefficient α remain intact when the data contain correlated errors.
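For reference, the quantity under discussion, in standard CTT notation (X is the sum of k item scores X_i, and ρ_XX′ its reliability):

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{X_i}}{\sigma^2_X}\right),
\qquad \alpha \le \rho_{XX'},
```

with equality when the items are essentially τ-equivalent and the errors are uncorrelated, which is the restrictive condition the first criticism turns on.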
Classical Test Theory
Classical test theory (CTT) comprises a set of concepts and methods that provide a basis for many of the measurement tools currently used in health research. The assumptions and concepts underlying CTT are discussed. These include item and scale characteristics that derive from CTT as well as types of reliability and validity. Procedures commonly used in the development of scales under CTT are summarized, including factor analysis and the creation of scale scores. The advantages and disadvantages of CTT, its use across populations, and its continued use in the face of more recent measurement models are also discussed.
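The core assumptions the chapter builds on fit in one line of standard notation: each observed score X decomposes into a true score T and an error E, and reliability is the true-score share of the observed variance:

```latex
X = T + E, \qquad \mathbb{E}(E) = 0, \qquad \operatorname{Cov}(T, E) = 0,
\qquad \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}.
```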
Are Sum Scores a Great Accomplishment of Psychometrics or Intuitive Test Theory?
Sijtsma, Ellis, and Borsboom (Psychometrika, 89:84-117, 2024, https://doi.org/10.1007/s11336-024-09964-7) provide a thoughtful treatment in Psychometrika of the value and properties of sum scores and classical test theory, at a depth with which few practicing psychometricians are familiar. In this note, I offer comments on their article from the perspective of evidentiary reasoning.
Using Rasch Analysis to Inform Rating Scale Development
The use of surveys, questionnaires, and rating scales to measure important outcomes in higher education is pervasive, but reliability and validity information is often based on problematic Classical Test Theory approaches. Rasch Analysis, based on Item Response Theory, provides a better alternative for examining the psychometric quality of rating scales and informing scale improvements. This paper outlines a six-step process for using Rasch Analysis to review the psychometric properties of a rating scale. The Partial Credit Model and the Andrich Rating Scale Model are described in terms of the psychometric information (i.e., reliability, validity, and item difficulty) and diagnostic indices they generate. Further, this approach is illustrated with authentic data from a university-wide student evaluation of teaching.
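For readers who do not know the two models by name: the dichotomous Rasch model, and the Partial Credit Model that extends it to ordered categories, take the standard forms below (θ_n is person ability, b_i item difficulty, δ_ij the j-th threshold of item i, with the empty sum for x = 0 defined as zero):

```latex
P(X_{ni} = 1) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
P(X_{ni} = x) = \frac{\exp\sum_{j=0}^{x}(\theta_n - \delta_{ij})}
                     {\sum_{h=0}^{m_i}\exp\sum_{j=0}^{h}(\theta_n - \delta_{ij})},
\quad x = 0, \dots, m_i.
```

The Andrich Rating Scale Model is the special case δ_ij = b_i + τ_j, i.e., one set of thresholds τ_j shared by all items.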
Gender fairness within the Force Concept Inventory
Research on the test structure of the Force Concept Inventory (FCI) has largely ignored gender, and research on FCI gender effects (often reported as “gender gaps”) has seldom interrogated the structure of the test. These rarely crossed streams of research leave open the possibility that the FCI may not be structurally valid across genders, particularly since many reported results come from calculus-based courses where 75% or more of the students are men. We examine the FCI considering both psychometrics and gender disaggregation (while acknowledging this as a binary simplification), and find several problematic questions whose removal decreases the apparent gender gap. We analyze three samples (total N_pre = 5391, N_post = 5769) looking for gender asymmetries using classical test theory, item response theory, and differential item functioning. The combination of these methods highlights six items that appear substantially unfair to women and two items biased in favor of women. No single physical concept or prior experience unifies these questions, but they are broadly consistent with problematic items identified in previous research. Removing all significantly gender-unfair items halves the gender gap in the main sample in this study. We recommend that instructors using the FCI report the reduced-instrument score as well as the 30-item score, and that credit or other benefits to students not be assigned using the biased items.
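Differential item functioning of the kind flagged here is commonly screened with the Mantel-Haenszel procedure: examinees are stratified by total score, and the item's group-by-correctness odds ratio is pooled across strata. A minimal sketch of the standard estimator (our own implementation, not the authors' analysis code):

```python
import numpy as np

def mantel_haenszel_or(item: np.ndarray, total: np.ndarray,
                       group: np.ndarray) -> float:
    """Pooled Mantel-Haenszel odds ratio for one dichotomous item.

    item:  0/1 responses to the studied item
    total: matching variable (e.g., total test score)
    group: 0 = reference group, 1 = focal group
    OR > 1 means the item favors the reference group at equal total score.
    """
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum(m & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (item == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den
```

An odds ratio persistently far from 1 across forms is the kind of evidence that flags an item as unfair to one group at matched ability.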
Neither Cronbach’s Alpha nor McDonald’s Omega: A Commentary on Sijtsma and Pfadt
Sijtsma and Pfadt (2021) published a thought-provoking article on coefficient alpha. I make the following arguments against their work. 1) Kuder and Richardson (1937) deserve more credit for coefficient alpha than Cronbach (1951). 2) We should distinguish between the definition of reliability and its meaning. 3) We should be wary of overfitting in the use of FA reliability. 4) Our primary concern is to obtain accurate reliability estimates rather than conservative estimates. 5) Several reliability estimators, such as λ2, μ2, congeneric reliability, and the Gilmer-Feldt coefficient, are more accurate than coefficient alpha. 6) The name omega should not be used to refer to a specific reliability estimator.
The Attack of the Psychometricians
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking that is common throughout psychology, the dominance of classical test theory, and the use of “construct validity” as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, the inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.