Catalogue Search | MBRL
Explore the vast range of titles available.
215 result(s) for "Validity argument"
Application of validity theory and methodology to patient-reported outcome measures (PROMs): building an argument for validity
by Osborne, Richard H., Elsworth, Gerald R., Hawkins, Melanie
in Clinical outcomes, Health education, Health Literacy
2018
Background
Data from subjective patient-reported outcome measures (PROMs) are now being used in the health sector to make or support decisions about individuals, groups and populations. Contemporary validity theorists define validity not as a statistical property of the test but as the extent to which empirical evidence supports the interpretation of test scores for an intended use. However, validity testing theory and methodology are rarely evident in the PROM validation literature. Application of this theory and methodology would provide structure for comprehensive validation planning to support improved PROM development and sound arguments for the validity of PROM score interpretation and use in each new context.
Objective
This paper proposes the application of contemporary validity theory and methodology to PROM validity testing.
Illustrative example
The validity testing principles will be applied to a hypothetical case study with a focus on the interpretation and use of scores from a translated PROM that measures health literacy (the Health Literacy Questionnaire or HLQ).
Discussion
Although robust psychometric properties of a PROM are a pre-condition to its use, a PROM's validity lies in the sound argument that a network of empirical evidence supports the intended interpretation and use of PROM scores for decision making in a particular context. The health sector is yet to apply contemporary theory and methodology to PROM development and validation. The theoretical and methodological processes in this paper are offered as an advancement of the theory and practice of PROM validity testing in the health sector.
Journal Article
Validation of educational assessments: a primer for simulation and beyond
2016
Background
Simulation plays a vital role in health professions assessment. This review provides a primer on assessment validation for educators and education researchers. We focus on simulation-based assessment of health professionals, but the principles apply broadly to other assessment approaches and topics.
Key principles
Validation refers to the process of collecting validity evidence to evaluate the appropriateness of the interpretations, uses, and decisions based on assessment results. Contemporary frameworks view validity as a hypothesis, and validity evidence is collected to support or refute the validity hypothesis (i.e., that the proposed interpretations and decisions are defensible). In validation, the educator or researcher defines the proposed interpretations and decisions, identifies and prioritizes the most questionable assumptions in making these interpretations and decisions (the “interpretation-use argument”), empirically tests those assumptions using existing or newly collected evidence, and then summarizes the evidence as a coherent “validity argument.” A framework proposed by Messick identifies potential evidence sources: content, response process, internal structure, relationships with other variables, and consequences. Another framework proposed by Kane identifies key inferences in generating useful interpretations: scoring, generalization, extrapolation, and implications/decision. We propose an eight-step approach to validation that applies to either framework: Define the construct and proposed interpretation, make explicit the intended decision(s), define the interpretation-use argument and prioritize needed validity evidence, identify candidate instruments and/or create/adapt a new instrument, appraise existing evidence and collect new evidence as needed, keep track of practical issues, formulate the validity argument, and make a judgment: does the evidence support the intended use?
Conclusions
Rigorous validation first prioritizes and then empirically evaluates key assumptions in the interpretation and use of assessment scores. Validation science would be improved by more explicit articulation and prioritization of the interpretation-use argument, greater use of formal validation frameworks, and more evidence informing the consequences and implications of assessment.
Journal Article
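As an aside for readers new to this methodology: the eight-step process in the abstract above is easy to picture as a structured checklist. The sketch below (Python; every name and the simple "supported" rule are illustrative assumptions, not the article's method) tracks validity evidence against Kane's four inferences.

from dataclasses import dataclass, field

@dataclass
class Inference:
    """One link in the interpretation-use argument (Kane)."""
    name: str                 # e.g. "scoring", "generalization"
    assumptions: list[str]    # prioritized assumptions to test empirically
    evidence: list[str] = field(default_factory=list)  # evidence collected so far

    def supported(self) -> bool:
        # Crude illustrative rule: every assumption needs at least one piece of evidence.
        return len(self.evidence) >= len(self.assumptions)

# The interpretation-use argument as an ordered chain of inferences.
iua = [
    Inference("scoring", ["raters apply the rubric consistently"]),
    Inference("generalization", ["enough cases/stations for stable scores"]),
    Inference("extrapolation", ["scores relate to real clinical performance"]),
    Inference("implications", ["the pass/fail decision is defensible"]),
]

iua[0].evidence.append("rater training records and inter-rater agreement data")
# Final judgment step: does the evidence support the intended use?
print(all(step.supported() for step in iua))  # False until each link is backed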
Constructing a validity argument for the Objective Structured Assessment of Technical Skills (OSATS): a systematic review of validity evidence
by Hatala, Rose, Cook, David A., Brydges, Ryan
in Clinical Competence, Education, Educational Measurement - standards
2015
In order to construct and evaluate the validity argument for the Objective Structured Assessment of Technical Skills (OSATS), based on Kane’s framework, we conducted a systematic review. We searched MEDLINE, EMBASE, CINAHL, PsycINFO, ERIC, Web of Science, Scopus, and selected reference lists through February 2013. Working in duplicate, we selected original research articles in any language evaluating the OSATS as an assessment tool for any health professional. We iteratively and collaboratively extracted validity evidence from included articles to construct and evaluate the validity argument for varied uses of the OSATS. Twenty-nine articles met the inclusion criteria, all focussed on surgical technical skills assessment. We identified three intended uses for the OSATS, namely formative feedback, high-stakes assessment and program evaluation. Following Kane’s framework, four inferences in the validity argument were examined (scoring, generalization, extrapolation, decision). For formative feedback and high-stakes assessment, there was reasonable evidence for scoring and extrapolation. However, for high-stakes assessment there was a dearth of evidence for generalization aside from inter-rater reliability data and an absence of evidence linking multi-station OSATS scores to performance in real clinical settings. For program evaluation, the OSATS validity argument was supported by reasonable generalization and extrapolation evidence. There was a complete lack of evidence regarding implications and decisions based on OSATS scores. In general, validity evidence supported the use of the OSATS for formative feedback. Research to provide support for decisions based on OSATS scores is required if the OSATS is to be used for higher-stakes decisions and program evaluation.
Journal Article
A Systematic Review of the Validity of Questionnaires in Second Language Research
2022
Questionnaires have been widely used in second language (L2) research. To examine the accuracy and trustworthiness of research that uses questionnaires, it is necessary to examine the validity of questionnaires before drawing conclusions or conducting further analysis based on the data collected. To determine the validity of questionnaires that have been investigated in previous L2 research, we adopted the argument-based validation framework to conduct a systematic review. Due to the extensive nature of the extant questionnaire-based research, only the most recent literature, that is, research in 2020, was included in this review. A total of 118 questionnaire-based L2 studies published in 2020 were identified, coded, and analyzed. The findings showed that the validity of the questionnaires in the studies was not satisfactory. In terms of the validity inferences for the questionnaires, we found that (1) the evaluation inference was not supported by psychometric evidence in 41.52% of the studies; (2) the generalization inference was not supported by statistical evidence in 44.07% of the studies; and (3) the explanation inference was not supported by any evidence in 65.25% of the studies, indicating the need for more rigorous validation procedures for questionnaire development and use in future research. We provide suggestions for the validation of questionnaires.
Journal Article
Argument-based validation of Chulalongkorn University Language Institute (CULI) test: a Rasch-based evidence investigation
2025
The Chulalongkorn University Language Institute (CULI) test was developed as a local standardised test of English for professional and international communication. To ensure that the CULI test fulfils its intended purposes, this study employed Kane’s argument-based validation and Rasch measurement approaches to construct the validity argument for the CULI test. This study analysed score data from a single test administration involving 237 test-takers and used a dichotomous Rasch model for the data analysis in Winsteps. The Rasch results indicated appropriate psychometric properties of the CULI test. Overall, the Rasch-based evidence reasonably contributes to the validity argument for the CULI test. Specifically, it partly supports the plausibility of the claims that CULI test tasks and contents represent those in the intended target language use (TLU) domain, and that CULI test scores accurately summarise test-takers’ performance and reflect test-takers’ performance consistency, the intended construct, and language performance levels in the TLU domain. Although the CULI test showed sound psychometric functioning, the Rasch results indicated the need for further investigation into certain correct and incorrect choices and the difficulty levels of writing items, which could provide valuable insights for optimising test quality. This study offers implications for future research on argument-based and Rasch-based validation.
Journal Article
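For reference, the dichotomous Rasch model named in this abstract has a standard closed form: the probability that test-taker n answers item i correctly is

P(X_{ni} = 1) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}

where \theta_n is the test-taker's ability and b_i the item's difficulty, both estimated from the response matrix (here, in Winsteps). The fit statistics and item-difficulty estimates discussed above derive from this model.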
Validity arguments for diagnostic assessment using automated writing evaluation
by Lee, Jooyoung, Chapelle, Carol A, Cotos, Elena
in Academic disciplines, Academic discourse, Academic writing
2015
Two examples demonstrate an argument-based approach to validation of diagnostic assessment using automated writing evaluation (AWE). Criterion® was developed by Educational Testing Service to analyze students' papers grammatically, providing sentence-level error feedback. An interpretive argument was developed for its use as part of the diagnostic assessment process in undergraduate university English for academic purposes (EAP) classes. The Intelligent Academic Discourse Evaluator (IADE) was developed for use in graduate EAP university classes, where the goal was to help students improve their discipline-specific writing. The validation for each was designed to support claims about the intended purposes of the assessments. The authors present the interpretive argument for each and show some of the data that have been gathered as backing for the respective validity arguments, which include the range of inferences that one would make in claiming validity of the interpretations, uses, and consequences of diagnostic AWE-based assessments. (Publisher, adapted.)
Journal Article
Argument-stretching: (slightly) invalid political arguments and their effects on public opinion
2024
To stretch an argument means to make a political argument that is slightly (but not glaringly) invalid. I add to existing research, which focuses on the analysis of facts and stark binary views of validity, by introducing the concept of argument-stretching, which identifies subtle violations of the validity of arguments. Using this conceptual foundation, I outline an impression-formation theory to explain the impact of argument-stretching on public opinion. I suggest that people spontaneously form negative impressions of stretched arguments, and that they add these impressions to a cumulative tally of satisfaction with the argument. Finally, people translate the negative effect of argument-stretching on their account satisfaction into reduced support for the politician who stretched the argument and the policy justified by it. I confirm the hypothesized direct effects of argument-stretching on policy support and politician support in three experimental studies, and I also find evidence for the mediating effect of account satisfaction.
Journal Article
Validity argument for assessing L2 pragmatics in interaction using mixed methods
2015
This study investigates the validity of assessing L2 pragmatics in interaction using mixed methods, focusing on the evaluation inference. Open role-plays that are meaningful and relevant to the stakeholders in an English for Academic Purposes context were developed for classroom assessment. For meaningful score interpretations and accurate evaluations of interaction-involved pragmatic performances, interaction-sensitive data-driven rating criteria were developed, based on the qualitative analyses of examinees' role-play performances. The conversation analysis performed on the data revealed various pragmatic and interactional features indicative of differing levels of pragmatic competence in interaction. The FACETS analysis indicated that the role-plays stably differentiated between the varying degrees of the 102 examinees' pragmatic abilities. The raters showed internal consistency despite their differing degrees of severity. Stable fit statistics and distinct difficulties were reported within each of the interaction-sensitive rating criteria. The findings served as backing for the evaluation inference in the validity argument. Finally, implications of the findings in operationalizing interaction-involved language performances and developing rating criteria are discussed. (Publisher.)
Journal Article
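The FACETS analysis mentioned above implements many-facet Rasch measurement, which extends the Rasch model with a rater-severity term. In one common form, the log-odds of examinee n receiving category k rather than k-1 on criterion i from rater j is

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k

where \theta_n is examinee ability, \delta_i the criterion's difficulty, \alpha_j the rater's severity, and \tau_k the threshold between adjacent rating categories. This is how an analysis can report raters as internally consistent while differing in severity: \alpha_j absorbs severity differences without penalizing examinees.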
Constructing validity evidence from a pilot key-features assessment of clinical decision-making in cerebral palsy diagnosis: application of Kane’s validity framework to implementation evaluations
by Boyd, RN, Scott, KM, Webb, AE
in Allied Health Occupations Education, Care and treatment, Cerebral palsy
2023
Background
Physician decision-making skills training is a priority to improve adoption of the cerebral palsy (CP) clinical guideline and, through this, lower the age of CP diagnosis. Clinical guideline implementation aims to improve physician practice, but evaluating meaningful change is complex. Limitations in the validity evidence of evaluation instruments impact the evidence base. Validity frameworks, such as Kane’s, enable a targeted process to gather evidence for instrument scores, congruent to context and purpose. Yet, application of argument-based methodology to implementation validation is rare. Key-features examination methodology has established validity evidence supporting its use to measure decision-making skills, with potential to predict performance. We aimed to apply Kane’s framework to evaluate a pilot key-features examination on physician decision-making in early CP diagnosis.
Methods
Following Kane’s framework, we evaluated evidence across inferences of scoring, generalisation, extrapolation and implications in a study design describing the development and pilot of a CP diagnosis key-features examination for practising physicians. If found to be valid, we proposed to use the key-feature scores as an outcome measure of decision-making post education intervention to expedite CP diagnosis and to correlate with real-world performance data to predict physician practice.
Results
Supporting evidence for acceptance of scoring inferences was achieved through examination development with an expert group (n = 10) and pilot results (n = 10): (1) high internal consistency (0.82); (2) acceptable mean item-discrimination (0.34); and (3) acceptable reliability of examination scorers (95.2% congruence). Decreased physician acceptance of examination time (70%) was identified as a threat and prioritised in case reduction processes. Partial acceptance of generalisation, extrapolation and implications inferences was defensible with: (1) accumulated development evidence following established key-features methodology; (2) high pilot acceptance for authenticity (90%); and (3) plausibility of assumptions of score correlation with population register data.
Conclusions
Kane’s approach is beneficial for prioritising sources of validity evidence alongside the iterative development of a key-features examination in the CP field. The validity argument supports scoring assumptions and use of scores as an outcome measure of physician decision-making for CP guideline education implementation interventions. Scoring evidence provides the foundation to direct future studies exploring association of key-feature scores with real-world performance.
Journal Article
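For readers less familiar with the statistics reported in this abstract, the sketch below shows how internal consistency (Cronbach's alpha) and a simple item-discrimination index (corrected item-total correlation) are conventionally computed from an examinee-by-item score matrix. The data are made up and this is not the study's code.

import numpy as np

# Rows = examinees, columns = key-feature items (made-up binary scores).
scores = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
], dtype=float)

k = scores.shape[1]
totals = scores.sum(axis=1)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum() / totals.var(ddof=1))

# Corrected item-total correlation per item (item vs. total of the remaining items).
disc = [np.corrcoef(scores[:, j], totals - scores[:, j])[0, 1] for j in range(k)]

print(round(alpha, 2), round(float(np.mean(disc)), 2))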
How do we go about investigating test fairness?
2010
Previous test fairness frameworks have greatly expanded the scope of fairness, but do not provide a means to fully integrate fairness investigations and set priorities. The article proposes an approach to guide practitioners on fairness research and practices. This approach treats fairness as an aspect of validity and conceptualizes it as comparable validity for all relevant groups. Anything that weakens fairness compromises the validity of a test. This conceptualization expands the scope and enriches the interpretations of fairness by drawing on well-defined validity theories while enhancing the meaning of validity by integrating fairness in a principled way. TOEFL® iBT™ is then used to illustrate how a fairness argument may be established and supported in a validity argument. The fairness argument consists of a series of rebuttals to the validity argument that would compromise the comparability of score-based interpretations and uses for relevant groups, and it provides a logical mechanism for identifying critical research areas and setting research priorities. This approach will hopefully inspire more investigations motivated by and built on a central fairness argument. It may also foster a deeper understanding and expanded explorations of actions based on test results and social consequences, as impartiality and justice of actions and comparability of test consequences are at the core of fairness. (Publisher, adapted.)
Journal Article