Catalogue Search | MBRL
387 result(s) for "Hays, Ron D"
Overview of Classical Test Theory and Item Response Theory for the Quantitative Assessment of Items in Developing Patient-Reported Outcomes Measures
by
Lundy, J. Jason
,
Hays, Ron D.
,
Cappelleri, Joseph C.
in
classical test theory
,
content validity
,
Descriptive labeling
2014
The US Food and Drug Administration’s guidance for industry document on patient-reported outcomes (PRO) defines content validity as “the extent to which the instrument measures the concept of interest” (FDA, 2009, p. 12). According to Strauss and Smith (2009), construct validity “is now generally viewed as a unifying form of validity for psychological measurements, subsuming both content and criterion validity” (p. 7). Hence, both qualitative and quantitative information are essential in evaluating the validity of measures.
We review classical test theory and item response theory (IRT) approaches to evaluating PRO measures, including frequency of responses to each category of the items in a multi-item scale, the distribution of scale scores, floor and ceiling effects, the relationship between item response options and the total score, and the extent to which hypothesized “difficulty” (severity) order of items is represented by observed responses.
If a researcher has limited qualitative data and wants preliminary information about the content validity of an instrument, then descriptive assessments using classical test theory should be the first step. As the sample size grows during subsequent stages of instrument development, confidence in the numerical estimates from Rasch and other IRT models (as well as those from classical test theory) also grows.
Classical test theory and IRT can be useful in providing a quantitative assessment of items and scales during the content-validity phase of PRO-measure development. Depending on the particular type of measure and the specific circumstances, classical test theory and/or IRT should be considered to help maximize the content validity of PRO measures.
Journal Article
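The descriptive classical test theory checks enumerated in this abstract (response frequencies per category, the scale-score distribution, floor and ceiling effects) are straightforward to compute. A minimal sketch in Python, assuming a pandas DataFrame of Likert-type items scored 1-5; the data and column names are hypothetical, not from the article.

```python
import pandas as pd

# Hypothetical responses: rows = respondents, columns = items scored 1-5
items = pd.DataFrame({
    "item1": [1, 2, 5, 4, 3, 5, 5, 2],
    "item2": [2, 2, 4, 4, 3, 5, 5, 1],
    "item3": [1, 3, 5, 4, 2, 5, 4, 2],
})

# Frequency of responses to each category of each item
for col in items.columns:
    freqs = items[col].value_counts(normalize=True).sort_index()
    print(col, freqs.round(2).to_dict())

# Scale score distribution and floor/ceiling effects
scale = items.sum(axis=1)
n_items = items.shape[1]
floor_pct = (scale == n_items * 1).mean() * 100    # all items at lowest category
ceiling_pct = (scale == n_items * 5).mean() * 100  # all items at highest category
print(f"floor: {floor_pct:.1f}%  ceiling: {ceiling_pct:.1f}%")
```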
Between-group minimally important change versus individual treatment responders
2021
Purpose
Estimates of the minimally important change (MIC) can be used to evaluate whether group-level differences are large enough to be important. However, classification of individual responders to treatment has often been based on group-level MIC thresholds, resulting in inaccurate classification of change over time. This article reviews options and provides suggestions about individual-level statistics for assessing whether individuals have improved, stayed the same, or declined.
Methods
Review of MIC estimation and an example of misapplication of MIC group-level estimates to assess individual change. Secondary data analysis to show how perceptions about meaningful change can be used along with significance of individual change.
Results
MIC thresholds yield over-optimistic conclusions about responders to treatment because they classify those who have not changed as responders.
Conclusions
Future studies need to evaluate the significance of individual change using appropriate individual-level statistics such as the reliable change index or the equivalent coefficient of repeatability. Supplementing individual statistical significance with retrospective assessments of change is desirable.
Journal Article
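The individual-level statistics named in the conclusions can be illustrated concretely. A minimal sketch of the reliable change index (in the Jacobson-Truax form) and the coefficient of repeatability, assuming hypothetical values for the baseline SD and the reliability coefficient; the 1.96 cutoff corresponds to a two-sided 5% significance level.

```python
import math

sd_baseline = 10.0   # hypothetical SD of the measure at baseline
reliability = 0.85   # hypothetical test-retest reliability

sem = sd_baseline * math.sqrt(1.0 - reliability)  # standard error of measurement
s_diff = math.sqrt(2.0) * sem                     # SE of a difference score
cr = 1.96 * s_diff                                # coefficient of repeatability

def classify_change(baseline: float, followup: float) -> str:
    """Reliable change index: |RCI| > 1.96 marks statistically reliable change."""
    rci = (followup - baseline) / s_diff
    if rci > 1.96:
        return "improved"
    if rci < -1.96:
        return "declined"
    return "no reliable change"

print(f"coefficient of repeatability = {cr:.1f}")
print(classify_change(45.0, 57.0))  # a 12-point gain exceeds the ~10.7-point CR
```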
Development of Physical and Mental Health Summary Scores from the Patient-Reported Outcomes Measurement Information System (PROMIS) Global Items
2009
Background
The use of global health items permits an efficient way of gathering general perceptions of health. These items provide useful summary information about health and are predictive of health care utilization and subsequent mortality.
Methods
Analyses of 10 self-reported global health items obtained from an internet survey as part of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. We derived summary scores from the global health items. We estimated the associations of the summary scores with the EQ-5D index score and the PROMIS physical function, pain, fatigue, emotional distress, and social health domain scores.
Results
Exploratory and confirmatory factor analyses supported a two-factor model. Global physical health (GPH; 4 items on overall physical health, physical function, pain, and fatigue) and global mental health (GMH; 4 items on quality of life, mental health, satisfaction with social activities, and emotional problems) scales were created. The scales had internal consistency reliability coefficients of 0.81 and 0.86, respectively. GPH correlated more strongly with the EQ-5D than did GMH (r = 0.76 vs. 0.59). GPH correlated most strongly with pain impact (r = -0.75), whereas GMH correlated most strongly with depressive symptoms (r = -0.71).
Conclusions
Two dimensions representing physical and mental health underlie the global health items in PROMIS. These global health scales can be used to efficiently summarize physical and mental health in patient-reported outcome studies.
Journal Article
Effects of Excluding Those Who Report Having “Syndomitis” or “Chekalism” on Data Quality: Longitudinal Health Survey of a Sample From Amazon’s Mechanical Turk
by
Qureshi, Nabeel
,
Edelen, Maria Orlando
,
Kapteyn, Arie
in
Anxiety
,
Back pain
,
Clinical assessment
2023
Researchers have implemented multiple approaches to increase data quality from existing web-based panels such as Amazon's Mechanical Turk (MTurk).
This study extends prior work by examining improvements in data quality and effects on mean estimates of health status by excluding respondents who endorse 1 or both of 2 fake health conditions (“Syndomitis” and “Chekalism”).
Survey data were collected in 2021 at baseline and 3 months later from MTurk study participants, aged 18 years or older, with an internet protocol address in the United States, and who had completed a minimum of 500 previous MTurk “human intelligence tasks.” We included questions about demographic characteristics, health conditions (including the 2 fake conditions), and the Patient-Reported Outcomes Measurement Information System (PROMIS)-29+2 (version 2.1) preference-based score survey. The 3-month follow-up survey was only administered to those who reported having back pain and did not endorse a fake condition at baseline.
In total, 15% (996/6832) of the sample endorsed at least 1 of the 2 fake conditions at baseline. Those who endorsed a fake condition at baseline were more likely to identify as male, to identify as non-White, to be younger, to report more health conditions, and to take longer to complete the survey than those who did not. They also had substantially lower internal consistency reliability on the PROMIS-29+2 scales: physical function (0.69 vs 0.89), pain interference (0.80 vs 0.94), fatigue (0.80 vs 0.92), depression (0.78 vs 0.92), anxiety (0.78 vs 0.90), sleep disturbance (-0.27 vs 0.84), ability to participate in social roles and activities (0.77 vs 0.92), and cognitive function (0.65 vs 0.77). The lack of reliability of the sleep disturbance scale for those endorsing a fake condition arose because the scale includes both positively and negatively worded items. Those who reported a fake condition reported significantly worse self-reported health scores (except for sleep disturbance) than those who did not. Excluding those who endorsed a fake condition improved the overall mean PROMIS-29+2 (version 2.1) T-scores by 1-2 points and the PROMIS preference-based score by 0.04. Among those who did not endorse a fake condition at baseline, 6% (n=59) endorsed at least 1 fake condition on the 3-month survey, and they had lower PROMIS-29+2 internal consistency reliability and worse mean scores on the 3-month survey than those who did not. Based on these results, we estimate that 25% (1708/6832) of the MTurk respondents provided careless or dishonest responses.
This study provides evidence that asking about fake health conditions can help screen out respondents who may be dishonest or careless. We recommend that this approach be used routinely in MTurk samples.
Journal Article
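The reliability comparison reported above rests on coefficient alpha computed separately within the two groups. A minimal sketch, assuming simulated item responses and a hypothetical fake-condition flag; with real data the two groups' alphas would be expected to diverge as the abstract describes.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(item_df: pd.DataFrame) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = item_df.shape[1]
    item_variances = item_df.var(axis=0, ddof=1).sum()
    total_variance = item_df.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(0)
# Simulated 4-item scale (1-5 responses) and a hypothetical 15% fake-condition flag
scale_items = pd.DataFrame(rng.integers(1, 6, size=(200, 4)),
                           columns=[f"item{i}" for i in range(1, 5)])
endorsed_fake = rng.random(200) < 0.15

print("endorsed a fake condition:", round(cronbach_alpha(scale_items[endorsed_fake]), 2))
print("did not endorse:          ", round(cronbach_alpha(scale_items[~endorsed_fake]), 2))
```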
ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research
2013
Purpose
An essential aspect of patient-centered outcomes research (PCOR) and comparative effectiveness research (CER) is the integration of patient perspectives and experiences with clinical data to evaluate interventions. Thus, PCOR and CER require capturing patient-reported outcome (PRO) data appropriately to inform research, healthcare delivery, and policy. This initiative’s goal was to identify minimum standards for the design and selection of a PRO measure for use in PCOR and CER.
Methods
We performed a literature review to find existing guidelines for the selection of PRO measures. We also conducted an online survey of the International Society for Quality of Life Research (ISOQOL) membership to solicit input on PRO standards. A standard was designated as “recommended” when >50% of respondents endorsed it as “required as a minimum standard.”
Results
The literature review identified 387 articles. The survey response rate was 120 of 506 ISOQOL members. The respondents had an average of 15 years of experience in PRO research, and 89% felt competent or very competent providing feedback. Final recommendations for PRO measure standards included: documentation of the conceptual and measurement model; evidence for reliability and validity (content validity, construct validity, responsiveness); interpretability of scores; quality translation; and acceptable patient and investigator burden.
Conclusion
The development of these minimum measurement standards is intended to promote the appropriate use of PRO measures to inform PCOR and CER, which in turn can improve the effectiveness and efficiency of healthcare delivery. A next step is to expand these minimum standards to identify best practices for selecting decision-relevant PRO measures.
Journal Article
The Role of the Bifactor Model in Resolving Dimensionality Issues in Health Outcomes Measures
by
Morizot, Julien
,
Hays, Ron D.
,
Reise, Steven P.
in
Applied psychology
,
Data visualization
,
Dimensionality
2007
Objectives
We propose the application of a bifactor model for exploring the dimensional structure of an item response matrix and for handling multidimensionality.
Background
We argue that a bifactor analysis can complement traditional dimensionality investigations by (a) providing an evaluation of the distortion that may occur when unidimensional models are fit to multidimensional data, (b) allowing researchers to examine the utility of forming subscales, and (c) providing an alternative to nonhierarchical multidimensional models for scaling individual differences.
Method
To demonstrate our arguments, we use responses (N = 1,000 Medicaid recipients) to 16 items in the Consumer Assessment of Healthcare Providers and Systems (CAHPS® 2.0) survey.
Analyses
Exploratory and confirmatory factor analytic and item response theory models (unidimensional, multidimensional, and bifactor) were estimated.
Results
CAHPS® items are consistent with both unidimensional and multidimensional solutions. However, the bifactor model revealed that the overwhelming majority of common variance was due to a general factor. After controlling for the general factor, subscales provided little measurement precision.
Conclusion
The bifactor model provides a valuable tool for exploring dimensionality-related questions. In the Discussion, we describe contexts where a bifactor analysis is most productively used, and we contrast bifactor with multidimensional IRT (MIRT) models. We also describe implications of bifactor models for IRT applications and raise some limitations.
Journal Article
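One way to quantify the "overwhelming majority of common variance" finding is the explained common variance (ECV) statistic, computed from standardized bifactor loadings. A minimal sketch with hypothetical loadings; the values are illustrative, not those estimated from the CAHPS® data.

```python
import numpy as np

# Hypothetical standardized loadings from a fitted bifactor model
general = np.array([0.70, 0.65, 0.72, 0.60, 0.68, 0.66])   # general factor
specific = np.array([0.25, 0.30, 0.20, 0.35, 0.22, 0.28])  # group (specific) factors

# Explained common variance: share of common variance carried by the general factor
ecv = (general ** 2).sum() / ((general ** 2).sum() + (specific ** 2).sum())
print(f"ECV = {ecv:.2f}")  # values near 1 indicate an essentially unidimensional scale
```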
Cross-sectional validation of the PROMIS-Preference scoring system
2018
The PROMIS-Preference (PROPr) score is a recently developed summary score for the Patient-Reported Outcomes Measurement Information System (PROMIS). PROPr is a preference-based scoring system for seven PROMIS domains created using multiplicative multi-attribute utility theory. It serves as a generic, societal, preference-based summary scoring system of health-related quality of life. This manuscript evaluates construct validity of PROPr in two large samples from the US general population.
We utilized 2 online panel surveys, the PROPr Estimation Survey and the Profiles-Health Utilities Index (HUI) Survey. Both included the PROPr measure, patient demographic information, self-reported chronic conditions, and other preference-based summary scores: the EuroQol-5D (EQ-5D-5L) and HUI in the PROPr Estimation Survey and the HUI in the Profiles-HUI Survey. The HUI was scored as both the Mark 2 and the Mark 3. Known-groups validity was evaluated using age- and gender-stratified mean scores and health condition impact estimates. Condition impact estimates were created using ordinary least squares regression in which a summary score was regressed on age, gender, and a single health condition. The coefficient for the health condition is the estimated effect on the preference score of having a condition vs. not having it. Convergent validity was evaluated using Pearson correlations between PROPr and other summary scores.
The sample consisted of 983 respondents from the PROPr Estimation Survey and 3,000 from the Profiles-HUI survey. Age- and gender-stratified mean PROPr scores were lower than EQ-5D and HUI scores, with fewer subjects having scores corresponding to perfect health on the PROPr. In the PROPr Estimation survey, all 11 condition impact estimates were statistically significant using PROPr, 8 were statistically significant by the EQ-5D, 7 were statistically significant by HUI Mark 2, and 9 were statistically significant by HUI Mark 3. In the Profiles-HUI survey, all 21 condition impact estimates were statistically significant using summary scores from all three scoring systems. In these samples, the correlations between PROPr and the other summary measures ranged from 0.67 to 0.70.
These results provide evidence of construct validity for PROPr using samples from the US general population.
Journal Article
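The condition impact estimates described in the methods reduce to a single OLS coefficient. A minimal sketch using simulated data and statsmodels (the choice of library is an assumption; any OLS routine would do): a preference-based summary score is regressed on age, gender, and one condition indicator, and the indicator's coefficient estimates the effect of having the condition vs. not having it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
age = rng.integers(18, 90, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)
condition = rng.integers(0, 2, size=n).astype(float)  # 1 = reports the condition

# Simulated preference-based summary score with a built-in -0.08 condition effect
score = 0.55 - 0.08 * condition + rng.normal(0.0, 0.10, size=n)

X = sm.add_constant(np.column_stack([age, female, condition]))
fit = sm.OLS(score, X).fit()
print("estimated condition impact:", round(fit.params[3], 3))  # expected near -0.08
```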
Correction: Cross-sectional validation of the PROMIS-Preference scoring system
2025
[This corrects the article DOI: 10.1371/journal.pone.0201093.]
Journal Article
US general population norms for telephone administration of the SF-36v2
2012
US general population norms for mail administration of the Medical Outcomes Study 36-Item Short Form Version 2 (SF-36v2) were established in 1998. This article reports SF-36v2 telephone-administered norms collected in 2005–2006 for adults aged 35–89 years.
The SF-36v2 was administered to 3,844 adults in the National Health Measurement Study (NHMS), a random-digit dial telephone survey. Scale scores and physical and mental component summary (PCS and MCS) scores were computed.
When compared with 1998 norms (mean=50.00, standard deviation [SD]=10.00), SF-36v2 scores for the 2005–2006 general population tended to be higher: physical functioning (mean=50.68, SD=14.48); role limitations due to physical health problems (mean=49.47, SD=14.71); bodily pain (mean=50.66, SD=16.28); general health perceptions (mean=50.10, SD=16.87); vitality (mean=53.71, SD=15.35); social functioning (mean=51.37, SD=13.93); role limitations due to emotional problems (mean=51.44, SD=13.93); mental health (mean=54.27, SD=13.28); PCS (mean=49.22, SD=15.13); MCS (mean=53.78, SD=13.14). PCS and MCS factor scoring coefficients were similar to those previously reported for the 1998 norms. SF-36v2 norms for telephone administration were created.
The higher scores for NHMS data are likely due to the effect of telephone administration. The 2005–2006 norms can be used as a reference to interpret scale and component summary scores for telephone-administered surveys with the SF-36v2.
Journal Article
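The norm-based scoring behind these results uses a linear T-score transformation: each raw scale score is standardized against the general-population mean and SD and rescaled to mean 50, SD 10. A minimal sketch; the raw mean and SD values here are hypothetical, not the published SF-36v2 norms.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score transformation: 50 + 10 * (raw - norm mean) / norm SD."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd

# Hypothetical raw score of 72 against a hypothetical norm mean 65, SD 20
print(t_score(72.0, 65.0, 20.0))  # -> 53.5, i.e. 0.35 SD above the norm mean
```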
Comparing Health Survey Data Cost and Quality Between Amazon’s Mechanical Turk and Ipsos’ KnowledgePanel: Observational Study
by
Herman, Patricia M
,
Hays, Ron D
,
Rodriguez, Anthony
in
Adult
,
Comparative analysis
,
Data Accuracy
2024
Researchers have many options for web-based survey data collection, ranging from access to curated probability-based panels, where individuals are selectively invited to join based on their membership in a representative population, to convenience panels, which are open for anyone to join. The mix of respondents available also varies greatly regarding representation of a population of interest and in motivation to provide thoughtful and accurate responses. Despite the additional dataset-building labor required of the researcher, convenience panels are much less expensive than probability-based panels. However, it is important to understand what may be given up regarding data quality for those cost savings.
This study examined the relative costs and data quality of fielding equivalent surveys on Amazon's Mechanical Turk (MTurk), a convenience panel, and KnowledgePanel, a nationally representative probability-based panel.
We administered the same survey measures to MTurk (in 2021) and KnowledgePanel (in 2022) members. We applied several recommended quality assurance steps to enhance the data quality achieved using MTurk. Ipsos, the owner of KnowledgePanel, followed their usual (industry standard) protocols. The survey was designed to support psychometric analyses and included >60 items from the Patient-Reported Outcomes Measurement Information System (PROMIS), demographics, and a list of health conditions. We used 2 fake conditions (“syndomitis” and “chekalism”) to identify those more likely to be honest respondents. We examined the quality of each platform's data using several recommended metrics (eg, consistency, reliability, representativeness, missing data, and correlations), both including and excluding those respondents who had endorsed a fake condition, and examined the impact of weighting on representativeness.
We found that prescreening in the MTurk sample (removing those who endorsed a fake health condition) improved data quality, but KnowledgePanel data quality generally remained superior. While MTurk's unweighted point estimates for demographics exhibited the usual mismatch with national averages (younger, better educated, and lower income), weighted MTurk data matched national estimates. KnowledgePanel's point estimates better matched national benchmarks even before poststratification weighting. Correlations between PROMIS measures and age and income were similar in MTurk and KnowledgePanel; the mean absolute value of the difference between each platform's 137 correlations was 0.06, and 92% were <0.15. However, correlations between PROMIS measures and educational level were dramatically different: the mean absolute value of the difference across these 17 correlation pairs was 0.15, the largest difference was 0.29, and the direction of more than half of these relationships in the MTurk sample was the opposite of that expected from theory. Therefore, caution is needed when using MTurk for studies in which educational level is a key variable.
The data quality of our MTurk sample was often inferior to that of the KnowledgePanel sample but possibly not so much as to negate the benefits of its cost savings for some uses.
RR2-10.1186/s12891-020-03696-2.
Journal Article
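The platform comparison above hinges on differencing matched correlations from the two samples. A minimal sketch of that computation; the correlation values are hypothetical stand-ins for the 137 PROMIS correlation pairs.

```python
import numpy as np

# Hypothetical matched correlations (same variable pairs, two platforms)
mturk_r = np.array([0.42, -0.18, 0.31, 0.05, -0.27])
kpanel_r = np.array([0.45, -0.12, 0.28, 0.13, -0.21])

abs_diff = np.abs(mturk_r - kpanel_r)
print("mean |difference|:", abs_diff.mean().round(3))
print("share of pairs < 0.15:", (abs_diff < 0.15).mean())
# A sign flip (e.g., a positive MTurk correlation where theory and the
# comparison panel give a negative one) is a stronger warning than magnitude.
print("sign disagreements:", int((np.sign(mturk_r) != np.sign(kpanel_r)).sum()))
```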