Catalogue Search | MBRL
Explore the vast range of titles available.
3,410 result(s) for "Interrater Reliability"
Interpretation of Thoracic Radiography Shows Large Discrepancies Depending on the Qualification of the Physician—Quantitative Evaluation of Interobserver Agreement in a Representative Emergency Department Scenario
by
Julien Dinkel
,
Maximilian Fischer
,
Maximilian Jörgens
in
chest radiography
,
emergency department
,
interrater reliability
,
radiologists
,
clinicians
2021
(1) Background: Chest radiography (CXR) is still a key diagnostic component in the emergency department (ED). Correct interpretation is essential, since some pathologies require urgent treatment. This study quantifies potential discrepancies in CXR analysis between radiologists and non-radiology physicians in training with ED experience. (2) Methods: Nine differently qualified physicians (three board-certified radiologists [BCR], three radiology residents [RR], and three non-radiology residents involved in the ED [NRR]) evaluated a series of 563 posterior-anterior CXR images by quantifying suspicion for four relevant pathologies: pleural effusion, pneumothorax, pneumonia, and pulmonary nodules. Reading results were noted separately for each hemithorax on a Likert scale (0–4; 0: no suspicion of pathology, 4: definite presence of pathology), adding up to a total of 40,536 reported pathology suspicions. Interrater reliability/correlation and Kruskal–Wallis tests were performed for statistical analysis. (3) Results: While interrater reliability was good among radiologists, major discrepancies between radiologists' and non-radiologists' reading results were observed for all pathologies. Overall interrater agreement was highest for pneumothorax detection and lowest for raising suspicion of nodules suspicious for malignancy. Pleural effusion and pneumonia were often suspected with indeterminate ratings (1–3). For pneumothorax detection, all readers mainly chose a clear option (0 or 4). Interrater reliability was usually higher when evaluating the right hemithorax (all pathologies except pneumothorax). (4) Conclusions: Quantified CXR interrater reliability analysis reveals a general uncertainty that strongly depends on medical training. NRR can benefit from radiology reporting in terms of time efficiency and diagnostic accuracy. CXR evaluation by ED specialists with long-term training has not been tested.
Journal Article
Intercoder Reliability in Qualitative Research: Debates and Practical Guidelines
2020
Evaluating the intercoder reliability (ICR) of a coding frame is frequently recommended as good practice in qualitative analysis. ICR is a somewhat controversial topic in the qualitative research community, with some arguing that it is an inappropriate or unnecessary step within the goals of qualitative analysis. Yet ICR assessment can yield numerous benefits for qualitative studies, which include improving the systematicity, communicability, and transparency of the coding process; promoting reflexivity and dialogue within research teams; and helping convince diverse audiences of the trustworthiness of the analysis. Few guidelines exist to help researchers negotiate the assessment of ICR in qualitative analysis. The current article explains what ICR is, reviews common arguments for and against its incorporation in qualitative analysis, and offers guidance on the practical elements of performing an ICR assessment.
Journal Article
The revised Cochrane risk of bias tool for randomized trials (RoB 2) showed low interrater reliability and challenges in its application
2020
The objective of the study is to assess the interrater reliability (IRR) and usability of the revised Cochrane risk of bias tool for randomized trials (RoB 2).
This is a cross-sectional study. Four raters independently applied RoB 2 on the primary outcome of a random sample of individually randomized parallel-group trials (randomized controlled trials (RCTs)). We calculated the Fleiss’ kappa for multiple raters, the time needed to complete the tool, and discussed the application of RoB 2 to identify difficulties and reasons for disagreement.
A total of 70 outcomes from 70 RCTs were included. IRR was slight for the overall judgment (IRR 0.16, 95% confidence interval (CI) 0.08–0.24); individual domain analysis gave IRR as moderate for "randomization process" (IRR 0.45, 95% CI 0.37–0.53), slight for "deviations from intended intervention" for RCTs assessing the effect of assignment to an intervention (IRR 0.04, 95% CI −0.06 to 0.14), fair for those assessing the effect of adhering (IRR 0.21, 95% CI 0.11–0.31), and fair for the other domains, ranging from 0.22 (95% CI 0.14–0.30) for "missing outcome data" to 0.30 (95% CI 0.22–0.38) for "selection of reported results". Mean time to apply the tool was 28 minutes (standard deviation 13.4) per study outcome. The main difficulties were due to poor knowledge of the subject matter of the primary studies, new terminology, different approaches for some domains compared with the previous tool, and the way signaling questions are formulated.
RoB 2 is a detailed and comprehensive tool but difficult and demanding, even for raters with substantial expertise in systematic reviews. Calibration exercises and intensive training are needed before its application, to improve reliability.
Journal Article
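The Fleiss' kappa reported in the study above generalizes rater agreement beyond two raters: it compares mean observed pairwise agreement against agreement expected by chance. A minimal sketch of the computation (illustrative code, not the study's own; it takes a subjects × categories count matrix):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters.

    counts[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    total = n_subjects * n_raters

    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / total
           for j in range(len(counts[0]))]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_subjects   # mean observed agreement
    p_e = sum(p * p for p in p_j)   # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# Three raters, two categories, perfect agreement on both subjects.
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

For real analyses, `statsmodels.stats.inter_rater.fleiss_kappa` offers a tested implementation of the same statistic.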
The Relation Between Mathematics Anxiety and Mathematics Performance Among School-Aged Students: A Meta-Analysis
by
Lin, Xin
,
Peng, Peng
,
Namkung, Jessica M.
in
Affective Behavior
,
Anxiety
,
Cognitive Processes
2019
The purpose of this meta-analysis was to examine the relation between mathematics anxiety (MA) and mathematics performance among school-aged students, and to identify potential moderators and underlying mechanisms of that relation, including grade level, temporal relations, difficulty of mathematical tasks, dimensions of MA measures, effects on student grades, and working memory. A meta-analysis of 131 studies with 478 effect sizes was conducted. The results indicated a significant negative correlation between MA and mathematics performance, r = -.34. Moderation analyses indicated that dimensions of MA, difficulty of mathematical tasks, and effects on student grades differentially affected the relation between MA and mathematics performance. MA assessed with both cognitive and affective dimensions showed a stronger negative correlation with mathematics performance than MA assessed with either an affective dimension only or mixed/unspecified dimensions. Advanced mathematical tasks that require multistep processes showed a stronger negative correlation with MA than foundational mathematical tasks. Mathematics measures that affected/reflected student grades (e.g., final exam, students' course grades, GPA) had a stronger negative correlation with MA than did other measures of mathematics performance that did not affect student grades (e.g., mathematics measures administered as part of research). Theoretical and practical implications of the findings are discussed.
Journal Article
Low agreement among reviewers evaluating the same NIH grant applications
by
Pier, Elizabeth L.
,
Filut, Amarette
,
Brauer, Markus
in
Biomedical Research - economics
,
Funding
,
Grants
2018
Obtaining grant funding from the National Institutes of Health (NIH) is increasingly competitive, as funding success rates have declined over the past decade. To allocate relatively scarce funds, scientific peer reviewers must differentiate the very best applications from comparatively weaker ones. Despite the importance of this determination, little research has explored how reviewers assign ratings to the applications they review and whether there is consistency in the reviewers’ evaluation of the same application. Replicating all aspects of the NIH peer-review process, we examined 43 individual reviewers’ ratings and written critiques of the same group of 25 NIH grant applications. Results showed no agreement among reviewers regarding the quality of the applications in either their qualitative or quantitative evaluations. Although all reviewers received the same instructions on how to rate applications and format their written critiques, we also found no agreement in how reviewers “translated” a given number of strengths and weaknesses into a numeric rating. It appeared that the outcome of the grant review depended more on the reviewer to whom the grant was assigned than the research proposed in the grant. This research replicates the NIH peer-review process to examine in detail the qualitative and quantitative judgments of different reviewers examining the same application, and our results have broad relevance for scientific grant peer review.
Journal Article
A Meta-Analytic Review of the Relations Between Motivation and Reading Achievement for K–12 Students
by
Filderman, Marissa J.
,
Toste, Jessica R.
,
Didion, Lisa
in
Correlation
,
Educational Practices
,
Effect Size
2020
The purpose of this meta-analytic review was to investigate the relation between motivation and reading achievement among students in kindergarten through 12th grade. A comprehensive search of peer-reviewed published research resulted in 132 articles with 185 independent samples and 1,154 reported effect sizes (Pearson’s r). Results of our random-effects metaregression model indicate a significant, moderate relation between motivation and reading, r = .22, p < .001. Moderation analyses revealed that the motivation construct being measured influenced the relation between motivation and reading. There were no other significant moderating or interaction effects related to reading domain, sample type, or grade level. Evidence to support the bidirectional nature of the relation between motivation and reading was provided through longitudinal analyses, with findings suggesting that earlier reading is a stronger predictor of later motivation than motivation is of reading. Taken together, the findings from this meta-analysis provide a better understanding of how motivational processes relate to reading performance, which has important implications for developing effective instructional practices and fostering students’ active engagement in reading. Theoretical and practical implications of these findings for reading development are discussed.
Journal Article
Interrater reliability of sleep stage scoring: a meta-analysis
by
Cho, Jae Hoon
,
Lee, Yun Ji
,
Lee, Jae Yong
in
Agreements
,
Cardiovascular health
,
Confidence intervals
2022
Study Objectives: We evaluated the interrater reliabilities of manual polysomnography sleep stage scoring. We included all studies that employed Rechtschaffen and Kales rules or American Academy of Sleep Medicine standards. We sought the overall degree of agreement and those for each stage.
Methods: The keywords were "Polysomnography (PSG)," "sleep staging," "Rechtschaffen and Kales (R&K)," "American Academy of Sleep Medicine (AASM)," "interrater (interscorer) reliability," and "Cohen's kappa." We searched PubMed, OVID Medline, EMBASE, the Cochrane library, KoreaMed, KISS, and the MedRIC. The exclusion criteria included automatic scoring and pediatric patients. We collected data on scorer histories, scoring rules, numbers of epochs scored, and the underlying diseases of the patients.
Results: A total of 101 publications were retrieved; 11 satisfied the selection criteria. The Cohen's kappa for manual, overall sleep scoring was 0.76, indicating substantial agreement (95% confidence interval, 0.71–0.81; P < .001). By sleep stage, the figures were 0.70, 0.24, 0.57, 0.57, and 0.69 for the W, N1, N2, N3, and R stages, respectively. The interrater reliabilities for stage N2 and N3 sleep were moderate, and that for stage N1 sleep was only fair.
Conclusions: We conducted a meta-analysis to generalize the variation in manual scoring of polysomnography and provide reference data for automatic sleep stage scoring systems. The reliability of manual scorers of polysomnography sleep stages was substantial. However, for certain stages, the results were poor; validity requires improvement.
Citation: Lee YJ, Lee JY, Cho JH, Choi JH. Interrater reliability of sleep stage scoring: a meta-analysis. J Clin Sleep Med. 2022;18(1):193–202.
Journal Article
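The per-stage agreement figures above are two-rater Cohen's kappa values: observed agreement corrected for the agreement two scorers would reach by chance given their marginal label frequencies. A minimal sketch (illustrative code, not from the study; stage labels are examples):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' labels over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)

    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    return (p_o - p_e) / (1 - p_e)

# Two scorers staging four epochs as W or N1: 3/4 observed agreement.
print(cohens_kappa(["W", "W", "N1", "N1"], ["W", "N1", "N1", "N1"]))  # 0.5
```

`sklearn.metrics.cohen_kappa_score` computes the same statistic if scikit-learn is already a dependency.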
A Systematic Review of Reviews of the Outcome of Noninstitutional Child Maltreatment
by
Carr, Alan
,
Craddock, Fiona
,
Duff, Hollie
in
Abused children
,
Adjustment
,
Adult Survivors of Child Abuse
2020
The aim of the systematic review described in this article was to synthesize available high-quality evidence on the outcomes of noninstitutional child maltreatment across the life span. A systematic review of previous systematic reviews and meta-analyses was conducted. Ten databases were searched. One hundred eleven papers which met stringent inclusion and exclusion criteria were selected for review. Papers were included if they reported systematic reviews and meta-analyses of longitudinal or cross-sectional controlled studies, or single-group cohort primary studies, of the outcomes of child maltreatment in the domains of physical and mental health and psychosocial adjustment of individuals who, as children, lived mainly with their families. Using AMSTAR criteria, selected systematic reviews and meta-analyses were found to be of moderate or high quality. Searches, study selection, data extraction, and study quality assessments were independently conducted by two researchers, with a high degree of interrater reliability. The 111 systematic reviews and meta-analyses reviewed in this article covered 2,534 independent primary studies involving 30,375,962 participants, of whom more than 518,022 had been maltreated. The magnitude and quality of this evidence base allow considerable confidence to be placed in the obtained results. Significant associations were found between a history of child maltreatment and adjustment in the domains of physical health, mental health, and psychosocial adjustment in a very wide range of areas. The many adverse outcomes associated with child maltreatment documented in this review highlight the importance of implementing evidence-based child protection policies and practices to prevent maltreatment and treat child abuse survivors.
Journal Article
ChatGPT vs Google for Queries Related to Dementia and Other Cognitive Decline: Comparison of Results
by
Hristidis, Vagelis
,
Ganta, Sai Rithesh Reddy
,
Brown, Ellen L
in
Alternative approaches
,
Artificial intelligence
,
Caregivers
2023
People living with dementia or other cognitive decline and their caregivers (PLWD) increasingly rely on the web to find information about their condition and available resources and services. The recent advancements in large language models (LLMs), such as ChatGPT, provide a new alternative to the more traditional web search engines, such as Google.
This study compared the quality of the results of ChatGPT and Google for a collection of PLWD-related queries.
A set of 30 informational and 30 service delivery (transactional) PLWD-related queries were selected and submitted to both Google and ChatGPT. Three domain experts assessed the results for their currency of information, reliability of the source, objectivity, relevance to the query, and similarity of their response. The readability of the results was also analyzed. Interrater reliability coefficients were calculated for all outcomes.
Google had superior currency and higher reliability. ChatGPT results were evaluated as more objective. ChatGPT had a significantly higher response relevance, while Google often drew upon sources that were referral services for dementia care or service providers themselves. The readability was low for both platforms, especially for ChatGPT (mean grade level 12.17, SD 1.94) compared to Google (mean grade level 9.86, SD 3.47). The similarity between the content of ChatGPT and Google responses was rated as high for 13 (21.7%) responses, medium for 16 (26.7%) responses, and low for 31 (51.6%) responses.
Both Google and ChatGPT have strengths and weaknesses. ChatGPT rarely includes the source of a result. Google more often provides a date for and a known reliable source of the response compared to ChatGPT, whereas ChatGPT supplies more relevant responses to queries. The results of ChatGPT may be out of date and often do not specify a validity time stamp. Google sometimes returns results based on commercial entities. The readability scores for both indicate that responses are often not appropriate for persons with low health literacy skills. In the future, the addition of both the source and the date of health-related information and availability in other languages may increase the value of these platforms for both nonmedical and medical professionals.
Journal Article
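The grade-level readability scores in the study above are typically Flesch–Kincaid grade levels (the specific metric is an assumption; the abstract does not name it). Given word, sentence, and syllable counts for a passage, the grade is a fixed linear formula:

```python
def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw text counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# A 10-word, single-sentence passage containing 14 syllables.
print(round(fk_grade(10, 1, 14), 2))  # 4.83
```

Libraries such as `textstat` handle the word, sentence, and syllable counting and expose the same formula as `flesch_kincaid_grade`.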
Trending topics in careers: a review and future research agenda
by
Akkermans, Jos
,
Kubasch, Stella
in
Career advancement
,
Career Development
,
Career development planning
2017
Purpose
Virtually all contemporary scientific papers studying careers emphasize their changing nature. Indeed, careers have been changing during recent decades, for example becoming more complex and unpredictable. Furthermore, hallmarks of the new career – such as individual agency – are clearly increasing in importance in today’s labor market. This led the authors to ask whether these changes are actually visible in the topics that career scholars research. In other words, the purpose of this paper is to discover the trending topics in careers.
Design/methodology/approach
To achieve this goal, the authors analyzed all published papers from four core career journals (i.e. Career Development International, Career Development Quarterly, Journal of Career Assessment, and Journal of Career Development) between 2012 and 2016. Using a five-step procedure involving three researchers, the authors formulated the 16 most trending topics.
Findings
Some traditional career topics are still quite popular today (e.g. career success as the #1 trending topic), whereas other topics have emerged during recent years (e.g. employability as the #3 trending topic). In addition, some topics that are closely related to career research – such as unemployment and job search – surprisingly turned out not to be a trending topic.
Originality/value
In reviewing all published papers in CDI, CDQ, JCA, and JCD between 2012 and 2016, the authors provide a unique overview of currently trending topics, and the authors compare this to the overall discourse on careers. In addition, the authors formulate key questions for future research.
Journal Article