Catalogue Search | MBRL

Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

by Marshall, Iain J. , Elliott, Julian , Mavergames, Chris in Algorithms , Automation , Bibliographic data bases

2021

This study developed, calibrated, and evaluated a machine learning classifier designed to reduce the study identification workload involved in producing Cochrane systematic reviews. A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the “Cochrane RCT Classifier”), with the algorithm trained on a data set of title–abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification. The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found that the classifier had a recall of 0.99 (95% confidence interval 0.98–0.99) and a precision of 0.08 (95% confidence interval 0.06–0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than more recently published ones. The Cochrane RCT Classifier can reduce the manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
• Systematic review processes need to become more efficient.
• Machine learning is sufficiently mature for real-world use.
• A machine learning classifier was built using data from Cochrane Crowd.
• It was calibrated to achieve very high recall.
• It is now live and in use in Cochrane review production systems.
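
The calibration step above (fixing a threshold on held-out records so that recall reaches about 99%) can be sketched as follows. This is a minimal illustration, assuming the classifier outputs one score per record and that a record is retrieved when its score is at or above the threshold; the function names and toy scores are hypothetical and this is not the Cochrane implementation.

```python
import math
from typing import Sequence, Tuple


def calibrate_threshold(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float = 0.99) -> float:
    """Return the highest score threshold that still retrieves at least
    `target_recall` of the positive (e.g. RCT) records in a calibration set."""
    positive_scores = sorted((s for s, y in zip(scores, labels) if y == 1),
                             reverse=True)
    if not positive_scores:
        raise ValueError("calibration set must contain at least one positive")
    # Number of positives that must score at or above the threshold; the small
    # epsilon guards against floating-point overshoot before ceil().
    needed = max(1, math.ceil(target_recall * len(positive_scores) - 1e-9))
    return positive_scores[needed - 1]


def recall_precision(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision when records scoring >= threshold are retrieved."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(1 for y in labels if y == 1)
    return tp / positives, tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical calibration scores and labels (1 = reports an RCT).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    t = calibrate_threshold(scores, labels, target_recall=0.99)
    print(t, recall_precision(scores, labels, t))
```

Choosing the highest threshold that still meets the recall target keeps precision as high as possible for that recall, which is why a very high recall target goes hand in hand with the low precision (0.08) reported above. A bootstrap confidence interval would come from repeating this on resampled calibration sets.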

Journal Article

Machine learning reduced workload for the Cochrane COVID-19 Study Register: development and evaluation of the Cochrane COVID-19 Study Classifier

by Featherstone, Robin , Mavergames, Chris , Shemilt, Ian in Biomedicine , Calibration , Coronaviruses

2022

Background: This study developed, calibrated and evaluated a machine learning (ML) classifier designed to reduce the study identification workload in maintaining the Cochrane COVID-19 Study Register (CCSR), a continuously updated register of COVID-19 research studies. Methods: An ML classifier for retrieving COVID-19 research studies (the ‘Cochrane COVID-19 Study Classifier’) was developed using a data set of title-abstract records ‘included’ in, or ‘excluded’ from, the CCSR up to 18 October 2020, manually labelled by information and data curation specialists or the Cochrane Crowd. The classifier was then calibrated using a second data set of similar records ‘included’ in, or ‘excluded’ from, the CCSR between 19 October and 2 December 2020, aiming for 99% recall. Finally, the calibrated classifier was evaluated using a third data set of similar records ‘included’ in, or ‘excluded’ from, the CCSR between 4 and 19 January 2021. Results: The Cochrane COVID-19 Study Classifier was trained using 59,513 records (20,878 of which were ‘included’ in the CCSR). A classification threshold was set using 16,123 calibration records (6005 of which were ‘included’ in the CCSR), and the classifier had a precision of 0.52 in this data set at the target recall of >0.99. The final, calibrated COVID-19 classifier correctly retrieved 2285 (98.9%) of 2310 eligible records but missed 25 (1.1%), with a precision of 0.638 and a net screening workload reduction of 24.1% (1113 records correctly excluded). Conclusions: The Cochrane COVID-19 Study Classifier reduces the manual screening workload for identifying COVID-19 research studies, with a very low and acceptable risk of missing eligible studies. It is now deployed in the live study identification workflow for the Cochrane COVID-19 Study Register.
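
The three date windows above can be expressed as a simple temporal split, alongside the recall arithmetic for the evaluation set. This is a sketch under the assumption that each record carries the date on which it was included in or excluded from the CCSR; the function names are hypothetical, and records dated between the calibration and evaluation windows fall into none of the three sets.

```python
from datetime import date
from typing import Optional

# Date windows as stated in the abstract above (inclusive bounds).
TRAIN_END = date(2020, 10, 18)
CALIBRATION = (date(2020, 10, 19), date(2020, 12, 2))
EVALUATION = (date(2021, 1, 4), date(2021, 1, 19))


def assign_split(record_date: date) -> Optional[str]:
    """Map a record's (assumed) CCSR decision date to its data set, or None."""
    if record_date <= TRAIN_END:
        return "train"
    if CALIBRATION[0] <= record_date <= CALIBRATION[1]:
        return "calibration"
    if EVALUATION[0] <= record_date <= EVALUATION[1]:
        return "evaluation"
    return None  # outside all three windows


def recall(retrieved_eligible: int, missed_eligible: int) -> float:
    """Share of eligible records that the classifier retrieved."""
    return retrieved_eligible / (retrieved_eligible + missed_eligible)


if __name__ == "__main__":
    print(assign_split(date(2020, 11, 1)))
    print(round(recall(2285, 25), 3))  # 2285 of 2310 eligible records
```

Splitting by date rather than at random mirrors how the classifier is used in practice: it is trained on past decisions and then applied to records that arrive later.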

Journal Article

Types of minority class examples and their influence on learning classifiers from imbalanced data

by Napierala, Krystyna , Stefanowski, Jerzy in Algorithms , Analysis , Artificial Intelligence

2016

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, identifying the conditions under which a particular method is effective remains an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution, and their influence on classification performance. However, current studies of imbalanced-data difficulty factors have mainly used artificial datasets, and their conclusions do not transfer easily to real-world problems, partly because methods for identifying these factors are not sufficiently developed. In our paper, we capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, based on analyzing the class distribution in the local neighbourhood of each example. Two ways of modeling this neighbourhood are presented: with k-nearest neighbours and with kernel functions. Experiments with artificial datasets show that these methods are able to rediscover simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, in which (1) we identify new data characteristics based on the analysis of types of minority examples; and (2) we demonstrate that the results of this analysis allow us to differentiate the classification performance of popular classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis to develop new algorithms for learning classifiers and pre-processing methods.
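
The k-nearest-neighbour way of typing minority examples can be sketched as below. This assumes k = 5 and the split of same-class neighbours into 5–4 (safe), 3–2 (borderline), 1 (rare) and 0 (outlier), with Euclidean distance; the kernel-based variant and any extra conditions the authors apply to rare examples are not reproduced, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]
K = 5  # neighbourhood size assumed by the thresholds below


def minority_example_types(points: Sequence[Point], labels: Sequence[str],
                           minority: str) -> Dict[int, str]:
    """Label each minority example by how many of its K nearest neighbours
    (Euclidean distance, excluding itself) belong to the minority class."""
    types: Dict[int, str] = {}
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            continue
        # Distances to every other example, nearest first; keep the K closest.
        nearest = sorted((math.dist(p, q), j)
                         for j, q in enumerate(points) if j != i)[:K]
        same = sum(1 for _, j in nearest if labels[j] == minority)
        if same >= 4:
            types[i] = "safe"
        elif same >= 2:
            types[i] = "borderline"
        elif same == 1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),          # minority cluster
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),  # majority
           (10.2, 10.2)]                                          # lone minority
    labs = ["min"] * 5 + ["maj"] * 5 + ["min"]
    print(minority_example_types(pts, labs, "min"))
```

Counting same-class neighbours is a cheap local measure of class overlap, which is what lets the paper characterize real datasets rather than only artificial ones.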

Journal Article

Comparative study of machine learning algorithms for Kannada twitter sentimental analysis

by Bhuyyar, Rani , Ijeri, Dakshayani , Burkaposh, Sayed Salman in Algorithms , Comparative studies , Computer Communication Networks

2024

Analyzing customer reviews from various online platforms can help take a business to a higher level. These user opinions can be analyzed using sentiment analysis. Sentiment analysis of Indian languages is difficult because of the wide diversity of languages in India. Kannada is one of India's prominent languages: about 43 million people in India use it as their native language, and it ranks 27th among the top 30 languages worldwide. Because very little work has been carried out on Indian languages, especially Kannada, more work is needed to process Kannada across different domains. Previous work on Kannada sentiment analysis reported an accuracy of about 72%. In this work, we therefore compare various machine learning algorithms for Kannada Twitter sentiment analysis. Experiments on live Twitter data show that the Multinomial Naive Bayes classifier performed best, with an accuracy of 75%.
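
For readers unfamiliar with the best-performing model, a Multinomial Naive Bayes text classifier can be sketched in a few lines. This is a from-scratch illustration assuming whitespace tokenization and add-one (Laplace) smoothing, trained on a tiny hypothetical English corpus; it is not the authors' code, and real Kannada text would need appropriate tokenization.

```python
import math
from collections import Counter
from typing import Dict, Sequence


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "NaiveBayesText":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, doc: str) -> Dict[str, float]:
        """Unnormalized log posterior: log prior + sum of token log-likelihoods."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for w in doc.split():
                if w in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            scores[c] = score
        return scores

    def predict(self, doc: str) -> str:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = ["good great film", "great acting good", "bad boring film", "awful bad plot"]
    labels = ["pos", "pos", "neg", "neg"]
    model = NaiveBayesText().fit(docs, labels)
    print(model.predict("good film"), model.predict("bad plot"))
```

Working in log space avoids underflow when many small word probabilities are multiplied, and smoothing keeps a word absent from one class from zeroing out that class entirely.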

Journal Article

ilastik: interactive machine learning for (bio)image analysis

by Berg, Stuart , Straehle, Christoph N , Kreshuk, Anna in Algorithms , Annotations , Case studies

2019

We present ilastik, an easy-to-use interactive tool that brings machine-learning-based (bio)image analysis to end users without substantial computational expertise. It contains pre-defined workflows for image segmentation, object classification, counting and tracking. Users adapt the workflows to the problem at hand by interactively providing sparse training annotations for a nonlinear classifier. ilastik can process data in up to five dimensions (3D, time and number of channels). Its computational back end runs operations on-demand wherever possible, allowing for interactive prediction on data larger than RAM. Once the classifiers are trained, ilastik workflows can be applied to new data from the command line without further user interaction. We describe all ilastik workflows in detail, including three case studies and a discussion on the expected performance.

Journal Article

Rapid Flood Mapping and Evaluation with a Supervised Classifier and Change Detection in Shouguang Using Sentinel-1 SAR and Sentinel-2 Optical Data

by Jin, Shuanggen , Huang, Minmin in backscatter characteristics , Backscattering , Case studies

2020

Rapid flood mapping is crucial for hazard evaluation and forecasting, especially in the early stage of a hazard. Synthetic aperture radar (SAR) images can penetrate clouds and heavy rainfall, which makes them especially valuable for flood mapping. However, change detection is a key part of flood mapping with SAR, and selecting a threshold for it is complex. In this paper, a novel approach is proposed to rapidly map flood regions and estimate the flood degree while avoiding the critical thresholding step. It converts threshold-based change detection into land cover backscatter classification. Sentinel-1 SAR images are classified into land cover backscatter classes using a supervised classifier, with the help of Sentinel-2 optical images. Change detection is then performed pixel by pixel. Backscatter characteristics and variation rules of different ground objects are essential prior knowledge for flood analysis. The pre-flood and flood-period SAR images are classified using the same inputs so that the change detection between them is meaningful. This method avoids the inaccuracy caused by a single threshold. The method is tested in a case study in Shouguang and compared with flood maps extracted by Otsu thresholding and normalized difference water index (NDWI) methods. The results show that our approach identifies flooding beneath vegetation well. Moreover, the required data and processing are simple, so the approach can be widely applied for rapid flood mapping in early disaster relief.
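
The contrast between the two approaches can be sketched as follows: a pixel-by-pixel comparison of two already-classified maps (the classification-based change detection described above), and Otsu's method, one of the thresholding baselines. This is an illustration only, assuming class maps are given as nested lists and backscatter values as plain numbers (low backscatter for open water); the class names and function names are hypothetical and the supervised classification itself is not shown.

```python
from typing import List, Sequence

WATER = "water"  # hypothetical class label used by both classified maps


def flood_change_map(pre: Sequence[Sequence[str]],
                     post: Sequence[Sequence[str]]) -> List[List[str]]:
    """Compare pre-flood and flood-period class maps pixel by pixel."""
    result = []
    for row_pre, row_post in zip(pre, post):
        row = []
        for before, after in zip(row_pre, row_post):
            if after == WATER and before != WATER:
                row.append("flooded")
            elif after == WATER:
                row.append("permanent_water")
            else:
                row.append("not_water")
        result.append(row)
    return result


def otsu_threshold(values: Sequence[float]) -> float:
    """Otsu's method: choose the split that maximizes between-class variance.
    Values <= the returned threshold form the low (dark, e.g. water) class."""
    vals = sorted(values)
    n, total = len(vals), sum(vals)
    best_t, best_var, cum = vals[0], -1.0, 0.0
    for i in range(n - 1):
        cum += vals[i]
        if vals[i] == vals[i + 1]:
            continue  # only split between distinct values
        w0 = (i + 1) / n
        mu0 = cum / (i + 1)
        mu1 = (total - cum) / (n - i - 1)
        between = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = vals[i], between
    return best_t


if __name__ == "__main__":
    pre = [["land", "water"], ["vegetation", "land"]]
    post = [["water", "water"], ["vegetation", "water"]]
    print(flood_change_map(pre, post))
    print(otsu_threshold([-22.0, -21.0, -20.0, -8.0, -7.0, -6.0]))  # dB values
```

Otsu assumes a roughly bimodal histogram and yields one global threshold; comparing classified maps instead sidesteps that single cut-off, which is the inaccuracy the abstract says the method avoids.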

Journal Article

Real-world Comparison of Afirma GEC and GSC for the Assessment of Cytologically Indeterminate Thyroid Nodules

by Lawrence, Lima , San Martin, Vicente T , Madhun, Nabil Z in Aged , Aged, 80 and over , Benign

2020

Context: Molecular tests have improved the accuracy of preoperative diagnosis of indeterminate thyroid nodules. The Afirma Gene Sequencing Classifier (GSC) was developed to improve the specificity of the Gene Expression Classifier (GEC). Independent studies are needed to assess the performance of GSC. Objective: To compare the performance of GEC and GSC in the assessment of indeterminate nodules. Design, Settings, and Participants: Retrospective analysis of Bethesda III and IV nodules tested with GEC or GSC at an academic center between December 2011 and September 2018. Benign call rates (BCRs) and surgical outcomes were compared. Histopathologic data were collected on surgically resected nodules to calculate measures of test performance. Results: The BCR was 41% (73/178) for GEC and 67.8% (82/121) for GSC (P < .001). Among specimens with dominant Hürthle cell cytology, the BCR was 22% (6/27) for GEC and 63.2% (12/19) for GSC (P = .005). The overall surgery rate decreased from 47.8% in the GEC group to 34.7% in the GSC group (P = .025). One GEC-benign and 3 GSC-benign nodules proved to be malignant on surgical excision. GSC had significantly higher specificity (94% vs 60%, P < .001) and positive predictive value (PPV) (85.3% vs 40%, P < .001) than GEC. Sensitivity and negative predictive value (NPV) were lower with GSC than with GEC (90.6% vs 97.0% and 96.3% vs 98.6%, respectively), but these differences were not significant. Conclusions: GSC reclassified more indeterminate nodules as benign and improved the specificity and PPV of the test. These improvements appear to be resulting in fewer diagnostic surgeries.
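
The performance measures reported above all come from a 2×2 table of test result against histology. A minimal sketch, assuming a "suspicious" molecular result counts as test-positive and malignancy on histology as disease-positive; the 2×2 counts in the test are hypothetical, while the benign call rate examples reuse the counts stated in the abstract.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 measures: positive = suspicious call, disease = malignant."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules called suspicious
        "specificity": tn / (tn + fp),  # benign nodules called benign
        "ppv": tp / (tp + fp),          # suspicious calls that were malignant
        "npv": tn / (tn + fn),          # benign calls that were benign
    }


def benign_call_rate(benign_calls: int, nodules_tested: int) -> float:
    """Share of tested nodules that received a benign result."""
    return benign_calls / nodules_tested


if __name__ == "__main__":
    print(round(100 * benign_call_rate(73, 178), 1))  # GEC
    print(round(100 * benign_call_rate(82, 121), 1))  # GSC
```

Because histology is only available for nodules that went to surgery, these measures in a retrospective series are computed on the resected subset, not on every tested nodule.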

Journal Article

DNA methylation-based classification of central nervous system tumours

by Bendszus, Martin , Lechner, Matt , Monoranu, Camelia-Maria in 13/56 , 45/61 , 631/114/1386

2018

Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.

An online approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups has been developed to help to improve current diagnostic standards.

Classifying tumour types for better diagnoses

Precise cancer diagnoses are essential to ensure the best treatment plans for patients, but standardization of the diagnostic process has been challenging. The authors present a comprehensive approach for DNA-methylation-based classification of brain tumours. The tool improves diagnostic precision of standard methods, and is made available online for broad accessibility. The results illustrate the potential applications of molecular diagnosis tools.

Journal Article

Hate speech detection and racial bias mitigation in social media based on BERT model

by Crespi, Noël , Farahbakhsh, Reza , Mozafari, Marzieh in Abuse , African American English , African Americans

2020

Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have recently raised many concerns. Although the problem of biased datasets in abusive language detection has been addressed more frequently, biases arising from trained classifiers have so far received little attention. In this paper, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers) and evaluate the proposed model on two publicly available Twitter datasets that have been annotated for racism, sexism, hate or offensive content. Next, we introduce a bias alleviation mechanism to mitigate the effect of bias in the training set during fine-tuning of our pre-trained BERT-based model for hate speech detection. To that end, we use an existing regularization method to reweight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model on the reweighted samples. To evaluate our bias alleviation mechanism, we employ a cross-domain approach in which the classifiers trained on the aforementioned datasets predict the labels of two new Twitter datasets, from AAE-aligned and White-aligned groups, which contain tweets written in African-American English (AAE) and Standard American English (SAE), respectively. The results show systematic racial bias in the trained classifiers: they assign tweets written in AAE by the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE by the White-aligned group. However, the racial bias in our classifiers is significantly reduced once our bias alleviation mechanism is incorporated.
This work could be a first step towards debiasing hate speech and abusive language detection systems.
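
One simple instance-reweighting scheme of the kind described above gives each training sample the weight P(y) / P(y | z), where z flags whether the text contains a bias-associated token. After weighting, the class mix inside each z group matches the overall class mix, so those tokens no longer predict the label. This is a sketch, not necessarily the exact regularization method the authors used; the token list and function name are hypothetical, and the weights would then serve as per-sample loss weights during fine-tuning.

```python
from collections import Counter
from typing import List, Sequence, Set


def instance_weights(texts: Sequence[str], labels: Sequence[int],
                     bias_tokens: Set[str]) -> List[float]:
    """Weight each sample by P(y) / P(y | z), with z = 'text contains any
    bias token' (bias_tokens are assumed to be lowercase)."""
    z = [bool(bias_tokens & set(t.lower().split())) for t in texts]
    n = len(labels)
    label_counts = Counter(labels)
    group_counts = Counter(z)
    joint_counts = Counter(zip(z, labels))
    weights = []
    for zi, yi in zip(z, labels):
        p_y = label_counts[yi] / n
        p_y_given_z = joint_counts[(zi, yi)] / group_counts[zi]
        weights.append(p_y / p_y_given_z)
    return weights


if __name__ == "__main__":
    texts = ["marker one", "marker two", "marker three", "marker four",
             "five", "six", "seven", "eight"]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(instance_weights(texts, labels, {"marker"}))
```

Up-weighting the under-represented (token, label) combinations and down-weighting the over-represented ones breaks the shortcut between surface n-grams and labels without deleting any training data.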

Journal Article

EpiSmokEr: a robust classifier to determine smoking status from DNA methylation data

by Anders, Simon , Korhonen, Tellervo , Kaprio, Jaakko in Adult , Aged , Artificial intelligence

2019

Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. To advance the practical applicability of the smoking-associated methylation signals, we used machine learning methodology to train a classifier for smoking status prediction. We show the prediction performance of our classifier on three independent whole-blood datasets demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. The major contribution of our classifier is its global applicability without a need for users to determine a threshold value for each dataset to predict the smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), facilitating the use of our classifier to predict smoking status in future studies.
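
The point that users need not choose a per-dataset threshold can be illustrated with a multiclass linear model whose prediction is simply the most probable class under a softmax. This is a sketch with a made-up CpG identifier, made-up coefficients and assumed class names; it is not EpiSmokEr's fitted model and does not reproduce its R interface.

```python
import math
from typing import Dict, Tuple


def predict_status(betas: Dict[str, float],
                   coefs: Dict[str, Dict[str, float]],
                   intercepts: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Softmax over per-class linear scores of methylation beta values;
    the predicted class is the argmax, so no cut-off has to be chosen."""
    scores = {c: intercepts[c] + sum(w * betas[cpg] for cpg, w in coefs[c].items())
              for c in intercepts}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp_scores.values())
    probs = {c: e / norm for c, e in exp_scores.items()}
    return max(probs, key=probs.get), probs


# Hypothetical model: one made-up CpG site and three assumed classes.
INTERCEPTS = {"current": 4.0, "former": 2.0, "never": 0.0}
COEFS = {"current": {"cg_hypothetical": -5.0},
         "former": {"cg_hypothetical": -2.0},
         "never": {"cg_hypothetical": 0.0}}

if __name__ == "__main__":
    label, probs = predict_status({"cg_hypothetical": 0.1}, COEFS, INTERCEPTS)
    print(label, probs)
```

Since the class probabilities always sum to one, taking the largest gives a decision directly, which is what removes the need for a dataset-specific threshold.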

Journal Article
