Catalogue Search | MBRL

Machine truth serum: a surprisingly popular approach to improving ensemble methods

by Luo, Tianyi , Liu, Yang in Algorithms , Artificial Intelligence , Classification

2023

Wisdom of the crowd (Surowiecki, 2005a) disclosed a striking fact that the majority voting answer from a crowd is usually more accurate than a few individual experts. The same story is observed in machine learning: ensemble methods (Dietterich, 2000) leverage this idea to exploit multiple machine learning algorithms in various settings (e.g., supervised learning and semi-supervised learning) to achieve better performance by aggregating the predictions of different algorithms than that obtained from any constituent algorithm alone. Nonetheless, the existing aggregating rule would fail when the majority answer of all the constituent algorithms is more likely to be wrong. In this paper, we extend the idea proposed in Bayesian Truth Serum (Prelec, 2004) that “a surprisingly more popular answer is more likely to be the true answer instead of the majority one” to supervised classification further improved by ensemble final predictions method and semi-supervised classification (e.g., MixMatch (Berthelot et al., 2019)) enhanced by ensemble data augmentations method. The challenge for us is to define or detect when an answer should be considered as being “surprising”. We present two machine learning aided methods which can reveal the truth when the minority instead of majority has the true answer on both settings of supervised and semi-supervised classification problems. We name our proposed method the Machine Truth Serum. Our experiments on a set of classification tasks (image, text, etc.) show that the classification performance can be further improved by applying Machine Truth Serum in the ensemble final predictions step (supervised) and in the ensemble data augmentations step (semi-supervised).

Journal Article
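
The "surprisingly popular" idea quoted in the abstract above can be illustrated with a small sketch. This is not the paper's Machine Truth Serum method (the abstract says that method uses machine learning to detect when an answer is surprising); it only shows the basic selection rule, under the assumption that a predicted vote share is already available for each label. The function name `surprisingly_popular` and the example data are hypothetical.

```python
from collections import Counter


def surprisingly_popular(votes, predicted_share):
    """Return the label whose actual vote share most exceeds its predicted share.

    votes: list of labels, one per ensemble member (hypothetical input).
    predicted_share: dict mapping each label to the share of votes it was
        expected to receive (assumed to be given; estimating it is the hard part).
    """
    counts = Counter(votes)
    total = len(votes)
    # Surprise = actual share minus expected share, per label.
    surprise = {
        label: counts.get(label, 0) / total - predicted_share[label]
        for label in predicted_share
    }
    return max(surprise, key=surprise.get)


# Made-up example: "cat" wins the majority vote, but it was expected to win
# by a wider margin, so "dog" is the surprisingly popular answer.
votes = ["cat"] * 6 + ["dog"] * 4
predicted = {"cat": 0.8, "dog": 0.2}

majority = Counter(votes).most_common(1)[0][0]
winner = surprisingly_popular(votes, predicted)
print("majority:", majority, "| surprisingly popular:", winner)
```

In this toy case the two rules disagree, which is the situation the abstract describes: the majority answer is not the one selected.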

Semi‐supervised classification of fundus images combined with CNN and GCN

by Sun, Xiaolei , Duan, Sixu , Wang, Ting in attention mechanism , Blood vessels , Classification

2022

Purpose: Diabetic retinopathy (DR) is one of the most serious complications of diabetes, which is a kind of fundus lesion with specific changes. Early diagnosis of DR can effectively reduce the visual damage caused by DR. Due to the variety and different morphology of DR lesions, automatic classification of fundus images in mass screening can greatly save clinicians' diagnosis time. To alleviate these problems, in this paper, we propose a novel framework, the graph attentional convolutional neural network (GACNN). Methods and Materials: The network consists of convolutional neural network (CNN) and graph convolutional network (GCN). The global and spatial features of fundus images are extracted by using CNN and GCN, and attention mechanism is introduced to enhance the adaptability of GCN to topology map. We adopt semi‐supervised method for classification, which greatly improves the generalization ability of the network. Results: In order to verify the effectiveness of the network, we conducted comparative experiments and ablation experiments. We use confusion matrix, precision, recall, kappa score, and accuracy as evaluation indexes. With the increase of the labeling rates, the classification accuracy is higher. Particularly, when the labeling rate is set to 100%, the classification accuracy of GACNN reaches 93.35%. Compared with DenseNet121, the accuracy rate is improved by 6.24%. Conclusions: Semi‐supervised classification based on attention mechanism can effectively improve the classification performance of the model, and attain preferable results in classification indexes such as accuracy and recall. GACNN provides a feasible classification scheme for fundus images, which effectively reduces the screening human resources.

Journal Article

A survey on ensemble learning

by SHI, Yifan , MA, Qianli , DONG, Xibin in Algorithms , clustering ensemble , Computer Science

2020

Despite significant successes achieved in knowledge discovery, traditional machine learning methods may fail to obtain satisfactory performance when dealing with complex data, such as imbalanced, high-dimensional, or noisy data. The underlying reason is that it is difficult for these methods to capture multiple characteristics and the underlying structure of data. In this context, how to effectively construct an efficient knowledge discovery and mining model has become an important topic in the data mining field. Ensemble learning, as one research hot spot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are utilized to produce weak predictive results. Finally, ensemble learning fuses the informative knowledge from the results obtained above to achieve knowledge discovery and better predictive performance via voting schemes in an adaptive way. In this paper, we review the research progress of the mainstream approaches of ensemble learning and classify them based on different characteristics. In addition, we present challenges and possible research directions for each mainstream approach of ensemble learning, and we also give an extra introduction to the combination of ensemble learning with other machine learning hot spots such as deep learning and reinforcement learning.

Journal Article

Supervised versus un-supervised classification

by Laurance, Susan G. W. , Addicott, Eda in Australia , Boundaries , Classification

2019

Question: What are the differences between plant communities recognised using supervised versus un‐supervised methods? Location: Northeastern Australia. Methods: Two classifications of savanna plant communities were formed independently with two different approaches: supervised and un‐supervised (using agglomerative hierarchical clustering). Each approach used the same vegetation datasets and, importantly, classification criteria. The communities occur on two different landscapes, with differing environmental gradients, covering an area of 53,500 km². We compared the internal characteristics of plant communities between approaches and landscapes using four evaluation criteria: identifiability, distinctiveness, similarity of internal heterogeneity and predictability of species foliage cover. Additionally, we compared the central floristic concepts and compositional boundaries of communities identified by each approach. Results: Supervised and un‐supervised approaches recognised similar floristic community concepts. Compositional boundaries between communities were similar on the landscape with steeper environmental gradients but significantly different on the landscape with gradual environmental gradients. However, communities distinguished using supervised methods were significantly less distinct and identifiable, worse at predicting species foliage cover and significantly more variable in species composition than those identified using un‐supervised methods. Conclusions: Using supervised rather than un‐supervised approaches to distinguish plant communities can result in less recognisable communities, possibly reducing their usefulness for land management planning. Importantly, we found a large disparity between the two approaches in delineating compositional boundaries between communities on landscapes with gradual environmental gradients. This is particularly relevant to communities in biomes such as the savanna which comprises 20% of the Earth's landmass. Ecologists can be more confident using a supervised approach on landscapes with steep environmental gradients but should target landscapes with gradual environmental gradients for un‐supervised classification.

Supervised and un‐supervised classification approaches are common. We tested composition attributes of plant communities recognised by each approach in savanna vegetation, northeastern Australia. Communities from the un‐supervised approach were significantly more recognisable, identifiable and useful for land‐management planning. This was especially true for landscapes with broad environmental gradients. Understanding these implications is important when deciding which classification approach to use. Photograph by M.R. Newton

Journal Article

Comparative Analysis of CART and Random Forest Classifiers for LULC Mapping: A Case Study of Brahmani-Baitarani River Basin, India

by Gujar, Jotiram , Patil, Sangram , Kadam, Sonali in Accuracy , Agriculture , Algorithms

2025

Land Use and Land Cover (LULC) classification is essential for monitoring environmental changes, managing resources, and planning sustainable development. However, accurate classification remains challenging because of the diversity of landscapes and the computational demands of processing large datasets. Among various machine learning (ML) algorithms, such as Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest (RF), and Classification and Regression Trees (CART), RF and CART were chosen for this study because of their robustness, simplicity, and efficiency in handling complex LULC classification tasks. This study focuses on the Brahmani-Baitarani River Basin, a region known for its environmental significance and susceptibility to land-use changes. Using remote sensing data from Landsat 8, Landsat 9, and Sentinel-2 satellites, a comparative analysis of RF and CART was conducted to evaluate their LULC mapping performance. The datasets were processed and analyzed on the Google Earth Engine (GEE) platform using multi-temporal image data and advanced filtering techniques. The results revealed that RF consistently delivered higher classification accuracy than CART, making it a reliable choice for LULC studies in dynamic and heterogeneous landscapes. By integrating high-resolution satellite imagery with ML algorithms, this study provided detailed insights into the spatial distribution of land use across the Brahmani-Baitarani Basin. These findings have practical applications in urban planning, natural resource management, and environmental conservation, and offer valuable information for decision-makers and researchers working to address global environmental challenges.

Journal Article

Fingerprint Liveness Detection in the Presence of Capable Intruders

by Cardoso, Jaime , Sequeira, Ana in Biometric Identification - methods , Biometric Identification - standards , biometrics

2015

Fingerprint liveness detection methods have been developed as an attempt to overcome the vulnerability of fingerprint biometric systems to spoofing attacks. Traditional approaches have been quite optimistic about the behavior of the intruder assuming the use of a previously known material. This assumption has led to the use of supervised techniques to estimate the performance of the methods, using both live and spoof samples to train the predictive models and evaluate each type of fake samples individually. Additionally, the background was often included in the sample representation, completely distorting the decision process. Therefore, we propose that an automatic segmentation step should be performed to isolate the fingerprint from the background and truly decide on the liveness of the fingerprint and not on the characteristics of the background. Also, we argue that one cannot aim to model the fake samples completely since the material used by the intruder is unknown beforehand. We approach the design by modeling the distribution of the live samples and predicting as fake the samples very unlikely according to that model. Our experiments compare the performance of the supervised approaches with the semi-supervised ones that rely solely on the live samples. The results obtained differ from the ones obtained by the more standard approaches which reinforces our conviction that the results in the literature are misleadingly estimating the true vulnerability of the biometric system.

Journal Article

Texture Analysis and Land Cover Classification of Tehran Using Polarimetric Synthetic Aperture Radar Imagery

by Liu, Wen , Zakeri, Homa , Yamazaki, Fumio in Classification , Principal components analysis

2017

Land cover classification of built-up and bare land areas in arid or semi-arid regions from multi-spectral optical images is not simple, due to the similarity of the spectral characteristics of the ground and building materials. However, synthetic aperture radar (SAR) images could overcome this issue because of the backscattering dependency on the material and the geometry of different surface objects. Therefore, in this paper, dual-polarized data from ALOS-2 PALSAR-2 (HH, HV) and Sentinel-1 C-SAR (VV, VH) were used to classify the land cover of Tehran city, Iran, which has grown rapidly in recent years. In addition, texture analysis was adopted to improve the land cover classification accuracy. In total, eight texture measures were calculated from SAR data. Then, principal component analysis was applied, and the first three components were selected for combination with the backscattering polarized images. Additionally, two supervised classification algorithms, support vector machine and maximum likelihood, were used to detect bare land, vegetation, and three different built-up classes. The results indicate that land cover classification obtained from backscatter values has better performance than that obtained from optical images. Furthermore, the layer stacking of texture features and backscatter values significantly increases the overall accuracy.

Journal Article

MONITORING DEGRADATION OF WETLAND AREAS USING SATELLITE IMAGERY AND GEOGRAPHIC INFORMATION SYSTEM TECHNIQUES

by Jaber, Ali in Atmosphere , Biodiversity , Calibration

2020

In order to conserve the ecosystems and biodiversity of wetland areas, it is necessary to monitor the degradation of these areas. Currently, Al Razzazah lake and its surrounding areas have degraded significantly due to the lake's low water level, which has negatively affected its biodiversity. Hence, this research aims to propose a method to model the monitoring of spatio-temporal changes in that lake and its surrounding areas, with an area estimated at 4660 km², between 1998 and 2018 using Remote Sensing and Geographic Information System (GIS) techniques. After conducting the supervised classification by the method of Support Vector Machine (SVM) for all satellite images, we extracted thematic maps, which contain five classes. The results showed the overall accuracy was 90.11%, 91.60% and 90.57%, while the Kappa coefficients were 0.8764, 0.8950 and 0.8821, for 1998, 2008 and 2018 respectively. Results showed that the lake area decreased by 86.21% in the study area in 2018.

Journal Article
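
The abstract above reports overall accuracy alongside the Kappa coefficient. As a reminder of how the two relate, here is a minimal sketch that computes both from a confusion matrix using the standard Cohen's kappa formula. The function name `accuracy_and_kappa` and the 2x2 matrix are made up for illustration; they are not data from the study, which used five classes.

```python
def accuracy_and_kappa(cm):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row share * column share).
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (not from the paper).
cm = [[45, 5],
      [10, 40]]
acc, kappa = accuracy_and_kappa(cm)
print(f"overall accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy both in this toy example and in the figures reported in the abstract.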

Recognizing hotspots in Brief Eclectic Psychotherapy for PTSD by text and audio mining

by van Hessen, Arjan J. , Olff, Miranda , Veldkamp, Bernard P. in análisis de discurso , Basic , brief eclectic psychotherapy

2020

Background: Identifying and addressing hotspots is a key element of imaginal exposure in Brief Eclectic Psychotherapy for PTSD (BEPP). Research shows that treatment effectiveness is associated with focusing on these hotspots and that hotspot frequency and characteristics may serve as indicators for treatment success. Objective: This study aims to develop a model to automatically recognize hotspots based on text and speech features, which might be an efficient way to track patient progress and predict treatment efficacy. Method: A multimodal supervised classification model was developed based on analog tape recordings and transcripts of imaginal exposure sessions of 10 successful and 10 non-successful treatment completers. Data mining and machine learning techniques were used to extract and select text (e.g. words and word combinations) and speech (e.g. speech rate, pauses between words) features that distinguish between 'hotspot' (N = 37) and 'non-hotspot' (N = 45) phases during exposure sessions. Results: The developed model resulted in a high training performance (mean F1-score of 0.76) but a low testing performance (mean F1-score = 0.52). This shows that the selected text and speech features could clearly distinguish between hotspots and non-hotspots in the current data set, but will probably not recognize hotspots from new input data very well. Conclusions: In order to improve the recognition of new hotspots, the described methodology should be applied to a larger, higher quality (digitally recorded) data set. As such this study should be seen mainly as a proof of concept, demonstrating the possible application and contribution of automatic text and audio analysis to therapy process research in PTSD and mental health research in general.

Journal Article

Semi-Supervised Classification of Graph Convolutional Networks with Laplacian Rank Constraints

by Lu, Guangquan , Zhan, Mengmeng , Zhang, Haiqi in Artificial Intelligence , Artificial neural networks , Classification

2022

Graph convolutional networks (GCNs), as an extension of classic convolutional neural networks (CNNs) to graph processing, have achieved good results in semi-supervised learning tasks. Traditional GCNs usually use a fixed graph to complete various semi-supervised classification tasks, such as those on chemical molecules and social networks. The graph is an important basis for classification by a GCN model, and its quality has a large impact on the performance of the model. For a low-quality input graph, the classification results of the GCN model are often not ideal. In order to improve the classification performance of the GCN model, we propose a graph learning method to generate a high-quality topological graph, which is more suitable for GCN model classification. We use the correlation between the data to generate a data similarity matrix, and apply a Laplacian rank constraint to the similarity matrix, so that the number of connected components of the topological graph is consistent with the number of categories of the original data. Experimental results on 10 real datasets show that our method achieves better classification performance than the comparison methods.

Journal Article
