Catalogue Search | MBRL
81 result(s) for "re-sampling"
Resurveying historical vegetation data — opportunities and challenges
by Schei, Fride H.; Kapfer, Jutta; Kopecký, Martin
in Bias, data collection, Environmental change
2017
Background: Resurveying historical vegetation plots has become more and more popular in recent years as it provides a unique opportunity to estimate vegetation and environmental changes over the past decades. Most historical plots, however, are not permanently marked and uncertainty in plot location, in addition to observer bias and seasonal bias, may add significant errors to temporal change. These errors may have major implications for the reliability of studies on long-term environmental change and deserve closer attention of vegetation ecologists. Methods: Vegetation data obtained from the resurveying of non-permanently marked plots are assessed for their potential to study environmental change effects on plant communities and the challenges the use of such data have to meet. We describe the properties of vegetation resurveys, distinguishing basic types of plots according to relocation error, and we highlight the potential of such data types for studying vegetation dynamics and their drivers. Finally, we summarize the challenges and limitations of resurveying non-permanently marked vegetation plots for different purposes in environmental change research. Results and conclusions: Re-sampling error is caused by three main independent sources of error: error caused by plot relocation, observer bias and seasonality bias. For relocation error, vegetation plots can be divided into permanent and non-permanent plots, while the latter are further divided into quasi-permanent (with approximate relocation) and non-traceable (with random relocation within a sampled area) plots. 
To reduce the inherent sources of error in resurvey data, the following precautions should be followed: (i) resurvey historical vegetation plots whose approximate plot location within a study area is known; (ii) consider all information available from historical studies in order to keep plot relocation errors low; (iii) resurvey at times of the year when vegetation development is comparable to the historical survey to control for seasonal variability in vegetation; (iv) retain a high level of experience of the observers to keep observer bias low; and (v) edit and standardize data sets before analyses.
Journal Article
Biotic homogenization of upland vegetation: patterns and drivers at multiple spatial scales over five decades
by Birks, H. John B.; Thompson, Des B.A.; Ross, Louise C.
in Biodiversity, Conservation biology, dissimilarity
2012
Questions: Is there evidence for biotic homogenization of upland vegetation? Do the magnitude and nature of floristic and compositional change vary between vegetation types? What can be inferred about the drivers responsible for the observed changes? Location: Upland heath, mire and grassland communities of the northwest Highlands of Scotland, UK. Methods: We re-survey plots first described in a phytosociological study of 1956–1958 to assess the changes in plant species composition over the last 50 yr in five major upland vegetation types. Using a combination of multivariate analysis, dissimilarity measures, diversity metrics and published data on species attributes, we quantify, characterize and link potential drivers of environmental change with the observed changes in species composition. Results: Grassland and heath vegetation declined in species richness and variation in community composition, while mires showed little change. Previously distinct vegetation types became more similar in composition, characterized by the increased dominance of generalist upland graminoids and reduced dwarf-shrub, forb and lichen cover, although novel assemblages were not apparent. Species with an oceanic distribution increased at the expense of those with an arctic-montane distribution. Temperature, precipitation and acidity were found to be potentially important in explaining changes in species composition: species that had undergone the greatest increases had a preference for warmer, drier and more acidic conditions. Conclusions: The vegetation of the northwest Scottish Highlands has undergone marked biotic homogenization over the last 50 yr, manifested through a loss of various aspects of diversity at the local, community and landscape scales. The magnitude of change varies between vegetation types, although the nature of change shows many similar characteristics. 
Analyses of species attributes suggest these changes are driven by climate warming and acidification, although over-grazing may also be important. This study highlights the importance of the link between the loss of plant diversity and homogenization at multiple scales, and demonstrates that boreal heath communities are particularly at risk from these processes.
Journal Article
Class-Difficulty Based Methods for Long-Tailed Visual Recognition
by Ohashi, Hiroki; Nakamura, Katsuyuki; Sinha, Saptarshi
in Artificial neural networks, Data sampling, Datasets
2022
Long-tailed datasets are very frequently encountered in real-world use cases where few classes or categories (known as majority or head classes) have higher number of data samples compared to the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have come up with multiple weighted loss and data re-sampling techniques in efforts to reduce the bias. However, most of such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weightage or attention. Here, we argue that the assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.
Journal Article
A review on over-sampling techniques in classification of multi-class imbalanced datasets: insights for medical problems
by Aickelin, Uwe; Khorshidi, Hadi Akbarzadeh; Yang, Yuxuan
in Algorithms, Cancer, Cardiovascular disease
2024
There has been growing attention to multi-class classification problems, particularly those challenges of imbalanced class distributions. To address these challenges, various strategies, including data-level re-sampling treatment and ensemble methods, have been introduced to bolster the performance of predictive models and Artificial Intelligence (AI) algorithms in scenarios where excessive level of imbalance is present. While most research and algorithm development have been focused on binary classification problems, in health informatics there is an increased interest in the field to address the problem of multi-class classification in imbalanced datasets. Multi-class imbalance problems bring forth more complex challenges, as a delicate approach is required to generate synthetic data and simultaneously maintain the relationship between the multiple classes. The aim of this review paper is to examine over-sampling methods tailored for medical and other datasets with multi-class imbalance. Out of 2,076 peer-reviewed papers identified through searches, 197 eligible papers were chosen and thoroughly reviewed for inclusion, narrowing to 37 studies being selected for in-depth analysis. These studies are categorised into four categories: metric, adaptive, structure-based, and hybrid approaches. The most significant finding is the emerging trend toward hybrid resampling methods that combine the strengths of various techniques to effectively address the problem of imbalanced data. This paper provides an extensive analysis of each selected study, discusses their findings, and outlines directions for future research.
Journal Article
Antlion re-sampling based deep neural network model for classification of imbalanced multimodal stroke dataset
by Khan, Wazir Zada; Hakak, Saqib; Bhattacharya, Sweta
in Algorithms, Artificial neural networks, Computer Communication Networks
2022
Stroke is listed as one of the leading causes of death and serious disability, affecting millions of human lives across the world, with a high possibility of becoming an epidemic in the next few decades. Timely detection and prompt decision making pertinent to this disease play a major role and can reduce the chances of brain death, paralysis and other resultant outcomes. Machine learning algorithms have been a popular choice for the diagnosis, analysis and prediction of this disease, but there exist issues related to data quality, as the data are collected from cross-institutional resources. The present study focuses on improving the quality of stroke data by implementing a rigorous pre-processing technique. It uses a multimodal stroke dataset available in the public Kaggle repository. The missing values in this dataset are replaced with attribute means, and the LabelEncoder technique is applied to achieve homogeneity. However, the dataset was observed to be imbalanced, which means that the results may not represent the actual accuracy and would be biased. To overcome this imbalance, a resampling technique was used. In oversampling, some data points in the minority class are replicated to increase its cardinality and rebalance the dataset. The transformed and oversampled data are further normalized using the StandardScaler technique. The antlion optimization (ALO) algorithm is implemented on the deep neural network (DNN) model to select optimal hyperparameters with minimal time consumption. The proposed model consumed only 38.13% of the training time, which was also a positive aspect. The experimental results proved the superiority of the proposed model.
Journal Article
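The oversampling step mentioned in the abstract above (replicating minority-class data points until the classes are balanced) can be illustrated with a minimal sketch. This is not the authors' code: the function name `random_oversample`, the toy rows, and the choice to balance every class up to the size of the largest class are assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Replicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) as many copies as the class is short of.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: four rows of class 0 and one row of class 1.
rows = [[61, 0], [45, 1], [52, 0], [38, 0], [79, 1]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

The original rows are kept unchanged at the front of the output and the replicated rows are appended. In practice such resampling is usually applied to the training split only, so that duplicated points do not leak into the evaluation data.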
Class overlap handling methods in imbalanced domain: A comprehensive survey
2024
Class overlap in imbalanced datasets is among the most common challenges for researchers in deep learning (DL), machine learning (ML), and big data (BD) based applications. Class overlap and the intrinsic characteristics of imbalanced data negatively affect the performance of classification models. Data-level, algorithm-level, ensemble, and hybrid methods are the most commonly used solutions to reduce the bias of the standard classification model towards the majority class. The data-level methods change the distribution of class instances, thus increasing information loss and overfitting. The algorithm-level methods attempt to modify the algorithm’s structure to give more weight to misclassified minority class instances in the learning phases. However, changes to the algorithm are less compatible for users. To overcome the issues in these methods, an in-depth discussion of the state-of-the-art methods is required and is thus presented here. In this survey, we present a detailed discussion of the existing methods for handling class overlap in imbalanced datasets, with their advantages, disadvantages, limitations, and the key performance metrics on which each method was shown to outperform others. The detailed comparative analysis, mainly of papers from recent years, discusses and summarizes the research gaps and future directions for researchers in ML, DL, and BD-based applications.
Journal Article
Using Information on Class Interrelations to Improve Classification of Multiclass Imbalanced Data: A New Resampling Algorithm
by Stefanowski, Jerzy; Lango, Mateusz; Janicka, Małgorzata
in Algorithms, data difficulty factors, Datasets
2019
The relations between multiple imbalanced classes can be handled with a specialized approach which evaluates types of examples’ difficulty based on an analysis of the class distribution in the examples’ neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. It has led us to the introduction of a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.
Journal Article
Comparing Different Oversampling Methods in Predicting Multi-Class Educational Datasets Using Machine Learning Techniques
by Tariq, Muhammad Arham; Iftikhar, Muhammad Aksam; Sargano, Allah Bux
in Data re-sampling, Educational data mining, Imbalance educational datasets
2023
Predicting students’ academic performance is a critical research area, yet imbalanced educational datasets, characterized by unequal academic-level representation, present challenges for classifiers. While prior research has addressed the imbalance in binary-class datasets, this study focuses on multi-class datasets. A comparison of ten resampling methods (SMOTE, Adasyn, Distance SMOTE, BorderLineSMOTE, KmeansSMOTE, SVMSMOTE, LN SMOTE, MWSMOTE, Safe Level SMOTE, and SMOTETomek) is conducted alongside nine classification models: K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), Logistic Regression (LR), Extra Tree (ET), Random Forest (RT), Extreme Gradient Boosting (XGB), and Ada Boost (AdaB). Following a rigorous evaluation, including hyperparameter tuning and 10 fold cross-validations, KNN with SmoteTomek attains the highest accuracy of 83.7%, as demonstrated through an ablation study. These results emphasize SMOTETomek’s effectiveness in mitigating class imbalance in educational datasets and highlight KNN’s potential as an educational data mining classifier.
Journal Article
Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy
by Varotto, Giulia; Susi, Gianluca; Panzica, Ferruccio
in Classification, Convulsions & seizures, Datasets
2021
Aim: In neuroscience research, data are quite often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to face this problem and a lot of work has been done in comparing their effectiveness in different scenarios. Notably, the robustness of such techniques has been tested among a wide variety of different datasets, without considering the performance of each specific dataset. In this study, we compare the performances of different resampling procedures for the imbalanced domain in stereo-electroencephalography (SEEG) recordings of the patients with focal epilepsies who underwent surgery. Methods: We considered data obtained by network analysis of interictal SEEG recorded from 10 patients with drug-resistant focal epilepsies, for a supervised classification problem aimed at distinguishing between the epileptogenic and non-epileptogenic brain regions in interictal conditions. We investigated the effectiveness of five oversampling and five undersampling procedures, using 10 different machine learning classifiers. Moreover, six specific ensemble methods for the imbalanced domain were also tested. To compare the performances, Area under the ROC curve (AUC), F-measure, Geometric Mean, and Balanced Accuracy were considered. Results: Both the resampling procedures showed improved performances with respect to the original dataset. The oversampling procedure was found to be more sensitive to the type of classification method employed, with Adaptive Synthetic Sampling (ADASYN) exhibiting the best performances. All the undersampling approaches were more robust than the oversampling among the different classifiers, with Random Undersampling (RUS) exhibiting the best performance despite being the simplest and most basic classification method. 
Conclusions: The application of machine learning techniques that take into consideration the balance of features by resampling is beneficial and leads to more accurate localization of the epileptogenic zone from interictal periods. In addition, our results highlight the importance of the type of classification method that must be used together with the resampling to maximize the benefit to the outcome.
Journal Article
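Three of the evaluation measures named in the abstract above (F-measure, Geometric Mean, and Balanced Accuracy) can be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions, which the abstract itself does not spell out, so the exact variants used in the study are an assumption. The function name `imbalance_metrics` and the counts are hypothetical, and all denominators are assumed to be non-zero.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Binary-class measures often reported for imbalanced data.
    Assumes tp + fn, tn + fp, and tp + fp are all greater than zero."""
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "geometric_mean": math.sqrt(sensitivity * specificity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: 10 positive cases and 90 negative cases.
m = imbalance_metrics(tp=5, fn=5, fp=5, tn=85)
```

With these counts, plain accuracy is high because the majority class dominates, while the other three measures are lower because only half of the minority cases are detected. That gap is the usual reason for reporting them on imbalanced data.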
Three decades of coastal vegetation dynamics in the Castelporziano Presidential Estate: analysing biodiversity shifts in an exceptionally intact coastal dune system
by Sperandii, M. G.; Sarmati, S.; Del Vecchio, S.
in Abundance, Analysis, anthropogenic activities
2025
Mediterranean coastal dunes are among the most threatened ecosystems in Europe. Analysing temporal trends in a site with exceptionally well-preserved zonation and minimal anthropogenic disturbance offers a unique opportunity to deepen our understanding of vegetation dynamics under low-impact conditions in these vulnerable ecosystems. This study examines the temporal dynamics of coastal dune ecosystems within the Castelporziano Presidential Estate, which hosts intact Mediterranean dune systems with complete vegetation zonation. Revisiting 80 historical plots initially surveyed 30 years ago, we analysed changes in plant species occurrence and abundance over time using ordination and similarity percentage analysis. Additionally, we assessed shifts in typical, ruderal, and alien species, ecological indicator values, and an index based on rhizomatous geophyte grasses to evaluate the system’s erosion control capacity. Our results revealed no significant decline in species richness in foredunes and dune grasslands, contrasting with trends observed in other coastal dunes in Central Italy. Instead, we recorded an increase in typical species abundance in foredunes, likely resulting from limited human disturbance over the past 30 years. These changes are probably related to ongoing successional dynamics. Coastal shrublands underwent more pronounced changes, transitioning toward woodlands and experiencing an increase in typical species. These transformations suggest positive successional shifts. Our findings indicate that the coastal dune ecosystem is well-preserved, largely due to restricted human disturbance and effective management. This study underscores the value of resurveying methodologies for monitoring vegetation dynamics, offering critical insights to support conservation efforts for these unique Mediterranean habitats.
Journal Article