73 result(s) for "re-sampling"
Resurveying historical vegetation data — opportunities and challenges
Background: Resurveying historical vegetation plots has become increasingly popular in recent years as it provides a unique opportunity to estimate vegetation and environmental changes over the past decades. Most historical plots, however, are not permanently marked, and uncertainty in plot location, in addition to observer bias and seasonal bias, may add significant errors to estimates of temporal change. These errors may have major implications for the reliability of studies on long-term environmental change and deserve closer attention from vegetation ecologists. Methods: Vegetation data obtained from the resurveying of non-permanently marked plots are assessed for their potential to study environmental change effects on plant communities and for the challenges that the use of such data has to meet. We describe the properties of vegetation resurveys, distinguishing basic types of plots according to relocation error, and we highlight the potential of such data types for studying vegetation dynamics and their drivers. Finally, we summarize the challenges and limitations of resurveying non-permanently marked vegetation plots for different purposes in environmental change research. Results and conclusions: Re-sampling error is caused by three main independent sources of error: error caused by plot relocation, observer bias and seasonality bias. For relocation error, vegetation plots can be divided into permanent and non-permanent plots, while the latter are further divided into quasi-permanent (with approximate relocation) and non-traceable (with random relocation within a sampled area) plots.
To reduce the inherent sources of error in resurvey data, the following precautions should be followed: (i) resurvey historical vegetation plots whose approximate plot location within a study area is known; (ii) consider all information available from historical studies in order to keep plot relocation errors low; (iii) resurvey at times of the year when vegetation development is comparable to the historical survey to control for seasonal variability in vegetation; (iv) retain a high level of experience of the observers to keep observer bias low; and (v) edit and standardize data sets before analyses.
Biotic homogenization of upland vegetation: patterns and drivers at multiple spatial scales over five decades
Questions: Is there evidence for biotic homogenization of upland vegetation? Do the magnitude and nature of floristic and compositional change vary between vegetation types? What can be inferred about the drivers responsible for the observed changes? Location: Upland heath, mire and grassland communities of the northwest Highlands of Scotland, UK. Methods: We re-survey plots first described in a phytosociological study of 1956–1958 to assess the changes in plant species composition over the last 50 yr in five major upland vegetation types. Using a combination of multivariate analysis, dissimilarity measures, diversity metrics and published data on species attributes, we quantify, characterize and link potential drivers of environmental change with the observed changes in species composition. Results: Grassland and heath vegetation declined in species richness and variation in community composition, while mires showed little change. Previously distinct vegetation types became more similar in composition, characterized by the increased dominance of generalist upland graminoids and reduced dwarf-shrub, forb and lichen cover, although novel assemblages were not apparent. Species with an oceanic distribution increased at the expense of those with an arctic-montane distribution. Temperature, precipitation and acidity were found to be potentially important in explaining changes in species composition: species that had undergone the greatest increases had a preference for warmer, drier and more acidic conditions. Conclusions: The vegetation of the northwest Scottish Highlands has undergone marked biotic homogenization over the last 50 yr, manifested through a loss of various aspects of diversity at the local, community and landscape scales. The magnitude of change varies between vegetation types, although the nature of change shows many similar characteristics.
Analyses of species attributes suggest these changes are driven by climate warming and acidification, although over-grazing may also be important. This study highlights the importance of the link between the loss of plant diversity and homogenization at multiple scales, and demonstrates that boreal heath communities are particularly at risk from these processes.
Class-Difficulty Based Methods for Long-Tailed Visual Recognition
Long-tailed datasets are very frequently encountered in real-world use cases where a few classes or categories (known as majority or head classes) have a higher number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have come up with multiple weighted loss and data re-sampling techniques in efforts to reduce the bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called ‘class-wise difficulty based weighted (CDB-W) loss’ and a novel data sampling technique called ‘class-wise difficulty based sampling (CDB-S)’. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets, such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.
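The idea of weighting classes by measured difficulty rather than by sample count can be sketched as follows. The abstract does not give the formula, so this sketch assumes that difficulty is one minus per-class validation accuracy and that the weight is that difficulty raised to a hypothetical exponent `tau`; the function name and the example counts are illustrative only.

```python
def class_difficulty_weights(correct, total, tau=1.0):
    """Return one loss weight per class from per-class accuracy.

    Assumption (not stated in the abstract): difficulty = 1 - accuracy,
    weight = difficulty ** tau, rescaled so the weights sum to the
    number of classes.
    """
    difficulties = [1.0 - c / t for c, t in zip(correct, total)]
    raw = [d ** tau for d in difficulties]
    if sum(raw) == 0:
        # Every class is classified perfectly: fall back to uniform weights.
        return [1.0] * len(raw)
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical validation counts for a head, a middle and a tail class.
correct = [90, 40, 8]
total = [100, 50, 10]
weights = class_difficulty_weights(correct, total)
```

In this made-up example the tail class (10 samples) and the middle class (50 samples) have the same accuracy, so they receive the same weight even though their sizes differ, which is the kind of case the abstract argues count-based weighting would miss.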
Antlion re-sampling based deep neural network model for classification of imbalanced multimodal stroke dataset
Stroke is listed as one of the leading causes of death and serious disability, affecting millions of lives across the world, with a high possibility of becoming an epidemic in the next few decades. Timely detection and prompt decision making play a major role in reducing the chances of brain death, paralysis and other resultant outcomes. Machine learning algorithms have been a popular choice for the diagnosis, analysis and prediction of this disease, but there are issues related to data quality because the data are collected from cross-institutional resources. The present study focuses on improving the quality of stroke data by implementing a rigorous pre-processing technique. It uses a multimodal stroke dataset available in the public Kaggle repository. The missing values in this dataset are replaced with attribute means, and the LabelEncoder technique is applied to achieve homogeneity. However, the dataset was observed to be imbalanced, which means that the results may not represent the actual accuracy and would be biased. To overcome this imbalance, a resampling technique was used. In oversampling, some data points in the minority class are replicated to increase its cardinality and rebalance the dataset. The transformed and oversampled data are further normalized using the StandardScaler technique. The Antlion optimization (ALO) algorithm is applied to the deep neural network (DNN) model to select optimal hyperparameters in minimal time. The proposed model consumed only 38.13% of the training time, which was also a positive aspect. The experimental results proved the superiority of the proposed model.
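The pre-processing steps named in this abstract (mean imputation, oversampling by replication, standardization) can be sketched in plain Python. This is a minimal illustration under assumptions, not the study's pipeline: the helper names and the toy `ages` column are hypothetical, and `standardize` only mirrors the per-feature rescaling to zero mean and unit variance that scikit-learn's StandardScaler performs.

```python
import random
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def random_oversample(rows, labels, seed=0):
    """Replicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def standardize(column):
    """Rescale a feature to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Toy data (hypothetical): one feature, three majority rows, one minority row.
ages = impute_mean([60, None, 80, 70])
rows, labels = random_oversample([[a] for a in ages], [0, 0, 0, 1])
scaled = standardize([r[0] for r in rows])
```

Replicated rows are exact copies, so oversampling is normally applied to the training split only; copies that leak into a test split inflate the measured accuracy.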
Class overlap handling methods in imbalanced domain: A comprehensive survey
Class overlap in imbalanced datasets is the most common challenge for researchers in deep learning (DL), machine learning (ML), and big data (BD) based applications. Class overlap and the intrinsic characteristics of imbalanced data negatively affect the performance of classification models. The data-level, algorithm-level, ensemble, and hybrid methods are the most commonly used solutions to reduce the bias of the standard classification model towards the majority class. The data-level methods change the distribution of class instances, thus increasing information loss and overfitting. The algorithm-level methods attempt to modify the structure of the algorithm to give more weight to misclassified minority class instances in the learning phases. However, the changes in the algorithm are less compatible for the users. To overcome the issues in these methods, an in-depth discussion of the state-of-the-art methods is required and is thus presented here. In this survey, we present a detailed discussion of the existing methods for handling class overlap in imbalanced datasets, with their advantages, disadvantages, limitations, and the key performance metrics on which each method performed best. The detailed comparative analysis, mainly of papers from recent years, discusses and summarizes the research gaps and future directions for researchers in ML, DL, and BD-based applications.
Comparing Different Oversampling Methods in Predicting Multi-Class Educational Datasets Using Machine Learning Techniques
Predicting students’ academic performance is a critical research area, yet imbalanced educational datasets, characterized by unequal academic-level representation, present challenges for classifiers. While prior research has addressed the imbalance in binary-class datasets, this study focuses on multi-class datasets. A comparison of ten resampling methods (SMOTE, ADASYN, Distance SMOTE, BorderLineSMOTE, KmeansSMOTE, SVMSMOTE, LN SMOTE, MWSMOTE, Safe Level SMOTE, and SMOTETomek) is conducted alongside nine classification models: K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), Logistic Regression (LR), Extra Tree (ET), Random Forest (RF), Extreme Gradient Boosting (XGB), and Ada Boost (AdaB). Following a rigorous evaluation, including hyperparameter tuning and 10-fold cross-validation, KNN with SMOTETomek attains the highest accuracy of 83.7%, as demonstrated through an ablation study. These results emphasize SMOTETomek’s effectiveness in mitigating class imbalance in educational datasets and highlight KNN’s potential as an educational data mining classifier.
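Most of the resampling methods listed here are variants of SMOTE, whose core step is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. A minimal sketch of that single step follows; the function name, the parameter defaults and the toy points are assumptions for illustration, and none of the variant-specific logic (borderline selection, Tomek-link cleaning and so on) is included.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Assumes the minority points are distinct tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical): four points on a unit square.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_like(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies.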
Accessible Conceptions of Statistical Inference: Pulling Ourselves Up by the Bootstraps
With the rapid, ongoing expansions in the world of data, we need to devise ways of getting more students much further, much faster. One of the choke points affecting both accessibility to a broad spectrum of students and faster progress is classical statistical inference based on normal theory. In this paper, bootstrap-based confidence intervals and randomisation tests conveyed through dynamic visualisation are developed as a means of reducing cognitive demands and increasing the speed with which application areas can be opened up. We also discuss conceptual pathways and the design of software developed to enable this approach.
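The two procedures named in this abstract can be sketched without any normal-theory machinery. The code below is a generic percentile bootstrap interval and a two-sample randomisation test for a difference in means, not the software the paper describes; the function names, defaults and toy data are assumptions.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(
        stat([rng.choice(sample) for _ in range(n)]) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


def randomisation_test(a, b, n_perm=2000, seed=0):
    """Approximate two-sided p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm


# Toy data (hypothetical).
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
low, high = bootstrap_ci(group_a)
p_value = randomisation_test(group_a, group_b)
```

Both functions only resample and recompute, which is what makes them easy to animate step by step in the kind of dynamic visualisation the abstract mentions.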
Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy
Aim: In neuroscience research, data are quite often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to face this problem, and a lot of work has been done in comparing their effectiveness in different scenarios. Notably, the robustness of such techniques has been tested across a wide variety of different datasets, without considering the performance on each specific dataset. In this study, we compare the performance of different resampling procedures for the imbalanced domain in stereo-electroencephalography (SEEG) recordings of patients with focal epilepsies who underwent surgery. Methods: We considered data obtained by network analysis of interictal SEEG recorded from 10 patients with drug-resistant focal epilepsies, for a supervised classification problem aimed at distinguishing between the epileptogenic and non-epileptogenic brain regions in interictal conditions. We investigated the effectiveness of five oversampling and five undersampling procedures, using 10 different machine learning classifiers. Moreover, six specific ensemble methods for the imbalanced domain were also tested. To compare the performances, Area under the ROC curve (AUC), F-measure, Geometric Mean, and Balanced Accuracy were considered. Results: Both resampling procedures showed improved performance with respect to the original dataset. The oversampling procedure was found to be more sensitive to the type of classification method employed, with Adaptive Synthetic Sampling (ADASYN) exhibiting the best performance. All the undersampling approaches were more robust than the oversampling approaches across the different classifiers, with Random Undersampling (RUS) exhibiting the best performance despite being the simplest and most basic resampling method.
Conclusions: The application of machine learning techniques that take into consideration the balance of features by resampling is beneficial and leads to more accurate localization of the epileptogenic zone from interictal periods. In addition, our results highlight the importance of the type of classification method that must be used together with the resampling to maximize the benefit to the outcome.
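Three of the four evaluation measures named in this abstract can be computed directly from a binary confusion matrix, as sketched below (AUC is omitted because it needs ranked scores rather than counts). The function name and the example counts are hypothetical, and the sketch assumes each denominator is non-zero.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Balanced accuracy, geometric mean and F-measure from counts.

    tp/fn count the minority (positive) class, fp/tn the majority class.
    """
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "geometric_mean": math.sqrt(sensitivity * specificity),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts: 10 positive and 90 negative cases.
metrics = imbalance_metrics(tp=8, fn=2, fp=18, tn=72)
```

Balanced accuracy and the geometric mean average the two per-class recalls, so a classifier cannot score well on them by favouring the majority class, while the F-measure additionally penalises false positives through precision.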
Using Information on Class Interrelations to Improve Classification of Multiclass Imbalanced Data: A New Resampling Algorithm
The relations between multiple imbalanced classes can be handled with a specialized approach which evaluates types of examples’ difficulty based on an analysis of the class distribution in the examples’ neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. It has led us to the introduction of a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.
Allometric models for non-destructive estimation of the leaflet area in acai (Euterpe oleracea Mart.)
Key message: The leaflet area of acai (Euterpe oleracea) can be estimated by an exponential regression model adjusted by the relationship of leaflet maximum length and width. This work was carried out aiming to fit linear regression models for the non-invasive estimation of leaflet area (LA) in acai (Euterpe oleracea Mart.). Thus, 5010 leaflets were sampled from 403 fronds sampled on 100 acai seedlings. Maximum length (LL) and width (LW) of each leaflet were measured with a ruler and LA was determined using a leaf area meter. Half of the data set was used to adjust the models and the other half was used for model validation. The Jackknife re-sampling method was applied to reduce model bias. Two double-entry models (models A and B) were fitted using LL and LW simultaneously, while these linear dimensions of the leaves were separately considered in single-entry models (models C to F). The adjusted coefficients of determination varied between 0.9075 and 0.9785, with the highest values observed in models A and B, which also showed the lowest standard error of the estimate and Akaike's information criterion (AIC) score. All models were highly accurate in estimating LA, with values above 0.9156; however, the double-entry models A and B showed the best performance regarding the relationship between estimated and observed LA. Comparing the double-entry models, the lowest AIC score in model B indicates that this model is the most parsimonious for non-invasive estimation of acai leaflet area in relation to model A. Therefore, the equation $LA = 1.0147\,e^{0.3685 + 0.8165 \ln(LL \times LW)}$, deduced from model B, is the more precise model for the non-invasive determination of leaflet area in acai seedlings.
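The model B equation and the jackknife idea can be sketched as follows. The equation is read here as LA = 1.0147 × exp(0.3685 + 0.8165 × ln(LL × LW)), which is an interpretation of the flattened formula in the abstract. The abstract does not say how the jackknife was applied to the regression models, so the second function only illustrates leave-one-out resampling on a simple mean; the function names and sample values are hypothetical, and measurement units are not stated in the abstract.

```python
import math


def leaflet_area(ll, lw):
    """Leaflet area from maximum length (ll) and width (lw), model B.

    Assumed reading: LA = 1.0147 * exp(0.3685 + 0.8165 * ln(LL * LW)).
    """
    return 1.0147 * math.exp(0.3685 + 0.8165 * math.log(ll * lw))


def jackknife_mean(values):
    """Jackknife estimate of the mean and its standard error.

    Each leave-one-out replicate is the mean of the data with one
    observation removed.
    """
    n = len(values)
    total = sum(values)
    replicates = [(total - v) / (n - 1) for v in values]
    centre = sum(replicates) / n
    variance = (n - 1) / n * sum((r - centre) ** 2 for r in replicates)
    return centre, math.sqrt(variance)


# Hypothetical leaflet dimensions and a toy sample for the jackknife.
area = leaflet_area(20.0, 2.0)
estimate, std_error = jackknife_mean([1, 2, 3, 4, 5])
```

For the mean, the jackknife standard error equals the familiar sample standard deviation divided by the square root of n, which makes this a convenient sanity check before applying the same leave-one-out loop to a fitted model.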