5,119 result(s) for "Resampling"
Negative Association, Ordering and Convergence of Resampling Methods
We study convergence and convergence rates for resampling schemes. Our first main result is a general consistency theorem based on the notion of negative association, which is applied to establish the almost sure weak convergence of measures output from Kitagawa’s [J. Comput. Graph. Statist. 5 (1996) 1–25] stratified resampling method. Carpenter, Clifford and Fearnhead’s [IEE Proc. Radar Sonar Navig. 146 (1999) 2–7] systematic resampling method is similar in structure but can fail to converge depending on the order of the input samples. We introduce a new resampling algorithm based on a stochastic rounding technique of [In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) (2001) 588–597 IEEE Computer Soc.], which shares some attractive properties of systematic resampling, but which exhibits negative association and, therefore, converges irrespective of the order of the input samples. We confirm a conjecture made by [J. Comput. Graph. Statist. 5 (1996) 1–25] that ordering input samples by their states in ℝ yields a faster rate of convergence; we establish that when particles are ordered using the Hilbert curve in ℝ^d, the variance of the resampling error is O(N^(−(1+1/d))) under mild conditions, where N is the number of particles. We use these results to establish asymptotic properties of particle algorithms based on resampling schemes that differ from multinomial resampling.
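Stratified and systematic resampling differ only in how they draw uniforms across the N strata of [0, 1), which is exactly what drives the negative-association property discussed above. A minimal numpy sketch of both (function names and structure are ours, not from the paper):

```python
import numpy as np

def stratified_resample(weights, rng):
    """Stratified resampling: one independent uniform per stratum [i/N, (i+1)/N)."""
    N = len(weights)
    u = (np.arange(N) + rng.uniform(size=N)) / N
    return np.searchsorted(np.cumsum(weights), u)

def systematic_resample(weights, rng):
    """Systematic resampling: a single uniform offset shared by all strata."""
    N = len(weights)
    u = (np.arange(N) + rng.uniform()) / N
    return np.searchsorted(np.cumsum(weights), u)

rng = np.random.default_rng(0)
w = rng.random(100); w /= w.sum()      # normalized particle weights
idx = stratified_resample(w, rng)      # indices of surviving particles
```

The independent per-stratum uniforms make the offspring counts negatively associated; reusing a single uniform is cheaper but, as the abstract notes, makes convergence sensitive to the order of the input samples.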
UFBoot2: Improving the Ultrafast Bootstrap Approximation
The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.
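At its core, every bootstrap variant mentioned here resamples alignment columns (sites) with replacement before re-estimating the tree. A hedged sketch of that resampling step (a generic illustration, not UFBoot2's optimized implementation):

```python
import numpy as np

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` is an (n_taxa, n_sites) array of characters; each
    bootstrap replicate keeps the taxa but redraws the site columns."""
    n_sites = alignment.shape[1]
    cols = rng.integers(0, n_sites, size=n_sites)
    return alignment[:, cols]

rng = np.random.default_rng(0)
aln = np.array([list("ACGTACGT"), list("ACGTACGA"), list("ACTTACGT")])
replicate = bootstrap_alignment(aln, rng)  # one pseudo-alignment
```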
Resampling strategies for imbalanced regression: a survey and empirical analysis
Imbalanced problems arise in many real-world situations, and resampling or balancing algorithms have been proposed to address them. The issue has largely been studied in the context of classification, yet the same problem appears in regression tasks, where target values are continuous. This work presents an extensive experimental study comprising various balancing and predictive models, using metrics that capture elements important to the user and evaluate the predictive model in an imbalanced regression context. It also proposes a taxonomy for imbalanced regression approaches based on three crucial criteria: regression model, learning process, and evaluation metrics. The study offers new insights into the use of such strategies, highlighting the advantages they bring to each model’s learning process and indicating directions for further studies. The code, data and further information related to the experiments performed herein can be found on GitHub: https://github.com/JusciAvelino/imbalancedRegression.
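As one concrete member of the resampling family this survey covers, here is a minimal sketch of random oversampling for regression. The tail-based rarity rule and the `ratio` parameter are illustrative assumptions of ours, not the survey's relevance function:

```python
import numpy as np

def random_oversample_regression(X, y, rare_mask, rng, ratio=1.0):
    """Duplicate randomly chosen rare examples until the rare subset
    is `ratio` times the size of the normal subset."""
    rare_idx = np.flatnonzero(rare_mask)
    normal_idx = np.flatnonzero(~rare_mask)
    n_target = int(ratio * len(normal_idx))
    extra = rng.choice(rare_idx, size=max(n_target - len(rare_idx), 0), replace=True)
    keep = np.concatenate([normal_idx, rare_idx, extra])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 4)), rng.normal(size=500)
rare = np.abs(y) > 1.5           # crude rarity rule: distribution tails
Xb, yb = random_oversample_regression(X, y, rare, rng)
```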
Determination of the binding and DK probability of the D_{s0}^*(2317) from the (D̄K̄)^- mass distributions in Λ_b → Λ_c (D̄K̄)^- decays
We study the Λ_b → Λ_c D̄^0 K^- and Λ_b → Λ_c D^- K̄^0 decays, which proceed via a Cabibbo- and N_c-favored process of external emission, and we determine the D̄^0 K^- and D^- K̄^0 mass distributions close to the D̄K̄ threshold. For this, we use the tree-level contribution plus the rescattering of the meson-meson components, using the extension of the local hidden gauge approach to the charm sector that produces the D_{s0}^*(2317) resonance. We observe a large enhancement of the mass distributions close to threshold due to the presence of this resonance below threshold. Next, we undertake the inverse problem of extracting the maximum information on the interaction of the D̄K̄ channels from these distributions; using the resampling method, we find that from these data one can obtain precise values of the scattering lengths and effective ranges, the existence of an I=0 bound state with a precision of about 4 MeV in the mass, plus the D̄K̄ molecular probability of this state with reasonable precision. Given that the Λ_b → Λ_c D̄^0 K^- decay has already been measured by the LHCb collaboration, it is expected that in the next runs, with more statistics, these mass distributions can be measured precisely and the method proposed here can be used to determine the nature of the D_{s0}^*(2317), which is still a matter of debate.
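The resampling method invoked here amounts, in essence, to refitting many synthetic datasets fluctuated around the measured distribution and reading parameter uncertainties off the spread. A generic toy sketch under that assumption (the `fit` callable stands in for the authors' amplitude fit; the Gaussian pseudo-data are invented):

```python
import numpy as np

def resample_and_fit(counts, fit, rng, n_boot=1000):
    """Bootstrap a binned mass distribution: fluctuate each bin with
    Poisson noise, refit, and collect the fitted parameters."""
    params = []
    for _ in range(n_boot):
        pseudo = rng.poisson(counts)   # one pseudo-dataset
        params.append(fit(pseudo))     # user-supplied fit routine
    params = np.asarray(params)
    return params.mean(axis=0), params.std(axis=0)  # central value, spread

rng = np.random.default_rng(0)
m = np.linspace(0.0, 1.0, 40)
counts = rng.poisson(200.0 * np.exp(-0.5 * ((m - 0.4) / 0.1) ** 2))
peak_position = lambda c: m[np.argmax(c)]          # crude stand-in "fit"
center, error = resample_and_fit(counts, peak_position, rng)
```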
Pearson's Chi-square Test and Rank Correlation Inferences for Clustered Data
Pearson's chi-square test has been widely used to test for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are valid only when the pairs of responses are independent, that is, when each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large-sample properties of the proposed tests and estimators and evaluate their performance by simulation. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration.
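The within-cluster resampling idea is easy to state in code: repeatedly keep one pair per cluster, so each resample satisfies the independence assumption, then average the resulting statistics. A simplified sketch (the paper's actual construction works through U-statistics and handles ties more carefully):

```python
import numpy as np
from scipy.stats import spearmanr

def wcr_spearman(clusters, rng, n_resamples=1000):
    """Within-cluster resampling: draw one (x, y) pair per cluster,
    compute Spearman's rho on the resulting independent pairs, and
    average over many resamples.

    `clusters` is a list of (n_i, 2) arrays of paired responses."""
    rhos = []
    for _ in range(n_resamples):
        pairs = np.array([c[rng.integers(len(c))] for c in clusters])
        rhos.append(spearmanr(pairs[:, 0], pairs[:, 1]).correlation)
    return np.mean(rhos)

rng = np.random.default_rng(0)
# Toy clustered data: a shared cluster effect induces correlation.
clusters = [rng.normal(size=(rng.integers(2, 6), 2)) + rng.normal()
            for _ in range(50)]
rho = wcr_spearman(clusters, rng)
```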
A rule-based machine learning model for financial fraud detection
Financial fraud is a growing problem that poses a significant threat to the banking industry, the government sector, and the public. In response, financial institutions must continuously improve their fraud detection systems. Although preventative and security precautions are implemented to reduce financial fraud, criminals are constantly adapting and devising new ways to evade fraud prevention systems. Classifying transactions as legitimate or fraudulent poses a significant challenge for existing classification models due to highly imbalanced datasets. This research aims to develop rules that detect fraudulent transactions without involving any resampling technique. The effectiveness of the rule-based model (RBM) is assessed using a variety of metrics such as accuracy, specificity, precision, recall, the confusion matrix, the Matthews correlation coefficient (MCC), and receiver operating characteristic (ROC) values. The proposed rule-based model is compared to several existing machine learning models such as random forest (RF), decision tree (DT), multi-layer perceptron (MLP), k-nearest neighbor (KNN), naive Bayes (NB), and logistic regression (LR) using two benchmark datasets. The experimental results show that the proposed rule-based model outperformed the other methods, achieving accuracy and precision of 0.99 each.
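A rule-based model in this sense is a set of interpretable if-then conditions evaluated without any resampling of the training data. A toy sketch with invented thresholds and features (nothing here comes from the paper's rule set):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def rule_based_predict(X):
    """Toy rule set (thresholds are illustrative, not from the paper):
    flag a transaction when the amount is extreme AND it occurs at an
    unusual hour."""
    amount, hour = X[:, 0], X[:, 1]
    return ((amount > 5000) & ((hour < 6) | (hour > 22))).astype(int)

rng = np.random.default_rng(0)
X = np.column_stack([rng.exponential(1000, 10000),
                     rng.integers(0, 24, 10000)])
y_true = rule_based_predict(X) ^ (rng.random(10000) < 0.01)  # noisy labels
print(matthews_corrcoef(y_true, rule_based_predict(X)))      # MCC evaluation
```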
Two-Step Estimation and Inference with Possibly Many Included Covariates
We study the implications of including many covariates in a first-step estimate entering a two-step estimation procedure. We find that a first-order bias emerges when the number of included covariates is “large” relative to the square root of the sample size, rendering standard inference procedures invalid. We show that the jackknife is able to estimate this “many covariates” bias consistently, thereby delivering a new automatic bias-corrected two-step point estimator. The jackknife also consistently estimates the standard error of the original two-step point estimator. For inference, we develop a valid post-bias-correction bootstrap approximation that accounts for the additional variability introduced by the jackknife bias correction. We find that the jackknife bias-corrected point estimator and the bootstrap post-bias-correction inference perform excellently in simulations, offering important improvements over conventional two-step point estimators and inference procedures, which are not robust to including many covariates. We apply our results to an array of distinct treatment effect, policy evaluation, and other applied microeconomics settings. In particular, we discuss production function and marginal treatment effect estimation in detail.
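The delete-one jackknife behind the bias correction is simple to sketch in isolation (a generic one-dimensional illustration, not the paper's two-step setting with many covariates):

```python
import numpy as np

def jackknife_bias_correct(data, estimator):
    """Delete-one jackknife: estimate the bias of `estimator` and return
    a bias-corrected point estimate plus a jackknife standard error."""
    n = len(data)
    theta_full = estimator(data)
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta_full)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return theta_full - bias, se

rng = np.random.default_rng(0)
x = rng.exponential(size=200)
# log of the mean is a nonlinear, hence biased, estimator of log E[X]:
est, se = jackknife_bias_correct(x, lambda d: np.log(d.mean()))
```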
Feasibility of Low Latency, Single-Sample Delay Resampling: A New Kriging Based Method
Wireless sensor systems often fail to provide measurements with uniform time spacing. Measurements can be delayed or even missing entirely. Resampling to uniform intervals is necessary to satisfy the requirements of subsequent signal processing. Common resampling algorithms, based on symmetric finite impulse response (FIR) filters, entail a group delay of tens of samples, which is unacceptable given the typical sampling intervals of wireless sensors of seconds or minutes. The purpose of this paper is to verify the feasibility of single-delay resampling, i.e., resampling the data without waiting for future samples. A new method to parametrize Kriging interpolation is presented and compared with two variants of Lagrange interpolation in detailed simulations of the resulting prediction error. Kriging provided the most accurate resampling in the group-delay scenario. The single-delay scenario required almost double the oversampling ratio (OSR) to achieve the same signal-to-noise ratio (SNR). An OSR between 1.8 and 3.1 was necessary for single-delay resampling, depending on the required SNR and on signal distortions in terms of jitter, missing samples, and noise. Kriging was the least noise-sensitive method; especially for signals with missing samples, Kriging provided the best accuracy. The simulations showed that single-delay resampling is feasible, but at the expense of a higher OSR and limited SNR.
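Kriging prediction at uniform grid points from irregular past samples reduces to solving one linear system. A minimal zero-mean (simple kriging) sketch with an assumed squared-exponential covariance; the `length` and `noise` hyperparameters are illustrative, not the paper's parametrization:

```python
import numpy as np

def kriging_resample(t_obs, y_obs, t_new, length=1.0, noise=1e-4):
    """Simple-kriging (zero-mean GP) prediction of y at t_new from
    irregularly spaced samples, with a squared-exponential covariance."""
    def cov(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = cov(t_obs, t_obs) + noise * np.eye(len(t_obs))
    w = np.linalg.solve(K, cov(t_obs, t_new))   # kriging weights
    return w.T @ y_obs

# Irregular samples of a slow sine, resampled onto a uniform grid:
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 40))
y = np.sin(t)
t_uniform = np.arange(0, 10, 0.25)
y_uniform = kriging_resample(t, y, t_uniform)
```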
High-Resolution High-Squint Large-Scene Spaceborne Sliding Spotlight SAR Processing via Joint 2D Time and Frequency Domain Resampling
A frequency domain imaging algorithm, featuring joint two-dimensional (2D) time and frequency domain resampling, is proposed in this paper for high-resolution high-squint large-scene (HHL) spaceborne sliding spotlight synthetic aperture radar (SAR) processing. Due to the nonlinear beam rotation during HHL data acquisition, the Doppler centroid varies nonlinearly with azimuth time, and traditional sub-aperture approaches and the two-step approach fail to remove the inherent Doppler aliasing of spaceborne sliding spotlight SAR data. In addition, the curved-orbit effect and the long synthetic aperture time make the range histories difficult to model and introduce space-variance in both range and azimuth. In this paper, we use azimuth deramping and 2D time-domain azimuth resampling, collectively referred to as preprocessing, to eliminate the aliasing in the Doppler domain and correct the range-dependent azimuth-variance of the range histories. After preprocessing, the squint sliding spotlight SAR data can be treated as equivalent broadside strip-map SAR data during processing. Frequency domain focusing, which mainly involves phase multiplication and resampling in the 2D frequency and range-Doppler (RD) domains, is then applied to compensate for the residual space-variance and achieve the focusing of the SAR data. Moreover, to accommodate higher-resolution and larger-scene cases, the combination of the proposed algorithm with a partitioning strategy is also discussed. Processing results on simulated data and Gaofen-3 experimental data are presented to demonstrate the feasibility of the proposed methods.
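Azimuth deramping, the first preprocessing step named above, multiplies the azimuth signal by the conjugate of a reference chirp so the Doppler centroid no longer sweeps across the spectrum. A generic single-tone sketch (the chirp rate and PRF are illustrative values we chose, not Gaofen-3 parameters):

```python
import numpy as np

def azimuth_deramp(signal, t, k_rot):
    """Multiply by the conjugate of a reference chirp exp(j*pi*k_rot*t^2)
    to remove the linear sweep of the Doppler centroid (deramping)."""
    return signal * np.exp(-1j * np.pi * k_rot * t**2)

prf, k_rot = 2000.0, 1500.0            # illustrative values only
t = np.arange(-1.0, 1.0, 1.0 / prf)    # azimuth slow time (s)
# A chirped target signal riding on a 50 Hz tone:
raw = np.exp(1j * np.pi * k_rot * t**2) * np.exp(2j * np.pi * 50.0 * t)
deramped = azimuth_deramp(raw, t, k_rot)  # spectrum is now a single 50 Hz tone
```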
A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare
In many healthcare applications, datasets for classification may be highly imbalanced due to the rare occurrence of target events such as disease onset. The SMOTE (Synthetic Minority Over-sampling Technique) algorithm has been developed as an effective resampling method for imbalanced data classification by oversampling the minority class. However, samples generated by SMOTE may be ambiguous, of low quality, and inseparable from the majority class. To enhance the quality of the generated samples, we propose a novel self-inspected adaptive SMOTE (SASMOTE) model that leverages an adaptive nearest-neighborhood selection algorithm to identify “visible” nearest neighbors, which are used to generate samples likely to fall into the minority class. To further enhance the quality of the generated samples, an uncertainty elimination via self-inspection approach is introduced into the proposed SASMOTE model. Its objective is to filter out generated samples that are highly uncertain and inseparable from the majority class. The effectiveness of the proposed algorithm is compared with existing SMOTE-based algorithms and demonstrated through two real-world case studies in healthcare: risk gene discovery and fatal congenital heart disease prediction. By generating higher-quality synthetic samples, the proposed algorithm helps achieve better prediction performance (in terms of F1 score) on average compared to the other methods, which is promising for enhancing the usability of machine learning models on highly imbalanced healthcare data.
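For reference, the classic SMOTE step that SASMOTE refines interpolates between a minority sample and one of its k nearest minority neighbors. A self-contained sketch of that baseline (not the adaptive, self-inspected variant proposed here):

```python
import numpy as np

def smote_samples(X_min, rng, n_new, k=5):
    """Classic SMOTE step (not the SASMOTE variant): interpolate between
    a minority sample and one of its k nearest minority neighbors."""
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest neighbors
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        lam = rng.random()                     # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 3))               # toy minority class
synthetic = smote_samples(X_min, rng, n_new=40)
```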