Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
1,640 results for "kernel density estimation"
Enhancing Broiler Weight Estimation through Gaussian Kernel Density Estimation Modeling
2024
The management of individual weights in broiler farming is not only crucial for increasing farm income but also directly linked to the revenue growth of integrated broiler companies, making it an issue that demands prompt attention. This paper proposes a model for estimating daily average broiler weights from time and weight data collected through scales. In the proposed model, the weights in the bandwidth calculation formula are self-adjusting, and a representative value of the daily average weight is estimated using kernel density estimation (KDE). The focus of this study is to contribute to the individual weight management of broilers by closely examining daily fluctuations in average broiler weight. To this end, weight and time data are collected through scales and preprocessed. The Gaussian kernel density estimation model proposed in this paper estimates a representative value of the daily average weight of a single broiler using statistical estimation methods, with bandwidth values that adjust themselves to the data. When applied to the dataset collected through scales, the proposed Gaussian KDE model with self-adjusting bandwidth produced estimated daily weights that did not deviate from the actual measured values by more than ±50 g. The next steps of this study are to systematically understand the impact of the broiler environment on weight for sustainable management strategies that meet broiler demand, to derive optimal rearing conditions for each farm by combining location and weight data, and to develop a model for predicting daily average weight values. The ultimate goal is to develop an artificial intelligence model suitable for weight management systems that uses the estimated daily average weight of a single broiler even in the presence of erroneous data from multiple weight measurements, enabling more efficient automatic measurement of broiler weight in support of both farms and broiler demand.
Journal Article
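The core of the approach described above can be sketched briefly: fit a Gaussian KDE to one day's scale readings and take the density mode as the representative daily weight. The simulated data, the Silverman-style bandwidth rule, and the grid resolution below are illustrative assumptions, not the paper's self-adjusting bandwidth formula.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative only: simulated scale readings (grams) for one day.
rng = np.random.default_rng(0)
weights = rng.normal(loc=1500.0, scale=80.0, size=500)

# Silverman-style rule of thumb as a stand-in for the paper's
# self-adjusting bandwidth formula (an assumption, not their method).
h = 1.06 * weights.std(ddof=1) * len(weights) ** (-1 / 5)

# gaussian_kde takes a bandwidth *factor* relative to the sample std. dev.
kde = gaussian_kde(weights, bw_method=h / weights.std(ddof=1))

# Use the density mode on a fine grid as the representative daily weight.
grid = np.linspace(weights.min(), weights.max(), 2000)
representative = grid[np.argmax(kde(grid))]
print(f"estimated representative daily weight: {representative:.1f} g")
```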
Integrated statistical modeling method: part I—statistical simulations for symmetric distributions
by Kang, Young-Jin; Noh, Yoojeong; Lim, O-Kaung
in Adequacy; Computational Mathematics and Numerical Analysis; Computer simulation
2019
The use of parametric and nonparametric statistical modeling methods differs depending on data sufficiency. For sufficient data, the parametric statistical modeling method is preferred owing to its high convergence to the population distribution. Conversely, for insufficient data, the nonparametric method is preferred owing to its high flexibility and conservative modeling of the given data. However, it is difficult for users to select either a parametric or a nonparametric modeling method because the adequacy of either depends on how well the given data represent the population model, which is unknown to users. For insufficient data or limited prior information on random variables, the interval approach, which uses interval information on the data or random variables, can be used. However, it remains difficult to use in uncertainty analysis and design owing to the imprecise probabilities it produces. In this study, to overcome this problem, an integrated statistical modeling (ISM) method that combines the parametric, nonparametric, and interval approaches is proposed. The ISM method uses the two-sample Kolmogorov–Smirnov (K–S) test to decide whether to use the parametric or the nonparametric method according to data sufficiency. Sequential statistical modeling (SSM) and kernel density estimation with estimated bounded data (KDE-ebd) are used as the parametric and nonparametric methods combined with the interval approach, respectively. To verify the modeling accuracy, conservativeness, and convergence of the proposed method, it is compared with the original SSM and KDE-ebd across various sample sizes and distribution types in simulation tests. Through an engineering and reliability analysis example, it is shown that the proposed ISM method has the highest accuracy and reliability in statistical modeling regardless of data sufficiency. The ISM method is applicable to real engineering data, is conservative in the reliability analysis for insufficient data, unlike SSM, and converges to the exact probability of failure more rapidly than KDE-ebd as data increase.
Journal Article
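The decision step at the heart of the ISM method, using a two-sample K–S test to choose between a parametric fit and a KDE, might look roughly like the sketch below. The normal-fit resampling scheme and the 0.05 significance level are assumptions for illustration; the paper's actual procedure combines SSM and KDE-ebd with interval information.

```python
import numpy as np
from scipy import stats

def choose_model(data, alpha=0.05, seed=0):
    """Switch between a parametric fit and a KDE (illustration only).

    A normal distribution is fitted, a synthetic sample is drawn from it,
    and a two-sample K-S test judges whether the parametric fit is adequate.
    The actual ISM method instead combines SSM and KDE-ebd with interval data.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = stats.norm.fit(data)
    synthetic = rng.normal(mu, sigma, size=len(data))
    _, p_value = stats.ks_2samp(data, synthetic)
    if p_value > alpha:
        return "parametric", stats.norm(mu, sigma)
    return "nonparametric", stats.gaussian_kde(data)

sample = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.6, size=40)
label, model = choose_model(sample)
print("selected model:", label)
```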
Development of a kernel density estimation with hybrid estimated bounded data
2018
Uncertainty quantification, which identifies a probabilistic distribution for uncertain data, is important for yielding accurate and reliable results in reliability analysis and reliability-based design optimization. Sufficient data are needed for accurate uncertainty quantification, but data are very limited in engineering fields. For statistical modeling using insufficient data, kernel density estimation (KDE) with estimated bounded data (KDE-ebd) has recently been developed to provide more accurate and conservative estimation than the original KDE by combining the given data with bounded data generated within intervals of the random variables estimated from the given data. However, the density function estimated by KDE-ebd can extend beyond the domain of the random variables because the conservative estimation produces long, thick tails. To overcome this problem, this paper proposes kernel density estimation with hybrid estimated bounded data (KDE-Hebd), which does not violate the domain of the random variables and uses point or interval estimation of the bounds to generate the bounded data. KDE-ebd often yields overly wide bounds for very small samples or large variations because it uses only the estimated intervals of the random variables. The proposed KDE with hybrid estimated bounded data selects either a point estimator or an interval estimator according to whether the estimated intervals violate the domain of the random variables. The performance of the proposed method was evaluated by comparing the estimation accuracy of KDE, KDE-ebd and KDE-Hebd in statistical simulation tests on mathematically derived sample data and real experimental data. As a result, it was demonstrated that KDE-Hebd was more accurate than KDE-ebd without violating the domain of the random variables, especially for a large coefficient of variation.
Journal Article
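The idea of augmenting a small sample with bounded data while respecting the variable's domain can be illustrated as below. The bound estimates, the uniform fill-in, and the clipping rule are simplifying assumptions, not the KDE-Hebd estimator itself.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_with_bounded_data(data, domain=(0.0, np.inf), n_extra=20, seed=0):
    """Illustration of augmenting a small sample with extra 'bounded' points.

    Extra points are drawn uniformly between rough lower/upper bound estimates
    of the data, then those bounds are clipped to the variable's physical
    domain so the augmented sample never leaves it. This is a simplified
    stand-in for KDE-Hebd, which chooses point or interval bound estimators
    more carefully.
    """
    rng = np.random.default_rng(seed)
    spread = data.std(ddof=1)
    lo = max(data.min() - 3.0 * spread, domain[0])   # assumed interval estimate
    hi = min(data.max() + 3.0 * spread, domain[1])
    extra = rng.uniform(lo, hi, size=n_extra)
    augmented = np.concatenate([data, extra])
    return gaussian_kde(augmented)

# Example: a strictly positive quantity with only six observations.
small_sample = np.array([1.2, 0.8, 1.5, 1.1, 0.9, 1.3])
kde = kde_with_bounded_data(small_sample, domain=(0.0, np.inf))
print(kde(np.array([0.5, 1.0, 1.5])))
```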
Fast Computation of Kernel Estimators
2010
The computational complexity of evaluating the kernel density estimate (or its derivatives) at m evaluation points given n sample points scales quadratically, as O(nm), making it prohibitively expensive for large datasets. While approximate methods like binning can speed up the computation, they lack precise control over the accuracy of the approximation. There is no straightforward way of choosing the binning parameters a priori in order to achieve a desired approximation error. We propose a novel, computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel-based density derivative estimation that reduces the computational complexity from O(nm) to linear O(n+m). The user can specify a desired accuracy ε. The algorithm guarantees that the actual error between the approximation and the original kernel estimate will always be less than ε. We also apply our proposed fast algorithm to speed up automatic bandwidth selection procedures. We compare our method to the best available binning methods in terms of speed and accuracy. Our experimental results show that the proposed method is almost twice as fast as the best binning methods and is around five orders of magnitude more accurate. The software for the proposed method is available online.
Journal Article
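The O(nm) cost the abstract refers to comes from summing one kernel contribution for every sample/evaluation-point pair, as in the direct computation below; the ε-exact fast algorithm itself is not reproduced here, only the baseline it accelerates.

```python
import numpy as np

def direct_gaussian_kde(samples, eval_points, bandwidth):
    """Direct O(n*m) Gaussian KDE evaluation: one kernel term per pair.

    This is the baseline cost the epsilon-exact algorithm reduces to O(n + m);
    the fast algorithm itself is not reproduced here.
    """
    diffs = (eval_points[:, None] - samples[None, :]) / bandwidth   # shape (m, n)
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
grid = np.linspace(-4, 4, 200)
density = direct_gaussian_kde(x, grid, bandwidth=0.3)
print(density[:5])
```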
Density-based weighting for imbalanced regression
2021
In many real-world settings, imbalanced data impedes the performance of learning algorithms such as neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning, which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called DenseWeight and a cost-sensitive learning approach for neural network regression with imbalanced data called DenseLoss based on our weighting scheme. DenseWeight weights data points according to the rarity of their target values, estimated through kernel density estimation (KDE). DenseLoss adjusts each data point's influence on the loss according to DenseWeight, giving rare data points more influence on model training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.
Journal Article
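Density-based weighting in the spirit of DenseWeight can be sketched as follows: each target value is weighted inversely to its KDE-estimated density, with a single parameter controlling how strongly rarity is emphasized. The normalization details here are assumptions, not the published formulation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_based_weights(targets, alpha=1.0, eps=1e-6):
    """Weight each target inversely to its estimated density.

    alpha = 0 gives uniform weights; larger alpha gives rare targets more
    influence. This follows the spirit of DenseWeight, but the min-max
    normalization and floor used here are simplifying assumptions.
    """
    density = gaussian_kde(targets)(targets)
    density = (density - density.min()) / (density.max() - density.min() + eps)
    weights = np.maximum(1.0 - alpha * density, eps)
    return weights / weights.mean()   # keep the average loss scale unchanged

# Skewed targets: most values are small, large values are rare.
y = np.random.default_rng(0).exponential(scale=10.0, size=500)
w = density_based_weights(y, alpha=1.0)
print("weight range:", w.min(), w.max())
```

The resulting weights can then be multiplied into a per-sample regression loss so that rare targets contribute more to training.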
Flexible methods for species distribution modeling with small samples
2026
Species distribution models (SDMs) predict where species live or could potentially live and are a key resource for ecological research and conservation decision‐making. However, current SDM methods often perform poorly for rare or inadequately sampled species, which include most species on earth, as well as most of those of the greatest conservation concern. Here, we evaluated the performance of three modeling approaches designed for data‐deficient situations: plug‐and‐play modeling, density‐ratio modeling, and environmental‐range modeling. We compared the performance of algorithms within these approaches with the maximum entropy (MaxEnt) model, a widely used density‐ratio algorithm, both for data‐poor species and more generally. We also tested to what extent model cross‐validation performance on training data predicts model performance on independent, presence–absence data. We found that no algorithm performed best in all situations. Across all species, MaxEnt performed best on average but was outperformed by one or more of the plug‐and‐play, density‐ratio, or environmental‐range algorithms in 72% of cases. Six of the other algorithms had area under the receiver operating characteristic curve (AUC) distributions that were not significantly different from MaxEnt's, and for data‐poor species (those with 20 or fewer occurrences), 24 of the algorithms considered had AUC distributions that were not significantly different from MaxEnt's. However, we found that the algorithm outputs (when thresholded to predict presence vs. absence) spanned a wide sensitivity–specificity gradient. Specificity and prediction accuracy assessed on training data were strongly correlated with specificity and prediction accuracy assessed on independent presence–absence data. However, AUC and sensitivity were weakly correlated between training and testing sets, with only 22% of species having the same model perform best when evaluated on training data and on independent presence–absence data. Finally, we show how ensembles of algorithms that span the sensitivity–specificity gradient can represent model disagreement in poorly sampled species and improve model predictions.
Journal Article
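One of the approaches mentioned, density-ratio modeling, can be illustrated in a toy form: the suitability of a site is taken as the ratio of the KDE of environmental conditions at presence locations to the KDE over background conditions. This generic sketch is not any specific algorithm evaluated in the study, and the simulated covariates and sample sizes are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_suitability(presence_env, background_env, eval_env, eps=1e-12):
    """Toy density-ratio species distribution model.

    Suitability is the ratio of the density of environmental conditions at
    presence locations to the density over background conditions, both
    estimated with Gaussian KDE. Rows are sites, columns are covariates.
    """
    presence_kde = gaussian_kde(presence_env.T)
    background_kde = gaussian_kde(background_env.T)
    return presence_kde(eval_env.T) / (background_kde(eval_env.T) + eps)

rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 2))                          # two environmental covariates
presence = rng.normal(loc=[1.0, 0.5], scale=0.5, size=(25, 2))   # a data-poor species
sites = rng.normal(size=(10, 2))
print(density_ratio_suitability(presence, background, sites))
```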
A comprehensive analysis of autocorrelation and bias in home range estimation
by Paviolo, Agustin; da Silva, Marina Xavier; Fagan, William F.
in animal movement; animals; Autocorrelation
2019
Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive data set of GPS locations from 369 individuals representing 27 species distributed across five continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function [AKDE], Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation (N̂area) to quantify the information content of each data set. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the hold-out sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing N̂area. To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small N̂area. While 72% of the 369 empirical data sets had >1,000 total observations, only 4% had an N̂area >1,000, whereas 30% had an N̂area <30. In this frequently encountered scenario of small N̂area, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
Journal Article
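The conventional (IID) KDE home-range estimator that the study compares against can be sketched as below: evaluate a 2-D KDE on a grid and sum the area of the cells above the density level that encloses 95% of the probability mass. AKDE, which additionally models autocorrelation (implemented, for example, in the ctmm R package), is not reproduced here; the simulated track and the grid padding are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_home_range_area(locations, quantile=0.95, grid_size=200):
    """Conventional (IID) KDE home-range area at a given quantile.

    Evaluates a 2-D Gaussian KDE on a padded grid, finds the density level
    whose enclosed probability mass equals `quantile`, and sums the areas of
    the grid cells above that level.
    """
    kde = gaussian_kde(locations.T)
    pad = 0.2 * (locations.max(axis=0) - locations.min(axis=0))
    lower, upper = locations.min(axis=0) - pad, locations.max(axis=0) + pad
    xs = np.linspace(lower[0], upper[0], grid_size)
    ys = np.linspace(lower[1], upper[1], grid_size)
    xx, yy = np.meshgrid(xs, ys)
    density = kde(np.vstack([xx.ravel(), yy.ravel()]))
    cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    order = np.argsort(density)[::-1]
    mass = np.cumsum(density[order]) * cell_area
    idx = min(np.searchsorted(mass, quantile), len(mass) - 1)
    level = density[order][idx]
    return float(np.sum(density >= level) * cell_area)

# Simulated GPS fixes (metres) around a range centre; purely illustrative.
tracks = np.random.default_rng(0).normal(scale=500.0, size=(300, 2))
print(f"95% KDE home-range area: {kde_home_range_area(tracks):,.0f} m^2")
```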
Transcriptome-wide characterization of human cytomegalovirus in natural infection and experimental latency
by Caviness, Katie; Buehler, Jason; Cheng, Shu
in Biological Sciences; Cells; Correlation analysis
2017
The transcriptional program associated with herpesvirus latency and the viral genes regulating entry into and exit from latency are poorly understood and controversial. Here, we developed and validated a targeted enrichment platform and conducted large-scale transcriptome analyses of human cytomegalovirus (HCMV) infection. We used both an experimental hematopoietic cell model of latency and cells from naturally infected, healthy human subjects (clinical) to define the breadth of viral genes expressed. The viral transcriptome derived from experimental infection was highly correlated with that from clinical infection, validating our experimental latency model. These transcriptomes revealed a broader profile of gene expression during infection in hematopoietic cells than previously appreciated. Further, using recombinant viruses that establish a nonreactivating, latent-like or a replicative infection in CD34⁺ hematopoietic progenitor cells, we defined classes of low to moderately expressed genes that are differentially regulated in latent vs. replicative states of infection. Most of these genes have yet to be studied in depth. By contrast, genes that were highly expressed were expressed similarly in both latent and replicative infection. From these findings, a model emerges whereby low or moderately expressed genes may have the greatest impact on regulating the switch between viral latency and replication. The core set of viral genes expressed in natural infection and differentially regulated depending on the pattern of infection provides insight into the HCMV transcriptome associated with latency in the host and a resource for investigating virus–host interactions underlying persistence.
Journal Article
Seismic Hazard Analysis Using the Adaptive Kernel Density Estimation Technique for Chennai City
2012
The conventional method of probabilistic seismic hazard analysis (PSHA), the Cornell–McGuire approach, requires the identification of homogeneous source zones as its first step. This requirement brings with it many issues, and hence several alternative methods of hazard estimation have emerged in the last few years. Zoneless (zone-free) methods and numerical modelling of the earth's crust with finite element analysis, for example, have been proposed. Delineating a homogeneous source zone in regions of distributed and/or diffused seismicity is a rather difficult task. In this study, the zone-free approach to hazard estimation using the adaptive kernel technique is explored for regions with distributed and diffused seismicity. Chennai city lies in such a region of low to moderate seismicity, so it is used as a case study. The adaptive kernel technique is statistically superior to the fixed kernel technique primarily because the bandwidth of the kernel is varied spatially depending on the clustering or sparseness of the epicentres. Although the fixed kernel technique has proven to work well in general density estimation cases, it performs poorly for multimodal and long-tailed distributions. In such situations, the adaptive kernel technique serves the purpose and is more relevant in earthquake engineering, as the activity-rate probability density surface is multimodal in nature. The peak ground acceleration (PGA) obtained from all three approaches (i.e., the Cornell–McGuire approach, fixed kernel and adaptive kernel techniques) for 10% probability of exceedance in 50 years is around 0.087 g. The uniform hazard spectra (UHS) are also provided for different structural periods.
Journal Article
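A minimal sketch of an adaptive (variable-bandwidth) kernel estimator in the Abramson style: a fixed-bandwidth pilot density shrinks the bandwidth where events cluster and widens it where they are sparse. The one-dimensional toy data and the alpha = 0.5 sensitivity below are illustrative assumptions, not the activity-rate model used for Chennai.

```python
import numpy as np
from scipy.stats import gaussian_kde

def adaptive_kde(samples, eval_points, alpha=0.5):
    """Abramson-style adaptive KDE with a per-sample bandwidth.

    A pilot (fixed-bandwidth) KDE is evaluated at each sample; samples in
    dense clusters get smaller bandwidths and isolated samples get larger
    ones. This illustrates the adaptive-kernel idea only.
    """
    pilot = gaussian_kde(samples)
    pilot_density = pilot(samples)
    g = np.exp(np.mean(np.log(pilot_density)))                      # geometric mean
    local_bw = pilot.factor * samples.std(ddof=1) * (pilot_density / g) ** (-alpha)

    diffs = (eval_points[:, None] - samples[None, :]) / local_bw[None, :]
    kernels = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * local_bw[None, :])
    return kernels.mean(axis=1)

# Epicentre-like toy data: a tight cluster plus scattered events.
rng = np.random.default_rng(0)
events = np.concatenate([rng.normal(0.0, 0.2, 200), rng.uniform(-5, 5, 50)])
grid = np.linspace(-6, 6, 100)
print(adaptive_kde(events, grid)[:5])
```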
n‐dimensional hypervolume
by Violle, Cyrille; Blonder, Benjamin; Lamanna, Christine
in anatomy and morphology; Animal and plant ecology; Animal, plant and microbial ecology
2014
AIM: The Hutchinsonian hypervolume is the conceptual foundation for many lines of ecological and evolutionary inquiry, including functional morphology, comparative biology, community ecology and niche theory. However, extant methods to sample from hypervolumes or measure their geometry perform poorly on high‐dimensional or holey datasets. INNOVATION: We first highlight the conceptual and computational issues that have prevented a more direct approach to measuring hypervolumes. Next, we present a new multivariate kernel density estimation method that resolves many of these problems in an arbitrary number of dimensions. MAIN CONCLUSIONS: We show that our method (implemented as the ‘hypervolume’ R package) can match several extant methods for hypervolume geometry and species distribution modelling. Tools to quantify high‐dimensional ecological hypervolumes will enable a wide range of fundamental descriptive, inferential and comparative questions to be addressed.
Journal Article
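The hypervolume idea can be sketched with a rough Monte Carlo stand-in: fit a multivariate KDE to a trait matrix, choose the density threshold that encloses 95% of the observations, and estimate the volume above it by uniform sampling in a padded bounding box. The 'hypervolume' R package uses a more careful importance-sampling scheme; the padding, threshold rule, and sample sizes here are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_hypervolume(traits, quantile=0.95, n_mc=100_000, seed=0):
    """Rough Monte Carlo estimate of a KDE-based trait hypervolume.

    Fits a multivariate Gaussian KDE to the trait matrix (rows = species,
    columns = traits), picks the density threshold enclosing `quantile` of
    the observed points, and estimates the volume above that threshold by
    uniform sampling in a padded bounding box.
    """
    rng = np.random.default_rng(seed)
    kde = gaussian_kde(traits.T)
    threshold = np.quantile(kde(traits.T), 1.0 - quantile)

    pad = 0.5 * traits.std(axis=0)
    lo, hi = traits.min(axis=0) - pad, traits.max(axis=0) + pad
    box_volume = np.prod(hi - lo)
    mc_points = rng.uniform(lo, hi, size=(n_mc, traits.shape[1]))
    inside = kde(mc_points.T) >= threshold
    return float(box_volume * inside.mean())

traits = np.random.default_rng(1).normal(size=(60, 3))   # 60 species x 3 traits
print(f"estimated 3-D hypervolume: {kde_hypervolume(traits):.2f}")
```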