3,253 result(s) for "Kernel Regression"
The Cumulative and Single Effect of 12 Aldehydes Concentrations on Cardiovascular Diseases: An Analysis Based on Bayesian Kernel Machine Regression and Weighted Logistic Regression
Background: This study investigates the individual and cumulative effects of 12 aldehyde concentrations on cardiovascular disease (CVD). Methods: A total of 1529 individuals from the 2013–2014 National Health and Nutrition Examination Survey were enrolled. We assessed serum concentrations of 12 aldehydes, including benzaldehyde, butyraldehyde, crotonaldehyde, decanaldehyde, heptanaldehyde, hexanaldehyde, isopentanaldehyde, nonanaldehyde, octanaldehyde, o-tolualdehyde, pentanaldehyde, and propanaldehyde. CVD patients were identified based on self-reported disease history from questionnaires. Bayesian kernel machine regression was used to evaluate the cumulative effect of the 12 aldehyde concentrations on CVD. Both weighted and unweighted logistic regression were used to assess the association of serum aldehyde concentrations with CVD, presenting effect sizes as odds ratios (OR) with 95% confidence intervals (CI). Additionally, a restricted cubic spline analysis was conducted to explore the relationship between benzaldehyde and CVD. Results: Among the participants, 111 (7.3%) were identified as having CVD. Isopentanaldehyde concentrations were notably higher in CVD patients compared to those without CVD. Bayesian kernel machine regression indicated no cumulative effect of aldehydes on CVD. Unweighted logistic regression revealed a positive association between benzaldehyde and CVD when adjusting for age and sex (OR = 1.12, 95% CI = 1.03–1.21). This association persisted after adjusting for age, sex, race, education, hypertension, diabetes, alcohol consumption, and smoking, with an OR of 1.12 (95% CI = 1.02–1.22). The restricted cubic spline showed a linear association between benzaldehyde and CVD. In the weighted logistic model, the association between benzaldehyde and CVD remained significant (OR = 1.17, 95% CI = 1.06–1.29). However, no significant association was found between the other aldehydes and CVD. 
Conclusions: Our study reveals the potential contributing role of benzaldehyde to CVD. Future studies should further validate these findings in diverse populations and elucidate the underlying biological mechanisms.
Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers
Four approaches using single-nucleotide polymorphism (SNP) information (F∞-metric model, kernel regression, reproducing kernel Hilbert spaces (RKHS) regression, and a Bayesian regression) were compared with a standard procedure of genetic evaluation (E-BLUP) of sires using mortality rates in broilers as a response variable, working in a Bayesian framework. Late mortality (14–42 days of age) records on 12,167 progeny of 200 sires were precorrected for fixed and random (nongenetic) effects used in the model for genetic evaluation and for the mate effect. The average of the corrected records was computed for each sire. Twenty-four SNPs seemingly associated with late mortality were included in three methods used for genomic assisted evaluations. One thousand SNPs were included in the Bayesian regression, to account for markers along the whole genome. The posterior mean of heritability of mortality was 0.02 in the E-BLUP approach, suggesting that genetic evaluation could be improved if suitable molecular markers were available. Estimates of posterior means and standard deviations of the residual variance were 24.38 (3.88), 29.97 (3.22), 17.07 (3.02), and 20.74 (2.87) for E-BLUP, the linear model on SNPs, RKHS regression, and the Bayesian regression, respectively, suggesting that RKHS accounted for more variance in the data. The two nonparametric methods (kernel and RKHS regression) fitted the data better, having a lower residual sum of squares. Predictive ability, assessed by cross-validation, indicated advantages of the RKHS approach, where accuracy was increased from 25 to 150%, relative to other methods.
Aggregate Kernel Inverse Regression Estimation
Sufficient dimension reduction (SDR) is a useful tool for nonparametric regression with high-dimensional predictors. Many existing SDR methods rely on some assumptions about the distribution of predictors. Wang et al. proposed an aggregate dimension reduction method to reduce the dependence on the distributional assumptions. Motivated by their work, we propose a novel and effective method by combining the aggregate method and the kernel inverse regression estimation. The proposed approach can accurately estimate the dimension reduction directions and substantially improve the exhaustivity of the estimates with complex models. At the same time, this method does not depend on the arrangement of slices, and the influence of the extreme values of the response is reduced. In numerical examples and a real data application, it performs well.
Intrapopulation diversity in isotopic niche over landscapes: Spatial patterns inform conservation of bear–salmon systems
Intrapopulation variability in resource acquisition (i.e., niche variation) influences population dynamics, with important implications for conservation planning. Spatial analyses of niche variation within and among populations can provide relevant information about ecological associations and their subsequent management. We used stable isotope analysis and kernel‐weighted regression to examine spatial patterns in a keystone consumer–resource interaction: salmon (Oncorhynchus spp.) consumption by grizzly and black bears (Ursus arctos horribilis, n = 886; and Ursus americanus, n = 557) from 1995 to 2014 in British Columbia (BC), Canada. In a region on the central coast of BC (22,000 km²), grizzly bears consumed far more salmon than black bears (median proportion of salmon in assimilated diet of 0.62 and 0.06, respectively). Males of both species consumed more salmon than females (median proportions of 0.63 and 0.57 for grizzly bears and 0.06 and 0.03 for black bears, respectively). Black bears showed considerably more spatial variation in salmon consumption than grizzlies. Protected areas on the coast captured no more habitat for bears with high‐salmon diets (i.e., proportions >0.5 of total diet) than did unprotected areas. In a continental region (~692,000 km²), which included the entire contemporary range of grizzlies in BC, males had higher salmon diets than females (median proportions of 0.41 and 0.04, respectively). High‐salmon diets were concentrated in coastal areas for female grizzly bears, whereas males with high‐salmon diets in interior areas were restricted to areas near major salmon watersheds. To safeguard this predator–prey association that spans coastal and interior regions, conservation planners and practitioners can consider managing across ecological and jurisdictional boundaries. 
More broadly, our approach highlights the importance of visualizing spatial patterns of dietary niche variation within populations to characterize ecological associations and inform management.
Bayesian Approximate Kernel Regression With Variable Selection
Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this article, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant (for example, the Gaussian kernel). We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. Supplementary materials for this article are available online.
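The random Fourier approximation of a shift-invariant kernel mentioned in this abstract can be illustrated with a minimal sketch. It is not the BAKR model itself; it only shows, for a Gaussian kernel with an assumed bandwidth, that an inner product of random cosine features approximates the kernel value. The function names, the bandwidth, the feature count, and the seed are all hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def make_random_fourier_map(dim, n_features, bandwidth, seed=0):
    """Build a random cosine feature map for the Gaussian kernel.

    Frequencies are drawn from N(0, 1 / bandwidth^2) per coordinate, which is
    the spectral density of the Gaussian kernel; phases are uniform on [0, 2*pi].
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def approx_kernel(feature_map, x, y):
    """Approximate the kernel by the inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
```

Because the feature map is explicit and finite-dimensional, a linear model fitted on these features behaves like an approximate kernel model, which is the property the abstract relies on when projecting back onto the original variables.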
Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using a nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed nonlinear structure in the data. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function, and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high-dimensional feature space using the kernel function. The performance of all the kernel-based modeling methods used here was comparable in terms of both predictive and generalization abilities. Values of the performance criteria suggested that the constructed models fit the nonlinear data adequately and have good predictive capabilities.
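Of the four model families named in this abstract, kernel ridge regression with a Gaussian kernel is the simplest to sketch. The following is a minimal one-dimensional illustration under stated assumptions: the bandwidth and ridge values are fixed by hand rather than chosen by cross-validation as in the study, the data are synthetic, and all function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_kernel_ridge(xs, ys, bandwidth, ridge):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], bandwidth) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return solve_linear_system(gram, ys)


def predict_kernel_ridge(xs, alphas, bandwidth, x_new):
    """Predict at x_new as a kernel-weighted sum over the training inputs."""
    return sum(a * gaussian_kernel(x_i, x_new, bandwidth)
               for a, x_i in zip(alphas, xs))
```

In practice the bandwidth and ridge penalty would be tuned, for example by the cross-validation procedure the abstract mentions, and a numerical library would replace the hand-written solver.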
LINEARIZED TWO-LAYERS NEURAL NETWORKS IN HIGH DIMENSION
We consider the problem of learning an unknown function $f^*$ on the d-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i, x_i)\}_{i \le n}$ where $x_i$ is a feature vector uniformly distributed on the sphere and $y_i = f^*(x_i) + \varepsilon_i$. We study two popular classes of models that can be regarded as linearizations of two-layers neural networks around a random initialization: the random features model of Rahimi–Recht (RF); the neural tangent model of Jacot–Gabriel–Hongler (NT). Both these models can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons N diverges, for a fixed dimension d. We consider two specific regimes: the infinite-sample finite-width regime, in which n = ∞ while d and N are large but finite, and the infinite-width finite-sample regime in which N = ∞ while d and n are large but finite. In the first regime, we prove that if $d^{\ell+\delta} \le N \le d^{\ell+1-\delta}$ for small δ > 0, then RF effectively fits a degree-ℓ polynomial in the raw features, and NT fits a degree-(ℓ + 1) polynomial. In the second regime, both RF and NT reduce to kernel methods with rotationally invariant kernels. We prove that, if the sample size satisfies $d^{\ell+\delta} \le n \le d^{\ell+1-\delta}$, then kernel methods can fit at most a degree-ℓ polynomial in the raw features. This lower bound is achieved by kernel ridge regression, and near-optimal prediction error is achieved for vanishing ridge regularization.
Fast and Stable Multivariate Kernel Density Estimation by Fast Sum Updating
Kernel density estimation and kernel regression are powerful but computationally expensive techniques: a direct evaluation of kernel density estimates at M evaluation points given N input sample points requires a quadratic number of operations (on the order of M × N), which is prohibitive for large-scale problems. For this reason, approximate methods such as binning with fast Fourier transform or the fast Gauss transform have been proposed to speed up kernel density estimation. Among these fast methods, the fast sum updating approach is an attractive alternative, as it is an exact method and its speed is independent of the input sample and the bandwidth. Unfortunately, this method, based on data sorting, has for the most part been limited to the univariate case. In this article, we revisit the fast sum updating approach and extend it in several ways. Our main contribution is to extend it to the general multivariate case for general input data and a rectilinear evaluation grid. Other contributions include its extension to a wider class of kernels, including the triangular, cosine, and Silverman kernels, its combination with parsimonious additive multivariate kernels, and its combination with a fast approximate k-nearest-neighbors bandwidth for multivariate datasets. Our numerical tests confirm the speed, accuracy, and stability of the method.
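The idea behind sorting-based exact evaluation can be sketched for the univariate Epanechnikov kernel. This is a simplified illustration, not the article's algorithm: it replaces the sliding-window sum updates with prefix sums and binary search, and it does not address the numerical-stability measures or the multivariate extension the abstract describes. The function names are hypothetical. It relies on the identity that, for the samples s inside the window [x − h, x + h], the sum of 1 − (x − s)²/h² equals count − (count·x² − 2x·Σs + Σs²)/h².

```python
import bisect


def kde_direct(samples, points, h):
    """Direct O(M * N) kernel density estimate with the Epanechnikov kernel."""
    n = len(samples)
    estimates = []
    for x in points:
        total = 0.0
        for s in samples:
            u = (x - s) / h
            if abs(u) <= 1.0:
                total += 0.75 * (1.0 - u * u)
        estimates.append(total / (n * h))
    return estimates


def kde_prefix_sums(samples, points, h):
    """Same estimate from sorted data and prefix sums of s and s^2.

    After an O(N log N) sort, each evaluation point costs two binary searches
    instead of a pass over all N samples.
    """
    xs = sorted(samples)
    n = len(xs)
    prefix_s = [0.0]
    prefix_s2 = [0.0]
    for s in xs:
        prefix_s.append(prefix_s[-1] + s)
        prefix_s2.append(prefix_s2[-1] + s * s)
    estimates = []
    for x in points:
        lo = bisect.bisect_left(xs, x - h)    # first sample >= x - h
        hi = bisect.bisect_right(xs, x + h)   # one past last sample <= x + h
        count = hi - lo
        sum_s = prefix_s[hi] - prefix_s[lo]
        sum_s2 = prefix_s2[hi] - prefix_s2[lo]
        total = count - (count * x * x - 2.0 * x * sum_s + sum_s2) / (h * h)
        estimates.append(0.75 * total / (n * h))
    return estimates
```

Both functions return the same values up to floating-point rounding; only the cost per evaluation point differs.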
Construction of a consistent high-definition spatio-temporal atlas of the developing brain using adaptive kernel regression
Medical imaging has shown that, during early development, the brain undergoes more changes in size, shape and appearance than at any other time in life. A better understanding of brain development requires a spatio-temporal atlas that characterizes the dynamic changes during this period. In this paper we present an approach for constructing a 4D atlas of the developing brain, between 28 and 44 weeks post-menstrual age at time of scan, using T1 and T2 weighted MR images from 204 premature neonates. The method used for the creation of the average 4D atlas utilizes non-rigid registration between all pairs of images to eliminate bias in the atlas toward any of the original images. In addition, kernel regression is used to produce age-dependent anatomical templates. A novelty in our approach is the use of a time-varying kernel width, to overcome the variations in the distribution of subjects at different ages. This leads to an atlas that retains a consistent level of detail at every time-point. Comparisons between the resulting atlas and atlases constructed using affine and non-rigid registration are presented. The resulting 4D atlas has greater anatomic definition than currently available 4D atlases created using various affine and non-rigid registration approaches, an important factor in improving registrations between the atlas and individual subjects. Also, the resulting 4D atlas can serve as a good representative of the population of interest as it reflects both global and local changes. The atlas is publicly available at www.brain-development.org.
► A new approach for constructing high-definition 4D atlases is presented.
► Multi-modal MR images from 204 preterm neonates (age-range 26–44 weeks PMA) are used.
► The resulting atlas retains a consistent level of detail at every time-point.
► The atlas has greater anatomic definition than currently available 4D atlases.
► Such a sharp atlas enables improved registration between the atlas and individuals.
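Kernel regression with a kernel width that varies with the target age can be sketched on scalar values instead of images. The width rule below (distance to the k-th nearest subject age) is a hypothetical stand-in chosen for illustration; the abstract does not state how the authors adapt the width, and the function names are likewise assumptions.

```python
import math


def gaussian_weight(age, target_age, width):
    """Gaussian weight of a subject scanned at `age` for a template at `target_age`."""
    return math.exp(-0.5 * ((age - target_age) / width) ** 2)


def adaptive_width(ages, target_age, k):
    """Hypothetical rule: kernel width = distance to the k-th nearest subject age.

    Sparse age ranges get a wide kernel and dense ranges a narrow one, so each
    template draws on a roughly similar number of subjects.
    """
    distances = sorted(abs(a - target_age) for a in ages)
    return max(distances[k - 1], 1e-6)


def kernel_regression(ages, values, target_age, width):
    """Nadaraya-Watson estimate: weighted average of values around target_age."""
    weights = [gaussian_weight(a, target_age, width) for a in ages]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For an atlas, `values` would be registered images averaged voxel by voxel with the same weights; the scalar version keeps the weighting logic visible.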
THE HARDNESS OF CONDITIONAL INDEPENDENCE TESTING AND THE GENERALISED COVARIANCE MEASURE
It is a common saying that testing for conditional independence, that is, testing whether two random vectors X and Y are independent, given Z, is a hard statistical problem if Z is a continuous random variable (or vector). In this paper, we prove that conditional independence is indeed a particularly difficult hypothesis to test for. Valid statistical tests are required to have a size that is smaller than a pre-defined significance level, and different tests usually have power against a different class of alternatives. We prove that a valid test for conditional independence does not have power against any alternative. Given the nonexistence of a uniformly valid conditional independence test, we argue that tests must be designed so their suitability for a particular problem may be judged easily. To address this need, we propose in the case where X and Y are univariate to nonlinearly regress X on Z, and Y on Z and then compute a test statistic based on the sample covariance between the residuals, which we call the generalised covariance measure (GCM). We prove that validity of this form of test relies almost entirely on the weak requirement that the regression procedures are able to estimate the conditional means X given Z, and Y given Z, at a slow rate. We extend the methodology to handle settings where X and Y may be multivariate or even high dimensional. While our general procedure can be tailored to the setting at hand by combining it with any regression technique, we develop the theoretical guarantees for kernel ridge regression. A simulation study shows that the test based on GCM is competitive with state-of-the-art conditional independence tests. Code is available as the R package GeneralisedCovarianceMeasure on CRAN.
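The univariate procedure described in this abstract (regress X on Z and Y on Z, then test the covariance of the residuals) can be sketched as follows. This is an illustration under stated assumptions, not the authors' R package: a Nadaraya-Watson smoother with a hand-picked bandwidth stands in for kernel ridge regression, the statistic is the normalised mean of the residual products compared against a standard normal, and the `simulate` helper and all names are hypothetical.

```python
import math
import random


def kernel_smoother_fit(z, target, bandwidth):
    """In-sample Nadaraya-Watson estimate of E[target | z] (swapped-in regressor)."""
    fitted = []
    for z_i in z:
        weights = [math.exp(-0.5 * ((z_j - z_i) / bandwidth) ** 2) for z_j in z]
        fitted.append(sum(w * t for w, t in zip(weights, target)) / sum(weights))
    return fitted


def gcm_statistic(x, y, z, bandwidth=0.3):
    """sqrt(n) * mean(R) / sd(R), where R_i is the product of the two residuals."""
    n = len(z)
    fitted_x = kernel_smoother_fit(z, x, bandwidth)
    fitted_y = kernel_smoother_fit(z, y, bandwidth)
    products = [(xi - fx) * (yi - fy)
                for xi, fx, yi, fy in zip(x, fitted_x, y, fitted_y)]
    mean_r = sum(products) / n
    var_r = sum(r * r for r in products) / n - mean_r ** 2
    return math.sqrt(n) * mean_r / math.sqrt(var_r)


def gcm_p_value(statistic):
    """Two-sided p-value from the standard normal reference distribution."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))


def simulate(n, dependent, seed):
    """Hypothetical data: X and Y both depend on Z; optionally share noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_x = [rng.gauss(0.0, 0.5) for _ in range(n)]
    noise_y = [rng.gauss(0.0, 0.5) for _ in range(n)]
    link = 1.0 if dependent else 0.0
    x = [math.sin(zi) + ex for zi, ex in zip(z, noise_x)]
    y = [math.cos(zi) + link * ex + ey for zi, ex, ey in zip(z, noise_x, noise_y)]
    return x, y, z
```

Calling `gcm_statistic(*simulate(300, True, 0))` gives a large statistic because the two residuals share noise, whereas the independent case should yield a value consistent with a standard normal draw.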