Search Results

1,460 results for "Simulated data"
Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems
Although motion artifacts are a major source of noise in infant fNIRS data, how to approach motion correction in this population has only recently begun to be investigated. Homer2 offers a wide range of motion correction methods, and previous work on simulated and adult data suggested Spline interpolation and Wavelet filtering as the optimal methods for recovering trials affected by motion. However, motion artifacts in infant data differ from those in adult data in both amplitude and frequency of occurrence. Therefore, artifact correction recommendations derived from adult data might not be optimal for infant data. We hypothesized that the combined use of Spline and Wavelet would outperform their individual use on data with complex profiles of motion artifacts. To demonstrate this, we first compared, on infant semi-simulated data, the performance of several motion correction techniques on their own and of the novel combined approach; then, we investigated the performance of Spline and Wavelet alone and in combination on real cognitive data from three datasets collected with infants of different ages (5, 7 and 10 months), with different tasks (auditory, visual and tactile) and with different NIRS systems. To quantitatively estimate and compare the efficacy of these techniques, we adopted four metrics: hemodynamic response recovery error, within-subject standard deviation, between-subjects standard deviation, and the number of trials that survived each correction method. Our results demonstrated that (i) it is always better to correct for motion artifacts than to reject the corrupted trials; (ii) Wavelet filtering, on its own and in combination with Spline interpolation, appears to be the most effective approach for reducing the between- and within-subject standard deviations. Importantly, the combination of Spline and Wavelet provided the best performance in semi-simulation at both low and high levels of noise, and also recovered most of the trials affected by motion artifacts across all datasets, a crucial result when working with infant data.
• Comparison of motion correction techniques on semi-simulated and real fNIRS infant data.
• Spline and wavelet combined outperform the individual use of these techniques.
• Spline and wavelet combined better recovered the true HRF in simulated data.
• Spline and wavelet combined had the best performance in motion artifact correction.
• Spline and wavelet combined saved nearly all corrupted trials across all datasets.
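As a rough illustration of the combined approach evaluated above, the sketch below chains a MARA-style spline subtraction with wavelet coefficient thresholding on a single fNIRS channel, assuming NumPy, SciPy and PyWavelets are available; the artifact mask, smoothing factor and IQR threshold are illustrative choices, not the authors' Homer2 settings.

```python
import numpy as np
import pywt
from scipy.interpolate import UnivariateSpline

def spline_correct(signal, artifact_mask, smooth=0.99):
    """Fit a smoothing spline to each flagged segment and subtract it (MARA-style)."""
    corrected = signal.copy()
    idx = np.where(artifact_mask)[0]
    if idx.size == 0:
        return corrected
    # Split the flagged samples into contiguous artifact segments.
    segments = np.split(idx, np.where(np.diff(idx) > 1)[0] + 1)
    for seg in segments:
        if seg.size < 4:          # too short to fit a cubic spline
            continue
        t = seg.astype(float)
        spline = UnivariateSpline(t, signal[seg], s=smooth * seg.size)
        # Remove the slow artifact shape while keeping the segment's mean level.
        corrected[seg] = signal[seg] - spline(t) + signal[seg].mean()
    return corrected

def wavelet_correct(signal, wavelet="db2", iqr_factor=1.5):
    """Zero outlying detail coefficients, then reconstruct (wavelet filtering)."""
    coeffs = pywt.wavedec(signal, wavelet)
    for d in coeffs[1:]:          # leave the approximation coefficients intact
        q1, q3 = np.percentile(d, [25, 75])
        lo, hi = q1 - iqr_factor * (q3 - q1), q3 + iqr_factor * (q3 - q1)
        d[(d < lo) | (d > hi)] = 0.0
    return pywt.waverec(coeffs, wavelet)[: signal.size]

# Combined approach: spline first (large spikes), then wavelet (residual noise).
# corrected = wavelet_correct(spline_correct(raw_channel, artifact_mask))
```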
Which results of the standard test for community-weighted mean approach are too optimistic?
Aims: The community‐weighted mean (CWM) approach is used to analyse the relationship between species attributes (traits, Ellenberg‐type indicator values) and sample attributes (environmental variables, richness) via the community matrix. It has recently been shown to suffer from an inflated Type I error rate when tested by a standard test, and the results of many published studies are probably affected. I review the current knowledge about this problem and clarify which studies are likely affected and by how much. Methods: I suggest classifying hypotheses commonly tested by the CWM approach into three categories, which differ in the formulation of the null hypothesis. I use simulated and real data to show how the Type I error rate of the standard test is affected by data characteristics. Results: The CWM approach with the standard test returns a correct Type I error rate for hypotheses assuming a link between species attributes and composition (Category A). However, for hypotheses assuming a link between composition and sample attributes (Category B) or not assuming any link (Category C), the standard test is inflated, and alternative tests are needed to control for this. The inflation of standard tests for Category C is negatively related to the compositional β‐diversity, and positively related to the strength of the composition–sample attributes relationship and to the data set sample size. These results apply to CWM analyses with extrinsic sample attributes (not derived from the compositional matrix). CWM analysis with intrinsic sample attributes (derived from the composition, such as species richness) is a case of spurious correlation and can be tested using a column‐based (modified) permutation test. Conclusions: The concept of three hypothesis categories offers a simple tool to evaluate which hypothesis has been tested and whether the results have a correct or an inflated Type I error rate. In the case of inflated results, the level of inflation can be estimated from the data characteristics. The community‐weighted mean approach tests the relationship of species attributes (traits, indicator values) to sample attributes (environmental variables, richness), and the test is known to have an inflated Type I error rate. I argue that whether test results are inflated depends on the type of hypothesis tested, and that the level of inflation depends on dataset parameters (e.g. beta diversity).
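For readers unfamiliar with the column-based (modified) permutation test mentioned in this abstract, here is a minimal sketch of the general idea: species attributes are permuted among species before the CWM-attribute correlation is recomputed, so the composition-sample attribute structure stays intact under the null. The correlation statistic and variable names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def cwm_permutation_test(comp, traits, sample_attr, n_perm=999, seed=None):
    """comp: samples x species abundances; traits: one attribute per species."""
    rng = np.random.default_rng(seed)
    weights = comp / comp.sum(axis=1, keepdims=True)

    def stat(tr):
        cwm = weights @ tr                        # community-weighted means
        return np.corrcoef(cwm, sample_attr)[0, 1]

    observed = stat(traits)
    # Column-based null: shuffle the attribute among species and recompute.
    null = np.array([stat(rng.permutation(traits)) for _ in range(n_perm)])
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value
```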
Consequences of multiple imputation of missing standard deviations and sample sizes in meta‐analysis
Meta‐analyses often encounter studies with incompletely reported variance measures (e.g., standard deviations) or sample sizes, both of which are needed to conduct weighted meta‐analyses. Here, we first present a systematic literature survey on the frequency and treatment of missing data in published ecological meta‐analyses, showing that the majority of meta‐analyses encountered incompletely reported studies. We then simulated meta‐analysis data sets to investigate the performance of 14 options for treating or imputing missing SDs and/or SSs. Performance was assessed against results from fully informed weighted analyses on (hypothetically) complete data sets. We show that omitting incompletely reported studies is not a viable solution. Unweighted and sample size‐based variance approximation can yield unbiased grand means if effect sizes are independent of their corresponding SDs and SSs. The performance of different imputation methods depends on the structure of the meta‐analysis data set, especially in the case of correlated effect sizes and standard deviations or sample sizes. In a best‐case scenario, which assumes that SDs and/or SSs are missing at random and unrelated to effect sizes, our simulations show that imputing up to 90% of missing data still yields grand means and confidence intervals similar to those obtained with fully informed weighted analyses. We conclude that multiple imputation of missing variance measures and sample sizes could help overcome the problem of incompletely reported primary studies, and not only in the field of ecological meta‐analysis. Still, caution must be exercised in light of potential correlations and patterns of missingness. Meta‐analyses often encounter studies with incompletely reported variance measures (e.g., standard deviations) or sample sizes, both of which are needed to conduct weighted meta‐analyses. We present a systematic literature survey on the frequency and treatment of missing data in published ecological meta‐analyses. Simulating the effect of 14 different options for treating missing data in meta‐analysis, we show that multiple imputation of missing variance measures and sample sizes could help overcome the problem of incompletely reported primary studies.
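A toy sketch of the imputation idea under discussion: missing SDs are hot-deck resampled from the observed ones, the inverse-variance weighted mean is recomputed per imputed data set, and the estimates are pooled with Rubin's rules. The weighting scheme and pooling here are standard textbook choices, not the paper's full simulation design.

```python
import numpy as np

def mi_weighted_mean(effects, sds, ns, n_imp=20, seed=None):
    """sds uses NaN for missing standard deviations."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(sds)
    observed_sds = sds[~missing]
    means, variances = [], []
    for _ in range(n_imp):
        sd_i = sds.copy()
        sd_i[missing] = rng.choice(observed_sds, size=missing.sum())  # hot-deck draw
        w = ns / sd_i**2                  # inverse-variance weights (1 / SE^2)
        means.append(np.average(effects, weights=w))
        variances.append(1.0 / w.sum())   # variance of the weighted grand mean
    means, variances = np.array(means), np.array(variances)
    within, between = variances.mean(), means.var(ddof=1)
    total_var = within + (1 + 1 / n_imp) * between    # Rubin's rules pooling
    return means.mean(), np.sqrt(total_var)
```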
Physics-guided machine learning from simulated data with different physical parameters
Physics-based models are widely used to study dynamical systems in a variety of scientific and engineering problems. However, these models are necessarily approximations of reality due to incomplete knowledge of, or excessive complexity in modeling, the underlying processes. As a result, they often produce biased simulations due to inaccurate parameterizations or approximations used to represent the true physics. In this paper, we aim to build a new physics-guided machine learning framework to monitor dynamical systems. The idea is to use an advanced machine learning model to extract complex spatio-temporal data patterns while also incorporating the general scientific knowledge embodied in simulated data generated by a physics-based model. To handle the bias in simulated data caused by imperfect parameterization, we propose to extract general physical relations jointly from multiple sets of simulations generated by a physics-based model under different physical parameters. In particular, we develop a spatio-temporal network architecture that uses its gating variables to capture the variation of physical parameters. We initialize this model using a pre-training strategy that helps discover common physical patterns shared by different sets of simulated data. We then fine-tune it by combining limited observations with adequate simulations. By leveraging the complementary strengths of machine learning and domain knowledge, our method has been shown to produce accurate predictions, use fewer training samples, and generalize to out-of-sample scenarios. We further show that the method can provide insights into the variation of physical parameters over space and time in two domain applications: predicting temperature in streams and predicting temperature in lakes.
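The gating mechanism described above might look roughly like the following PyTorch sketch, in which a gate computed from each simulation's physical parameters modulates the recurrent hidden state; the layer sizes, gate placement and single-output head are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedSimulationNet(nn.Module):
    def __init__(self, n_features, n_params, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(n_params, hidden), nn.Sigmoid())
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, phys_params):
        # x: (batch, time, n_features); phys_params: (batch, n_params)
        h, _ = self.lstm(x)
        g = self.gate(phys_params).unsqueeze(1)   # (batch, 1, hidden)
        return self.head(h * g).squeeze(-1)       # gate the dynamics per run

# Pre-train on many simulated runs with varying phys_params, then fine-tune
# the same network on the limited observed data, as the abstract outlines.
```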
Comparative evaluation of full-length isoform quantification from RNA-Seq
Background: Full-length isoform quantification from RNA-Seq is a key goal in transcriptomics analyses and has been an area of active development since the field began. The fundamental difficulty stems from the fact that RNA transcripts are long, while RNA-Seq reads are short. Results: Here we use simulated benchmarking data that reflect many properties of real data, including polymorphisms, intron signal and non-uniform coverage, allowing for systematic comparative analyses of isoform quantification accuracy and its impact on differential expression analysis. Genome-, transcriptome- and pseudo-alignment-based methods are included, along with a simple approach as a baseline control. Conclusions: Salmon, kallisto, RSEM, and Cufflinks exhibit the highest accuracy on idealized data, while on more realistic data they do not perform dramatically better than the simple approach. We determine the structural parameters with the greatest impact on quantification accuracy to be length and sequence compression complexity, rather than the number of isoforms. The effect of incomplete annotation on performance is also investigated. Overall, the tested methods show sufficient divergence from the truth to suggest that full-length isoform quantification and isoform-level DE analysis should still be employed selectively.
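A minimal sketch of how such a benchmark can score a tool against simulated ground truth, assuming true and estimated abundance vectors (e.g., TPM) aligned by transcript; the choice of Spearman correlation and a median relative-error summary is illustrative, not the paper's exact metric set.

```python
import numpy as np
from scipy.stats import spearmanr

def score_tool(true_tpm, est_tpm):
    """Rank agreement with ground truth plus a robust relative-error summary."""
    rho, _ = spearmanr(true_tpm, est_tpm)
    rel_err = np.abs(est_tpm - true_tpm) / np.maximum(true_tpm, 1e-8)
    return rho, np.median(rel_err)
```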
Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction
Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed to analyze large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suitable and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data, and their analysis with Needles, may also lead to the discovery of highly contributing QTL under specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.
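To make the sparse-matrix remark concrete, the sketch below builds a marker-by-environment design matrix as a block-diagonal Kronecker product, assuming for illustration that every environment shares the same genotype matrix; the dimensions and the toy genotype simulation are arbitrary, not taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity, kron

n_env, n_markers, n_records = 10, 1_000, 100
# Toy genotype matrix (0/1/2 allele counts), reused per environment here.
M = csr_matrix(np.random.binomial(2, 0.3, (n_records, n_markers)))

# Z maps records to environment-specific marker effects. kron(I, M) is block
# diagonal, so its density is 1/n_env that of the equivalent dense matrix.
Z = kron(identity(n_env, format="csr"), M, format="csr")
print(Z.shape, f"{Z.nnz / (Z.shape[0] * Z.shape[1]):.2%} nonzero")
```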
Fiber estimation and tractography in diffusion MRI: Development of simulated brain images and comparison of multi-fiber analysis methods at clinical b-values
Advances in diffusion-weighted magnetic resonance imaging (DW-MRI) have led to many alternative diffusion sampling strategies and analysis methodologies. A common objective among methods is estimation of white matter fiber orientations within each voxel, as doing so permits in-vivo fiber-tracking and the ability to study brain connectivity and networks. How DW-MRI sampling schemes affect fiber estimation accuracy, tractography and the ability to recover complex white-matter pathways, how results differ with the choice of analysis method, and which method(s) perform optimally for specific data sets all remain important open problems, especially as tractography-based studies become common. In this work, we begin to address these concerns by developing sets of simulated diffusion-weighted brain images, which we then use to quantitatively evaluate the performance of six DW-MRI analysis methods in terms of estimated fiber orientation accuracy, false-positive (spurious) and false-negative (missing) fiber rates, and fiber-tracking. The analysis methods studied are: 1) a two-compartment "ball and stick" model (BSM) (Behrens et al., 2003); 2) a non-negativity constrained spherical deconvolution (CSD) approach (Tournier et al., 2007); 3) analytical q-ball imaging (QBI) (Descoteaux et al., 2007); 4) q-ball imaging with the Funk–Radon and Cosine Transform (FRACT) (Haldar and Leahy, 2013); 5) q-ball imaging within constant solid angle (CSA) (Aganj et al., 2010); and 6) a generalized Fourier transform approach known as generalized q-sampling imaging (GQI) (Yeh et al., 2010). We investigate these methods using 20, 30, 40, 60, 90 and 120 evenly distributed q-space samples on a single shell, and focus on a signal-to-noise ratio (SNR = 18) and diffusion weighting (b = 1000 s/mm²) common to clinical studies. We found that the BSM and CSD methods consistently yielded the least fiber orientation error and simultaneously the greatest fiber detection rate. Fiber detection rate was found to be the most distinguishing characteristic between the methods, and a significant factor for complete recovery of tractography through complex white-matter pathways. For example, while all methods recovered similar tractography of prominent white matter pathways with limited fiber crossing, CSD (which had the highest fiber detection rate, especially for voxels containing three fibers) recovered the greatest number of fibers and the largest fraction of correct tractography for complex three-fiber crossing regions. The synthetic data sets, ground truth, and tools for quantitative evaluation are publicly available on the NITRC website as the project "Simulated DW-MRI Brain Data Sets for Quantitative Evaluation of Estimated Fiber Orientations" at http://www.nitrc.org/projects/sim_dwi_brain.
• Development of simulated diffusion-weighted brain images based on in-vivo data.
• Improvements in fiber estimation engendered more complete white-matter pathways.
• Accurate fiber estimation is essential to tractography through complex crossing regions.
• Non-negativity constrained super-resolved spherical deconvolution yielded the best results on clinical diffusion-weighted data.
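As a small illustration of the fiber orientation accuracy metric such evaluations rely on, the sketch below computes the angular error between an estimated and a true fiber direction, folding antipodal vectors together since fibers are undirected; this is a generic metric, not necessarily the paper's exact implementation.

```python
import numpy as np

def angular_error_deg(v_est, v_true):
    """Angle in degrees between two fiber orientations (antipodally symmetric)."""
    v_est = v_est / np.linalg.norm(v_est)
    v_true = v_true / np.linalg.norm(v_true)
    # Fibers are undirected, so |dot| folds v and -v onto the same orientation.
    return np.degrees(np.arccos(np.clip(abs(v_est @ v_true), 0.0, 1.0)))

print(angular_error_deg(np.array([1.0, 0.1, 0.0]), np.array([1.0, 0.0, 0.0])))
```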
Human inspired deep learning to locate and classify terrestrial and arboreal animals in thermal drone surveys
Drones are an effective tool for animal surveys, capable of generating an abundance of high‐quality ecological data. However, the large volume of data generated introduces the additional problem of the human resources required to process and analyse it. Deep learning models offer a solution to this challenge, capable of autonomously processing drone footage to detect animals with higher fidelity and lower latency than humans. This work aimed to develop an animal detection architecture that classifies animals according to their location (terrestrial vs. arboreal). The model incorporates techniques inspired by human pilots for greater performance and consistency across time. Thermal drone footage from surveys across the state of New South Wales, Australia over a 2+ year period was used to construct a diverse training and validation dataset. A high‐resolution 3D simulation was developed to reduce workload by autonomously generating labelled data to supplement manually labelled field data. The model was evaluated on 130 hours of thermal imagery (14 million images) containing 57 unique animal species, in which 1637 out of 1719 (95.23%) of the animals recorded by human pilots were detected. The model achieved an F1 score of 0.9410, a 4.36 percentage point increase in performance over a benchmark YOLOv8 model. Simulated data improved model performance by 1.7× in low-data scenarios, lowering data labelling costs through higher quality image pre‐labels. The proposed animal detection model demonstrates strong reporting accuracy in the detection and tracking of animals. The approach enables widespread adoption of drone‐based capture technology by providing in‐field, real‐time assistance, allowing novice pilots to detect animals at the level of experienced pilots, whilst also reducing the burden of report generation and data labelling costs.
Predictive Maintenance in Industry 4.0 for the SMEs: A Decision Support System Case Study Using Open-Source Software
Predictive maintenance is one of the most important topics within the Industry 4.0 paradigm. We present a prototype decision support system (DSS) that collects and processes data from many sensors and uses machine learning and artificial intelligence algorithms to report deviations from the optimal process in a timely manner and to correct them, directly or indirectly, through operator intervention or self-correction. We propose to develop the DSS using open-source R packages, because open-source software such as R is beneficial for predictive maintenance in small and medium enterprises (SMEs), providing an affordable, adaptable, flexible, and tunable solution. We validate the DSS through a case study showing its application to SMEs that need to maintain industrial equipment in real time by leveraging IoT technologies, here the predictive maintenance of industrial cooling systems. The dataset was simulated based on information about the measured indicators and their ranges, collected through in-depth interviews. The results show that the software provides predictions and actionable insights using collaborative filtering. Feedback was collected from SMEs in the manufacturing sector as potential system users. Positive feedback emphasized the advantages of employing open-source predictive maintenance tools such as R for SMEs, including cost savings, increased accuracy, community assistance, and program customization. However, SMEs also voiced comments and concerns regarding the use of open-source R in their infrastructure development and daily operations.
Uncertainty Quantification in Data Fusion Classifier for Ship-Wake Detection
Using deep learning model predictions requires understanding not only the model's confidence but also its uncertainty, so that we know when to trust a prediction and when to require support from a human. In this study, we used Monte Carlo dropout (MCDO) to characterize the uncertainty of deep learning image classification algorithms, including feature fusion models, on simulated synthetic aperture radar (SAR) images of persistent ship wakes. Against a baseline, we used the distribution of predictions from dropout with simple mean-value ensembling and with the Kolmogorov-Smirnov (KS) test to classify in-domain and out-of-domain (OOD) test samples, the latter created by rotating images to angles not present in the training data. Our objective was to improve classification robustness and identify OOD images at test time. Mean-value ensembling did not improve performance over the baseline: there was a −1.05% difference in the Matthews correlation coefficient (MCC) from the baseline model, averaged across all SAR bands. The KS test, by contrast, yielded a +12.5% difference in MCC and was able to identify the majority of OOD samples. Leveraging the full distribution of predictions improved classification robustness and allowed test images to be labeled as OOD. The feature fusion models, however, did not improve performance over the single SAR-band models, demonstrating that it is best to rely on the highest quality data source available (in our case, C-band).
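Both ingredients of this abstract, Monte Carlo dropout and the two-sample KS test, are straightforward to sketch: dropout is kept active at test time to sample predictions, and a test image is flagged as OOD when its prediction distribution differs from an in-domain reference. The model interface, the sample count and the 0.05 threshold below are illustrative assumptions, not the paper's configuration.

```python
import torch
from scipy.stats import ks_2samp

def mc_dropout_samples(model, x, n_samples=50):
    """Sample predictions with dropout left active (Monte Carlo dropout)."""
    model.train()                      # keeps nn.Dropout layers stochastic
    with torch.no_grad():
        return torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])

def flag_ood(model, x_test, reference, alpha=0.05):
    """Flag x_test as OOD if its prediction distribution departs from an
    in-domain reference sample (a 1-D array of validation-set predictions)."""
    samples = mc_dropout_samples(model, x_test).flatten().cpu().numpy()
    stat, p = ks_2samp(samples, reference)
    return p < alpha, stat
```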