205,746 result(s) for "Datasets"
F51 Enroll-HD platform data resources
Summary: Enroll-HD is a clinical research platform that includes at its core a global observational study of Huntington's Disease (HD) families who are followed annually. Currently, the Enroll-HD study includes over 25,700 participants from 21 countries in Europe, North America, Latin America and Australasia. Enroll-HD provides high-quality coded clinical data and biosamples to qualified researchers in the Huntington's Disease research community via a straightforward request process (https://enroll-hd.org/for-researchers/). Every 1-2 years an easy-access Enroll-HD dataset (periodic dataset, PDS), which includes approximately 80% of the variables collected in the study, is prepared and made available to qualified HD researchers. The last Enroll-HD PDS release was made available in December 2020. The risk for participant identification from the PDS is low, but if researchers request any of the remaining 20% of the collected variables, the risk for participant identification may be increased; a specified dataset (SPS) request must therefore be reviewed and approved by the Enroll-HD Scientific Review Committee (SRC). Once approved by the SRC, an SPS dataset can be prepared and released. In addition to Enroll-HD, clinical data can be requested from the Registry, HDClarity and TRACK-HD/ON studies. A large, easy-access Registry dataset (RDS), prepared in a format similar to the Enroll-HD PDS, can augment the Enroll-HD PDS and thereby increase the total number of participants for modeling purposes. The RDS can be requested by contacting the EHDN Scientific and Bioethics Advisory Committee (SBAC). Datasets are prepared free of charge. In addition to clinical data, the Enroll-HD platform distributes smaller imaging, brain morphometric/volumetric, GWAS, RNAseq, MiSeq, methylation and proteomics datasets collected across a number of HD studies.
Uncertain Data Analysis with Regularized XGBoost
Uncertainty is a ubiquitous element of available knowledge about the real world. Data sampling error, obsolete sources, network latency, and transmission error all contribute to it. These kinds of uncertainty have to be handled cautiously, or else classification results could be unreliable or even erroneous. Numerous methodologies have been developed to understand and control uncertainty in data, which takes many forms: inconsistency, imprecision, ambiguity, incompleteness, vagueness, unpredictability, noise, and unreliability. Missing information is inevitable in real-world data sets. While some conventional multiple imputation approaches are well studied and have shown empirical validity, they are limited in processing large datasets with complex data structures, and they tend to be computationally inefficient for medium and large datasets. In this paper, we propose a scalable multiple imputation framework based on XGBoost, bootstrapping, and regularization. XGBoost, one of the fastest implementations of gradient boosted trees, automatically captures interactions and non-linear relations in a dataset while achieving high computational efficiency with the aid of bootstrapping and regularization. In the context of high-dimensional data, this methodology yields less biased estimates and more acceptable imputation variability than previous regression approaches. We validate our adaptive imputation approaches against standard methods on numerical and real data sets and show promising results.
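The imputation loop described in the abstract can be sketched in a few lines. This is a minimal illustration only, using scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost; the function name, bootstrap count, and mean-fill initialization are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def boosted_impute(X, n_boot=5, rng=None):
    """Sketch of boosted multiple imputation: for each column with missing
    values, fit a boosted-tree regressor on the observed rows (with the
    other columns provisionally mean-filled) and pool predictions over
    bootstrap resamples."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)  # provisional fill for predictors
    out = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any() or miss.all():
            continue  # nothing to impute, or no rows to train on
        obs = ~miss
        Z = np.delete(filled, j, axis=1)  # all other columns as predictors
        preds = np.zeros((n_boot, miss.sum()))
        for b in range(n_boot):
            # bootstrap resample of the observed rows
            idx = rng.choice(obs.sum(), obs.sum(), replace=True)
            model = GradientBoostingRegressor(n_estimators=50)
            model.fit(Z[obs][idx], X[obs, j][idx])
            preds[b] = model.predict(Z[miss])
        out[miss, j] = preds.mean(axis=0)  # pool the bootstrap draws
    return out
```

Pooling predictions across bootstrap resamples is what lets the imputation reflect sampling variability rather than a single point fit.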
A Novel Cyber-attack Leads Prediction System using Cascaded R2CNN Model
Novel prediction systems are required on almost all internet-connected platforms to prevent user information from being compromised by intermediaries. Identifying the real factors that influence cyber-attack probes is the subject of this research. The proposed methodology is derived from literature studies that motivated the search for a prediction model with improved accuracy and performance. The proposed model, R2CNN, is a cascaded combination of a gradient boosted regression detector and a recurrent convolutional neural network for pattern prediction. The input data is collected from various applications engaged with the wireless sensor nodes in a smart city. Each user is connected to a certain number of applications that require authorization from the device owner. The dataset comprises device information, number of connections, device type, simulation time, connectivity duration, etc. The proposed R2CNN extracts features from the dataset and forms a feature mapping related to the parameters of interest. The features are tested for correlation with the training dataset to evaluate early prediction of cyber-attacks across the massive number of connected IoT devices.
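As an illustrative sketch of just the boosting stage of such a cascade (the recurrent CNN stage is omitted, and the features and label rule below are synthetic assumptions, not the paper's dataset):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the device-level features named in the abstract:
# connected-application count, connectivity duration, encoded device type.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(1, 20, n),    # number of connected applications
    rng.uniform(0, 24, n),     # connectivity duration (hours)
    rng.integers(0, 4, n),     # encoded device type
])
# Hypothetical label: heavily connected, long-lived sessions flagged as attacked.
y = (X[:, 0] * X[:, 1] > 100).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

In the cascade described by the abstract, scores from a boosted detector like this would feed the recurrent network for sequence-level pattern prediction.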
CORRIGENDUM
doi: 10.1038/nature19104 Corrigendum: Holocene shifts in the assembly of plant and animal communities implicate human impacts. S. Kathleen Lyons, Kathryn L. Amatangelo, Anna K. Behrensmeyer, Antoine Bercovici, Jessica L. Blois, Matt Davis, William A. DiMichele, Andrew Du, Jussi T. Eronen, J. Tyler Faith, Gary R. Graves, Nathan Jud, Conrad Labandeira, Cindy V. Looy, Brian McGill, Joshua H. Miller, David Patterson, Silvia Pineda-Munoz, Richard Potts, Brett Riddle, Rebecca Terry, Anikó Tóth, Werner Ulrich, Amelia Villaseñor, Scott Wing, Heidi Anderson, John Anderson, Donald Waller & Nicholas J. Gotelli. Nature 529, 80-83 (2016); doi:10.1038/nature16447. It has come to our attention that in this Letter, there were some errors in the categorization of some of the modern datasets (R. Telford et al., personal communication).
Versioned data: why it is needed and how it can be achieved (easily and cheaply)
The sharing and re-use of data has become a cornerstone of modern science, and multiple platforms now allow quick and easy data sharing. So far, however, data publishing models have not accommodated ongoing scientific improvements in data: for many problems, datasets continue to grow with time -- more records are added, errors are fixed, and new data structures are created. In other words, datasets, like scientific knowledge, advance with time. We therefore suggest that many datasets would be usefully published as a series of versions, with a simple naming system that allows users to perceive the type of change between versions. In this article, we argue for adopting a paradigm and processes for versioned data, analogous to software versioning. We also introduce a system called Versioned Data Delivery and present tools for creating, archiving, and distributing versioned data easily, quickly, and cheaply. These new tools allow individual research groups to shift from a static model of data curation to a dynamic and versioned model that more naturally matches the scientific process.
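The "simple naming system" the authors call for maps naturally onto semantic-versioning rules. A minimal sketch, where the change-type labels "structure", "records", and "fix" are hypothetical names mirroring the kinds of change listed in the abstract:

```python
def bump_version(version, change):
    """Data-versioning sketch, analogous to software semver:
    'structure' -> new data structures; may break downstream code (major)
    'records'   -> rows added; analyses may need re-running        (minor)
    'fix'       -> errors corrected in existing records            (patch)"""
    major, minor, patch = map(int, version.split("."))
    if change == "structure":
        return f"{major + 1}.0.0"
    if change == "records":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

Under such a convention a major bump warns users that code reading the data may break, while minor and patch bumps signal additive or corrective changes.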