Catalogue Search | MBRL
129,334 result(s) for "Benchmarks"
Combination of turbine-specific and neighbor-based power prediction for accurate and robust under-performance diagnosis
2026
In this study, a benchmark of normal behavior models is implemented to predict the power of a turbine of interest from various sets of inputs, including exogenous variables and power from the neighbors. Sequential filtering steps are utilized to identify normal operation data and label under-performance events. Then, the capacity of each benchmark model to detect under-performance is investigated, and compared with the others. Lastly, a flagging criterion combining several models is defined, taking advantage of the complementarity between exogenous and neighbor information.
Journal Article
WeatherBench 2: A Benchmark for the Next Generation of Data‐Driven Global Weather Models
by Rasp, Stephan; Sha, Fei; Bromberg, Carla
in Algorithms, Artificial intelligence, Baseline studies
2024
WeatherBench 2 is an update to the global, medium‐range (1–14 days) weather forecasting benchmark proposed by Rasp et al. (2020, https://doi.org/10.1029/2020ms002203), designed to accelerate progress in data‐driven weather modeling. WeatherBench 2 consists of an open‐source evaluation framework, publicly available training, ground truth and baseline data, as well as a continuously updated website with the latest metrics and state‐of‐the‐art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state‐of‐the‐art physical and data‐driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we discuss caveats in the current evaluation setup and challenges for the future of data‐driven weather forecasting.
Plain Language Summary: Traditionally, weather forecasts are made by models that attempt to replicate the physical processes of the atmosphere. This has been very successful over the last few decades as better computers, better observations and model upgrades have led to steadily improving weather forecasts. However, with rapid advances in artificial intelligence (AI), the question can be asked whether one can simply learn a weather model from past observations or reanalyses. In the last couple of years, we have seen tremendous progress with state‐of‐the‐art AI models rivaling the best “traditional” weather models in skill. WeatherBench 2 is a benchmark data set designed to evaluate and compare the quality of AI and traditional models. By setting a standard for evaluation, alongside providing open‐source data and code, this project aims to accelerate this research direction and lead to better weather prediction.
Key Points: WeatherBench 2 is a framework for evaluating and comparing data‐driven and traditional numerical weather forecasting models. It provides an evaluation framework, publicly available data sets and a website to assess the state‐of‐the‐art weather models. The evaluation protocol has been designed following best practices established in the operational weather forecasting community.
Journal Article
Inland Transport Enterprises Process Maturity Assessment--Theoretical Aspects
2024
Purpose: The critical function of inland transport enterprises within the expansive domain of the global maritime container supply chain is acknowledged. The responsibility for managing terrestrial segments of the supply chain, in conjunction with the multifaceted entities impacting the maritime segment, contributes to the complexity of integrating and coordinating the entire supply chain. The effectiveness of processes executed in various activities across the supply chain is instrumental in determining the allure and competitive edge of specific participants and the supply chain at large. Owing to the broad spectrum of tasks and obligations bestowed upon inland transport companies, the necessity for adopting an apt process-oriented management system is underscored. Process maturity is characterized by a framework in which individual processes are formalized in terms of their definition, identification, measurement, adaptability, and efficiency. Regrettably, the literature evidences a dearth of process maturity models applicable to inland transport firms. Thus, the aim of this study is to introduce a theoretical framework for assessing process maturity in inland transport entities. Design/Methodology/Approach: The investigation employed several research methodologies, including a review of existing literature, the questionnaire method, and a process maturity evaluation model. Findings: The proposed process maturity assessment model for inland transport companies is segmented into various levels and dimensions, offering enhanced insights into the augmentation of process maturity within enterprises. Practical implications: The process maturity model for inland transport enterprises is presented as a reference model that managers might utilize for benchmarking purposes, as well as a compilation of recommendations. Originality: This study represents the inaugural endeavor to formulate a process maturity model tailored to the needs of inland transport companies.
Keywords: Process maturity model, inland transport, process management. JEL classification: L15, M10, M16, R49.
Journal Article
DELTA50: A Highly Accurate Database of Experimental ¹H and ¹³C NMR Chemical Shifts Applied to DFT Benchmarking
2023
Density functional theory (DFT) benchmark studies of ¹H and ¹³C NMR chemical shifts often yield differing conclusions, likely due to non-optimal test molecules and non-standardized data acquisition. To address this issue, we carefully selected and measured ¹H and ¹³C NMR chemical shifts for 50 structurally diverse small organic molecules containing atoms from only the first two rows of the periodic table. Our NMR dataset, DELTA50, was used to calculate linear scaling factors and to evaluate the accuracy of 73 density functionals, 40 basis sets, 3 solvent models, and 3 gauge-referencing schemes. The best performing DFT methodologies for ¹H and ¹³C NMR chemical shift predictions were WP04/6-311++G(2d,p) and ωB97X-D/def2-SVP, respectively, when combined with the polarizable continuum solvent model (PCM) and gauge-independent atomic orbital (GIAO) method. Geometries should be optimized at the B3LYP-D3/6-311G(d,p) level including the PCM solvent model for the best accuracy. Predictions of 20 organic compounds and natural products from a separate probe set had root-mean-square deviations (RMSD) of 0.07 to 0.19 for ¹H and 0.5 to 2.9 for ¹³C. Maximum deviations were less than 0.5 and 6.5 ppm for ¹H and ¹³C, respectively.
Journal Article
K-means properties on six clustering benchmark datasets
2018
This paper makes two contributions. First, we introduce a basic clustering benchmark. Second, we study the performance of k-means using this benchmark. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unbalance of cluster sizes. The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
Journal Article
Resources and benchmark corpora for hate speech detection
by Bosco, Cristina; Basile, Valerio; Patti, Viviana
in Benchmarks, Computational Linguistics, Computer generated language analysis
2021
Hate speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an important role as well for the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.
Journal Article
The journey towards HEPScore, the HEP-specific CPU benchmark for WLCG
2026
HEPScore is a CPU benchmark, based on HEP applications, that the HEPiX Benchmarking working group is proposing as a replacement of the currently used HEP-SPEC06 benchmark, adopted in WLCG for procurement, computing resource pledges and performance studies. In 2019, we presented at ACAT the motivations for building a benchmark for the HEP community based on HEP applications. The process from the conception to the implementation and validation of this objective has been inspiring and challenging. In the spirit of the HEP community, it has involved many contributions from software developers, data analysts, experts of the experiments, representatives of several WLCG computing centres, as well as the WLCG HEPScore Deployment Task Force. In this contribution, we review this long journey, the technological solutions selected, the readiness of HEPScore, and the deployment plans for 2023.
Journal Article
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
by Zhang, Amy; Rocktäschel, Tim; Grefenstette, Edward
in Algorithms, Artificial intelligence, Benchmarks
2023
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We rely on a unifying formalism and terminology for discussing different ZSG problems, building upon previous works. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG; we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for ZSG; and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.
Journal Article
Monarch butterfly optimization
2019
In nature, the eastern North American monarch population is known for its southward migration during the late summer/autumn from the northern USA and southern Canada to Mexico, covering thousands of miles. By simplifying and idealizing the migration of monarch butterflies, a new kind of nature-inspired metaheuristic algorithm, called monarch butterfly optimization (MBO), a first of its kind, is proposed in this paper. In MBO, all the monarch butterfly individuals are located in two distinct lands, viz. southern Canada and the northern USA (Land 1) and Mexico (Land 2). Accordingly, the positions of the monarch butterflies are updated in two ways. Firstly, the offspring are generated (position updating) by the migration operator, which can be adjusted by the migration ratio. This is followed by tuning the positions of the other butterflies by means of the butterfly adjusting operator. In order to keep the population unchanged and minimize fitness evaluations, the sum of the newly generated butterflies in these two ways remains equal to the original population. In order to demonstrate the superior performance of the MBO algorithm, a comparative study with five other metaheuristic algorithms through thirty-eight benchmark problems is carried out. The results clearly exhibit the capability of the MBO method toward finding the enhanced function values on most of the benchmark problems with respect to the other five algorithms. Note that the source codes of the proposed MBO algorithm are publicly available at GitHub (https://github.com/ggw0122/Monarch-Butterfly-Optimization, C++/MATLAB) and MATLAB Central (http://www.mathworks.com/matlabcentral/fileexchange/50828-monarch-butterfly-optimization, MATLAB).
Journal Article