952 result(s) for "multi-armed bandit algorithm"
A Study of Anode-Supported Solid Oxide Fuel Cell Modeling and Optimization Using Neural Network and Multi-Armed Bandit Algorithm
An anode-supported solid oxide fuel cell (SOFC) model based on an artificial neural network (ANN) was developed, and its design variables were optimized. The input parameters of the SOFC model are current density, temperature, electrolyte thickness, anode thickness, anode porosity, and cathode thickness; the model estimates voltage from these inputs. Numerical results show that the model represents the actual SOFC characteristics very well. Four design parameters are optimized: electrolyte thickness, anode thickness, cathode thickness, and anode porosity. To derive the optimal combination of design parameters, we used a multi-armed bandit (MAB) algorithm and developed a methodology for deriving a near-optimal parameter set without searching all possible parameter sets.
A Fine-Grain Batching-Based Task Allocation Algorithm for Spatial Crowdsourcing
Task allocation is a critical issue of spatial crowdsourcing. Although the batching strategy performs better than the real-time matching mode, it still has the following two drawbacks: (1) Because the granularity of the batch size set obtained by batching is too coarse, it will result in poor matching accuracy. However, roughly designing the batch size for all possible delays will result in a large computational overhead. (2) Ignoring non-stationary factors will lead to a change in optimal batch size that cannot be found as soon as possible. Therefore, this paper proposes a fine-grained, batching-based task allocation algorithm (FGBTA), considering non-stationary setting. In the batch method, the algorithm first uses variable step size to allow for fine-grained exploration within the predicted value given by the multi-armed bandit (MAB) algorithm and uses the results of pseudo-matching to calculate the batch utility. Then, the batch size with higher utility is selected, and the exact maximum weight matching algorithm is used to obtain the allocation result within the batch. In order to cope with the non-stationary changes, we use the sliding window (SW) method to retain the latest batch utility and discard the historical information that is too far away, so as to finally achieve refined batching and adapt to temporal changes. In addition, we also take into account the benefits of requesters, workers, and the platform. Experiments on real data and synthetic data show that this method can accomplish the task assignment of spatial crowdsourcing effectively and can adapt to the non-stationary setting as soon as possible. This paper mainly focuses on the spatial crowdsourcing task of ride-hailing.
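The sliding-window idea in the abstract above (keep only the most recent batch utilities and discard older ones) can be sketched as follows. This is a minimal illustration, not the FGBTA algorithm itself: the class name `SlidingWindowUtility`, the candidate batch sizes, and the use of a plain windowed mean are all assumptions made for the example.

```python
from collections import deque


class SlidingWindowUtility:
    """Hypothetical tracker: mean utility per batch size over recent rounds."""

    def __init__(self, candidates, window):
        self.candidates = list(candidates)
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # which is what discards stale observations.
        self.window = deque(maxlen=window)

    def record(self, candidate, utility):
        self.window.append((candidate, utility))

    def mean_utility(self, candidate):
        values = [u for c, u in self.window if c == candidate]
        return sum(values) / len(values) if values else None

    def best(self):
        # Only candidates that still have observations in the window compete.
        scored = [(self.mean_utility(c), c) for c in self.candidates]
        scored = [(m, c) for m, c in scored if m is not None]
        return max(scored)[1] if scored else None


tracker = SlidingWindowUtility(candidates=[2, 4], window=3)
tracker.record(2, 1.0)
tracker.record(4, 0.5)
tracker.record(4, 0.6)
first_choice = tracker.best()
tracker.record(4, 0.7)  # pushes the only observation for batch size 2 out
second_choice = tracker.best()
```

Because old observations fall out of the window, a batch size that was good long ago stops influencing the choice, which is how a windowed estimate adapts to a non-stationary setting.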
Deep Analysis of Unbalanced Offline Data in Bridge Monitoring Based on Improved Upper Confidence Bound Algorithm
The continuous advancement of infrastructure construction and the increase in transportation pressure have caused great damage to bridge structures. Monitoring bridge structures can promptly detect potential structural damage. Therefore, this study proposes an unbalanced offline data monitoring method based on an improved upper confidence bound algorithm, aiming to provide strong support for bridge safety assessment. This study first conducts a deep analysis of the unbalanced offline data of the fusion of the multi-armed bandit algorithm and the upper confidence bound algorithm, then constructs an improved model, and finally analyzes the results of the proposed model. The results show that the regret generated by the proposed algorithm is much smaller than that of the comparative algorithm, indicating that it can be applied to bridge monitoring and has better monitoring ability for unbalanced offline data.
Distributed algorithm under cooperative or competitive priority users in cognitive networks
Opportunistic spectrum access (OSA) in cognitive radio (CR) networks allows a secondary (unlicensed) user (SU) to access a vacant channel allocated to a primary (licensed) user (PU). By finding the best channel, i.e., the channel that has the highest availability probability, an SU can increase its transmission time and rate. To maximize the transmission opportunities of an SU, various learning algorithms have been suggested: Thompson sampling (TS), upper confidence bound (UCB), ε-greedy, etc. In our study, we propose a modified UCB version called AUCB (Arctan-UCB) that can achieve a logarithmic regret similar to TS or UCB while further reducing the total regret, defined as the reward loss resulting from the selection of non-optimal channels. To evaluate AUCB’s performance for the multi-user case, we propose a novel uncooperative policy for priority access where the kth user should access the kth best channel. This manuscript theoretically establishes the upper bound on the sum regret of AUCB in the single-user and multi-user cases. The users thus may, after finite time slots, converge to their dedicated channels. It also focuses on the Quality of Service AUCB (QoS-AUCB) using the proposed policy for priority access. Our simulations corroborate AUCB’s performance compared to TS or UCB.
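For orientation, the baseline UCB policy that the abstract above compares against can be sketched as the standard UCB1 rule applied to channel selection. This is not the AUCB variant, whose index is not given here; the function names, the Bernoulli channel model, and the availability probabilities are assumptions chosen for illustration.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm with the largest UCB1 index at round t (t >= 1)."""
    # Try every arm once before relying on the confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(availability, horizon, seed=0):
    """Simulate one secondary user sensing Bernoulli channels (assumed model)."""
    rng = random.Random(seed)
    counts = [0] * len(availability)
    means = [0.0] * len(availability)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < availability[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward of this channel.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


counts = run_ucb1(availability=[0.2, 0.5, 0.9], horizon=2000)
```

In this sketch `counts` records how often each channel was selected; the exploration bonus shrinks as a channel is sampled more often, so selections concentrate on the channel with the highest availability.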
Computer Adaptive Testing Using Upper-Confidence Bound Algorithm for Formative Assessment
There is strong support for formative assessment inclusion in learning processes, with the main emphasis on corrective feedback for students. However, traditional testing and Computer Adaptive Testing can be problematic to implement in the classroom. Paper-based tests are logistically inconvenient and hard to personalize, and thus must be longer to accurately assess every student in the classroom. Computer Adaptive Testing can mitigate these problems by making use of Multi-Dimensional Item Response Theory, at the cost of introducing several new problems. The most serious of these are the greater complexity of test creation, caused by the need to calibrate the question pool, and the debatable premise that different questions measure one common latent trait. In this paper, a new approach that models formative assessment as a Multi-Armed Bandit problem is proposed and solved using the Upper-Confidence Bound algorithm. The method, in combination with the e-learning paradigm, has the potential to mitigate such problems as question item calibration and lengthy tests, while providing accurate formative assessment feedback for students. A number of simulation and empirical data experiments (with 104 students) are carried out to explore and measure the potential of this application, with positive results.
Autonomous Power Decision for the Grant Free Access MUSA Scheme in the mMTC Scenario
Non-orthogonal multiple access schemes with grant free access have been recently highlighted as a prominent solution to meet the stringent requirements of massive machine-type communications (mMTCs). In particular, the multi-user shared access (MUSA) scheme has shown great potential to grant free access to the available resources. For the sake of simplicity, MUSA is generally conducted with the successive interference cancellation (SIC) receiver, which offers a low decoding complexity. However, this family of receivers requires sufficiently diversified received user powers in order to ensure the best performance and avoid the error propagation phenomenon. The power allocation has been considered as a complicated issue especially for a decentralized decision with a minimum signaling overhead. In this paper, we propose a novel algorithm for an autonomous power decision with a minimal overhead based on a tight approximation of the bit error probability (BEP) while considering the error propagation phenomenon. We investigate the efficiency of multi-armed bandit (MAB) approaches for this problem in two different reward scenarios: (i) in Scenario 1, each user reward only informs about whether its own packet was successfully transmitted or not; (ii) in Scenario 2, each user reward may carry information about the other interfering user packets. The performances of the proposed algorithm and the MAB techniques are compared in terms of the successful transmission rate. The simulation results prove that the MAB algorithms show a better performance in the second scenario compared to the first one. However, in both scenarios, the proposed algorithm outperforms the MAB techniques with a lower complexity at user equipment.
Meta Heuristic Fusion Model for Classification with Modified U-Net-based Segmentation
Diabetic retinopathy (DR) is a common complication of diabetes mellitus that produces lesions on the retina and impairs vision. If it is not detected in time, it can lead to severe vision loss or blindness. Regrettably, there is no cure for DR, but early diagnosis and treatment can greatly lower the risk of visual loss. Compared with computer-aided diagnosis methods, manual diagnosis of DR from retinal fundus images is more time-consuming, more costly, and highly prone to error. Deep learning has emerged as one of the most popular methods for improving performance, particularly in the classification and analysis of medical images. Therefore, a deep structure-based approach to DR detection and severity classification using fundus images is presented. The major aim of the developed technique is to classify the severity level of the retinal region of the human eye from fundus images. First, the required retinal fundus images are collected from standard benchmark data sources. Second, image enhancement techniques are applied to the collected images to improve their quality. Third, abnormality segmentation is carried out by removing the optic disc with an active contour model, after which regional segmentation is performed with the Modified U-Net method. Finally, the segmented image is passed to a hybrid classifier network, named Hybrid Soft Attention-based DenseNet with Multi-Scale Gated ResNet (HSADMGR Net), which classifies the retinal fundus images and determines their severity level with higher accuracy. The parameters of the hybrid classifier network are optimized with the implemented Multi-Armed Bandits Groundwater Flow Algorithm (MABGFA). The results of the developed deep structure-based DR model are validated against existing DR detection and classification approaches using different performance measures.
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins [Adv. in Appl. Math. 6 (1985) 4-22] and Burnetas and Katehakis [Adv. in Appl. Math. 17 (1996) 122-142], respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.
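To make the idea of a Kullback-Leibler upper confidence bound concrete, here is a minimal sketch for the Bernoulli case only. It computes the largest mean q that is still statistically compatible with the observed average, using bisection. The function names, the exploration constant `c`, its default of 0, and the tolerance are assumptions for illustration and are not taken from the paper.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c log log t."""
    if pulls == 0:
        return 1.0  # an unexplored arm gets the most optimistic value
    # max(..., 1.0) keeps the log log term non-negative for very small t.
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    # kl(mean, q) increases in q on [mean, 1], so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


few_pulls = kl_ucb_index(mean=0.5, pulls=10, t=100)
many_pulls = kl_ucb_index(mean=0.5, pulls=100, t=100)
```

A policy built on this index would play the arm with the largest value at each round. The index tightens toward the empirical mean as `pulls` grows, so `many_pulls` is smaller than `few_pulls`.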
Adaptive Treatment Assignment in Experiments for Policy Choice
Standard experimental designs are geared toward point estimation and hypothesis testing, while bandit algorithms are geared toward in-sample outcomes. Here, we instead consider treatment assignment in an experiment with several waves for choosing the best among a set of possible policies (treatments) at the end of the experiment. We propose a computationally tractable assignment algorithm that we call “exploration sampling,” where assignment probabilities in each wave are an increasing concave function of the posterior probabilities that each treatment is optimal. We prove an asymptotic optimality result for this algorithm and demonstrate improvements in welfare in calibrated simulations over both non-adaptive designs and bandit algorithms. An application to selecting between six different recruitment strategies for an agricultural extension service in India demonstrates practical feasibility.
Reinforcement Learning in Economics and Finance
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent picks an action, he cannot infer ex post the rewards induced by other action choices. In reinforcement learning, his actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy, that is, a mapping from the states of the world to the set of actions that maximizes cumulative reward, which is a long-term strategy. Exploring might be sub-optimal on a short-term horizon but could lead to better long-term outcomes. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we present a state-of-the-art survey of reinforcement learning techniques and describe applications in economics, game theory, operations research, and finance.