Asset Details
MbrlCatalogueTitleDetail
Do you wish to reserve the book?
Structured and Correlated Multi-Armed Bandits: Algorithms, Theory and Applications
by
Gupta, Samarth
in
Statistics
2022
Hey, we have placed the reservation for you!
By the way, why not check out events that you can attend while you pick your title.
You are currently in the queue to collect this book. You will be notified once it is your turn to collect the book.
Oops! Something went wrong.
Looks like we were not able to place the reservation. Kindly try again later.
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
Do you wish to request the book?
Structured and Correlated Multi-Armed Bandits: Algorithms, Theory and Applications
by
Gupta, Samarth
in
Statistics
2022
Please be aware that the book you have requested cannot be checked out. If you would like to checkout this book, you can reserve another copy
We have requested the book for you!
Your request is successful and it will be processed during the Library working hours. Please check the status of your request in My Requests.
Oops! Something went wrong.
Looks like we were not able to place your request. Kindly try again later.
Structured and Correlated Multi-Armed Bandits: Algorithms, Theory and Applications
Dissertation
Structured and Correlated Multi-Armed Bandits: Algorithms, Theory and Applications
2022
Request Book From Autostore
and Choose the Collection Method
Overview
Multi-Armed bandit (MAB) framework is a widely used sequential decision making framework in which a decision-maker needs to select one of the available K actions in each round, with the objective of maximizing their long-term reward. This framework has been used in practice for several applications including web advertising, medical testing by viewing the use of different ads/treatments as the arms in the MAB problem. The user's response corresponding to these different actions generates a reward for the decision-maker. Under the classical MAB framework, it is implicitly assumed that the rewards corresponding to different actions are independent of each other. But, this may not be the case in practice as rewards corresponding to different actions (i.e., ads/drugs) are likely to be correlated. Motivated by this, we study the structured and correlated MAB problem in this thesis. First, we study the structured MAB problem where the mean rewards corresponding to different actions are a known function of a hidden parameter, thereby imposing a structured on the mean rewards of different actions. We study this problem in the most general form by imposing no restriction on the form of mean reward functions and as a result subsume the setting of several previously studied structured bandit frameworks where the mean reward function is assumed to be of a specific form. While mean rewards of different actions may be related to one another in the structured bandit setup, the reward realizations may not necessarily be correlated. Motivated by this, we propose a novel correlated MAB framework which explicitly captures the correlation in reward across different actions. For both these frameworks, we design novel algorithms that allow us to extend any classical bandit algorithm to the structured and correlated bandit settings. Through rigorous analysis, we show that our proposed algorithms sample certain sub-optimal actions, termed as non-competitive actions, only O(1) times as opposed to the typical O(log T) samples required by classical algorithms such as Upper Confidence Bound (UCB), Thompson sampling. These significant theoretical performance gains are reflected in our experiments performed on real-world recommendation system datasets such as Movielens, Goodreads. For our proposed correlated bandit framework, we also design best-arm identification algorithms where the task is to identify the best action in as few samples as possible. We demonstrate the achieved performance gains theoretically through sample complexity analysis and empirically through experiments on recommendation system datasets. To further demonstrate the utility of our proposed correlated bandit framework, we show how the framework can be employed to solve online resource allocation problems, which frequently arise in tasks such as power allocation in wireless systems, financial optimization and multi-server scheduling. This is done by extending our correlated bandit framework and algorithms to the setting of online resource allocation. The performance gains are demonstrated theoretically through regret analysis and empirically through synthetic experiments on the task of online power allocation in wireless systems, job scheduling in multi-server systems and channel assignment under the ALOHA protocol.
This website uses cookies to ensure you get the best experience on our website.