Advances in Online Convex Optimization, Games, and Problems with Bandit Feedback
by
Cardoso, Adrian Rivera
in
Convex analysis
/ Decision making
/ Distance learning
/ Educational technology
/ Gaming machines
/ Mathematics
/ Privacy
2020
Dissertation
Overview
In this thesis we study sequential decision making through the lens of Online Learning. Online Learning is a very powerful and general framework for multi-period decision making. Due to its simple formulation and effectiveness it has become a tool of daily use in multibillion-dollar companies. Moreover, due to its beautiful theory and its tight connections with other fields, Online Learning has caught the attention of academics all over the world and driven first-class research.
In the first chapter of this thesis, joint work with Huan Xu, we study a problem called the Risk-Averse Convex Bandit. Risk aversion refers to the fact that humans prefer consistent sequences of good rewards to highly variable sequences with slightly better rewards. The Risk-Averse Convex Bandit addresses the fact that, while human decision makers are risk-averse, most algorithms for Online Learning are not. In this thesis we provide the first efficient algorithms with strong theoretical guarantees for the Risk-Averse Convex Bandit problem.
In the second chapter, joint work with Rachel Cummings, we study the problem of preserving privacy in the setting of online submodular minimization. Submodular functions have multiple applications in machine learning and economics, which usually involve sensitive data from individuals. Using tools from Online Convex Optimization, we provide the first ε-differentially private algorithms for this problem, which are almost as good as the non-private versions.
In the third chapter, joint work with Jacob Abernethy, He Wang, and Huan Xu, we study a dynamic version of two-player zero-sum games. Zero-sum games are ubiquitous in economics, and central to understanding Linear Programming Duality, Convex and Robust Optimization, and Statistics. For many decades it was thought that one could solve this kind of game using sublinear regret algorithms for Online Convex Optimization. We show that while this is true when the game does not change with time, a naive application of these algorithms can be fatal if the game changes and the players are trying to compete with the Nash Equilibrium of the sum of the games in hindsight.
In the fourth chapter, joint work with He Wang and Huan Xu, we revisit the decade-old problem of Markov Decision Processes (MDPs) with Adversarial Rewards. MDPs provide a general mathematical framework for sequential decision making under uncertainty when there is a notion of 'state'; moreover, they are the backbone of all Reinforcement Learning. We provide an elegant algorithm for this problem using tools from Online Convex Optimization. The algorithm's performance is comparable with the current state of the art. We also consider the problem under the large state-space regime, and provide the first algorithm with strong theoretical guarantees.
Publisher
ProQuest Dissertations & Theses
ISBN
9798263375171