Journal Article

Partially observable environment estimation with uplift inference for reinforcement learning based recommendation

2021
Overview
Reinforcement learning (RL) searches for the best policy model for decision making and has proven powerful for sequential recommendation. Training a policy with RL, however, requires an environment to interact with. In many real-world applications, training in the real environment incurs an unbearable cost because of exploration. Estimating the environment from past data is therefore an appealing way to unlock the power of RL in these applications. Estimating the environment essentially means extracting a causal effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so unobserved variables are likely to lie behind the data and can obstruct effective estimation. In this paper, by treating the hidden variables as a hidden policy, we propose a partially observed multi-agent environment estimation (POMEE) approach to learn the partially observed environment. To better extract the causal relationship between actions and rewards, we design a deep uplift inference network (DUIN) model to learn the causal effects of different actions. By implementing the environment model with the DUIN structure, we propose POMEE with uplift inference (POMEE-UI), which generates a partially observed environment with a causal reward mechanism. We evaluate our method in both artificial and real-world environments. We first use an artificial recommender environment, abstracted from a real-world application, to verify the effectiveness of POMEE-UI. We then test POMEE-UI in the real application of Didi Chuxing. Experimental results show that POMEE-UI can effectively estimate the hidden variables, leading to a more reliable virtual environment. Online A/B testing results show that POMEE can derive a well-performing recommender policy in the real-world application.
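
For readers unfamiliar with the term, "uplift" generally refers to the change in expected reward caused by taking an action instead of a baseline. The sketch below is not the DUIN model from the paper, whose architecture is not described in this record. It is only a minimal difference-in-means illustration of the idea, and the function name `estimate_uplift`, the action labels, and the toy log are all hypothetical. A naive estimate like this is only trustworthy when actions were assigned at random. With hidden variables influencing both actions and rewards, which is the setting the abstract addresses, it can be biased.

```python
from collections import defaultdict


def estimate_uplift(logs, control_action):
    """Naive uplift: mean reward of each action minus mean reward of the control.

    `logs` is an iterable of (action, reward) pairs (a hypothetical log format).
    Assumes actions were assigned independently of any hidden variables.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in logs:
        totals[action] += reward
        counts[action] += 1

    if counts[control_action] == 0:
        raise ValueError("no control samples in logs")

    baseline = totals[control_action] / counts[control_action]
    # Uplift of every non-control action relative to the control baseline.
    return {
        action: totals[action] / counts[action] - baseline
        for action in counts
        if action != control_action
    }


# Toy logged interactions: (action shown to the user, observed reward).
logs = [
    ("none", 0.0), ("none", 1.0),
    ("coupon", 1.0), ("coupon", 1.0),
    ("banner", 0.0), ("banner", 1.0),
]
uplift = estimate_uplift(logs, control_action="none")
print(uplift)
```

In this toy log the "coupon" action has a higher average reward than the "none" baseline, while "banner" matches it, so only "coupon" shows positive uplift.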