Asset Details
Partially observable environment estimation with uplift inference for reinforcement learning based recommendation
by
Yang, Yu
Shang, Wenjie
Qin, Zhiwei
Li, Qingyang
Meng, Yiping
Ye, Jieping
in
Control theory
/ Decision making
/ Environment models
/ Inference
/ Learning
/ Machine learning
/ Multiagent systems
/ Recommender systems
/ Training
/ Uplift
/ Virtual environments
2021
Journal Article
Overview
Reinforcement learning (RL) aims to find the best policy model for decision making and has been shown to be powerful for sequential recommendation. Training a policy with RL, however, requires an environment, and in many real-world applications training in the real environment incurs an unbearable cost due to exploration. Estimating the environment from past data is thus an appealing way to unlock the power of RL in these applications. Estimating the environment essentially means extracting a causal-effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so unobserved variables quite possibly lie behind the data and can obstruct an effective estimation of the environment. In this paper, by treating the hidden variables as a hidden policy, we propose a partially-observed multi-agent environment estimation (POMEE) approach to learn the partially observed environment. To better extract the causal relationship between actions and rewards, we design a deep uplift inference network (DUIN) model to learn the causal effects of different actions. By implementing the environment model with the DUIN structure, we propose POMEE with uplift inference (POMEE-UI), an approach that generates a partially observed environment with a causal reward mechanism. We analyze the effect of our method in both artificial and real-world environments. We first use an artificial recommender environment, abstracted from a real-world application, to verify the effectiveness of POMEE-UI. We then test POMEE-UI in the real application of Didi Chuxing. Experimental results show that POMEE-UI can effectively estimate the hidden variables, leading to a more reliable virtual environment. Online A/B testing results show that POMEE can derive a well-performing recommender policy in the real-world application.
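The DUIN architecture itself is not specified in this overview. As a hedged illustration of the underlying uplift-inference idea only (estimating the causal effect of an action as the difference between predicted outcomes with and without the action), here is a minimal two-model ("T-learner") sketch in plain NumPy on synthetic logged data with a known uplift. All names and the data-generating process are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged data: x = user features, t = action taken (0/1,
# randomized), y = observed reward. True uplift of the action is 2.0.
n = 2000
x = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
y = x @ np.array([1.0, -0.5, 0.3]) + 2.0 * t + rng.normal(scale=0.1, size=n)

def fit_linear(features, targets):
    """Least-squares fit with an intercept column."""
    X = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

def predict(w, features):
    return np.column_stack([features, np.ones(len(features))]) @ w

# T-learner: one outcome model per action arm.
w0 = fit_linear(x[t == 0], y[t == 0])   # control model mu_0(x)
w1 = fit_linear(x[t == 1], y[t == 1])   # treated model mu_1(x)

# Per-user uplift estimate: mu_1(x) - mu_0(x).
uplift = predict(w1, x) - predict(w0, x)
print(round(float(uplift.mean()), 1))   # close to the true uplift of 2.0
```

In the paper's setting the two outcome heads would be deep networks inside the environment model rather than linear regressions, but the causal quantity being estimated (the per-state difference in expected reward across actions) is the same.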
Publisher
Springer Nature B.V