Asset Details
Partially observable environment estimation with uplift inference for reinforcement learning based recommendation
by
Yang, Yu
Shang, Wenjie
Qin, Zhiwei
Li, Qingyang
Meng, Yiping
Ye, Jieping
in
Control theory
/ Decision making
/ Environment models
/ Inference
/ Learning
/ Machine learning
/ Multiagent systems
/ Recommender systems
/ Training
/ Uplift
/ Virtual environments
2021
Journal Article
Overview
Reinforcement learning (RL) aims to find the best policy model for decision making and has been shown to be powerful for sequential recommendation. Training a policy with RL, however, requires an environment, and in many real-world applications training in the real environment incurs an unbearable cost due to exploration. Estimating the environment from past data is thus an appealing way to unlock the power of RL in these applications. Estimating the environment essentially means extracting a causal-effect model from the data. However, real-world applications are often too complex to offer fully observable environment information, so unobserved variables quite possibly lie behind the data and can obstruct an effective estimation of the environment. In this paper, by treating the hidden variables as a hidden policy, we propose a partially-observed multi-agent environment estimation (POMEE) approach to learn the partially observed environment. To better extract the causal relationship between actions and rewards, we design a deep uplift inference network (DUIN) model to learn the causal effects of different actions. By implementing the environment model with the DUIN structure, we propose POMEE with uplift inference (POMEE-UI), an approach that generates a partially observed environment with a causal reward mechanism. We analyze the effect of our method in both artificial and real-world environments. We first use an artificial recommender environment, abstracted from a real-world application, to verify the effectiveness of POMEE-UI. We then test POMEE-UI in the real application of Didi Chuxing. Experimental results show that POMEE-UI can effectively estimate the hidden variables, leading to a more reliable virtual environment. Online A/B testing results show that POMEE can derive a well-performing recommender policy in the real-world application.
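The DUIN architecture itself is not specified in this overview. As a hedged illustration of the underlying uplift-inference idea only (estimating the causal effect of an action as the difference between predicted outcomes with and without the action), here is a minimal two-model ("T-learner") sketch in plain NumPy on synthetic logged data with a known uplift. All names and the data-generating process are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged data: x = user features, t = action taken (0/1,
# randomized), y = observed reward. True uplift of the action is 2.0.
n = 2000
x = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
y = x @ np.array([1.0, -0.5, 0.3]) + 2.0 * t + rng.normal(scale=0.1, size=n)

def fit_linear(features, targets):
    """Least-squares fit with an intercept column."""
    X = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w

def predict(w, features):
    return np.column_stack([features, np.ones(len(features))]) @ w

# T-learner: one outcome model per action arm.
w0 = fit_linear(x[t == 0], y[t == 0])   # control model mu_0(x)
w1 = fit_linear(x[t == 1], y[t == 1])   # treated model mu_1(x)

# Per-user uplift estimate: mu_1(x) - mu_0(x).
uplift = predict(w1, x) - predict(w0, x)
print(round(float(uplift.mean()), 1))   # close to the true uplift of 2.0
```

In the paper's setting the two outcome heads would be deep networks inside the environment model rather than linear regressions, but the causal quantity being estimated (the per-state difference in expected reward across actions) is the same.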
Publisher
Springer Nature B.V