Pessimistic Reward Models for Off-Policy Learning in Recommendation

Venue: RecSys 2021

Authors: Olivier Jeunen, University of Antwerp | Bart Goethals, University of Antwerp

Abstract: Methods for bandit learning from user interactions often require a model of the reward a certain context-action pair will yield -- for example, the probability of a click on a recommendation.
This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself.
Indeed, when the deployed recommendation policy at data collection time does not pick its actions uniformly-at-random, this leads to a selection bias that can impede effective reward modelling.
This in turn makes off-policy learning -- the typical setup in industry -- particularly challenging.

In this work, we propose and validate a general pessimistic reward modelling approach for off-policy learning in recommendation.
Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to generate a conservative decision rule.
We show how this pessimism alleviates a well-known decision-making phenomenon, the Optimiser's Curse, and draw parallels with existing work on pessimistic policy learning.
Leveraging the available closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use-case.
Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance.
The merits of our approach are most pronounced in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.
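
The idea of combining a ridge regressor's closed-form posterior with a conservative decision rule can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian prior N(0, I/lam) on the weights, unit observation noise, one-hot action features, and a lower-confidence-bound rule with a hypothetical pessimism parameter `alpha`. The logged data are a made-up toy example, and all function names are illustrative.

```python
import numpy as np

def fit_ridge_posterior(X, y, lam=1.0):
    """Closed-form posterior of Bayesian ridge regression.

    Assumes prior w ~ N(0, I / lam) and unit noise variance, so the
    posterior is N(mean, cov) with cov = (X^T X + lam * I)^-1.
    """
    d = X.shape[1]
    cov = np.linalg.inv(X.T @ X + lam * np.eye(d))
    mean = cov @ X.T @ y
    return mean, cov

def pessimistic_scores(features, mean, cov, alpha=1.0):
    """Lower confidence bound: posterior mean minus alpha posterior std."""
    mu = features @ mean
    # Per-row quadratic form x^T cov x, i.e. the posterior variance of x^T w.
    var = np.einsum("ij,jk,ik->i", features, cov, features)
    return mu - alpha * np.sqrt(var)

# Toy logged data (assumed): three actions with one-hot features.
# Action 0: 100 impressions, 50 clicks. Action 1: 2 impressions, 2 clicks.
# Action 2: 50 impressions, 20 clicks.
counts = [100, 2, 50]
X = np.repeat(np.eye(3), counts, axis=0)
y = np.concatenate([
    np.ones(50), np.zeros(50),   # action 0
    np.ones(2),                  # action 1
    np.ones(20), np.zeros(30),   # action 2
])

mean, cov = fit_ridge_posterior(X, y, lam=1.0)
actions = np.eye(3)

# Greedy rule: pick the action with the highest posterior mean reward.
greedy_action = int(np.argmax(actions @ mean))
# Pessimistic rule: pick the action with the highest lower confidence bound.
pessimistic_action = int(np.argmax(pessimistic_scores(actions, mean, cov, alpha=1.0)))

print("greedy:", greedy_action, "pessimistic:", pessimistic_action)
```

In this toy setting the greedy rule favours the rarely logged action whose two observations happened to be clicks, while the pessimistic rule discounts it for its large posterior variance and prefers the well-explored action. This mirrors, in miniature, how conservatism counteracts the Optimiser's Curse.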

DOI: https://doi.org/10.1145/3460231.3474247

Video: Pessimistic Reward Models for Off-Policy Learning in Recommendation (ACM RecSys channel)
Video information
30 January 2022, 17:43:48
00:15:08