Original title: Probabilistic Inference in Reinforcement Learning Done Right
Authors: Jean Tarbouriech, Tor Lattimore, Brendan O’Donoghue
In this paper, the researchers study Reinforcement Learning (RL) through a probabilistic lens, focusing on Markov decision processes (MDPs) and on the posterior probability that a given state-action pair is visited under the optimal policy. They argue that popular "RL as inference" approximations target the wrong quantity, yielding algorithms that do not perform genuine statistical inference and can underperform. To address this, they rigorously apply Bayesian methods to characterize the posterior probability of state-action optimality and how it flows through the MDP. Because this posterior is intractable to compute exactly, they introduce VAPOR, a variational Bayesian approximation that can be solved as a tractable convex optimization problem. VAPOR not only explores efficiently to keep regret low but also has close connections to established methods such as Thompson sampling and K-learning. Experiments with a deep RL version of VAPOR demonstrate strong performance, highlighting the method's potential on challenging exploration problems.
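To make the "tractable convex optimization" step more concrete, below is a minimal sketch of the general flavor of such a program: a convex problem over discounted state-action occupancy measures with Bellman-flow constraints, a simple optimism bonus, and an entropy regularizer, written with cvxpy. This is not the authors' exact VAPOR objective; the random MDP and all names (r_hat, sigma, tau) are illustrative assumptions, meant only to show what optimizing an exploration-aware objective over occupancy measures looks like.

```python
# Illustrative sketch only: an entropy-regularized convex program over
# discounted state-action occupancy measures. NOT the exact VAPOR objective
# from the paper; r_hat, sigma, tau, and the random MDP are made-up stand-ins.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Stand-ins for posterior statistics an agent might maintain:
# estimated mean rewards and per-pair uncertainty scales.
r_hat = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
sigma = rng.uniform(0.05, 0.2, size=(n_states, n_actions))

# Random transition kernel P[s, a, s'] and uniform initial distribution mu0.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
mu0 = np.full(n_states, 1.0 / n_states)

# Occupancy measure lam[s, a] >= 0 over state-action pairs.
lam = cp.Variable((n_states, n_actions), nonneg=True)

# Bellman-flow constraints defining valid discounted occupancy measures:
# sum_a lam(s, a) = (1 - gamma) * mu0(s) + gamma * sum_{s', a'} P(s | s', a') * lam(s', a')
constraints = []
for s in range(n_states):
    inflow = (1.0 - gamma) * mu0[s] + gamma * cp.sum(cp.multiply(P[:, :, s], lam))
    constraints.append(cp.sum(lam[s, :]) == inflow)

# Objective: expected (optimistically bonused) reward plus entropy regularization.
# cp.entr(x) = -x * log(x) is concave, so the whole objective is concave in lam.
tau = 0.1  # illustrative entropy temperature
objective = cp.Maximize(cp.sum(cp.multiply(r_hat + sigma, lam)) + tau * cp.sum(cp.entr(lam)))

prob = cp.Problem(objective, constraints)
prob.solve()

# Recover a stochastic policy from the optimal occupancy measure.
lam_val = np.maximum(lam.value, 1e-12)
policy = lam_val / lam_val.sum(axis=1, keepdims=True)
print("Policy derived from the optimal occupancy measure:\n", np.round(policy, 3))
```

In this toy version, the entropy term spreads occupancy across state-action pairs and the sigma bonus favors uncertain pairs; in the paper, the uncertainty-dependent term is derived from the variational approximation to the posterior probability of optimality rather than assumed.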
Original article: https://arxiv.org/abs/2311.13294