Optimal Sample Complexity for Average Reward MDPs: How Span-Based?

Original title: Span-Based Optimal Sample Complexity for Average Reward MDPs

Authors: Matthew Zurek, Yudong Chen

The article studies the sample complexity of learning optimal policies in average-reward Markov decision processes (MDPs) under a generative…
