Boosting Soft Q-Learning by Bounding

Published on 2024-06-27

AI Summary

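This article shows how, in soft Q-learning, any value function estimate can be used to derive double-sided bounds on the optimal value function, and how those bounds can be used to improve training performance. The authors report that the framework suggests an alternative method for updating the Q-function, which improves performance. Experiments confirm that the proposed framework improves training.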

An agent's ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft Q-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the Q-function, leading to boosted performance.
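To make the idea of double-sided bounds concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a small tabular MDP with known rewards `R` and transitions `P`, a uniform prior policy, and an inverse temperature `beta`; all function and variable names are hypothetical. The bound used here follows from the soft Bellman operator T being monotone and a gamma-contraction: for any estimate Q with residual Delta = TQ - Q, the optimal soft Q-function lies between TQ + gamma * min(Delta) / (1 - gamma) and TQ + gamma * max(Delta) / (1 - gamma). The exact statement in the paper may differ in its details.

```python
import numpy as np


def soft_bellman_backup(Q, R, P, gamma, beta):
    """Apply the soft Bellman operator, assuming a uniform prior policy.

    Q, R: arrays of shape (S, A); P: transition tensor of shape (S, A, S).
    """
    n_actions = Q.shape[1]
    # Soft state value V(s) = (1/beta) * log mean_a exp(beta * Q(s, a)),
    # shifted by the row maximum for numerical stability.
    q_max = Q.max(axis=1, keepdims=True)
    V = q_max[:, 0] + np.log(np.exp(beta * (Q - q_max)).sum(axis=1) / n_actions) / beta
    # Expected next-state value under P, discounted and added to the reward.
    return R + gamma * (P @ V)


def double_sided_bounds(Q, R, P, gamma, beta):
    """Lower and upper bounds on the optimal soft Q-function from any estimate Q."""
    TQ = soft_bellman_backup(Q, R, P, gamma, beta)
    delta = TQ - Q  # soft Bellman residual of the estimate
    lower = TQ + gamma * delta.min() / (1.0 - gamma)
    upper = TQ + gamma * delta.max() / (1.0 - gamma)
    return lower, upper


# Hypothetical random MDP used only for illustration.
rng = np.random.default_rng(0)
S, A, gamma, beta = 5, 3, 0.9, 2.0
R = rng.uniform(-1.0, 1.0, size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S), rows sum to 1

# Reference solution: iterate the soft Bellman operator to its fixed point.
Q_star = np.zeros((S, A))
for _ in range(2000):
    Q_star = soft_bellman_backup(Q_star, R, P, gamma, beta)

# An arbitrary (untrained) estimate still yields valid bounds.
Q_est = rng.normal(size=(S, A))
lower, upper = double_sided_bounds(Q_est, R, P, gamma, beta)
bounds_hold = bool(np.all(lower <= Q_star + 1e-8) and np.all(Q_star <= upper + 1e-8))
print(bounds_hold)
```

The gap between `upper` and `lower` shrinks as the residual of the estimate becomes more uniform across state-action pairs, so a better estimate gives tighter bounds. In a sampled, function-approximation setting the minimum and maximum of the residual would have to be estimated, which this sketch does not address.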