A PDE-Based Model-Free Algorithm for Continuous-Time Reinforcement Learning

Data Science Seminar

Yuhua Zhu (UCLA)

Abstract

This talk addresses continuous-time reinforcement learning (RL): when the underlying dynamics are unknown and only discrete-time observations are available, how can we effectively conduct policy evaluation and policy iteration? We first highlight that while model-free RL algorithms are straightforward to implement, they often fail to approximate the true value function reliably. Model-based PDE approaches, on the other hand, are more accurate, but they require solving an inverse problem to identify the dynamics, which is not easy. To bridge this gap, we introduce a new Bellman equation, PhiBE, which integrates discrete-time information into a PDE formulation. PhiBE allows us to skip identifying the dynamics and to evaluate the value function directly from discrete-time data. It also provides a more accurate approximation of the true value function, especially when the underlying dynamics change slowly. Moreover, we extend PhiBE to higher orders, yielding increasingly accurate approximations.
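To make the contrast concrete, here is a hedged sketch in our own notation (not taken from the abstract) of the kind of equations involved; the precise form of PhiBE presented in the talk may differ. For dynamics $ds_t = b(s_t)\,dt + \sigma(s_t)\,dW_t$ with discount rate $\beta$ and reward $r$, exact continuous-time policy evaluation solves

\[
\beta V(s) = r(s) + b(s)\cdot \nabla V(s) + \tfrac{1}{2}\,\sigma\sigma^{\top}(s) : \nabla^2 V(s),
\]

which requires knowing $b$ and $\sigma$. A PhiBE-style equation of the kind described above would instead replace these coefficients with conditional moments of the observed discrete-time increments over a step $\Delta t$, for example

\[
\beta \widehat{V}(s) = r(s) + \frac{1}{\Delta t}\,\mathbb{E}\big[s_{\Delta t}-s_0 \mid s_0=s\big]\cdot \nabla \widehat{V}(s) + \frac{1}{2\Delta t}\,\mathbb{E}\big[(s_{\Delta t}-s_0)(s_{\Delta t}-s_0)^{\top} \mid s_0=s\big] : \nabla^2 \widehat{V}(s),
\]

so that only discrete-time data enter and the dynamics never need to be identified explicitly; higher-order versions would use higher-order difference approximations of these conditional moments.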

Start date
Tuesday, Dec. 3, 2024, 1:25 p.m.
End date
Tuesday, Dec. 3, 2024, 2:25 p.m.
Location
Lind Hall 325 or via Zoom