ML Seminar: Policy Learning Methods for Confounded POMDPs

The UMN Machine Learning Seminar Series brings together faculty, students, and local industrial partners interested in the theoretical, computational, and applied aspects of machine learning to pose problems, exchange ideas, and foster collaborations. Talks are held every Tuesday from 11 a.m. to 12 p.m. during the Spring 2024 semester.

This week's speaker, Zhengling Qi (George Washington University), will give a talk titled "Policy Learning Methods for Confounded POMDPs".

Abstract

In this talk, I will present a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs from offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt a min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample, non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, horizon length, concentrability coefficient, and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in a gradient ascent algorithm, we show the global convergence of the proposed algorithm to the optimal history-dependent policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs in the offline setting. If time permits, I will also describe a model-based method for confounded POMDPs.
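To give a flavor of the overall scheme (estimate a policy gradient from a fixed offline dataset, then run gradient ascent), here is a minimal toy sketch in Python. Everything below is illustrative: the dataset, the logistic observation-based policy, and the crude score-function gradient estimator are stand-ins invented for this sketch, not the talk's identification result or min-max estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "offline" dataset: observations O, binary actions A, rewards R.
# (A stand-in for confounded POMDP trajectories; purely illustrative.)
n = 2000
O = rng.normal(size=(n, 3))
A = rng.integers(0, 2, size=n)
R = O[:, 0] * (2 * A - 1) + rng.normal(scale=0.1, size=n)

def policy_prob(theta, obs):
    """Probability of action 1 under a logistic observation-based policy."""
    return 1.0 / (1.0 + np.exp(-obs @ theta))

def estimated_gradient(theta):
    """Crude reward-weighted score-function gradient estimate from the
    offline data. (The talk's estimator instead solves conditional moment
    restrictions via a min-max procedure; this is only a placeholder.)"""
    p = policy_prob(theta, O)
    score = (A - p)[:, None] * O  # d/dtheta of log pi_theta(A | O)
    return (R[:, None] * score).mean(axis=0)

# Plug the gradient estimate into plain gradient ascent on the policy.
theta = np.zeros(3)
for _ in range(200):
    theta += 0.5 * estimated_gradient(theta)
```

In this toy problem, the reward favors action 1 exactly when the first observation coordinate is positive, so the ascent drives the first policy weight positive; the talk's contribution is making the gradient estimation step valid under confounding and partial observability, with finite-sample guarantees.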

Biography

Zhengling Qi is an assistant professor at the School of Business, the George Washington University. He received his PhD from the Department of Statistics and Operations Research at the University of North Carolina, Chapel Hill. His research focuses on statistical machine learning and related non-convex optimization, with an emphasis on reinforcement learning and causal inference problems.

Start date
Tuesday, April 2, 2024, 11 a.m.
End date
Tuesday, April 2, 2024, 12 p.m.
Location

Keller Hall 3-180 and via Zoom.