MCFAM Seminar: Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Speaker: Renyuan Xu

Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. The existing literature on RL theory largely focuses on risk-neutral settings, where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce platform recommendations, decision-makers often exhibit heterogeneous risk preferences under outcome uncertainty, which cannot be well captured by the risk-neutral framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration.

In this talk, we explore a scenario where decision-makers aim to optimize a general utility function of the cumulative reward. To facilitate the Dynamic Programming Principle and the Bellman equation, we consider a state-augmentation framework with an additional dimension that tracks the cumulative reward. We propose a discretized approximation scheme for the MDP on the enlarged state space, which is tractable and key for algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering net. Under mild assumptions, our algorithm efficiently identifies a near-optimal policy, with guarantees on both sample complexity and regret. If time permits, we will also touch upon statistical inference of the utility function and the game-theoretic setting in which a principal faces a large population of heterogeneous individuals with unknown utility functions.
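The state-augmentation idea above can be illustrated with a small sketch: augment each state with a discretized cumulative-reward coordinate on an epsilon-covering net, then run backward value iteration with the utility applied only at the terminal step. The toy MDP, the utility function, and all parameter values below are illustrative assumptions, not details from the talk.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions, finite horizon H.
n_states, n_actions, H = 2, 2, 3
# P[a, s, s'] : transition probabilities
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
# r[s, a] : per-step reward in [0, 1]
r = np.array([[0.0, 1.0], [0.5, 0.2]])

def U(y):
    """An illustrative concave (risk-averse) utility of cumulative reward."""
    return np.sqrt(y)

# Epsilon-covering net for the cumulative-reward dimension.
eps = 0.05
y_grid = np.arange(0.0, H * r.max() + eps, eps)

def snap(y):
    """Index of the nearest grid point (the discretized approximation)."""
    return np.clip(np.round(y / eps).astype(int), 0, len(y_grid) - 1)

# Backward value iteration on the augmented state (s, y).
V = U(y_grid)[None, :].repeat(n_states, axis=0)  # terminal: V_H(s, y) = U(y)
policy = np.zeros((H, n_states, len(y_grid)), dtype=int)
for h in reversed(range(H)):
    Q = np.empty((n_states, len(y_grid), n_actions))
    for a in range(n_actions):
        for s in range(n_states):
            # Next cumulative reward y' = y + r(s, a), snapped onto the net.
            j = snap(y_grid + r[s, a])
            Q[s, :, a] = P[a, s] @ V[:, j]
    policy[h] = Q.argmax(axis=2)
    V = Q.max(axis=2)

# Value of starting in state 0 with zero accumulated reward.
print(V[0, snap(0.0)])
```

Note that the optimal action depends on the accumulated reward so far, which is exactly why the extra state dimension is needed once the utility is nonlinear.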
Start date: Friday, March 29, 2024, noon
End date: Friday, March 29, 2024, 1 p.m.

Join in person: Vincent Hall 311, or via Zoom: