ISyE Seminar Series: Vijay Subramanian
"Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States"
Vijay Subramanian
Professor in the ECE Division of the EECS Department
University of Michigan, Ann Arbor
About the Seminar:
Multi-agent systems appear in many engineering and socioeconomic settings, wherein a group of agents or controllers interact with each other in a shared and possibly non-stationary environment, and make sequential decisions based on their own information using a (causal) interaction mechanism.
In this talk we focus on cooperative sequential decision making under uncertainty: a decentralized team in which a fixed, finite number of agents act together toward the common goal of minimizing a long-term cost function. We investigate the general situation where one long-term (objective) cost must be minimized while multiple other long-term (constraint) costs are kept within prescribed limits, formulated as a cooperative Multi-Agent Constrained Partially Observable Markov Decision Process (MAC-POMDP). Such constrained sequential team decision problems arise in many real-world applications where efficient operation must be balanced against safe operating margins, for example in communication networks, traffic management, energy-grid optimization, e-commerce pricing, and environmental monitoring.
We focus on the discounted cost criterion, and begin by establishing general results on Lagrangian duality and the existence of a global saddle point. Next, we consider decentralized policy-profiles and their mixtures, and establish that when agents mix jointly over their policy-profiles, there is no (Lagrangian) duality gap and a global saddle point exists under Slater's condition. However, when agents mix independently over their policy-profiles, we show through a concrete counterexample that a non-zero duality gap can exist. We then consider coordination policies and their mixtures, and establish that, except for pure coordination policies, they are all equivalent to joint mixtures of decentralized policy-profiles. This equivalence result lets us reformulate the original multi-agent constrained optimization problem as a single-agent constrained optimization problem, which we then use to propose a primal-dual framework for model-based optimal control. Finally, we extend the notion of a Multi-Agent Approximate Information State (MA-AIS) to constrained decision making, and formalize MA-AIS based coordination policies and their mixtures. We establish through a concrete counterexample that, in contrast to behavioral coordination policies, MA-AIS based behavioral coordination policies and their mixtures are not equivalent. We also establish the approximate optimality of mixtures of MA-AIS based coordination policies, and use this result to guide the development of a data-driven alternative to the aforementioned model-based primal-dual framework.
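For readers less familiar with the setup, the constrained problem and the saddle-point property mentioned above can be sketched as follows; the notation here is illustrative, not necessarily the speaker's:

```latex
% Minimize the discounted objective cost subject to K discounted constraint costs:
\min_{\pi} \; J_0(\pi)
\quad \text{s.t.} \quad J_k(\pi) \le c_k, \qquad k = 1, \dots, K.

% Lagrangian with multipliers \lambda = (\lambda_1, \dots, \lambda_K) \ge 0:
\mathcal{L}(\pi, \lambda) \;=\; J_0(\pi) + \sum_{k=1}^{K} \lambda_k \bigl( J_k(\pi) - c_k \bigr).

% "No duality gap" corresponds to a global saddle point (\pi^*, \lambda^*):
\mathcal{L}(\pi^*, \lambda) \;\le\; \mathcal{L}(\pi^*, \lambda^*) \;\le\; \mathcal{L}(\pi, \lambda^*)
\quad \text{for all admissible } \pi \text{ and all } \lambda \ge 0.
```

The point of the results above is that whether such a saddle point exists depends on the policy class: joint mixtures of decentralized policy-profiles admit one under Slater's condition, while independent mixtures may not.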
This is joint work with Nouman Khan (Amazon Search, Seattle, WA), carried out while he was a PhD student at the University of Michigan, Ann Arbor.
About the Speaker:
Vijay Subramanian is a Professor in the ECE Division of the EECS Department at the University of Michigan, Ann Arbor; from Fall 2014 to Summer 2024 he was an Associate Professor at the same institution. He received the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1999. He has worked at Motorola Inc.; the Hamilton Institute, Maynooth, Ireland; and the EECS Department, Northwestern University, Evanston, IL, USA; he also held an Adjunct Research Associate Professor appointment in CSL and ECE at UIUC. His current research interests are in stochastic analysis, random graphs, multi-agent systems, and game theory (mechanism and information design), with applications to social, economic, and technological networks.
If you wish to be added to the ISyE Graduate Seminar Series emailing list, please email Event Coordinator Emily Rice at [email protected].