ISyE Seminar Series: R. Srikant

"Why is RLHF Data-Efficient in Policy Optimization?"

R. Srikant

Grainger Chair in Engineering, Co-Director of the C3.ai Digital Transformation Institute and a Professor of Electrical and Computer Engineering and the Coordinated Science Lab
University of Illinois Urbana-Champaign

About the Seminar:

We consider a version of policy optimization in reinforcement learning in which the rewards must be learned through human feedback. We study the sample complexity of such an algorithm and compare it to that of an algorithm for which the rewards are known a priori. We show that the additional data needed to infer rewards from human feedback is a small fraction of the total data needed for policy optimization. Joint work with Yihan Du, Anna Winnicki, Gal Dalal and Shie Mannor.

About the Speaker:

R. Srikant is a Grainger Chair in Engineering, Co-Director of the C3.ai Digital Transformation Institute and a Professor of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois Urbana-Champaign. His research interests span machine learning, applied probability and communication networks. He is the recipient of the 2021 ACM SIGMETRICS Achievement Award, the 2019 IEEE Koji Kobayashi Computers and Communication Award and the 2015 IEEE INFOCOM Achievement Award. He has also received several Best Paper awards, including the 2015 IEEE INFOCOM Best Paper Award and the 2017 Applied Probability Society Best Publication Award.
