CSE DSI Machine Learning Seminar with Heng Yang (SEAS, Harvard)

Control-oriented Clustering of Visual Latent Representation

We initiate a study of the geometry of the visual representation space (the information channel from the vision encoder to the action decoder) in an image-based control pipeline learned from behavior cloning. Inspired by the phenomenon of neural collapse (NC) in image classification, we investigate whether a similar law of clustering emerges in the visual representation space. Since image-based control is a regression task without explicitly defined classes, the central piece of the puzzle is determining the implicit classes according to which the visual features cluster, if such a law exists. Focusing on image-based planar pushing, we posit that the most important role of the visual representation in a control task is to convey a goal to the action decoder. We then classify training samples of expert demonstrations into eight "control-oriented" classes based on (a) the relative pose between the object and the target in the input or (b) the relative pose of the object induced by expert actions in the output, where each class corresponds to one relative pose orthant (REPO). Across four different architectural instantiations, we report the prevalent emergence of control-oriented clustering in the visual representation space according to the eight REPOs. Beyond this empirical observation, we show that such a law of clustering can be leveraged as an algorithmic tool to improve test-time performance when training a policy with limited expert demonstrations. In particular, we pretrain the vision encoder with NC as a regularization to encourage control-oriented clustering of the visual features. Surprisingly, such an NC-pretrained vision encoder, when finetuned end-to-end with the action decoder, boosts test-time performance by 10% to 35% in the low-data regime. Real-world vision-based planar pushing experiments confirm this surprising advantage of control-oriented visual representation pretraining.
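To make the REPO labeling and the NC-style pretraining idea above concrete, the sketch below (in Python) illustrates one plausible reading: the relative pose is taken as a planar (dx, dy, dθ) triple, orthants are indexed by the sign pattern of its three components, and clustering is encouraged by penalizing within-class feature variability. The function names, the pose parameterization, and the specific regularizer are illustrative assumptions, not the exact formulation from the talk or paper.

```python
import numpy as np

def repo_label(rel_pose):
    """Map a planar relative pose (dx, dy, dtheta) -- between the object and the
    target, or induced by the expert action -- to one of eight relative pose
    orthants (REPOs), indexed 0..7 by the sign pattern of the three components.
    (Illustrative assumption; the paper's exact labeling may differ.)"""
    signs = (np.asarray(rel_pose, dtype=float) >= 0).astype(int)
    return int(signs[0] * 4 + signs[1] * 2 + signs[2])

def within_class_variability(features, labels):
    """One simple NC-inspired regularizer: mean squared distance of each visual
    feature to the mean feature of its REPO class. Driving this toward zero
    encourages the features to collapse onto eight class means."""
    features, labels = np.asarray(features), np.asarray(labels)
    loss = 0.0
    for c in np.unique(labels):
        cluster = features[labels == c]
        loss += np.sum((cluster - cluster.mean(axis=0)) ** 2)
    return loss / len(features)

# Example: label two expert-demonstration samples and score a feature batch.
labels = [repo_label([-0.12, 0.03, -0.4]), repo_label([0.05, -0.02, 0.1])]
feats = np.random.randn(2, 64)  # placeholder visual features from the encoder
print(labels, within_class_variability(feats, labels))
```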

Heng Yang is an Assistant Professor of Electrical Engineering in the School of Engineering and Applied Sciences at Harvard University, where he directs the Harvard Computational Robotics Lab. He obtained his PhD from MIT in 2022, his MS from MIT in 2017, and his BEng from Tsinghua University in 2015. He is broadly interested in the intersection of theory and practice, particularly computational algorithms that are robust and efficient, offer strong performance guarantees, and supercharge the next generation of intelligent systems. He received the Best Paper Award in Robot Vision at the 2020 IEEE International Conference on Robotics and Automation (ICRA) and a Best Paper Award Honorable Mention from IEEE Robotics and Automation Letters (RA-L) in 2020, and was a Best Paper Award finalist at the 2021 Robotics: Science and Systems (RSS) conference.

Start date
Tuesday, Oct. 29, 2024, 11 a.m.
End date
Tuesday, Oct. 29, 2024, Noon
Location
Keller Hall 3-180 and via Zoom.