Upcoming Events

Is Data All You Need? Large Robot Action Models and Good Old Fashioned Engineering

Title: Is Data All You Need? Large Robot Action Models and Good Old Fashioned Engineering

Bio: Ken Goldberg is the William S. Floyd Distinguished Chair of Engineering at UC Berkeley and Chief Scientist of Ambi Robotics and Jacobi Robotics. Ken leads research in robotics and automation: grasping, manipulation, and learning for applications in warehouses, industry, homes, agriculture, and robot-assisted surgery. He is a Professor of IEOR with appointments in EECS and Art Practice. Ken is Chair of the Berkeley AI Research (BAIR) Steering Committee (60 faculty) and is co-founder and Editor-in-Chief emeritus of the IEEE Transactions on Automation Science and Engineering (T-ASE). He holds ten US patents, has published over 400 refereed papers, and has presented over 600 invited lectures to academic and corporate audiences. http://goldberg.berkeley.edu

Abstract: 

Enthusiasm for humanoids has been skyrocketing based on recent advances in "end-to-end" large robot action models. Initial results are promising, and several collaborative efforts are underway to collect the needed demonstration data. But is data really all you need?

Although end-to-end large Vision-Language-Action (VLA) models have the potential to generalize and reliably solve all problems in robotics, initial results have been mixed [1]. It seems likely that the size of the VLA state space and the dearth of available demonstration data, combined with the challenges of getting models to generalize beyond the training distribution and of interpreting and debugging large models, will make it difficult for pure end-to-end systems to provide the kind of robot performance that investors expect in the near future.

In this presentation, I share my concerns about current trends in robotics, including task definition, data collection, and experimental evaluation. I propose that to reach expected performance levels, we will need "Good Old Fashioned Engineering (GOFE)": modularity, algorithms, and metrics. I'll present MANIP [2], a modular systems architecture that can integrate learning with well-established procedural algorithmic primitives such as Inverse Kinematics, Kalman Filters, RANSAC outlier rejection, and PID modules. I'll show how we are using MANIP to improve performance on robot manipulation tasks such as grasping, cable untangling, surgical suturing, motion planning, and bagging, and propose open directions for research.

References: 

[1] Nishanth J. Kumar. Will Scaling Solve Robotics? The idea of solving the biggest robotics challenges by training large models is sparking debate. IEEE Spectrum. 28 May 2024.

[2] Justin Yu*, Tara Sadjadpour*, Abby O’Neill, Mehdi Khfifi, Lawrence Yunliang Chen, Richard Cheng, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg. MANIP: A Modular Architecture for iNtegrating Interactive Perception into Long-Horizon Robot Manipulation Systems. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, UAE. Oct 2024.