New research teaches AI to see the world in first person

Facebook AI has just announced Ego4D, a new project that tackles current research challenges in egocentric perception.

The University of Minnesota is part of a consortium of 13 universities and labs from around the world that have been collecting first-person videos of participants going about everyday activities. These videos will be instrumental in teaching artificial intelligence systems to see the world from an egocentric perspective.

CS&E assistant professor Hyun Soo Park is the lead researcher at the University of Minnesota on the Ego4D project. Park and his students Jayant Sharma, Zachary Chavis, and Tien Do have been collecting hundreds of hours of egocentric videos and formulating computer vision benchmark challenges, including future prediction and social interactions.

According to a blog post from Facebook AI, the research partners have collectively "collected more than 2,200 hours of first-person video in the wild, featuring over 700 participants going about their daily lives."

The post adds that "the video collection captures what the camera wearer chooses to gaze at in a specific environment, what the camera wearer is doing with their hands and objects in front of them, and how the camera wearer interacts with other people from the egocentric perspective."

The University of Minnesota team has focused in particular on social interactions involving multiple participants, such as playing card games, cooking together, and family gatherings. Recording these scenes requires extra attention to video synchronization and calibration.

The videos will be combined into a first-of-its-kind egocentric dataset that will be made publicly available to the research community.

“Despite its societal and interdisciplinary impact, computational understanding of social interactions is still lagging behind. A key challenge was that the existing datasets on social interactions are small and biased, which limits the repeatability and generalizability,” stated Professor Park. “We anticipate that the Ego4D benchmark on social interactions will change the way we design a computational model because it provides unprecedentedly large, diverse, and fine-grained data and annotations.”

Read the full blog post from Facebook AI.