On Friday, October 15, 2021


Kristen Grauman (UT Austin / Professor)


First-Person Video for Understanding Interactions


Today’s perception systems excel at naming things in third-person Internet photos or videos, which purposefully convey a visual scene or moment. In contrast, first-person or “egocentric” perception requires understanding the multi-modal video that streams to a person’s (or robot’s) wearable camera. While video from an always-on wearable camera lacks the curation of an intentional photographer, it does provide a special window into the camera wearer’s attention, goals, and interactions with people and objects in her environment. These factors make first-person video an exciting avenue for the future of perception in augmented reality and robot learning.

Motivated by this setting, I will present our recent work on first-person video. First, we explore learning visual affordances to anticipate how objects and spaces can be used. We show how to transform egocentric video into a human-centric topological map of a physical space (such as a kitchen) that captures its primary zones of interaction and the activities they support. Moving down to the object level, we develop video anticipation models that localize interaction “hotspots” indicating how and where an object can be manipulated (e.g., pressable or toggleable). Towards translating these affordances into robot action, we prime reinforcement learning agents to prefer human-like interactions, thereby accelerating their task learning. Turning to audio-visual sensing, we attempt to extract a conversation partner’s speech from competing background sounds or other human speakers. Finally, I will briefly preview a multi-institution large-scale egocentric video dataset effort.


Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and embodied perception. Before joining UT Austin in 2007, she received her Ph.D. from MIT. She is an IEEE Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She was inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She served as an Associate Editor-in-Chief for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and as a Program Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015 and Neural Information Processing Systems (NeurIPS) 2018.