Can Gaze Inform Egocentric Action Recognition?

Zehua Zhang, David Crandall, Michael Proulx, Sachin Talathi, Abhishek Sharma

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

2 Citations (SciVal)
73 Downloads (Pure)


We investigate the hypothesis that the gaze signal can improve egocentric action recognition on the standard benchmark, the EGTEA Gaze++ dataset. In contrast to prior work, where the gaze signal was used only during training, we formulate a novel neural fusion approach, Cross-modality Attention Blocks (CMA), that leverages the gaze signal for action recognition during inference as well. CMA combines information from different modalities at different levels of abstraction and achieves state-of-the-art performance for egocentric action recognition. Specifically, fusing the video stream with optical flow using CMA outperforms the current state of the art by 3%. However, when CMA is employed to fuse the gaze signal with video-stream data, no improvements are observed. Further investigation of this counter-intuitive finding indicates that the small spatial overlap between the network's attention map and the gaze ground truth renders the gaze signal uninformative for this benchmark. Based on our empirical findings, we recommend improvements to the current benchmark to support the development of practical systems for egocentric video understanding that use the gaze signal.
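The abstract does not describe the internals of CMA. As a rough illustration only, the following sketch shows generic single-head scaled dot-product cross-attention, one common way to let features from one modality (here, a video stream) attend over features from another (here, optical flow or gaze). The function names, token counts, feature sizes, and the single-level design are assumptions for illustration, not the authors' CMA architecture.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modality_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Hypothetical cross-attention fusion (not the paper's CMA).

    query_feats:   (N_q, D) tokens from the primary stream (e.g. RGB video)
    context_feats: (N_k, D) tokens from the auxiliary stream (e.g. flow or gaze)
    w_q, w_k, w_v: (D, D_h) projection matrices, assumed to be learned elsewhere
    Returns fused features (N_q, D_h) and attention weights (N_q, N_k).
    """
    q = query_feats @ w_q    # queries come from the primary modality
    k = context_feats @ w_k  # keys and values come from the other modality
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights


# Usage with random stand-in features (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
rgb = rng.standard_normal((49, 64))   # e.g. 7x7 spatial tokens from the video stream
flow = rng.standard_normal((49, 64))  # matching tokens from an optical-flow stream
w_q, w_k, w_v = (rng.standard_normal((64, 32)) for _ in range(3))
fused, attn = cross_modality_attention(rgb, flow, w_q, w_k, w_v)
```

Because each row of `attn` is a distribution over the auxiliary stream's spatial tokens, the same quantity could in principle be compared against a gaze heatmap, which is the kind of attention-versus-gaze overlap analysis the abstract refers to.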

Original language: English
Title of host publication: Proceedings - ETRA 2022
Subtitle of host publication: ACM Symposium on Eye Tracking Research and Applications
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450392525
Publication status: Published - 8 Jun 2022
Event: 2022 ACM Symposium on Eye Tracking Research and Applications, ETRA 2022 - Virtual, Online, United States
Duration: 8 Jun 2022 to 11 Jun 2022

Publication series

Name: Eye Tracking Research and Applications Symposium (ETRA)


Conference: 2022 ACM Symposium on Eye Tracking Research and Applications, ETRA 2022
Country/Territory: United States
City: Virtual, Online


Keywords

  • attention
  • deep neural networks
  • egocentric action recognition
  • gaze

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Ophthalmology
  • Sensory Systems


