ActiveEye: Enabling Continuous and Responsive Video Understanding for Smart Eyewear Systems

Zhenyu Xu, Tianlin Lu, Yingying Zhao, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

Research output: Contribution to journal › Article › peer-review

Abstract

Integrating vision-language models (VLMs) with wearable devices offers great potential for continuous and responsive video understanding, a key capability for applications such as smart eyewear-based conversational assistants. However, achieving this on resource-constrained devices is challenging due to the high energy demands of continuous spatial-temporal sampling and transmission. We propose ActiveEye, a VLM designed for energy-efficient and responsive video understanding. ActiveEye separates visual and motion semantic representations and incorporates an active perception-based feedback path to adaptively adjust spatial-temporal sampling and transmission rates. Implemented as a wearable-mobile-cloud system, ActiveEye is evaluated for energy efficiency, real-time semantic change detection, and video understanding in both laboratory and field studies. Using the EgoSchema dataset, ActiveEye reduces the front-end energy consumption by 49.14%, supporting 8.37 hours of continuous operation on a 2.1 Wh battery. It achieves the highest F1 score (0.80) and the lowest average time difference (1.30 s) compared with heuristic-based event detection algorithms, validating its timely semantic detection. Furthermore, ActiveEye achieves a visual question answering (VQA) accuracy of 61.6%, which is comparable to state-of-the-art VLM agents, despite their reliance on larger language decoders and more computationally intensive frame selection strategies. Two rounds of in-field user evaluations further confirm its effectiveness in real-world settings, demonstrating its practical viability as a continuous and responsive video understanding system, conversational assistant, and wearable companion.
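The abstract's central idea, an active perception-based feedback path that raises spatial-temporal sampling around semantic changes and relaxes it during static scenes, can be illustrated with a minimal controller sketch. The sketch below is purely illustrative: all names (`SamplingController`, `change_score`, the rate and threshold constants) are hypothetical and do not reflect the paper's actual implementation or API; it assumes only that a lightweight on-device signal scores frame-to-frame semantic change in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class SamplingController:
    """Hypothetical sketch: adapt frame-capture rate to a semantic-change signal."""
    min_fps: float = 0.5    # idle rate while the scene is static (assumed value)
    max_fps: float = 10.0   # burst rate around semantic changes (assumed value)
    threshold: float = 0.6  # change score above which sampling ramps up
    decay: float = 0.8      # multiplicative relaxation back toward the idle rate
    fps: float = 0.5        # current capture rate

    def update(self, change_score: float) -> float:
        """change_score in [0, 1], e.g. a cheap motion/feature distance
        computed on the wearable. Returns the next capture rate."""
        if change_score > self.threshold:
            # Semantic change detected: sample densely and transmit.
            self.fps = self.max_fps
        else:
            # Scene stable: exponentially relax toward the idle rate,
            # which is what saves front-end energy over fixed-rate capture.
            self.fps = max(self.min_fps, self.fps * self.decay)
        return self.fps


if __name__ == "__main__":
    controller = SamplingController()
    # A quiet scene, a sudden semantic change, then settling back down.
    for score in [0.1, 0.05, 0.9, 0.3, 0.2]:
        print(f"score={score:.2f} -> {controller.update(score):.2f} fps")
```

The design intuition, per the abstract, is that energy is dominated by continuous sampling and transmission, so spending frames only where semantics change preserves question-answering quality while extending battery life; the specific control law above is an assumption for illustration.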

Original language: English
Article number: 228
Number of pages: 33
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume: 9
Issue number: 4
DOIs
Publication status: Published - 2 Dec 2025

Funding

This work was supported in part by the Department for Science, Innovation and Technology under Grant K250071-101. Yujiang Wang was supported by a Basic Research Program of Jiangsu (BK20240414) and a Leadership Talent Program (Science and Education) of Suzhou Industrial Park (KJQ2024204).

Keywords

  • Energy-efficient
  • Responsive
  • Smart Eyewear
  • Video Understanding

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Networks and Communications
