Abstract
Integrating vision-language models (VLMs) with wearable devices offers great potential for continuous and responsive video understanding, a key capability for applications such as smart eyewear-based conversational assistants. However, achieving this on resource-constrained devices is challenging due to the high energy demands of continuous spatial-temporal sampling and transmission. We propose ActiveEye, a VLM designed for energy-efficient and responsive video understanding. ActiveEye separates visual and motion semantic representations and incorporates an active perception-based feedback path to adaptively adjust spatial-temporal sampling and transmission rates. Implemented as a wearable-mobile-cloud system, ActiveEye is evaluated for energy efficiency, real-time semantic change detection, and video understanding in both laboratory and field studies. On the EgoSchema dataset, ActiveEye reduces front-end energy consumption by 49.14%, supporting 8.37 hours of continuous operation on a 2.1 Wh battery. Compared with heuristic-based event detection algorithms, it achieves the highest F1 score (0.80) and the lowest average time difference (1.30 s), validating its timely semantic change detection. Furthermore, ActiveEye achieves a visual question answering (VQA) accuracy of 61.6%, comparable to state-of-the-art VLM agents despite their reliance on larger language decoders and more computationally intensive frame selection strategies. Two rounds of in-field user evaluations further confirm its effectiveness in real-world settings, demonstrating its practical viability as a continuous and responsive video understanding system, conversational assistant, and wearable companion.
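The abstract's core mechanism is a feedback loop in which change signals derived from separate motion and visual representations drive the front-end's spatial-temporal sampling and transmission rates. The Python sketch below is a minimal illustration of that idea only, not the paper's implementation; the operating points, fusion rule, threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SamplerConfig:
    fps: float       # temporal sampling rate (frames/s)
    resolution: int  # spatial sampling (short-side pixels)

# Hypothetical operating points; the paper's actual settings are not given here.
LOW = SamplerConfig(fps=1.0, resolution=224)
HIGH = SamplerConfig(fps=10.0, resolution=448)

def update_sampler(motion_score: float, visual_change: float,
                   threshold: float = 0.5) -> SamplerConfig:
    """Feedback step: raise spatial-temporal sampling (and hence the
    transmission rate) only when the fused change signal suggests a
    semantic event; otherwise stay in the low-power regime."""
    change = max(motion_score, visual_change)  # crude fusion, for illustration only
    return HIGH if change > threshold else LOW

# Toy loop over a stream of (motion, visual) change scores.
for motion, visual in [(0.1, 0.2), (0.7, 0.3), (0.2, 0.1)]:
    cfg = update_sampler(motion, visual)
    print(f"motion={motion:.1f} visual={visual:.1f} -> "
          f"{cfg.fps:.0f} fps @ {cfg.resolution}px")
```

As a sanity check on the reported figures, 8.37 hours on a 2.1 Wh battery implies an average front-end draw of roughly 2.1 / 8.37 ≈ 0.25 W, which is the power envelope such a duty-cycled sampler would need to maintain.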
| Field | Value |
|---|---|
| Original language | English |
| Article number | 228 |
| Number of pages | 33 |
| Journal | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies |
| Volume | 9 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 2 Dec 2025 |
Funding
This work was supported in part by the Department for Science, Innovation and Technology under Grant K250071-101. Yujiang Wang was supported by a Basic Research Program of Jiangsu (BK20240414) and a Leadership Talent Program (Science and Education) of Suzhou Industrial Park (KJQ2024204).
Keywords
- Energy-efficient
- Responsive
- Smart Eyewear
- Video Understanding
ASJC Scopus subject areas
- Human-Computer Interaction
- Hardware and Architecture
- Computer Networks and Communications