TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition

Haodong Duan, Nanxuan Zhao, Kai Chen, Dahua Lin

Research output: Chapter or section in a book/report/conference proceedingChapter in a published conference proceeding

Abstract

Recognizing transformation types applied to a video clip (RecogTrans) is a long-established paradigm for selfsupervised video representation learning, which achieves much inferior performance compared to instance discrimination approaches (InstDisc) in recent works. However, based on a thorough comparison of representative Recog-Trans and InstDisc methods, we observe the great potential of RecogTrans on both semantic-related and temporalrelated downstream tasks. Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, consistently outperforming the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method [28] by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Topl Acc); improves video retrieval on UCF101 by 20.4% (R@1). The promising results validate that RecogTrans is still a worth exploring paradigm for video self-supervised learning. Codes will be released at https://github.com/kennymckormick/TransRank.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE
Pages2990-3000
Number of pages11
ISBN (Electronic)9781665469463
DOIs
Publication statusPublished - 24 Jun 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, USA United States
Duration: 19 Jun 202224 Jun 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUSA United States
CityNew Orleans
Period19/06/2224/06/22

Keywords

  • Representation learning
  • Self-& semi-& meta- & unsupervised learning
  • Video analysis and understanding

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition'. Together they form a unique fingerprint.

Cite this