Abstract
Recognizing the transformation types applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, but in recent works it has performed much worse than instance discrimination approaches (InstDisc). However, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe that RecogTrans has great potential on both semantic-related and temporal-related downstream tasks. Because they rely on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we develop TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, and it consistently outperforms the classification-based formulation. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With a ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method [28] by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Top-1 Acc), and improves video retrieval on UCF101 by 20.4% (R@1). The promising results validate that RecogTrans is still a paradigm worth exploring for video self-supervised learning. Code will be released at https://github.com/kennymckormick/TransRank.
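The contrast between hard-label classification and a ranking formulation can be illustrated with a minimal Python sketch. This is not the released TransRank code: the function names, the score layout (`score_matrix[i][k]` is the score for transformation class `k` on the clip produced by transformation `i`, all clips coming from the same source video), and the margin value are assumptions made for illustration only. The classification loss judges each clip in isolation, while the ranking loss only asks that the clip actually transformed with `k` scores higher on class `k` than its sibling clips.

```python
import math


def cross_entropy(scores, label):
    """Hard-label softmax cross-entropy for a single clip (illustrative)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def margin_ranking_loss(score_matrix, margin=1.0):
    """Relative (ranking) loss over clips from one source video (illustrative).

    score_matrix[i][k] is the score of transformation class k for the clip
    produced by transformation i. For every class k, the clip transformed
    with k should outscore each other clip on class k by at least `margin`.
    """
    n = len(score_matrix)
    total, pairs = 0.0, 0
    for k in range(n):
        for j in range(n):
            if j == k:
                continue
            gap = score_matrix[k][k] - score_matrix[j][k]
            total += max(0.0, margin - gap)  # hinge on the score gap
            pairs += 1
    return total / pairs


# Two transformations (e.g. two playback speeds), hypothetical scores.
scores = [[1.0, 0.0],
          [0.5, 0.0]]
print(cross_entropy(scores[0], 0))
print(margin_ranking_loss(scores))
```

In this sketch the ranking loss depends only on score differences between clips from the same video, so a clip whose absolute score is ambiguous is penalized only when it fails to be ordered correctly relative to its siblings.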
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
| Publisher | IEEE |
| Pages | 2990-3000 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781665469463 |
| Publication status | Published - 24 Jun 2022 |
| Event | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States. Duration: 19 Jun 2022 → 24 Jun 2022 |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| Volume | 2022-June |
| ISSN (Print) | 1063-6919 |
Conference
| Conference | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
|---|---|
| Country/Territory | United States |
| City | New Orleans |
| Period | 19/06/22 → 24/06/22 |
Funding
This study is supported by the General Research Funds (GRF) of Hong Kong (No. 14203518) and the Shanghai Committee of Science and Technology, China (No. 20DZ1100800).
Keywords
- Representation learning
- Self- & semi- & meta- & unsupervised learning
- Video analysis and understanding
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
Fingerprint
Dive into the research topics of 'TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition'. Together they form a unique fingerprint.