Video object detection with a convolutional regression tracker

Ye Lyu, Michael Ying Yang, George Vosselman, Gui Song Xia

Research output: Contribution to journal › Article › peer-review

13 Citations (SciVal)


Video object detection is a fundamental research task for scene understanding. Compared with object detection in images, object detection in videos has been less researched due to a shortage of labelled video datasets. As frames in a video clip are highly correlated, a larger quantity of video labels is needed to achieve good data variation, and such labels are not always available because they are much more expensive to obtain. Consequently, it is easy to train an image object detector, but not always possible to train a video object detector when video labels are insufficient for certain classes. To address this problem and improve the performance of an image object detector on classes without video labels, we propose to augment a well-trained image object detector with an efficient and effective class-agnostic convolutional regression tracker for the video object detection task. The tracker learns to track objects by reusing the features from the image object detector, making it a lightweight addition to the detector that causes only a slight drop in speed for the video object detection task. The performance of our model is evaluated on the large-scale ImageNet VID dataset. Our strategy improves the mean average precision (mAP) score of the image object detector by around 5%, and that of the image object detector plus Seq-NMS post-processing by around 3%.
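The plug-and-play idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the box-offset parameterization (`apply_offsets`), the IoU-based matching, and the max-score fusion rule (`fuse`) are all assumptions chosen for illustration, and every function name here is hypothetical. The sketch only shows how boxes propagated by a class-agnostic regression tracker could, in principle, be combined with per-frame detections.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float, str]             # (box, score, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def apply_offsets(box: Box, delta: Tuple[float, float, float, float]) -> Box:
    """Move a box from frame t to frame t+1 using regressed offsets.

    ASSUMPTION: offsets are (dx, dy, dw, dh) in the common R-CNN-style
    parameterization; the paper's actual regression targets may differ.
    """
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + 0.5 * w, box[1] + 0.5 * h
    cx, cy = cx + delta[0] * w, cy + delta[1] * h
    w, h = w * math.exp(delta[2]), h * math.exp(delta[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def fuse(tracked: List[Det], detected: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Combine tracked boxes with the current frame's detections.

    ASSUMPTION (hypothetical fusion rule): a detection that overlaps a
    tracked box of the same class keeps the higher of the two scores;
    tracked boxes with no matching detection are kept as extra outputs.
    """
    fused: List[Det] = []
    used = set()
    for box, score, label in detected:
        best_j, best_iou = -1, iou_thr
        for j, (tbox, _tscore, tlabel) in enumerate(tracked):
            if j in used or tlabel != label:
                continue
            overlap = iou(box, tbox)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            score = max(score, tracked[best_j][1])
        fused.append((box, score, label))
    # Keep tracked boxes that the detector missed in this frame.
    fused.extend(t for j, t in enumerate(tracked) if j not in used)
    return fused
```

In this sketch the tracker itself carries no class information: the label of a tracked box is simply inherited from the detection it was propagated from, which is one way a class-agnostic tracker could serve classes that lack video labels.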

Original language: English
Pages (from-to): 139-150
Number of pages: 12
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Early online date: 30 Apr 2021
Publication status: Published - 30 Jun 2021


  • Convolutional regression tracker
  • Deep learning
  • Plug & Play
  • Tracking
  • Video object detection

ASJC Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Engineering (miscellaneous)
  • Computer Science Applications
  • Computers in Earth Sciences

