
Abstract

Semantic segmentation has been one of the leading research interests in computer vision recently. It serves as a perception foundation for many fields, such as robotics and autonomous driving. The fast development of semantic segmentation, especially for deep learning methods, can be attributed largely to large-scale datasets. Several datasets already exist for comparing semantic segmentation methods in complex urban scenes, such as the Cityscapes and CamVid datasets, where side views of objects are captured with a camera mounted on a driving car. There are also semantic labeling datasets for airborne and satellite images, where nadir views of objects are captured. However, only a few datasets capture urban scenes from an oblique Unmanned Aerial Vehicle (UAV) perspective, where both the top view and the side view of objects can be observed, providing more information for object recognition. In this paper, we introduce UAVid, a new high-resolution UAV semantic segmentation dataset that complements existing ones and brings new challenges, including large scale variation, moving object recognition and temporal consistency preservation. The dataset consists of 30 video sequences capturing high-resolution images in oblique views. In total, 300 images have been densely labeled with 8 classes for the semantic labeling task. We provide several deep learning baseline methods with pre-training, among which the proposed Multi-Scale-Dilation net performs best thanks to multi-scale feature extraction, reaching a mean intersection-over-union (IoU) score of around 50%. We also explore the influence of spatial-temporal regularization for sequence data by leveraging feature space optimization (FSO) and a 3D conditional random field (CRF). The UAVid website and labeling tool have been published online (https://uavid.nl/).
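The baselines above are compared by mean intersection-over-union (IoU). As a rough illustration of that metric, the sketch below accumulates a pixel-level confusion matrix and averages the per-class IoU = TP / (TP + FP + FN). It is a generic sketch, not the UAVid evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and skipping classes that appear in neither the ground truth nor the prediction is an assumed convention; the official protocol may differ.

```python
from typing import Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> list[list[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm: list[list[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both ground truth and prediction
    (denominator 0) are skipped rather than counted as 0 or 1.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # labeled c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, labeled as another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with 2 classes and 4 flattened pixels.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(cm)            # [[1, 1], [0, 2]]
    print(mean_iou(cm))  # (1/2 + 2/3) / 2 = 7/12
```

For UAVid's 8 labeled classes one would pass `num_classes=8` and accumulate the matrix over all pixels of all evaluation images before computing the mean, so that large images weigh proportionally more than a per-image average would allow.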

Original language: English
Pages (from-to): 108-119
Number of pages: 12
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 165
Publication status: Published - 31 Jul 2020

Funding

The work is partially funded by ISPRS Scientific Initiative project SVSB (PI: Michael Ying Yang, co-PI: Alper Yilmaz) and National Natural Science Foundation of China (No. 61922065 and No. 61771350). The authors gratefully acknowledge the support. We also thank several graduate students from University of Twente and Wuhan University for their annotation effort.

Funders (funder number)
ISPRS
National Natural Science Foundation of China (61771350, 61922065)
University of Twente
Wuhan University

    Keywords

    • Dataset
    • Deep learning
    • Semantic segmentation
    • UAV

    ASJC Scopus subject areas

    • Atomic and Molecular Physics, and Optics
    • Engineering (miscellaneous)
    • Computer Science Applications
    • Computers in Earth Sciences
