Real-time Semantic Segmentation with Context Aggregation Network

Michael Ying Yang, Saumya Kumaar, Ye Lyu, Francesco Nex

Research output: Contribution to journal › Article › peer-review

63 Citations (SciVal)

Abstract

With the increasing demand for autonomous systems, pixelwise semantic segmentation for visual scene understanding needs to be not only accurate but also efficient enough for potential real-time applications. In this paper, we propose Context Aggregation Network, a dual-branch convolutional neural network with significantly lower computational cost than the state of the art, while maintaining competitive prediction accuracy. Building upon existing dual-branch architectures for high-speed semantic segmentation, we design a high-resolution branch for effective spatial detailing and a context branch with lightweight versions of global aggregation and local distribution blocks. The context branch is able to capture both the long-range and the local contextual dependencies required for accurate semantic segmentation, with low computational overhead. We evaluate our method on two semantic segmentation datasets, Cityscapes and UAVid. On the Cityscapes test set, our model achieves state-of-the-art results with an mIOU of 75.9%, at 76 FPS on an NVIDIA RTX 2080Ti and 8 FPS on a Jetson Xavier NX. On the UAVid dataset, our proposed network achieves an mIOU of 63.5% with high execution speed (15 FPS).
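The accuracy figures in the abstract are reported as mIOU (mean intersection over union). As a reading aid, the following is a minimal sketch of how this metric is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and the sketch does not handle unlabeled pixels, which benchmark evaluations typically ignore while accumulating a single confusion matrix over the whole test set.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Inputs are flattened per-pixel label sequences (an assumption of this sketch).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class is another
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both labels and predictions: skip it
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six pixels.
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(mean_iou(confusion_matrix(truth, pred, num_classes=3)))
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5. Skipping classes that never occur is one design choice among several; some evaluations count such classes as zero instead.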

Original language: English
Pages (from-to): 124-134
Number of pages: 11
Journal: ISPRS Journal of Photogrammetry and Remote Sensing
Volume: 178
Early online date: 19 Jun 2021
Publication status: Published - 31 Aug 2021

Keywords

  • Context aggregation network
  • Convolutional neural network
  • Real-time
  • Semantic segmentation

ASJC Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Engineering (miscellaneous)
  • Computer Science Applications
  • Computers in Earth Sciences
