Interactive Image Segmentation with Cross-Modality Vision Transformers

Kun Li, George Vosselman, Michael Ying Yang

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding

Abstract

Interactive image segmentation aims to segment the target from the background with manual guidance, taking as input multimodal data such as images, clicks, scribbles, polygons, and bounding boxes. Recently, vision transformers have achieved great success in several downstream visual tasks, and a few efforts have been made to bring this powerful architecture to the interactive segmentation task. However, previous works neglect the relations between the two modalities and directly mimic the way purely visual information is processed with self-attention. In this paper, we propose a simple yet effective network for click-based interactive segmentation with cross-modality vision transformers. Cross-modality transformers exploit mutual information to better guide the learning process. Experiments on several benchmarks show that the proposed method achieves superior performance in comparison to the previous state-of-the-art models. In addition, the stability of our method in terms of avoiding failure cases shows its potential as a practical annotation tool. The code and pretrained models will be released at https://github.com/lik1996/iCMFormer.
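
The abstract contrasts self-attention over a single modality with attention that relates the image to the user's clicks. As a rough illustration of that general idea only, the following is a minimal sketch of single-head cross-attention in NumPy, in which image patch tokens query click tokens. It is not the authors' implementation: the token counts, embedding size, random weights, residual connection, and all names (`cross_attention`, `image_tokens`, `click_tokens`) are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends to all context tokens."""
    q = queries @ w_q                              # (n_queries, dim)
    k = context @ w_k                              # (n_context, dim)
    v = context @ w_v                              # (n_context, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # scaled dot-product similarity
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 16                                           # hypothetical embedding size
image_tokens = rng.normal(size=(64, dim))          # hypothetical 8x8 image patch embeddings
click_tokens = rng.normal(size=(64, dim))          # hypothetical click-map patch embeddings

# Random projection matrices stand in for learned weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

# Image tokens act as queries; click tokens supply keys and values.
fused, attn = cross_attention(image_tokens, click_tokens, w_q, w_k, w_v)

# Residual connection (an assumption here, common in transformer blocks).
image_tokens_out = image_tokens + fused
```

With self-attention, queries, keys, and values would all come from the same token sequence; here the keys and values come from the other modality, which is what lets click information reweight the image features.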

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Publisher: IEEE
Pages: 762-772
Number of pages: 11
ISBN (Electronic): 9798350307443
DOIs
Publication status: Published - 25 Dec 2023
Event: 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 - Paris, France
Duration: 2 Oct 2023 - 6 Oct 2023

Publication series

Name: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023

Conference

Conference: 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Country/Territory: France
City: Paris
Period: 2/10/23 - 6/10/23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
