Cuboids revisited: Learning robust 3D shape fitting to single RGB images

Florian Kluger, Hanno Ackermann, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn

Research output: Chapter or section in a book/report/conference proceeding > Chapter in a published conference proceeding

15 Citations (SciVal)

Abstract

Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
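To make the notion of a point-to-primitive distance concrete, the following is a minimal sketch and not the authors' implementation. It assumes an axis-aligned cuboid described by a centre and half side lengths, whereas the paper's cuboids and its occlusion-aware metric are more general and are not reproduced here. The function names `cuboid_surface_distance` and `count_inliers` are hypothetical and chosen only for illustration. The second function shows the kind of inlier count a RANSAC-style estimator could use to score a cuboid hypothesis.

```python
import math


def cuboid_surface_distance(point, center, half_sizes):
    """Unsigned distance from a 3D point to the surface of an
    axis-aligned cuboid (assumption: no rotation, no occlusion handling)."""
    # Per-axis offset from the nearest face pair; negative means inside the slab.
    q = [abs(p - c) - h for p, c, h in zip(point, center, half_sizes)]
    # Euclidean distance for the part of the point lying outside the cuboid.
    outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
    # Negative depth to the closest face when the point lies inside the cuboid.
    inside = min(max(q), 0.0)
    # outside + inside is the signed distance; its magnitude is the surface distance.
    return abs(outside + inside)


def count_inliers(points, center, half_sizes, threshold):
    """Count points lying within `threshold` of the cuboid surface."""
    return sum(
        1
        for p in points
        if cuboid_surface_distance(p, center, half_sizes) <= threshold
    )


if __name__ == "__main__":
    unit_center = (0.0, 0.0, 0.0)
    unit_half = (1.0, 1.0, 1.0)
    samples = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.05, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(count_inliers(samples, unit_center, unit_half, 0.1))
```

A score based only on this distance ignores visibility: surface points of a large cuboid can still lie close to observed points while the cuboid hides other parts of the scene, which is the failure mode the abstract attributes to naive distance minimisation.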

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Publisher: IEEE
Pages: 13065-13074
Number of pages: 10
ISBN (Electronic): 9781665445092
Publication status: Published - 2 Nov 2021
Event: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States
Duration: 19 Jun 2021 to 25 Jun 2021

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/Territory: United States
City: Virtual, Online
Period: 19/06/21 to 25/06/21

Funding

Acknowledgements. This work was supported by the BMBF grant LeibnizAILab (01DD20003), by the DFG grant COVMAP (RO 2497/12-2), by the DFG Cluster of Excellence PhoenixD (EXC 2122), and by the Center for Digital Innovations (ZDIN).

Funders (funder number)
Center for Digital Innovations
DFG Cluster of Excellence PhoenixD: EXC 2122
ZDIN
Deutsche Forschungsgemeinschaft: RO 2497/12-2
Bundesministerium für Bildung und Forschung: 01DD20003

    ASJC Scopus subject areas

    • Software
    • Computer Vision and Pattern Recognition
