MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM

Binbin Xu, Wenbin Li, Dimos Tzoumanikas, Michael Bloesch, Andrew Davison, Stefan Leutenegger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.
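The incremental fusion step described in the abstract (fusing colour, depth, and foreground-object probability into each object model) follows the weighted running-average update common to TSDF-style volumetric systems. The sketch below is an illustrative assumption, not the authors' implementation; all names (`Voxel`, `fuse`) and the grey-scale colour simplification are our own:

```python
from dataclasses import dataclass

@dataclass
class Voxel:
    sdf: float = 0.0      # truncated signed-distance estimate
    colour: float = 0.0   # grey-scale colour, for brevity
    fg_prob: float = 0.5  # foreground-object probability
    weight: float = 0.0   # accumulated fusion weight

def fuse(v: Voxel, sdf_obs: float, colour_obs: float, fg_obs: float,
         w_obs: float = 1.0, w_max: float = 100.0) -> Voxel:
    """Fuse one observation into a voxel as a weighted running mean."""
    w_new = v.weight + w_obs
    v.sdf = (v.sdf * v.weight + sdf_obs * w_obs) / w_new
    v.colour = (v.colour * v.weight + colour_obs * w_obs) / w_new
    v.fg_prob = (v.fg_prob * v.weight + fg_obs * w_obs) / w_new
    v.weight = min(w_new, w_max)  # cap so the model can adapt to scene change
    return v
```

Capping the weight keeps old observations from dominating, which matters in a dynamic scene where objects move and previously fused measurements become stale.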
Original language: English
Title of host publication: IEEE International Conference on Robotics and Automation
Publication status: Accepted/In press - 24 May 2019

Publication series

Name: International Conference On Robotics and Automation
Publisher: IEEE
ISSN (Electronic): 2379-9544

Cite this

Xu, B., Li, W., Tzoumanikas, D., Bloesch, M., Davison, A., & Leutenegger, S. (Accepted/In press). MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. In IEEE International Conference on Robotics and Automation (International Conference On Robotics and Automation).

MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. / Xu, Binbin; Li, Wenbin; Tzoumanikas, Dimos; Bloesch, Michael; Davison, Andrew; Leutenegger, Stefan.

IEEE International Conference on Robotics and Automation. 2019. (International Conference On Robotics and Automation).


Xu, B, Li, W, Tzoumanikas, D, Bloesch, M, Davison, A & Leutenegger, S 2019, MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. in IEEE International Conference on Robotics and Automation. International Conference On Robotics and Automation.
Xu B, Li W, Tzoumanikas D, Bloesch M, Davison A, Leutenegger S. MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. In IEEE International Conference on Robotics and Automation. 2019. (International Conference On Robotics and Automation).
Xu, Binbin ; Li, Wenbin ; Tzoumanikas, Dimos ; Bloesch, Michael ; Davison, Andrew ; Leutenegger, Stefan. / MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. IEEE International Conference on Robotics and Automation. 2019. (International Conference On Robotics and Automation).
@inproceedings{c33227aa2f3245288023afe5966bee47,
title = "MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM",
abstract = "We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.",
author = "Binbin Xu and Wenbin Li and Dimos Tzoumanikas and Michael Bloesch and Andrew Davison and Stefan Leutenegger",
year = "2019",
month = "5",
day = "24",
language = "English",
series = "International Conference On Robotics and Automation",
publisher = "IEEE",
booktitle = "IEEE International Conference on Robotics and Automation",

}

TY - GEN
T1 - MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM
AU - Xu, Binbin
AU - Li, Wenbin
AU - Tzoumanikas, Dimos
AU - Bloesch, Michael
AU - Davison, Andrew
AU - Leutenegger, Stefan
PY - 2019/5/24
Y1 - 2019/5/24
N2 - We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.
AB - We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.
M3 - Conference contribution
T3 - International Conference On Robotics and Automation
BT - IEEE International Conference on Robotics and Automation
ER -