We propose an algorithm for automatically obtaining a segmentation of a rigid object in a sequence of images that are calibrated for camera pose and intrinsic parameters. Until recently, the best segmentation results have been obtained by interactive methods that require manual labelling of image regions. Our method requires no user input but instead relies on the camera fixating on the object of interest during the sequence. We begin by learning a model of the object's colour, from the image pixels around the fixation points. We then extract image edges and combine these with the object colour information in a volumetric binary MRF model. The globally optimal segmentation of 3D space is obtained by a graph-cut optimisation. From this segmentation an improved colour model is extracted and the whole process is iterated until convergence. Our first finding is that the fixation constraint, which requires that the object of interest is more or less central in the image, is enough to determine what to segment and initialise an automatic segmentation process. Second, we find that by performing a single segmentation in 3D, we implicitly exploit a 3D rigidity constraint, expressed as silhouette coherency, which significantly improves silhouette quality over independent 2D segmentations. We demonstrate the validity of our approach by providing segmentation results on real sequences.
Campbell, N. D. F., Vogiatzis, G., Hernández, C., & Cipolla, R. (2010). Automatic 3D object segmentation in multiple views using volumetric graph-cuts. Image and Vision Computing, 28(1), 14-25. https://doi.org/10.1016/j.imavis.2008.09.005