This paper presents a multiview model of object categories, applicable to virtually any type of image feature, together with methods that efficiently perform detection, localization and continuous pose estimation in novel scenes in a unified manner. We represent appearance as distributions of low-level, fine-grained image features. Multiview models encode the appearance of objects at discrete viewpoints and, in addition, how these viewpoints deform into one another as the viewpoint varies continuously (as detected from optical flow between training examples). Using a measure of similarity between an arbitrary test image and such a model at chosen viewpoints, we perform all of the tasks mentioned above with a common method. We leverage the simplicity of low-level image features, such as points extracted along edges or coarse-scale gradients extracted densely over the images, by building probabilistic templates, i.e. distributions of features, learned from one or several training examples. We handle these distributions efficiently with probabilistic techniques such as kernel density estimation, Monte Carlo integration and importance sampling. We provide an extensive evaluation on a wide variety of benchmark datasets. On the "ETHZ Shape" dataset, with single (hand-drawn) and multiple training examples, we demonstrate performance well above baseline methods and on par with a number of more task-specific methods. We obtain remarkable performance on the recognition of more complex objects, notably the cars of the "3D Object" dataset of Savarese et al., with detection rates of 92.5% and an accuracy in pose estimation of 91%. We perform better than the state of the art on continuous pose estimation with the "rotating cars" dataset of Ozuysal et al. We also demonstrate particular capabilities with a novel dataset featuring non-textured objects of undistinctive shapes, the pose of which can only be determined from shading, captured here by coarse-scale intensity gradients.
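To make the idea of comparing feature distributions concrete, the following is a minimal sketch, not the authors' implementation. It assumes that features are plain 2D points, that both the template and the test image are modeled with an isotropic Gaussian kernel density estimate, and that similarity is the integral of the product of the two densities, approximated by Monte Carlo integration with samples drawn from the template density. The function names, the bandwidth and the sample count are all hypothetical choices made for illustration.

```python
import numpy as np

def kde_evaluate(points, queries, bandwidth):
    """Evaluate an isotropic Gaussian KDE built on `points` at each row of `queries`."""
    d = points.shape[1]
    diff = queries[:, None, :] - points[None, :, :]      # (n_queries, n_points, d)
    sq_dist = np.sum(diff ** 2, axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)   # Gaussian normalization constant
    return np.mean(np.exp(-0.5 * sq_dist / bandwidth ** 2), axis=1) / norm

def kde_sample(points, n_samples, bandwidth, rng):
    """Draw samples from the KDE: pick a kernel center, then add Gaussian noise."""
    idx = rng.integers(0, len(points), size=n_samples)
    noise = rng.normal(scale=bandwidth, size=(n_samples, points.shape[1]))
    return points[idx] + noise

def similarity(model_points, test_points, bandwidth=0.05, n_samples=2000, seed=0):
    """Monte Carlo estimate of the integral of p_model(x) * p_test(x) over x.

    Assumed similarity measure (illustrative only): the expectation of the
    test density under samples drawn from the model density.
    """
    rng = np.random.default_rng(seed)
    samples = kde_sample(model_points, n_samples, bandwidth, rng)
    return float(np.mean(kde_evaluate(test_points, samples, bandwidth)))

# Toy example: an edge-point "template" on a unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
model = np.stack([np.cos(angles), np.sin(angles)], axis=1)

noise_rng = np.random.default_rng(1)
test_match = model + noise_rng.normal(scale=0.01, size=model.shape)  # same shape, slightly perturbed
test_shift = model + np.array([3.0, 0.0])                            # same shape, wrong location

score_match = similarity(model, test_match)
score_shift = similarity(model, test_shift)
print(score_match, score_shift)
```

In this sketch the well-aligned test points score higher than the displaced ones, which is the property a detector would exploit when searching over locations or viewpoints. Sampling directly from the template density is the simplest choice; an importance-sampling variant would instead draw from a proposal distribution and reweight each sample by the ratio of target to proposal density.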
Teney, D., & Piater, J. (2014). Multiview feature distributions for object detection and continuous pose estimation. Computer Vision and Image Understanding, 125, 265-282. https://doi.org/10.1016/j.cviu.2014.04.012