Multiview feature distributions for object detection and continuous pose estimation

D. Teney, J. Piater

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

This paper presents a multiview model of object categories, generally applicable to virtually any type of image features, and methods to efficiently perform, in a unified manner, detection, localization and continuous pose estimation in novel scenes. We represent appearance as distributions of low-level, fine-grained image features. Multiview models encode the appearance of objects at discrete viewpoints, and, in addition, how these viewpoints deform into one another as the viewpoint continuously varies (as detected from optical flow between training examples). Using a measure of similarity between an arbitrary test image and such a model at chosen viewpoints, we perform all tasks mentioned above with a common method. We leverage the simplicity of low-level image features, such as points extracted along edges, or coarse-scale gradients extracted densely over the images, by building probabilistic templates, i.e. distributions of features, learned from one or several training examples. We efficiently handle these distributions with probabilistic techniques such as kernel density estimation, Monte Carlo integration and importance sampling. We provide an extensive evaluation on a wide variety of benchmark datasets. We demonstrate performance on the "ETHZ Shape" dataset, with single (hand-drawn) and multiple training examples, well above baseline methods, on par with a number of more task-specific methods. We obtain remarkable performance on the recognition of more complex objects, notably the cars of the "3D Object" dataset of Savarese et al. with detection rates of 92.5% and an accuracy in pose estimation of 91%. We perform better than the state-of-the-art on continuous pose estimation with the "rotating cars" dataset of Ozuysal et al. We also demonstrate particular capabilities with a novel dataset featuring non-textured objects of undistinctive shapes, the pose of which can only be determined from shading, captured here by coarse scale intensity gradients.
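To make the probabilistic machinery named in the abstract concrete, the sketch below illustrates how a similarity between a learned feature distribution (a "probabilistic template") and the features of a test image could be evaluated with kernel density estimation and Monte Carlo integration. It is an illustrative approximation under simplifying assumptions (plain 2D point features, isotropic Gaussian kernels, sampling directly from the template rather than a separate importance distribution); the function and variable names are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np

def kde_similarity(template_feats, test_feats, bandwidth=2.0,
                   n_samples=200, rng=None):
    """Monte Carlo estimate of a similarity between two feature sets,
    each modelled as a kernel density estimate (KDE) with isotropic
    Gaussian kernels.

    Illustrative sketch only, not the paper's exact formulation:
    features are plain 2D points (e.g. points sampled along edges),
    and the similarity is the expectation of the test-image density
    under the template density, estimated by sampling from the
    template KDE.
    """
    rng = np.random.default_rng(rng)

    # Draw Monte Carlo samples from the template KDE: pick a kernel
    # centre at random, then perturb it with the kernel bandwidth.
    centres = template_feats[rng.integers(len(template_feats), size=n_samples)]
    samples = centres + rng.normal(scale=bandwidth, size=centres.shape)

    # Evaluate the test-image KDE at the sampled locations.
    d2 = ((samples[:, None, :] - test_feats[None, :, :]) ** 2).sum(-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (test_feats.shape[1] / 2)
    p_test = np.exp(-0.5 * d2 / bandwidth ** 2).mean(axis=1) / norm

    # Averaging over samples approximates the integral of
    # p_template(x) * p_test(x) over feature space.
    return p_test.mean()

# Toy usage: two noisy samplings of the same point set should score
# higher than two unrelated point sets.
rng = np.random.default_rng(0)
edge = rng.uniform(0, 100, size=(300, 2))
print(kde_similarity(edge, edge + rng.normal(scale=1.0, size=edge.shape)))
print(kde_similarity(edge, rng.uniform(0, 100, size=(300, 2))))
```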
Original language: English
Pages (from-to): 265-282
Number of pages: 18
Journal: Computer Vision and Image Understanding
Volume: 125
DOIs: 10.1016/j.cviu.2014.04.012
Publication status: Published - 1 Aug 2014

Cite this

Multiview feature distributions for object detection and continuous pose estimation. / Teney, D.; Piater, J.

In: Computer Vision and Image Understanding, Vol. 125, 01.08.2014, p. 265-282.

Research output: Contribution to journal › Article

@article{7090ad48323646baaeb2281a76b9f186,
title = "Multiview feature distributions for object detection and continuous pose estimation",
abstract = "This paper presents a multiview model of object categories, generally applicable to virtually any type of image features, and methods to efficiently perform, in a unified manner, detection, localization and continuous pose estimation in novel scenes. We represent appearance as distributions of low-level, fine-grained image features. Multiview models encode the appearance of objects at discrete viewpoints, and, in addition, how these viewpoints deform into one another as the viewpoint continuously varies (as detected from optical flow between training examples). Using a measure of similarity between an arbitrary test image and such a model at chosen viewpoints, we perform all tasks mentioned above with a common method. We leverage the simplicity of low-level image features, such as points extracted along edges, or coarse-scale gradients extracted densely over the images, by building probabilistic templates, i.e. distributions of features, learned from one or several training examples. We efficiently handle these distributions with probabilistic techniques such as kernel density estimation, Monte Carlo integration and importance sampling. We provide an extensive evaluation on a wide variety of benchmark datasets. We demonstrate performance on the {"}ETHZ Shape{"} dataset, with single (hand-drawn) and multiple training examples, well above baseline methods, on par with a number of more task-specific methods. We obtain remarkable performance on the recognition of more complex objects, notably the cars of the {"}3D Object{"} dataset of Savarese et al. with detection rates of 92.5{\%} and an accuracy in pose estimation of 91{\%}. We perform better than the state-of-the-art on continuous pose estimation with the {"}rotating cars{"} dataset of Ozuysal et al. We also demonstrate particular capabilities with a novel dataset featuring non-textured objects of undistinctive shapes, the pose of which can only be determined from shading, captured here by coarse scale intensity gradients.",
author = "D. Teney and J. Piater",
year = "2014",
month = "8",
day = "1",
doi = "10.1016/j.cviu.2014.04.012",
language = "English",
volume = "125",
pages = "265--282",
journal = "Computer Vision and Image Understanding",
issn = "1077-3142",
publisher = "Elsevier Academic Press Inc",
}

TY - JOUR

T1 - Multiview feature distributions for object detection and continuous pose estimation

AU - Teney, D.

AU - Piater, J.

PY - 2014/8/1

Y1 - 2014/8/1

AB - This paper presents a multiview model of object categories, generally applicable to virtually any type of image features, and methods to efficiently perform, in a unified manner, detection, localization and continuous pose estimation in novel scenes. We represent appearance as distributions of low-level, fine-grained image features. Multiview models encode the appearance of objects at discrete viewpoints, and, in addition, how these viewpoints deform into one another as the viewpoint continuously varies (as detected from optical flow between training examples). Using a measure of similarity between an arbitrary test image and such a model at chosen viewpoints, we perform all tasks mentioned above with a common method. We leverage the simplicity of low-level image features, such as points extracted along edges, or coarse-scale gradients extracted densely over the images, by building probabilistic templates, i.e. distributions of features, learned from one or several training examples. We efficiently handle these distributions with probabilistic techniques such as kernel density estimation, Monte Carlo integration and importance sampling. We provide an extensive evaluation on a wide variety of benchmark datasets. We demonstrate performance on the "ETHZ Shape" dataset, with single (hand-drawn) and multiple training examples, well above baseline methods, on par with a number of more task-specific methods. We obtain remarkable performance on the recognition of more complex objects, notably the cars of the "3D Object" dataset of Savarese et al. with detection rates of 92.5% and an accuracy in pose estimation of 91%. We perform better than the state-of-the-art on continuous pose estimation with the "rotating cars" dataset of Ozuysal et al. We also demonstrate particular capabilities with a novel dataset featuring non-textured objects of undistinctive shapes, the pose of which can only be determined from shading, captured here by coarse scale intensity gradients.

UR - http://www.scopus.com/inward/record.url?scp=84901914732&partnerID=8YFLogxK

UR - http://dx.doi.org/10.1016/j.cviu.2014.04.012

U2 - 10.1016/j.cviu.2014.04.012

DO - 10.1016/j.cviu.2014.04.012

M3 - Article

VL - 125

SP - 265

EP - 282

JO - Computer Vision and Image Understanding

JF - Computer Vision and Image Understanding

SN - 1077-3142

ER -