Abstract

This paper explores ways of combining vision and touch for object recognition. In particular, it focuses on scenarios where tactile training samples are scarce (they are usually costly to obtain) and where vision is artificially impaired. Whilst machine vision is a widely studied field and machine touch has recently received some attention, the fusion of the two modalities remains relatively unexplored. It has been suggested that the human brain maintains shared multi-sensory representations of objects, which provide robustness when one or more senses are absent or unreliable. Modern robotic systems can likewise benefit from multi-sensory input, particularly in contexts where one or more sensors perform poorly. In this paper, a recently proposed tactile recognition model is extended by integrating a simple vision system in three different ways: vector concatenation (of the vision and tactile feature vectors), object label posterior averaging, and object label posterior product. The approaches are compared in terms of overall recognition accuracy and learning speed, i.e. the number of training samples required to reach a given accuracy. The conclusions are: (1) the most accurate system is “posterior product”; (2) multi-modal recognition achieves higher accuracy than either modality alone when all visual and tactile training data are pooled together; and (3) under visual impairment, multi-modal recognition “learns faster”, i.e. requires fewer training samples to reach the same accuracy as either modality alone.
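
For concreteness, the three fusion strategies named above can be sketched as follows. This is a minimal illustrative sketch in Python, not the authors' implementation: the feature extractors are omitted, the example posteriors are invented, and all function names are hypothetical.

import numpy as np

# Minimal sketch of the three fusion strategies named in the abstract.
# All names and the example posteriors are illustrative assumptions,
# not the authors' code.

def fuse_concatenate(vision_features, tactile_features):
    # Early fusion: a single classifier is trained on the joint vector.
    return np.concatenate([vision_features, tactile_features])

def fuse_posterior_average(p_vision, p_tactile):
    # Late fusion: average the per-class posteriors produced by the two
    # single-modality classifiers.
    return (p_vision + p_tactile) / 2.0

def fuse_posterior_product(p_vision, p_tactile):
    # Late fusion: multiply the per-class posteriors and renormalise
    # (a naive-Bayes-style combination under a uniform class prior).
    p = p_vision * p_tactile
    return p / p.sum()

# Example with three object classes: vision is uncertain, touch is confident.
p_vision = np.array([0.40, 0.35, 0.25])
p_tactile = np.array([0.70, 0.20, 0.10])
print(fuse_posterior_average(p_vision, p_tactile))  # [0.55  0.275 0.175]
print(fuse_posterior_product(p_vision, p_tactile))  # ~[0.747 0.187 0.067]

Note how the product sharpens the decision when the modalities agree, which is consistent with conclusion (1) above, that posterior product is the most accurate combination.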
Language: English
Article number: 2
Number of pages: 10
Journal: Robotics and Biomimetics
Volume: 4
DOI: 10.1186/s40638-017-0058-2
Status: Published - 18 Apr 2017


Cite this

Object recognition combining vision and touch. / Corradi, Tadeo; Hall, Peter; Iravani, Pejman.

In: Robotics and Biomimetics, Vol. 4, Article 2, 18.04.2017.

Research output: Contribution to journal › Article

@article{0aef7d3d3eb649ff9ef8cea7c15a48d0,
title = "Object recognition combining vision and touch",
author = "Tadeo Corradi and Peter Hall and Pejman Iravani",
year = "2017",
month = "4",
day = "18",
doi = "10.1186/s40638-017-0058-2",
language = "English",
volume = "4",
number = "2",
journal = "Robotics and Biomimetics",
issn = "2197-3768",
publisher = "SpringerOpen",
}

TY - JOUR

T1 - Object recognition combining vision and touch

AU - Corradi, Tadeo

AU - Hall, Peter

AU - Iravani, Pejman

PY - 2017/4/18

Y1 - 2017/4/18

UR - https://doi.org/10.1186/s40638-017-0058-2

DO - 10.1186/s40638-017-0058-2

M3 - Article

VL - 4

JO - Robotics and Biomimetics

JF - Robotics and Biomimetics

SN - 2197-3768

M1 - 2

ER -