TY - GEN
T1 - Coarse or Fine?
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
AU - Moltisanti, Davide
AU - Bilen, Hakan
AU - Sevilla-Lara, Laura
AU - Keller, Frank
PY - 2024/9/27
Y1 - 2024/9/27
N2 - We focus on the problem of recognising the end state of an action in an image, which is critical for understanding what action is performed and in which manner. We study this by focusing on the task of predicting the coarseness of a cut, i.e., deciding whether an object was cut "coarsely" or "finely". No dataset with these annotated end states is available, so we propose an augmentation method to synthesise training data. We apply this method to cutting actions extracted from an existing action recognition dataset. Our method is object agnostic, i.e., it presupposes the location of the object but not its identity. Starting from fewer than a hundred images of a whole object, we can generate several thousand images simulating visually diverse cuts of different coarseness. We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects. Results demonstrate that the model successfully recognises the end state of the cutting action despite the domain gap between training and testing, and that the model generalises well to unseen objects.
KW - adverb recognition
KW - fine-grained recognition
KW - object end-state recognition
UR - http://www.scopus.com/inward/record.url?scp=85206433772&partnerID=8YFLogxK
U2 - 10.1109/CVPRW63382.2024.00126
DO - 10.1109/CVPRW63382.2024.00126
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85206433772
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 1191
EP - 1200
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE
CY - USA
Y2 - 16 June 2024 through 22 June 2024
ER -