In this paper, we examine the problem of internet video categorization. Specifically, we explore the representation of a video as a "bag of words" using various combinations of spatial and temporal descriptors. The descriptors incorporate both spatial and temporal gradients as well as optical flow information. We achieve state-of-the-art results on a standard human activity recognition database and demonstrate promising category recognition performance on two new databases of approximately 1000 and 1500 online user-submitted videos, which we will be making available to the community.
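As a rough illustration of the "bag of words" representation mentioned in the abstract, the sketch below quantizes a set of local descriptors against a fixed codebook and returns a normalized histogram of visual words. It is not the authors' implementation: the function name `bag_of_words_histogram`, the use of squared Euclidean distance, and the assumption that a codebook is already available (for example from k-means clustering) are all illustrative choices, and the toy 2-D vectors stand in for real spatio-temporal gradient or optical flow descriptors.

```python
import numpy as np


def bag_of_words_histogram(descriptors, codebook):
    """Return a normalized visual-word histogram for one video.

    descriptors: array of shape (n, d), one local descriptor per row (assumed input).
    codebook:    array of shape (k, d), one codeword per row (assumed precomputed).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)

    # Squared Euclidean distance from every descriptor to every codeword: shape (n, k).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Each descriptor is assigned to its nearest codeword (hard assignment).
    words = np.argmin(dists, axis=1)

    # Count how often each codeword occurs, then normalize so the bins sum to 1.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy example: two codewords and four descriptors in 2-D.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [0.0, 0.0]]
toy_hist = bag_of_words_histogram(toy_descriptors, toy_codebook)
```

In this toy case three descriptors fall nearest the first codeword and one nearest the second. A histogram of this kind could then be passed to any standard classifier; which classifier and descriptors the paper actually uses is not stated in this record.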
|Publication status||Published - Jun 2008|
|Event||CVPRW '08: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008 - Anchorage|
Duration: 23 Jun 2008 → 28 Jun 2008