In this paper we present a new method for categorizing video sequences that capture different scene classes. This can be seen as a generalization of previous work on scene classification from single images. A scene is represented by a collection of 3D points, with an appearance-based codeword attached to each point. The point cloud is recovered by applying a robust structure-from-motion (SFM) algorithm to the video sequence. A hierarchical structure of histograms, placed at different locations and scales, is used to capture the typical spatial distribution of 3D points and codewords in the working volume. The scene is classified by an SVM equipped with a histogram matching kernel, similar to [21, 10, 16]. Results on a challenging dataset of 5 scene categories show competitive classification accuracy and superior performance with respect to state-of-the-art 2D pyramid matching methods applied to individual image frames.
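
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the SFM step has already produced 3D points with codeword indices, takes the bounding box of the point cloud as the working volume, splits that volume into 2^l cells per axis at level l, and uses an unweighted histogram intersection as the matching kernel. The names `pyramid_histogram` and `intersection_kernel` and the parameters `vocab_size` and `levels` are hypothetical.

```python
def pyramid_histogram(points, codewords, vocab_size, levels=2):
    """Concatenate per-cell codeword histograms over a 3D grid pyramid.

    points    : list of (x, y, z) tuples, e.g. from an SFM reconstruction
    codewords : list of ints in [0, vocab_size), one per point
    """
    # Assumption: the working volume is the bounding box of the point cloud.
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    span = [(hi[d] - lo[d]) or 1.0 for d in range(3)]  # avoid division by zero

    feature = []
    for level in range(levels + 1):
        cells = 2 ** level  # cells per axis at this level
        hist = [0.0] * (cells ** 3 * vocab_size)
        for p, w in zip(points, codewords):
            # Grid index of the point along each axis, clamped to the last cell.
            idx = [min(int((p[d] - lo[d]) / span[d] * cells), cells - 1)
                   for d in range(3)]
            cell = (idx[0] * cells + idx[1]) * cells + idx[2]
            hist[cell * vocab_size + w] += 1.0
        feature.extend(hist)
    return feature


def intersection_kernel(a, b):
    """Histogram intersection: sum of bin-wise minima (unweighted here)."""
    return sum(min(x, y) for x, y in zip(a, b))


# Toy scene: three points, two codewords, pyramid levels 0 and 1.
scene = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 0.9, 0.9)]
words = [0, 1, 1]
h = pyramid_histogram(scene, words, vocab_size=2, levels=1)
```

Kernel values computed this way for every pair of sequences form a Gram matrix, which could then be passed to any SVM implementation that accepts a precomputed kernel. Pyramid matching schemes usually weight finer levels more heavily; that weighting is omitted here for brevity.
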
|Number of pages||8|
|Publication status||Published - Sep 2009|
|Event||ICCV 2009: IEEE 12th International Conference on Computer Vision - Kyoto (29 Sep 2009 → 2 Oct 2009)|
|Conference||ICCV 2009: IEEE 12th International Conference on Computer Vision|
|Period||29/09/09 → 2/10/09|