Abstract |
---|
Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN. Local convolutional layer features and fully-connected layer features are extracted using the fine-tuned Fast R-CNN model, and then aggregated separately from a video segment to form two feature representations. The two representations are then concatenated into a global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets. |
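
The part-selection step in the abstract is posed as a set cover problem. The abstract does not give the exact formulation or solver, so the following is only a minimal sketch, assuming the standard greedy approximation and a hypothetical mapping from each candidate part to the set of training images it responds to. The function name `greedy_set_cover` and the toy data are illustrative and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable],
                     candidates: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Greedily pick candidates until every coverable element is covered.

    This is the textbook greedy approximation, used here as an assumption;
    the paper's actual selection procedure may differ.
    """
    uncovered = set(universe)
    chosen: List[Hashable] = []
    while uncovered and candidates:
        # Pick the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining elements are covered by no candidate
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: part id -> ids of training images where the part fires.
parts = {
    "part_a": {0, 1, 2},
    "part_b": {2, 3},
    "part_c": {3, 4, 5},
    "part_d": {1},
}
selected = greedy_set_cover(set(range(6)), parts)
print(selected)
```

In this toy case two parts are enough to cover all six images, so the redundant parts are dropped. A greedy pass is not guaranteed to find the smallest cover, which is one reason it should be read as an illustration rather than the authors' method.
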
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/DICTA47822.2019.8946036 | 2019 Digital Image Computing: Techniques and Applications (DICTA) |
Keywords | Field | DocType |
dynamic scene recognition, feature aggregation, deep neural networks, video classification | Computer vision, Set cover problem, Pattern recognition, Computer science, Artificial intelligence, Concatenation, Feature aggregation, Discriminative model, Deep neural networks | Conference |
ISBN | Citations | PageRank |
978-1-7281-3858-9 | 0 | 0.34 |
References | Authors | |
24 | 2 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Xiaoming Peng | 1 | 95 | 20.72 |
Abdesselam Bouzerdoum | 2 | 883 | 89.51 |