Abstract | ||
---|---|---|
A standard approach to describing an image for classification and retrieval is to extract a set of local patch descriptors, encode them into a high-dimensional vector, and pool them into an image-level signature. The most common patch-encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements, which leads to the popular Bag-of-Visual-Words (BoV) representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch-encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model (GMM). This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K), with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch-encoding technique. |
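
The patch encoding summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. It assumes a diagonal-covariance GMM whose weights, means and standard deviations were already learned offline, and it keeps only the gradients with respect to the means and standard deviations, each scaled by the usual closed-form approximation of the Fisher information. For K Gaussians and D-dimensional descriptors, this yields a 2KD-dimensional vector. The function names `fisher_vector` and `normalize`, and the signed square-root plus L2 normalization step (a common post-processing choice in the FV literature), are illustrative assumptions.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Encode local descriptors X (T x D) against a diagonal-covariance GMM.

    weights: (K,)   mixture weights summing to 1
    means:   (K, D) component means
    sigmas:  (K, D) per-dimension standard deviations
    Returns a 2*K*D vector (gradients w.r.t. means, then std-devs).
    """
    X = np.atleast_2d(X)
    T = X.shape[0]
    # Normalized deviation of every descriptor from every component: (T, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian: (T, K)
    log_gauss = -0.5 * np.sum(diff ** 2 + np.log(2 * np.pi * sigmas[None] ** 2), axis=2)
    log_post = np.log(weights)[None, :] + log_gauss
    # Soft assignments gamma_t(k); subtract the row max for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Mean gradient: 1/(T sqrt(w_k)) * sum_t gamma_t(k) (x_t - mu_k) / sigma_k
    g_mu = np.einsum("tk,tkd->kd", gamma, diff) / (T * np.sqrt(weights)[:, None])
    # Std-dev gradient: 1/(T sqrt(2 w_k)) * sum_t gamma_t(k) [((x_t - mu_k)/sigma_k)^2 - 1]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

def normalize(fv, alpha=0.5):
    """Signed power normalization followed by L2 normalization."""
    fv = np.sign(fv) * np.abs(fv) ** alpha
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

One way to obtain the GMM parameters is scikit-learn's `GaussianMixture(covariance_type="diag")`, passing `gmm.weights_`, `gmm.means_` and `np.sqrt(gmm.covariances_)`. The resulting normalized vectors can then be fed directly to a linear classifier. This tool choice is an assumption and is not prescribed by the abstract.
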
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/s11263-013-0636-x | International Journal of Computer Vision |
Keywords | Field | DocType |
Image classification,Large-scale classification,Bag-of-Visual words,Fisher vector,Fisher kernel,Product quantization | ENCODE,Finite set,Pattern recognition,Bag-of-words model in computer vision,Computer science,Artificial intelligence,Quantization (signal processing),Contextual image classification,Fisher kernel,Machine learning,Mixture model,Encoding (memory) | Journal |
Volume | Issue | ISSN |
105 | 3 | 0920-5691 |
Citations | PageRank | References |
561 | 12.56 | 56 |
Authors | ||
---|---|---|
4 | ||
Name | Order | Citations | PageRank |
---|---|---|---|
Jorge Sánchez | 1 | 2790 | 149.07 |
Florent Perronnin | 2 | 5448 | 291.48 |
Thomas Mensink | 3 | 2354 | 116.33 |
J. J. Verbeek | 4 | 3944 | 181.44 |