Title
Word Image Representation Based On Visual Embeddings And Spatial Constraints For Keyword Spotting On Historical Documents
Abstract
This paper proposes a visual embeddings approach for capturing semantic relatedness between visual words. Specifically, visual words are extracted from a collection of word images under the Bag-of-Visual-Words framework, and a deep learning procedure then maps them to embedding vectors in a semantic space. To integrate spatial constraints into the representation, each word image is segmented along rows and columns into several equal-sized sub-regions. Each sub-region is represented by the average (centroid) of the embedding vectors of all visual words that fall within it. A word image is thus converted into a fixed-length vector by concatenating the average embedding vectors of all its sub-regions, and the similarity between word images is measured by the Euclidean distance between these vectors. Experimental results demonstrate that the proposed representation outperforms Bag-of-Visual-Words, the visual language model, spatial pyramid matching, latent Dirichlet allocation, average visual word embeddings, and a recurrent neural network.
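The representation step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the embedding vector of each visual word is already available, and the function names, the 2 x 4 default grid, and the choice to leave empty sub-regions as zero vectors are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def word_image_vector(positions, embeddings, image_size, grid=(2, 4)):
    """Build a fixed-length vector for one word image.

    positions  : (n, 2) array of (row, col) locations of the visual words
    embeddings : (n, d) array, one embedding vector per visual word
    image_size : (height, width) of the word image
    grid       : (rows, cols) of equal-sized sub-regions (assumed value)
    """
    positions = np.asarray(positions, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    height, width = image_size
    n_rows, n_cols = grid
    dim = embeddings.shape[1]

    # Assign each visual word to a sub-region; clip so points on the
    # bottom or right border fall into the last row or column.
    r = np.minimum((positions[:, 0] * n_rows / height).astype(int), n_rows - 1)
    c = np.minimum((positions[:, 1] * n_cols / width).astype(int), n_cols - 1)

    cells = np.zeros((n_rows, n_cols, dim))
    for i in range(n_rows):
        for j in range(n_cols):
            mask = (r == i) & (c == j)
            if mask.any():
                # Centroid of the embeddings inside this sub-region.
                cells[i, j] = embeddings[mask].mean(axis=0)
            # Assumption: a sub-region with no visual words stays all zeros.

    # Concatenate the per-region averages in row-major order.
    return cells.reshape(-1)


def euclidean_distance(a, b):
    """Distance between two word-image vectors (smaller means more similar)."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

In a query-by-example setting, the query image and every candidate word image would be converted with the same grid, and the candidates ranked by increasing distance to the query.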
Year
2018
DOI
10.1109/ICPR.2018.8545573
Venue
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)
Keywords
visual word, visual embeddings, spatial constraints, word image representation, query-by-example
Field
Semantic similarity, Visual language, Latent Dirichlet allocation, Embedding, Pattern recognition, Computer science, Euclidean distance, Image segmentation, Keyword spotting, Artificial intelligence, Visual Word
DocType
Conference
ISSN
1051-4651
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Hongxi Wei, 1, 35, 5.71
Hui Zhang, 2, 13, 6.39
Guanglai Gao, 3, 78, 24.57