Title
Multimodal Classification of Document Embedded Images.
Abstract
Images embedded in documents carry rich information that is vital for content extraction and knowledge construction. Interpreting the information in diagrams, scanned tables, and other types of images enriches the underlying concepts, but requires a classifier that can recognize the large variability of potential embedded image types and enable the reconstruction of their relationships. Here we tested different deep learning-based approaches for image classification on a dataset of 32K images extracted from documents and divided into 62 categories, for which we obtain an accuracy of ~85%. We also investigate to what extent textual information improves classification performance when combined with visual features. The textual features were obtained either from text embedded in the images or from image captions. Our findings suggest that textual information carries relevant signal with respect to the image category and that multimodal classification provides up to 7% better accuracy than classification from a single data type.
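The abstract describes combining visual and textual features for multimodal classification. A common way to do this is late fusion by feature concatenation; the sketch below illustrates that idea only. The function name, the feature dimensions, and the fusion strategy are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def fuse_features(visual, textual):
    """Late fusion: concatenate per-image modality feature vectors.

    `visual` and `textual` are hypothetical feature vectors (e.g. CNN
    activations and a text embedding of the caption or embedded text);
    the fused vector would then be fed to a classifier.
    """
    return np.concatenate([visual, textual])

# Toy example: a 4-d visual vector and a 3-d textual vector
v = np.array([0.2, 0.8, 0.1, 0.5])
t = np.array([1.0, 0.0, 0.3])
fused = fuse_features(v, t)
print(fused.shape)  # (7,)
```

Concatenation is the simplest fusion baseline; alternatives such as score-level fusion or learned joint embeddings are also common in multimodal classification.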
Year
2017
Venue
GREC
Field
Content extraction, Pattern recognition, Textual information, Computer science, Data type, Artificial intelligence, Deep learning, Contextual image classification, Classifier (linguistics)
DocType
Conference
Citations
0
PageRank
0.34
References
9
Authors
4
Name              Order  Citations  PageRank
Matheus P. Viana    1       41        5.03
Quoc-Bao Nguyen     2       45        5.88
John R. Smith       3     4939      487.88
Maria Gabrani       4        2        7.92