Title
Multimodal Classification of Document Embedded Images.
Abstract
Images embedded in documents carry rich information that is vital for content extraction and knowledge construction. Interpreting the information in diagrams, scanned tables, and other types of images enriches the underlying concepts, but requires a classifier that can recognize the large variability of potential embedded image types and enable the reconstruction of their relationships. Here we tested different deep learning-based approaches for image classification on a dataset of 32K images extracted from documents and divided into 62 categories, for which we obtain an accuracy of ~85%. We also investigate to what extent textual information improves classification performance when combined with visual features. The textual features were obtained either from text embedded in the images or from image captions. Our findings suggest that textual information carries relevant signal with respect to the image category and that multimodal classification provides up to 7% better accuracy than classification from a single data type.
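The abstract describes combining visual and textual features for multimodal classification. A common way to do this is late fusion by feature concatenation; the sketch below illustrates that idea only. The function name, the feature dimensions, and the fusion strategy are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def fuse_features(visual, textual):
    """Late fusion: concatenate per-image modality feature vectors.

    `visual` and `textual` are hypothetical feature vectors (e.g. CNN
    activations and a text embedding of the caption or embedded text);
    the fused vector would then be fed to a classifier.
    """
    return np.concatenate([visual, textual])

# Toy example: a 4-d visual vector and a 3-d textual vector
v = np.array([0.2, 0.8, 0.1, 0.5])
t = np.array([1.0, 0.0, 0.3])
fused = fuse_features(v, t)
print(fused.shape)  # (7,)
```

Concatenation is the simplest fusion baseline; alternatives such as score-level fusion or learned joint embeddings are also common in multimodal classification.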
Year
2017
Venue
GREC
Field
Content extraction, Pattern recognition, Textual information, Computer science, Data type, Artificial intelligence, Deep learning, Contextual image classification, Classifier (linguistics)
DocType
Conference
Citations
0
PageRank
0.34
References
9
Authors
4
Name              Order  Citations  PageRank
Matheus P. Viana    1       41        5.03
Quoc-Bao Nguyen     2       45        5.88
John R. Smith       3     4939      487.88
Maria Gabrani       4        2        7.92