Multimodal Document Image Classification - Citegraph

Paper Info

Title
Multimodal Document Image Classification

Abstract
State-of-the-art methods for document image classification rely on visual features extracted by deep convolutional neural networks (CNNs). These methods do not utilize rich semantic information present in the text of the document, which can be extracted using Optical Character Recognition (OCR). We first study the performance of state-of-the-art text classification approaches when applied to noisy text obtained from OCR. We then show that fusing this textual information with visual CNN methods produces state-of-the-art results on the RVL-CDIP classification dataset.
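
The abstract does not say how the textual and visual predictions are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of class probabilities. The function names, the `visual_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_scores, text_scores, visual_weight=0.5):
    """Weighted average of the two models' class probabilities.

    Assumption: this fusion rule and its weight are illustrative only;
    the paper's actual fusion method is not described in the abstract.
    """
    if len(visual_scores) != len(text_scores):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    p_visual = softmax(visual_scores)
    p_text = softmax(text_scores)
    return [
        visual_weight * pv + (1.0 - visual_weight) * pt
        for pv, pt in zip(p_visual, p_text)
    ]


def predict(visual_scores, text_scores, visual_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(visual_scores, text_scores, visual_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes: the visual model narrowly prefers
# class 0, while the OCR-text model strongly prefers class 1.
visual = [2.0, 1.9, 0.1]
text = [0.2, 3.0, 0.1]
print(predict(visual, text))
```

With these made-up scores the confident text model outweighs the nearly undecided visual model, which is the kind of complementary signal the abstract attributes to OCR text. Setting `visual_weight=1.0` reduces the sketch to the visual model alone.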

Year	2019
DOI	10.1109/ICDAR.2019.00021
Venue	2019 International Conference on Document Analysis and Recognition (ICDAR)
Keywords	Classification, Document Image, Multimodal
Field	Pattern recognition, Computer science, Convolutional neural network, Textual information, Noisy text, Optical character recognition, Semantic information, Artificial intelligence, Contextual image classification
DocType	Conference
ISSN	1520-5363
ISBN	978-1-7281-3015-6
Citations	1
PageRank	0.35
References	0
Authors	2

Authors

Name	Order	Citations	PageRank
Rajiv Jain	1	4	5.16
Curtis Wigington	2	6	3.17
