Title
Multimodal Document Image Classification
Abstract
State-of-the-art methods for document image classification rely on visual features extracted by deep convolutional neural networks (CNNs). These methods do not utilize rich semantic information present in the text of the document, which can be extracted using Optical Character Recognition (OCR). We first study the performance of state-of-the-art text classification approaches when applied to noisy text obtained from OCR. We then show that fusing this textual information with visual CNN methods produces state-of-the-art results on the RVL-CDIP classification dataset.
Year
2019
DOI
10.1109/ICDAR.2019.00021
Venue
2019 International Conference on Document Analysis and Recognition (ICDAR)
Keywords
Classification, Document Image, Multimodal
Field
Pattern recognition, Computer science, Convolutional neural network, Textual information, Noisy text, Optical character recognition, Semantic information, Artificial intelligence, Contextual image classification
DocType
Conference
ISSN
1520-5363
ISBN
978-1-7281-3015-6
Citations
1
PageRank
0.35
References
0
Authors
2
Name              Order  Citations  PageRank
Rajiv Jain        1      4          5.16
Curtis Wigington  2      6          3.17