Abstract | ||
---|---|---|
This paper presents a novel two-stream approach for document image classification. The proposed approach leverages textual and visual modalities to classify document images into ten categories, including letter, memo, and news article. To reduce the textual stream's dependence on the performance of the underlying OCR (a common weakness of content-based document image classifiers), we utilize a filter-based feature-ranking algorithm. This algorithm ranks the features of each class by their ability to discriminate document images and selects the top 'K' features, which are retained for further processing. In parallel, the visual stream uses deep CNN models to extract structural features of document images. Finally, the textual and visual streams are combined using an average ensembling method. Experimental results reveal that the proposed approach outperforms the state-of-the-art system by a margin of 4.5% on the publicly available Tobacco-3482 dataset. |
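
The average ensembling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each stream emits one probability per class and that fusion is an unweighted mean of the two vectors; the abstract does not give the exact formulation. The function names, class labels, and probability values below are hypothetical and chosen only for illustration.

```python
def average_ensemble(text_probs, visual_probs):
    """Return the element-wise mean of two per-class probability vectors."""
    if len(text_probs) != len(visual_probs):
        raise ValueError("both streams must score the same set of classes")
    return [(t + v) / 2.0 for t, v in zip(text_probs, visual_probs)]


def predict(text_probs, visual_probs, labels):
    """Fuse the two streams and return the top label with the fused scores."""
    fused = average_ensemble(text_probs, visual_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


# Hypothetical scores for three of the ten categories (not taken from the paper).
labels = ["letter", "memo", "news article"]
text_probs = [0.6, 0.3, 0.1]    # assumed output of the textual stream
visual_probs = [0.2, 0.7, 0.1]  # assumed output of the visual stream

label, fused = predict(text_probs, visual_probs, labels)
print(label, fused)
```

In this toy case the textual stream alone would pick "letter", while the averaged scores favor "memo", which shows how one stream can correct the other when their confidences differ.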
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICDAR.2019.00227 | ICDAR |
Field | DocType | Citations |
Modalities,Pattern recognition,Computer science,Concatenation,Artificial intelligence,Contextual image classification | Conference | 0 |
PageRank | References | Authors |
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Muhammad Nabeel Asim | 1 | 0 | 1.35 |
Muhammad Usman Ghani Khan | 2 | 0 | 0.34 |
Muhammad Imran Malik | 3 | 158 | 15.97 |
Khizar Razzaque | 4 | 0 | 0.34 |
Andreas Dengel | 5 | 1926 | 280.42 |
Sheraz Ahmed | 6 | 105 | 28.32 |