Title
A Multimodal Approach to Exploit Similarity in Documents.
Abstract
Automated document classification extracts information through a systematic analysis of document content. It is an active research field of growing importance, owing to the large number of electronic documents produced on the World Wide Web and made available by widespread technologies, including mobile ones. Several application areas benefit from automated document classification, including document archiving, invoice processing in business environments, press releases, and search engines. Current tools classify or "tag" text and images separately. In this paper we show how a technology that links image and text-based content together improves fundamental document management tasks like retrieving information from a database or automated documents. We investigate a model of conceptual spaces that uses joint information sources from the text and the images forming complex documents. We present a formal model, the computable algorithms, and the dataset, from which we took a subset for our experiments, together with the related tests and results.
Year
2014
DOI
10.1007/978-3-319-07455-9_51
Venue
Lecture Notes in Computer Science
Field
Document classification, Ontology, Data mining, Information retrieval, Computer science, Document management system, Document clustering, Invoice processing, Document engineering, Exploit, Cluster analysis
DocType
Conference
Volume
8481
ISSN
0302-9743
Citations
2
PageRank
0.36
References
12
Authors
2
Name, Order, Citations, PageRank
Matteo Cristani, 1, 259, 34.75
Claudio Tomazzoli, 2, 25, 11.36