Title
Document processing for automatic knowledge acquisition
Abstract
The knowledge acquisition bottleneck has become the major impediment to the development and application of effective information systems. To remove this bottleneck, new document processing techniques must be introduced to automatically acquire knowledge from various types of documents. By surveying the techniques and problems involved, this paper aims to serve as a catalyst for research in automatic knowledge acquisition through document processing. In this study, a document is considered to have two structures: a geometric structure and a logical structure. These play a key role in knowledge acquisition, which can be viewed as the process of acquiring both structures. Extracting the geometric structure from a document is referred to as document analysis; mapping the geometric structure into the logical structure is regarded as document understanding. Both areas are described in this paper, and the basic concept of document structure and its measurement based on entropy analysis are introduced. Logical structure and geometric models are proposed. Both top-down and bottom-up approaches and their entropy analyses are presented. Different techniques are discussed with practical examples. Mapping methods, such as tree transformation, document formatting knowledge, and document format description language, are described.
Year: 1994
DOI: 10.1109/69.273022
Venue: IEEE Trans. Knowl. Data Eng.
Keywords: automatic knowledge acquisition, information systems, mapping methods, new document processing technique, visual databases, document format description language, tree transformation, document formatting knowledge, geometric structure, deductive databases, document analysis, knowledge acquisition, entropy analysis, document understanding, top-down approaches, geometric models, knowledge acquisition bottleneck, document processing, document handling, bottom-up approaches, document structure, logical structure, geometric model, text analysis, artificial intelligence, entropy, solid modeling, indexing terms, data engineering, top down, impedance, bottom up, information system
Field: Data mining, Computer science, Document management system, Document clustering, Document Structure Description, Logical data model, Natural language processing, Artificial intelligence, Information retrieval, Document processing, Document layout analysis, Disk formatting, Knowledge acquisition
DocType: Journal
Volume: 6
Issue: 1
ISSN: 1041-4347
Citations: 84
PageRank: 129.21
References: 46
Authors: 3
Name, Order, Citations, PageRank
Y. Y. Tang, 1, 416, 165.12
C. D. Yan, 2, 84, 129.21
Ching Y. Suen, 3, 7569, 1127.54