Title
Estimating Legal Document Structure by Considering Style Information and Table of Contents
Abstract
Text analytics is used to analyze diverse documents. For example, legal documents (such as contracts, ordinances, regulations, and global standards) must be analyzed so that corporations can manage their business risk and meet compliance requirements. However, because such documents are often stored or published without a common structure, they must be preprocessed before text analytics can be applied to them. In particular, two forms of preprocessing are useful for text analytics: (1) extracting text, and (2) estimating document structure (such as chapters, sections, and subsections), which defines the range of each topic or article in a document. This paper presents a preprocessing method that estimates document structure from documents without a common structure. The proposed method follows a rule-based approach and consists of three algorithms: (1) one based on style information, such as bold font; (2) one based on numbered objects, such as sections; and (3) one based on the document's table of contents, which summarizes its structure. The proposed method was evaluated on 102 documents and estimated document structure with 96.6% accuracy.
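To make the three cues in the abstract concrete, here is a minimal sketch of how they might be combined; it is an illustration under stated assumptions, not the authors' algorithm. It assumes each extracted line carries a bold flag (style information), infers a heading level from dotted numbering such as "1", "1.1", or "1.1.2" (numbered objects), and, when a table of contents is available, keeps only headings that appear in it verbatim. The names Line, NUMBERED, MAX_HEADING_CHARS, heading_level, and estimate_structure, the length cutoff, and the exact-string TOC match are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Line:
    text: str
    bold: bool = False  # style cue; assumed to be supplied by a (hypothetical) text extractor


# Hypothetical pattern for numbered objects: "3", "3.1", or "3.1.2", optional trailing dot, then a title.
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")
MAX_HEADING_CHARS = 80  # hypothetical cutoff: longer lines are treated as body text


def heading_level(line: Line) -> Optional[int]:
    """Estimate a heading level from numbering depth, or return None for body text."""
    text = line.text.strip()
    m = NUMBERED.match(text)
    if m is None or len(text) > MAX_HEADING_CHARS:
        return None
    # Require the style cue so numbered body sentences ("2 employees ...") are not taken as headings.
    if not line.bold:
        return None
    # "1" -> level 1, "1.1" -> level 2, "1.1.2" -> level 3.
    return m.group(1).count(".") + 1


def estimate_structure(lines: List[Line], toc: Optional[Set[str]] = None) -> List[Tuple[int, str]]:
    """Return (level, heading) pairs; if a table of contents is given, keep only headings listed in it."""
    result = []
    for line in lines:
        level = heading_level(line)
        if level is None:
            continue
        heading = line.text.strip()
        # Simplistic TOC cue: exact string match against the TOC entries.
        if toc is not None and heading not in toc:
            continue
        result.append((level, heading))
    return result


# Hypothetical example document.
example = [
    Line("1. General Provisions", bold=True),
    Line("This regulation applies to all contracts."),
    Line("1.1 Scope", bold=True),
    Line("2 employees must sign the form."),
    Line("2. Definitions", bold=True),
]

print(estimate_structure(example))
# prints [(1, '1. General Provisions'), (2, '1.1 Scope'), (1, '2. Definitions')]
```

Passing toc={"1. General Provisions", "2. Definitions"} would drop "1.1 Scope", which shows how a table of contents can act as a filter on candidate headings. A real system would need fuzzier TOC matching and more numbering schemes (for example, "Article 3" or "Chapter II") than this sketch handles.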
Year: 2016
DOI: 10.1007/978-3-319-61572-1_18
Venue: Lecture Notes in Artificial Intelligence
Keywords: Document structure, Article extraction, Article comparison, Text analytics, Law articles
Field: Text mining, Business risks, Information retrieval, Computer science, Document Structure Description, Font, Table of contents, Preprocessor
DocType: Conference
Volume: 10247
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Yoichi Hatsutori | 1 | 0 | 0.34
Katsumasa Yoshikawa | 2 | 0 | 0.68
Haruki Imai | 3 | 0 | 1.35