Estimating Legal Document Structure by Considering Style Information and Table of Contents.

Paper Info

Title
Estimating Legal Document Structure by Considering Style Information and Table of Contents.

Abstract
Text analytics is used to analyze diverse documents. For example, legal documents (such as contracts, ordinances, regulations, and global standards) must be analyzed so that corporations can manage their business risks and meet compliance requirements. However, because such documents are often stored or published without a common structure, they must be preprocessed before subsequent text analytics can be applied. Two forms of preprocessing are particularly useful: (1) extracting text, and (2) estimating document structure (such as chapters, sections, and subsections), which defines the range of each topic or article in a document. This paper presents a preprocessing method that estimates document structure from documents without a common structure. The proposed method follows a rule-based approach and consists of three algorithms: (1) one based on style information, such as bold font; (2) one based on numbered objects, such as sections; and (3) one based on the document's Table of Contents, which summarizes its structure. The method was evaluated on 102 documents and estimated document structure with 96.6% accuracy.
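The abstract names the three signals (style, numbering, Table of Contents) but not the rules themselves. As a rough illustration only, the following Python sketch shows one way a rule-based combination of those signals might look. The `Line` record, its `bold` flag, the numbering regex, and the rule order are all hypothetical assumptions, not the authors' algorithm.

```python
import re
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional, Tuple


@dataclass
class Line:
    """One extracted text line; `bold` is a hypothetical style flag from the extractor."""
    text: str
    bold: bool = False


# Hypothetical numbering rule: a dotted number such as "2", "2.1" or "2.1." followed by
# whitespace and a capitalized word (so "30 days notice ..." is not taken as a heading).
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+[A-Z]")


def level_from_numbering(text: str) -> Optional[int]:
    """Depth of a dotted section number, e.g. '2.1 Scope' -> 2; None if unnumbered."""
    m = NUMBERED.match(text.strip())
    return m.group(1).count(".") + 1 if m else None


def estimate_structure(
    lines: Iterable[Line], toc_titles: AbstractSet[str] = frozenset()
) -> List[Tuple[int, str]]:
    """Return (level, text) for lines judged to be headings.

    Rules, checked in this order (an illustrative assumption):
      1. text matching a Table of Contents entry is a heading
         (level from its numbering if numbered, else level 1);
      2. a numbered line gets its level from the numbering depth;
      3. an unnumbered bold line is treated as a level-1 heading.
    """
    headings: List[Tuple[int, str]] = []
    for line in lines:
        text = line.text.strip()
        level = level_from_numbering(text)
        if text in toc_titles:
            headings.append((level or 1, text))
        elif level is not None:
            headings.append((level, text))
        elif line.bold:
            headings.append((1, text))
    return headings
```

Checking Table of Contents entries first lets an unstyled, unnumbered heading still be recognized when the contents page lists it, while the capital-letter requirement is one crude guard against numbered body text. A numbered clause such as "1. The supplier shall ..." would still be misread as a heading, so a practical system would need further rules; this sketch does not attempt to reproduce the paper's rules or its reported accuracy.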

Year: 2016
DOI: 10.1007/978-3-319-61572-1_18
Venue: Lecture Notes in Artificial Intelligence
Keywords: Document structure, Article extraction, Article comparison, Text analytics, Law articles
Field: Text mining, Business risks, Information retrieval, Computer science, Document Structure Description, Font, Table of contents, Preprocessor
DocType: Conference
Volume: 10247
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors

Name	Order	Citations	PageRank
Yoichi Hatsutori	1	0	0.34
Katsumasa Yoshikawa	2	0	0.68
Haruki Imai	3	0	1.35
