Title
Reference Extraction from Vietnamese Legal Documents
Abstract
Legal and regulatory texts are ubiquitous and play an important role in everyday life, so automated processing of such documents with natural language processing and information retrieval techniques is desirable. Many legal text processing problems require information extraction as a base component. In this paper, we address the task of extracting references from law and regulatory documents, a step needed to recognize relations between documents and document parts, among other problems. We formulate the task as a sequence labeling problem and introduce several extraction models, covering both traditional methods (conditional random fields) and more advanced ones (deep neural networks). In addition to features learned by deep networks, we investigate various types of manually engineered features that reflect the characteristics of legal documents. Our best model, which combines bidirectional long short-term memory networks and conditional random fields, achieves an F1 score of 95.35% on a corpus of more than 11 thousand sentences from Vietnamese law and regulatory documents.
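To illustrate what casting reference extraction as sequence labeling can look like, here is a minimal sketch. It assumes a BIO tagging scheme with a single hypothetical label `REF`; the abstract does not state which tag set the authors use, and the English example sentence, the tags, and the helper `bio_to_spans` are invented for illustration only. In a full system the tags would be predicted by a model such as a CRF or a BiLSTM-CRF; here they are written by hand.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, text) spans from B-REF / I-REF / O tags.

    Assumption: one entity type, "REF", in BIO format (not taken from the paper).
    """
    spans = []
    start = None  # index where the currently open reference span begins
    for i, tag in enumerate(tags):
        if tag == "B-REF":
            # A new reference starts; close the previous one if still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I-REF" and start is not None:
            # Continuation of the open reference span.
            continue
        else:
            # "O", or a stray I-REF with no open span: close any open span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:
        spans.append((start, len(tags), " ".join(tokens[start:])))
    return spans


# Hypothetical tokenized sentence with hand-written tags.
tokens = ["according", "to", "Article", "5", "of", "Decree", "12", ",",
          "the", "fee", "applies"]
tags = ["O", "O", "B-REF", "I-REF", "I-REF", "I-REF", "I-REF", "O",
        "O", "O", "O"]

spans = bio_to_spans(tokens, tags)
for start, end, text in spans:
    print(start, end, text)
```

Under this formulation, each token receives exactly one tag, and a reference is recovered as a maximal run that starts with `B-REF` and continues with `I-REF`.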
Year
2019
DOI
10.1145/3368926.3369731
Venue
Proceedings of the Tenth International Symposium on Information and Communication Technology
Keywords
Bidirectional Long Short-Term Memory Networks, Conditional Random Fields, Legal Text, Reference Extraction
Field
Conditional random field, F1 score, Sequence labeling, Computer science, Information extraction, Natural language processing, Artificial intelligence, Vietnamese, Deep neural networks, Text processing
DocType
Conference
ISBN
978-1-4503-7245-9
Citations
0
PageRank
0.34
References
0
Authors
6
Name                   Order  Citations  PageRank
Ngo Xuan Bach          1      0          0.34
Nguyen Thi Thanh Thuy  2      0          0.34
Dang Bao Chien         3      0          0.34
Trieu Khuong Duy       4      0          0.34
To Minh Hien           5      0          0.34
Tu Minh Phuong         6      137        19.47