Abstract | ||
---|---|---|
Legal and regulatory texts are ubiquitous and play an important role in everyday life, so automated processing of such documents with natural language processing and information retrieval techniques is desirable. Many legal text processing problems require information extraction as a base component. In this paper, we address the task of extracting references from law and regulatory documents; these references are needed to recognize the relations between documents and document parts, among other problems. We formulate the task as a sequence labeling problem and introduce several extraction models, covering both traditional methods (conditional random fields) and more advanced ones (deep neural networks). In addition to features learned by deep networks, we investigate various types of manually engineered features that reflect the characteristics of legal documents. Our best model, which combines bidirectional long short-term memory networks with conditional random fields, achieves an F1 score of 95.35% on a corpus of more than 11 thousand sentences from Vietnamese law and regulatory documents.
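To illustrate the sequence labeling formulation mentioned in the abstract, here is a minimal sketch of how per-token tags could be decoded into reference spans. The BIO scheme, the `REF` label name, the English example sentence, and the `extract_spans` helper are all assumptions made for illustration; the abstract does not state the tag set or decoding procedure used in the paper.

```python
from typing import List, Tuple


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end_exclusive, text) reference spans.

    Hypothetical helper: assumes a single entity type tagged B-REF / I-REF / O.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open span began, if any

    for i, tag in enumerate(tags):
        if tag == "B-REF":
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I-REF" and start is not None:
            # The open span continues.
            continue
        else:
            # "O" (or a stray I-REF with no open span) ends the current span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = None

    # Close a span that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Illustrative input: in practice the tags would come from a trained tagger.
example_tokens = "as provided in Article 5 of Decree 12 .".split()
example_tags = ["O", "O", "O", "B-REF", "I-REF", "O", "B-REF", "I-REF", "O"]
example_spans = extract_spans(example_tokens, example_tags)
```

In this framing, any model that emits one tag per token (a conditional random field, or a bidirectional LSTM with a CRF output layer) can be plugged in ahead of the decoding step, and span-level F1 can then be computed by comparing predicted spans with gold spans.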
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3368926.3369731 | Proceedings of the Tenth International Symposium on Information and Communication Technology |
Keywords | Field | DocType |
Bidirectional Long Short-Term Memory Networks, Conditional Random Fields, Legal Text, Reference Extraction | Conditional random field, F1 score, Sequence labeling, Computer science, Information extraction, Natural language processing, Artificial intelligence, Vietnamese, Deep neural networks, Text processing | Conference |
ISBN | Citations | PageRank |
978-1-4503-7245-9 | 0 | 0.34 |
References | Authors | |
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ngo Xuan Bach | 1 | 0 | 0.34 |
Nguyen Thi Thanh Thuy | 2 | 0 | 0.34 |
Dang Bao Chien | 3 | 0 | 0.34 |
Trieu Khuong Duy | 4 | 0 | 0.34 |
To Minh Hien | 5 | 0 | 0.34 |
Tu Minh Phuong | 6 | 137 | 19.47 |