Abstract |
---|
We propose a method that assists legislative drafters in locating inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on a BERT (Bidirectional Encoder Representations from Transformers) model. We apply three techniques in training the BERT classifier, specifically, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. These techniques cope with two levels of infrequency: legal term-level infrequency that causes class imbalance and legal term set-level infrequency that causes underfitting. Concretely, preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, repetitive soft undersampling improves performance on infrequent legal terms without sacrificing performance on frequent legal terms, and classifier unification improves performance on infrequent legal term sets by sharing common knowledge among legal term sets. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or a language model, and that all three training techniques contribute to performance improvement. |
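The abstract does not give implementation details, but the repetitive soft undersampling it describes can be illustrated as repeatedly drawing training sets in which frequent classes are capped while infrequent classes are kept whole, so that one classifier per round sees a more balanced sample and their predictions can later be combined. This is a minimal sketch under that assumption; the function names and the `cap`/`rounds` parameters are hypothetical, not from the paper:

```python
import random
from collections import defaultdict


def soft_undersample(examples, labels, cap, rng):
    """Keep at most `cap` randomly chosen examples per class.

    Infrequent classes (size <= cap) are kept in full, so only the
    frequent classes are reduced -- the "soft" part of the scheme.
    """
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)
    kept_x, kept_y = [], []
    for y, xs in by_label.items():
        chosen = xs if len(xs) <= cap else rng.sample(xs, cap)
        kept_x.extend(chosen)
        kept_y.extend([y] * len(chosen))
    return kept_x, kept_y


def repeated_rounds(examples, labels, cap, rounds, seed=0):
    """Draw several independently undersampled training sets.

    Each round would train one classifier; repeating the draw lets the
    frequent classes contribute different examples each time, so little
    of their data is wasted overall.
    """
    rng = random.Random(seed)
    return [soft_undersample(examples, labels, cap, rng)
            for _ in range(rounds)]
```

In the paper's setting each round would fine-tune the BERT classifier on one resampled set; the sketch only shows the sampling side of the technique.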
Year | DOI | Keywords |
---|---|---|
2019 | 10.1109/BigData47090.2019.9006511 | legal term, term correction, Japanese, BERT
Field | DocType | Citations
---|---|---|
Computer science, Unification, Undersampling, Common knowledge, Artificial intelligence, Encoder, Classifier (linguistics), Random forest, Language model, Machine learning, Performance improvement | Conference | 0
PageRank | References | Authors
---|---|---|
0.34 | 0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Takahiro Yamakoshi | 1 | 0 | 1.35 |
Takahiro Komamizu | 2 | 12 | 10.01 |
Yasuhiro Ogawa | 3 | 56 | 8.47 |
et al. | 4 | 0 | 0.34 |