Title
Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier
Abstract
We propose a method that assists legislative drafters in locating inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on a BERT (Bidirectional Encoder Representations from Transformers) model. We apply three techniques in training the BERT classifier: preliminary domain adaptation, repetitive soft undersampling, and classifier unification. These techniques cope with two levels of infrequency: legal term-level infrequency, which causes class imbalance, and legal term set-level infrequency, which causes underfitting. Concretely, preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences; repetitive soft undersampling improves performance on infrequent legal terms without sacrificing performance on frequent legal terms; and classifier unification improves performance on infrequent legal term sets by sharing common knowledge among legal term sets. Our experiments show that our classifier outperforms conventional classifiers based on Random Forest or a language model, and that all three training techniques contribute to the performance improvement.
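As a rough illustration of the prediction task the abstract describes, the sketch below masks an occurrence of a mistakable legal term in a statutory sentence and lets a pretrained Japanese BERT score the members of its term set. This is a minimal sketch, not the authors' classifier: the checkpoint name (cl-tohoku/bert-base-japanese), the example term set {又は, 若しくは} (two forms of "or" used at different nesting depths under Japanese drafting rules), and the single-token scoring shortcut are all assumptions made for illustration.

```python
# Minimal sketch of mistakable-term prediction with a masked LM.
# Assumes: transformers + torch installed, plus fugashi/ipadic for the
# Japanese tokenizer of the (assumed) cl-tohoku checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "cl-tohoku/bert-base-japanese"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

# One hypothetical mistakable legal term set ("or" at two nesting depths).
TERM_SET = ["又は", "若しくは"]

def suggest_term(sentence: str, term: str) -> str:
    """Mask one occurrence of `term` and return the set member BERT prefers."""
    masked = sentence.replace(term, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    # Position of the [MASK] token in the input sequence.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Shortcut: assumes each candidate is a single vocabulary token;
    # multi-token candidates would need span-level scoring instead.
    scores = {t: logits[tokenizer.convert_tokens_to_ids(t)].item()
              for t in TERM_SET}
    return max(scores, key=scores.get)

# Example: flag a disagreement between the drafted term and the prediction.
print(suggest_term("賞与、給与又は手当を支給する。", "又は"))
```

The paper instead trains dedicated classification heads (unified across term sets) rather than reusing the masked-LM head, but the input/output shape of the task, i.e. a masked statutory sentence in and a term-set member out, is the same.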
Year
2019
DOI
10.1109/BigData47090.2019.9006511
Keywords
legal term, term correction, Japanese, BERT
Field
Computer science, Unification, Undersampling, Common knowledge, Artificial intelligence, Encoder, Classifier (linguistics), Random forest, Language model, Machine learning, Performance improvement
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
4
Name                Order  Citations  PageRank
Takahiro Yamakoshi  1      0          1.35
Takahiro Komamizu   2      12         10.01
Yasuhiro Ogawa      3      56         8.47
et al.              4      0          0.34