Abstract |
---|
We propose a method that assists legislative drafters in locating inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on a BERT (Bidirectional Encoder Representations from Transformers) model. We apply three techniques in training the BERT classifier, specifically, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. These techniques cope with two levels of infrequency: legal term-level infrequency that causes class imbalance and legal term set-level infrequency that causes underfitting. Concretely, preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, repetitive soft undersampling improves performance on infrequent legal terms without sacrificing performance on frequent legal terms, and classifier unification improves performance on infrequent legal term sets by sharing common knowledge among legal term sets. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or a language model, and that all three training techniques contribute to performance improvement. |
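The abstract does not give implementation details, but the repetitive soft undersampling it describes can be illustrated as repeatedly drawing training sets in which frequent classes are capped while infrequent classes are kept whole, so that one classifier per round sees a more balanced sample and their predictions can later be combined. This is a minimal sketch under that assumption; the function names and the `cap`/`rounds` parameters are hypothetical, not from the paper:

```python
import random
from collections import defaultdict


def soft_undersample(examples, labels, cap, rng):
    """Keep at most `cap` randomly chosen examples per class.

    Infrequent classes (size <= cap) are kept in full, so only the
    frequent classes are reduced -- the "soft" part of the scheme.
    """
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)
    kept_x, kept_y = [], []
    for y, xs in by_label.items():
        chosen = xs if len(xs) <= cap else rng.sample(xs, cap)
        kept_x.extend(chosen)
        kept_y.extend([y] * len(chosen))
    return kept_x, kept_y


def repeated_rounds(examples, labels, cap, rounds, seed=0):
    """Draw several independently undersampled training sets.

    Each round would train one classifier; repeating the draw lets the
    frequent classes contribute different examples each time, so little
    of their data is wasted overall.
    """
    rng = random.Random(seed)
    return [soft_undersample(examples, labels, cap, rng)
            for _ in range(rounds)]
```

In the paper's setting each round would fine-tune the BERT classifier on one resampled set; the sketch only shows the sampling side of the technique.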
Year | DOI | Keywords |
---|---|---|
2019 | 10.1109/BigData47090.2019.9006511 | legal term, term correction, Japanese, BERT
Field | DocType | Citations
---|---|---|
Computer science, Unification, Undersampling, Common knowledge, Artificial intelligence, Encoder, Classifier (linguistics), Random forest, Language model, Machine learning, Performance improvement | Conference | 0
PageRank | References | Authors
---|---|---|
0.34 | 0 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Takahiro Yamakoshi | 1 | 0 | 1.35 |
Takahiro Komamizu | 2 | 12 | 10.01 |
Yasuhiro Ogawa | 3 | 56 | 8.47 |
et al. | 4 | 0 | 0.34 |