Title
Neural Coupled Sequence Labeling for Heterogeneous Annotation Conversion
Abstract
Supervised statistical models rely on large-scale, high-quality labeled data, which is essential for model training but expensive to construct. Therefore, instead of constructing new datasets, researchers have attempted to make full use of various existing heterogeneous datasets to boost model performance, since it is common for the same task to have multiple annotated datasets that follow different and incompatible annotation guidelines. Representative approaches include the guide-feature method, which uses knowledge projected from the source side to the target side as extra features to guide the target model, and the multi-task learning (MTL) method, which trains on multiple heterogeneous annotations simultaneously with shared parameters to acquire knowledge shared across resources. Though effective, the guide-feature method fails to directly use the source-side data as training data, and the MTL method ignores the implicit mappings between heterogeneous datasets. Compared with these methods, directly converting heterogeneous datasets into homogeneous datasets for target model training is a more straightforward and effective way to fully exploit heterogeneous resources. In this work, we propose a neural coupled sequence labeling model for heterogeneous annotation conversion. First, for each token, we map a given one-side tag into a set of bundled tags by concatenating the tag with all possible tags on the other side. Then, we build a neural coupled model over the bundled tag space. Finally, we convert heterogeneous annotations into homogeneous annotations by performing constrained decoding with the coupled model. We also propose a pruning strategy to address the excessive size of the bundled tag space, which improves efficiency without hurting model performance. Experiments on part-of-speech (POS) tagging, word segmentation (WS), and joint WS&POS tagging show that our neural coupled model consistently outperforms several benchmark models on all three tasks by a large margin.
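The following sketch is a minimal, hypothetical illustration of the bundled tag space and the constrained decoding step described in the abstract; it is not the authors' implementation. The two tag inventories are invented placeholders, and random scores stand in for the emissions a neural coupled tagger (e.g., a BiLSTM encoder scoring bundled tags) would produce.

```python
# Minimal sketch (assumptions, not the paper's code): toy tag inventories and
# random emission scores illustrate bundled tags plus constrained decoding.
from itertools import product

import numpy as np

# Hypothetical source-side (A) and target-side (B) POS tag inventories.
TAGS_A = ["NN", "VV", "AD"]
TAGS_B = ["n", "v", "d", "a"]

# Step 1: bundle every source-side tag with every possible target-side tag.
BUNDLED = [f"{a}&{b}" for a, b in product(TAGS_A, TAGS_B)]
BUNDLE_ID = {t: i for i, t in enumerate(BUNDLED)}


def constrained_decode(scores, side_a_tags):
    """Pick the best bundled tag per token, keeping only bundles whose
    A-side component agrees with the observed A-side annotation.

    scores:      (sent_len, |BUNDLED|) array, standing in for the emissions
                 of a coupled neural tagger.
    side_a_tags: observed source-side tags, one per token.
    Returns the converted target-side (B) tag sequence.
    """
    converted = []
    for position, gold_a in enumerate(side_a_tags):
        # Mask out every bundled tag that contradicts the given A-side tag.
        masked = np.full(len(BUNDLED), -np.inf)
        for bundle, idx in BUNDLE_ID.items():
            if bundle.split("&")[0] == gold_a:
                masked[idx] = scores[position, idx]
        best = BUNDLED[int(np.argmax(masked))]
        converted.append(best.split("&")[1])  # read off the B-side tag
    return converted


if __name__ == "__main__":
    sentence = ["他", "跑", "得", "快"]
    gold_a = ["NN", "VV", "AD", "AD"]  # heterogeneous (A-side) annotation
    rng = np.random.default_rng(0)
    fake_scores = rng.standard_normal((len(sentence), len(BUNDLED)))
    print(constrained_decode(fake_scores, gold_a))  # converted B-side tags
```

A full system would score with a trained neural model and decode the whole sequence (e.g., Viterbi) rather than greedily per token; the paper's pruning strategy for shrinking the bundled tag space is likewise omitted here for brevity.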
Year
2022
DOI
10.1109/TASLP.2022.3165370
Venue
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords
Annotations, Tagging, Task analysis, Data models, Guidelines, Labeling, Artificial neural networks, Annotation conversion, coupled sequence labeling, heterogeneous annotations, neural network, part-of-speech tagging, word segmentation
DocType
Journal
Volume
30
ISSN
2329-9290
Citations
0
PageRank
0.34
References
0
Authors
3
Name           Order   Citations   PageRank
Chen Gong      1       0           1.01
Zhenghua Li    2       325         28.48
Min Zhang      3       1849        157.00