Title
Neural Coupled Sequence Labeling for Heterogeneous Annotation Conversion
Abstract
Supervised statistical models rely on large-scale, high-quality labeled data, which is essential for model training but expensive to construct. Therefore, instead of constructing new datasets, researchers have attempted to make full use of various existing heterogeneous datasets to boost model performance, since it is common for the same task to have multiple annotated datasets that follow different and incompatible annotation guidelines. Representative approaches include the guide-feature method, which uses knowledge projected from the source side to the target side as extra features to guide the target model, and the multi-task learning (MTL) method, which trains on multiple heterogeneous annotations simultaneously with shared parameters to acquire knowledge shared across resources. Though effective, the guide-feature method fails to directly use the source-side data as training data, and the MTL method ignores the implicit mappings between heterogeneous datasets. Compared with these methods, directly converting heterogeneous datasets into homogeneous datasets for target model training is a more straightforward and effective way to fully exploit heterogeneous resources. In this work, we propose a neural coupled sequence labeling model for heterogeneous annotation conversion. First, for each token, we map a given one-side tag into a set of bundled tags by concatenating the tag with all possible tags on the other side. Then, we build a neural coupled model over the bundled tag space. Finally, we convert heterogeneous annotations into homogeneous annotations by performing constrained decoding with the coupled model. We also propose a pruning strategy to address the excessive size of the bundled tag space, which improves efficiency without hurting model performance. Experiments on part-of-speech (POS) tagging, word segmentation (WS), and joint WS&POS tagging show that our neural coupled model consistently outperforms several benchmark models on all three tasks by a large margin.
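The following sketch is a minimal, hypothetical illustration of the bundled tag space and the constrained decoding step described in the abstract; it is not the authors' implementation. The two tag inventories are invented placeholders, and random scores stand in for the emissions a neural coupled tagger (e.g., a BiLSTM encoder scoring bundled tags) would produce.

```python
# Minimal sketch (assumptions, not the paper's code): toy tag inventories and
# random emission scores illustrate bundled tags plus constrained decoding.
from itertools import product

import numpy as np

# Hypothetical source-side (A) and target-side (B) POS tag inventories.
TAGS_A = ["NN", "VV", "AD"]
TAGS_B = ["n", "v", "d", "a"]

# Step 1: bundle every source-side tag with every possible target-side tag.
BUNDLED = [f"{a}&{b}" for a, b in product(TAGS_A, TAGS_B)]
BUNDLE_ID = {t: i for i, t in enumerate(BUNDLED)}


def constrained_decode(scores, side_a_tags):
    """Pick the best bundled tag per token, keeping only bundles whose
    A-side component agrees with the observed A-side annotation.

    scores:      (sent_len, |BUNDLED|) array, standing in for the emissions
                 of a coupled neural tagger.
    side_a_tags: observed source-side tags, one per token.
    Returns the converted target-side (B) tag sequence.
    """
    converted = []
    for position, gold_a in enumerate(side_a_tags):
        # Mask out every bundled tag that contradicts the given A-side tag.
        masked = np.full(len(BUNDLED), -np.inf)
        for bundle, idx in BUNDLE_ID.items():
            if bundle.split("&")[0] == gold_a:
                masked[idx] = scores[position, idx]
        best = BUNDLED[int(np.argmax(masked))]
        converted.append(best.split("&")[1])  # read off the B-side tag
    return converted


if __name__ == "__main__":
    sentence = ["他", "跑", "得", "快"]
    gold_a = ["NN", "VV", "AD", "AD"]  # heterogeneous (A-side) annotation
    rng = np.random.default_rng(0)
    fake_scores = rng.standard_normal((len(sentence), len(BUNDLED)))
    print(constrained_decode(fake_scores, gold_a))  # converted B-side tags
```

A full system would score with a trained neural model and decode the whole sequence (e.g., Viterbi) rather than greedily per token; the paper's pruning strategy for shrinking the bundled tag space is likewise omitted here for brevity.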
Year
2022
DOI
10.1109/TASLP.2022.3165370
Venue
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords
Annotations, Tagging, Task analysis, Data models, Guidelines, Labeling, Artificial neural networks, Annotation conversion, coupled sequence labeling, heterogeneous annotations, neural network, part-of-speech tagging, word segmentation
DocType
Journal
Volume
30
ISSN
2329-9290
Citations
0
PageRank
0.34
References
0
Authors
3
Name           Order   Citations   PageRank
Chen Gong      1       0           1.01
Zhenghua Li    2       325         28.48
Min Zhang      3       1849        157.00