Title
An Empirical Investigation of Word Class-Based Features for Natural Language Understanding.
Abstract
There are many studies that show using class-based features improves the performance of natural language processing (NLP) tasks such as syntactic part-of-speech tagging, dependency parsing, sentiment analysis, and slot filling in natural language understanding (NLU), but not much has been reported on the underlying reasons for the performance improvements. In this paper, we investigate the effects of the word class-based features for the exponential family of models specifically focusing on NLU tasks, and demonstrate that the performance improvements could be attributed to the regularization effect of the class-based features on the underlying model. Our hypothesis is based on empirical observation that shrinking the sum of parameter magnitudes in an exponential model tends to improve performance. We show on several semantic tagging tasks that there is a positive correlation between the model size reduction by the addition of the class-based features and the model performance on a held-out dataset. We also demonstrate that class-based features extracted from different data sources using alternate word clustering methods can individually contribute to the performance gain. Since the proposed features are generated in an unsupervised manner without significant computational overhead, the improvements in performance largely come for free and we show that such features provide gains for a wide range of tasks from semantic classification and slot tagging in NLU to named entity recognition (NER).
Year
2016
DOI
10.1109/TASLP.2015.2511925
Venue
IEEE/ACM Trans. Audio, Speech & Language Processing
Keywords
Predictive models, Tagging, Semantics, Feature extraction, Logistics, Data models, Computational modeling
Field
Data modeling, Computer science, Dependency grammar, Natural language understanding, Natural language processing, Artificial intelligence, Cluster analysis, Conditional random field, Pattern recognition, Sentiment analysis, Speech recognition, Named-entity recognition, Machine learning, Semantics
DocType
Journal
Volume
24
Issue
6
ISSN
2329-9290
Citations
3
PageRank
0.43
References
24
Authors
4
Name | Order | Citations | PageRank
Asli Çelikyilmaz | 1 | 407 | 39.06
Ruhi Sarikaya | 2 | 698 | 64.49
Minwoo Jeong | 3 | 42 | 2.49
Anoop Deoras | 4 | 240 | 29.36