Title
MSC+: Language pattern learning for word sense induction and disambiguation
Abstract
Identifying the correct meaning of words in context and discovering new word senses are particularly useful for tasks such as question answering, information extraction, information retrieval, and text summarization. However, especially in user-generated content and online communication (e.g., Twitter), speakers continuously craft new meanings by using existing words in novel contexts. Consequently, lexical semantic inventories and systems have difficulty coping with semantic drift. In this work, we propose an approach to induce and disambiguate the senses of some target words in collections of short texts, such as tweets, through the use of fuzzy lexico-semantic patterns that we define as sequences of Morpho-semantic Components (MSC). We learn these patterns, which we call MSC+ patterns, automatically from text data. Experimental results show that instances of some MSC+ patterns arise in a number of tweets, although different tweets sometimes use different words to convey the sense of the respective MSC. Exploiting MSC+ patterns when they induce semantics on target words enables effective word sense disambiguation mechanisms, leading to improvements over the state of the art.
Year
2020
DOI
10.1016/j.knosys.2019.105017
Venue
Knowledge-Based Systems
Keywords
Lexical semantics, Information extraction, Linguistic pattern mining, Word sense induction, Word sense disambiguation
Field
Automatic summarization, Pattern learning, Question answering, Word-sense induction, Lexical semantics, Computer science, Fuzzy logic, Information extraction, Natural language processing, Artificial intelligence, Machine learning, Semantics
DocType
Journal
Volume
188
ISSN
0950-7051
Citations
0
PageRank
0.34
References
0
Authors
5
Name                 Order  Citations  PageRank
Fábio Bif Goularte   1      0          0.34
Danielly Sorato      2      0          1.35
Silvia Nassar        3      8          6.52
Renato Fileto        4      151        20.64
Horacio Saggion      5      1119       112.62