Title
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects.
Abstract
This paper describes a pilot study in the lexical encoding of multi-word expressions (MWEs) in four Latin American dialects of Spanish: Costa Rican, Colombian, Mexican and Peruvian. We describe the variability of MWE usage across dialects. We adapt an existing data model to a dialect-aware encoding, so as to represent dialect-specific properties while avoiding redundancy in the data common to all dialects. About a dozen linguistic properties of MWEs can be expressed in this model, both at the level of a whole MWE and at the level of its individual components. We describe the resulting lexical resource, which contains several dozen MWEs in four dialects, and we propose a method for constructing a web corpus to support crowdsourcing of examples of MWE occurrences. The resource is available under an open license and paves the way towards large-scale construction of dialect-aware language resources, which should prove useful in both traditional and novel NLP applications.
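The dialect-aware encoding described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual data model: the class names (`MWEEntry`, `Component`), the `resolve` helper, the property names and all property values are hypothetical and invented for illustration. The sketch only shows the general idea of storing properties shared by all dialects once, recording per-dialect differences as overrides, and doing so both for a whole MWE and for its components.

```python
from dataclasses import dataclass, field

# The four dialects named in the abstract.
DIALECTS = ("Costa Rican", "Colombian", "Mexican", "Peruvian")


def resolve(common, by_dialect, dialect):
    """Merge shared properties with the overrides recorded for one dialect."""
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    merged = dict(common)                        # copy, so shared data is never mutated
    merged.update(by_dialect.get(dialect, {}))   # dialect-specific values win
    return merged


@dataclass
class Component:
    """One word of an MWE (hypothetical structure)."""
    form: str
    common: dict = field(default_factory=dict)       # stored once for all dialects
    by_dialect: dict = field(default_factory=dict)   # dialect -> differing properties only

    def properties(self, dialect):
        return resolve(self.common, self.by_dialect, dialect)


@dataclass
class MWEEntry:
    """A whole MWE with its components (hypothetical structure)."""
    citation_form: str
    components: list
    common: dict = field(default_factory=dict)
    by_dialect: dict = field(default_factory=dict)

    def properties(self, dialect):
        return resolve(self.common, self.by_dialect, dialect)


# All property values below are invented for illustration;
# they are NOT taken from the resource described in the paper.
entry = MWEEntry(
    citation_form="tomar el pelo",
    components=[
        Component("tomar", common={"pos": "verb", "inflects": True}),
        Component("el", common={"pos": "determiner"}),
        Component("pelo", common={"pos": "noun", "inflects": False}),
    ],
    common={"register": "informal"},
    by_dialect={"Peruvian": {"register": "neutral"}},
)

for dialect in DIALECTS:
    print(dialect, entry.properties(dialect))
```

In this sketch a dialect with no recorded override simply inherits the shared values, so data common to all four dialects is written only once.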
Year: 2016
Venue: LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Keywords: multi-word expressions, lexical encoding, Spanish dialects
Field: Expression (mathematics), Crowdsourcing, Computer science, Redundancy (engineering), Natural language processing, Artificial intelligence, Data model, License, Encoding (memory)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name | Order | Citations | PageRank
Diana Bogantes | 1 | 0 | 0.34
Eric Rodríguez | 2 | 0 | 0.34
Alejandro Arauco | 3 | 0 | 0.34
Alejandro Rodríguez | 4 | 0 | 0.34
Agata Savary | 5 | 92 | 19.55