Title | ||
---|---|---|
Handling unknown words in statistical latent-variable parsing models for Arabic, English and French |
Abstract | ||
---|---|---|
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus. |
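
The contrast drawn in the abstract, between frequency-only handling of rare words and handling that also uses morphological clues, can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix list, the threshold, and the names `signature` and `replace_rare` are assumptions made for illustration, and the clues shown are simple English ones rather than the Arabic affixes and morphotactics the paper describes.

```python
from collections import Counter

# Hypothetical English suffix list, chosen only for illustration.
SUFFIXES = ("ing", "ed", "ly", "s")


def signature(word):
    """Map a word to a coarse unknown-word class using surface clues."""
    parts = ["UNK"]
    if word[:1].isupper():
        parts.append("CAP")
    if any(ch.isdigit() for ch in word):
        parts.append("NUM")
    lowered = word.lower()
    for suffix in SUFFIXES:
        # Require a stem of at least two characters before the suffix.
        if lowered.endswith(suffix) and len(word) > len(suffix) + 1:
            parts.append(suffix)
            break
    return "-".join(parts)


def replace_rare(sentences, threshold=1, use_morphology=True):
    """Replace words seen at most `threshold` times in the training data.

    With use_morphology=False every rare word collapses to a single "UNK"
    token (frequency information only); with use_morphology=True each rare
    word is mapped to a signature that keeps some morphological clues.
    """
    counts = Counter(word for sent in sentences for word in sent)
    replaced = []
    for sent in sentences:
        new_sent = []
        for word in sent:
            if counts[word] > threshold:
                new_sent.append(word)
            elif use_morphology:
                new_sent.append(signature(word))
            else:
                new_sent.append("UNK")
        replaced.append(new_sent)
    return replaced


train = [["the", "dog", "barked"], ["the", "cat", "slept"], ["the", "dog", "ran"]]
print(replace_rare(train, use_morphology=False)[0])
print(replace_rare(train, use_morphology=True)[0])
```

In the frequency-only setting all rare words share one class, whereas the signature variant lets a parser distinguish, for example, rare words ending in "-ed" from other rare words.
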
Year | Venue | Keywords |
---|---|---|
2010 | SPMRL@NAACL-HLT | arabic affix,morphological expressiveness,morphological clue,exhibit different level,annotated corpus,statistical latent-variable,pcfg-la parser,state-of-the-art accuracy,complex morphological clue,unknown word,language-independent technique |
Field | DocType | Citations |
---|---|---|
Word lists by frequency,Arabic,Computer science,Latent variable,Speech recognition,Natural language processing,Artificial intelligence,Parsing,Expressivity | Conference | 30 |
PageRank | References | Authors |
---|---|---|
0.97 | 14 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mohammed Attia | 1 | 146 | 16.51 |
Jennifer Foster | 2 | 454 | 38.25 |
Deirdre Hogan | 3 | 183 | 11.18 |
Joseph Le Roux | 4 | 175 | 16.34 |
Lamia Tounsi | 5 | 167 | 10.46 |
Josef van Genabith | 6 | 1037 | 105.64 |