Abstract |
---|
In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets. |
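
The abstract describes combining fastText-style subword (character n-gram) embeddings with a supervised objective that places each misspelling close to its correct variant. The sketch below is not the paper's implementation: it omits the fastText skip-gram context loss entirely, and the misspelling pairs, bucket count, and learning rate are illustrative assumptions. It only shows the "pull misspellings toward their correct forms" idea on top of hashed character n-gram vectors.

```python
"""Minimal sketch (illustrative, not the authors' method) of embedding
misspellings near their correct variants using subword vectors."""
import numpy as np


def ngrams(word, n_min=3, n_max=5):
    """Character n-grams with boundary markers, fastText-style."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]


class SubwordEmbeddings:
    def __init__(self, dim=50, buckets=10_000, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.buckets = buckets
        # Hashed n-gram table, as in fastText's bucket trick.
        self.table = rng.normal(0.0, 0.1, size=(buckets, dim))

    def _ids(self, word):
        return [hash(g) % self.buckets for g in ngrams(word)]

    def vector(self, word):
        """Word vector = mean of its n-gram vectors."""
        return self.table[self._ids(word)].mean(axis=0)

    def pull_together(self, misspelling, correct, lr=0.1):
        """One gradient step on ||v(misspelling) - v(correct)||^2,
        i.e. the supervised 'misspellings close to correct variants' term."""
        ids_m, ids_c = self._ids(misspelling), self._ids(correct)
        diff = self.table[ids_m].mean(0) - self.table[ids_c].mean(0)
        for i in ids_m:
            self.table[i] -= lr * diff / len(ids_m)
        for i in ids_c:
            self.table[i] += lr * diff / len(ids_c)


def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage with made-up misspelling pairs.
emb = SubwordEmbeddings()
pairs = [("becuase", "because"), ("definately", "definitely")]
before = cos(emb.vector("becuase"), emb.vector("because"))
for _ in range(200):
    for wrong, right in pairs:
        emb.pull_together(wrong, right)
after = cos(emb.vector("becuase"), emb.vector("because"))
print(f"cosine(becuase, because): before={before:.3f} after={after:.3f}")
```

In the paper's setting this supervised term would be optimized jointly with the usual fastText context-prediction loss, so the vectors stay useful for downstream tasks while misspellings collapse onto their correct forms.
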
Year | Venue | Field
---|---|---
2019 | North American Chapter of the Association for Computational Linguistics | Computer science, Artificial intelligence, Natural language processing

DocType | Volume | Citations
---|---|---
Journal | abs/1905.09755 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 6
Name | Order | Citations | PageRank |
---|---|---|---
Bora Edizel | 1 | 4 | 1.82 |
Aleksandra Piktus | 2 | 0 | 2.37 |
Piotr Bojanowski | 3 | 848 | 28.36 |
Rui Ferreira | 4 | 0 | 0.34 |
Edouard Grave | 5 | 860 | 33.43
Fabrizio Silvestri | 6 | 1819 | 107.29 |