Title | ||
---|---|---|
Unsupervised learning of morphology for building lexicon for a highly inflectional language |
Abstract | ||
---|---|---|
Words play a crucial role in aspects of natural language understanding such as syntactic and semantic processing. Usually, a natural language understanding system either already knows the words that appear in the text, or is able to automatically learn relevant information about a word upon encountering it. Typically, a capable system, human or machine, knows a subset of the entire vocabulary of a language, along with morphological rules for determining attributes of words not seen before. Developing a knowledge base of legal words and morphological rules is an important task in computational linguistics. In this paper, we describe initial experiments following an approach based on unsupervised learning of morphology from a text corpus developed specifically for this purpose. It is a method for conveniently creating a dictionary and a morphology rule base, and is especially suitable for highly inflectional languages like Assamese. Assamese is a major Indian language of the Indic branch of the Indo-European family of languages. It is used by around 15 million people. |
Year | DOI | Venue |
---|---|---|
2002 | 10.3115/1118647.1118648 | SIGMORPHON |
Keywords | Field | DocType |
unsupervised learning,capable system,morphological rule,text corpus,morphology rule base,major indian language,knowledge base,natural language understanding system,indic branch,natural language,inflectional language | Assamese,Computer science,Computational linguistics,Text corpus,Natural language understanding,Lexicon,Language identification,Artificial intelligence,Natural language processing,Linguistics,Vocabulary,Stop words | Conference |
Citations | PageRank | References |
11 | 0.91 | 2 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Utpal Sharma | 1 | 57 | 8.50 |
Jugal Kalita | 2 | 249 | 21.60 |
Rajib Das | 3 | 12 | 1.62 |