Title
Unsupervised learning of morphology for building lexicon for a highly inflectional language
Abstract
Words play a crucial role in aspects of natural language understanding such as syntactic and semantic processing. Usually, a natural language understanding system either already knows the words that appear in the text, or is able to automatically learn relevant information about a word upon encountering it. A capable system, whether human or machine, typically knows a subset of the entire vocabulary of a language, along with morphological rules for determining attributes of words not seen before. Developing a knowledge base of legal words and morphological rules is an important task in computational linguistics. In this paper, we describe initial experiments following an approach based on unsupervised learning of morphology from a text corpus developed specifically for this purpose. It is a method for conveniently creating a dictionary and a morphology rule base, and is especially suitable for highly inflectional languages like Assamese. Assamese is a major Indian language of the Indic branch of the Indo-European family of languages. It is used by around 15 million people.
Year
2002
DOI
10.3115/1118647.1118648
Venue
SIGMORPHON
Keywords
unsupervised learning,capable system,morphological rule,text corpus,morphology rule base,major indian language,knowledge base,natural language understanding system,indic branch,natural language,inflectional language
Field
Assamese,Computer science,Computational linguistics,Text corpus,Natural language understanding,Lexicon,Language identification,Artificial intelligence,Natural language processing,Linguistics,Vocabulary,Stop words
DocType
Conference
Citations
11
PageRank
0.91
References
2
Authors
3
Name, Order, Citations, PageRank
Utpal Sharma, 1, 57, 8.50
Jugal Kalita, 2, 249, 21.60
Rajib Das, 3, 12, 1.62