Title
Towards Unsupervised Extraction of Verb Paradigms from Large Corpora
Abstract
A verb paradigm is a set of inflectional categories for a single verb lemma. To obtain verb paradigms we extracted left and right bigrams for the 400 most frequent verbs from over 100 million words of text, calculated the Kullback-Leibler distance for each pair of verbs for left and right contexts separately, and ran a hierarchical clustering algorithm for each context. Our new method for finding unsupervised cut points in the cluster trees produced results that compared favorably with results obtained using supervised methods, such as gain ratio, a revised gain ratio and number of correctly classified items. Left context clusters correspond to inflectional categories, and right context clusters correspond to verb lemmas. For our test data, 91.5% of the verbs are correctly classified for inflectional category, 74.7% are correctly classified for lemma, and the correct joint classification for lemma and inflectional category was obtained for 67.5% of the verbs. These results are derived only from distributional information without use of morphological information.
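The pipeline described in the abstract (bigram contexts, pairwise Kullback-Leibler distance, hierarchical clustering per context side) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus, add-one smoothing, the symmetrized form of the divergence, average linkage, and the function names. The sketch also stops at a fixed number of clusters, which is not the unsupervised cut-point method the abstract refers to.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical); the paper used over 100 million words.
TOKENS = (
    "she has seen it . he has taken it . they will see it . we will take it . "
    "she has seen them . we will see them . he has taken one . they will take one ."
).split()
VERBS = {"see", "seen", "take", "taken"}


def context_counts(tokens, targets, side):
    """Count the word immediately left (or right) of each target verb."""
    offset = -1 if side == "left" else 1
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok in targets and 0 <= j < len(tokens):
            counts[tok][tokens[j]] += 1
    return counts


def smoothed(counter, vocab, alpha=1.0):
    """Add-alpha smoothing (an assumption) so no context has zero probability."""
    total = sum(counter.values()) + alpha * len(vocab)
    return {w: (counter[w] + alpha) / total for w in vocab}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def sym_kl(p, q):
    """Symmetrized divergence, assumed here so the measure acts like a distance."""
    return kl(p, q) + kl(q, p)


def cluster_verbs(tokens, targets, side, n_clusters):
    """Naive average-linkage agglomerative clustering down to n_clusters groups."""
    counts = context_counts(tokens, targets, side)
    vocab = {w for c in counts.values() for w in c}
    dists = {v: smoothed(counts[v], vocab) for v in targets}
    clusters = [[v] for v in sorted(targets)]

    def linkage(a, b):
        total = sum(sym_kl(dists[x], dists[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print("left context: ", cluster_verbs(TOKENS, VERBS, "left", 2))
    print("right context:", cluster_verbs(TOKENS, VERBS, "right", 2))
```

On this toy corpus the left-context clusters group verbs that follow the same auxiliary (the same inflectional form), while the right-context clusters group verbs that take the same objects (the same lemma), which mirrors the correspondence stated in the abstract.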
Year
1998
Venue
VLC@COLING/ACL
Field
Verb, Computer science, Artificial intelligence, Natural language processing
DocType
Conference
Citations
0
PageRank
0.34
References
3
Authors
3
Name                  Order  Citations  PageRank
Cornelia H. Parkes    1      0          0.34
Alexander M. Malek    2      0          0.34
Mitchell P. Marcus    3      3098854.76