Abstract |
---|
The purpose of this paper is to characterize a constituent boundary parsing algorithm, using an information-theoretic measure called generalized mutual information, which serves as an alternative to traditional grammar-based parsing methods. This method is based on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part of speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of this parsing algorithm which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. This paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser. |
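
As a rough illustration of the hypothesis stated in the abstract, the sketch below scores each gap between adjacent part-of-speech tags with ordinary pointwise mutual information over tag bigrams and flags local minima as candidate constituent boundaries. It is a simplified stand-in, not the generalized mutual information statistic or the parsing algorithm derived in the paper. The toy corpus, the function names, and the add-k smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def train_counts(tagged_corpus):
    """Count POS unigrams and adjacent POS bigrams over a list of tag sequences."""
    unigrams, bigrams = Counter(), Counter()
    for tags in tagged_corpus:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    return unigrams, bigrams


def pmi(x, y, unigrams, bigrams, k=0.5):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))).

    Add-k smoothing (an assumption of this sketch) keeps unseen tags and
    unseen bigrams from producing log(0) or a division by zero.
    """
    vocab = len(unigrams)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = (bigrams[(x, y)] + k) / (n_bi + k * vocab * vocab)
    p_x = (unigrams[x] + k) / (n_uni + k * vocab)
    p_y = (unigrams[y] + k) / (n_uni + k * vocab)
    return math.log2(p_xy / (p_x * p_y))


def boundary_candidates(tags, unigrams, bigrams):
    """Score every gap between adjacent tags and return strict local minima.

    Gap i lies between tags[i] and tags[i + 1]. Gaps at either end of the
    sentence are compared only against their single neighbour.
    """
    scores = [pmi(a, b, unigrams, bigrams) for a, b in zip(tags, tags[1:])]
    gaps = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else math.inf
        right = scores[i + 1] if i + 1 < len(scores) else math.inf
        if score < left and score < right:
            gaps.append(i)
    return scores, gaps


# Hypothetical toy corpus of POS tag sequences, for illustration only.
TOY_CORPUS = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["DT", "JJ", "NN", "VB"],
    ["DT", "NN", "VB", "IN", "DT", "NN"],
]

unigram_counts, bigram_counts = train_counts(TOY_CORPUS)
gap_scores, candidate_gaps = boundary_candidates(
    ["DT", "NN", "VB", "DT", "NN"], unigram_counts, bigram_counts
)
print(gap_scores, candidate_gaps)
```

A low score at a gap means the two tags co-occur less often than their individual frequencies would predict, which is the intuition behind treating that gap as a likely boundary. This sketch produces only a flat set of candidate gaps. The recursive bracketing the abstract describes would need further steps that are not shown here.
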

Year | Venue | Keywords |
---|---|---|
1990 | AAAI | recursive unlabeled bracketing,low error rate,sample output,constituent boundary,mutual information,parsing algorithm,mutual information value,natural language,traditional grammar-based parsing method,information-theoretic measure,generalized mutual information statistic |

Field | DocType | ISBN |
---|---|---|
Top-down parsing language,Top-down parsing,S-attributed grammar,Computer science,Speech recognition,Bottom-up parsing,Parsing expression grammar,Artificial intelligence,Natural language processing,Parsing,Parser combinator,Pointwise mutual information | Conference | 0-262-51057-X |

Citations | PageRank | References |
---|---|---|
52 | 97.13 | 3 |

Authors |
---|
2 |

Name | Order | Citations | PageRank |
---|---|---|---|
David M. Magerman | 1 | 726 | 512.15 |
Mitchell P. Marcus | 2 | 3098 | 854.76 |