Title | ||
---|---|---|
Using Mutual Information to Identify New Features for Text Documents of Various Domains |
Abstract | ||
---|---|---|
Identifying proper names, unknown words, and new terms is an important step in text processing systems. This paper describes a method that uses mutual information to collect possible segments within the scope of a single document as candidates for these three feature types. The construction and context of each candidate feature are then examined to determine its type, canonical form, and meaning. Because it requires very little domain-specific knowledge, the method adapts easily to various domains. |
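
The abstract does not spell out how mutual information is computed or how segments are selected, so the following is only a minimal sketch of one common reading: scoring adjacent token pairs within a single document by pointwise mutual information (PMI) and keeping the high-scoring pairs as candidate segments. The function names `pmi_bigrams` and `candidate_segments`, the `min_count` and `threshold` parameters, and the restriction to two-token segments are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    Hypothetical helper: probabilities are estimated from counts within
    one document, and pairs seen fewer than `min_count` times are skipped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def candidate_segments(tokens, threshold=1.0, min_count=2):
    """Return pairs whose PMI is at least `threshold`, highest first.

    The threshold value is an illustrative assumption.
    """
    scores = pmi_bigrams(tokens, min_count)
    kept = [pair for pair, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: -scores[pair])


tokens = "new york is big . new york is far . the city is big".split()
for pair in candidate_segments(tokens):
    print(pair)
```

In this toy input the pair `("new", "york")` scores highest because the two tokens only ever occur together. A fuller system along the lines the abstract describes would then examine each candidate's construction and context to decide its type, canonical form, and meaning; that step is not sketched here.
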
Year | Venue | Field |
---|---|---|
2003 | PACLIC | Information retrieval, Computer science, Canonical form, Mutual information, Proper noun, Text processing |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors | ||
---|---|---|
1 | 1 |