Title | ||
---|---|---|
A three-step preprocessing algorithm for minimizing e-mail document's atypical characteristics |
Abstract | ||
---|---|---|
Documents in wide use today include many atypical characteristics. In particular, non-standard language appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification may differ significantly depending on the characteristics of the documents being classified, as well as on the classifier's performance. We propose a three-step preprocessing algorithm, applied in stages, for accurate automatic classification for each e-mail category. This research identifies the characteristics of e-mail documents in order to apply a three-step preprocessing algorithm that minimizes their atypical characteristics. |
Year | DOI | Venue |
---|---|---|
2005 | 10.1007/11540007_68 | FSKD (2) |
Keywords | Field | DocType |
extensive use,e-mail category,atypical characteristic,e-mail document,accurate automatic classification,informal expression,automatic document classification,three-step preprocessing algorithm | Document classification,Bayesian algorithm,Expression (mathematics),Information retrieval,Computer science,Electronic document,Classifier (linguistics),Standardization,Preprocessing algorithm | Conference |
Volume | ISSN | ISBN |
3614 | 0302-9743 | 3-540-28331-5 |
Citations | PageRank | References |
0 | 0.34 | 4 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ok-Ran Jeong | 1 | 181 | 22.02 |
Dong-Sub Cho | 2 | 20 | 7.56 |