Title
A three-step preprocessing algorithm for minimizing e-mail document's atypical characteristics
Abstract
Documents in wide use today include many atypical characteristics. In particular, non-standard text appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification may differ significantly depending on the characteristics of the documents being classified, as well as on the classifier's performance. We propose a three-step preprocessing algorithm, applied in stages, to enable accurate automatic classification for each e-mail category. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that minimizes their atypical characteristics.
Year: 2005
DOI: 10.1007/11540007_68
Venue: FSKD (2)
Keywords: extensive use, e-mail category, atypical characteristic, e-mail document, accurate automatic classification, informal expression, automatic document classification, three-step preprocessing algorithm
Field: Document classification, Bayesian algorithm, Expression (mathematics), Information retrieval, Computer science, Electronic document, Classifier (linguistics), Standardization, Preprocessing algorithm
DocType: Conference
Volume: 3614
ISSN: 0302-9743
ISBN: 3-540-28331-5
Citations: 0
PageRank: 0.34
References: 4
Authors: 2
Name            Order   Citations   PageRank
Ok-Ran Jeong    1       181         22.02
Dong-Sub Cho    2       20          7.56