Text Categorization for Vietnamese Documents - Citegraph

Paper Info

Title
Text Categorization for Vietnamese Documents

Abstract
Many machine learning methods have been proposed for text categorization, but most research has applied them to English documents. Vietnamese differs from English in its linguistic features, so it is not clear whether the standard methods will work for categorizing Vietnamese documents. This paper describes morphological-level document representations that are appropriate for Vietnamese text documents and investigates the effectiveness of several standard learning algorithms: Naïve Bayes, K-Nearest Neighbour (KNN), and Support Vector Machine (SVM) with four different kernel functions. The results show that effective and efficient classifiers for Vietnamese text categorization can be built with these representations and standard algorithms, and that performance improves when information gain (infogain) is used for feature selection and an external dictionary is used to filter the vocabulary.
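The abstract credits infogain feature selection and dictionary-based vocabulary filtering with improving performance. The Python sketch below illustrates the standard information-gain criterion, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], computed over binary term presence. It is not the authors' implementation. It assumes a plain bag of space-separated syllables (Vietnamese orthography writes syllables separated by spaces), which may differ from the paper's morphological-level representation, and the `vocabulary` argument is a hypothetical stand-in for the external dictionary. All function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def tokenize(text, vocabulary=None):
    """Set of lower-cased, space-separated syllables in `text`.

    `vocabulary` is a hypothetical stand-in for an external dictionary:
    if given, tokens not in it are dropped.
    """
    tokens = set(text.lower().split())
    if vocabulary is not None:
        tokens &= set(vocabulary)
    return tokens


def information_gain(docs, labels, vocabulary=None):
    """Map each term t to IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)],
    where t is the binary presence of the term in a document."""
    n = len(docs)
    class_totals = Counter(labels)
    classes = sorted(class_totals)
    base = entropy([class_totals[c] for c in classes])

    # Per term: how many documents of each class contain it.
    present = {}
    for text, label in zip(docs, labels):
        for term in tokenize(text, vocabulary):
            present.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, with_t in present.items():
        p_with = sum(with_t.values()) / n
        without_t = [class_totals[c] - with_t[c] for c in classes]
        cond = p_with * entropy([with_t[c] for c in classes]) + (
            1 - p_with
        ) * entropy(without_t)
        scores[term] = base - cond
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy corpus: two "sport" and two "economy" documents.
    docs = [
        "Đội bóng thắng trận",
        "Trận bóng đá hay",
        "Giá vàng tăng mạnh",
        "Thị trường vàng giảm",
    ]
    labels = ["sport", "sport", "econ", "econ"]
    scores = information_gain(docs, labels)
    print(select_top_k(scores, 3))  # ['bóng', 'trận', 'vàng']
```

On this toy corpus, each of the three selected syllables occurs in both documents of one class and in none of the other, so it scores the maximum of 1 bit; syllables seen in only one document score about 0.31 bits. Ranking terms by IG and keeping the top k is one common way to apply infogain; the thresholds the paper actually used are not stated on this page.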

Year: 2009
DOI: 10.1109/WI-IAT.2009.327
Venue: Web Intelligence/IAT Workshops
Keywords: different language, english document, standard method, vietnamese documents, different kernel function, text categorization, vietnamese text document, standard algorithm, vietnamese text categorization, classification, vietnamese document, machine learning, different feature, vietnamese language processing, support vector machines, natural languages, feature selection, kernel, kernel function, support vector machine, dictionaries
Field: Categorization, Standard algorithms, Naive Bayes classifier, Information retrieval, Feature selection, Computer science, Support vector machine, Natural language, Artificial intelligence, Natural language processing, Vietnamese, Vocabulary
DocType: Conference
Volume: 3
ISBN: 978-1-4244-5331-3
Citations: 2
PageRank: 0.43
References: 3
Authors: 3

Authors

Name	Order	Citations	PageRank
Giang-Son Nguyen	1	6	1.28
Xiaoying Gao	2	220	32.95
Peter Andreae	3	358	31.85
