Title
Analysing part-of-speech for Portuguese text classification
Abstract
This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and on linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in Portuguese: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information in the pre-processing phase of text classification: it allows a strong reduction in the number of features needed for classification.
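As a rough illustration of part-of-speech-based term selection, the sketch below keeps only tokens whose tag belongs to a chosen set before building term-frequency vectors, which is the kind of reduced feature space an SVM would then be trained on. The tagged sentences, the tag set (N, V, ADJ, ART, PRP) and the function names are hypothetical assumptions; the abstract does not say which tagger or which tag combinations were used, and the SVM training step itself is omitted.

```python
from collections import Counter

# Hypothetical pre-tagged documents: each token is a (word, tag) pair.
# The tag set (N = noun, V = verb, ADJ = adjective, ART = article,
# PRP = preposition) is an illustrative assumption, not real tagger output.
DOCS = [
    [("o", "ART"), ("governo", "N"), ("aprovou", "V"), ("a", "ART"),
     ("nova", "ADJ"), ("lei", "N"), ("de", "PRP"), ("impostos", "N")],
    [("a", "ART"), ("equipa", "N"), ("venceu", "V"), ("o", "ART"),
     ("jogo", "N"), ("de", "PRP"), ("futebol", "N")],
]


def select_terms(doc, keep_tags):
    """Keep only lower-cased words whose POS tag is in keep_tags."""
    return [word.lower() for word, tag in doc if tag in keep_tags]


def build_vocabulary(docs, keep_tags):
    """Sorted list of distinct terms that survive the POS filter."""
    return sorted({t for doc in docs for t in select_terms(doc, keep_tags)})


def vectorize(doc, vocab, keep_tags):
    """Term-frequency vector over vocab (the input a classifier would see)."""
    counts = Counter(select_terms(doc, keep_tags))
    return [counts[term] for term in vocab]


ALL_TAGS = {"N", "V", "ADJ", "ART", "PRP"}
NOUNS_ONLY = {"N"}

full_vocab = build_vocabulary(DOCS, ALL_TAGS)
noun_vocab = build_vocabulary(DOCS, NOUNS_ONLY)

print(len(full_vocab), len(noun_vocab))
print(noun_vocab)
print([vectorize(d, noun_vocab, NOUNS_ONLY) for d in DOCS])
```

On this toy input, keeping only nouns shrinks the vocabulary from 12 to 6 terms; whether nouns alone, or nouns plus other categories, work best is exactly the kind of question the experiments address, so the tag set here is only a placeholder.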
Year
2006
DOI
10.1007/11671299_57
Venue
CICLing
Keywords
analysing part-of-speech, different measure, Portuguese language, pre-processing phase, different datasets, text classification, Portuguese Attorney General, Portuguese text classification, linguistic knowledge, text classification task, part-of-speech information, linguistic information, part of speech, support vector machine
Field
Rule-based machine translation, Content analysis, Information processing, Computer science, Support vector machine, Portuguese, Part of speech, Natural language, Natural language processing, Artificial intelligence, Classifier (linguistics)
DocType
Conference
Volume
3878
ISSN
0302-9743
ISBN
3-540-32205-1
Citations
1
PageRank
0.36
References
9
Authors
4
Name | Order | Citations | PageRank
Teresa Gonçalves | 1 | 37 | 16.42
Cassiana Silva | 2 | 1 | 0.36
Paulo Quaresma | 3 | 415 | 60.46
Renata Vieira | 4 | 82 | 11.44