Abstract | ||
---|---|---|
This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier, we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in the Portuguese language: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show that part-of-speech information is relevant to the pre-processing phase of text classification, allowing a strong reduction in the number of features needed. |
Year | DOI | Venue |
---|---|---|
2006 | 10.1007/11671299_57 | CICLing |
Keywords | Field | DocType |
analysing part-of-speech, different measure, portuguese language, pre-processing phase, different datasets, text classification, portuguese attorney general, portuguese text classification, linguistic knowledge, text classification task, part-of-speech information, linguistic information, part of speech, support vector machine | Rule-based machine translation, Content analysis, Information processing, Computer science, Support vector machine, Portuguese, Part of speech, Natural language, Natural language processing, Artificial intelligence, Classifier (linguistics) | Conference |
Volume | ISSN | ISBN |
3878 | 0302-9743 | 3-540-32205-1 |
Citations | PageRank | References |
1 | 0.36 | 9 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Teresa Gonçalves | 1 | 37 | 16.42 |
Cassiana Silva | 2 | 1 | 0.36 |
Paulo Quaresma | 3 | 415 | 60.46 |
Renata Vieira | 4 | 82 | 11.44 |