Analysing part-of-speech for Portuguese text classification - Citegraph

Paper Info

Title
Analysing part-of-speech for Portuguese text classification

Abstract
This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in the Portuguese language: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing for a strong reduction in the number of features needed for the text classification.

Year	2006
DOI	10.1007/11671299_57
Venue	CICLing
Keywords	analysing part-of-speech, different measure, portuguese language, pre-processing phase, different datasets, text classification, portuguese attorney general, portuguese text classification, linguistic knowledge, text classification task, part-of-speech information, linguistic information, part of speech, support vector machine
Field	Rule-based machine translation, Content analysis, Information processing, Computer science, Support vector machine, Portuguese, Part of speech, Natural language, Natural language processing, Artificial intelligence, Classifier (linguistics)
DocType	Conference
Volume	3878
ISSN	0302-9743
ISBN	3-540-32205-1
Citations	1
PageRank	0.36
References	9
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Teresa Gonçalves	1	37	16.42
Cassiana Silva	2	1	0.36
Paulo Quaresma	3	415	60.46
Renata Vieira	4	82	11.44