Paper Info

Title
The Class Imbalance Problem in Construction of Training Datasets for Authorship Attribution.

Abstract
The paper presents research on class imbalance in the construction of training sets for authorship recognition. In the experiments, the sets are artificially imbalanced and then rebalanced by under-sampling and over-sampling. The prepared sets are used to train two predictors, one connectionist and one rule-based, and their performance is observed. The tests show that for artificial neural networks the predictive accuracy is in several cases not degraded but in fact improved, while one rule classifier is highly sensitive to class balance: it never performs better than on the original balanced set and in many cases performs worse.
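The two balancing strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say how samples are selected, so this assumes plain random under-sampling (discard surplus samples down to the smallest class) and random over-sampling (duplicate samples up to the largest class). The function name `resample` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def resample(samples, labels, strategy="under", seed=0):
    """Balance classes by random under- or over-sampling.

    Illustrative only: random selection is an assumption, not the
    procedure described in the paper.
    """
    if strategy not in ("under", "over"):
        raise ValueError("strategy must be 'under' or 'over'")
    rng = random.Random(seed)

    # Group the samples by class label (here: by author).
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = min(sizes) if strategy == "under" else max(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if strategy == "under":
            # Keep a random subset, drawn without replacement.
            chosen = rng.sample(xs, target)
        else:
            # Keep every original sample, then add random duplicates.
            extra = [rng.choice(xs) for _ in range(target - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


if __name__ == "__main__":
    texts = [f"a{i}" for i in range(6)] + [f"b{i}" for i in range(2)]
    authors = ["A"] * 6 + ["B"] * 2
    for strategy in ("under", "over"):
        _, balanced = resample(texts, authors, strategy)
        print(strategy, Counter(balanced))
```

With six samples for author A and two for author B, under-sampling leaves two samples per author and over-sampling leaves six per author. Over-sampling by duplication adds no new information, which is one reason a classifier may respond differently to the two strategies.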

Year	2015
DOI	10.1007/978-3-319-23437-3_46
Venue	MAN-MACHINE INTERACTIONS 4, ICMMI 2015
Keywords	Class imbalance, Sampling strategy, Authorship attribution
Field	Balanced set, Psychology, Attribution, Artificial intelligence, Classifier (linguistics), Artificial neural network, Machine learning, Connectionism
DocType	Conference
Volume	391
ISSN	2194-5357
Citations	3
PageRank	0.41
References	9
Authors	1

Authors (1 row)

Name	Order	Citations	PageRank
Urszula Stanczyk	1	19	3.75