Title
Authorship classification: a discriminative syntactic tree mining approach
Abstract
Dozens of studies have addressed automatic authorship classification, and many of them concluded that writing style is one of the best indicators of original authorship. Among the hundreds of features that have been developed, syntactic features were found to best reflect an author's writing style. However, because extracting and computing syntactic features is computationally expensive, only simple variations of basic syntactic features, such as function words, part-of-speech (POS) tags, and rewrite rules, have been considered. In this paper, we propose a new feature set of k-embedded-edge subtree patterns that holds more syntactic information than previous feature sets. We also propose a novel approach to mining these patterns directly from a given set of syntactic trees, and we show that this approach reduces the computational burden of using complex syntactic structures as features. Comprehensive experiments on real-world datasets demonstrate that our approach is reliable and more accurate than previous methods.
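To make the "basic syntactic features" mentioned above concrete, the sketch below counts POS tags and rewrite rules in a single parse tree. It is not the paper's k-embedded-edge subtree mining method; it only illustrates the simpler baseline features that the abstract contrasts it with. The nested-tuple tree encoding and the function names `is_preterminal` and `syntactic_features` are hypothetical, and the sketch assumes Penn-Treebank-style trees in which words appear only under POS-tag (preterminal) nodes.

```python
from collections import Counter

# Hypothetical tree encoding: an internal node is (label, [children]);
# a leaf is a plain word string. Words are assumed to occur only as the
# single child of a POS-tag (preterminal) node.


def is_preterminal(node):
    """A POS-tag node dominates exactly one word."""
    label, children = node
    return len(children) == 1 and isinstance(children[0], str)


def syntactic_features(tree):
    """Return (pos_tag_counts, rewrite_rule_counts) for one parse tree.

    Rewrite rules are non-lexical productions such as 'S -> NP VP';
    preterminal-to-word productions are counted as POS tags instead.
    """
    pos_tags, rules = Counter(), Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        label, children = node
        if is_preterminal(node):
            pos_tags[label] += 1
            continue
        # Right-hand side is the sequence of child labels, in order.
        rhs = " ".join(child[0] for child in children)
        rules[f"{label} -> {rhs}"] += 1
        stack.extend(children)
    return pos_tags, rules


# Parse tree for "The cat sat".
tree = ("S", [("NP", [("DT", ["The"]), ("NN", ["cat"])]),
              ("VP", [("VBD", ["sat"])])])
pos_tags, rules = syntactic_features(tree)
print(pos_tags)
print(rules)
```

Summing these counters over every sentence of a document yields a per-author feature vector. Each rewrite rule captures only one parent and its immediate children, which is why larger subtree patterns can carry more syntactic information.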
Year: 2011
DOI: 10.1145/2009916.2009979
Venue: SIGIR
Keywords: authorship classification, syntactic tree, novel approach, syntactic information, previous feature set, feature set, writing style, new feature, syntactic feature, discriminative syntactic tree mining, complex syntactic structure, basic syntactic, computational complexity, text mining
Field: Data mining, Computer science, Part of speech, Feature set, Artificial intelligence, Natural language processing, Syntax, Discriminative model, Tree mining, Information retrieval, Writing style, Tree (data structure), Computational complexity theory
DocType: Conference
Citations: 13
PageRank: 0.59
References: 27
Authors: 5
Order  Name           Citations  PageRank
1      Sangkyum Kim   178        10.54
2      Hyungsul Kim   186        13.18
3      Tim Weninger   576        46.14
4      Jiawei Han     43085      3824.48
5      Hyun Duk Kim   157        8.05