Title
Detecting Family Resemblance: Automated Genre Classification
Abstract
This paper presents results on automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the roles of visual layout, stylistic features, and language model features in clustering documents, and presents results on retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of documents drawn from the nineteen most popular genres found in our experimental data set.
Year
2007
DOI
10.2481/dsj.6.S172
Venue
Data Science Journal
Keywords
automated genre classification, information management, scientific information, information extraction, metadata
DocType
Journal
Volume
6
Citations
4
PageRank
0.53
References
14
Authors
2
Name | Order | Citations | PageRank
Yunhyong Kim | 1 | 89 | 8.98
Seamus Ross | 2 | 38 | 8.53