Title
Structure-sensitive learning of text types
Abstract
In this paper, we discuss the classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method using predefined structural features as well as four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles, and on the well-known SUSANNE corpus. We demonstrate that, for both corpora, many text types can be learned from structural features alone.
Year
2007
DOI
10.1007/978-3-540-76928-6_68
Venue
Australian Conference on Artificial Intelligence
Keywords
structural feature, logical document structure, dom tree, text type, newspaper article, well-known susanne corpus, predefined structural feature, document structure
Field
Parse tree, Computer science, Document Structure Description, Text types, Tree kernel, Natural language processing, Artificial intelligence, Kernel (image processing)
DocType
Conference
Volume
4830
ISSN
0302-9743
ISBN
3-540-76926-9
Citations
3
PageRank
0.46
References
5
Authors
6
Name                  Order  Citations  PageRank
Peter Geibel          1      286        26.62
Ulf Krumnack          2      87         13.03
Olga Pustylnikov      3      14         2.48
Alexander Mehler      4      186        36.63
Helmar Gust           5      143        22.86
Kai-uwe Kühnberger    6      211        28.67