Abstract |
---|
In this paper, we discuss the structure-based classification of documents based on their logical document structure, i.e., their DOM trees. We describe a method using predefined structural features and also four tree kernels suitable for such structures. We evaluate the methods experimentally on a corpus containing the DOM trees of newspaper articles, and on the well-known SUSANNE corpus. We will demonstrate that, for the two corpora, many text types can be learned based on structural features only. |
Year | DOI | Venue |
---|---|---|
2007 | 10.1007/978-3-540-76928-6_68 | Australian Conference on Artificial Intelligence |
Keywords | Field | DocType |
structural feature,logical document structure,dom tree,text type,newspaper article,well-known susanne corpus,predefined structural feature,document structure | Parse tree,Computer science,Document Structure Description,Text types,Tree kernel,Natural language processing,Artificial intelligence,Kernel (image processing) | Conference |
Volume | ISSN | ISBN |
4830 | 0302-9743 | 3-540-76926-9 |
Citations | PageRank | References |
3 | 0.46 | 5 |
Authors |
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peter Geibel | 1 | 286 | 26.62 |
Ulf Krumnack | 2 | 87 | 13.03 |
Olga Pustylnikov | 3 | 14 | 2.48 |
Alexander Mehler | 4 | 186 | 36.63 |
Helmar Gust | 5 | 143 | 22.86 |
Kai-Uwe Kühnberger | 6 | 211 | 28.67 |