Title
Variants of tree kernels for XML documents
Abstract
In this paper, we discuss tree kernels that can be applied to the classification of XML documents based on their DOM trees. DOM trees are ordered trees in which every node may be labeled by a vector of attributes, including its XML tag and its textual content. We describe four new kernels suitable for this kind of tree: a tree kernel derived from the well-known parse tree kernel; the set tree kernel, which allows permutations of children; the string tree kernel, an extension of the so-called partial tree kernel; and the soft tree kernel, which is based on the set tree kernel and takes into account a "fuzzy" comparison of child positions. We present first results on an artificial data set; a corpus of newspaper articles, for which we want to determine the type (genre) of an article based on its structure alone; and the well-known SUSANNE corpus.
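The abstract names the parse tree kernel as the starting point for the first of the four kernels. As background only, the following is a minimal sketch of the standard Collins-Duffy style parse tree kernel on ordered labeled trees. It is not the paper's own kernels. The `Node` class, the function names, and the decay parameter `lam` are hypothetical names chosen for illustration, and the attribute vector of a DOM node is reduced here to a single label string.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Node:
    """Ordered labeled tree node (hypothetical stand-in for a DOM node)."""
    label: str
    children: tuple = ()


def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def production(node):
    """A node's label together with the ordered labels of its children."""
    return (node.label, tuple(c.label for c in node.children))


def common_subtrees(n1, n2, lam=1.0):
    """Weighted count of common tree fragments rooted at n1 and n2."""
    if not n1.children or not n2.children:
        return 0.0  # leaves carry no production
    if production(n1) != production(n2):
        return 0.0  # child order matters: different productions never match
    result = lam
    for c1, c2 in zip(n1.children, n2.children):
        result *= 1.0 + common_subtrees(c1, c2, lam)
    return result


def parse_tree_kernel(t1, t2, lam=1.0):
    """Sum the fragment counts over all pairs of nodes."""
    return sum(common_subtrees(a, b, lam) for a, b in product(nodes(t1), nodes(t2)))


# Two toy documents that differ only in the order of the children.
doc = Node("article", (Node("title", (Node("#text"),)),
                       Node("para", (Node("#text"),))))
doc_swapped = Node("article", (Node("para", (Node("#text"),)),
                               Node("title", (Node("#text"),))))

print(parse_tree_kernel(doc, doc))          # 6.0
print(parse_tree_kernel(doc, doc_swapped))  # 2.0
```

Swapping the two children drops the kernel value from 6.0 to 2.0, because the root productions no longer match. This order sensitivity is the behavior that a kernel allowing permutations of children, such as the set tree kernel mentioned in the abstract, is meant to relax.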
Year
2007
DOI
10.1007/978-3-540-76631-5_81
Venue
MICAI
Keywords
string tree kernel, set tree kernel, dom tree, soft tree kernel, xml tag, so-called partial tree kernel, tree kernel, new kernel, xml document, well-known parse tree kernel
Field
Tree traversal, Pattern recognition, Computer science, K-ary tree, Binary tree, Tree kernel, Artificial intelligence, Tree structure, String kernel, Search tree, Interval tree
DocType
Conference
Volume
4827
ISSN
0302-9743
ISBN
3-540-76630-8
Citations
0
PageRank
0.34
References
12
Authors
3
Name, Order, Citations, PageRank
Peter Geibel, 1, 286, 26.62
Helmar Gust, 2, 143, 22.86
Kai-Uwe Kühnberger, 3, 211, 28.67