Title
Towards Structure-sensitive Hypertext Categorization
Abstract
Hypertext categorization is the task of automatically assigning category labels to hypertext units. Like text categorization, it falls within the area of function learning based on the bag-of-features approach. This scenario faces the problem of a many-to-many relation between websites and their hidden logical document structure. The paper argues that this relation is a prevalent characteristic that interferes with any effort to apply the classical apparatus of categorization to web genres. This is confirmed by a threefold experiment in hypertext categorization. To outline a solution to this problem, the paper sketches an alternative method of unsupervised learning that aims at bridging the gap between statistical and structural pattern recognition (Bunke et al. 2001) in the area of web mining.
Year: 2005
DOI: 10.1007/3-540-31314-1_49
Venue: Studies in Classification, Data Analysis, and Knowledge Organization
Keywords: pattern recognition, web mining, unsupervised learning
Field: Hypertext, Categorization, Text mining, Web mining, Information retrieval, Computer science, Bridging (networking), Document Structure Description, Function learning, Unsupervised learning
DocType: Conference
ISSN: 1431-8814
Citations: 5
PageRank: 0.43
References: 11
Authors: 3
Name | Order | Citations | PageRank
Alexander Mehler | 1 | 186 | 36.63
Rüdiger Gleim | 2 | 39 | 6.27
Matthias Dehmer | 3 | 863 | 104.05