Title
Joint Prediction of Topics in a URL Hierarchy.
Abstract
We study the problem of jointly predicting topics for all web pages within URL hierarchies. We employ a graphical model in which latent variables represent the predominant topic within a subtree of the URL hierarchy. The model is built around a generative process that infers how web site administrators hierarchically structure web sites according to topic, and how web page content is generated depending on the page topic. The resulting predictive model is linear in a joint feature map of content, topic labels, and the latent variables. Inference reduces to message passing in a tree-structured graph; parameter estimation is carried out using concave-convex optimization. We present a case study on web page classification for a targeted advertising application.
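The abstract states that inference reduces to message passing in a tree-structured graph. The following is a minimal sketch of that general technique (max-sum message passing on a tree), not the authors' model: the function name joint_map_topics, the per-page content scores in unary, and the single coupling reward for a child sharing its parent's topic are all assumptions made for illustration. In the paper the scores come from a learned linear model whose parameters are estimated by concave-convex optimization, which is not shown here.

```python
from typing import Dict, List


def joint_map_topics(children: Dict[str, List[str]],
                     unary: Dict[str, List[float]],
                     root: str,
                     n_topics: int,
                     coupling: float = 1.0) -> Dict[str, int]:
    """Max-sum message passing on a tree (illustrative, hypothetical scoring).

    Every node (directory or page) carries one topic variable. A child is
    rewarded by `coupling` when its topic equals its parent's topic.
    """
    def pair(a: int, b: int) -> float:
        return coupling if a == b else 0.0

    # Breadth-first order: every parent appears before its children.
    order = [root]
    i = 0
    while i < len(order):
        order.extend(children.get(order[i], []))
        i += 1

    best = {}     # best[v][t]: best score of v's subtree if v has topic t
    argbest = {}  # argbest[(v, c)][t]: best topic for child c if v has topic t

    # Upward pass: children are processed before their parents.
    for v in reversed(order):
        scores = list(unary.get(v, [0.0] * n_topics))
        for c in children.get(v, []):
            choice = []
            for t in range(n_topics):
                k = max(range(n_topics),
                        key=lambda s: best[c][s] + pair(t, s))
                choice.append(k)
                scores[t] += best[c][k] + pair(t, k)
            argbest[(v, c)] = choice
        best[v] = scores

    # Downward pass: fix the root, then read off each child's best topic.
    topics = {root: max(range(n_topics), key=lambda t: best[root][t])}
    for v in order:
        for c in children.get(v, []):
            topics[c] = argbest[(v, c)][topics[v]]
    return topics


# Hypothetical toy site: topic 0 = sports, topic 1 = finance.
tree = {"/": ["/sports/"],
        "/sports/": ["/sports/a", "/sports/b", "/sports/c"]}
content_scores = {"/sports/a": [2.0, 0.0],
                  "/sports/b": [2.0, 0.0],
                  "/sports/c": [0.0, 0.5]}  # weakly "finance" on content alone
print(joint_map_topics(tree, content_scores, "/", n_topics=2))
```

In this toy example the page /sports/c would be labeled with topic 1 if classified on its content alone, but the joint prediction assigns it topic 0 because its siblings pull the subtree's latent topic toward sports. Setting coupling to 0 removes that effect and recovers independent per-page predictions.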
Year
2014
DOI
10.1007/978-3-662-44848-9_33
Venue
ECML/PKDD (1)
Field
Information retrieval, Web page, Inference, Computer science, Tree (data structure), Latent variable, Rewrite engine, Graphical model, Hierarchy, Message passing
DocType
Conference
Citations
2
PageRank
0.39
References
14
Authors
4
Name                Order  Citations  PageRank
Michael Großhans    1      2          0.39
Christoph Sawade    2      55         6.21
Tobias Scheffer     3      1862       139.64
Niels Landwehr      4      506        31.54