Abstract | ||
---|---|---|
We study the problem of jointly predicting topics for all web pages within URL hierarchies. We employ a graphical model in which latent variables represent the predominant topic within a subtree of the URL hierarchy. The model is built around a generative process that infers how web site administrators structure web sites hierarchically according to topic, and how web page content is generated depending on the page topic. The resulting predictive model is linear in a joint feature map of content, topic labels, and the latent variables. Inference reduces to message passing in a tree-structured graph; parameter estimation is carried out using concave-convex optimization. We present a case study on web page classification for a targeted advertising application. |
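
The abstract's remark that inference reduces to message passing in a tree-structured graph can be illustrated with a minimal max-sum sketch. This is not the paper's model: the function name `map_topics`, the per-node score lists in `unary`, and the single `coupling` bonus for a parent and child sharing a topic are all assumptions made for illustration only.

```python
from typing import Dict, List


def map_topics(children: Dict[int, List[int]],
               unary: Dict[int, List[float]],
               coupling: float,
               root: int = 0) -> Dict[int, int]:
    """Max-sum message passing on a rooted tree (illustrative sketch).

    children: node -> list of child nodes (hypothetical URL hierarchy)
    unary:    node -> per-topic score, e.g. from page content (assumed given)
    coupling: bonus added when a child has the same topic as its parent
    Returns the jointly highest-scoring topic index for every node.
    """
    k = len(unary[root])
    msg: Dict[int, List[float]] = {}  # msg[c][tp]: best subtree score of c given parent topic tp
    arg: Dict[int, List[int]] = {}    # arg[c][tp]: child topic achieving msg[c][tp]

    def belief(v: int) -> List[float]:
        # Score of the subtree rooted at v, for each topic of v.
        return [unary[v][t] + sum(msg[c][t] for c in children.get(v, []))
                for t in range(k)]

    def up(v: int) -> None:
        # Upward pass: leaves send messages first, then their parents.
        for c in children.get(v, []):
            up(c)
            b = belief(c)
            msg[c], arg[c] = [], []
            for tp in range(k):
                scores = [b[tc] + (coupling if tc == tp else 0.0)
                          for tc in range(k)]
                best = max(range(k), key=scores.__getitem__)
                msg[c].append(scores[best])
                arg[c].append(best)

    def down(v: int, topics: Dict[int, int]) -> None:
        # Downward pass: read off each child's best topic given its parent's.
        for c in children.get(v, []):
            topics[c] = arg[c][topics[v]]
            down(c, topics)

    up(root)
    b = belief(root)
    topics = {root: max(range(k), key=b.__getitem__)}
    down(root, topics)
    return topics


# Toy hierarchy: root 0 with two pages 1 and 2, and two topics.
children = {0: [1, 2]}
unary = {0: [0.0, 0.0], 1: [2.0, 0.0], 2: [0.0, 0.5]}
print(map_topics(children, unary, coupling=1.0))  # {0: 0, 1: 0, 2: 0}
print(map_topics(children, unary, coupling=0.0))  # {0: 0, 1: 0, 2: 1}
```

In the toy example, page 2 has weak content evidence for topic 1. With the coupling switched on, the joint prediction assigns it topic 0 to agree with the rest of the subtree; with the coupling at zero, each page follows its own content score.
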
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-662-44848-9_33 | ECML/PKDD (1) |
Field | DocType | Citations |
Information retrieval,Web page,Inference,Computer science,Tree (data structure),Latent variable,Rewrite engine,Graphical model,Hierarchy,Message passing | Conference | 2 |
PageRank | References | Authors |
0.39 | 14 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michael Großhans | 1 | 2 | 0.39 |
Christoph Sawade | 2 | 55 | 6.21 |
Tobias Scheffer | 3 | 1862 | 139.64 |
Niels Landwehr | 4 | 506 | 31.54 |