Abstract | ||
---|---|---|
We consider the problem of estimating occurrence rates of rare events for extremely sparse data, using pre-existing hierarchies to perform inference at multiple resolutions. In particular, we focus on the problem of estimating click rates for (webpage, advertisement) pairs (called impressions) where both the pages and the ads are classified into hierarchies that capture broad contextual information at different levels of granularity. Typically, the click rates are low and the coverage of the hierarchies is sparse. To overcome these difficulties, we devise a sampling method whereby we analyze a specially chosen sample of pages in the training set, and then estimate click rates using a two-stage model. The first stage imputes the number of (webpage, ad) pairs at all resolutions of the hierarchy to adjust for the sampling bias. The second stage estimates click rates at all resolutions after incorporating correlations among sibling nodes through a tree-structured Markov model. Both models are scalable and suited to large-scale data mining applications. On a real-world dataset consisting of 1/2 billion impressions, we demonstrate that even with 95% negative (non-clicked) events in the training set, our method can effectively discriminate extremely rare events in terms of their click propensity. |
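
The idea of estimating click rates at several resolutions of a hierarchy can be illustrated with a small sketch. It is not the paper's two-stage model: it skips the imputation stage and replaces the tree-structured Markov model with a plain shrinkage of each node's rate toward its parent's rate. The function name `rollup_rates`, the `prior_strength` parameter, the single tree (rather than a page hierarchy crossed with an ad hierarchy), and the example counts are all assumptions made for illustration.

```python
from collections import defaultdict


def rollup_rates(leaf_counts, parent, prior_strength=10.0):
    """Estimate a click rate for every node of a hierarchy.

    leaf_counts: {node: (clicks, impressions)} observed at the finest level.
    parent:      {node: parent_node_or_None}; roots map to None.
    prior_strength: hypothetical pseudo-impression count that controls how
                    strongly a sparse node is pulled toward its parent.
    """
    clicks = defaultdict(float)
    imps = defaultdict(float)

    # Add each leaf's counts to the leaf itself and to all of its ancestors.
    for node, (c, n) in leaf_counts.items():
        cur = node
        while cur is not None:
            clicks[cur] += c
            imps[cur] += n
            cur = parent.get(cur)

    def depth(node):
        d = 0
        while parent.get(node) is not None:
            node = parent[node]
            d += 1
        return d

    # Visit nodes top-down so a parent's rate exists before its children need it.
    rates = {}
    for node in sorted(imps, key=depth):
        p = parent.get(node)
        if p is None:
            rates[node] = clicks[node] / imps[node] if imps[node] else 0.0
        else:
            rates[node] = (clicks[node] + prior_strength * rates[p]) / (
                imps[node] + prior_strength
            )
    return rates


# Made-up example: one root, one category, two leaves.
parent = {
    "root": None,
    "sports": "root",
    "sports/golf": "sports",
    "sports/tennis": "sports",
}
leaf_counts = {"sports/golf": (2, 100), "sports/tennis": (0, 10)}
rates = rollup_rates(leaf_counts, parent)
```

In this sketch a leaf with no observed clicks, such as `sports/tennis`, still receives a small positive rate borrowed from its ancestors, which is the general motivation for using coarser resolutions when fine-grained data is sparse.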
Year | DOI | Venue |
---|---|---|
2007 | 10.1145/1281192.1281198 | KDD |
Keywords | Field | DocType |
rare event,estimating rate,training set,sparse data,sampling bias,click propensity,multiple resolution,click rate,sampling method,large scale data mining,internet advertising,tree structure,hierarchy,maximum entropy,sampling methods,markov model,clickthrough rate,imputation | Data mining,Markov model,Computer science,Inference,Sampling bias,Sampling (statistics),Artificial intelligence,Imputation (statistics),Principle of maximum entropy,Machine learning,Rare events,Sparse matrix | Conference |
Citations | PageRank | References |
31 | 3.44 | 4 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Deepak Agarwal | 1 | 1391 | 83.44 |
Andrei Broder | 2 | 7357 | 920.20 |
Deepayan Chakrabarti | 3 | 2624 | 175.06 |
Dejan Diklic | 4 | 45 | 4.95 |
Vanja Josifovski | 5 | 2265 | 148.84 |
Mayssam Sayyadian | 6 | 162 | 12.33 |