How to build high quality L2R training data: Unsupervised compression-based selective sampling for learning to rank - Citegraph

Paper Info

Title
How to build high quality L2R training data: Unsupervised compression-based selective sampling for learning to rank

Abstract
Learning to Rank (L2R) improves ranking quality but relies on the existence of manually labeled training sets, which are expensive and cumbersome to generate. Using automated labeling (e.g., clickthrough data) imposes its own challenges. Active learning (AL) can be used to gather high-quality training data by producing very informative yet small training sets. Cover, a method we have previously developed, allows for unsupervised sampling of training sets as good as those created using AL. In this paper we provide an extensive analysis of how and why Cover works. We revisit the method in a more formal way, with theorems and proofs, and provide additional empirical evidence of its practicality. We answer questions related to why Cover works so well and how its properties are related to AL methods. We show how certain characteristics of Cover’s clustering step allow it to more thoroughly explore the feature space by selecting query-document pairs that are representative and diverse, allowing L2R methods to produce effective models. The main novel contribution is a detailed analysis of the method’s inner workings and information-theoretic properties, allowing us to advance the understanding of L2R fundamentals through the lens of training set building.

Year	2022
DOI	10.1016/j.ins.2022.04.012
Venue	Information Sciences
Keywords	Active learning, Learning to rank, Ranking dataset creation, Dataset compression
DocType	Journal
Volume	601
ISSN	0020-0255
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Rodrigo M. Silva	1	0	0.34
Guilherme C. M. Gomes	2	0	0.34
Mario S. Alvim	3	0	0.34
Marcos André Gonçalves	4	2740	191.03