Title
A Test Collection for Relevance and Sensitivity
Abstract
Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and sensitivity. This paper describes the development of such a test collection that is based on the Avocado Research Email Collection. Four people created search topics as a basis for assessing relevance, and two personas describing the sensitivities of representative (but fictional) content creators were created as a basis for assessing sensitivity. These personas were based on interviews with potential donors of historically significant email collections and with archivists who currently manage access to such collections. Two annotators then created relevance and sensitivity judgments for 65 topics, divided approximately equally between the two personas. Annotator agreement statistics indicate fairly good external reliability for both relevance and sensitivity annotations, and a baseline sensitivity classifier trained and evaluated using cross-validation achieved better than 80% $F_1$, suggesting that the resulting collection will likely be useful as a basis for comparing alternative retrieval systems that seek to balance relevance and sensitivity.
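The abstract reports a baseline sensitivity classifier evaluated with cross-validation at better than 80% $F_1$. The paper's actual features and model are not described in this record, so the following stdlib-only sketch uses invented example documents and a hypothetical keyword-cue classifier purely to illustrate how per-fold $F_1$ is computed and averaged under k-fold cross-validation:

```python
from statistics import mean

# Hypothetical toy data: (document text, is_sensitive label). The real
# collection judges Avocado emails against donor personas; these lines
# are invented stand-ins, ordered so every fold below gets both classes.
DOCS = [
    ("salary negotiation details", 1),
    ("medical appointment tomorrow", 1),
    ("confidential performance review", 1),
    ("private family matter update", 1),
    ("quarterly sales report attached", 0),
    ("meeting agenda for monday", 0),
    ("lunch plans this week", 0),
    ("project timeline revision", 0),
]

# Hypothetical cue terms; a real baseline would learn features from text.
SENSITIVE_TERMS = {"salary", "medical", "confidential", "private"}

def predict(text):
    """Flag a document as sensitive if it contains any cue term."""
    return int(any(t in text.split() for t in SENSITIVE_TERMS))

def f1(pairs):
    """F1 over (gold, predicted) label pairs."""
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def cross_validate(docs, k=4):
    """k-fold cross-validation: score each held-out fold, average F1."""
    scores = []
    for i in range(k):
        fold = docs[i::k]  # every k-th document forms one held-out fold
        scores.append(f1([(label, predict(text)) for text, label in fold]))
    return mean(scores)

print(cross_validate(DOCS))
```

$F_1$ (rather than accuracy) is a sensible choice here because sensitive documents are typically a minority class, where a trivial "never sensitive" classifier would score high accuracy while protecting nothing.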
Year
2020
DOI
10.1145/3397271.3401284
Venue
SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020
DocType
Conference
ISBN
978-1-4503-8016-4
Citations
0
PageRank
0.34
References
0
Authors
7
Name                     Order  Citations  PageRank
Mahmoud F. Sayed         1      25         4.07
William Cox              2      12         2.71
Jonah Lynn Rivera        3      0          0.34
Caitlin Christian-Lamb   4      0          0.34
Modassir Iqbal           5      0          0.34
Douglas W. Oard          6      2484       246.11
Katie Shilton            7      763        51.86