Title
A Test Collection for Relevance and Sensitivity
Abstract
Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and sensitivity. This paper describes the development of such a test collection that is based on the Avocado Research Email Collection. Four people created search topics as a basis for assessing relevance, and two personas describing the sensitivities of representative (but fictional) content creators were created as a basis for assessing sensitivity. These personas were based on interviews with potential donors of historically significant email collections and with archivists who currently manage access to such collections. Two annotators then created relevance and sensitivity judgments for 65 topics, divided approximately equally between the two personas. Annotator agreement statistics indicate fairly good external reliability for both relevance and sensitivity annotations, and a baseline sensitivity classifier trained and evaluated using cross-validation achieved better than 80% $F_1$, suggesting that the resulting collection will likely be useful as a basis for comparing alternative retrieval systems that seek to balance relevance and sensitivity.
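The abstract reports a baseline sensitivity classifier evaluated with cross-validation at better than 80% $F_1$. The paper's actual features and model are not described in this record, so the following stdlib-only sketch uses invented example documents and a hypothetical keyword-cue classifier purely to illustrate how per-fold $F_1$ is computed and averaged under k-fold cross-validation:

```python
from statistics import mean

# Hypothetical toy data: (document text, is_sensitive label). The real
# collection judges Avocado emails against donor personas; these lines
# are invented stand-ins, ordered so every fold below gets both classes.
DOCS = [
    ("salary negotiation details", 1),
    ("medical appointment tomorrow", 1),
    ("confidential performance review", 1),
    ("private family matter update", 1),
    ("quarterly sales report attached", 0),
    ("meeting agenda for monday", 0),
    ("lunch plans this week", 0),
    ("project timeline revision", 0),
]

# Hypothetical cue terms; a real baseline would learn features from text.
SENSITIVE_TERMS = {"salary", "medical", "confidential", "private"}

def predict(text):
    """Flag a document as sensitive if it contains any cue term."""
    return int(any(t in text.split() for t in SENSITIVE_TERMS))

def f1(pairs):
    """F1 over (gold, predicted) label pairs."""
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def cross_validate(docs, k=4):
    """k-fold cross-validation: score each held-out fold, average F1."""
    scores = []
    for i in range(k):
        fold = docs[i::k]  # every k-th document forms one held-out fold
        scores.append(f1([(label, predict(text)) for text, label in fold]))
    return mean(scores)

print(cross_validate(DOCS))
```

$F_1$ (rather than accuracy) is a sensible choice here because sensitive documents are typically a minority class, where a trivial "never sensitive" classifier would score high accuracy while protecting nothing.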
Year
2020
DOI
10.1145/3397271.3401284
Venue
SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020
DocType
Conference
ISBN
978-1-4503-8016-4
Citations
0
PageRank
0.34
References
0
Authors
7
Name                     Order  Citations  PageRank
Mahmoud F. Sayed         1      25         4.07
William Cox              2      12         2.71
Jonah Lynn Rivera        3      0          0.34
Caitlin Christian-Lamb   4      0          0.34
Modassir Iqbal           5      0          0.34
Douglas W. Oard          6      2484       246.11
Katie Shilton            7      763        51.86