Title
Towards a privacy preserving cohort discovery framework for clinical research networks.
Abstract
We design a privacy-preserving cohort discovery framework for distributed networks. We show how real-world cohort specifications can be translated within the framework. Cohort queries can be executed in a timely manner on a database of 7 million records. A parallelized design with efficient block indexing improves the query-response time.

Background: The last few years have witnessed an increasing number of clinical research networks (CRNs) focused on building large collections of data from electronic health records (EHRs), claims, and patient-reported outcomes (PROs). Many of these CRNs provide a service for the discovery of research cohorts with various health conditions, which is especially useful for rare diseases. Supporting patient privacy can enhance the scalability and efficiency of such processes; however, current practice mainly relies on policy, such as the guidelines defined in the Health Insurance Portability and Accountability Act (HIPAA), which are insufficient for CRNs (e.g., HIPAA does not require encryption of data, which can mitigate insider threats). By combining policy with privacy-enhancing technologies, we can enhance the trustworthiness of CRNs. The goal of this research is to determine whether searchable encryption can instill privacy in CRNs without sacrificing their usability.

Methods: We developed a technique, implemented in working software, to enable privacy-preserving cohort discovery (PPCD) services in large distributed CRNs based on elliptic curve cryptography (ECC). This technique also incorporates a block indexing strategy to improve the performance (in terms of computational running time) of PPCD. We evaluated the PPCD service with three real cohort definitions of varied query complexity: (1) elderly cervical cancer patients who underwent radical hysterectomy, (2) oropharyngeal and tongue cancer patients who underwent robotic transoral surgery, and (3) female breast cancer patients who underwent mastectomy. These definitions were tested in an encrypted database of 7.1 million records derived from the publicly available Healthcare Cost and Utilization Project (HCUP) Nationwide Inpatient Sample (NIS). We assessed the performance of the PPCD service in terms of (1) accuracy in cohort discovery, (2) computational running time, and (3) privacy afforded to the underlying records during PPCD.

Results: The empirical results indicate that the proposed PPCD can execute cohort discovery queries in a reasonable amount of time, with query runtimes in the range of 165 to 262 s for the three use cases and no loss of accuracy. We further show that the search performance is practical because it supports a highly parallelized design for secure evaluation over encrypted records. Additionally, our security analysis shows that the proposed construction is resilient to standard adversaries.

Conclusions: PPCD services can be designed for clinical research networks. The security construction presented in this work achieves high privacy guarantees by preventing threats originating both from within and from beyond the network.
Year
2017
DOI
10.1016/j.jbi.2016.12.008
Venue
Journal of Biomedical Informatics
Keywords
Clinical research network (CRN), Data privacy, OneFlorida Clinical Data Research Network (CDRN), Patient-Centered Clinical Research Network (PCORnet), Privacy-preserving cohort discovery, Searchable encryption
Field
Data mining, Health Insurance Portability and Accountability Act, Computer security, Computer science, Usability, Search engine indexing, Encryption, Security analysis, Privacy-enhancing technologies, Information privacy, Scalability
DocType
Journal
Volume
66
Issue
C
ISSN
1532-0464
Citations
0
PageRank
0.34
References
11
Authors
7
Name
Order
Citations
PageRank
Jiawei Yuan131018.11
Bradley Malin2728.24
François Modave34510.01
Yi Guo41210.16
William Hogan510.69
Elizabeth Shenkman643.87
Jiang Bian715043.09