Title
Scalability of Privacy-Preserving Linear Regression in Epidemiological Studies
Abstract
In many hospitals, patient data are observed and collected into a central database for medical research. For instance, the DPC (Diagnosis Procedure Combination) dataset covers the medical records of more than 7 million patients in more than 1,000 hospitals. Using the distributed DPC dataset, a number of epidemiological studies could reveal useful knowledge about medical treatments. Because these records contain sensitive personal information, cryptography can help preserve the privacy of the data. The field known as Privacy-Preserving Data Mining (PPDM) aims to run data-mining algorithms while preserving the confidentiality of the datasets. This paper studies the scalability of privacy-preserving data mining in epidemiological studies. As the data-mining algorithm, we focus on linear regression, since it is used in many applications and is simple to evaluate. We aim to identify a linear model that estimates the length of hospital stay from distributed datasets containing patient and disease information. The contributions of this paper are (1) privacy-preserving protocols for linear regression over horizontally or vertically partitioned datasets, and (2) a clarification of the limits on the problem size that can be handled. This information is useful for identifying the dominant cost factor in PPDM and for guiding further improvements.
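The abstract does not describe the protocols themselves. Purely as an illustration, the following is a minimal Python sketch of one common approach to the horizontally partitioned case, not necessarily the authors' protocol: each hospital computes its local sufficient statistics X_i^T X_i and X_i^T y_i, the hospitals combine them by additive secret sharing, and the normal equations (sum_i X_i^T X_i) beta = sum_i X_i^T y_i are solved. The modulus `PRIME`, the fixed-point factor `SCALE`, and every function name here are hypothetical choices for this sketch. In this simplified version the aggregate statistics are revealed to all parties, and the vertically partitioned case is not covered, since cross-party products such as X_A^T X_B cannot be computed locally by either hospital.

```python
import secrets
import numpy as np

PRIME = 2**61 - 1  # hypothetical modulus for additive secret sharing
SCALE = 10**6      # hypothetical fixed-point scale for encoding real numbers


def encode(x: float) -> int:
    """Fixed-point encode a real number as an element of Z_PRIME."""
    return int(round(float(x) * SCALE)) % PRIME


def decode(v: int) -> float:
    """Inverse of encode; values above PRIME // 2 represent negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(value: int, n: int) -> list[int]:
    """Split value into n random shares that sum to value mod PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % PRIME)
    return parts


def local_stats(X: np.ndarray, y: np.ndarray) -> list[float]:
    """One hospital's flattened sufficient statistics X^T X and X^T y."""
    return list((X.T @ X).ravel()) + list(X.T @ y)


def secure_regression(parties: list[tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Fit linear regression over horizontally partitioned (X_i, y_i)."""
    n = len(parties)
    d = parties[0][0].shape[1]
    m = d * d + d
    # all_shares[i][k][j]: share j of statistic k held by party i.
    all_shares = [[share(encode(v), n) for v in local_stats(X, y)]
                  for X, y in parties]
    # Party j sums the shares it received; each share alone is uniform noise.
    partial = [[sum(all_shares[i][k][j] for i in range(n)) % PRIME
                for k in range(m)] for j in range(n)]
    # Publishing the partial sums reveals only the aggregated statistics.
    totals = [decode(sum(partial[j][k] for j in range(n)) % PRIME)
              for k in range(m)]
    A = np.array(totals[: d * d]).reshape(d, d)
    b = np.array(totals[d * d:])
    return np.linalg.solve(A, b)
```

For hospitals holding rows of the same columns (for example, an intercept plus patient attributes, with length of stay as y), `secure_regression([(X1, y1), (X2, y2), (X3, y3)])` should agree with ordinary least squares on the pooled data up to fixed-point rounding. The shares are drawn with `secrets` rather than `random` because the masking values must be unpredictable.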
Year
2015
DOI
10.1109/AINA.2015.229
Venue
AINA
Keywords
privacy, DPC, data mining
Field
Data mining, Confidentiality, Linear model, Cryptography, Computer science, Central database, Medical record, Medical research, Scalability, Linear regression
DocType
Conference
ISSN
1550-445X
Citations
0
PageRank
0.34
References
7
Authors
4
Name, Order
Hiroaki Kikuchi, 1
H. Hashimoto, 2
Hideo Yasunaga, 3
Takamichi Saito, 4