On the Feasibility of Distributed Kernel Regression for Big Data

Paper Info

Title
On the Feasibility of Distributed Kernel Regression for Big Data

Abstract
In Big Data applications, massive datasets with huge numbers of observations are frequently encountered. To deal with such datasets, a divide-and-conquer scheme (e.g., MapReduce) is often used. With such a strategy, a large dataset (e.g., a centralized real database or a virtual database with distributed data sources) is first divided into smaller, manageable segments; the final output is then aggregated from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributive strategy yields valid theoretical inferences about the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR) problem, where algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed to bound the generalization error over the individual outputs, which raises new and challenging issues in the Big Data setting. Using a sample-dependent kernel dictionary, we show that, with proper data segmentation, DKR leads to an estimator that is generalization-consistent for the unknown regression function. This result theoretically justifies DKR and sheds light on more advanced distributive algorithms for processing Big Data. The promising performance of the method is supported by both simulation and real data examples.
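The abstract does not spell out the estimator, so the following is only a minimal sketch of the divide-and-conquer idea under stated assumptions: each segment is fitted independently with kernel ridge regression using a Gaussian kernel whose centers are that segment's own samples (one possible reading of a sample-dependent kernel dictionary), and the local fits are aggregated by simple averaging. The kernel, the bandwidth `width`, the regularization `lam`, the strided split, and the helper names (`fit_local`, `distributed_fit`) are illustrative assumptions, not the authors' method.

```python
import math
import random


def gaussian_kernel(x, z, width=0.2):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an illustrative choice."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_local(xs, ys, lam=1e-3):
    """Hypothetical local step: kernel ridge regression on one segment.

    The segment's own samples serve as the kernel centers, and the
    coefficients solve (K + n * lam * I) alpha = y.
    """
    n = len(xs)
    k = [[gaussian_kernel(xi, xj) for xj in xs] for xi in xs]
    for i in range(n):
        k[i][i] += n * lam  # ridge term on the diagonal
    alpha = solve_linear(k, ys)
    return lambda x: sum(a * gaussian_kernel(xi, x) for a, xi in zip(alpha, xs))


def distributed_fit(xs, ys, num_segments, lam=1e-3):
    """Hypothetical divide-and-conquer step: fit each segment independently,
    then aggregate by averaging the local estimators."""
    segments = [(xs[i::num_segments], ys[i::num_segments]) for i in range(num_segments)]
    local_models = [fit_local(sx, sy, lam) for sx, sy in segments]
    return lambda x: sum(f(x) for f in local_models) / len(local_models)


# Usage: noisy samples of sin(2*pi*x) on [0, 1], split into 4 segments.
random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(400)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.1) for x in xs]
f_bar = distributed_fit(xs, ys, num_segments=4)
print(round(f_bar(0.25), 3))  # the true regression value here is sin(pi/2) = 1
```

Each local fit touches only its own segment, and only the averaged predictor combines them, which is what lets the scheme map onto MapReduce-style processing. How many segments can be used without degrading accuracy is precisely the "proper data segmentation" question the paper studies, so `num_segments=4` here is arbitrary.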

Year: 2016
DOI: 10.1109/TKDE.2016.2594060
Venue: IEEE Trans. Knowl. Data Eng.
Keywords: Big data, Kernel, Distributed databases, Distributed algorithms, Estimation, Data models, Algorithm design and analysis
DocType: Journal
Volume: 28
Issue: 11
ISSN: 1041-4347
Citations: 3
PageRank: 0.56
References: 21
Authors: 3

Authors

Name	Author order	Citations	PageRank
Chen Xu	1	3	0.90
Yongquan Zhang	2	3	0.56
Runze Li	3	112	20.80