Title | ||
---|---|---|
Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality. |
Abstract | ||
---|---|---|
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the m machines receives n data points from a d-dimensional Gaussian distribution with unknown mean θ which is promised to be k-sparse. The machines communicate by message passing and aim to estimate the mean θ. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed sparse linear regression problem: to achieve the statistical minimax error, the total communication is at least Ω(min{n,d}m), where n is the number of observations that each machine receives and d is the ambient dimension. These lower results improve upon Shamir (NIPS'14) and Steinhardt-Duchi (COLT'15) by allowing multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a distributed data processing inequality, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1145/2897518.2897582 | STOC |
Keywords | DocType | Volume |
Communication complexity,Information complexity,statistical estimation | Conference | abs/1506.07216 |
ISSN | Citations | PageRank |
0737-8017 | 19 | 0.77 |
References | Authors | |
13 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mark Braverman | 1 | 810 | 61.60 |
Ankit Garg | 2 | 125 | 16.19 |
Tengyu Ma | 3 | 596 | 41.23 |
Huy L. Nguyen | 4 | 376 | 32.33 |
David P. Woodruff | 5 | 2156 | 142.38 |