Abstract | ||
---|---|---|
With the development of modern technologies, it is possible to gather an extraordinarily large number of observations. Due to storage or transmission burdens, big data are usually scattered across multiple locations, and it is difficult to transfer all of the data to a central server for analysis. A distributed subdata selection method for big data linear regression models is proposed. In particular, a two-step subsampling strategy with optimal subsampling probabilities and optimal allocation sizes is developed. The subsample-based estimator effectively approximates the ordinary least squares estimator from the full data. The convergence rate and asymptotic normality of the proposed estimator are established. Simulation studies and an illustrative example using airline data are provided to assess the performance of the proposed method. (C) 2020 Elsevier B.V. All rights reserved. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.csda.2020.107072 | Computational Statistics & Data Analysis |
Keywords | DocType | Volume |
---|---|---|
Allocation sizes, Big data, Distributed subsampling, Optimal subsampling, Regression diagnostic | Journal | 153 |
ISSN | Citations | PageRank |
---|---|---|
0167-9473 | 1 | 0.35 |
References | Authors |
---|---|
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Haixiang Zhang | 1 | 64 | 12.19 |
haiying | 2 | 4 | 3.72 |