Distributed Subdata Selection For Big Data Via Sampling-Based Approach - Citegraph

Paper Info

Title
Distributed Subdata Selection For Big Data Via Sampling-Based Approach

Abstract
With the development of modern technologies, it is possible to gather an extraordinarily large number of observations. Due to the storage or transmission burden, big data are usually scattered across multiple locations, and it is difficult to transfer all of the data to a central server for analysis. A distributed subdata selection method for big data linear regression models is proposed. In particular, a two-step subsampling strategy with optimal subsampling probabilities and optimal allocation sizes is developed. The subsample-based estimator effectively approximates the ordinary least squares estimator from the full data. The convergence rate and asymptotic normality of the proposed estimator are established. Simulation studies and an illustrative example using airline data are provided to assess the performance of the proposed method. (C) 2020 Elsevier B.V. All rights reserved.
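
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the optimal probabilities or allocation sizes, so the score `|residual| * ||x||`, the proportional allocation rule, and the names `distributed_subsample_ols`, `r_pilot`, and `r_total` are all assumptions made for illustration.

```python
import numpy as np

def distributed_subsample_ols(blocks, r_pilot, r_total, rng):
    """Two-step subsampling sketch; blocks is a list of (X_k, y_k) held at K locations."""
    # Step 1: a small uniform pilot subsample from each location gives a rough estimate.
    Xp, yp = [], []
    for X, y in blocks:
        idx = rng.choice(len(y), size=min(r_pilot, len(y)), replace=False)
        Xp.append(X[idx])
        yp.append(y[idx])
    beta_pilot, *_ = np.linalg.lstsq(np.vstack(Xp), np.concatenate(yp), rcond=None)

    # Assumed score per observation: |pilot residual| * ||x|| (not taken from the paper).
    scores = [np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1) + 1e-12
              for X, y in blocks]
    totals = np.array([s.sum() for s in scores])
    # Assumed allocation rule: subsample sizes roughly proportional to each block's total score.
    sizes = np.maximum(1, np.round(r_total * totals / totals.sum()).astype(int))

    # Step 2: each location samples with replacement and ships only its subsample and weights.
    Xs, ys, ws = [], [], []
    for (X, y), s, r_k in zip(blocks, scores, sizes):
        p = s / s.sum()
        idx = rng.choice(len(y), size=r_k, replace=True, p=p)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r_k * p[idx]))  # inverse-probability weights
    Xs, ys, w = np.vstack(Xs), np.concatenate(ys), np.concatenate(ws)

    # Weighted least squares on the pooled subsample approximates full-data OLS.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xs * sw[:, None], ys * sw, rcond=None)
    return beta
```

Only the pilot estimate, the block score totals, and the selected rows need to travel between locations and the central server, which is the point of selecting subdata in a distributed setting.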

Field	Value
Year	2021
DOI	10.1016/j.csda.2020.107072
Venue	COMPUTATIONAL STATISTICS & DATA ANALYSIS
Keywords	Allocation sizes, Big data, Distributed subsampling, Optimal subsampling, Regression diagnostic
DocType	Journal
Volume	153
ISSN	0167-9473
Citations	1
PageRank	0.35
References	0
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Haixiang Zhang	1	64	12.19
haiying	2	4	3.72
