Title
Reuse-centric k-means configuration
Abstract
K-means configuration is the task of finding a configuration of k-means (e.g., the number of clusters or the feature set) that maximizes some objective. It is time-consuming because k-means is iterative. This paper proposes reuse-centric k-means configuration to accelerate the process. It is based on the observation that the explorations of different configurations share many common or similar computations. Effectively reusing the computations from prior trials of different configurations can substantially shorten the configuration time. To materialize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design, to capitalize on the reuse opportunities on three levels: validation, number of clusters, and feature sets. Experiments on k-means–based data classification tasks show that reuse-centric k-means configuration can speed up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper also provides insights on how to apply the acceleration techniques effectively to realize their full potential.
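As a rough illustration of reuse at the "number of clusters" level, the sketch below warm-starts each k-means run from the converged centers of the previous, smaller k instead of starting from scratch. It is not the paper's algorithm: the function names (`lloyd`, `grow_centers`, `sweep_k`) are hypothetical, and seeding the extra center with the point farthest from the existing centers is an assumption made only for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd's iterations from the given initial centers.

    Returns (converged_centers, iterations_used).
    """
    centers = [tuple(c) for c in centers]
    for it in range(1, max_iter + 1):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            return centers, it
        centers = new_centers
    return centers, max_iter


def grow_centers(points, centers):
    """Reuse existing centers and add one seed: the point farthest from them.

    Assumption for this sketch only, not taken from the paper.
    """
    farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
    return list(centers) + [tuple(farthest)]


def sweep_k(points, k_values, seed=0):
    """Try several k in ascending order, reusing centers between trials."""
    rng = random.Random(seed)
    results = {}
    centers = None
    for k in sorted(k_values):
        if centers is None:
            init = rng.sample(points, k)  # first trial: random data points
        else:
            init = list(centers)  # later trials: start from prior centers
            while len(init) < k:
                init = grow_centers(points, init)
        centers, iters = lloyd(points, init)
        results[k] = (centers, iters)
    return results
```

Calling `sweep_k(points, [2, 3, 4])` returns, for each k, the converged centers and the number of Lloyd iterations used. Comparing those iteration counts against cold-started runs is one way to gauge how much the reuse helps on a given dataset.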
Year
2021
DOI
10.1016/j.is.2021.101787
Venue
Information Systems
Keywords
K-means, Algorithm configuration, Computation reuse
DocType
Journal
Volume
100
ISSN
0306-4379
Citations
0
PageRank
0.34
References
0
Authors
5
Name          Order  Citations  PageRank
Lijun Zhang   1      0          0.34
Hui Guan      2      0          0.34
Yufei Ding    3      143        23.07
Xipeng Shen   4      2025       118.55
H. Krim       5      594        126.35