Title
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search.
Abstract
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta [Das16]. The main result is that one of the most popular algorithms used in practice, average linkage agglomerative clustering, has a small constant approximation ratio for this objective. Furthermore, this paper establishes that using bisecting k-means divisive clustering has a very poor lower bound on its approximation ratio for the same objective. However, we show that there are divisive algorithms that perform well with respect to this objective by giving two constant approximation algorithms. This paper is some of the first work to establish guarantees on widely used hierarchical algorithms for a natural objective function. This objective and analysis give insight into what these popular algorithms are optimizing and when they will perform well.
Year
Venue
Field
2017
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017)
Hierarchical clustering,k-means clustering,Approximation algorithm,Mathematical optimization,Correlation clustering,Computer science,Upper and lower bounds,Artificial intelligence,Local search (optimization),Cluster analysis,Brown clustering,Machine learning
DocType
Volume
ISSN
Conference
30
1049-5258
Citations 
PageRank 
References 
3
0.40
8
Authors
2
Name
Order
Citations
PageRank
Benjamin Moseley155450.11
Joshua R. Wang2695.83