Micro-Clustering By Data Polishing - Citegraph

Paper Info

Title
Micro-Clustering By Data Polishing

Abstract
We address the problem of unsupervised soft clustering, which we call micro-clustering. The aim is to enumerate all groups composed of records that are strongly related to each other, whereas standard clustering methods look for boundaries where records are sparse. Existing methods have several weak points: they generate intractable numbers of clusters, produce biased cluster-size distributions, lack robustness, and so on. We propose a new methodology called data polishing. Data polishing clarifies the cluster structure in the data by perturbing the data according to a feasible hypothesis. More precisely, for graph clustering problems, data polishing replaces dense subgraphs that would correspond to clusters with cliques, and deletes edges not included in any dense subgraph. The clusters then appear as maximal cliques, so they are easy to find, and the number of maximal cliques is reduced to a tractable number. We also propose an efficient algorithm that completes the computation in a few minutes even for large-scale data. Computational experiments demonstrate the efficiency of our formulation and algorithm: the number of solutions is small (for example, 1,000), the members of each group are deeply related, and the computation time is short.
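The abstract does not state the exact test used to decide which edges a polished graph keeps. The sketch below assumes one plausible choice: two vertices are connected if and only if the Jaccard similarity of their closed neighborhoods is at least a threshold `theta`, and this rewiring is repeated until the edge set stops changing. The names `polish`, `theta`, `max_iter`, and `min_size` are hypothetical, this is not the authors' code, and the paper's actual similarity measure and its efficient algorithm may differ. Maximal cliques are then enumerated with the standard Bron-Kerbosch algorithm with pivoting.

```python
from itertools import combinations


def closed_neighborhoods(nodes, edges):
    """Map each vertex v to its closed neighborhood N[v] = {v} plus its neighbors."""
    nbr = {v: {v} for v in nodes}
    for u, v in edges:
        nbr[u].add(v)
        nbr[v].add(u)
    return nbr


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def polish(nodes, edges, theta=0.5, max_iter=50):
    """Hypothetical polishing loop (assumed, not the paper's exact rule):
    connect u and v iff jaccard(N[u], N[v]) >= theta, and repeat until the
    edge set stops changing or max_iter rounds have run."""
    nodes = list(nodes)
    current = {frozenset(e) for e in edges}
    for _ in range(max_iter):
        nbr = closed_neighborhoods(nodes, current)
        updated = {
            frozenset((u, v))
            for u, v in combinations(nodes, 2)
            if jaccard(nbr[u], nbr[v]) >= theta
        }
        if updated == current:  # fixed point reached
            break
        current = updated
    return current


def maximal_cliques(nodes, edges, min_size=2):
    """Bron-Kerbosch with pivoting; returns maximal cliques of size >= min_size."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []

    def expand(r, p, x):
        if not p and not x:  # r cannot be extended: it is maximal
            if len(r) >= min_size:
                found.append(r)
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return found


if __name__ == "__main__":
    nodes = range(8)
    # Near-clique {0,1,2,3} (edge 0-3 missing), clique {4,5,6,7}, bridge 3-4.
    edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
             (3, 4)]
    print(sorted(sorted(c) for c in maximal_cliques(nodes, edges)))
    # [[0, 1, 2], [1, 2, 3], [3, 4], [4, 5, 6, 7]]
    polished = polish(nodes, edges)
    print(sorted(sorted(c) for c in maximal_cliques(nodes, polished)))
    # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

On this toy graph the raw input has four overlapping maximal cliques. After polishing, the bridge edge 3-4 is deleted (its endpoints share little of their neighborhoods) and the missing edge 0-3 is added, so each dense group becomes exactly one maximal clique. The pairwise scan is quadratic in the number of vertices and is chosen here only for clarity; it does not reflect the efficiency claims made for the paper's algorithm.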

Year: 2017
Venue: 2017 IEEE International Conference on Big Data (Big Data)
Keywords: clustering, data cleaning, pattern mining, algorithm
Field: Cluster (physics), Data mining, Polishing, Computer science, Theoretical computer science, Robustness (computer science), Cluster analysis, Clustering coefficient, Big data, Computation
DocType: Conference
ISSN: 2639-1589
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors


Name	Order	Citations	PageRank
Takeaki Uno	1	1319	107.99
Hiroki Maegawa	2	0	0.34
Takanobu Nakahara	3	0	0.34
Yukinobu Hamuro	4	43	7.76
Ryo Yoshinaka	5	172	26.19
Makoto Tatsuta	6	111	22.36
