Title
Scaling the Growing Neural Gas for Visual Cluster Analysis
Abstract
The growing neural gas (GNG) is an unsupervised topology learning algorithm that models a data space through interconnected units placed in the most populated areas of that space. Its output is a graph that can be drawn on a two-dimensional plane, disclosing cluster patterns in datasets. When trained on high-dimensional data, however, GNG commonly produces highly connected graphs, which in turn lead to cluttered 2D representations that may fail to disclose meaningful patterns. Moreover, its sequential learning limits its potential for faster execution on local datasets and, more importantly, its potential for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside. This paper presents two methods that improve GNG for the visualization of cluster patterns in large-scale and high-dimensional datasets. The first method provides more accurate and meaningful 2D visual representations of cluster patterns in high-dimensional datasets by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may, in turn, result in overplotting and clutter. The second method enables the use of GNG on big and distributed datasets with faster execution times by modeling and merging separate parts of a dataset using the MapReduce model. Quantitative and qualitative evaluations show that the first method creates lower-dimensional graph structures that provide more meaningful (and sometimes more accurate) cluster representations with less overplotting and clutter, and that the second method preserves the accuracy and meaning of the cluster representations while enabling execution in large-scale and distributed settings. (C) 2021 The Author(s). Published by Elsevier Inc.
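For readers unfamiliar with the baseline algorithm, the following is a minimal sketch of the standard GNG training loop, not of the two methods the paper proposes. The function name `train_gng`, its parameter names, and their default values are illustrative assumptions and do not come from the paper.

```python
import random


def train_gng(data, max_units=20, steps=2000, eps_winner=0.05,
              eps_neighbor=0.006, max_age=50, insert_every=100,
              alpha=0.5, decay=0.995, seed=0):
    """Sketch of standard GNG; all names and defaults are assumptions."""
    rng = random.Random(seed)
    # units: id -> position, error: id -> accumulated error,
    # edges: frozenset({a, b}) -> age
    units = {0: list(rng.choice(data)), 1: list(rng.choice(data))}
    error = {0: 0.0, 1: 0.0}
    edges = {}
    next_id = 2

    def dist2(u, x):
        return sum((a - b) ** 2 for a, b in zip(u, x))

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Find the two units nearest to the sample.
        ranked = sorted(units, key=lambda u: dist2(units[u], x))
        s1, s2 = ranked[0], ranked[1]
        error[s1] += dist2(units[s1], x)
        # Move the winner toward the sample.
        units[s1] = [w + eps_winner * (xi - w) for w, xi in zip(units[s1], x)]
        # Age the winner's edges and move its neighbors slightly.
        for e in list(edges):
            if s1 in e:
                edges[e] += 1
                (n,) = e - {s1}
                units[n] = [w + eps_neighbor * (xi - w)
                            for w, xi in zip(units[n], x)]
        # Connect (or refresh the connection between) the two winners.
        edges[frozenset((s1, s2))] = 0
        # Drop edges that are too old, then units left without edges.
        for e in [e for e, age in edges.items() if age > max_age]:
            del edges[e]
        connected = set().union(*edges)
        for u in [u for u in units if u not in connected]:
            del units[u]
            del error[u]
        # Periodically insert a unit between the highest-error unit
        # and its highest-error neighbor.
        if step % insert_every == 0 and len(units) < max_units:
            q = max(error, key=error.get)
            nbrs = [next(iter(e - {q})) for e in edges if q in e]
            f = max(nbrs, key=error.get)
            units[next_id] = [(a + b) / 2 for a, b in zip(units[q], units[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, next_id))] = 0
            edges[frozenset((f, next_id))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[next_id] = error[q]
            next_id += 1
        # Decay all accumulated errors.
        for u in error:
            error[u] *= decay
    return units, edges
```

Calling `train_gng(points)` on a list of equal-length numeric tuples returns the unit positions and the edge set, which together form the graph that a 2D layout would then draw. Each sample is processed in sequence, which is the sequential-learning limitation the abstract refers to.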
Year: 2021
DOI: 10.1016/j.bdr.2021.100254
Venue: Big Data Research
Keywords: Growing neural gas, Big data, Visual analytics, Unsupervised learning, Exploratory data analysis
DocType: Journal
Volume: 26
ISSN: 2214-5796
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Elio Ventocilla | 1 | 5 | 1.12
Rafael M. Martins | 2 | 0 | 0.34
Fernando Paulovich | 3 | 0 | 0.34
Maria Riveiro | 4 | 0 | 0.34