Title
Local representativeness in vector data
Abstract
The volume of large-scale real-world data is growing rapidly, and with it the need to reduce that data to a representative sample. Such a sample makes it possible to apply a wide variety of analytical methods whose direct application to the original data would be infeasible. Conventional sampling methods produce non-deterministic results while trying to preserve selected characteristics of the input dataset. We present a novel, simple, and deterministic approach with the same goal. It is not sampling in the strict sense but a reduction of vector data that preserves the internal structure of the data (clusters and density) well. The approach is based on analyzing nearest neighbors: our proposed x-representativeness takes into account the local density of the data and the nearest neighbors of individual data objects. We also present experiments on two different datasets. Their aim is to show that x-representativeness can be used to deterministically reduce datasets to differently sized samples of representatives while maintaining the properties of the original datasets.
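The abstract does not define x-representativeness, so the following is only a minimal sketch of the general idea of a deterministic, nearest-neighbor-based reduction, not the authors' method. It assumes a stand-in score: the number of points that have a given point as their nearest neighbor. The names `reverse_nn_counts`, `reduce_dataset`, and `min_count` are hypothetical and do not come from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def reverse_nn_counts(points):
    """Count, for each point, how many other points have it as nearest neighbor.

    This count is an assumed stand-in score, NOT the paper's x-representativeness.
    """
    n = len(points)
    counts = [0] * n
    if n < 2:
        return counts  # no neighbors to analyze
    for i, p in enumerate(points):
        # Ties are broken by the lowest index, so the result is deterministic.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        counts[nearest] += 1
    return counts


def reduce_dataset(points, min_count=1):
    """Keep the points whose stand-in score reaches the threshold min_count."""
    counts = reverse_nn_counts(points)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Two small clusters: three points near the origin, two near (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
scores = reverse_nn_counts(data)
loose = reduce_dataset(data, min_count=1)
strict = reduce_dataset(data, min_count=2)
```

Raising the hypothetical `min_count` threshold yields a smaller set of representatives, and repeated runs on the same input always return the same points, which is the deterministic behavior the abstract contrasts with conventional sampling. The brute-force neighbor search is quadratic in the number of points and is meant for illustration only.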
Year
2014
DOI
10.1109/SMC.2014.6974025
Venue
Systems, Man and Cybernetics
Keywords
data compression, data mining, data structures, sampling methods, analytical methods, data clusters, data objects, data size reduction, deterministic approach, deterministic dataset reduction, internal data structures, large-scale real data, local data density, local representativeness, nearest neighbor analysis, vector data reduction, x-representativeness, density bias, nearest neighbor, sampling
Field
Data mining, Computer science, Representativeness heuristic, Artificial intelligence, Machine learning
DocType
Conference
Citations
1
PageRank
0.37
References
9
Authors
3
Name              Order   Citations   PageRank
Sarka Zehnalova   1       8           4.83
Milos Kudelka     2       116         23.81
Jan Platos        3       286         58.72