Local representativeness in vector data - Citegraph

Paper Info

Title
Local representativeness in vector data

Abstract
The amount of large-scale real data around us is growing very quickly, and so is the need to reduce it by obtaining a representative sample. Such a sample allows us to use a wide variety of analytical methods whose direct application to the original data would be infeasible. Conventional sampling methods produce non-deterministic results while trying to preserve selected characteristics of the input dataset. We present a novel, simple, straightforward and deterministic approach with the same goal. It is not sampling in the true sense but a reduction of vector data that preserves internal data structures (clusters and density) very well. The approach is based on analyzing nearest neighbors: our proposed x-representativeness takes into account the local density of the data and the nearest neighbors of individual data objects. We then present experiments on two different datasets. These experiments aim to show that x-representativeness can be used to deterministically reduce a dataset to samples of representatives of different sizes while maintaining the properties of the original dataset.
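The abstract says only that x-representativeness combines the local density of the data with the nearest neighbors of each object; the exact formula is not given on this page. The sketch below is therefore not the authors' x-representativeness. It illustrates the general idea of a deterministic, nearest-neighbor-based reduction using a simpler, hypothetical score: for each object, the number of other objects that list it among their k nearest neighbors (its reverse k-NN count). The function names, the parameters `k` and `m`, and the score itself are assumptions made for illustration, and the neighbor search is brute force for clarity.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], i: int, k: int) -> List[int]:
    """Return the indices of the k nearest neighbours of points[i], excluding i.

    Brute force (Euclidean distance); equal distances are broken by index,
    so the result is fully deterministic.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def reverse_knn_counts(points: List[Point], k: int) -> List[int]:
    """For each point, count how many other points have it among their k nearest neighbours.

    Hypothetical stand-in score for illustration only; not the paper's x-representativeness.
    """
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def reduce_dataset(points: List[Point], k: int, m: int) -> List[int]:
    """Deterministically keep the m points with the highest reverse k-NN count.

    Ties are broken by original index; the kept indices are returned in ascending order.
    """
    counts = reverse_knn_counts(points, k)
    ranked = sorted(range(len(points)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:m])


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
    print(reverse_knn_counts(pts, k=2))  # [3, 3, 0, 0, 4, 2, 2, 2]
    print(reduce_dataset(pts, k=2, m=4))  # [0, 1, 4, 5]
```

With m = 4 the toy example keeps one point from the second group; with m = 3 it returns [0, 1, 4] and drops that group entirely, so this naive global top-m selection does not by itself guarantee that every cluster is represented.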

Year: 2014
DOI: 10.1109/SMC.2014.6974025
Venue: Systems, Man and Cybernetics
Keywords: data compression, data mining, data structures, sampling methods, analytical methods, data clusters, data objects, data size reduction, deterministic approach, deterministic dataset reduction, internal data structures, large-scale real data, local data density, local representativeness, nearest neighbor analysis, vector data reduction, x-representativeness, density bias, nearest neighbor, sampling
Field: Data mining, Computer science, Representativeness heuristic, Artificial intelligence, Machine learning
DocType: Conference
Citations: 1
PageRank: 0.37
References: 9
Authors: 3

Authors

Name	Order	Citations	PageRank
Sarka Zehnalova	1	8	4.83
Milos Kudelka	2	116	23.81
Jan Platos	3	286	58.72