Title
Graph-Based Data Relevance Estimation for Large Storage Systems
Abstract
In storage systems, the relevance of files to users can inform storage control policies that reduce cost while retaining high reliability and performance. The relevance of a file can be estimated by supervised learning with the file metadata as features. However, supervised learning requires many training samples to achieve acceptable estimation accuracy. In this paper, we propose a novel graph-based learning system that estimates file relevance from a small training set. First, files are grouped into file-sets based on the available metadata. Then, for each file-set, a parameterized similarity metric among files is defined using knowledge of the metadata. Finally, message passing over a bipartite graph is applied to estimate relevance. The proposed system is tested on various datasets and compared with logistic regression.
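The abstract outlines three steps: grouping files into file-sets by metadata, defining a parameterized similarity within each file-set, and running message passing over a bipartite graph. The Python sketch below shows one way such a pipeline could be assembled. It is not the authors' algorithm, and every detail is a labeled assumption:

- File metadata is a dict of categorical attributes such as `owner`, `ext`, and `dir`.
- File-sets are formed by equality on chosen key attributes.
- The bipartite graph links files to (attribute, value) nodes.
- Per-attribute weights stand in for the similarity parameters.
- Labeled files are clamped to their known relevance.

The function names `group_into_filesets` and `propagate` are hypothetical.

```python
"""Hypothetical sketch of graph-based relevance estimation.

Files are grouped into file-sets by metadata, then relevance labels are
propagated over a bipartite graph linking files to (attribute, value) nodes.
Illustrative only; not the algorithm from the paper.
"""
from collections import defaultdict


def group_into_filesets(files, key_attrs):
    """Group file indices by their values of the (hypothetical) key attributes."""
    groups = defaultdict(list)
    for i, meta in enumerate(files):
        groups[tuple(meta[a] for a in key_attrs)].append(i)
    return dict(groups)


def propagate(files, members, labels, weights, iters=20):
    """Estimate relevance in [0, 1] for the files of one file-set.

    files   -- list of metadata dicts (categorical attributes)
    members -- indices of the files belonging to this file-set
    labels  -- dict index -> known relevance (the small training set)
    weights -- dict attribute -> nonnegative weight; these act as the
               (assumed) parameters of the similarity between files
    """
    # Bipartite edges: file i <-> value node (attribute, value), weight w.
    edges = [(i, (a, files[i][a]), w) for i in members for a, w in weights.items()]
    known = [labels[i] for i in members if i in labels]
    prior = sum(known) / len(known) if known else 0.5
    # Labeled files are clamped; unlabeled ones start at the prior.
    score = {i: labels.get(i, prior) for i in members}
    for _ in range(iters):
        # File -> value messages: weighted mean score of the adjacent files.
        num, den = defaultdict(float), defaultdict(float)
        for i, v, w in edges:
            num[v] += w * score[i]
            den[v] += w
        value_msg = {v: num[v] / den[v] for v in num if den[v] > 0}
        # Value -> file messages: each unlabeled file takes the weighted mean.
        fnum, fden = defaultdict(float), defaultdict(float)
        for i, v, w in edges:
            if v in value_msg:
                fnum[i] += w * value_msg[v]
                fden[i] += w
        for i in members:
            if i not in labels and fden[i] > 0:
                score[i] = fnum[i] / fden[i]
    return score


if __name__ == "__main__":
    files = [
        {"owner": "alice", "ext": "log", "dir": "/tmp"},
        {"owner": "alice", "ext": "log", "dir": "/tmp"},
        {"owner": "alice", "ext": "doc", "dir": "/home"},
        {"owner": "alice", "ext": "doc", "dir": "/home"},
        {"owner": "bob", "ext": "log", "dir": "/tmp"},
    ]
    labels = {0: 0.0, 2: 1.0}           # 0 = not relevant, 1 = relevant
    weights = {"ext": 1.0, "dir": 1.0}  # hand-set here; hypothetical values
    for key, members in group_into_filesets(files, ["owner"]).items():
        print(key, propagate(files, members, labels, weights))
```

In this toy example, the unlabeled `/tmp` log file in Alice's file-set drifts toward the "not relevant" label. The unlabeled `/home` document drifts toward "relevant", because each shares all of its weighted metadata values with exactly one labeled file. Bob's file-set has no labeled files, so its single file stays at the neutral prior of 0.5. Fitting the weights to the small training set, instead of fixing them by hand, would be the natural analogue of the parameterized similarity described above.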
Year
2018
DOI
10.1109/BigDataCongress.2018.00040
Venue
2018 IEEE International Congress on Big Data (BigData Congress)
Keywords
Big data, similarity learning, message passing, data relevance, estimation, data storage
Field
Training set, Data mining, Metadata, Graph, Parameterized complexity, Computer science, Bipartite graph, Supervised learning, Logistic regression, Message passing
DocType
Conference
ISSN
2379-7703
ISBN
978-1-5386-7233-4
Citations
0
PageRank
0.34
References
0
Authors
5
Name                Order  Citations  PageRank
Vinodh Venkatesan   1      62         7.82
Taras Lehinevych    2      3          0.73
Giovanni Cherubini  3      52         9.18
Andrii Glybovets    4      0          0.34
Mark A. Lantz       5      22         3.00