Graph-Based Data Relevance Estimation for Large Storage Systems - Citegraph

Paper Info

Title
Graph-Based Data Relevance Estimation for Large Storage Systems

Abstract
In storage systems, the relevance of files to users can be taken into account to determine storage control policies to reduce cost, while retaining high reliability and performance. The relevance of a file can be estimated by applying supervised learning and using the metadata as features. However, supervised learning requires many training samples to achieve an acceptable estimation accuracy. In this paper, we propose a novel graph-based learning system for the relevance estimation of files using a small training set. First, files are grouped into different file-sets based on the available metadata. Then a parameterized similarity metric among files is introduced for each file-set using the knowledge of the metadata. Finally, message passing over a bipartite graph is applied for relevance estimation. The proposed system is tested on various datasets and compared with logistic regression.
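
The three steps named in the abstract (grouping files into file-sets, weighting them by metadata, and passing messages over a file/file-set bipartite graph) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the `field_weights` parameter, the `prior` value, and the single round of averaging are all assumptions made for illustration, and the hand-set weights stand in for the parameterized similarity metric, which the abstract does not specify.

```python
from collections import defaultdict


def build_file_sets(files, fields):
    """Group file ids into file-sets keyed by (metadata field, value)."""
    file_sets = defaultdict(set)
    for fid, meta in files.items():
        for field in fields:
            if field in meta:
                file_sets[(field, meta[field])].add(fid)
    return file_sets


def estimate_relevance(files, labels, field_weights, prior=0.5):
    """One round of message passing on the file / file-set bipartite graph.

    files:         {file_id: {field: value}} metadata per file
    labels:        {file_id: 0 or 1} small training set of known relevances
    field_weights: {field: weight}, a hypothetical stand-in for the
                   parameterized similarity metric
    prior:         fallback estimate when no labeled neighbor exists (assumed)
    """
    file_sets = build_file_sets(files, field_weights)

    # File -> file-set messages: each file-set averages its labeled members.
    set_score = {}
    for key, members in file_sets.items():
        known = [labels[f] for f in members if f in labels]
        if known:
            set_score[key] = sum(known) / len(known)

    # File-set -> file messages: each unlabeled file takes a weighted
    # average over the file-sets it belongs to.
    estimates = {}
    for fid, meta in files.items():
        if fid in labels:
            estimates[fid] = float(labels[fid])
            continue
        num = den = 0.0
        for field, weight in field_weights.items():
            key = (field, meta.get(field))
            if key in set_score:
                num += weight * set_score[key]
                den += weight
        estimates[fid] = num / den if den > 0 else prior
    return estimates


# Hypothetical example data: two labeled files out of five.
files = {
    "a.csv": {"owner": "alice", "ext": "csv"},
    "b.csv": {"owner": "alice", "ext": "csv"},
    "c.tmp": {"owner": "bob", "ext": "tmp"},
    "d.tmp": {"owner": "alice", "ext": "tmp"},
    "e.log": {"owner": "carol", "ext": "log"},
}
labels = {"a.csv": 1, "c.tmp": 0}
estimates = estimate_relevance(files, labels, {"owner": 1.0, "ext": 1.0})
```

In this toy setup, a file that shares file-sets only with relevant labeled files inherits a high estimate, a file with conflicting evidence lands in between, and a file with no labeled neighbors falls back to the prior. A fuller treatment would learn the weights from the training set and iterate the two message steps.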

Year	2018
DOI	10.1109/BigDataCongress.2018.00040
Venue	2018 IEEE International Congress on Big Data (BigData Congress)
Keywords	Big data, similarity learning, message passing, data relevance, estimation, data storage
Field	Training set, Data mining, Metadata, Graph, Parameterized complexity, Computer science, Bipartite graph, Supervised learning, Logistic regression, Message passing
DocType	Conference
ISSN	2379-7703
ISBN	978-1-5386-7233-4
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Vinodh Venkatesan	1	62	7.82
Taras Lehinevych	2	3	0.73
Giovanni Cherubini	3	52	9.18
Andrii Glybovets	4	0	0.34
Mark A. Lantz	5	22	3.00