Abstract |
---|
In storage systems, the relevance of files to users can be taken into account when setting storage control policies, reducing cost while retaining high reliability and performance. The relevance of a file can be estimated by applying supervised learning with the file metadata as features. However, supervised learning requires many training samples to achieve acceptable estimation accuracy. In this paper, we propose a novel graph-based learning system that estimates file relevance from a small training set. First, files are grouped into file-sets based on the available metadata. Then, for each file-set, a parameterized similarity metric among files is introduced using knowledge of the metadata. Finally, message passing over a bipartite graph is applied to estimate relevance. The proposed system is evaluated on various datasets and compared with logistic regression. |
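
The abstract names three steps (grouping files into file-sets by metadata, a parameterized similarity per file-set, and message passing over a bipartite graph) but gives no details. The following is a minimal, hypothetical sketch of how such a pipeline could look, using plain label propagation. It assumes that the bipartite graph links file nodes to (attribute, value) metadata nodes and that the per-file-set similarity reduces to per-attribute weights. The functions `group_into_file_sets`, `propagate`, and `estimate_relevance`, the neutral prior of 0.5, and the attributes `ext`, `owner`, and `dir` are illustrative assumptions, not the authors' system.

```python
from collections import defaultdict

NEUTRAL = 0.5  # assumed prior relevance for files without a label


def group_into_file_sets(files, key_attr):
    """Split files into file-sets keyed by one metadata attribute (hypothetical rule)."""
    file_sets = defaultdict(dict)
    for fid, meta in files.items():
        file_sets[meta.get(key_attr)][fid] = meta
    return dict(file_sets)


def propagate(files, labels, weights, iterations=20):
    """Label propagation on a bipartite graph of file nodes and (attribute, value) nodes.

    `weights` maps attribute -> non-negative weight and stands in for a
    per-file-set parameterized similarity (illustrative, not the paper's metric).
    """
    value_nodes = defaultdict(list)  # (attr, value) -> [(file_id, weight), ...]
    for fid, meta in files.items():
        for attr, val in meta.items():
            value_nodes[(attr, val)].append((fid, weights.get(attr, 1.0)))

    # Labeled files start at, and stay clamped to, their known relevance.
    score = {fid: labels.get(fid, NEUTRAL) for fid in files}
    for _ in range(iterations):
        # Files -> value nodes: weighted mean of the connected files' scores.
        msg = {}
        for node, nbrs in value_nodes.items():
            total = sum(w for _, w in nbrs)
            msg[node] = (sum(w * score[f] for f, w in nbrs) / total) if total > 0 else NEUTRAL
        # Value nodes -> files: each unlabeled file takes the weighted mean of its messages.
        for fid, meta in files.items():
            if fid in labels:
                continue
            pairs = [(weights.get(a, 1.0), msg[(a, v)]) for a, v in meta.items()]
            den = sum(w for w, _ in pairs)
            score[fid] = (sum(w * m for w, m in pairs) / den) if den > 0 else NEUTRAL
    return score


def estimate_relevance(files, labels, key_attr, weights_per_set=None):
    """Group files into file-sets, then propagate relevance inside each set separately."""
    weights_per_set = weights_per_set or {}
    scores = {}
    for set_key, members in group_into_file_sets(files, key_attr).items():
        set_labels = {f: r for f, r in labels.items() if f in members}
        scores.update(propagate(members, set_labels, weights_per_set.get(set_key, {})))
    return scores


# Toy usage with made-up metadata: a.log is labeled irrelevant, d.log relevant.
files = {
    "a.log": {"ext": "log", "owner": "ci", "dir": "/tmp"},
    "b.log": {"ext": "log", "owner": "ci", "dir": "/tmp"},
    "c.log": {"ext": "log", "owner": "alice", "dir": "/proj"},
    "d.log": {"ext": "log", "owner": "alice", "dir": "/proj"},
}
labels = {"a.log": 0.0, "d.log": 1.0}
scores = estimate_relevance(files, labels, key_attr="ext")
# b.log converges to about 0.25 (shares owner/dir with a.log), c.log to about 0.75.
```

In this sketch the labeled files stay clamped to their training values, so even a tiny training set anchors the scores, while unlabeled files inherit relevance from the metadata values they share with labeled ones. A file-set that contains no labeled files simply stays at the neutral prior.

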
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/BigDataCongress.2018.00040 | 2018 IEEE International Congress on Big Data (BigData Congress) |
Keywords | Fields of study | Document type |
---|---|---|
Big data, similarity learning, message passing, data relevance, estimation, data storage | Training set, Data mining, Metadata, Graph, Parameterized complexity, Computer science, Bipartite graph, Supervised learning, Logistic regression, Message passing | Conference |
ISSN | ISBN | Citations |
---|---|---|
2379-7703 | 978-1-5386-7233-4 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vinodh Venkatesan | 1 | 62 | 7.82 |
Taras Lehinevych | 2 | 3 | 0.73 |
Giovanni Cherubini | 3 | 52 | 9.18 |
Andrii Glybovets | 4 | 0 | 0.34 |
Mark A. Lantz | 5 | 22 | 3.00 |