Abstract |
---|
Domain-specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bioinformatics, are often used to compare complex data objects and to capture domain-specific data properties. Lacking an underlying vector space, such data are given as pairwise (dis-)similarities. The few available methods for such data do not scale well to very large data sets. Kernel methods easily deal with metric similarity matrices, also at large scale, but costly transformations are necessary when starting from non-metric (dis-)similarities. We propose an integrative combination of Nyström approximation, double centering where required, and eigenvalue correction to obtain valid kernel matrices at linear cost. Accordingly, effective kernel approaches become accessible for these data. Evaluation on several larger (dis-)similarity data sets shows that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. Our main contribution is an efficient linear technique to convert (potentially non-metric) large-scale dissimilarity matrices into approximated positive semi-definite kernel matrices. |

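
The combination of Nyström approximation, double centering and eigenvalue correction described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nystrom_psd_kernel`, the pseudo-inverse cutoff `rcond`, the thin-QR route to the eigendecomposition, and the assumption that `D_nm` holds squared dissimilarities (as in classical multidimensional scaling) are all choices made here. Only the n x m block between all n objects and m landmark objects is used. With J = I - 11^T/n, the Nyström form D ~ D_nm D_mm^+ D_nm^T turns double centering S = -1/2 J D J into S ~ (J D_nm)(-1/2 D_mm^+)(J D_nm)^T, so centering reduces to subtracting the column means of D_nm. Clipping (or, optionally, flipping) the eigenvalues then gives a factor F with K ~ F F^T positive semi-definite, at O(n m^2 + m^3) time and O(n m) memory, i.e. linear in n for fixed m.

```python
import numpy as np


def nystrom_psd_kernel(D_nm, landmarks, correction="clip", rcond=1e-10):
    """Hypothetical sketch: turn an n x m block of squared (dis-)similarities
    into a low-rank factor F such that K = F @ F.T is positive semi-definite.

    D_nm      : (n, m) squared dissimilarities between all n objects and the
                m landmark objects (assumption: squared, as in classical MDS).
    landmarks : indices of the m landmarks among the n objects, in the same
                order as the columns of D_nm.
    correction: "clip" sets negative eigenvalues to zero, "flip" takes their
                absolute value.
    rcond     : relative cutoff for the pseudo-inverse of the landmark block.
    """
    D_nm = np.asarray(D_nm, dtype=float)
    landmarks = np.asarray(landmarks)

    # Landmark block D_mm, symmetrised in case the measure is not exactly symmetric.
    D_mm = D_nm[landmarks, :]
    D_mm = 0.5 * (D_mm + D_mm.T)

    # Double centering on the Nystrom form:
    #   -1/2 J (D_nm D_mm^+ D_nm^T) J = (J D_nm) (-1/2 D_mm^+) (J D_nm)^T,
    # and J D_nm just subtracts the column means of D_nm (O(n m) work).
    C = D_nm - D_nm.mean(axis=0, keepdims=True)
    W = -0.5 * np.linalg.pinv(D_mm, rcond=rcond, hermitian=True)

    # Thin QR, C = Q R, gives S ~= Q (R W R^T) Q^T, so only an m x m
    # eigenproblem is needed; total cost O(n m^2 + m^3), linear in n.
    Q, R = np.linalg.qr(C)
    M = R @ W @ R.T
    M = 0.5 * (M + M.T)
    evals, V = np.linalg.eigh(M)

    # Eigenvalue correction makes the approximated similarity matrix PSD.
    if correction == "clip":
        evals = np.clip(evals, 0.0, None)
    elif correction == "flip":
        evals = np.abs(evals)
    else:
        raise ValueError("correction must be 'clip' or 'flip'")

    U = Q @ V  # orthonormal approximate eigenvectors of S, shape (n, m)
    return U * np.sqrt(evals)  # K ~= F @ F.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    landmarks = rng.choice(len(X), size=50, replace=False)
    # Only the n x m block of squared dissimilarities is ever evaluated.
    D_nm = ((X[:, None, :] - X[None, landmarks, :]) ** 2).sum(axis=-1)
    F = nystrom_psd_kernel(D_nm, landmarks)
    print(F.shape)  # (1000, 50)
```

Returning the factor F rather than the full kernel keeps memory at O(n m); a kernel method can work with F directly or form entries of F F^T on demand. Which correction (clipping, flipping, or another scheme) suits a given non-metric data set is a modelling decision that this sketch does not settle.
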
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/978-3-642-39140-8_4 | SIMBAD |

Keywords | Field | DocType |
---|---|---|
valid kernel matrix, linear cost, kernel method, large scale dissimilarity matrix, effective kernel approach, large scale, data analysis, large data set, domain specific data property, similarity data, approximated positive semi-definite kernel, metric proximity, complex data object | Kernel (linear algebra), Pairwise comparison, Mathematical optimization, Data set, Matrix (mathematics), Support vector machine, Complex data type, Algorithm, Kernel method, Eigenvalues and eigenvectors, Mathematics | Conference |

Citations | PageRank | References |
---|---|---|
14 | 0.54 | 21 |

Authors |
---|
2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Frank-Michael Schleif | 1 | 427 | 46.59 |
Andrej Gisbrecht | 2 | 195 | 15.60 |