Title
Data analysis of (non-)metric proximities at linear costs
Abstract
Domain-specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bioinformatics, are often used to compare complex data objects and to cover domain-specific data properties. Lacking an underlying vector space, the data are given as pairwise (dis-)similarities. The few available methods for such data do not scale well to very large data sets. Kernel methods easily deal with metric similarity matrices, also at large scale, but costly transformations are necessary when starting with non-metric (dis-)similarities. We propose an integrative combination of Nyström approximation, potential double centering and eigenvalue correction to obtain valid kernel matrices at linear costs. Accordingly, effective kernel approaches become accessible for these data. Evaluation on several larger (dis-)similarity data sets shows that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. Our main contribution is an efficient linear technique to convert (potentially non-metric) large-scale dissimilarity matrices into approximated positive semi-definite kernel matrices.
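The three steps named in the abstract (Nyström approximation, double centering, eigenvalue correction) can be chained without ever forming the full n x n matrix. The following is a minimal sketch under stated assumptions, not the authors' published algorithm: the function name psd_kernel_factor, the parameter names, the use of a thin QR factorization for the low-rank eigendecomposition, and the choice of "clip" and "flip" as corrections are illustrative choices. It assumes the input holds squared dissimilarities from all n points to m landmark points.

```python
import numpy as np

def psd_kernel_factor(C, landmarks, correction="clip", rcond=1e-10):
    """Return L (n x m) such that K ~= L @ L.T is positive semi-definite.

    C          : (n, m) squared dissimilarities from all n points to m landmarks
    landmarks  : indices of the m landmark points among the n points
    correction : "clip" sets negative eigenvalues to 0, "flip" takes |eigenvalue|
    (all names here are hypothetical, chosen for this sketch)
    """
    C = np.asarray(C, dtype=float)
    W = C[landmarks, :]                      # (m, m) landmark-to-landmark block
    W = 0.5 * (W + W.T)                      # enforce symmetry

    # Nystroem approximation: D ~= C @ pinv(W) @ C.T (never formed explicitly).
    # Double centering S = -1/2 J D J with J = I - 11^T/n only needs J @ C,
    # which is C with its column means removed.
    B = C - C.mean(axis=0, keepdims=True)    # (n, m)
    M = -0.5 * np.linalg.pinv(W, rcond=rcond)

    # Eigendecomposition of the low-rank matrix S = B M B^T via a thin QR of B:
    # S = Q (R M R^T) Q^T, so only an (m, m) eigenproblem has to be solved.
    Q, R = np.linalg.qr(B)                   # Q: (n, m), R: (m, m)
    T = R @ M @ R.T
    lam, V = np.linalg.eigh(0.5 * (T + T.T))

    # Eigenvalue correction removes the negative part of the spectrum.
    if correction == "clip":
        lam = np.maximum(lam, 0.0)
    elif correction == "flip":
        lam = np.abs(lam)
    else:
        raise ValueError("correction must be 'clip' or 'flip'")

    return (Q @ V) * np.sqrt(lam)            # scale columns by sqrt(eigenvalues)
```

In this sketch every step touches only n x m or m x m arrays, so for a fixed number of landmarks m the cost grows linearly with n. The returned factor L can be passed to a linear method directly, or K = L @ L.T can be formed when n is small enough.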
Year
2013
DOI
10.1007/978-3-642-39140-8_4
Venue
SIMBAD
Keywords
valid kernel matrix, linear cost, kernel method, large scale dissimilarity matrix, effective kernel approach, large scale, data analysis, large data set, domain specific data property, similarity data, approximated positive semi-definite kernel, metric proximity, complex data object
Field
Kernel (linear algebra), Pairwise comparison, Mathematical optimization, Data set, Matrix (mathematics), Support vector machine, Complex data type, Algorithm, Kernel method, Eigenvalues and eigenvectors, Mathematics
DocType
Conference
Citations
14
PageRank
0.54
References
21
Authors
2
Name, Order, Citations, PageRank
Frank-Michael Schleif, 1, 427, 46.59
Andrej Gisbrecht, 2, 195, 15.60