Title
Graph kernels for chemical compounds using topological and three-dimensional local atom pair environments
Abstract
Approaches that predict the biological activity or properties of chemical compounds are an important application of machine learning. In this paper, we introduce a new kernel function for measuring the similarity between chemical compounds and for learning their related properties and activities. The method is based on local atom pair environments, which can be computed rapidly by using the topological all-shortest-paths matrix and the geometrical distance matrix of a molecular graph as lookup tables. The local atom pair environments are stored in prefix search trees, so-called tries, for efficient comparison. The kernel can be computed either as an optimal assignment kernel or as a corresponding convolution kernel over all local atom similarities. We implemented the Tanimoto kernel, the min kernel, the minmax kernel and the dot product kernel as local kernels, which are computed recursively by traversing the tries. We tested the approach on eight structure-activity and structure-property benchmark data sets of molecules from the literature. The models were trained with ε-support vector regression and support vector classification. In a direct comparison, the local atom pair kernels proved at least competitive with state-of-the-art kernels in seven of the eight cases. A comparison against literature results, using experimental setups similar to those of the original works, confirmed these findings. The method is easy to implement and has robust default parameters.
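The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions of our own, and it is not the authors' implementation.

Each atom's local environment is taken to be the multiset of keys (own element, topological distance, partner element). These keys are read from an all-shortest-paths matrix computed by breadth-first search and stored with counts in a trie. The min, dot product, minmax and Tanimoto local kernels are then accumulated by traversing two tries in parallel. Molecules are compared either by summing all local similarities (convolution kernel) or by the best one-to-one atom matching (optimal assignment kernel).

The names `TrieNode`, `atom_environments`, `local_kernel` and the `max_dist` cutoff are hypothetical. The brute-force assignment is only practical for very small molecules.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import permutations

@dataclass
class TrieNode:
    """Prefix-tree node; `count` is the number of inserted keys ending here."""
    children: dict = field(default_factory=dict)
    count: int = 0

    def insert(self, key):
        node = self
        for part in key:
            node = node.children.setdefault(part, TrieNode())
        node.count += 1

_EMPTY = TrieNode()  # read-only stand-in for a branch missing in one trie

def shortest_paths(adj):
    """All-shortest-paths matrix of an unweighted graph via BFS from each atom."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def atom_environments(labels, adj, max_dist=3):
    """One trie per atom, keyed by (own label, distance, partner label).
    The key layout and the max_dist cutoff are illustrative assumptions."""
    dist = shortest_paths(adj)  # used as a lookup table
    envs = []
    for i, li in enumerate(labels):
        trie = TrieNode()
        for j, lj in enumerate(labels):
            d = dist[i][j]
            if j != i and d is not None and d <= max_dist:
                trie.insert((li, d, lj))
        envs.append(trie)
    return envs

def _sums(a, b):
    """Recursively traverse two tries over the union of their branches.
    Returns (sum of min, sum of max, dot product, |a|^2, |b|^2) of the counts."""
    totals = [min(a.count, b.count), max(a.count, b.count),
              a.count * b.count, a.count ** 2, b.count ** 2]
    for key in a.children.keys() | b.children.keys():
        sub = _sums(a.children.get(key, _EMPTY), b.children.get(key, _EMPTY))
        totals = [t + s for t, s in zip(totals, sub)]
    return tuple(totals)

def local_kernel(a, b, kind="minmax"):
    """Similarity of two atom environments under one of four local kernels."""
    s_min, s_max, dot, aa, bb = _sums(a, b)
    if kind == "min":
        return s_min
    if kind == "dot":
        return dot
    if kind == "minmax":
        return s_min / s_max if s_max else 0.0
    if kind == "tanimoto":
        denom = aa + bb - dot
        return dot / denom if denom else 0.0
    raise ValueError(f"unknown local kernel: {kind}")

def convolution_kernel(envs_a, envs_b, kind="minmax"):
    """Sum of local similarities over all atom pairs."""
    return sum(local_kernel(x, y, kind) for x in envs_a for y in envs_b)

def optimal_assignment_kernel(envs_a, envs_b, kind="minmax"):
    """Best one-to-one atom matching; brute force, only for tiny molecules."""
    if len(envs_a) > len(envs_b):
        envs_a, envs_b = envs_b, envs_a
    sim = [[local_kernel(x, y, kind) for y in envs_b] for x in envs_a]
    return max(sum(sim[i][j] for i, j in enumerate(perm))
               for perm in permutations(range(len(envs_b)), len(envs_a)))

# Usage: hydrogen-suppressed graphs of ethanol (C-C-O) and methanol (C-O).
ethanol = atom_environments(["C", "C", "O"], [[1], [0, 2], [1]])
methanol = atom_environments(["C", "O"], [[1], [0]])
print(optimal_assignment_kernel(ethanol, methanol))  # 1.0
print(convolution_kernel(ethanol, methanol))  # 1.0
```

Replacing the topological distance with a binned value from the geometric distance matrix would give an analogue of the three-dimensional variant. A practical optimal assignment would use a polynomial-time matching method such as the Hungarian algorithm rather than enumerating permutations.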
Year
2010
DOI
10.1016/j.neucom.2010.03.008
Venue
Neurocomputing
Keywords
graph kernel,local kernel,local atom pair environment,optimal assignment kernel,local atom pair kernel,three-dimensional local atom pair,new kernel function,corresponding convolution kernel,minmax kernel,chemical compound,min kernel,tanimoto kernel,support vector regression,biological activity,structure activity relationship,kernel function,support vector machine,distance matrix,lookup table,shortest path,machine learning,cheminformatics,three dimensional
Field
Graph kernel,Topology,Radial basis function kernel,Kernel embedding of distributions,Tree kernel,Polynomial kernel,Artificial intelligence,Kernel method,String kernel,Variable kernel density estimation,Mathematics,Machine learning
DocType
Journal
Volume
74
Issue
1-3
Citations
7
PageRank
0.49
References
29
Authors
5
Name | Order | Citations | PageRank
Georg Hinselmann | 1 | 96 | 8.12
Nikolas Fechner | 2 | 103 | 8.38
Andreas Jahn | 3 | 7 | 0.49
Matthias Eckert | 4 | 7 | 0.49
Andreas Zell | 5 | 1419 | 137.58