Title
Classification of molecular structures made easy
Abstract
Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant instances include automatic cancer detection/classification, machine-learning prediction of pathologies, and automatic predictive toxicology. Molecules are naturally represented as graphs: each node represents an atom, whilst each edge represents an atom-atom bond. Labels (in the form of real-valued vectors) are associated with nodes and edges in order to express physical and chemical properties of the corresponding atoms and bonds, respectively. Such structured data are expected to contain more information than a traditional (flat) feature vector, information that may strengthen the classification capabilities of a machine learner. This paper investigates the application of a novel Bayesian/connectionist classifier to this graphical pattern recognition task. The approach is much simpler than state-of-the-art machine learning paradigms for graphical/relational learning. It relies on the idea of describing the graph in terms of a binary relation. The posterior probability of a class given the relation is estimated as a function of probabilistic quantities modeled with a neural network, trained over individual vertex pairs in the graph. The popular and challenging Mutagenesis dataset is considered for the experimental evaluation. Despite its simplicity, the technique yields the highest recognition accuracies reported to date on the complete (friendly + unfriendly) dataset, outperforming more complex machines (relational and graph neural nets, kernels for graphs, inductive logic programming techniques, etc.). Finally, some preliminary chemical/biological implications are hypothesized in the light of the results obtained.
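The abstract does not give the formula that combines the per-pair quantities, so the following is only a minimal sketch of the general idea under stated assumptions. It describes a molecule graph as a binary relation (one labelled vertex pair per bond) and combines per-pair class posteriors by assuming the pairs are independent given the class, i.e. P(c | G) is proportional to P(c) times the product over pairs of P(c | pair) / P(c). That combination rule, the function names, and `toy_pair_posterior` (a hand-written logistic stand-in for the trained neural network) are all hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# One element of the binary relation: (label of atom u, label of atom v).
Pair = Tuple[Sequence[float], Sequence[float]]


def graph_to_pairs(node_labels: Dict[int, Sequence[float]],
                   edges: List[Tuple[int, int]]) -> List[Pair]:
    """Describe a molecule graph as a binary relation: one labelled pair per bond."""
    return [(node_labels[u], node_labels[v]) for u, v in edges]


def toy_pair_posterior(pair: Pair) -> Dict[str, float]:
    """Hypothetical stand-in for a trained network estimating P(class | pair)."""
    u, v = pair
    z = sum(u) + sum(v)
    p = 1.0 / (1.0 + math.exp(-z))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms below finite
    return {"mutagenic": p, "non-mutagenic": 1.0 - p}


def graph_posterior(pairs: List[Pair],
                    pair_posterior: Callable[[Pair], Dict[str, float]],
                    priors: Dict[str, float]) -> Dict[str, float]:
    """Combine per-pair posteriors into a graph-level posterior.

    Assumed rule (pairs independent given the class):
        P(c | G) proportional to P(c) * prod_j P(c | pair_j) / P(c)
    """
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for pair in pairs:
            score += math.log(pair_posterior(pair)[c]) - math.log(prior)
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Toy molecule: three atoms with one-dimensional labels and two bonds.
node_labels = {0: [0.5], 1: [-0.2], 2: [0.1]}
edges = [(0, 1), (1, 2)]
priors = {"mutagenic": 0.5, "non-mutagenic": 0.5}

pairs = graph_to_pairs(node_labels, edges)
posterior = graph_posterior(pairs, toy_pair_posterior, priors)
predicted = max(posterior, key=posterior.get)
```

In this sketch the learner only ever sees individual vertex pairs, which is what keeps the approach simple compared with models that process the whole graph; edge labels are omitted for brevity.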
Year: 2008
DOI: 10.1109/IJCNN.2008.4634258
Venue: IJCNN
Keywords: Bayes methods, biology computing, graph theory, molecular biophysics, pattern classification, probability, Bayesian/connectionist classifier, Mutagenesis dataset, bioinformatics, cheminformatics, graphical pattern recognition, molecular structure, molecules classification, neural network, posterior probability
Field: Graph theory, Feature vector, Graph database, Pattern recognition, Statistical relational learning, Computer science, Mixed graph, Artificial intelligence, Graphical model, Artificial neural network, Cheminformatics, Machine learning
DocType: Conference
ISSN: 1098-7576
Citations: 1
PageRank: 0.35
References: 13
Authors: 2
Name: Edmondo Trentin; Order: 1; Citations: 286; PageRank: 29.25
Name: Ernesto Di Iorio; Order: 2; Citations: 15; PageRank: 5.01