Title
Gradual in silico filtering for druglike substances.
Abstract
The suitability of decision trees in comparison to support vector machines for the classification of chemical compounds into drugs and nondrugs was investigated. To account for the requirements of screening virtual compound libraries, schemes for successive filtering steps with gradually increasing computational cost are outlined. The prediction accuracy obtained with decision trees was similar to that of support vector machine approaches for the applied compound data sets. By using rapidly computable variables such as druglikeness indices, XlogP, and the molar refractivity, at least 39% of the nondrugs can be filtered out, while retaining more than 83% of the actual drugs. Computationally more demanding descriptors such as specific substructure queries and quantum chemically derived variables can be postponed to subsequent classification schemes for the reduced set of compounds, whereby up to 92% of the nondrugs can be sorted out without losing considerably more drugs. Using all available computed descriptors simultaneously in the first step did not yield significantly better results. Furthermore, the generated decision trees are used to derive guidelines for the design of druglike substances. The numerical margins found at the branching points suggest several criteria that separate drugs from nondrugs: a molecular weight higher than 230, a molar refractivity higher than 40, and the presence of one or more rings as well as one or more functional groups. Also reported are the additional parameters required to compute values for XlogP, SlogP, and the molar refractivity of boron- and silicon-containing compounds.
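The staged screening idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' decision trees: it assumes descriptors are already computed and stored on a hypothetical `Compound` record, it combines the four reported margins (molecular weight above 230, molar refractivity above 40, at least one ring, at least one functional group) into a single conjunctive rule, and it uses a placeholder for the costlier second stage. The compound names and descriptor values are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    mol_weight: float
    molar_refractivity: float
    ring_count: int
    functional_group_count: int


def passes_cheap_filter(c: Compound) -> bool:
    # Margins quoted in the abstract; joining them with AND is an assumption.
    return (
        c.mol_weight > 230.0
        and c.molar_refractivity > 40.0
        and c.ring_count >= 1
        and c.functional_group_count >= 1
    )


expensive_calls: List[str] = []


def placeholder_expensive_stage(c: Compound) -> bool:
    # Stand-in for a costly step (e.g. substructure or quantum chemical
    # descriptors). The rule below is arbitrary and NOT from the source.
    expensive_calls.append(c.name)
    return c.molar_refractivity < 80.0


def gradual_filter(
    compounds: Iterable[Compound],
    stages: List[Callable[[Compound], bool]],
) -> List[Compound]:
    """Apply stages in order; each stage only sees survivors of the last."""
    remaining = list(compounds)
    for stage in stages:
        remaining = [c for c in remaining if stage(c)]
    return remaining


# Invented example library; values are illustrative only.
library = [
    Compound("cand_A", 310.4, 85.2, 2, 3),
    Compound("cand_B", 120.1, 30.5, 0, 1),
    Compound("cand_C", 450.0, 120.0, 3, 0),
    Compound("cand_D", 260.0, 70.0, 1, 2),
]

survivors = gradual_filter(
    library, [passes_cheap_filter, placeholder_expensive_stage]
)
print([c.name for c in survivors])
print(expensive_calls)
```

Ordering the stages from cheapest to most expensive means the costly predicate is only evaluated on compounds that survive the earlier steps, which is the point of a gradual filter for large virtual libraries.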
Year
2008
DOI
10.1021/ci700351y
Venue
JOURNAL OF CHEMICAL INFORMATION AND MODELING
Field
Decision tree, Data mining, Data set, Molar refractivity, Support vector machine, Classification scheme, Filter (signal processing), Druglikeness, Bioinformatics, Mathematics, In silico
DocType
Journal
Volume
48
Issue
3
ISSN
1549-9596
Citations
5
PageRank
0.42
References
0
Authors
4
Name                Order  Citations  PageRank
Nadine Schneider    1      30         2.67
Christine Jäckels   2      5          0.42
Claudia Andres      3      5          0.76
Michael C Hutter    4      16         3.69