Title
An automated PLS search for biologically relevant QSAR descriptors.
Abstract
An automated partial least squares (PLS) engine, WB-PLS, was applied to 1632 QSAR series, each containing at least 25 compounds, extracted from WOMBAT (WOrld of Molecular BioAcTivity). For each series, WB-PLS extracts a single Y variable together with pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Model significance was assessed by cross-validation, retaining variables with variable importance on projections (VIP) above 0.8 and models with q² ≥ 0.3. Among cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance than leave-one-out (which measures redundancy) or leave-half-out (which is too restrictive). SMARTS counts overlap with 2D descriptors (both are more quantitative in nature), whereas MDL keys overlap with in-house fingerprints (both are more qualitative). Of the four descriptor systems, SMARTS counts are the most effective. At the level of individual descriptors, size-related descriptors and topological indices (in the 2D property space), together with branched SMARTS, aromatic and ring atom types, and halogens, are found to be the most relevant according to the VIP criterion.
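The significance criteria named in the abstract (CV7, q² ≥ 0.3, VIP above 0.8) can be illustrated with a minimal sketch. This is not the WB-PLS implementation: it assumes a one-component PLS1 model without scaling, an interleaved assignment of compounds to the seven groups, q² defined as 1 - PRESS/SS, and a small synthetic descriptor table. All function and variable names are hypothetical.

```python
import math

def fit_pls1(X, y):
    """Fit a one-component PLS1 model (mean-centred, unscaled) to rows X and responses y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y, unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress y on t to get the inner coefficient c.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    c = sum(t[i] * yc[i] for i in range(n)) / tt if tt else 0.0
    return x_mean, y_mean, w, c

def predict(model, row):
    """Predict the response for one descriptor row."""
    x_mean, y_mean, w, c = model
    return y_mean + c * sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))

def q2_cv(X, y, groups=7):
    """Cross-validated q2 = 1 - PRESS/SS, leaving out one group in `groups` at a time."""
    n = len(X)
    press = 0.0
    for g in range(groups):
        train = [i for i in range(n) if i % groups != g]  # assumed interleaved split
        test = [i for i in range(n) if i % groups == g]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - predict(model, X[i])) ** 2 for i in test)
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss

def vip_one_component(w):
    """VIP for a one-component model with unit-length w: VIP_j = sqrt(p) * |w_j|."""
    p = len(w)
    return [math.sqrt(p) * abs(v) for v in w]

# Synthetic series of 28 "compounds": two informative descriptors and one
# checkerboard column that is nearly uncorrelated with the response.
X = [[i % 4, 0.5 * (i // 4), (i % 4 + i // 4) % 2] for i in range(28)]
y = [2.0 * row[0] + 2.0 * row[1] for row in X]

q2 = q2_cv(X, y, groups=7)
vips = vip_one_component(fit_pls1(X, y)[2])
selected = [j for j, v in enumerate(vips) if v > 0.8]
print("q2 (CV7):", round(q2, 3), "significant:", q2 >= 0.3)
print("VIP:", [round(v, 3) for v in vips], "retained columns:", selected)
```

In this sketch, calling `q2_cv` with `groups=len(y)` gives leave-one-out and `groups=2` gives leave-half-out, the two alternatives the abstract compares against CV7.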
Year
2004
DOI
10.1007/s10822-004-4060-8
Venue
Journal of Computer-Aided Molecular Design
Keywords
data-mining, fingerprints, PLS, QSAR, SMARTS, SMILES, topological indices, WOMBAT
Field
Quantitative structure–activity relationship, Data mining, Chemistry, Fingerprint, Redundancy (engineering)
DocType
Journal
Volume
18
Issue
7-9
ISSN
0920-654X
Citations
13
PageRank
1.18
References
9
Authors
3
Name | Order | Citations | PageRank
Marius Olah | 1 | 18 | 2.20
Cristian Bologa | 2 | 33 | 6.11
Tudor I Oprea | 3 | 359 | 46.89