Title
Data Mining the NCI60 to Predict Generalized Cytotoxicity.
Abstract
Eliminating cytotoxic compounds in both the early and later stages of drug discovery can help reduce the costs of research and development. By applying principal components analysis (PCA) to the NCI60 data, we show that approximately 89% of the total log GI50 variance is due to the nonspecific cytotoxic nature of substances. Furthermore, PCA led to the identification of groups of structurally unrelated substances showing very specific toxicity profiles, such as a set of 45 substances toxic only to the Leukemia_SR cancer cell line. To predict nonspecific cytotoxicity on the basis of the mean log GI50, we created a decision tree using MACCS keys that correctly classifies over 83% of the substances as cytotoxic or noncytotoxic in silico, using a cutoff of mean log GI50 = -5.0. Finally, we established a linear least-squares model in which nine of the 59 available NCI60 cancer cell lines are used to predict the mean log GI50. The model has R² = 0.99 and a root-mean-square error (RMSE) between the observed and calculated mean log GI50 of 0.09. Our predictive models can be applied to flag generally cytotoxic molecules in virtual and real chemical libraries, thus saving time and effort.
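The quantities named in the abstract (the variance fraction captured by the first principal component, a least-squares fit of the mean log GI50 from nine cell lines, R², RMSE, and the -5.0 cutoff) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The data are synthetic random numbers, the nine columns are chosen arbitrarily, the "at or below -5.0" direction of the cutoff is an assumption (lower GI50 meaning higher potency), and the MACCS-key decision tree is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the NCI60 table (NOT real data): rows are compounds,
# columns are 59 cell lines, entries play the role of log GI50 values.
n_compounds, n_lines = 500, 59
potency = rng.normal(-5.0, 1.0, size=(n_compounds, 1))  # shared, nonspecific effect
log_gi50 = potency + rng.normal(0.0, 0.3, size=(n_compounds, n_lines))

# Mean log GI50 per compound across all cell lines.
mean_log_gi50 = log_gi50.mean(axis=1)

# PCA via SVD of the column-centered matrix; squared singular values are
# proportional to the variance explained by each principal component.
centered = log_gi50 - log_gi50.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
pc1_fraction = explained[0]

# Least-squares fit of the mean from nine cell lines plus an intercept.
# The first nine columns are an arbitrary, hypothetical choice.
subset = np.arange(9)
X = np.column_stack([np.ones(n_compounds), log_gi50[:, subset]])
coef, *_ = np.linalg.lstsq(X, mean_log_gi50, rcond=None)
predicted = X @ coef

# Goodness of fit: RMSE and coefficient of determination.
residual = mean_log_gi50 - predicted
rmse = np.sqrt(np.mean(residual**2))
total_ss = np.sum((mean_log_gi50 - mean_log_gi50.mean()) ** 2)
r2 = 1.0 - np.sum(residual**2) / total_ss

# Flag compounds as generally cytotoxic at or below the cutoff (assumed direction).
is_cytotoxic = mean_log_gi50 <= -5.0

print(f"PC1 variance fraction: {pc1_fraction:.2f}")
print(f"R^2: {r2:.3f}, RMSE: {rmse:.3f}")
print(f"Flagged as cytotoxic: {int(is_cytotoxic.sum())} of {n_compounds}")
```

Because the synthetic matrix is built from one shared potency term plus noise, the first component dominates by construction; the numbers it prints say nothing about the real NCI60 data.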
Year: 2008
DOI: 10.1021/ci800097k
Venue: Journal of Chemical Information and Modeling
Keywords: data mining
Field: Data mining, Decision tree, Drug discovery, Cytotoxicity, Linear model, Chemistry, Bioinformatics, Cytotoxic T cell, Principal component analysis, In silico
DocType: Journal
Volume: 48
Issue: 7
ISSN: 1549-9596
Citations: 5
PageRank: 0.67
References: 2
Authors: 4
Name                Order  Citations  PageRank
Adam C. Lee         1      16         2.18
Kerby Shedden       2      134        17.42
Gustavo R. Rosania  3      10         2.58
Gordon M. Crippen   4      121        24.00