Title
How Does the Quality of Phospholipidosis Data Influence the Predictivity of Structural Alerts?
Abstract
The ability of drugs to induce phospholipidosis (PLD) is linked directly to their molecular substructures: hydrophobic, cyclic moieties with hydrophilic, peripheral amine groups. These structural properties can be captured and coded into SMILES arbitrary target specification (SMARTS) patterns. Such structural alerts, which are capable of identifying potential PLD inducers, should ideally be developed on a relatively large but reliable data set. We had previously developed a model based on SMARTS patterns consisting of 32 structural fragments using information from 450 chemicals. In the present study, additional PLD structural alerts have been developed based on a newer and larger data set combining two data sets published recently by the United States Food and Drug Administration (US FDA). To assess the predictive performance of the updated SMARTS model, two publicly available data sets were considered. These data sets were constructed using different criteria and hence represent different standards for overall quality. In the first data set, high quality was assured as all negative chemicals were confirmed by the gold standard method for the detection of PLD, transmission electron microscopy (EM). The second data set was constructed from seven previously published data sets and then curated by removing compounds where conflicting results were found for PLD activity. Evaluation of the updated SMARTS model showed a strong, positive correlation between predictive performance of the alerts and the quality of the data set used for the assessment. The results of this study confirm the importance of using high quality data for modeling and evaluation, especially in the case of PLD, where species, tissue, and dose dependence of results are additional confounding factors.
Year: 2014
DOI: 10.1021/ci500233k
Venue: JOURNAL OF CHEMICAL INFORMATION AND MODELING
DocType: Journal
Volume: 54
Issue: 8
ISSN: 1549-9596
Citations: 2
PageRank: 0.40
References: 3
Authors: 3
Name | Order | Citations | PageRank
Katarzyna R. Przybylak | 1 | 3 | 0.82
Abdullah Rzgallah Alzahrani | 2 | 2 | 0.40
Mark T. D. Cronin | 3 | 31 | 10.12