Title
Chespa: Streamlining Expansive Chemical Space Evaluation of Molecular Sets
Abstract
Thousands of chemical properties can be calculated for small molecules, and these properties can be used to place the molecules within the context of a broader "chemical space." Definitions of chemical space vary with the compounds of interest and the goals of the analysis. Here, we introduce a customizable Python module, chespa, built to easily assess different chemical space definitions by clustering compounds in these spaces and visualizing trends across the clusters. To demonstrate this, chespa currently streamlines prediction of various molecular descriptors (predicted chemical properties, molecular substructures, AI-based chemical space, and chemical class ontology) in order to test six different chemical space definitions. Furthermore, we investigated how these varying definitions trend with mass spectrometry (MS)-based observability, that is, the ability of a molecule to be observed with MS (e.g., as a function of the molecule's ionizability), using an example data set from the U.S. EPA's nontargeted analysis collaborative trial, in which blinded samples had been analyzed previously, providing 1398 data points. Improved understanding of observability would offer many advantages in small-molecule identification, such as (i) a priori selection of experimental conditions based on suspected sample composition, (ii) the ability to reduce the number of candidate structures during compound identification by removing those less likely to ionize, and, in turn, (iii) a reduced false discovery rate and increased confidence in identifications. The factors controlling observability are not fully understood, which makes prediction of this property nontrivial and a prime candidate for chemical space analysis.
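The workflow described in the abstract (place molecules in a descriptor space, cluster them, then look at how observability trends across clusters) can be sketched roughly as follows. This is a minimal illustration only and does not use chespa's API: the descriptor values and observed flags are made-up toy data, the function names are hypothetical, and plain k-means stands in for whatever clustering method the module actually applies.

```python
from math import dist
from statistics import mean


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centers (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each molecule to its nearest cluster center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def observed_fraction_by_cluster(labels, observed):
    """Fraction of molecules flagged as observed within each cluster."""
    fractions = {}
    for c in sorted(set(labels)):
        flags = [o for lab, o in zip(labels, observed) if lab == c]
        fractions[c] = sum(flags) / len(flags)
    return fractions


# Toy two-dimensional "chemical space" (assumed, scaled descriptor values).
descriptors = [
    (0.10, 0.20), (0.90, 0.80), (0.15, 0.10),
    (0.20, 0.25), (0.85, 0.90), (0.95, 0.85),
]
# Toy observability flags: 1 = observed by MS, 0 = not observed (assumed).
observed = [1, 0, 1, 1, 0, 1]

labels = kmeans(descriptors, k=2)
fractions = observed_fraction_by_cluster(labels, observed)
print(labels, fractions)
```

Comparing the per-cluster fractions under different descriptor sets is one simple way to judge whether a given chemical space definition separates observable from unobservable molecules.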
Year: 2020
DOI: 10.1021/acs.jcim.0c00899
Venue: Journal of Chemical Information and Modeling
DocType: Journal
Volume: 60
Issue: 12
ISSN: 1549-9596
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name               Order  Citations  PageRank
Jamie R Nuñez      1      0          0.34
Monee Mcgrady      2      0          0.34
Yasemin Yesiltepe  3      0          0.34
Ryan S. Renslow    4      3          2.05
Thomas O Metz      5      66         7.25