Title
Mining hidden knowledge for drug safety assessment: topic modeling of LiverTox as a case study.
Abstract
Given its significant impact on public health and drug development, drug safety has been a focal point and research emphasis for multiple stakeholders beyond scientific investigators, including consumer advocates, drug developers and regulators. This concern and effort have led to numerous databases with drug safety information becoming available in the public domain, the majority of which contain substantial textual data. Text mining offers an opportunity to leverage the hidden knowledge within these textual data to enhance the understanding of drug safety and thus improve public health.
In this proof-of-concept study, topic modeling, an unsupervised text mining approach, was performed on the LiverTox database developed by the National Institutes of Health (NIH). LiverTox is structured as one document per drug, each containing multiple sections that summarize clinical information on drug-induced liver injury (DILI). We hypothesized that these documents might contain specific textual patterns that could be used to address key DILI issues. We focused the study on drug-induced acute liver failure (ALF), a severe form of DILI with limited treatment options.
After topic modeling of the "Hepatotoxicity" sections of LiverTox across 478 drug documents, we identified a hidden topic relevant to Hy's law, a widely accepted rule for identifying drugs with a high risk of causing ALF in humans. Using this topic, a total of 127 drugs were further implicated, 77 of which had clear ALF-relevant terms in the "Outcome and management" sections of LiverTox. For the remaining 50 drugs, evidence supporting a risk of ALF was found for 42 drugs in other public databases.
In this case study, the knowledge buried in the textual data was extracted to identify drugs with the potential to cause ALF by applying topic modeling to the LiverTox database. This knowledge further guided the identification of additional drugs with similar potential, most of which could be verified and confirmed.
This study highlights the utility of topic modeling to leverage information within textual drug safety databases, which provides new opportunities in the big data era to assess drug safety.
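The workflow described in the abstract (fit a topic model to one text section per drug, pick a topic of interest, then flag the drugs that load heavily on it) can be illustrated with a minimal sketch. The keywords list Latent Dirichlet Allocation, so the sketch uses a small collapsed Gibbs sampler for LDA written with the Python standard library only. Everything else is an assumption made for illustration: the toy documents are invented and are not LiverTox text, the number of topics, the priors `alpha` and `beta`, and the 0.5 threshold are hypothetical, and the helper names `fit_lda` and `flag_documents` are not from the study. In the study the relevant topic was identified by the authors; the keyword lookup below is only a stand-in for that inspection step.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents.

    Returns (theta, top_words): per-document topic proportions and the
    up-to-five most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -row[v])[:5]]
        for row in topic_word
    ]
    return theta, top_words


def flag_documents(theta, names, topic, threshold):
    """Return names of documents whose weight on `topic` is at least `threshold`."""
    return [name for name, row in zip(names, theta) if row[topic] >= threshold]


# Invented toy documents (NOT LiverTox text), one tokenized section per drug.
toy_docs = {
    "drug_a": "jaundice bilirubin hepatocellular injury jaundice bilirubin".split(),
    "drug_b": "hepatocellular injury jaundice bilirubin aminotransferase".split(),
    "drug_c": "rash fever eosinophilia hypersensitivity rash fever".split(),
    "drug_d": "fever rash hypersensitivity eosinophilia".split(),
}
names = list(toy_docs)
theta, top_words = fit_lda(list(toy_docs.values()), n_topics=2)

# Stand-in for manual inspection: take the first topic whose top words
# include "jaundice" (falls back to topic 0 if none does).
target = next((k for k, words in enumerate(top_words) if "jaundice" in words), 0)
flagged = flag_documents(theta, names, target, threshold=0.5)
print("top words per topic:", top_words)
print("flagged drugs:", flagged)
```

On a real corpus, a maintained LDA implementation and proper preprocessing (stop-word removal, stemming, a tuned number of topics) would likely be preferable; the sampler here is kept small only so that the mechanics of topic assignment and document flagging are visible.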
Year
2014
DOI
10.1186/1471-2105-15-S17-S6
Venue
BMC Bioinformatics
Keywords
Acute Liver Failure, Topic Modeling, Latent Dirichlet Allocation, Phenelzine, Tolcapone
Field
Public health, Data science, Data mining, Latent Dirichlet allocation, Public domain, Biology, Drug development, Topic model, Bioinformatics, Drug
DocType
Journal
Volume
15 Suppl 17
Issue
S-17
ISSN
1471-2105
Citations
2
PageRank
0.39
References
2
Authors
7
Name
Order
Citations
PageRank
Ke Yu120.39
Jie Zhang24715.01
Minjun Chen3252.55
Xiaowei Xu46441683.89
Ayako Suzuki520.39
Katarina Ilic620.39
Weida Tong768450.10