Title
Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge.
Abstract
The massive and unprecedented volume of scientific literature readily available in biomedicine presents both challenges and opportunities for accelerating hypothesis generation. Advanced text mining techniques are required to leverage this abundant text, both to provide timely access to explicit facts and to help elucidate associations among implicit facts. The problem of inferring novel knowledge from these implicit facts by logically connecting independent fragments of literature is known as Literature Based Discovery (LBD). In LBD, discovering hidden links depends on determining the relevance between concepts using appropriate information measures. In this paper, to discover interesting and inherent links latent in large corpora, nine distinct methods, comprising variants of statistical information measures and semantic knowledge derived from a domain ontology, are designed and compared. To make the results easier to interpret, we split the methods into three groups. The first group includes traditional information measures such as mutual information, chi-square, and those used in association rule mining; the second group incorporates popular null-invariant correlation measures: All_Confidence, Kulczynski, and Cosine; the third group consists of null-invariant measures combined with our proposed notion of semantic relatedness. We also propose a preprocessing strategy that removes terms that are spurious, semantically unrelated, or unlikely to constitute a new discovery. A series of experiments is performed and analyzed for the proposed methods. In addition, we provide an organized list of final concepts deemed worthy of scientific investigation or experimentation. Overall, our research presents a comprehensive analysis and perspective of how different statistical information measures and semantic knowledge affect the knowledge discovery procedure.
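The null-invariant measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of All_Confidence, Kulczynski, and Cosine over document counts; the abstract does not give the paper's exact formulation, so the function names, parameters, and counts below are assumptions chosen for illustration. Lift, a measure commonly used in association rule mining, is included only as a contrast because it depends on the total number of documents.

```python
from math import sqrt

# Hypothetical inputs (not from the paper):
#   n_ab  = number of documents mentioning both concept A and concept B
#   n_a   = number of documents mentioning concept A
#   n_b   = number of documents mentioning concept B
#   total = number of documents in the corpus


def all_confidence(n_ab, n_a, n_b):
    """Co-occurrence count divided by the larger of the two concept counts."""
    return n_ab / max(n_a, n_b)


def kulczynski(n_ab, n_a, n_b):
    """Average of the two conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)


def cosine(n_ab, n_a, n_b):
    """Co-occurrence count divided by the geometric mean of the concept counts."""
    return n_ab / sqrt(n_a * n_b)


def lift(n_ab, n_a, n_b, total):
    """Observed co-occurrence relative to independence; depends on corpus size."""
    return (n_ab * total) / (n_a * n_b)


if __name__ == "__main__":
    n_ab, n_a, n_b = 20, 100, 400  # illustrative counts only
    print("all_confidence:", all_confidence(n_ab, n_a, n_b))
    print("kulczynski:", kulczynski(n_ab, n_a, n_b))
    print("cosine:", cosine(n_ab, n_a, n_b))
    # Adding documents that mention neither concept changes lift,
    # while the three measures above stay the same.
    print("lift, 1,000 documents:", lift(n_ab, n_a, n_b, 1_000))
    print("lift, 100,000 documents:", lift(n_ab, n_a, n_b, 100_000))
```

None of the three null-invariant functions takes the corpus size as an argument, which is the property that makes them robust when most documents mention neither concept, as is typical for pairs of biomedical terms in a large corpus.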
Year: 2016
DOI: 10.1145/2975167.2975200
Venue: BCB
Keywords: Literature based discovery, Semantic knowledge, Information measures, MeSH Terms
Field: Semantic similarity, Body of knowledge, Scientific literature, Domain knowledge, Computer science, Information science, Association rule learning, Literature-based discovery, Knowledge extraction, Bioinformatics
DocType: Conference
Citations: 1
PageRank: 0.36
References: 5
Authors: 2
Name
Order
Citations
PageRank
Kishlay Jha1497.83
Wei Jin28325.25