Title
Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature.
Abstract
Detecting microRNA (miRNA) mentions in scientific literature helps researchers find relevant literature through queries formulated with miRNA information. Because most published biological studies describe signal transduction pathways or genetic regulatory information in figure captions, extracting miRNAs from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analysis of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components in the text mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous pattern-based, dictionary-based and supervised learning approaches and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, which makes it possible to understand why the system considers a span of text to be a miRNA mention, and the represented knowledge can be further complemented by domain experts. We studied the ambiguity level of miRNA nomenclature in order to connect miRNA mentions to the Rfam database, and evaluated the performance of our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, extending the latter with additional Rfam normalization information. Our study provides a better understanding of the challenges associated with miRNA identification and normalization in scientific literature and highlights the research gaps that need to be explored in future studies. Database URL: https://bigodatamining.github.io/software/201901/
Year: 2019
DOI: 10.1093/database/baz030
Venue: DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION
Field: Data science, Scientific literature, Data mining, Computer science
DocType: Journal
Volume: 2019
ISSN: 1758-0463
Citations: 0
PageRank: 0.34
References: 12
Authors: 7
Name                    Order  Citations  PageRank
Hong-Jie Dai            1      288        21.58
Chen-Kai Wang           2      0          1.35
Nai-Wen Chang           3      24         3.94
Ming-Siang Huang        4      7          2.13
Jitendra Jonnagaddala   5      46         10.28
Feng-Duo Wang           6      0          0.68
Wen-Lian Hsu            7      1701       198.40