Title
Gene Set Databases - A Fountain of Knowledge or a Siren Call?
Abstract
Gene set analysis is a well-established approach for analyzing high-throughput gene expression data. The choice of gene set database may affect the outcome of the analysis, so understanding the characteristics of these databases is vital to the success of gene set analysis. Because of the sheer size of gene set databases, a comprehensive qualitative evaluation of them is impractical. In this paper, we quantitatively study several well-established gene set databases. We propose and use a quantitative measure for assessing the similarity between gene set databases. We also introduce the presence score, which quantifies the degree to which a given gene is represented in a database, and the permeability score, which quantifies the degree to which genes in a given list co-occur in the gene sets of a database. A maximum achievable coverage score is defined based on the permeability score. Using the maximum achievable coverage score, we propose a methodology to statistically determine whether a phenotype of interest is well represented in a given database. To study the effect of the choice of gene set database on the result of gene set analysis and to show the utility of the maximum achievable coverage score, we conduct an experiment using two widely used gene set analysis methods and three expression datasets. The results suggest that the choice of gene set database might profoundly affect the outcome of the analysis. Our findings also show that the permeability score and the maximum achievable coverage score can be used to guide the selection of an appropriate gene set database for a given study.
Year
2019
DOI
10.1145/3307339.3342146
Venue
BCB
Field
World Wide Web, Computer science, Fountain, Artificial intelligence, Siren (mythology), Machine learning
DocType
Conference
ISBN
978-1-4503-6666-3
Citations
0
PageRank
0.34
References
0
Authors
6
Name | Order | Citations | PageRank
Farhad Maleki | 1 | 11 | 4.12
Katie Ovens | 2 | 0 | 1.69
Ian McQuillan | 3 | 97 | 24.72
Elham Rezaei | 4 | 0 | 0.34
Alan M. Rosenberg | 5 | 0 | 0.34
Anthony J. Kusalik | 6 | 113 | 19.69