Title
Predicting gene-disease associations from the heterogeneous network using graph embedding
Abstract
The discovery of gene-disease associations is important for the prevention, diagnosis and treatment of diseases. Studies on gene-disease associations have produced diverse data that can facilitate gene-disease association prediction, and integrating this diverse information is critical for developing high-accuracy prediction models. In this paper, we propose a heterogeneous network-based method, abbreviated as “HNEEM”, that enhances gene-disease association prediction by using graph embedding and ensemble learning. To combine diverse information, a heterogeneous network is constructed from gene-disease associations, gene-chemical associations and disease-chemical associations. The network uses genes, diseases and chemicals as nodes, and their associations as edges. Graph embedding methods are used to extract representation vectors of the nodes in the heterogeneous network, the feature vectors of genes and diseases are merged to represent gene-disease pairs, and a random forest is employed to build the prediction model on these pairs. We consider six types of graph embedding methods, build a prediction model from the features generated by each individual method, use these models as base predictors, and then combine the base predictors to develop the ensemble learning model HNEEM. We comprehensively compare the different graph embedding methods, and the results demonstrate that graph embedding methods produce satisfactory results in gene-disease association prediction and that integrating different graph embedding methods yields further improvements. In computational experiments, HNEEM produces better results than state-of-the-art gene-disease prediction methods, and HNEEM is also robust to data richness. Moreover, the usefulness of HNEEM is validated by case studies. In conclusion, HNEEM is a promising method for predicting gene-disease associations.
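The pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy associations are invented, two SVD-based embeddings stand in for the six graph embedding methods (which the abstract does not name), gene and disease vectors are merged by concatenation, and base predictors are combined by averaging their predicted probabilities. The function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy associations (hypothetical identifiers, not data from the paper).
gene_disease = [("g1", "d1"), ("g2", "d1"), ("g2", "d2"), ("g3", "d3"), ("g4", "d3")]
gene_chemical = [("g1", "c1"), ("g2", "c1"), ("g3", "c2"), ("g4", "c2")]
disease_chemical = [("d1", "c1"), ("d2", "c1"), ("d3", "c2")]


def build_adjacency(edge_lists):
    """Build a symmetric adjacency matrix with genes, diseases and chemicals as nodes."""
    nodes = sorted({n for edges in edge_lists for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for edges in edge_lists:
        for u, v in edges:
            adj[index[u], index[v]] = adj[index[v], index[u]] = 1.0
    return adj, index


def svd_embedding(matrix, dim):
    """Stand-in embedding (assumption): scaled left singular vectors of a matrix."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * np.sqrt(s[:dim])


def pair_features(emb, index, pairs):
    """Merge gene and disease vectors by concatenation (assumed merge operation)."""
    return np.array([np.concatenate([emb[index[g]], emb[index[d]]]) for g, d in pairs])


adj, index = build_adjacency([gene_disease, gene_chemical, disease_chemical])
genes = sorted(n for n in index if n.startswith("g"))
diseases = sorted(n for n in index if n.startswith("d"))

# Every gene-disease pair is a sample; known associations are the positives.
pairs = [(g, d) for g in genes for d in diseases]
known = set(gene_disease)
labels = np.array([1 if p in known else 0 for p in pairs])

# Two embeddings: one from direct links, one from two-step connectivity.
embeddings = [svd_embedding(adj, 2), svd_embedding(adj @ adj, 2)]

# One random forest per embedding serves as a base predictor.
base_scores = []
for emb in embeddings:
    X = pair_features(emb, index, pairs)
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
    base_scores.append(model.predict_proba(X)[:, 1])

# Ensemble step (assumed): average the base predictors' probabilities.
ensemble_scores = np.mean(base_scores, axis=0)
```

The sketch scores the same pairs it was trained on, only to keep it short. A real evaluation would hold out associations and remove them from the network before computing the embeddings.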
Year
2019
DOI
10.1109/BIBM47256.2019.8983134
Venue
2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Keywords
gene-disease association, heterogeneous network, graph embedding
Field
Feature vector, Computer science, Graph embedding, Artificial intelligence, Heterogeneous network, Predictive modelling, Random forest, Ensemble learning, Machine learning
DocType
Conference
ISSN
2156-1125
ISBN
978-1-7281-1868-0
Citations
1
PageRank
0.34
References
0
Authors
4
Name             Order   Citations   PageRank
Xiaochan Wang    1       1           0.34
Yuchong Gong     2       5           0.75
Jing Yi          3       3           1.06
Wen Zhang        4       3           1.75