Title
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery.
Abstract
Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs while taking edge semantics into account. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology and for its applicability to real-world biomedical knowledge discovery.
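The role of an edge-type transition matrix in sampling can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the edge-type names, the weights in TRANSITION, and the function edge_type_walk are all hypothetical, and the matrix is fixed by hand here rather than trained by Expectation-Maximization. The sketch only shows how the type of the previously traversed edge could bias the choice of the next edge during a random walk.

```python
import random

# Hypothetical toy graph (not from the paper): node -> list of (neighbor, edge_type).
GRAPH = {
    "drugA": [("geneX", "inhibits")],
    "geneX": [("drugA", "inhibits"), ("geneY", "coexpressed"), ("diseaseD", "associated")],
    "geneY": [("geneX", "coexpressed")],
    "diseaseD": [("geneX", "associated")],
}

# Assumed edge-type transition weights, TRANSITION[previous_type][next_type].
# The abstract says this matrix is trained by Expectation-Maximization;
# the values below are made up purely for illustration.
TRANSITION = {
    "inhibits":    {"inhibits": 0.1, "coexpressed": 0.5, "associated": 0.9},
    "coexpressed": {"inhibits": 0.5, "coexpressed": 0.2, "associated": 0.6},
    "associated":  {"inhibits": 0.9, "coexpressed": 0.6, "associated": 0.1},
}


def edge_type_walk(graph, transition, start, length, rng):
    """Return a walk of up to `length` nodes, biased by edge-type transitions."""
    walk = [start]
    prev_type = None
    while len(walk) < length:
        candidates = graph.get(walk[-1], [])
        if not candidates:
            break  # dead end: no outgoing edges
        if prev_type is None:
            # First step: no previous edge type, so choose uniformly.
            weights = [1.0] * len(candidates)
        else:
            # Later steps: weight each candidate edge by the transition
            # weight from the previous edge type to the candidate's type.
            weights = [transition[prev_type][etype] for _, etype in candidates]
        if sum(weights) <= 0:
            break  # every continuation has zero weight
        node, prev_type = rng.choices(candidates, weights=weights, k=1)[0]
        walk.append(node)
    return walk


rng = random.Random(0)  # seeded so that repeated runs sample the same walks
walks = [edge_type_walk(GRAPH, TRANSITION, "drugA", 6, rng) for _ in range(3)]
for w in walks:
    print(" -> ".join(w))
```

Walks sampled this way could then serve as training sequences for an embedding model optimized by stochastic gradient descent, for example a skip-gram-style model; that step is omitted here.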
Year
2019
DOI
10.1186/s12859-019-2914-2
Venue
BMC Bioinformatics
Keywords
Knowledge graph, Heterogeneous network, Biomedical knowledge discovery, Representation learning, Graph embedding, Node embedding, Edge semantics, Applied machine learning, Data science, Linked data, Semantic web, Network science, Systems biology
Field
Graph, Data mining, Stochastic gradient descent, Embedding, Stochastic matrix, Computer science, Homogeneous, Theoretical computer science, Knowledge extraction, Semantics, Feature learning
DocType
Journal
Volume
20
Issue
1
ISSN
1471-2105
Citations
3
PageRank
0.38
References
7
Authors
11
Name | Order | Citations, PageRank
Zheng Gao | 1 | 93.84
Gang Fu | 2 | 2079.67
Chunping Ouyang | 3 | 63.35
Satoshi Tsutsui | 4 | 205.84
Xiaozhong Liu | 5 | 36848.27
Jeremy J. Yang | 6 | 434.62
Christopher Gessner | 7 | 30.38
Brian Foote | 8 | 30.38
David Wild | 9 | 584.61
Qi Yu | 10 | 30.72
Ying Ding | 11 | 2396144.65