Title
Heterogeneous subgraph features for information networks
Abstract
Networks play an increasingly important role in modelling real-world systems due to their utility in representing complex connections. For predictive analyses, the engineering of node features in such networks is of fundamental importance to machine learning applications, where the lack of external information often introduces the need for features that are based purely on network topology. Existing feature extraction approaches have so far focused primarily on networks with just one type of node and thereby disregarded the information contained in the topology of heterogeneous networks, or used domain-specific approaches that incorporate node labels based on external knowledge. Here, we generalize the notion of heterogeneity and present an approach for the efficient extraction and representation of heterogeneous subgraph features. We evaluate their performance for rank- and label-prediction tasks and explore the implications of feature importance for prominent subgraphs. Our experiments reveal that heterogeneous subgraph features reach the predictive power of manually engineered features that incorporate domain knowledge. Furthermore, we find that heterogeneous subgraph features outperform state-of-the-art neural node embeddings in both tasks and across all data sets.
Year: 2018
DOI: 10.1145/3210259.3210266
Venue: GRADES/NDA@SIGMOD/PODS
Field: Data mining, Data set, Information networks, Predictive power, Domain knowledge, Computer science, Network topology, Feature extraction, Feature engineering, Heterogeneous network
DocType: Conference
ISBN: 978-1-4503-5695-4
Citations: 1
PageRank: 0.35
References: 24
Authors (7)
Name              Order  Citations  PageRank
Andreas Spitz     1      49         9.19
Diego Costa       2      7          1.12
Kai Chen          3      20         8.16
Jan Greulich      4      1          0.35
Johanna Geiss     5      21         3.51
Stefan Wiesberg   6      24         4.14
Michael Gertz     7      37         5.55