Title
Incompleteness in Networks: Biases, Skewed Results, and Some Solutions
Abstract
Most network analysis is conducted on existing, incomplete samples of much larger, fully observed graphs. For example, many researchers obtain graphs from online data repositories without knowing how these graphs were collected. Thus, these graphs can be poor representations of the fully observed networks. More complete data would lead to more accurate analyses, but data acquisition can be at best costly and at worst error-prone. For example, consider an adversary who deliberately poisons the answer to a query. Given a query budget for identifying additional nodes and edges, how can one improve the observed graph sample so that it is a more accurate representation of the complete, fully observed network? How does the approach change if one is interested in learning the best function (e.g., a node classifier) on the network for a downstream task? This is a novel problem that is related to, but distinct from, topics such as graph sampling and crawling. Given the prevailing use of graph samples in the research literature, this problem is of considerable importance, even though it has been ignored. In this tutorial, we discuss latent biases in incomplete networks and present methods for enriching such networks through active probing of nodes and edges. We focus on active learning and sequential decision-making formulations of this problem (a.k.a. the network discovery problem). We present distinctions between learning to grow the network (a.k.a. active exploration) and learning the "best" function on the network (a.k.a. active learning). In addition, we discuss issues surrounding adversarial machine learning when querying for more data to reduce incompleteness.
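As a rough illustration of the budgeted probing setting described in the abstract, the sketch below grows an observed sample by querying nodes one at a time. It is not the method presented in the tutorial: the function names `probe` and `explore`, the toy graph, and the greedy rule (probe the unprobed node with the highest observed degree) are all assumptions made for this example, and the oracle is assumed to answer truthfully.

```python
def probe(full_graph, node):
    """Hypothetical query oracle: return every neighbor of `node` in the full graph."""
    return set(full_graph[node])


def explore(full_graph, observed, budget):
    """Grow an observed sample by probing at most `budget` nodes.

    Assumed heuristic (not from the tutorial): always probe the unprobed
    node with the highest observed degree, breaking ties by smallest label.
    """
    grown = {u: set(vs) for u, vs in observed.items()}  # copy; input is left intact
    probed = set()
    for _ in range(budget):
        candidates = sorted(u for u in grown if u not in probed)
        if not candidates:
            break  # nothing left to query
        # max() keeps the first maximal element, so ties go to the smallest label
        target = max(candidates, key=lambda u: len(grown[u]))
        probed.add(target)
        for v in probe(full_graph, target):
            grown[target].add(v)
            grown.setdefault(v, set()).add(target)  # keep the graph undirected
    return grown


# Toy fully observed network (adjacency sets); purely illustrative.
full_graph = {
    0: {1, 2, 3},
    1: {0, 4},
    2: {0, 5},
    3: {0},
    4: {1, 6},
    5: {2},
    6: {4},
}
# Initial incomplete sample: only the edge 0-1 is known.
observed = {0: {1}, 1: {0}}

grown = explore(full_graph, observed, budget=2)
num_edges = sum(len(vs) for vs in grown.values()) // 2
```

The budget caps the number of queries rather than the number of edges learned, which matches the framing of data acquisition as the costly step. A learned or adaptive selection rule, or an oracle whose answers may be poisoned, would replace `max` and `probe` respectively.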
Year: 2019
DOI: 10.1145/3292500.3332276
Keywords: active exploration, active learning, incomplete networks, online learning, sequential decision making
Field: Data mining, Computer science, Artificial intelligence, Machine learning
DocType: Conference
ISSN: 978-1-4503-6201-6
ISBN: 978-1-4503-6201-6
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Tina Eliassi-Rad | 1 | 1597 | 108.63
Rajmonda Caceres | 2 | 7 | 4.88
Timothy LaRock | 3 | 0 | 1.35