Title
On data collection, graph construction, and sampling in Twitter.
Abstract
We present a detailed study on data collection, graph construction, and sampling in Twitter. We observe that sampling on semantic graphs (i.e., graphs with multiple edge types) presents fundamentally distinct challenges from sampling on traditional graphs. The purpose of our work is to present new challenges and initial solutions for sampling semantic graphs. Novel elements of our work include the following: (1) We provide a thorough discussion of problems encountered with naive breadth-first search on semantic graphs. We argue that common sampling methods such as breadth-first search face specific challenges on semantic graphs that are not encountered on graphs with homogeneous edge types. (2) We present two competing methods for creating semantic graphs from data collects, corresponding to the interactions between sampling of different edge types. (3) We discuss new metrics specific to graphs with multiple edge types, and discuss the effect of the sampling method on these metrics. (4) We discuss issues and potential solutions pertaining to sampling semantic graphs.
Year
2016
DOI
10.5555/3192424.3192611
Venue
ASONAM '16: Advances in Social Networks Analysis and Mining 2016, Davis, California, August 2016
Field
Data collection, Data mining, Graph, Social network, Computer science, Homogeneous, Theoretical computer science, Sampling (statistics), Artificial intelligence, Information propagation, Machine learning, Semantics
DocType
Conference
ISBN
978-1-5090-2846-7
Citations
2
PageRank
0.42
References
11
Authors
4
Name                    Order  Citations  PageRank
Jeremy D. Wendt         1      3          2.13
Randy Wells             2      2          0.42
Richard V. Field Jr.    3      4          1.47
Sucheta Soundarajan     4      120        15.00