Title
Mining the Network of the Programmers: A Data-Driven Analysis of GitHub
Abstract
GitHub is a popular worldwide website for version control and source code management. Because its users can follow each other, it also forms a professional social network of millions of users. In this work, we perform a data-driven study of the GitHub network. By introducing a distributed crawling framework, we first collect profiles and behavioral data of more than 2 million GitHub users. To the best of our knowledge, this is the largest and latest public dataset of GitHub. Then, we build the social graph of these users and conduct a thorough analysis of the network structure. Moreover, we investigate user behavior patterns, particularly the patterns of "commit" activities. Finally, we use machine learning methods to discover important users in the network with high accuracy and low overhead. Our findings can help GitHub provide better services for its users.
Year: 2017
DOI: 10.1145/3127404.3127431
Venue: 12th Chinese Conference on Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2017)
Keywords: GitHub, professional social networks, PageRank, machine learning, spatial-temporal analysis
Field: Data science, PageRank, World Wide Web, Social network, Social graph, Crawling, Data-driven, Commit, Computer science, Human–computer interaction, Behavioral data, Network structure
DocType: Conference
Citations: 0
PageRank: 0.34
References: 10
Authors: 5
Name         Order  Citations  PageRank
Yezhou Ma    1      0          0.34
Huiying Li   2      2          0.73
Jiyao Hu     3      7          2.16
Rong Xie     4      0          0.34
Yang Chen    5      375        33.50