Title
A Data Set for Social Diversity Studies of GitHub Teams
Abstract
Like any other team-oriented activity, the software development process is affected by social diversity in programmer teams. The effect of team diversity can be significant, but also complex, especially in decentralized teams. Discerning the precise contribution of diversity to teams' effectiveness requires quantitative studies of large data sets. Here we present for the first time a large data set of social diversity attributes of programmers in GitHub teams. Using alias resolution, location data, and gender inference techniques, we collected a team social diversity data set of 23,493 GitHub projects. We illustrate how the data set can be used in practice with a series of case studies, and we hope its availability will foster more interest in studying diversity issues in software teams.
Year: 2015
DOI: 10.1109/MSR.2015.77
Venue: MSR
Field: Data mining, Alias, Data set, Programmer, Computer science, Inference, Knowledge management, Cultural diversity, Software, Software development process, Team software process
DocType: Conference
Volume: 2
ISBN: 978-0-7695-5594-2
Citations: 25
PageRank: 0.86
References: 17
Authors: 3
Name
Order
Citations
PageRank
Bogdan Vasilescu193548.75
Alexander Serebrenik21745150.69
Vladimir Filkov3150375.32