Title
Surveying the MOOC Data Set Universe
Abstract
This paper surveys the availability of open data sets generated from Massive Open Online Courses (MOOCs). Such log data allows researchers to analyze and predict student performance, often with the goal of identifying at-risk students who are unlikely to finish a course. There is a growing gap between the needs of the average researcher, who lacks access to proprietary data, and the data sets readily available for analysis. Most research papers that study and predict student performance in MOOCs rely on proprietary data sets that are neither anonymized (de-identified) nor released for general study. No standardized tools provide a gateway to usable data sets; instead, the researcher must navigate a maze of sites with different data structures and varying data access policies. To our knowledge, no open data sets have been produced since 2016. The authors survey the history of MOOC data sharing, identify the few available open data sets, and discuss a path forward to increase the reproducibility of MOOC research.
Year
2019
DOI
10.1109/LWMOOCS47620.2019.8939594
Venue
2019 IEEE Learning With MOOCS (LWMOOCS)
Keywords
MOOC, weblog, analysis, edx2bigquery, Google BigQuery, anonymized data set, de-identification, MOOCdb, Moodle, Educational Data Mining, Learning Analytics, Learning at Scale, Limeade
DocType
Conference
ISBN
978-1-7281-2550-3
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
James J. Lohse, 1, 0, 0.34
Christine A. McManus, 2, 0, 0.34
David Joyner, 3, 9, 8.40