Abstract |
---|
Apache Spark is one of the most popular big data tools. Despite its popularity, there are no studies regarding its overall usage among software developers. As such, essential questions remain unanswered. For instance, it is not known what the common issues faced by Spark users are, what the most popular Spark libraries are, or what technologies are most commonly used together with Spark. In this paper, we mine Stack Overflow questions to shed some light on these issues. Specifically, we first apply Latent Dirichlet Allocation (LDA) to Stack Overflow questions and obtain the main topics of discussion. By computing previously proposed metrics and a novel modification, we provide insights into Spark usage while taking question view count into account. Further insights are then given by applying newly proposed metrics to the question tags. Temporal trends are finally discussed after analyzing the proposed metrics over time. |
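
The abstract does not define its metrics, so the following is only an illustrative sketch of what "taking question view count into account" could look like. It assumes that LDA has already produced a per-question topic distribution, and it compares a plain mean topic share with a view-weighted one. The function names, the weighting scheme, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def topic_share(doc_topics: Sequence[Sequence[float]]) -> List[float]:
    """Mean topic proportion across all questions (unweighted).

    Assumption: doc_topics[d][k] is the LDA proportion of topic k in question d.
    """
    if not doc_topics:
        raise ValueError("doc_topics must not be empty")
    n_docs = len(doc_topics)
    n_topics = len(doc_topics[0])
    return [sum(d[k] for d in doc_topics) / n_docs for k in range(n_topics)]


def view_weighted_topic_share(
    doc_topics: Sequence[Sequence[float]], views: Sequence[int]
) -> List[float]:
    """Topic share where each question counts in proportion to its view count.

    Hypothetical variant: this is one plausible way to weight by views,
    not the modification proposed in the paper.
    """
    if len(doc_topics) != len(views):
        raise ValueError("doc_topics and views must have the same length")
    total_views = sum(views)
    if total_views <= 0:
        raise ValueError("total view count must be positive")
    n_topics = len(doc_topics[0])
    return [
        sum(v * d[k] for d, v in zip(doc_topics, views)) / total_views
        for k in range(n_topics)
    ]


# Made-up example: three questions, two topics.
sample_topics = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
sample_views = [100, 10, 10]

unweighted = topic_share(sample_topics)
weighted = view_weighted_topic_share(sample_topics, sample_views)
```

In this made-up sample, the first question dominates the view count, so the weighted share shifts toward its main topic compared with the unweighted mean. A view-aware metric is meant to expose this kind of difference between how often a topic is asked about and how often it is read.
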
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/BigDataCongress.2018.00037 | 2018 IEEE International Congress on Big Data (BigData Congress) |
Keywords | Field | DocType |
Apache Spark,mining software repositories,Stack Overflow,topic modeling | Data science,Latent Dirichlet allocation,Spark (mathematics),Computer science,Popularity,Software,Stack overflow,Big data,Database,Market research | Conference |
ISSN | ISBN | Citations |
2379-7703 | 978-1-5386-7233-4 | 1 |
PageRank | References | Authors |
0.35 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Leonardo Jiménez Rodríguez | 1 | 1 | 0.35 |
Xiaoran Wang | 2 | 1 | 0.69 |
Jilong Kuang | 3 | 38 | 17.00 |