Abstract |
---|
We conjecture that meaningful analysis of large-scale provenance can be preserved by analyzing provenance data in limited memory while the data is still in motion; that is, the provenance need not be fully resident before analysis can occur. As a proof of concept, this paper defines a stream model for reasoning about Big Data provenance while it is in motion. We propose a novel streaming algorithm for the backward provenance query and apply it to live provenance captured from agent-based simulations. The performance test, run in a distributed stream processing framework built on Apache Kafka and Spark Streaming, demonstrates high throughput, low latency, and good scalability. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-40593-3_4 | IPAW |
Keywords | Field | DocType |
Live data provenance, Stream processing, Agent-Based model | Data mining, Spark (mathematics), Streaming algorithm, Computer science, Provenance, Proof of concept, Latency (engineering), Stream processing, Big data, Database, Scalability | Conference |
Volume | ISSN | Citations |
9672 | 0302-9743 | 2 |
PageRank | References | Authors |
0.36 | 21 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peng Chen | 1 | 31 | 3.28 |
Tom Evans | 2 | 2 | 0.36 |
Beth Plale | 3 | 1837 | 142.80 |