Title
Analysis of Memory Constrained Live Provenance.
Abstract
We conjecture that meaningful analysis of large-scale provenance can be preserved by analyzing provenance data in limited memory while the data is still in motion; that is, the provenance need not be fully resident before analysis can occur. As a proof of concept, this paper defines a stream model for reasoning about Big Data provenance in motion. We propose a novel streaming algorithm for the backward provenance query and apply it to live provenance captured from agent-based simulations. The performance test, run in a distributed stream processing framework built on Apache Kafka and Spark Streaming, demonstrates high throughput, low latency, and good scalability.
Year
2016
DOI
10.1007/978-3-319-40593-3_4
Venue
IPAW
Keywords
Live data provenance, Stream processing, Agent-based model
Field
Data mining, Spark (mathematics), Streaming algorithm, Computer science, Provenance, Proof of concept, Latency (engineering), Stream processing, Big data, Database, Scalability
DocType
Conference
Volume
9672
ISSN
0302-9743
Citations
2
PageRank
0.36
References
21
Authors
3
Name | Order | Citations | PageRank
Peng Chen | 1 | 31 | 3.28
Tom Evans | 2 | 2 | 0.36
Beth Plale | 3 | 1837 | 142.80