Title
An adaptive information-theoretic approach for identifying temporal correlations in big data sets
Abstract
In the past two decades, new developments in computing, sensing and crowdsourced data have resulted in an explosion in the availability of quantitative information. The possibilities of analyzing this so-called “big data” to inform research and the decision-making process are virtually endless. In general, analyses have to be done across multiple data sets in order to extract the most value from big data. An important first step is to identify temporal correlations between data sets. Given the characteristics of big data in terms of volume and velocity, techniques that identify correlations not only need to be scalable, but also need to help users order the correlations across temporal resolutions so that they can focus on important relationships. There is a large body of work in this area; however, most existing approaches either deal only with small data sets, use a fixed temporal resolution, or do not provide a quantifiable measure of a correlation's significance. In this paper, we present a method based on mutual information to identify correlations in large data sets. Discovered correlations are suggested to users in an order based on their significance. Our method supports an adaptive streaming technique that minimizes duplicated computation and is implemented on top of Apache Spark for scalability using big data platforms. We also provide a comprehensive evaluation using real-world data sets from NYC Open Data, and compare our findings against a recent study.
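The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method: a plug-in (histogram) estimate of mutual information between two equal-length series, evaluated at several temporal resolutions and sorted by score. The function names, the equal-width binning, the non-overlapping window aggregation and the default parameters are all assumptions made for illustration. The sketch omits the paper's significance measure, adaptive streaming and Spark implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Plug-in estimate of I(X;Y) in bits using equal-width bins (illustrative)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")

    def discretize(values):
        lo, hi = min(values), max(values)
        # A constant series has zero range; fall back to width 1 so all
        # values land in bin 0.
        width = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def aggregate(values, window):
    """Sum consecutive non-overlapping windows to get a coarser resolution."""
    return [
        sum(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


def rank_by_resolution(xs, ys, windows=(1, 2, 4), bins=4):
    """Return (window, MI estimate) pairs, highest estimate first."""
    scores = [
        (w, mutual_information(aggregate(xs, w), aggregate(ys, w), bins))
        for w in windows
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Histogram estimates of mutual information are biased when few samples remain after aggregation, so raw scores at coarse resolutions should be compared with care. A practical ranking would need a significance measure of the kind the abstract mentions.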
Year
2016
DOI
10.1109/BigData.2016.7840659
Venue
2016 IEEE International Conference on Big Data (Big Data)
Keywords
temporal correlation, mutual information, Big Data, adaptive sliding window, streaming
Field
Data mining, Open data, Time series, Data set, Spark (mathematics), Small data, Computer science, Mutual information, Artificial intelligence, Big data, Machine learning, Scalability
DocType
Conference
ISBN
978-1-4673-9006-4
Citations
0
PageRank
0.34
References
14
Authors
3
Name | Order | Citations | PageRank
Nguyen Ho | 1 | 0 | 1.69
Huy T. Vo | 2 | 1035 | 61.10
Mai Vu | 3 | 332 | 25.56