Title
Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem
Abstract
In this technological era, people, authorities, entrepreneurs, businesses, and many everyday objects are connected to the Internet, forming the Internet of Things (IoT). The IoT generates a massive amount of diverse, high-velocity data, termed Big Data. This data is a valuable asset that businesses, organizations, and authorities can use to make predictions in various domains. However, processing Big Data efficiently while making real-time decisions is a challenging task. Tools such as Hadoop are used to process large datasets, but they do not perform well for real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a mechanism equivalent to Hadoop MapReduce onto graphics processing units (GPUs). We integrate the parallel and distributed environment of the Hadoop ecosystem with Spark, a real-time stream processing tool, and with GPUs, making the system powerful enough to handle an overwhelming volume of high-speed streaming data. We designed a MapReduce-equivalent algorithm for GPUs that calculates a statistical parameter by dividing the overall Big Data files into fixed-size blocks. Finally, the system is evaluated in terms of efficiency (processing time and throughput) using (1) large city traffic video data captured by static and moving vehicles' cameras, while identifying vehicles, and (2) large text-based files, such as Twitter data files and structured data. Results show that the proposed system, with Spark on top and GPUs within the parallel and distributed environment of the Hadoop ecosystem, is more efficient and better suited to real-time processing than existing standalone CPU-based MapReduce implementations.
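The block-wise MapReduce idea described in the abstract can be illustrated with a small CPU-only Python sketch. The abstract does not say which statistical parameter is computed, so this sketch assumes the mean and population variance; the block size and the names `split_blocks`, `map_block`, `reduce_partials`, `mean_and_variance`, and `Partial` are hypothetical and are not taken from the paper. Each fixed-size block is summarized independently (the map step, which is the part a GPU or a Spark executor could run in parallel), and the per-block summaries are then combined (the reduce step).

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical fixed block size (records per block); not a value from the paper.
BLOCK_SIZE = 4


@dataclass(frozen=True)
class Partial:
    """Per-block summary: enough to merge blocks without revisiting the data."""
    count: int
    total: float
    total_sq: float


def split_blocks(values: Sequence[float], block_size: int) -> List[Sequence[float]]:
    """Divide the input into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def map_block(block: Sequence[float]) -> Partial:
    """Map step: summarize one block independently of all other blocks."""
    return Partial(len(block), float(sum(block)), float(sum(x * x for x in block)))


def reduce_partials(partials: Iterable[Partial]) -> Partial:
    """Reduce step: element-wise sums, so blocks can be merged in any order."""
    count, total, total_sq = 0, 0.0, 0.0
    for p in partials:
        count += p.count
        total += p.total
        total_sq += p.total_sq
    return Partial(count, total, total_sq)


def mean_and_variance(values: Sequence[float],
                      block_size: int = BLOCK_SIZE) -> Tuple[float, float]:
    """Mean and population variance computed block by block.

    Uses E[x^2] - E[x]^2, which is simple but can lose precision when the
    mean is large relative to the spread of the data.
    """
    result = reduce_partials(map_block(b) for b in split_blocks(values, block_size))
    if result.count == 0:
        raise ValueError("no data")
    mean = result.total / result.count
    variance = result.total_sq / result.count - mean * mean
    return mean, variance


if __name__ == "__main__":
    # Blocks of 3: [2, 4, 4], [4, 5, 5], [7, 9]
    print(mean_and_variance([2, 4, 4, 4, 5, 5, 7, 9], block_size=3))  # (5.0, 4.0)
```

Keeping only a count, a sum, and a sum of squares per block makes the reduce step associative, so partial results from different blocks (or different devices) can be merged in any order. For data whose mean is large relative to its spread, a numerically stabler merge, such as the parallel form of Welford's algorithm, would be preferable.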
Year
2018
DOI
10.1007/s10766-017-0513-2
Venue
International Journal of Parallel Programming
Keywords
Big Data, Hadoop, Spark, GPU, MapReduce
Field
Graphics, Spark (mathematics), Distributed Computing Environment, Computer science, Parallel computing, Throughput, Data file, Stream processing, Big data, Database, The Internet
DocType
Journal
Volume
Issue
ISSN
46
3
0885-7458
Citations
8
PageRank
0.54
References
22
Authors
5
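1. Muhammad Mazhar Ullah Rathore (Citations: 301; PageRank: 21.15)
2. Hojae Son (Citations: 10; PageRank: 1.30)
3. Awais Ahmad (Citations: 379; PageRank: 45.85)
4. Anand Paul (Citations: 527; PageRank: 46.32)
5. Gwanggil Jeon (Citations: 596; PageRank: 117.99)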