Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem. - Citegraph

Paper Info

Title
Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem.

Abstract
In this technological era, people, authorities, entrepreneurs, businesses, and many of the things around us are connected to the internet, forming the Internet of Things (IoT). This generates a massive amount of diverse data at very high speed, termed Big Data. Such data is a valuable asset that businesses, organizations, and authorities can use to predict the future in various respects. However, processing Big Data efficiently while making real-time decisions is quite challenging. Tools such as Hadoop are used for processing big datasets, but they do not perform well on real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a Hadoop MapReduce equivalent mechanism onto graphics processing units (GPUs). We integrate the parallel and distributed environment of the Hadoop ecosystem and a real-time stream processing tool, i.e., Spark, with GPUs to make the system powerful enough to handle the overwhelming amount of high-speed streaming data. We design a MapReduce equivalent algorithm for GPUs for statistical parameter calculation by dividing the overall Big Data files into fixed-size blocks. Finally, the system is evaluated for efficiency (processing time and throughput) using (1) large city traffic video data captured by the cameras of static as well as moving vehicles, in which vehicles are identified, and (2) large text-based files, such as Twitter data files and structural data. Results show that the proposed system, with Spark on top and GPUs under the parallel and distributed environment of the Hadoop ecosystem, is more efficient and better suited to real-time processing than the existing standalone CPU-based MapReduce implementation.
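
The block-wise idea mentioned in the abstract can be illustrated with a minimal CPU-only sketch. This is not the paper's GPU algorithm, which the abstract does not detail. It only assumes that the "statistical parameter" is a mean and population variance, and the names `split_into_blocks`, `map_block`, `combine`, and `mean_and_variance` are hypothetical. Each fixed-size block is mapped to a small partial result, and the partials are then reduced into the final statistics.

```python
from functools import reduce
from typing import Iterator, List, Tuple

# Partial result for one block: (count, sum, sum of squares).
Partial = Tuple[int, float, float]


def split_into_blocks(values: List[float], block_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def map_block(block: List[float]) -> Partial:
    """Map step: summarize one block independently of all others."""
    return (len(block), sum(block), sum(x * x for x in block))


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial results (associative and commutative)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(values: List[float], block_size: int) -> Tuple[float, float]:
    """Compute the mean and population variance from per-block partials."""
    partials = map(map_block, split_into_blocks(values, block_size))
    count, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    if count == 0:
        raise ValueError("no data")
    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance([1, 2, 3, 4, 5, 6, 7, 8], block_size=3))
```

Because `map_block` touches only its own block and `combine` is associative, the blocks could in principle be processed in parallel (for example, one per GPU thread block or Spark partition) without changing the result. How the paper actually schedules this work is not stated in the abstract.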

Year	DOI	Venue
2018	10.1007/s10766-017-0513-2	International Journal of Parallel Programming
Keywords	Field	DocType
Big Data, Hadoop, Spark, GPU, MapReduce	Graphics,Spark (mathematics),Distributed Computing Environment,Computer science,Parallel computing,Throughput,Data file,Stream processing,Big data,Database,The Internet	Journal
Volume	Issue	ISSN
46	3	0885-7458
Citations	PageRank	References
8	0.54	22
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Muhammad Mazhar Ullah Rathore	1	301	21.15
Hojae Son	2	10	1.30
Awais Ahmad	3	379	45.85
Anand Paul	4	527	46.32
Gwanggil Jeon	5	596	117.99