Abstract | ||
---|---|---|
In the current technological era, people, authorities, entrepreneurs, businesses, and many objects around us are connected to the Internet, forming the Internet of Things (IoT). This generates a massive amount of diverse, high-velocity data, termed Big Data. This data is a valuable asset that businesses, organizations, and authorities can use to predict future trends in various domains. However, processing Big Data efficiently while making real-time decisions is a challenging task. Tools such as Hadoop are used to process large datasets, but they do not perform well for real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a Hadoop MapReduce-equivalent mechanism onto graphics processing units (GPUs). We integrate the parallel and distributed environment of the Hadoop ecosystem and a real-time stream processing tool, Spark, with GPUs, making the system powerful enough to handle an overwhelming volume of high-speed streaming data. We design a MapReduce-equivalent algorithm for GPUs that calculates statistical parameters by dividing Big Data files into fixed-size blocks. Finally, the system is evaluated in terms of efficiency (processing time and throughput) using (1) large city traffic video data, captured by static as well as moving vehicles’ cameras, for vehicle identification and (2) large text-based files, such as Twitter data files and structural data. Results show that the proposed system, with Spark on top and GPUs under the parallel and distributed environment of the Hadoop ecosystem, is more efficient and better suited to real-time processing than the existing standalone CPU-based MapReduce implementation. |
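The block-based MapReduce scheme described in the abstract can be illustrated with a small CPU-only sketch. It assumes the statistical parameters are count, mean, and population variance (the abstract does not say which), and the function names (`split_into_blocks`, `map_block`, `merge_partials`, `block_mapreduce_stats`) and default block size are hypothetical. A Python thread pool stands in for GPU threads or Spark executors; this is not the authors' implementation. Each fixed-size block is mapped to a partial summary (count, mean, sum of squared deviations), and the partials are reduced with the standard pairwise merge formula for parallel variance (Chan et al.).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sketch: a thread pool stands in for GPU threads / Spark executors.


def split_into_blocks(values, block_size):
    """Divide the input into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def map_block(block):
    """Map step: summarize one non-empty block as (count, mean, M2).

    M2 is the sum of squared deviations from the block mean.
    """
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return n, mean, m2


def merge_partials(a, b):
    """Reduce step: merge two (count, mean, M2) summaries (Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def block_mapreduce_stats(values, block_size=1024, workers=4):
    """Compute count, mean, population variance, and std via block MapReduce."""
    if not values:
        raise ValueError("values must be non-empty")
    blocks = split_into_blocks(values, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))  # map phase, one task per block
    n, mean, m2 = reduce(merge_partials, partials)  # reduce phase
    variance = m2 / n
    return {"count": n, "mean": mean, "variance": variance, "std": math.sqrt(variance)}


if __name__ == "__main__":
    readings = [2, 4, 4, 4, 5, 5, 7, 9]
    print(block_mapreduce_stats(readings, block_size=3))
```

For the sample readings, the result is approximately a mean of 5.0 and a population variance of 4.0 for any block size. Because the merge is associative, blocks can be processed independently and combined in any grouping, which is the property a GPU or distributed reduce relies on; the pairwise formula is chosen over a naive sum of squares because it is numerically more stable.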
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/s10766-017-0513-2 | International Journal of Parallel Programming |
Keywords | Field | DocType |
---|---|---|
Big Data, Hadoop, Spark, GPU, MapReduce | Graphics, Spark (mathematics), Distributed Computing Environment, Computer science, Parallel computing, Throughput, Data file, Stream processing, Big data, Database, The Internet | Journal |
Volume | Issue | ISSN |
---|---|---|
46 | 3 | 0885-7458 |
Citations | PageRank | References |
---|---|---|
8 | 0.54 | 22 |
Authors | ||
---|---|---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Muhammad Mazhar Ullah Rathore | 1 | 301 | 21.15 |
Hojae Son | 2 | 10 | 1.30 |
Awais Ahmad | 3 | 379 | 45.85 |
Anand Paul | 4 | 527 | 46.32 |
Gwanggil Jeon | 5 | 596 | 117.99 |