Title
Toward a new approach for sorting extremely large data files in the big data era
Abstract
The sheer volume of data and content generated today requires a paradigm shift in how these data are processed and managed. Sorting is one of the most important data processing operations. Extremely large files are normally sorted with external algorithms such as merge sort. Because swapping time dominates in many applications that handle large files, algorithms that minimize swapping operations are normally superior to those that focus only on optimizing CPU time. We show that making multiple passes over the data set, as proposed in our algorithm, greatly reduces the number of swaps and thus the overall sorting time. Moreover, the proposed technique is well suited to emerging parallelization platforms such as GPUs. The reported results show the superiority of the proposed technique for both “CPU only” and hybrid CPU–GPU implementations.
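For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a textbook external merge sort. It is not the authors' multi-pass algorithm, which the abstract does not specify. The function names, the `run_size` parameter, and the one-integer-per-line file format are all assumptions made for illustration.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_runs(input_path, run_size, tmp_dir):
    """Split the input into sorted runs of at most run_size values each."""
    run_paths = []
    with open(input_path) as src:
        while True:
            # Only run_size values are held in memory at any time.
            chunk = [int(line) for line in islice(src, run_size)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)
    return run_paths


def read_ints(path):
    """Stream integers from a run file one line at a time."""
    with open(path) as run:
        for line in run:
            yield int(line)


def external_merge_sort(input_path, output_path, run_size=1000):
    """Sort a file of integers (one per line, an assumed format)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = write_sorted_runs(input_path, run_size, tmp_dir)
        with open(output_path, "w") as out:
            # heapq.merge performs a lazy k-way merge over the sorted runs.
            merged = heapq.merge(*(read_ints(p) for p in run_paths))
            out.writelines(f"{value}\n" for value in merged)
```

This sketch merges all runs in a single pass. With a very large number of runs, a practical implementation would likely merge them in several passes with a bounded fan-in, which is the general setting in which the pass count discussed in the abstract becomes a tuning parameter.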
Year: 2019
DOI: 10.1007/s10586-018-2860-1
Venue: Cluster Computing
Keywords: Big data, Sorting, External merge sort, Large file processing, Hybrid CPU–GPU
Field: Swap (computer programming), Data processing, Merge sort, Computer science, CPU time, Parallel computing, Implementation, Sorting, Data file, Big data, Distributed computing
DocType: Journal
Volume: 22
Issue: SP3.0
ISSN: 1573-7543
Citations: 0
PageRank: 0.34
References: 18
Authors
5
Name
Order
Citations
PageRank
Ali Shatnawi1539.46
Yathrip Alzahouri200.34
Mohammed A. Shehab31046.94
Yaser Jararweh443.46
Mahmoud Al-Ayyoub573063.41