Title: Moving huge scientific datasets over the Internet
Abstract: Modern scientific experiments can generate hundreds of gigabytes, terabytes, or even petabytes of data, often maintained as large numbers of relatively small files. Frequently, these data must be disseminated to remote collaborators or computational centers for analysis. Moving such datasets with high performance and strong robustness, while providing a simple interface for users, is a challenging task. We present a data transfer framework comprising a high-performance data transfer library based on GridFTP, an extensible data scheduler with four data scheduling policies, and a GUI that allows users to transfer their datasets easily, reliably, and securely. The system incorporates automatic tuning mechanisms that select at runtime the number of concurrent threads to be used for transfers. Also included are restart mechanisms for handling client, network, and server failures. Experimental results indicate that our data transfer system can significantly improve transfer performance and can recover well from failures. Copyright © 2011 John Wiley & Sons, Ltd.
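The abstract's runtime tuning of concurrent transfer threads can be illustrated with a rough sketch — not the paper's actual algorithm. The idea: probe increasing thread counts and keep the level that maximizes measured throughput, stopping at diminishing returns. Here `transfer_chunk` is a hypothetical stand-in for a real GridFTP transfer call, and the candidate thread counts are assumptions for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def transfer_chunk(chunk_id):
    # Hypothetical stand-in for transferring one file/chunk via GridFTP.
    time.sleep(0.01)
    return 1  # units of data moved (dummy value)

def measure_throughput(num_threads, chunks):
    # Transfer all chunks with a fixed concurrency level; return rate.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        moved = sum(pool.map(transfer_chunk, chunks))
    return moved / (time.monotonic() - start)

def tune_concurrency(chunks, candidates=(1, 2, 4, 8)):
    # Increase concurrency while measured throughput keeps improving.
    best_threads, best_rate = candidates[0], 0.0
    for n in candidates:
        rate = measure_throughput(n, chunks)
        if rate <= best_rate:
            break  # diminishing returns: stop probing higher levels
        best_threads, best_rate = n, rate
    return best_threads

print(tune_concurrency(list(range(16))))
```

A production tuner would measure real network transfers and re-probe periodically as conditions change; this sketch only shows the hill-climbing selection step.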
Year: 2011
DOI: 10.1002/cpe.1779
Venue: Concurrency and Computation: Practice and Experience
Keywords: high-performance data transfer library, data analysis, computational center, extensible data scheduler, data transfer performance, data transfer framework, huge scientific datasets, automatic tuning mechanism, high performance, data transfer system
Field: Data-intensive computing, Petabyte, Computer science, e-Science, Gigabyte, Parallel computing, Thread (computing), Robustness (computer science), GridFTP, The Internet, Distributed computing
DocType: Journal
Volume: 23
Issue: 18
ISSN: 1532-0626
Citations: 1
PageRank: 0.35
References: 12
Authors: 4
Name                 Order  Citations  PageRank
Wantao Liu           1      73         8.29
Brian Tieman         2      57         9.77
Rajkumar Kettimuthu  3      770        70.13
Ian Foster           4      22938      2663.24