Title
All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids
Abstract
Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for nonexpert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.
Year
2010
DOI
10.1109/TPDS.2009.49
Venue
IEEE Trans. Parallel Distrib. Syst.
Keywords
data mining, cpu campus grid, campus grid, biometric data, data-intensive computing, easy access, high-level abstraction, campus grids, efficient execution, conventional approach, easy expression, optimized all-pairs abstraction, biometrics, production, grid computing, bioinformatics, data intensive computing, cloud computing, data analysis
Field
Grid computing, Abstraction, End user, Data-intensive computing, Workload, Computer science, Biometrics, Grid, Distributed computing, Cloud computing
DocType
Journal
Volume
21
Issue
1
ISSN
1045-9219
Citations
35
PageRank
2.80
References
23
Authors
6
Name                 Order  Citations  PageRank
Christopher Moretti  1      161        11.40
Hoang Bui            2      118        10.45
Karen Hollingsworth  3      73         4.56
Brandon Rich         4      43         3.42
Patrick J. Flynn     5      4405       307.04
Douglas Thain        6      1530       127.00