Title
Enabling rapid development of parallel tree search applications
Abstract
Virtual observatories will give astronomers easy access to an unprecedented amount of data. Extracting scientific knowledge from these data will increasingly demand both efficient algorithms and the power of parallel computers. Nearly all efficient analyses of large astronomical datasets use trees as their fundamental data structure. Writing efficient tree-based techniques, a task that is time-consuming even on single-processor computers, is exceedingly cumbersome on massively parallel platforms (MPPs). Most applications that run on MPPs are simulation codes, since the expense of developing them is offset by the fact that they will be used for many years by many researchers. In contrast, data analysis codes change far more rapidly, are often unique to individual researchers, and therefore allow little reuse. Consequently, the economics of the current high-performance computing development paradigm for MPPs does not favor data analysis applications. We have therefore built a library, called Ntropy, that provides a flexible, extensible, and easy-to-use way of developing tree-based data analysis algorithms for both serial and parallel platforms. Our experience has shown that our library not only saves development time but can also deliver excellent serial performance and parallel scalability. Furthermore, Ntropy makes it easy for an astronomer with little or no parallel programming experience to quickly scale their application to a distributed multiprocessor environment. By minimizing development time for efficient and scalable data analysis, we enable wide-scale knowledge discovery on massive datasets.
Year: 2007
DOI: 10.1145/1273404.1273410
Venue: CLADE@HPDC
Keywords: scalable data analysis, efficient tree-based technique, rapid development, data analysis application, development time, efficient algorithm, fundamental data structure, parallel tree search application, tree-based data analysis algorithm, data analysis, parallel libraries, data analysis code, parallel development tools, efficient analysis, massive astrophysical datasets, parallel platform, parallel computer, astrophysics, knowledge discovery, scientific knowledge, data structure
Field: Data structure, Reuse, Computer science, Massively parallel, Multiprocessing, Ntropy, Knowledge extraction, Offset (computer science), Scalability, Distributed computing
DocType: Conference
Citations: 2
PageRank: 0.45
References: 7
Authors: 3
Name
Order
Citations
PageRank
Jeffrey P. Gardner11219.54
Andrew Connolly261.41
Cameron McBride320.45