Title
A Scalable Parallel Algorithm for Large-Scale Protein Sequence Homology Detection
Abstract
Protein sequence homology detection is a fundamental problem in computational molecular biology, with pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting homology between two protein sequences is computationally inexpensive, detecting pairwise homology at a large scale becomes prohibitive, requiring millions of CPU hours. Yet, there is currently no efficient method available to parallelize this kernel. In this paper, we present the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for large-scale protein sequence data. Our method, called pGraph, is designed using a hierarchical multiple-master multiple-worker model, where the processor space is partitioned into subgroups and the hierarchy helps ensure that the workload is distributed in a load-balanced fashion despite the inherent irregularity that may originate in the input. Experimental evaluation demonstrates that our method scales linearly on all input sizes tested (up to 640K sequences) on a 1,024-node supercomputer. In addition to demonstrating strong scaling, we present an extensive study of the various components of the system and related parametric studies.
Year
2010
DOI
10.1109/ICPP.2010.41
Venue
ICPP
Keywords
hierarchical master-worker paradigm,fundamental problem,pgraph,cpu hour,parallel sequence graph construction,proteins,processor space,method scales linearly,scalable parallel algorithm,protein sequence homology detection,cpu hours,parallel algorithms,large-scale protein sequence data,computational molecular biology,multiple-master multiple-worker model,large-scale protein sequence homology,efficient method,pairwise homology,protein sequence,parallel protein sequence homology detection,input size,protein molecule,molecular biology,parallel algorithm,load balance,parallel processing,indexes,evaluation,algorithms
Field
Kernel (linear algebra),Pairwise comparison,Protein sequencing,Supercomputer,Computer science,Load balancing (computing),Parallel algorithm,Parallel computing,Parametric statistics,Scalability,Distributed computing
DocType
Conference
ISSN
0190-3918
E-ISBN
978-0-7695-4156-3
ISBN
978-0-7695-4156-3
Citations
0
PageRank
0.34
References
3
Authors
3
Name
Order
Citations
PageRank
Changjun Wu1302.67
Kalyanaraman, Ananth222131.95
William R. Cannon36910.68