Abstract |
---|
In the last few years, many computational and laboratory improvements in the production and analysis of DNA sequences have made the complete sequencing of whole genomes possible. This provides a wealth of raw genomes that need to be processed and annotated. All eukaryotic genomes examined and published thus far contain repetitive DNA. The amount of repetitive DNA in any specific eukaryotic genome ranges from 5% to 80%. These repeats consist mainly of transposable elements and tandem repeats, which need to be identified, classified and annotated in order to sequence and annotate an entire genome. This paper discusses the design and implementation of a distributed, cluster- and grid-based workflow to classify transposable elements. We show experimental results for representative species genomes on a cluster and a grid. The performance and results of the workflow with regard to turnaround time, scalability, load balancing, resource utilization and fault tolerance are shown and discussed. |
Year | DOI | Venue |
---|---|---|
2006 | 10.1109/CCGRID.2006.127 | CCGrid |
Keywords | Field | DocType |
DNA,biology computing,fault tolerant computing,genetics,grid computing,resource allocation,workstation clusters,bioinformatics,cluster based classification,distributed cluster,distributed workflow,eukaryotic genomes,fault tolerance,grid based classification,load balancing,resource utilization,transposable elements | Genome,Tandem repeat,Grid computing,Repeated sequence,Transposable element,Computer science,Genomics,DNA sequencing,Distributed computing,DNA computing | Conference |
Volume | ISBN | Citations |
2 | 0-7695-2585-7 | 4 |
PageRank | References | Authors |
0.60 | 13 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nirmal Ranganathan | 1 | 4 | 0.94 |
Cedric Feschotte | 2 | 9 | 1.89 |
David Levine | 3 | 118 | 9.73 |