Abstract | ||
---|---|---|
With the growing acceptance of the Open Archives Initiative (OAI) [16] framework, a number of digital libraries are becoming OAI compliant. This makes it feasible to build an effective federated digital library that harvests metadata from OAI-compliant libraries and provides a unified search service over the aggregated metadata. Arc [10] is an example of such a federated digital library. If adoption of OAI-PMH [16] increases rapidly (e.g., by several orders of magnitude), a different problem arises: how to efficiently discover, harvest, and index the burgeoning OAI-PMH corpus. In this project, we are applying Grid and cluster technology to address these performance issues. In this paper, we focus on using the Grid to parallelize the harvesting task for an OAI-based federated digital library. We propose a Grid-based architecture for parallel harvesting that supports dynamic allocation of harvesting nodes, scheduling of harvesting tasks to maximize performance, and uniform load distribution for the indexing node. We have implemented and evaluated the proposed architecture on a Grid based on the GT3 toolkit. |
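
The abstract names scheduling of harvesting tasks across harvesting nodes but does not say which policy is used. As a rough illustration only, the sketch below assumes a greedy longest-processing-time-first heuristic: each archive is placed, largest first, on the currently least-loaded node. The function name `schedule_harvests`, the archive names, and the use of estimated record counts as a cost measure are all hypothetical and not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_harvests(
    archive_sizes: Dict[str, int], num_nodes: int
) -> List[Tuple[int, List[str]]]:
    """Assign archives to harvesting nodes with a greedy LPT heuristic.

    Illustrative assumption, not the paper's scheduler.
    Returns one (total_estimated_records, [archive names]) pair per node.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Min-heap keyed by current load; the node index breaks ties deterministically.
    heap = [(0, node_id) for node_id in range(num_nodes)]
    heapq.heapify(heap)
    assignments: List[List[str]] = [[] for _ in range(num_nodes)]
    loads = [0] * num_nodes

    # Place the largest archives first, each on the least-loaded node so far.
    ordered = sorted(archive_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, node_id = heapq.heappop(heap)
        assignments[node_id].append(name)
        loads[node_id] = load + size
        heapq.heappush(heap, (loads[node_id], node_id))

    return [(loads[i], assignments[i]) for i in range(num_nodes)]


if __name__ == "__main__":
    # Hypothetical archives with made-up estimated record counts.
    estimated = {"a": 900, "b": 500, "c": 400, "d": 300, "e": 100}
    for load, names in schedule_harvests(estimated, num_nodes=2):
        print(load, names)
```

A real harvester would likely need to re-estimate costs as archives respond at different rates; this static assignment only shows the balancing idea.
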
Year | DOI | Venue |
---|---|---|
2005 | 10.1145/1062261.1062281 | Conf. Computing Frontiers |
Keywords | Field | DocType |
digital library, harvesting node, parallel harvesting, grid-based architecture, harvesting task, oai compliant, effective federated digital library, oai-compliant library, federated digital library, oai-based federated digital library, indexation, grid, load distribution | Metadata, Architecture, Computer science, Parallel search, Scheduling (computing), Search engine indexing, Digital library, Database, Grid | Conference |
ISBN | Citations | PageRank |
1-59593-019-1 | 4 | 0.40 |
References | Authors | |
8 | 4 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Kurt Maly | 1 | 567 | 139.93 |
Mohammad Zubair | 2 | 587 | 89.90 |
Vamshi Chilukamarri | 3 | 4 | 0.40 |
Pratik Kothari | 4 | 4 | 0.40 |