Abstract | ||
---|---|---|
Discovering the correct dataset efficiently is critical for effective simulations in the atmospheric sciences. Unlike text-based web documents, many large scientific datasets contain binary-encoded data that is hard to discover using popular search engines. In the atmospheric sciences, public data hosting services have grown significantly. However, the ability to index and search these datasets has been limited by the metadata that each data host provides. We have developed an infrastructure, the Atmospheric Data Discovery System (ADDS), that provides an efficient data discovery environment for observational datasets in the atmospheric sciences. To support complex querying capabilities, we automatically extract and index fine-grained metadata. Datasets are indexed through periodic crawling of popular sites and of files requested by users. Users can access subsets of a large dataset through our data customization feature. This paper focuses on the overall architecture, the data subsetting scheme, and a performance evaluation of our system. |
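
The abstract's idea of indexing fine-grained metadata and then serving subsets of a dataset can be illustrated with a small sketch. This is not the ADDS implementation: the record fields, the `MetadataIndex` class, the `subset` helper, and the example URLs are all hypothetical names chosen for illustration, and the index is a plain in-memory dictionary rather than whatever store the system actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical fine-grained metadata extracted from one data file."""
    url: str
    variables: frozenset
    t_start: int      # start of coverage, e.g. hours since some epoch (assumed unit)
    t_end: int        # end of coverage, same unit
    lat_min: float
    lat_max: float


class MetadataIndex:
    """Toy in-memory index keyed by variable name (an assumption, not ADDS)."""

    def __init__(self):
        self._by_variable = {}

    def add(self, rec):
        # Register the record under every variable it contains.
        for var in rec.variables:
            self._by_variable.setdefault(var, []).append(rec)

    def query(self, variable, t_start, t_end, lat_min, lat_max):
        # Return records whose time range and latitude band both
        # overlap the requested ranges.
        hits = []
        for rec in self._by_variable.get(variable, []):
            overlaps_time = rec.t_start <= t_end and t_start <= rec.t_end
            overlaps_lat = rec.lat_min <= lat_max and lat_min <= rec.lat_max
            if overlaps_time and overlaps_lat:
                hits.append(rec)
        return hits


def subset(points, lat_min, lat_max):
    """Keep only (lat, value) observations inside the latitude band."""
    return [(lat, v) for lat, v in points if lat_min <= lat <= lat_max]


index = MetadataIndex()
index.add(DatasetRecord("http://example.org/a.bin",
                        frozenset({"temperature"}), 0, 24, -10.0, 10.0))
index.add(DatasetRecord("http://example.org/b.bin",
                        frozenset({"temperature", "humidity"}), 48, 72, 30.0, 60.0))

# Find temperature data covering hours 12-36 between 0 and 5 degrees latitude.
hits = index.query("temperature", 12, 36, 0.0, 5.0)

# Cut a (hypothetical) list of observations down to the requested band.
observations = [(-20.0, 1.0), (0.0, 2.0), (5.0, 3.0), (40.0, 4.0)]
customized = subset(observations, 0.0, 5.0)
```

In this sketch a query touches only the extracted metadata, and the data itself is read only when a subset is requested. That is one plausible reading of the discovery and customization steps the abstract describes.
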
Year | DOI | Venue |
---|---|---|
2012 | 10.1016/j.future.2011.05.010 | Future Generation Comp. Syst. |
Keywords | Field | DocType |
efficient data discovery environment,large-scale datasets,atmospheric sciences,discovery,towards efficient data search,cloud computing,efficient fashion,correct dataset,data host,large dataset,large-scale atmospheric datasets,binary encoded data,data customization feature,atmospheric science,public data,index fine-grained metadata,metadata | Metadata,Metadata repository,Data mining,Data discovery,Crawling,Search engine,Information retrieval,Computer science,Data element,Personalization,Cloud computing | Journal |
Volume | Issue | ISSN |
28 | 1 | 0167-739X |
Citations | PageRank | References |
4 | 0.40 | 8 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sangmi Lee Pallickara | 1 | 170 | 24.46 |
Shrideep Pallickara | 2 | 837 | 92.72 |
Milija Zupanski | 3 | 10 | 1.36 |