When big data leads to lost data - Citegraph

Paper Info

Title
When big data leads to lost data

Abstract
For decades, scientists bemoaned the scarcity of observational data to analyze and against which to test their models. Exponential growth in data volumes from ever-cheaper environmental sensors has provided scientists with the answer to their prayers: "big data". Now, scientists face a new challenge: with terabytes, petabytes or exabytes of data at hand, stored in thousands of heterogeneous datasets, how can scientists find the datasets most relevant to their research interests? If they cannot find the data, then they may as well never have collected it; that data is lost to them. Our research addresses this challenge, using an existing scientific archive as our test-bed. We approach this problem in a new way: by adapting Information Retrieval techniques, developed for searching text documents, into the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and "semi-curated" methods to extract metadata from large archives of scientific data. We then perform searches over the extracted metadata, returning results ranked by similarity to the query terms. We briefly describe an implementation performed at an ocean observatory to validate the proposed approach. We propose performance and scalability research to explore how continued archive growth will affect our goal of interactive response, no matter the scale.
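
As a rough illustration of the search step the abstract describes (extract metadata once per dataset, then rank datasets by similarity to the query terms), here is a minimal Python sketch. It is not the authors' implementation. The `DatasetSummary` record, the choice of variable names plus a time range as metadata, and the equal-weight scoring formula are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical metadata record, extracted once per dataset."""
    name: str
    variables: frozenset  # names of observed variables
    t_min: float          # earliest observation time (here: a year)
    t_max: float          # latest observation time


def interval_overlap(q_min, q_max, d_min, d_max):
    """Fraction of the query interval covered by the dataset interval."""
    if q_max <= q_min:
        return 0.0
    covered = min(q_max, d_max) - max(q_min, d_min)
    return max(0.0, covered) / (q_max - q_min)


def score(summary, query_vars, q_min, q_max):
    """Assumed similarity in [0, 1]: mean of variable match and time match."""
    if query_vars:
        var_match = len(summary.variables & query_vars) / len(query_vars)
    else:
        var_match = 0.0
    time_match = interval_overlap(q_min, q_max, summary.t_min, summary.t_max)
    return (var_match + time_match) / 2


def search(catalog, query_vars, q_min, q_max, k=3):
    """Return the k best (name, score) pairs, highest score first."""
    scored = [(s.name, round(score(s, query_vars, q_min, q_max), 3)) for s in catalog]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy catalog; dataset names and values are invented for illustration.
catalog = [
    DatasetSummary("cruise_2009", frozenset({"salinity", "temperature"}), 2009.0, 2010.0),
    DatasetSummary("station_a", frozenset({"temperature"}), 2005.0, 2012.0),
    DatasetSummary("glider_7", frozenset({"oxygen"}), 2011.0, 2012.0),
]
results = search(catalog, frozenset({"salinity", "temperature"}), 2008.0, 2012.0)
print(results)
```

Because the search only touches the small per-dataset summaries and never the raw observations, query cost in this sketch grows with the number of datasets rather than with the volume of data they hold. That separation is what makes interactive response plausible as an archive grows.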

Year	DOI	Venue
2012	10.1145/2389686.2389688	PIKM
Keywords	Field	DocType
exponential growth,observational data,lost data,scalability research,research interest,existing scientific archive,continued archive growth,scientific data,data volume,big data	Data science,Metadata,Metadata repository,Information retrieval,Ranking,Petabyte,Terabyte,Computer science,Data retrieval,Big data,Scalability	Conference
Citations	PageRank	References
8	0.60	20
Authors
2

Name	Order	Citations	PageRank
V. M. Megler	1	41	5.16
David Maier	2	5639	1666.90
