Title
Open dataset discovery using context-enhanced similarity search
Abstract
Today, open data catalogs let users search for datasets with full-text queries over metadata records, combined with simple faceted filtering. Using this combination, a user can discover a significant number of the datasets relevant to their search intent. However, some relevant datasets remain hard to find because their metadata is extremely sparse (e.g., only a few keywords). As an alternative, in this paper we propose an approach to dataset discovery based on similarity search over metadata descriptions enhanced by various semantic contexts. In general, the semantic contexts enrich the dataset metadata in a way that enables the identification of additional datasets relevant to a query that could not be retrieved using keyword or full-text search alone. In an experimental evaluation, we show that context-enhanced similarity retrieval methods increase the findability of relevant datasets, thus improving retrieval recall, which is critical in dataset discovery scenarios. As part of the evaluation, we created a catalog-like user interface for dataset discovery and recorded streams of user actions, which we used to create the ground truth. For the sake of reproducibility, we have published the entire evaluation testbed.
Year
2022
DOI
10.1007/s10115-022-01751-z
Venue
KNOWLEDGE AND INFORMATION SYSTEMS
Keywords
Dataset, Discovery, Search, Similarity, Evaluation, Context
DocType
Journal
Volume
64
Issue
12
ISSN
0219-1377
Citations
0
PageRank
0.34
References
0
Authors
5
Name             Order  Citations  PageRank
David Bernhauer  1      0          0.34
Martin Necasky   2      0          0.34
Petr Skoda       3      39         9.56
Jakub Klímek     4      170        21.23
Tomás Skopal     5      202        20.95