Open dataset discovery using context-enhanced similarity search - Citegraph

Paper Info

Title
Open dataset discovery using context-enhanced similarity search

Abstract
Today, open data catalogs enable users to search for datasets with full-text queries over metadata records combined with simple faceted filtering. Using this combination, a user can discover a significant share of the datasets relevant to their search intent. However, some relevant datasets remain hard to find because their metadata are extremely sparse (e.g., only a few keywords). As an alternative, in this paper we propose an approach to dataset discovery based on similarity search over metadata descriptions enhanced by various semantic contexts. In general, the semantic contexts enrich the dataset metadata in a way that enables the identification of additional datasets relevant to a query that could not be retrieved using keyword or full-text search alone. In an experimental evaluation, we show that context-enhanced similarity retrieval methods increase the findability of relevant datasets, thus improving retrieval recall, which is critical in dataset discovery scenarios. As part of the evaluation, we created a catalog-like user interface for dataset discovery and recorded streams of user actions, which we used to create the ground truth. For the sake of reproducibility, we published the entire evaluation testbed.
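The abstract's core idea, enriching sparse metadata with semantic context so that similarity search finds datasets a plain keyword match misses, can be illustrated with a toy sketch. This is not the authors' method: the dataset IDs, keyword lists, the `CONTEXT` map (a hypothetical stand-in for a real semantic context such as a thesaurus or knowledge graph), bag-of-words cosine similarity, and enriching the query as well as the metadata are all assumptions made for illustration. The sketch compares recall for one query with and without enrichment.

```python
import math
from collections import Counter

# Hypothetical sparse metadata: each dataset has only a few keywords.
DATASETS = {
    "ds1": ["bus", "timetable"],
    "ds2": ["air", "quality"],
    "ds3": ["tram", "stops"],
}

# Hypothetical semantic context: related terms used to enrich keywords.
# A stand-in for a real context source, not the contexts used in the paper.
CONTEXT = {
    "bus": ["public", "transport"],
    "tram": ["public", "transport"],
    "air": ["pollution", "environment"],
}


def enrich(terms):
    """Append the context terms of every keyword that has an entry in CONTEXT."""
    out = list(terms)
    for t in terms:
        out.extend(CONTEXT.get(t, []))
    return out


def cosine(a, b):
    """Cosine similarity of two bags of words, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def search(query, k, use_context):
    """Return up to k dataset IDs with non-zero similarity, best first."""
    q = enrich(query) if use_context else list(query)
    scored = []
    for ds_id, keywords in DATASETS.items():
        d = enrich(keywords) if use_context else keywords
        score = cosine(q, d)
        if score > 0:
            scored.append((score, ds_id))
    scored.sort(reverse=True)
    return [ds_id for _, ds_id in scored[:k]]


def recall(retrieved, relevant):
    """Fraction of the relevant datasets that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant)


if __name__ == "__main__":
    relevant = {"ds1", "ds3"}  # hypothetical ground truth for the query
    for use_context in (False, True):
        hits = search(["bus"], k=3, use_context=use_context)
        print(use_context, hits, recall(hits, relevant))
```

With this toy data, the query `["bus"]` retrieves only `ds1` without enrichment (recall 0.5 against the hypothetical relevant set `{"ds1", "ds3"}`). Enriching both sides with the shared context terms `public` and `transport` also surfaces `ds3`, raising recall to 1.0, which mirrors in miniature the recall gain the abstract describes.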

Year	2022
DOI	10.1007/s10115-022-01751-z
Venue	Knowledge and Information Systems
Keywords	Dataset, Discovery, Search, Similarity, Evaluation, Context
DocType	Journal
Volume	64
Issue	12
ISSN	0219-1377
Citations	0
PageRank	0.34
References	0
Authors	5

Authors

Name	Order	Citations	PageRank
David Bernhauer	1	0	0.34
Martin Necasky	2	0	0.34
Petr Skoda	3	39	9.56
Jakub Klímek	4	170	21.23
Tomás Skopal	5	202	20.95
