Title
Blind Queries Applied to JSON Document Stores
Abstract
Social media, Web portals and, more generally, information systems offer their own Application Programming Interfaces (APIs), which provide large data sets concerning every aspect of everyday life. APIs usually provide data sets as collections of JSON documents. The heterogeneous structure of the JSON documents returned by different APIs is a barrier to effectively querying and analyzing these data sets. Adopting a NoSQL document store, such as MongoDB, is useful for gathering these data sets, but it does not solve the problem of querying the resulting heterogeneous repository. The aim of this paper is to provide analysts with a tool, named HammerJDB, that supports blind querying of collections of JSON documents within a NoSQL document database. The underlying idea is that users may know the application domain without being aware of the actual structure of the documents stored in the database; the blind querying tool tries to bridge this gap by adopting a query rewriting mechanism. This paper builds on a technique for blind querying Open Data portals and on its implementation within the Hammer framework, both presented in previous work. Here, that approach is extended to query a NoSQL document database by evolving the Hammer framework into the HammerJDB framework, which is able to work on MongoDB databases. The effectiveness of the new approach is evaluated on a data set (derived from a real-life one) containing job-vacancy ads collected from European job portals.
Year: 2019
DOI: 10.3390/info10100291
Venue: INFORMATION
Keywords: retrieval from NoSQL databases, JSON documents, blind querying, single document extraction
Field: Information system, Open data, Data mining, Data set, Social media, Information retrieval, Computer science, NoSQL, Application domain, Application programming interface, JSON
DocType: Journal
Volume: 10
Issue: 10
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Stefania Marrara | 1 | 0 | 0.34
Mauro Pelucchi | 2 | 0 | 0.34
Giuseppe Psaila3722192.45