Full-text search engine with suffix index for massive heterogeneous data - Citegraph

Paper Info

Title
Full-text search engine with suffix index for massive heterogeneous data

Abstract
Popular search engines such as Elasticsearch (ES) commonly use inverted indices to quickly retrieve source data matching a given set of queries. However, an inverted index may miss some matching results, particularly in data that are hard to segment into words, such as logs and scientific signals. This article presents SAES, a true full-text search system that replaces the inverted index in ES with a suffix index to guarantee a 100% recall ratio. We designed a distributed suffix index scheme with online building and offline merging that scales with the ES architecture. The suffix index is constructed dynamically by several suffix array construction tools that adapt to the data size and the available computing resources, such as CPU cores, RAM, and disk capacity. Furthermore, the index can be compacted to trade search speed against index storage space. An experimental study tested the functions and performance of single- and multi-node SAES on realistic datasets of texts, logs, genomes, and signals. The systems performed well for both exact and approximate search queries defined on units of bytes or half-bytes. This work provides a feasible reference design for extending ES with a suffix index to support true full-text search over massive heterogeneous data.
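
The abstract's central claim is that a suffix index finds every occurrence of a byte pattern, even in data with no word boundaries. The following is a minimal sketch of that idea, not the SAES implementation: it assumes a single in-memory byte string and uses a naive sort-based suffix array rather than the construction tools the paper describes. The function names `build_suffix_array` and `find_all` are hypothetical and chosen for illustration only.

```python
from typing import List


def build_suffix_array(data: bytes) -> List[int]:
    """Return the start offsets of all suffixes of `data` in lexicographic order.

    Naive construction (sorts full suffix slices); fine for a small demo,
    far too slow and memory-hungry for massive data.
    """
    return sorted(range(len(data)), key=lambda i: data[i:])


def find_all(data: bytes, sa: List[int], pattern: bytes) -> List[int]:
    """Return every offset where `pattern` occurs in `data`, in ascending order.

    Suffixes that start with `pattern` form one contiguous run in the
    suffix array, so two binary searches locate the run's boundaries.
    """
    m = len(pattern)
    if m == 0:
        return []

    # Lower bound: first suffix whose m-byte prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-byte prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    # A log-like record with no whitespace to segment on (made-up sample data).
    log = b"ERROR42:disk;ERROR43:net"
    sa = build_suffix_array(log)
    print(find_all(log, sa, b"ROR4"))
```

The pattern `ROR4` straddles what a word tokenizer would likely treat as a token boundary, yet both occurrences are returned because the lookup works on raw byte offsets rather than on pre-segmented terms.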

Field	Value
Year	2022
DOI	10.1016/j.is.2021.101893
Venue	Information Systems
Keywords	Suffix index, Heterogeneous data, Full-text search engine, Elasticsearch
DocType	Journal
Volume	104
ISSN	0306-4379
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Wentao Xu	1	0	0.34
Haoyu Chen	2	0	0.34
Yidong Huan	3	0	0.34
Xuedong Hu	4	0	0.34
Ge Nong	5	0	0.34
