Title
A Lower Bound for Jumbled Indexing.
Abstract
In this paper we study lower bounds for a variant of jumbled-indexing problem: given an input string S on an alphabet ∑ = {σ1, ...,σλ}, store it in a data structure such that given frequencies f1, ..., fσ, one can find all the substrings S' of S where the frequency of the character σi is fi, for 1 ≤ i ≤ λ. This is a very interesting and challenging text indexing problem and it has received significant attention lately, both in the string-indexing community as well as in the theory community. It is known that when λ = 2, one can use a linear-size data structure to decide whether there exists a substring that matches the query in constant time [10]. It is also known [1] that for constant λ ≥ 3, then there exists a constant αλ such that it is not possible to achieve both O(n2−αλ preprocessing time and O(n1−αλ) query time. We study a variant of the problem where the goal is to report all the substrings of S that match a given jumbled-indexing query. Assuming the data structure operates in the pointer machine model, we prove unconditional space lower bounds that also apply to binary alphabets: we show that if the data structure has the query time of O(n0.5−o(1) + k), where k is the output size of the query, then it must consume Ω(n2−o(1)) space. The o(1) term in our lower bound is at most [MATH HERE], where log(·) notation denotes the iterative log function (i.e., log(10)n = log log log log log log log log log log n). This result has a number of interesting consequences. First, it shows that reporting all the matches is significantly harder than deciding whether a match exists, at least for the binary alphabet. This was surprising for us since we believed that in order for jumbled-indexing to be difficult, the alphabet size must be at least 3. Second, from a technical point of view, we make connections between this problem and the area of additive combinatorics as well as random walks. Previously, Chan and Lewenstein [6] had shown the connections between this problem and the area of additive combinatorics from an upper bound point of view, however, we use fundamental theorems from additive combinatorics but these tools are quite different from the theorems used in [6]. In our opinion, the fact that additive combinatorics appears in our lower bound approach as well as in the upper bound approach of Chan and Lewenstein [6] is noteworthy and demands further investigation. We believe our results open up a few new venues for research. For example, we believe it is interesting to study whether it is possible to build a data structure that uses O(n) space s.t., it can report all the matches to a jumble-indexing query in O(n2/3 + k) time; this is even interesting for a binary alphabet.
Year
DOI
Venue
2020
10.5555/3381089.3381125
SODA '20: ACM-SIAM Symposium on Discrete Algorithms Salt Lake City Utah January, 2020
DocType
Citations 
PageRank 
Conference
0
0.34
References 
Authors
0
4
Name
Order
Citations
PageRank
Peyman Afshani125623.84
Ingo van Duijn201.01
Rasmus Killmann300.34
Jesper Sindahl Nielsen4113.26