Title
Online Algorithms for Finding Distinct Substrings with Length and Multiple Prefix and Suffix Conditions
Abstract
Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers $$k_1$$ and $$k_2$$ be given, as well as a string T given in an online manner, such that $$T_i$$ represents the length-i prefix of T for $$1 \le i \le |T|$$ . In this paper we are interested in computing the set $$ ans_i $$ of distinct substrings w of $$T_i$$ such that $$k_1 \le |w| \le k_2$$ , and w contains some $$p \in P$$ as a prefix and some $$s \in S$$ as a suffix. More specifically, the counting problem is to output $$| ans_i |$$ , whereas the reporting problem is to output all elements of $$ ans_i $$ , for each iteration i. Let $$\sigma $$ denote the alphabet size, and for a sequence of strings A, $$\Vert A\Vert =\sum _{u\in A}|u|$$ . Then, we show that after $$O((\Vert P\Vert +\Vert S\Vert )\log \sigma )$$ -time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in $$O(|T_i| \log \sigma )$$ and $$O(|T_i| \log \sigma + | ans_i |)$$ total time. The preprocessing time can be reduced to $$O(\Vert P\Vert +\Vert S\Vert )$$ for integer alphabets of size polynomial with regard to $$\Vert P\Vert +\Vert S\Vert $$ . Our algorithms have possible applications to network traffic classification.
Year
DOI
Venue
2022
10.1007/978-3-031-20643-6_3
String Processing and Information Retrieval
Keywords
DocType
ISSN
Pattern matching, Counting algorithm, Suffix array, Suffix tree
Conference
0302-9743
Citations 
PageRank 
References 
0
0.34
0
Authors
4
Name
Order
Citations
PageRank
Laurentius Leonard100.34
Shunsuke Inenaga259579.02
Hideo Bannai362079.87
Takuya Mieno400.34