Title
An efficient algorithm for identifying the most contributory substring
Abstract
Detecting repeated portions of strings has important applications to many areas of study including data compression and computational biology. This paper defines and presents a solution for the Most Contributory Substring Problem, which identifies the single substring that represents the largest proportion of the characters within a set of strings. We show that a solution to the problem can be achieved with an O(n) running time (where n is the total number of characters in all of the input strings) when overlapping occurrences of the most contributory substring are permitted. Furthermore, we present an extended algorithm that does not permit occurrences of the most contributory substring to overlap. The expected running time of the extended algorithm is O(n log n) while its worst case performance is O(n2).
Year
DOI
Venue
2007
10.1007/978-3-540-74553-2_25
DaWaK
Keywords
Field
DocType
contributory substring problem,largest proportion,important application,data compression,input string,n log n,single substring,computational biology,efficient algorithm,contributory substring,extended algorithm
Substring,Substring index,Computer science,Algorithm,Binary tree,FM-index,Longest repeated substring problem,Time complexity,Binary search tree,Longest common substring problem
Conference
Volume
ISSN
ISBN
4654
0302-9743
3-540-74552-1
Citations 
PageRank 
References 
0
0.34
10
Authors
1
Name
Order
Citations
PageRank
Ben Stephenson15610.86