Title
Approximate string matching by position restricted alignment
Abstract
Given a collection of strings, goal of the approximate string matching is to efficiently find the strings in the collection that are similar to a query string. In this paper, we focus on edit distance as measure to quantify the similarity between two strings. Existing q-gram based methods to address this problem use inverted indexes to index the q-grams of given string collection. These methods begin by generating the q-grams of query string (disjoint or overlapping) and then merge the inverted lists of these q-grams. Several filtering techniques have been proposed so as to segment inverted lists to relatively shorter lists thus reducing the merging cost. We use a filtering technique which we call as \"position restricted alignment\" that combines well known length filtering and position filtering to provide more aggressive pruning. We then provide an indexing scheme that integrates the inverted lists storage with the proposed filter thus enabling us to auto-filter the inverted lists. We evaluate the effectiveness of the proposed approach by thorough experimentation.
Year
DOI
Venue
2013
10.1145/2457317.2457388
EDBT/ICDT Workshops
Keywords
Field
DocType
approximate string matching,aggressive pruning,inverted lists storage,problem use inverted index,proposed filter,inverted list,string collection,segment inverted list,query string,string similarity,edit distance
String searching algorithm,Edit distance,String-to-string correction problem,Data mining,Query string,Disjoint sets,Computer science,Algorithm,Search engine indexing,Approximate string matching,String metric
Conference
Citations 
PageRank 
References 
2
0.38
21
Authors
6
Name
Order
Citations
PageRank
Manish Patil1717.46
Xuanting Cai220.38
Sharma V. Thankachan328941.02
Rahul Shah4105961.31
Seung-Jong Park531931.12
David Foltz620.38