Title
A Probabilistic Approach to String Transformation
Abstract
Many problems in natural language processing, data mining, information retrieval, and bioinformatics can be formalized as string transformation, which is a task as follows. Given an input string, the system generates the \(k\) most likely output strings corresponding to the input string. This paper proposes a novel and probabilistic approach to string transformation, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for generating the top \(k\) candidates, whether there is or is not a predefined dictionary. The log linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation conditioned on an input string. The learning method employs maximum likelihood estimation for parameter estimation. The string generation algorithm based on pruning is guaranteed to generate the optimal top \(k\) candidates. The proposed method is applied to correction of spelling errors in queries as well as reformulation of queries in web search. Experimental results on large scale data show that the proposed approach is very accurate and efficient, improving upon existing methods in terms of accuracy and efficiency in different settings.
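The idea of scoring an output string by the rules used to produce it, and then searching for the best few candidates, can be illustrated with a small sketch. This is not the paper's algorithm: the function name `top_k`, the rule format, and the toy weights are hypothetical, and the sketch assumes every rule has a non-empty left-hand side and a non-positive weight, so that a partial score can only decrease as more rules are applied. Under that assumption a best-first search returns complete candidates in descending score order. Scores are unnormalized log-scores; the normalizer of a conditional log linear model depends only on the input string, so it does not change the ranking. The dictionary-constrained case is not covered.

```python
import heapq


def top_k(source, rules, k):
    """Return up to k (output, log_score) pairs, best first.

    rules maps (lhs, rhs) -> weight. Assumptions of this sketch:
    lhs is non-empty and every weight is <= 0.
    """
    for (lhs, _rhs), weight in rules.items():
        assert lhs and weight <= 0, "sketch assumes non-empty lhs, weight <= 0"

    # Heap entries are (cost, position, output_prefix), where cost = -score.
    heap = [(0.0, 0, "")]
    results = []
    seen = set()
    while heap and len(results) < k:
        cost, pos, out = heapq.heappop(heap)
        if pos == len(source):
            # Costs never decrease along a path, so the first time an
            # output is popped it carries its best score.
            if out not in seen:
                seen.add(out)
                results.append((out, 0.0 - cost))
            continue
        # Copy the current character unchanged (weight 0).
        heapq.heappush(heap, (cost, pos + 1, out + source[pos]))
        # Apply every rule whose left-hand side matches at this position.
        for (lhs, rhs), weight in rules.items():
            if source.startswith(lhs, pos):
                heapq.heappush(heap, (cost - weight, pos + len(lhs), out + rhs))
    return results


# Toy rules with made-up weights, for illustration only.
toy_rules = {("ni", "in"): -0.5, ("e", "a"): -1.0}
candidates = top_k("nice", toy_rules, 3)
```

With these toy rules, the unchanged string ranks first, followed by the outputs that use the cheaper rule and then the more expensive one. Stopping as soon as `k` complete candidates have been popped is what prunes the rest of the search space.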
Year: 2014
DOI: 10.1109/TKDE.2013.11
Venue: IEEE Transactions on Knowledge and Data Engineering
Keywords: spelling, string transformation, query reformulation, parameter estimation, learning (artificial intelligence), maximum likelihood estimation, probabilistic approach, information retrieval, information technology and systems, document and text editing, web search queries, string generation algorithm, document and text processing, log linear model, information storage and retrieval, data mining, computing methodologies, natural language processing, learning method, output strings, spelling error correction, conditional probability distribution, bioinformatics, information search and retrieval, query formulation, query processing, probability, input string, learning artificial intelligence, probabilistic logic, dictionaries, error correction, accuracy, indexes, training data
Field: String searching algorithm, Data mining, String generation, Commentz-Walter algorithm, Computer science, Artificial intelligence, Probabilistic logic, String kernel, String metric, String (computer science), Boyer–Moore string search algorithm, Machine learning
DocType: Journal
Volume: 26
Issue: 5
ISSN: 1041-4347
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Ziqi Wang | 1 | 47 | 4.63
Gu Xu | 2 | 426 | 17.90
Hang Li | 3 | 6294 | 317.05
Ming Zhang | 4 | 1963 | 107.42