Title
Optimal string clustering based on a Laplace-like mixture and EM algorithm on a set of strings
Abstract
In this study, we address the problem of clustering string data in an unsupervised manner by developing a theory of a mixture model and an EM algorithm for strings based on probability theory on a topological monoid of strings developed in our previous studies. We begin with introducing a parametric probability distribution on a set of strings, which has location and dispersion parameters of a string and positive real number. We develop an iteration algorithm for estimating the parameters of the mixture model of the distributions introduced and demonstrate that our algorithm converges to the EM algorithm, which cannot be explicitly written for this mixture model, with probability one and strongly consistently estimates its parameters as the numbers of observed strings and iterations increase. We finally derive a procedure for unsupervised string clustering that is asymptotically optimal in the sense that the posterior probability of making correct classifications is maximized.
Year
DOI
Venue
2019
10.1016/j.jcss.2019.07.003
Journal of Computer and System Sciences
Keywords
Field
DocType
Unsupervised string clustering,Topological monoid of strings,Probability theory,Statistical asymptotics,Convergence of a sequence of algorithms
String searching algorithm,Likelihood function,Laplace distribution,Expectation–maximization algorithm,Almost surely,Statistics,String kernel,String metric,Mathematics,Mixture model
Journal
Volume
ISSN
Citations 
106
0022-0000
0
PageRank 
References 
Authors
0.34
11
3
Name
Order
Citations
PageRank
hitoshi koyano101.69
Morihiro Hayashida215421.88
Tatsuya Akutsu32169216.05