Title
Discriminative Optimization of String Similarity and Its Application to Biomedical Abbreviation Clustering
Abstract
Many string similarity measures have been developed to deal with the variety of expressions in natural language texts. With the abundance of such measures, we should consider the choice of measure and its parameters to maximize the performance for a given task. During a preliminary experiment to find the best measure and parameters for the task of clustering terms to improve our abbreviation dictionary in life science, we found that chemical names had different characteristics in their character sequences compared to other terms. Based on this observation, we experimented with four string similarity measures to test the hypothesis, "chemical names have a different morphology, thus computation of their similarity should differ from that of other terms." The experimental results show that the edit distance is the best for chemical names, and that the discriminative application of string similarity methods to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
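The idea of applying different similarity measures to chemical and non-chemical names can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only the edit distance among the four measures, so the character-bigram Dice coefficient used here for non-chemical terms, the length normalization, and the `is_chemical` predicate are all assumptions made for illustration.

```python
from typing import Callable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1] by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over character-bigram sets (an assumed alternative measure)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def similarity(a: str, b: str, is_chemical: Callable[[str], bool]) -> float:
    """Use edit similarity when both terms are chemical names, bigram Dice otherwise.

    `is_chemical` is a hypothetical predicate supplied by the caller.
    """
    if is_chemical(a) and is_chemical(b):
        return edit_similarity(a, b)
    return bigram_dice(a, b)
```

A caller would pass its own chemical-name detector as `is_chemical`. The resulting pairwise scores could then feed whatever clustering procedure is in use.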
Year: 2011
DOI: 10.1109/ICMLA.2011.58
Venue: ICMLA
Keywords: bioinformatics, natural language processing, optimisation, pattern clustering, text analysis, abbreviation dictionary, biomedical abbreviation clustering, chemical names, discriminative optimization, edit distance, life science, natural language texts, nonchemical names, string similarity measures, term clustering, String Similarity Measure, Term Clustering
Field: Edit distance, Pattern recognition, Expression (mathematics), Computer science, Chemical nomenclature, Natural language, Natural language processing, Artificial intelligence, Cluster analysis, String metric, Discriminative model, Computation
DocType: Conference
Volume: 2
Citations: 0
PageRank: 0.34
References: 10
Authors: 5
Name                Order   Citations   PageRank
Atsuko Yamaguchi    1       149         16.11
Yasunori Yamamoto   2       143         22.77
Jin-Dong Kim        3       1705        92.21
Toshihisa Takagi    4       858         102.84
Akinori Yonezawa    5       1613        226.97