Abstract | ||
---|---|---|
Chinese has many characters that are similar in shape, and this similarity frequently causes writing errors. Shape similarity measurement is an important part of automatic spelling correction, yet it remains a challenging problem. To address it, we propose a component-tree based method built on the hypothesis that "characters are similar if their construction and components are both similar". First, we recursively decompose each character into a tree whose root node is the character and whose leaf nodes are atomic parts, called strokes. Then, we align each pair of trees using their minimal super-tree and calculate their similarity bottom-up based on weighted edit distance. Finally, cognitive prominence is used to adjust the similarity scores. In text proofreading experiments, our method achieved 97% precision and 95.6% recall, which suggests it can be applied in practical systems. |
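
The bottom-up tree comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it replaces the minimal super-tree alignment with a plain weighted edit distance over each node's child sequence, and it omits the cognitive prominence adjustment entirely. The `Node` class, the `layout` tags, the `LAYOUT_WEIGHT` and `INDEL_COST` constants, and the hand-written decompositions (including treating 青 as an atomic part) are all assumptions made for illustration.

```python
from dataclasses import dataclass

LAYOUT_WEIGHT = 0.3  # hypothetical weight of construction vs. components
INDEL_COST = 1.0     # hypothetical cost of inserting or deleting one child


@dataclass(frozen=True)
class Node:
    label: str            # character, component, or stroke name
    layout: str = ""      # hypothetical construction tag; empty for leaves
    children: tuple = ()  # empty tuple for atomic parts


def child_edit_distance(xs, ys):
    """Weighted edit distance between two child sequences.

    Substituting one child for another costs 1 - similarity(child, child),
    so near-identical subtrees are cheap to swap.
    """
    m, n = len(xs), len(ys)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * INDEL_COST
    for j in range(1, n + 1):
        dp[0][j] = j * INDEL_COST
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - similarity(xs[i - 1], ys[j - 1])
            dp[i][j] = min(
                dp[i - 1][j] + INDEL_COST,  # delete a child of xs
                dp[i][j - 1] + INDEL_COST,  # insert a child of ys
                dp[i - 1][j - 1] + sub,     # substitute
            )
    return dp[m][n]


def similarity(a, b):
    """Score in [0, 1], computed recursively from the leaves upward."""
    if not a.children and not b.children:
        return 1.0 if a.label == b.label else 0.0
    layout_sim = 1.0 if a.layout == b.layout else 0.0
    longest = max(len(a.children), len(b.children))
    child_sim = 1.0 - child_edit_distance(a.children, b.children) / longest
    return LAYOUT_WEIGHT * layout_sim + (1.0 - LAYOUT_WEIGHT) * child_sim


# Toy decompositions, written by hand for this example only.
shu, hengzhe, heng = Node("竖"), Node("横折"), Node("横")
pie, hengzhegou = Node("撇"), Node("横折钩")
ri = Node("日", "single", (shu, hengzhe, heng, heng))
mu = Node("目", "single", (shu, hengzhe, heng, heng, heng))
yue = Node("月", "single", (pie, hengzhegou, heng, heng))
qing_part = Node("青")  # treated as atomic here for brevity (a simplification)

qing = Node("晴", "left-right", (ri, qing_part))
jing = Node("睛", "left-right", (mu, qing_part))
ming = Node("明", "left-right", (ri, yue))

print(similarity(qing, jing), similarity(qing, ming))
```

Under these toy assumptions, 晴 and 睛 score higher than 晴 and 明, because 日 and 目 differ by a single stroke while the right-hand components of 晴 and 明 share nothing. The weights are arbitrary and would need tuning against real confusion data.
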
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-25159-2_26 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
Shape similarities, Chinese characters components, Cognitive similarity, Automatic text proofreading | Edit distance, Chinese characters, Pattern recognition, Computer science, Spelling, Artificial intelligence, Phenomenon, Machine learning, Recursion | Conference |
Volume | ISSN | Citations |
9403 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 3 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ya-nan Cao | 1 | 131 | 19.42 |
Shi Wang | 2 | 28 | 12.46 |
Cungen Cao | 3 | 309 | 58.63 |