Abstract | ||
---|---|---|
The accuracy of a typical state-of-the-art optical character recognition (OCR) system benefits greatly from using a language model (LM). However, a conventional LM has a limited vocabulary, so out-of-vocabulary (OOV) words cannot be recognized by the OCR system. In this paper, we present an open vocabulary OCR system based on a hybrid LM whose vocabulary consists of both words and subwords, so that OOV words can be generated as combinations of subwords. A refined hybrid LM training scheme interpolates a standard hybrid LM, a word-based LM and a subword-based LM. Subwords are combined into words efficiently by modeling optional space symbols in the decoding network. The overall system deals with OOV words in a general, data-driven and language-independent way. We conduct experiments on an English handwriting OCR task. Evaluations on three testing sets demonstrate that the OCR system with the proposed method achieves a word error rate of 33.4% on an OOV-only testing set, without degrading recognition accuracy on the other two testing sets, which consist mainly of in-vocabulary words. |
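
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation over toy unigram distributions; the paper's actual models, n-gram order, vocabulary and weights are not given here, so every name and number below (`interpolated_prob`, the three dictionaries, the weights) is hypothetical.

```python
# Minimal sketch of linear LM interpolation, assuming toy unigram models.
# All tokens, probabilities and weights are illustrative, not from the paper.

def interpolated_prob(token, models, weights):
    """Return sum_i weights[i] * P_i(token); unseen tokens contribute 0."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * m.get(token, 0.0) for m, w in zip(models, weights))


# Hypothetical component models: a hybrid LM (words and subwords),
# a word-only LM, and a subword-only LM.
hybrid_lm = {"the": 0.5, "cat": 0.3, "ing": 0.2}
word_lm = {"the": 0.6, "cat": 0.4}
subword_lm = {"th": 0.3, "e": 0.2, "ing": 0.5}

models = [hybrid_lm, word_lm, subword_lm]
weights = [0.5, 0.3, 0.2]  # assumed values; in practice tuned on held-out data

# The interpolated model covers the union of the component vocabularies.
vocab = sorted(set().union(*models))
mixed = {t: interpolated_prob(t, models, weights) for t in vocab}
```

Because each component distribution sums to 1 and the weights sum to 1, the interpolated distribution over the union vocabulary also sums to 1, so words and subwords share a single probability space.
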
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ICDAR.2017.91 | 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) |
Keywords | Field | DocType |
open vocabulary OCR system,hybrid word-subword language models,language model,out-of-vocabulary,OOV words,refined hybrid LM training scheme,standard hybrid LM,efficient word combination method,English handwriting OCR task,word error rate,in-vocabulary words,optical character recognition system | Hybrid word,Task analysis,Handwriting,Pattern recognition,Computer science,Word error rate,Optical character recognition,Speech recognition,Artificial intelligence,Decoding methods,Vocabulary,Language model | Conference |
Volume | ISSN | ISBN |
01 | 1520-5363 | 978-1-5386-3587-2 |
Citations | PageRank | References |
2 | 0.39 | 0 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Meng Cai | 1 | 68 | 8.24 |
Wenping Hu | 2 | 82 | 6.77 |
Kai Chen | 3 | 71 | 5.38 |
Lei Sun | 4 | 18 | 3.40 |
Sen Liang | 5 | 8 | 1.21 |
Xiongjian Mo | 6 | 3 | 0.77 |
Qiang Huo | 7 | 1098 | 99.69 |