Title | ||
---|---|---|
From the Paft to the Fiiture - a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. |
Abstract | ||
---|---|---|
A great deal of historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We present a fully automatic unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction. |
Year | DOI | Venue |
---|---|---|
2019 | 10.26615/978-954-452-056-4_051 | RANLP |
Field | DocType | ISSN |
Computer science, Speech recognition | Conference | Proceedings of Recent Advances in Natural Language Processing. Angelova, G., Mitkov, R., Nikolova, I. & Temnikova, I. (eds.). Shoumen: INCOMA, p. 432-437, 6 p. (2019) |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mika Hämäläinen | 1 | 0 | 5.07 |
Simon Hengchen | 2 | 0 | 2.37 |