Paper Info

Title
From the Paft to the Fiiture - a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction.

Abstract
Many historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used during digitization. Correcting these errors manually is time-consuming, and many automatic approaches have relied on rules or supervised machine learning. We present a fully automatic, unsupervised way of extracting parallel data for training a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction.
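
The abstract does not say how the parallel data is laid out, so the following is only a minimal sketch of one common way to prepare word pairs for a character-based sequence-to-sequence model: each word is split into space-separated characters so that the model treats characters as tokens. The function names, the tab-separated output, and the two example pairs (read off the deliberately misspelled title) are assumptions for illustration, not the authors' pipeline.

```python
def to_char_sequence(word, space_symbol="_"):
    """Split a word into space-separated characters.

    A literal space inside the input is replaced by `space_symbol`
    (a hypothetical placeholder) so it survives whitespace tokenization.
    """
    return " ".join(space_symbol if ch == " " else ch for ch in word)


def build_parallel_lines(pairs):
    """Turn (noisy, corrected) word pairs into character-level source/target pairs."""
    return [(to_char_sequence(noisy), to_char_sequence(clean)) for noisy, clean in pairs]


# Illustrative pairs only: assumed OCR-style errors taken from the paper title.
pairs = [("Paft", "Past"), ("Fiiture", "Future")]

for src, tgt in build_parallel_lines(pairs):
    # One training example per line: source characters, a tab, target characters.
    print(f"{src}\t{tgt}")
```

How such pairs are extracted without supervision is the paper's contribution and is not reproduced here; the sketch covers only the character-level formatting step.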

Year	DOI	Venue
2019	10.26615/978-954-452-056-4_051	RANLP
Field	DocType	ISSN
Computer science, Speech recognition	Conference	Proceedings of Recent Advances in Natural Language Processing. Angelova, G., Mitkov, R., Nikolova, I. & Temnikova, I. (eds.). Shoumen: INCOMA, pp. 432-437 (6 pages) (2019)
Citations	PageRank	References
0	0.34	0
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Mika Hämäläinen	1	0	5.07
Simon Hengchen	2	0	2.37
