Improved Typesetting Models for Historical OCR - Citegraph

Paper Info

Title
Improved Typesetting Models for Historical OCR

Abstract
We present richer typesetting models that extend the unsupervised historical document recognition system of Berg-Kirkpatrick et al. (2013). The first model breaks the independence assumption between vertical offsets of neighboring glyphs and, in experiments, substantially decreases transcription error rates. The second model simultaneously learns multiple font styles and, as a result, is able to accurately track italic and non-italic portions of documents. Richer models complicate inference, so we present a new, streamlined procedure that is over 25x faster than the method used by Berg-Kirkpatrick et al. (2013). Our final system achieves a relative word error reduction of 22% compared to state-of-the-art results on a dataset of historical newspapers.

Year	Venue	Field
2014	Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol 2	Glyph, Computer science, Newspaper, Artificial intelligence, Natural language processing, Recognition system, Inference, Font, Speech recognition, Transcription error, Statistical assumption, Machine learning, Historical document
DocType	Volume	Citations
Conference	P14-2	0
PageRank	References	Authors
0.34	5	2

Authors (2 rows)

Name	Order	Citations	PageRank
Taylor Berg-Kirkpatrick	1	554	35.93
Dan Klein	2	8083	495.21
