Title | ||
---|---|---|
One Step Is Not Enough: A Multi-Step Procedure For Building The Training Set Of A Query By String Keyword Spotting System To Assist The Transcription Of Historical Document |
Abstract | ||
---|---|---|
Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images, and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive, while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a keyword spotting system and human validation to build a training set in less time than a fully manual procedure requires. The multi-step procedure was tested on a data set of 50 pages extracted from the Bentham collection. The palaeographer who transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. Moreover, the multi-step procedure built a small training set, with which the keyword spotting system showed a precision value greater than its recall value, in 35.25% of the time required to annotate the whole data set. |
Year | DOI | Venue |
---|---|---|
2020 | 10.3390/jimaging6100109 | JOURNAL OF IMAGING |
Keywords | DocType | Volume |
keyword spotting, assisted transcription, handwritten documents, training set, automatic document processing, historical documents, digital transformation, cultural heritage | Journal | 6 |
Issue | ISSN | Citations |
10 | 2313-433X | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Antonio Parziale | 1 | 25 | 5.66 |
Giuliana Capriolo | 2 | 0 | 0.34 |
Angelo Marcelli | 3 | 139 | 32.42 |