Making Two Vast Historical Manuscript Collections Searchable and Extracting Meaningful Textual Features Through Large-Scale Probabilistic Indexing - Citegraph

Paper Info

Title
Making Two Vast Historical Manuscript Collections Searchable and Extracting Meaningful Textual Features Through Large-Scale Probabilistic Indexing

Abstract
Textual access to large collections of digitized images remains infeasible because they usually lack transcripts, and transcribing such collections is typically unaffordable. Probabilistic indices, however, enable textual access with only moderate resource demands. Besides allowing effortless information retrieval, probabilistic indices can also be used to estimate textual features of indexed but otherwise untranscribed collections, such as the number of running words and Zipf's curves. Complete probabilistic indices have recently been produced for two iconic large collections: "Bentham" (90K images) and "Spanish Golden Age Theater" (40K images). To show the impact of making these collections searchable, we provide access statistics gathered through their search interfaces. To the best of our knowledge, this is the first publication of large collections of untranscribed manuscripts that are now publicly available for effective and efficient textual access.
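
The abstract does not spell out how textual features are derived from a probabilistic index. As a minimal sketch, assume each index entry is a (page, word, relevance probability) triple, and that the expected count of a word is the sum of its relevance probabilities. The entry layout, the toy data, and the function names below are all hypothetical and are not taken from the paper, whose estimators may differ.

```python
from collections import defaultdict

# Hypothetical probabilistic index: (page_id, word, relevance probability).
# These entries are invented for illustration only.
INDEX = [
    ("p1", "the", 0.75),
    ("p1", "law", 0.5),
    ("p1", "low", 0.25),   # competing hypothesis for the same image region
    ("p2", "the", 0.75),
    ("p2", "law", 0.5),
    ("p2", "of", 0.75),
]


def search(entries, word, threshold):
    """Return the pages where `word` is spotted with probability >= threshold."""
    return sorted({page for page, w, p in entries if w == word and p >= threshold})


def expected_running_words(entries):
    """Estimate the number of running words as the sum of all probabilities."""
    return sum(p for _, _, p in entries)


def expected_frequencies(entries):
    """Estimate each word's frequency as the sum of its probabilities."""
    freq = defaultdict(float)
    for _, word, p in entries:
        freq[word] += p
    return dict(freq)


def zipf_curve(entries):
    """Return (rank, expected frequency) pairs, most frequent word first."""
    freqs = sorted(expected_frequencies(entries).values(), reverse=True)
    return list(enumerate(freqs, start=1))


if __name__ == "__main__":
    print(search(INDEX, "law", 0.4))
    print(expected_running_words(INDEX))
    print(zipf_curve(INDEX))
```

Raising the threshold passed to `search` trades recall for precision, which is the usual way a confidence-ranked index is queried; plotting the output of `zipf_curve` on log-log axes would give an estimated Zipf curve without any transcript.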

Year: 2019
DOI: 10.1109/ICDAR.2019.00026
Venue: 2019 International Conference on Document Analysis and Recognition (ICDAR)
Keywords: search on large historical manuscript collections, probabilistic indexing and search, Zipf's law, keyword spotting, handwritten text
Field: Transcription (linguistics), Zipf's law, Information retrieval, Pattern recognition, Computer science, Search engine indexing, Keyword spotting, Artificial intelligence, Probabilistic logic
DocType: Conference
ISSN: 1520-5363
ISBN: 978-1-7281-3015-6
Citations: 0
PageRank: 0.34
References: 7
Authors: 4

Authors (4 rows)

Name	Order	Citations	PageRank
Alejandro Héctor Toselli	1	0	0.34
Verónica Romero-Gomez	2	0	0.34
Joan-Andreu Sánchez	3	198	29.00
Enrique Vidal	4	1096	85.46
