Abstract: Digitization and Search: A Non-Traditional Use of HPC - Citegraph

Paper Info

Title
Abstract: Digitization and Search: A Non-Traditional Use of HPC

Abstract
We describe our efforts to provide a form of automated search of handwritten content for digitized document archives. To carry out the search we use a computer vision technique called word spotting. A form of content-based image retrieval, word spotting avoids the still difficult task of directly recognizing text: the user searches with a query image containing handwritten text, and the system ranks a database of images so that those with the most similar-looking content come first. Making this search capability available on an archive requires three computationally expensive pre-processing steps. We augment this automated portion of the process with a passive crowdsourcing element that mines queries from the system's users to improve the results of future queries. We benchmark the proposed framework on 1930s Census data, a collection of roughly 3.6 million forms and 7 billion individual units of information.
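Word spotting as described above comes down to ranking stored word images by their visual distance to a query image. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline. It assumes word images are already segmented and binarized, describes each one by a per-column ink profile (the hypothetical helper `column_profile`), and compares profiles with dynamic time warping so that the same word written at different widths can still match. The function names and the choice of feature are illustrative only.

```python
from typing import Dict, List, Sequence

# Assumed input format: a binarized word image as rows of pixels, 1 = ink, 0 = background.
Image = Sequence[Sequence[int]]


def column_profile(img: Image) -> List[float]:
    """Hypothetical feature: fraction of ink pixels in each column."""
    height = len(img)
    width = len(img[0])
    return [sum(row[c] for row in img) / height for c in range(width)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences,
    letting columns stretch or compress so words of different widths align."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Divide by n + m (an upper bound on path length) so scores for
    # long and short words stay roughly comparable.
    return cost[n][m] / (n + m)


def rank_by_similarity(query: Image, database: Dict[str, Image]) -> List[str]:
    """Return database keys ordered from most to least similar to the query."""
    q = column_profile(query)
    scores = {key: dtw_distance(q, column_profile(img)) for key, img in database.items()}
    return sorted(scores, key=lambda k: scores[k])


if __name__ == "__main__":
    query = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
    db = {
        "blank": [[0, 0, 0, 0], [0, 0, 0, 0]],
        "same_shape_wider": [[0, 1, 1, 1, 0], [0, 1, 1, 1, 0]],
        "half": [[0, 1, 0, 0], [0, 0, 0, 0]],
    }
    print(rank_by_similarity(query, db))  # ['same_shape_wider', 'half', 'blank']
```

Because each DTW comparison costs O(nm), exhaustively scoring a query against an archive of millions of word images is expensive, which is one reason this kind of search is a natural fit for parallel hardware.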

Year: 2012
DOI: 10.1109/SC.Companion.2012.259
Venue: High Performance Computing, Networking, Storage and Analysis
Keywords: search capability, query image, billion individual unit, handwritten text, non-traditional use, census data, image retrieval, automated portion, million form, handwritten content, automated search, information retrieval systems, computer vision, big data, parallel processing
Field: Data mining, Automatic image annotation, Query expansion, Information retrieval, Computer science, Full text search, Image retrieval, Document retrieval, Concept search, Content-based image retrieval, Visual Word
DocType: Conference
ISBN: 978-1-4673-6218-4
Citations: 0
PageRank: 0.34
References: 2
Authors: 5

Authors

Name	Order	Citations	PageRank
Liana Diesendruck	1	12	3.60
Luigi Marini	2	85	14.61
Rob Kooper	3	1234	235.10
Mayank Kejriwal	4	39	11.73
Kenton McHenry	5	54	11.15
