Title
Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology
Abstract
Historical records of daily activities provide intriguing insights into the lives of our ancestors and are useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. The records in these books follow a simple textual structure, but their vocabulary evolves over time and consists mainly of proper names. This distinctive vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous work, we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work, we analyze the main causes of the semantic errors observed in those earlier results and apply a better implementation of the MGGI technique to address them. Transcription and information extraction experiments have been carried out with the resulting language model, and the results support the proposed approach.
Year: 2017
DOI: 10.1007/978-3-319-58838-4_32
Venue: PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017)
Keywords: Handwritten Text Recognition, Information extraction, Language modeling, MGGI, Categories-based language model
Field: Data mining, Grammar induction, Marriage license, Pattern recognition, Computer science, Semantic information, Information extraction, Natural language processing, Artificial intelligence, Vocabulary, Proper noun, Language model
DocType: Conference
Volume: 10255
ISSN: 0302-9743
Citations: 1
PageRank: 0.36
References: 8
Authors: 4
Name | Order | Citations | PageRank
Verónica Romero | 1 | 259 | 28.31
Alicia Fornés | 2 | 563 | 48.56
E. Vidal | 3 | 454 | 49.15
Joan-Andreu Sánchez | 4 | 198 | 29.00