Title
Using the MGGI Methodology for Category-Based Language Modeling in Handwritten Marriage Licenses Books
Abstract
Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demographic studies and genealogical research, among other uses. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction are difficult because of the distinctive and evolving vocabulary, which is composed mainly of proper names that change over time. In previous work we studied the use of category-based language models both to improve automatic transcription accuracy and to ease the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the resulting language model. Using this language model, full handwritten text recognition experiments have been carried out, with results that support the interest of the proposed approach.
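To make the term "category-based language model" concrete, the following is a minimal sketch of a category-based bigram model in its standard textbook form, where the probability of a word is the probability of its category given the previous category, times the probability of the word within its category. It is not the authors' implementation and does not implement MGGI. The toy records, the category labels (NAME, JOB, OTHER), and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records of (word, category) pairs; not taken from the paper.
TRAIN = [
    [("Joan", "NAME"), ("pages", "JOB"), ("ab", "OTHER"), ("Maria", "NAME")],
    [("Pere", "NAME"), ("sastre", "JOB"), ("ab", "OTHER"), ("Anna", "NAME")],
]


def train(records):
    """Collect the counts needed for a category-based bigram model."""
    cat_bigrams = Counter()  # (previous category, category) counts
    cat_context = Counter()  # how often a category is used as a context
    word_in_cat = Counter()  # (category, word) counts
    cat_total = Counter()    # category counts
    for record in records:
        prev = "<s>"  # start-of-record marker
        for word, cat in record:
            cat_bigrams[(prev, cat)] += 1
            cat_context[prev] += 1
            word_in_cat[(cat, word)] += 1
            cat_total[cat] += 1
            prev = cat
    return cat_bigrams, cat_context, word_in_cat, cat_total


def prob(model, prev_cat, word, cat):
    """Unsmoothed estimate of P(cat | prev_cat) * P(word | cat)."""
    cat_bigrams, cat_context, word_in_cat, cat_total = model
    if cat_context[prev_cat] == 0 or cat_total[cat] == 0:
        return 0.0
    p_cat = cat_bigrams[(prev_cat, cat)] / cat_context[prev_cat]
    p_word = word_in_cat[(cat, word)] / cat_total[cat]
    return p_cat * p_word


MODEL = train(TRAIN)
print(prob(MODEL, "<s>", "Joan", "NAME"))
```

Because all proper names share one NAME category in this sketch, the model cannot tell which role a name plays in a record. That loss of role information is one plausible source of the semantic errors the abstract mentions, although the abstract itself does not detail them.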
Year: 2016
DOI: 10.1109/ICFHR.2016.0069
Venue: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Keywords: Handwritten Text Recognition, Information extraction, Language modeling, MGGI, Categories-based language model
Field: Grammar induction, Computer science, Information extraction, Natural language processing, Artificial intelligence, Hidden Markov model, Vocabulary, Proper noun, Text recognition, Semantics, Language model, Machine learning
DocType: Conference
ISSN: 2167-6445
ISBN: 978-1-5090-0982-4
Citations: 1
PageRank: 0.39
References: 8
Authors: 4
Name                  Order  Citations  PageRank
Verónica Romero       1      259        28.31
Alicia Fornés         2      563        48.56
E. Vidal              3      454        49.15
Joan-Andreu Sánchez   4      198        29.00