Title | ||
---|---|---|
Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources. |
Abstract | ||
---|---|---|
Lack of data can be an issue when beginning a new study on historical handwritten documents. To deal with this, we present the character-based decoder part of a multilingual approach based on transductive transfer learning for a historical handwriting recognition task on Italian Comedy Registers. The decoder must build, from a vector of letter-ngrams, the sequence of characters that corresponds to a word. As learning data, we created a new dataset from untapped resources that covers the same domain and period as our Italian Comedy data, as well as resources from common domains, periods, or languages. We obtain a 97.42% Character Recognition Rate and an 86.57% Word Recognition Rate on our Italian Comedy data, despite a lexical coverage of 67% between the Italian Comedy data and the training data. These results show that an efficient system can be obtained by carefully selecting the datasets used for the transfer learning. |
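
To make the decoder's input concrete, here is a minimal sketch of how a word might be turned into a vector of letter-ngram counts. The abstract does not state the n-gram orders, the vocabulary, or the encoding the authors use, so the function names `letter_ngrams` and `to_vector`, the `n_max` parameter, and the count-based encoding are all illustrative assumptions rather than the paper's method.

```python
from collections import Counter


def letter_ngrams(word, n_max=3):
    """Count every letter n-gram of length 1..n_max in `word`.

    Assumption: n-grams are contiguous character substrings; the paper's
    exact definition is not given in the abstract.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def to_vector(word, vocabulary, n_max=3):
    """Project the n-gram counts of `word` onto a fixed, ordered vocabulary.

    `vocabulary` is a hypothetical list of n-grams; n-grams of the word
    that are absent from it are simply dropped.
    """
    counts = letter_ngrams(word, n_max)
    return [counts.get(gram, 0) for gram in vocabulary]


if __name__ == "__main__":
    vocab = ["a", "b", "ab", "ba", "zz"]  # toy vocabulary for illustration
    print(to_vector("aba", vocab, n_max=2))
```

A decoder of the kind the abstract describes would take a vector like this as input and output the character sequence of the word.
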
Year | Venue | Field |
---|---|---|
2018 | COLING | Training set,Transduction (machine learning),Scarcity,Character recognition,Computer science,Comedy,Word recognition,Transfer of learning,Handwriting recognition,Natural language processing,Artificial intelligence |
DocType | Volume | Citations |
---|---|---|
Conference | C18-1 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Adeline Granet | 1 | 0 | 1.69 |
Emmanuel Morin | 2 | 42 | 16.13 |
Harold Mouchère | 3 | 107 | 14.46 |
Solen Quiniou | 4 | 71 | 9.97 |
Christian Viard-Gaudin | 5 | 444 | 46.20 |