Title
Reconstituting typeset Marriage Registers using simple software tools.
Abstract
In a world of fully integrated software applications, which can seem daunting to develop and to maintain, it is sometimes useful to recall that a system of loosely-linked software components can provide surprisingly powerful and flexible methods for software development. This paper describes a project which aims to re-typeset a series of volumes from the Phillimore Marriage Registers, first published in England around the turn of the last century. The source material is plain text derived from running Optical Character Recognition (OCR) on a set of page scans taken from the original printed volumes. The regular, tabular, structure of the Register pages allows us to automate the re-typesetting process. The UNIX troff software and its tbl preprocessor are used for the typesetting itself, but a series of simple awk-based software tools, all of them parsers and code generators of one sort or another, is used to bring about the OCR-to-troff transformation. By re-parsing the generated troff codes it is possible to produce a surname index as a supplement to the re-typeset volume. Moreover, this second-stage parsing has been invaluable in discovering subtle 'typos' in the automatically generated material. With small adjustments to this parser it would be possible to output the complete marriage entries in standard XML or GEDCOM notations.
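The two-stage idea in the abstract (generate tbl input from OCR text, then re-parse the generated codes to build a surname index) can be illustrated with a minimal sketch. This is not the paper's awk tooling: it is written in Python, and the entry layout (`groom & bride .. date`), the three-column table format, and the function names `to_tbl` and `surname_index` are all assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, simplified OCR entry layout (an assumption, not the
# Registers' real format):  "<groom> & <bride> .. <date>"
ENTRY = re.compile(
    r"^(?P<groom>.+?)\s+&\s+(?P<bride>.+?)\s*\.{2,}\s*(?P<date>.+)$"
)


def to_tbl(lines):
    """Stage 1: turn OCR lines into a tbl table (tab-separated rows).

    Lines that do not match the assumed layout are skipped here; a real
    tool would report them for manual correction.
    """
    rows = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is not None:
            rows.append("\t".join((m["groom"], m["bride"], m["date"])))
    # .TS/.TE delimit a tbl table; "l l r." is the column format line.
    return "\n".join([".TS", "l l r.", *rows, ".TE"])


def surname_index(tbl_text):
    """Stage 2: re-parse the generated tbl rows into a surname index."""
    index = defaultdict(list)
    in_table = False
    awaiting_format = False
    for line in tbl_text.splitlines():
        if line == ".TS":
            in_table, awaiting_format = True, True
            continue
        if line == ".TE":
            in_table = False
            continue
        if not in_table:
            continue
        if awaiting_format:
            # The format section ends with a line terminated by "."
            if line.endswith("."):
                awaiting_format = False
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # malformed row: a candidate 'typo' to inspect
        groom, bride, date = fields
        for person in (groom, bride):
            # Assumes the surname is the last word of the name.
            index[person.split()[-1]].append(date)
    return dict(sorted(index.items()))


if __name__ == "__main__":
    sample = [
        "John Smith & Mary Jones .. 12 May 1612",
        "garbled l1ne",
        "Thomas Jones & Ann Brown .... 3 June 1613",
    ]
    tbl = to_tbl(sample)
    print(tbl)
    print(surname_index(tbl))
```

Keeping the two stages as separate functions mirrors the loosely-linked design the abstract argues for: the second parser reads only the generated output, so it doubles as a consistency check on the first.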
Year
2012
DOI
10.1007/s00450-010-0145-x
Venue
Computer Science - R&D
Keywords
gedcom notation,simple software tool,re-typesetting · ocr · troff · parsing · genealogy · hyperlinking · indexing,ocr-to-troff transformation,software development,re-typeset volume,loosely-linked software component,simple awk-based software tool,source material,typeset marriage,troff code,unix troff software,integrated software application
Field
AWK,Programming language,Computer science,Parallel computing,Optical character recognition,Software,Plain text,Parsing,Component-based software engineering,troff,Software development
DocType
Journal
Computer Science - R&D
Volume
27
Issue
2
ISSN
1865-2042
Citations
1
PageRank
0.48
References
3
Authors
1
Name
David F. Brailsford
Order
1
Citations
140
PageRank
29.45