Title
Comparing "parallel passages" in digital archives.
Abstract
Purpose: The purpose of this paper is to present a language-agnostic approach that facilitates the discovery of "parallel passages" stored in historic and cultural heritage digital archives.
Design/methodology/approach: The authors explore a novel and relatively simple approach that combines a character-based statistical language model with a tailored version of the Basic Local Alignment Search Tool (BLAST) to extract exact and approximate string patterns shared between groups of documents.
Findings: The approach is applicable to a wide range of languages and compensates for variability in the text of the documents arising from differences in dialect and authorship, language change over time, and errors introduced by inaccurate transcription and optical character recognition during digitisation.
Originality/value: The approach is novel and addresses a need among humanities researchers for tools that can identify similar documents, and local similarities represented by shared text sequences, in a potentially vast archive of documents. As far as the authors are aware, no existing tools provide the same level of tolerance to the language of the documents.
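To make the idea of extracting exact and approximate shared string patterns concrete, the following is a minimal sketch of BLAST-style "seed and extend" matching on raw characters. It is not the authors' implementation: it omits the statistical language model entirely, tolerates only substituted characters (no insertions or deletions), and the names seed_index, extend and shared_passages and the parameters k, min_len and max_mismatches are hypothetical choices made for illustration.

```python
from collections import defaultdict


def seed_index(text, k):
    """Map every character k-gram in `text` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def extend(a, b, i, j, step, budget):
    """Walk from (i, j) in direction `step` (+1 or -1) and return the number
    of characters to keep. Up to `budget` mismatches are tolerated, and the
    kept span always ends on a matching character."""
    kept = 0
    mismatches = 0
    n = 0
    while 0 <= i + n * step < len(a) and 0 <= j + n * step < len(b):
        if a[i + n * step] == b[j + n * step]:
            kept = n + 1
        else:
            mismatches += 1
            if mismatches > budget:
                break
        n += 1
    return kept


def shared_passages(a, b, k=5, min_len=12, max_mismatches=2):
    """Return sorted (start_a, start_b, length) tuples for passages shared by
    `a` and `b`. Exact k-gram hits act as seeds; each seed is extended left
    and right with a separate mismatch budget per direction (an assumption of
    this sketch, not something taken from the paper)."""
    index = seed_index(b, k)
    found = set()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            left = extend(a, b, i - 1, j - 1, -1, max_mismatches)
            right = extend(a, b, i + k, j + k, +1, max_mismatches)
            length = left + k + right
            if length >= min_len:
                found.add((i - left, j - left, length))
    return sorted(found)


# One substituted character ("o" read as "a") stands in for an OCR error.
source = "the quick brown fox"
copy = "xx the quick brawn fox yy"
print(shared_passages(source, copy))  # [(0, 3, 19)]
```

Because the sketch works on characters rather than words, it needs no tokeniser or language-specific resources, which is the property the abstract emphasises. Seeds that fall inside the same passage can produce overlapping hits with different extents when a passage contains more mismatches than the budget allows, so a fuller version would merge hits that lie on the same diagonal.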
Year: 2020
DOI: 10.1108/JD-10-2018-0175
Venue: Journal of Documentation
Keywords: Digital libraries, Computer applications, Archives, Linguistics, Probabilistic analysis, Language and literature
Field: Information retrieval, Computer science, Probabilistic analysis of algorithms, Computer applications, Digital archives, Digital library
DocType: Journal
Volume: 76
Issue: 1
ISSN: 0022-0418
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name, Order, Citations, PageRank
Martyn Harris, 1, 0, 0.34
Mark Levene, 2, 1272252.84
Dell Zhang, 3, 106157.54
Dan Levene, 4, 1, 1.02