Title
Representing Standard Text Formulations as Directed Graphs
Abstract
To ensure validity in legal texts such as contracts and case law, lawyers rely on standardised formulations that are carefully worded and also act as a kind of code whose meaning and function are known to all legal experts. Using directed (acyclic) graphs to represent standardised text fragments, we are able to capture variations in time specifications, slight rephrasings, names and places, as well as OCR errors. We show how such text fragments can be found by sentence clustering, pattern detection and pattern clustering. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyze and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
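The idea of representing a standardised fragment as a directed acyclic graph can be illustrated with a minimal sketch. This is not the authors' pipeline (which the abstract describes as sentence clustering, pattern detection and pattern clustering); it assumes a much simpler setting in which two tokenised variants are aligned with Python's `difflib.SequenceMatcher`. The function names `build_variant_graph` and `enumerate_paths`, the `<s>`/`</s>` boundary nodes and the example sentences are all hypothetical.

```python
from difflib import SequenceMatcher


def build_variant_graph(first, second):
    """Merge two token lists into one DAG (illustrative assumption, not the paper's method).

    Tokens shared by both variants become shared nodes; differing spans
    (for example dates or names) become parallel branches.
    """
    nodes = ["<s>"]      # node id -> token label; id 0 is the start node
    edges = set()        # directed edges as (source id, target id)
    last_a = last_b = 0  # last node reached on each variant's path

    def add_path(tokens, start):
        # Append a private chain of nodes for one variant's differing span.
        prev = start
        for tok in tokens:
            nodes.append(tok)
            cur = len(nodes) - 1
            edges.add((prev, cur))
            prev = cur
        return prev

    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared span: both variant paths converge on the same nodes.
            for tok in first[i1:i2]:
                nodes.append(tok)
                cur = len(nodes) - 1
                edges.add((last_a, cur))
                edges.add((last_b, cur))
                last_a = last_b = cur
        else:
            # Differing span: one branch per variant (either may be empty).
            last_a = add_path(first[i1:i2], last_a)
            last_b = add_path(second[j1:j2], last_b)

    nodes.append("</s>")
    end = len(nodes) - 1
    edges.add((last_a, end))
    edges.add((last_b, end))
    return nodes, edges


def enumerate_paths(nodes, edges):
    """Return every token sequence encoded by the graph, from <s> to </s>."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    end = len(nodes) - 1

    def walk(node, prefix):
        if node == end:
            yield prefix
            return
        for nxt in sorted(successors.get(node, [])):
            label = [] if nxt == end else [nodes[nxt]]
            yield from walk(nxt, prefix + label)

    return list(walk(0, []))


# Hypothetical contract sentences that differ only in their dates.
first = "the contract begins on 01.01.2020 and ends on 31.12.2020".split()
second = "the contract begins on 15.03.2019 and ends on 14.03.2021".split()
nodes, edges = build_variant_graph(first, second)
for path in enumerate_paths(nodes, edges):
    print(" ".join(path))
```

Because every edge points from an earlier node id to a later one, the graph is acyclic by construction. Note that the merged graph also encodes combinations of the two variants that were never observed, which is one way such a representation can generalise over dates, names or OCR variants.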
Year
2021
DOI
10.1007/978-3-030-86159-9_34
Venue
DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT II
Keywords
Graph-based text representations, Legal writings, Standardised formulation
DocType
Conference
Volume
12917
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
3
Name               Order  Citations  PageRank
Frieda Josi        1      0          0.68
Christian Wartena  2      186        20.06
Ulrich Heid        3      190        40.48