Title
Mondrian: Spreadsheet Layout Detection
Abstract
Spreadsheet datasets are valuable sources of data, but often ill-suited for machine consumption. Their unstructured nature allows users to arrange data and metadata freely in a human-readable format, often in canvas-like layouts. To extract their content, data practitioners must resort to manual inspection and run cumbersome preparation pipelines. The Mondrian system assists users in identifying and handling multiregion layout templates: spreadsheet layouts composed of independent regions that appear repeatedly across different files. Mondrian comprises an automated approach to detect multiple regions within a single file and an algorithm that maps region layouts to graphs to compute layout similarity and identify templates [10]. Users interact with Mondrian through a web-based visual interface that serves as a practical toolkit to handle collections of multiregion spreadsheets and enables their automated preparation.
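The two steps named in the abstract, region detection and layout comparison, can be illustrated with a small sketch. This is not Mondrian's algorithm: it assumes that a region is simply a group of 4-connected non-empty cells, and it replaces the graph-based similarity with a much coarser comparison of the relative directions between region pairs. All function names (`detect_regions`, `layout_edges`, `same_layout`) are hypothetical.

```python
from collections import deque


def detect_regions(grid):
    """Return bounding boxes (top, left, bottom, right) of groups of
    4-connected non-empty cells. Empty cells are represented by None.
    Assumption: connectivity alone defines a region."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            # Breadth-first search over neighbouring non-empty cells.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] is not None and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def layout_edges(boxes):
    """Describe a layout as a sorted list of coarse directions
    ("below", "right", "other") between every pair of regions."""
    edges = []
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            if b[0] > a[2]:
                edges.append("below")
            elif b[1] > a[3]:
                edges.append("right")
            else:
                edges.append("other")
    return sorted(edges)


def same_layout(grid_a, grid_b):
    """Two sheets share a layout if their direction lists match."""
    return layout_edges(detect_regions(grid_a)) == layout_edges(detect_regions(grid_b))
```

Under these assumptions, two sheets with different cell values and region sizes but the same arrangement of regions are reported as sharing a layout. A real similarity measure would also need to tolerate partial matches, which this exact comparison does not.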
Year
2022
DOI
10.1145/3514221.3520152
Venue
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22)
Keywords
data preparation, template recognition, multiregion layout
DocType
Conference
ISSN
0730-8078
Citations
0
PageRank
0.34
References
0
Authors
5
Name                 Order   Citations   PageRank
Gerardo Vitagliano   1       0           1.35
Lucas Reisener       2       0           0.34
Lan Jiang            3       0           0.34
Mazhar Hameed        4       0           1.35
Felix Naumann        5       0           0.34