Abstract | ||
---|---|---|
Spreadsheet datasets are valuable sources of data, but are often ill-suited for machine consumption. Their unstructured nature allows users to arrange data and metadata freely in a human-readable format, often in canvas-like layouts. To extract their content, data practitioners must resort to manual inspection and run cumbersome preparation pipelines. The Mondrian system assists users in identifying and handling multiregion layout templates: spreadsheet layouts composed of independent regions that appear repeatedly across different files. Mondrian comprises an automated approach to detect multiple regions within a single file and an algorithm that maps region layouts to graphs in order to compute layout similarity and identify templates [10]. Users interact with Mondrian through a web-based visual interface that serves as a practical toolkit for handling collections of multiregion spreadsheets and enables their automated preparation. |
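
The idea of mapping region layouts to graphs and comparing them can be illustrated with a small sketch. This is not Mondrian's actual algorithm, which the abstract does not specify; it is a simplified stand-in that assumes each region is a typed bounding box, encodes a layout as a set of labelled edges between regions, and uses Jaccard overlap as the similarity. The names `Region`, `relation`, `layout_graph`, and `layout_similarity` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """A rectangular spreadsheet region (hypothetical representation)."""
    kind: str    # assumed region type, e.g. "header", "table", "note"
    top: int     # first row, inclusive
    left: int    # first column, inclusive
    bottom: int  # last row, inclusive
    right: int   # last column, inclusive


def relation(a: Region, b: Region) -> str:
    """Coarse spatial relation of region a with respect to region b."""
    if a.bottom < b.top:
        return "above"
    if b.bottom < a.top:
        return "below"
    if a.right < b.left:
        return "left_of"
    if b.right < a.left:
        return "right_of"
    return "overlaps"


def layout_graph(regions: list[Region]) -> frozenset:
    """Encode a layout as a set of labelled edges (kind_a, relation, kind_b).

    Regions are visited in reading order (top, then left), so the encoding
    does not depend on absolute positions or on region sizes.
    """
    ordered = sorted(regions, key=lambda r: (r.top, r.left))
    return frozenset(
        (a.kind, relation(a, b), b.kind) for a, b in combinations(ordered, 2)
    )


def layout_similarity(first: list[Region], second: list[Region]) -> float:
    """Jaccard similarity between the edge sets of two layouts."""
    g1, g2 = layout_graph(first), layout_graph(second)
    if not g1 and not g2:
        return 1.0  # two layouts without edges are treated as identical
    return len(g1 & g2) / len(g1 | g2)
```

Under these assumptions, two files that each place a header above two side-by-side tables receive a similarity of 1.0 even when their regions differ in size or offset, so they could be grouped as one candidate template. A file with a different arrangement of region types scores lower.
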
Year | DOI | Venue |
---|---|---|
2022 | 10.1145/3514221.3520152 | Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22) |
Keywords | DocType | ISSN |
data preparation, template recognition, multiregion layout | Conference | 0730-8078 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gerardo Vitagliano | 1 | 0 | 1.35 |
Lucas Reisener | 2 | 0 | 0.34 |
Lan Jiang | 3 | 0 | 0.34 |
Mazhar Hameed | 4 | 0 | 1.35 |
Felix Naumann | 5 | 0 | 0.34 |