Fast access to columnar, hierarchically nested data via code transformation - Citegraph

Paper Info

Title
Fast access to columnar, hierarchically nested data via code transformation

Abstract
Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g., Apache Drill) perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into object form, usually at a performance cost. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.
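To make the idea of "rewriting references to objects as columnar array lookups" concrete, here is a minimal hand-written sketch. It is not the authors' compiler pass, which works on the typed abstract syntax tree. It shows only what such a rewrite could produce for one nested loop. All names (`Muon`, `Event`, `muons_offsets`, `muons_pt`, `materialize`, `max_pt_objects`, `max_pt_columnar`) are hypothetical. The layout assumed here, one flat array per leaf attribute plus an offsets array per nested list, is one common columnar encoding and may differ from the paper's.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical object model: each event holds a variable-length list of muons,
# and each muon has a transverse momentum "pt".
@dataclass
class Muon:
    pt: float

@dataclass
class Event:
    muons: List[Muon]

# Assumed columnar form of the same data: a flat array for the leaf attribute
# plus an offsets array marking where each event's muons start and stop.
muons_offsets = [0, 2, 2, 5]              # event 0: muons 0-1; event 1: none; event 2: muons 2-4
muons_pt = [10.0, 35.0, 50.0, 5.0, 22.0]

def materialize(offsets, pt):
    """Rebuild row objects from the columns (the conventional, costly step)."""
    return [Event([Muon(pt[j]) for j in range(offsets[i], offsets[i + 1])])
            for i in range(len(offsets) - 1)]

def max_pt_objects(events):
    """Object-oriented analysis: highest muon pt per event (0.0 if none)."""
    result = []
    for event in events:
        best = 0.0
        for muon in event.muons:
            if muon.pt > best:
                best = muon.pt
        result.append(best)
    return result

def max_pt_columnar(offsets, pt):
    """The same loop with object references rewritten as array lookups:
    event.muons becomes the index range offsets[i]:offsets[i + 1],
    and muon.pt becomes pt[j]. No Event or Muon objects are created."""
    result = []
    for i in range(len(offsets) - 1):
        best = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            if pt[j] > best:
                best = pt[j]
        result.append(best)
    return result

print(max_pt_objects(materialize(muons_offsets, muons_pt)))  # [35.0, 0.0, 50.0]
print(max_pt_columnar(muons_offsets, muons_pt))              # [35.0, 0.0, 50.0]
```

Both functions compute the same result, but only the first pays for building intermediate objects. A code transformation of the kind described above would derive the second form from the first automatically, so the analyst keeps writing object-style loops.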

Year	DOI	Venue
2017	10.1109/BigData.2017.8257933	2017 IEEE International Conference on Big Data (Big Data)
Keywords	Field	DocType
Big data applications, Automatic programming, Data analysis, Scientific computing, High energy physics instrumentation computing	SQL, Procedural programming, Data mining, Data processing, Computer science, Parallel computing, Abstract syntax tree, Compiler, Rewriting, Big data, Nested loop join	Conference
ISSN	ISBN	Citations
2639-1589	978-1-5386-2716-7	0
PageRank	References	Authors
0.34	3	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jim Pivarski	1	2	0.76
P. Elmer	2	1	1.79
Brian Bockelman	3	29	7.60
Zhe Zhang	4	55	22.26

Cited by (0 rows)

References (3 rows)
