Abstract | ||
---|---|---|
Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose LARA, a lean algebra of three operators, that expresses RA and LA as well as relevant optimization rules. We show a series of proofs that position LARA at just the right level of expressiveness for a middleware algebra: more explicit than MapReduce but more general than RA or LA. At the physical level we find that the LARA operators afford efficient implementations using a single primitive that is available in a variety of backend engines: range scans over partitioned sorted maps. To evaluate these ideas, we implemented the LARA operators as range iterators in Apache Accumulo, a popular implementation of Google's BigTable. First we show how LARA expresses a sensor quality control task, and we measure the performance impact of optimizations LARA admits on this task. Second we show that the LARADB implementation outperforms Accumulo's native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply, especially at smaller scales that are typically a poor fit for scale-out approaches. We find that LARADB offers a conceptually lean framework for optimizing mixed-abstraction analytics tasks, without giving up fast record-level updates and scans. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3070607.3070608 | BeyondMR@SIGMOD |
DocType | Volume | Citations |
Conference | abs/1703.07342 | 4 |
PageRank | References | Authors |
0.40 | 26 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dylan Hutchison | 1 | 4 | 0.40 |
Bill Howe | 2 | 1520 | 94.44 |
Dan Suciu | 3 | 9625 | 1349.54 |