Title |
---|
BLOCKSET (Block-Aligned Serialized Trees): Reducing Inference Latency for Tree ensemble Deployment |
Abstract |
---|
We present methods to serialize and deserialize gradient-boosted trees and random forests that optimize inference latency when models are not loaded into memory. This situation arises when models are larger than memory, and it also arises systematically when models are deployed on low-resource Internet of Things devices or run as cloud microservices whose resources are allocated on demand. Block-Aligned Serialized Trees (BLOCKSET) introduce the concept of selective access for random forests and gradient-boosted trees, in which only the parts of the model needed for inference are deserialized and loaded into memory. Using principles from external memory algorithms, we block-align the serialization format to minimize the number of I/Os. For gradient-boosted trees, this yields a more than fivefold reduction in inference latency over layouts that do not perform selective access, and a twofold reduction over techniques that are selective but do not encode I/O block boundaries in the layout. |
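
The abstract's central idea is that placing nodes that are visited together into the same I/O block means one inference reads fewer blocks. The toy sketch below illustrates only that idea and is not the BLOCKSET algorithm. It assumes a single complete binary tree, one fixed-size slot per node, and a "block" measured in nodes rather than bytes. The functions `bfs_layout`, `subtree_blocked_layout`, and `blocks_touched` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Dict

# Hypothetical toy model (not the paper's implementation): nodes are
# heap-indexed (root = 0, children of node i are 2i+1 and 2i+2), each node
# occupies one slot, and a "block" is block_size consecutive slots.


def bfs_layout(depth: int) -> Dict[int, int]:
    """Naive layout: node i is serialized at slot i (breadth-first order)."""
    n = 2 ** (depth + 1) - 1
    return {i: i for i in range(n)}


def subtree_blocked_layout(depth: int, h: int) -> Dict[int, int]:
    """Block-aligned layout: each subtree of height h gets its own block.

    The block size is 2**h - 1 slots, so a full subtree fits exactly and
    every subtree starts on a block boundary.
    """
    n = 2 ** (depth + 1) - 1
    block_size = 2 ** h - 1
    slot: Dict[int, int] = {}
    roots = deque([0])  # roots of subtrees still waiting for a block
    block = 0
    while roots:
        frontier = [roots.popleft()]
        offset = 0
        for _ in range(h):  # lay out h levels breadth-first inside the block
            children = []
            for node in frontier:
                slot[node] = block * block_size + offset
                offset += 1
                children.extend(c for c in (2 * node + 1, 2 * node + 2) if c < n)
            frontier = children
        roots.extend(frontier)  # nodes just below this subtree start new blocks
        block += 1
    return slot


def blocks_touched(slot: Dict[int, int], leaf: int, block_size: int) -> int:
    """Count distinct blocks read on the root-to-leaf path ending at `leaf`."""
    blocks = set()
    node = leaf
    while True:
        blocks.add(slot[node] // block_size)
        if node == 0:
            return len(blocks)
        node = (node - 1) // 2


if __name__ == "__main__":
    depth, h = 11, 4  # 12 levels (4095 nodes), 15-node blocks
    block_size = 2 ** h - 1
    leaves = range(2 ** depth - 1, 2 ** (depth + 1) - 1)
    naive = bfs_layout(depth)
    aligned = subtree_blocked_layout(depth, h)
    print("BFS layout:", max(blocks_touched(naive, leaf, block_size) for leaf in leaves))
    print("block-aligned:", max(blocks_touched(aligned, leaf, block_size) for leaf in leaves))
```

With 12 levels and 15-node blocks, every root-to-leaf path touches 9 blocks in breadth-first order but only 3 when each block holds one 4-level subtree. In this toy setting, selective access would mean fetching only those blocks instead of deserializing the whole tree.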
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3447548.3467368 | Knowledge Discovery and Data Mining |
Keywords | DocType | Citations |
---|---|---|
random forest, gradient boosted tree, tree ensemble, block alignment, serialization, efficient inference, IoT, Microservices, locality | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Meghana Madhyastha | 1 | 0 | 0.68 |
Kunal Lillaney | 2 | 9 | 2.18 |
James Browne | 3 | 0 | 1.01 |
Joshua T. Vogelstein | 4 | 273 | 31.99 |
Randal Burns | 5 | 1955 | 115.15 |