Title
Tools and strategies for debugging distributed stream processing applications
Abstract
Distributed data stream processing applications are often characterized by data flow graphs consisting of a large number of built-in and user-defined operators connected via streams. These flow graphs are typically deployed on a large set of nodes. The data processing is carried out on-the-fly, as tuples arrive at possibly very high rates, with minimum latency. It is well known that developing and debugging distributed, multi-threaded, and asynchronous applications, such as stream processing applications, can be challenging. Thus, without domain-specific debugging support, developers struggle when debugging distributed applications. In this paper, we describe tools and language support to support debugging distributed stream processing applications. Our key insight is to view debugging of stream processing applications from four different, but related, perspectives. First, debugging the semantics of the application involves verifying the operator-level composition and inspecting the flows at the logical level. Second, debugging the user-defined operators involves traditional source-code debugging, but strongly tied to the stream-level interactions. Third, debugging the deployment details of the application require understanding the runtime physical layout and configuration of the application. Fourth, debugging the performance of the application requires inspecting various performance metrics (such as communication rates, CPU utilization, etc.) associated with streams, operators, and nodes in the system. In light of this characterization, we developed several tools such as a debugger-aware compiler and an associated stream debugger, composition and deployment visualizers, and performance visualizers, as well as language support, such as configuration knobs for logging and tracing, deployment configurations such as operator-to-process and process-to-node mappings, monitoring directives to inspect streams, and special sink adapters to intercept and dump streaming data to files and sockets, to name a few. We describe these tools in the context of Spade—a language for creating distributed stream processing applications, and System S—a distributed stream processing middleware under development at the IBM Watson Research Center. Published in 2009 by John Wiley & Sons, Ltd. This article is a U.S. Government work and is in the public domain in the U.S.A.
Year
DOI
Venue
2009
10.1002/spe.v39:16
Softw., Pract. Exper.
Keywords
DocType
Volume
stream processing application,language support,domain-specific debugging support,traditional source-code debugging,user-defined operator,associated stream debugger,data processing,data stream processing application,stream processing middleware,asynchronous application
Journal
39
Issue
ISSN
Citations 
16
0038-0644
7
PageRank 
References 
Authors
0.50
28
8
Name
Order
Citations
PageRank
Bugra Gedik12397108.79
Henrique Andrade270.50
Andy Frenkiel3201.16
Wim De Pauw440431.73
Michael Pfeifer5201.16
Paul Allen670.50
Norman Cohen770.50
Kun-Lung Wu82849389.90