Title
Chukwa: a system for reliable large-scale log collection
Abstract
Large Internet services companies like Google, Yahoo, and Facebook use the MapReduce programming model to process log data. MapReduce is designed to work on data stored in a distributed filesystem like Hadoop's HDFS. As a result, a number of log collection systems have been built to copy data into HDFS. These systems often lack a unified approach to failure handling, with errors handled separately by each piece of the collection, transport, and processing pipeline. We argue instead for a unified approach. We present a system, called Chukwa, that embodies this approach. Chukwa uses an end-to-end delivery model that can leverage local on-disk log files for reliability; this approach also eases integration with legacy systems. The architecture offers a choice of delivery models, making subsets of the collected data available promptly to clients that require them, while reliably storing a copy in HDFS. We demonstrate that our system works correctly on a 200-node testbed and can collect in excess of 200 MB/sec of log data. We supplement these measurements with a set of case studies describing real-world operational experience at several sites.
Year: 2010
Venue: LISA
Keywords: unified approach, end-to-end delivery model, log collection system, log data, mapreduce programming model, reliable large-scale log collection, delivery model, local on-disk log file, large internet services company, legacy system, case study, scale, logging
Field: Architecture, Programming paradigm, Computer science, Testbed, Database, Legacy system, The Internet
DocType: Conference
Citations: 14
PageRank: 0.93
References: 22
Authors: 2
Name          | Order | Citations | PageRank
Ariel Rabkin  | 1     | 17047     | 3.10
Randy H. Katz | 2     | 1681930   | 18.89