Title
Blind men and an elephant: coalescing open-source, academic, and industrial perspectives on BigData
Abstract
This tutorial is organized in two parts. In the first half, we will present an overview of applications and services in the BigData ecosystem. We will use known distributed database and systems literature as landmarks to orient the attendees in this fast-evolving space. Throughout, we will contrast models of resource management, performance, and the constraints that shape the architectures of prominent systems. We will also discuss the role of academia and industry in the development of open-source infrastructure, with an emphasis on open problems and strategies for collaboration. We assume only basic familiarity with distributed systems. In the second half, we will delve into Apache Hadoop YARN. YARN (Yet Another Resource Negotiator) transformed Hadoop from a MapReduce engine to a general-purpose cluster scheduler. Since its introduction, it has been deployed in production and extended to support use cases beyond large-scale batch processing. The tutorial will present the active research and development supporting such heterogeneous workloads, with particular attention to multi-tenant scheduling. Topics include security, resource isolation, protocols, and preemption. This portion will be detailed, but accessible to anyone with a background in distributed systems and all attendees of the first half of the tutorial.
Year: 2015
DOI: 10.1109/ICDE.2015.7113417
Venue: Data Engineering
Keywords: big data, batch processing (computers), data handling, distributed databases, parallel processing, public domain software, apache hadoop yarn, bigdata ecosystem, mapreduce engine, distributed database, general-purpose cluster scheduler, large-scale batch processing, multitenant scheduling, open-source, resource management, yet another resource negotiator, databases, engines, ecosystems
Field: Resource management, Data mining, Preemption, Use case, Yarn, Computer science, Scheduling (computing), Distributed database, Big data, Database, Negotiation
DocType: Conference
ISSN: 1084-4627
Citations: 0
PageRank: 0.34
References: 36
Authors: 2
Name
Order
Citations
PageRank
Chris Douglas 166723.01
Carlo Curino 2201290.35