Title
Introducing TPCx-HS: The First Industry Standard for Benchmarking Big Data Systems
Abstract
The designation Big Data has become a mainstream buzz phrase across many industries as well as research circles. Today many companies are making performance claims that are not easily verifiable or comparable in the absence of a neutral industry benchmark. Instead, one of the test suites used to compare the performance of Hadoop-based Big Data systems is TeraSort. While it nicely defines the data set and tasks to measure Big Data Hadoop systems, it lacks a formal specification and enforcement rules that enable the comparison of results across systems. In this paper we introduce TPCx-HS, the first industry standard benchmark for Big Data systems, designed to stress both hardware and software based on Apache HDFS API-compatible distributions. TPCx-HS extends the workload defined in TeraSort with formal rules for implementation, execution, metric, result verification, publication and pricing. It can be used to assess a broad range of system topologies and implementation methodologies of Big Data Hadoop systems in a technically rigorous, directly comparable and vendor-neutral manner.
Year
2014
DOI
10.1007/978-3-319-15350-6_1
Venue
Performance Characterization and Benchmarking: Traditional to Big Data
Keywords
TPC, Big Data, Industry standard, Benchmark
Field
Software engineering, Workload, Computer science, Formal specification, Network topology, Software, Verifiable secret sharing, Big data, Benchmarking, Marketing buzz
DocType
Conference
Volume
8904
ISSN
0302-9743
Citations
7
PageRank
0.59
References
3
Authors
7
Name                        Order  Citations  PageRank
Raghunath Othayoth Nambiar  1      202        16.60
Meikel Poess                2      696        64.64
Akon Dey                    3      41         4.20
Paul Cao                    4      10         1.06
Tariq Magdon-Ismail         5      7          0.92
Da-Qi Ren                   6      8          2.69
Andrew Bond                 7      8          0.95