10 Bigdata Benchmark Suites
Big data benchmark suites include micro benchmarks, component benchmarks, and application-level benchmarks. Micro benchmarks evaluate low-level system operations, component benchmarks evaluate higher-level functions, and application benchmarks measure end-to-end application performance. The combination of architecture, systems, and data management makes big data systems complex, so benchmarks are needed to measure, compare, and evaluate them.
HiBench, AMP Benchmark, BigDataBench, Yahoo! Cloud Serving Benchmark, GridMix, CloudSuite, SWIM, TPC Express Benchmark, PUMA Benchmark Suite, and LinkBench are some of the big data benchmark suites, listed in no particular order.
Top Bigdata Benchmark Suites
1.HiBench
The HiBench suite contains 10 typical micro workloads. It also lets users enable input/output compression for most workloads, using the default compression codec (zlib).
2.AMP Benchmark
The AMP benchmark measures response time on a handful of relational queries (scans, aggregations, joins, and UDFs) across different data sizes. It is used for quantitative and qualitative comparisons of five systems: Redshift, Hive, Shark, Impala, and Stinger/Tez. These systems have very different sets of capabilities. MapReduce-like systems (Shark/Hive) target flexible and large-scale computation, supporting complex user-defined functions (UDFs), tolerating failures, and scaling to thousands of nodes. Traditional MPP databases are strictly SQL compliant and heavily optimized for relational queries. The workload here is simply one set of queries that most of these systems can complete.
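To make the four query categories concrete, here is a minimal sketch that times a scan, an aggregation, a join, and a UDF query against an in-memory SQLite database. It is not the AMP benchmark itself: the table and column names, the synthetic rows, and the `domain_of` function are assumptions chosen for illustration, and SQLite only stands in for the systems being compared.

```python
import sqlite3
import time


def build_database(num_pages=1000, visits_per_page=5):
    """Create a small in-memory database with two synthetic tables (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rankings (page_url TEXT PRIMARY KEY, page_rank INTEGER)")
    conn.execute("CREATE TABLE uservisits (source_ip TEXT, dest_url TEXT, ad_revenue REAL)")
    conn.executemany(
        "INSERT INTO rankings VALUES (?, ?)",
        [(f"http://site{i % 10}.example/page{i}", i % 100) for i in range(num_pages)],
    )
    conn.executemany(
        "INSERT INTO uservisits VALUES (?, ?, ?)",
        [
            (f"10.0.{i % 256}.{v}", f"http://site{i % 10}.example/page{i}", 0.5)
            for i in range(num_pages)
            for v in range(visits_per_page)
        ],
    )
    # Hypothetical UDF: extract the host part of a URL.
    conn.create_function("domain_of", 1, lambda url: url.split("/")[2])
    return conn


# One query per category named in the text.
QUERIES = {
    "scan": "SELECT page_url, page_rank FROM rankings WHERE page_rank > 50",
    "aggregation": "SELECT source_ip, SUM(ad_revenue) FROM uservisits GROUP BY source_ip",
    "join": (
        "SELECT r.page_url, SUM(v.ad_revenue) FROM rankings r "
        "JOIN uservisits v ON r.page_url = v.dest_url GROUP BY r.page_url"
    ),
    "udf": "SELECT domain_of(page_url), COUNT(*) FROM rankings GROUP BY domain_of(page_url)",
}


def time_queries(conn):
    """Run each query once and return {name: (row_count, seconds)}."""
    results = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        results[name] = (len(rows), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for name, (row_count, seconds) in time_queries(build_database()).items():
        print(f"{name:12s} rows={row_count:6d} time={seconds * 1000:.2f} ms")
```

Raising `num_pages` and `visits_per_page` mimics the idea of running the same queries across different data sizes, although a single-process SQLite run says nothing about how a distributed system would scale.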
3.BigDataBench
BigDataBench 3.1 includes 14 real-world data sets and 33 big data workloads. It covers structured, semi-structured, and unstructured data types, and data sources including text, graph, image, audio, video, and table data.
4.Yahoo! Cloud Serving Benchmark
The goal of the Yahoo! Cloud Serving Benchmark (YCSB) project is to develop a framework and a common set of workloads for evaluating the performance of different "key-value" and "cloud" serving stores.
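The idea of a common workload run against interchangeable stores can be sketched in a few lines. The example below is only an illustration, not YCSB code: `DictStore`, `run_workload`, and all parameter names are hypothetical, keys are chosen uniformly at random, and an in-memory dictionary stands in for a real serving store.

```python
import random
import time


class DictStore:
    """Hypothetical stand-in for a key-value store under test."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def update(self, key, value):
        self._data[key] = value


def run_workload(store, record_count=1000, operation_count=10000,
                 read_proportion=0.5, seed=42):
    """Load records, then issue a random mix of reads and updates."""
    rng = random.Random(seed)

    # Load phase: insert every record once.
    for i in range(record_count):
        store.update(f"user{i}", "x" * 100)

    # Run phase: pick an operation and a key at random for each step.
    counts = {"read": 0, "update": 0}
    start = time.perf_counter()
    for _ in range(operation_count):
        key = f"user{rng.randrange(record_count)}"
        if rng.random() < read_proportion:
            store.read(key)
            counts["read"] += 1
        else:
            store.update(key, "y" * 100)
            counts["update"] += 1
    elapsed = time.perf_counter() - start

    ops_per_sec = operation_count / elapsed if elapsed > 0 else float("inf")
    return {"counts": counts, "ops_per_sec": ops_per_sec}


if __name__ == "__main__":
    result = run_workload(DictStore())
    print(result["counts"], f"{result['ops_per_sec']:.0f} ops/sec")
```

Any object exposing the same `read` and `update` methods could be passed in place of `DictStore`, which is the point of a shared framework: the workload stays fixed while the store varies.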
5.GridMix
GridMix is a benchmark for Hadoop clusters. It submits a mix of synthetic jobs, modeling a profile mined from production loads. There are three versions of the GridMix tool.
6.CloudSuite
CloudSuite is a benchmark suite for emerging scale-out applications. The second release consists of eight applications that have been selected based on their popularity in today's datacenters. The benchmarks are based on real-world software stacks and represent real-world setups.
7.SWIM
SWIM enables rigorous performance measurement of MapReduce systems. SWIM contains suites of workloads of thousands of jobs, with complex data, arrival, and computation patterns. This represents an advance over previous MapReduce pseudo-benchmarks of limited diversity and scope. SWIM informs both highly targeted, workload-specific optimizations and designs intended to bring general benefit.
8.TPC Express Benchmark
TPC Express Benchmark HS (TPCx-HS) was developed to provide an objective measure of hardware, operating system and commercial Apache Hadoop File System API compatible software distributions, and to provide the industry with verifiable performance, price-performance and availability metrics.
9.PUMA Benchmark Suite
PUMA Benchmark Suite represents a broad range of "real-world" MapReduce applications exhibiting application characteristics with high/low computation and high/low shuffle volumes.
10.LinkBench
LinkBench is a database benchmark developed to evaluate database performance for workloads similar to those of Facebook's production MySQL deployment. LinkBench is highly configurable and extensible.