Apache Kudu

Overview

Synopsis

Category

Column-oriented DBMS

Features

• In-memory columnar execution path
• Advanced in-process tracing capabilities
• Extensive metrics support
• Watchdog threads which check for latency outliers
• Columnar storage allows efficient encoding and compression
• Lazy data materialization and predicate pushdown
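The last feature above can be illustrated with a toy column store. The following is a minimal sketch, not Kudu's API: the `scan` function and the dict-of-lists table layout are hypothetical. The filter is evaluated against a single column first (predicate pushdown). The other requested columns are read only for rows that pass (lazy materialization).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy column store: each column is stored as its own list.
# This illustrates the idea only; it is not Kudu's storage format or API.
Table = Dict[str, List[object]]

def scan(table: Table,
         pred_col: str,
         predicate: Callable[[object], bool],
         projection: Sequence[str]) -> List[Tuple[object, ...]]:
    # Predicate pushdown: evaluate the filter against a single column
    # and record the positions of matching rows.
    matches = [i for i, v in enumerate(table[pred_col]) if predicate(v)]
    # Lazy materialization: read the projected columns only at the
    # matching positions, instead of building every row up front.
    return [tuple(table[col][i] for col in projection) for i in matches]

table: Table = {
    "host":   ["a", "b", "c", "d"],
    "metric": ["cpu", "mem", "cpu", "disk"],
    "value":  [0.9, 0.4, 0.2, 0.7],
}
print(scan(table, "value", lambda v: v > 0.5, ["host", "metric"]))
# [('a', 'cpu'), ('d', 'disk')]
```

When few rows match, most values in the projected columns are never touched. That saving grows with the number of columns and rows.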

License

• Open source

Price

• Free (open source)

Pricing

Subscription

Free Trial

Available

Users Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1000 employees)

Company

Apache Software Foundation

What is best?

• In-memory columnar execution path
• Advanced in-process tracing capabilities
• Extensive metrics support
• Watchdog threads which check for latency outliers

What are the benefits?

• Performance: Strong performance when running sequential and random workloads simultaneously
• High availability: Reads can be served by read-only follower tablets, even if the leader tablet fails
• Data compression: Queries can be answered while reading fewer blocks from disk
• Integration: Takes advantage of the broader Hadoop ecosystem

PAT Rating™ (Editor Rating / Aggregated User Rating)

Ease of use: 7.6 / 6.6
Features & Functionality: 7.6 / 7.1
Advanced Features: 7.6 / 7.3
Integration: 7.6 / 7.4
Performance: 7.6 / 7.7
Customer Support: 7.6 / —
Implementation: —
Renew & Recommend: —

Bottom Line

Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem.

Editor Rating: 7.6

Aggregated User Rating: 7.2 (6 ratings)


Kudu is a columnar storage manager developed for the Apache Hadoop platform. It shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

Kudu organizes its data internally by column rather than by row. Columnar storage allows efficient encoding and compression. With techniques such as run-length encoding, differential encoding, and vectorized bit-packing, Kudu is as fast at reading data as it is space-efficient at storing it. Columnar storage also greatly reduces the amount of I/O needed to serve analytic queries, because a query reads only the columns it references. Using techniques such as lazy data materialization and predicate pushdown, Kudu can run drill-down and needle-in-a-haystack queries over billions of rows and terabytes of data in seconds.

Kudu is implemented in C++, so it can scale easily to large amounts of memory per node. Its in-memory columnar execution path uses SIMD operations from the SSE4 and AVX instruction sets to process several values per instruction.

Kudu gives architects the flexibility to address a wider variety of use cases without exotic workarounds. It is designed to excel at use cases that combine random reads and writes with fast analytic scans, a combination that previously required complex Lambda architectures. Engineered to take advantage of next-generation hardware and in-memory processing, Kudu significantly lowers query latency for Apache Impala (incubating) and Apache Spark. Kudu targets applications that are difficult or impossible to implement on current-generation Hadoop storage technologies.
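To make the encoding techniques named above concrete, here is a minimal Python sketch of run-length encoding and differential (delta) encoding. It also includes the bit-width calculation that bit-packing relies on. These functions are hypothetical illustrations of the general techniques. Kudu's own encoders are written in C++, operate on typed column blocks, and are not shown here.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical illustrations of general column-encoding techniques;
# these are not Kudu's encoders.

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encoding: store each run of equal values as (value, count)."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]

def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out

def delta_encode(values: List[int]) -> List[int]:
    """Differential encoding: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    out: List[int] = []
    running = 0
    for d in deltas:
        running += d  # a running sum restores the original values
        out.append(running)
    return out

def bits_needed(values: List[int]) -> int:
    """Fixed bit width that fits every non-negative value (what bit-packing uses)."""
    return max((v.bit_length() for v in values), default=0)

status = [1, 1, 1, 1, 2, 2, 3]
print(rle_encode(status))        # [(1, 4), (2, 2), (3, 1)]

timestamps = [1000, 1001, 1003, 1006]
deltas = delta_encode(timestamps)
print(deltas)                    # [1000, 1, 2, 3]
print(bits_needed(timestamps))   # 10 bits per raw value
print(bits_needed(deltas[1:]))   # 2 bits per delta after the first
```

Sorted or slowly changing columns, such as timestamps, produce small deltas. Each value after the first can therefore be bit-packed into far fewer bits than the raw value needs.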
