
Apache Crunch

Overview
Synopsis

The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.

Category

ETL Software Free

Features

Multi-faceted
Easy to use
Integrated
Supports various WriteModes

License

Open Source (Apache License 2.0)

Price

Free

Pricing

Free (open source; no subscription)

Free Trial

Available

Users Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)

Company

Apache Software Foundation

PAT Rating™ (Editor Rating / Aggregated User Rating)
Ease of use: 8.4 / 8.5
Features & Functionality: 8.4 / 7.1
Advanced Features: 8.5 / 8.4
Integration: 8.5 / 8.5
Performance: 8.6 / 5.0
Customer Support: 8.6 / 8.0
Implementation: 8.4
Renew & Recommend: (no rating listed)

Bottom Line

The Apache Crunch library is a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce. The APIs are especially useful when processing data that does not fit naturally into a relational model, such as time series, serialized object formats like protocol buffers or Avro records, and HBase rows and columns.

Editor Rating: 8.5
Aggregated User Rating: 7.7 (2 ratings)

The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. It runs on top of Hadoop MapReduce and Apache Spark, and its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.

Crunch supports different output options via the WriteMode enum, which can be passed along with a Target to the write method on either PCollection or Pipeline. The supported WriteModes include DEFAULT, OVERWRITE, APPEND, and CHECKPOINT.

Many of the most common aggregation patterns in Crunch are provided as methods on the PCollection interface, including count, max, min, and length. The implementations of these methods, however, are in the Aggregate library class. The methods in the Aggregate class expose additional options for performing aggregations, such as controlling the level of parallelism for count operations.

Joins in Crunch are based on equal-valued keys in different PTables. They have also evolved a great deal over the lifetime of the project. The Join API provides simple methods for performing equijoins, left joins, right joins, and full joins. However, modern Crunch joins are usually performed using an explicit implementation of the JoinStrategy interface, which has support for the same rich set of joins that you can use in tools like Apache Hive and Apache Pig.

After joins and cogroups, sorting data is the most common distributed computing pattern. The Crunch APIs have several utilities for performing fully distributed sorts and more advanced patterns such as secondary sorts.

Many MapReduce jobs generate many small files that clients could use more effectively if they were merged into a small number of large files. The Shard API allows users to coalesce a given PCollection into a few partitions.
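
To make the four join types mentioned above more concrete, here is a minimal plain-Java sketch of what joining two keyed collections on equal keys produces. It deliberately does not use the Crunch API: `JoinSketch`, `JoinType`, `valuesFor`, and `join` are hypothetical names invented for this illustration, and the code runs in memory on small lists rather than as a distributed pipeline. It only shows which keys survive each join type and how missing sides are padded with null.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinSketch {
    /** Hypothetical enum for this sketch; it is not part of Crunch. */
    public enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** Collects every value stored under the given key (a table may repeat keys). */
    static List<String> valuesFor(List<Map.Entry<String, String>> table, String key) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table) {
            if (e.getKey().equals(key)) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Joins two key/value lists on equal keys; a missing side is shown as null. */
    public static List<String> join(List<Map.Entry<String, String>> left,
                                    List<Map.Entry<String, String>> right,
                                    JoinType type) {
        // Visit keys in first-seen order so the output is deterministic.
        Set<String> keys = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : left) keys.add(e.getKey());
        for (Map.Entry<String, String> e : right) keys.add(e.getKey());

        List<String> rows = new ArrayList<>();
        for (String key : keys) {
            List<String> l = valuesFor(left, key);
            List<String> r = valuesFor(right, key);
            // Drop keys missing on the left unless the join keeps right-only keys.
            if (l.isEmpty() && (type == JoinType.INNER || type == JoinType.LEFT_OUTER)) continue;
            // Drop keys missing on the right unless the join keeps left-only keys.
            if (r.isEmpty() && (type == JoinType.INNER || type == JoinType.RIGHT_OUTER)) continue;
            if (l.isEmpty()) l.add(null); // pad the missing left side
            if (r.isEmpty()) r.add(null); // pad the missing right side
            for (String lv : l) {
                for (String rv : r) {
                    rows.add(key + "=(" + lv + "," + rv + ")");
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> left = List.of(Map.entry("a", "1"), Map.entry("b", "2"));
        List<Map.Entry<String, String>> right = List.of(Map.entry("b", "x"), Map.entry("c", "y"));
        for (JoinType type : JoinType.values()) {
            System.out.println(type + ": " + join(left, right, type));
        }
    }
}
```

In this sketch only key `b` appears on both sides, so the inner join keeps just that key, while the outer variants also keep `a`, `c`, or both, with null on the side that has no match. Crunch's own Join and JoinStrategy classes perform the equivalent work across a cluster.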

