Cascading is a proven application development platform for building Big Data applications on Apache Hadoop. Whether solving simple or complex data problems, Cascading balances an optimal level of abstraction with the necessary degrees of freedom through a computation engine, a systems integration framework, and data processing and scheduling capabilities. Uniquely, the platform offers Hadoop development teams portability. As new and more capable compute fabrics are developed, teams need the ability to move existing applications without incurring the cost of rewriting them. With Cascading, it is simply a matter of changing a few lines of code and a Cascading [...]
The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. It runs on top of Hadoop MapReduce and Apache Spark, and its goal is to make pipelines composed of many user-defined functions simple to write, easy to test, and efficient to run. Crunch supports different output options via the WriteMode enum, which can be passed along with a Target to the write method on either PCollection or Pipeline. Many of the most common aggregation patterns in Crunch are provided as methods on the PCollection interface, including count, max, min, and length. [...]
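To illustrate the shape of those aggregation patterns, the sketch below mimics Crunch-style count/max/min aggregations on an in-memory collection using plain java.util.stream. This is an analogue only: real Crunch code calls the same-named methods on a distributed PCollection, and the class name and sample data here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogue of Crunch's PCollection aggregations (count, max, min).
// In Crunch these run as distributed MapReduce/Spark jobs; the data here is
// purely illustrative.
public class AggregationSketch {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("ant", "bee", "ant", "cat", "bee", "ant");

        // count: element -> number of occurrences (like PCollection#count)
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // max / min by natural ordering (like the max/min aggregations)
        String max = words.stream().max(String::compareTo).orElseThrow();
        String min = words.stream().min(String::compareTo).orElseThrow();

        System.out.println(counts.get("ant")); // 3
        System.out.println(max + " " + min);   // cat ant
    }
}
```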
Apatar is an innovative and powerful suite of software tools designed to deliver substantial productivity gains to organizations that need to move data in and out of different sources. Applications include data warehousing, data migration, synchronization, and application integration. Apatar can be used wherever users have data in diverse databases or applications that needs to be captured in a new application or data warehouse, or presented through a single front end. Additionally, Apatar is user-friendly: even a non-technical user can be trained in just a couple of hours. Training to perform complex transformations may take up to [...]
Business-Insight distributes a free version of Anatella to promote its main suite of software tools, the “TIMi suite”, which includes Anatella, TIMi and Stardust. Anatella was developed with a unique set of functionalities that allows users to dramatically reduce the time required to develop new data transformations. Developing new scripts with Anatella usually takes far less time (from one half to one tenth) than developing the equivalent transformations with any competing tool. The Anatella integrated development environment (IDE) is based on a unique hybrid technology. Using, creating and debugging new data-manipulation scripts is extremely simple & [...]
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters. The platform gives users the ability to establish accurate relationships between the various data and processing elements in a Hadoop environment. The solution provides feed management services such as feed retention, replication across clusters, archival, and more. The platform makes it easy for users to onboard new workflows/pipelines, with support for late data handling and retry policies. It provides integration with metastores/catalogs such as Hive/HCatalog. It provides [...]
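As an illustration of how such retention and cross-cluster replication policies are declared, here is a hedged sketch of a Falcon feed entity definition. The feed name, cluster names, dates, and paths are hypothetical, and the exact element set can vary by Falcon version:

```xml
<feed name="rawClickstream" description="hourly click logs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <!-- source cluster keeps 30 days of data, then deletes it (retention) -->
    <cluster name="primary-cluster" type="source">
      <validity start="2014-01-01T00:00Z" end="2099-01-01T00:00Z"/>
      <retention limit="days(30)" action="delete"/>
    </cluster>
    <!-- target cluster receives a replicated copy and keeps it longer -->
    <cluster name="backup-cluster" type="target">
      <validity start="2014-01-01T00:00Z" end="2099-01-01T00:00Z"/>
      <retention limit="months(12)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/clicks/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
  </locations>
  <ACL owner="etl-user" group="etl" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
```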
Oozie is a workflow scheduler system that is designed to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. The platform is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts). The system is scalable, reliable, and extensible, ensuring that developers make the most of their time and processes. [...]
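To make the DAG-of-actions model concrete, here is a minimal sketch of an Oozie workflow definition with a single Pig action; the application name, script name, and transition targets are illustrative:

```xml
<workflow-app name="etl-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="pig-node"/>
  <!-- a single action node; ok/error transitions form the DAG edges -->
  <action name="pig-node">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>transform.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Pig failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

A Coordinator job would then wrap this workflow in a separate coordinator definition that triggers it on a schedule and on input-data availability.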
GETL is a set of libraries of pre-built classes and objects that can be used to solve extract, transform, and load (ETL) problems in programs written in Groovy or Java, as well as from any software that can work with Java classes. GETL is a Groovy-based package that automates the work of loading and transforming data.
GETL was developed with the following ideas and requirements in mind: the simpler the class hierarchy, the easier the solution; data structures tend to change over time, or may not be known in advance, so working with them must still be supported; all routine ETL work should be automated wherever [...]