DataTorrent Simplifies Data Ingestion and Extraction for Hadoop

DataTorrent dtIngest simplifies the collection, aggregation and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline, and is available to organizations for unlimited use at no cost.

Project Apex is the Apache 2.0 open source unified batch and stream processing engine that forms the core foundation of DataTorrent RTS 3. DataTorrent RTS 3 Community Edition is the DataTorrent-certified version of Project Apex. DataTorrent RTS 3 Enterprise Edition offers additional capabilities for operational management, easy development and data visualization on top of the Community Edition.

“Hadoop ingestion is difficult and often prevents enterprises from gaining value from Hadoop, creating inefficiencies in the analysis process and stalling data initiatives altogether,” said Phu Hoang, CEO and co-founder, DataTorrent. “With the release of DataTorrent dtIngest, we now provide a free application to overcome this challenge. DataTorrent dtIngest, built on the enterprise-grade Project Apex, delivers secure, high performance and fault tolerant data ingestion for any Hadoop-based project.”

DataTorrent dtIngest makes configuring and running Hadoop data ingestion and data extraction a point-and-click process and includes enterprise-grade features.

Based on Apache 2.0 open-source Project Apex: Built on Project Apex, dtIngest is a native YARN application. Unlike tools such as DistCp, it is fully fault tolerant and can “resume” file ingest after a failure. It scales horizontally and supports high-throughput, low-latency data ingestion.
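
The article does not describe how dtIngest resumes an interrupted ingest. As a rough illustration of the general idea only, the Python sketch below treats the bytes already written to the destination as a checkpoint and continues from that offset. The function name `resumable_copy` and the `max_bytes` parameter (used here to simulate a failure) are hypothetical, and local files stand in for HDFS.

```python
import os


def resumable_copy(src_path, dst_path, chunk_size=1 << 20, max_bytes=None):
    """Copy src to dst, continuing from whatever dst already holds.

    max_bytes is a hypothetical knob for this sketch: it stops the copy
    early to simulate a failure partway through a transfer.
    Returns the number of bytes written by this call.
    """
    # Bytes already present at the destination act as the checkpoint.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what a previous attempt already copied
        while max_bytes is None or written < max_bytes:
            want = chunk_size
            if max_bytes is not None:
                want = min(want, max_bytes - written)
            chunk = src.read(want)
            if not chunk:
                break  # reached the end of the source
            dst.write(chunk)
            written += len(chunk)
    return written
```

Calling `resumable_copy` a second time with the same paths picks up where the first call stopped instead of starting over, which is the behavior the article contrasts with a restart-from-scratch copy.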

Simple to use and manage: A point-and-click application user interface makes it easy to configure, save and launch multiple data ingestion and distribution pipelines. Centralized management provides visibility, monitoring and summary logs.

Batch as well as stream data: dtIngest supports moving data between NFS, (S)FTP, HDFS, AWS S3n, Kafka and JMS, so you can use one platform to exchange data across multiple endpoints.

HDFS small file ingest using ‘compaction’: Configurable automatic compaction of small files into large files during ingest into HDFS helps prevent running out of HDFS NameNode namespace.

Secure and efficient data movement: dtIngest supports compression and encryption during ingestion and is certified with Kerberos-enabled secure Hadoop clusters.
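
To make the small-file compaction idea above concrete, here is a minimal Python sketch. It is not dtIngest's implementation, which the article does not detail. It assumes a simple greedy strategy: small files are appended to a container file until the next one would exceed a target size, and an index records where each original file landed. The function name `compact_small_files`, the `part-NNNNN` naming and the local filesystem standing in for HDFS are all assumptions made for illustration.

```python
import os


def compact_small_files(paths, out_dir, target_size=128 * 1024 * 1024):
    """Concatenate small files into container files of about target_size bytes.

    Returns an index of (source_path, container_path, offset, length) tuples
    so each original file can still be located inside its container.
    """
    os.makedirs(out_dir, exist_ok=True)
    index = []
    part = 0      # sequence number of the current container
    offset = 0    # bytes written to the current container so far
    out = None
    try:
        for path in paths:
            size = os.path.getsize(path)
            # Start a new container when this file would not fit.
            if out is None or (offset > 0 and offset + size > target_size):
                if out is not None:
                    out.close()
                    part += 1
                container = os.path.join(out_dir, f"part-{part:05d}")
                out = open(container, "wb")
                offset = 0
            with open(path, "rb") as src:
                out.write(src.read())  # inputs are small by assumption
            index.append((path, container, offset, size))
            offset += size
    finally:
        if out is not None:
            out.close()
    return index
```

Because the NameNode tracks every file as a separate namespace entry, packing many small inputs into a few containers reduces the number of entries it has to hold, at the cost of keeping an index to find the originals.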