Apache Spark

Overview
Synopsis

Apache Spark is a fast and general engine for large-scale data processing. Spark requires a cluster manager and a distributed storage system.

Category

Big Data

Company

Apache Software Foundation

PAT Rating™
Criterion                   Editor Rating   Aggregated User Rating
Ease of use                 7.6             6.7
Features & Functionality    7.8             8.8
Advanced Features           7.8             8.4
Integration                 7.7             9.6
Performance                 7.7             8.6
Training                    -               8.3
Customer Support            7.6             6.3
Implementation              -               6.5
Renew & Recommend           -               10
Bottom Line

Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming.

Editor Rating: 7.7
Aggregated User Rating: 8.1 (8 ratings)

Apache Spark is a fast, general-purpose engine for large-scale data processing. Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports its own standalone cluster manager, Hadoop YARN, or Apache Mesos.
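An application chooses its cluster manager through its master URL, passed with the --master option of spark-submit or with SparkSession.builder.master(...). The helper below is a hypothetical, pure-Python sketch (the function cluster_manager is not part of Spark) that only classifies the documented master URL forms for the managers named above, plus local mode. It does not start or contact Spark.

```python
import re

# Hypothetical helper, not a Spark API: map a Spark master URL to the
# cluster manager it selects. Only the managers discussed in this review
# (plus local mode) are recognised.
_LOCAL = re.compile(r"^local(\[(\*|\d+)(,\d+)?\])?$")


def cluster_manager(master: str) -> str:
    """Return a short label for the cluster manager selected by `master`."""
    if _LOCAL.match(master):
        return "local"        # single JVM on one machine, no cluster manager
    if master.startswith("spark://"):
        return "standalone"   # Spark's native standalone cluster manager
    if master == "yarn":
        return "yarn"         # Hadoop YARN; cluster found via HADOOP_CONF_DIR / YARN_CONF_DIR
    if master.startswith("mesos://"):
        return "mesos"        # Apache Mesos
    raise ValueError(f"unrecognised master URL: {master!r}")


if __name__ == "__main__":
    for url in ["local[*]", "spark://master:7077", "yarn", "mesos://zk-host:5050"]:
        print(url, "->", cluster_manager(url))
```

The host names above are placeholders; 7077 and 5050 are the usual default ports for a standalone master and a Mesos master.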

For distributed storage, Spark can interface with a wide variety of systems, including the Hadoop Distributed File System (HDFS), MapR File System (MapR-FS), Cassandra, OpenStack Swift, Amazon S3, and Kudu, and a custom solution can also be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing. In local mode, distributed storage is not required and the local file system can be used instead; Spark runs on a single machine with one worker thread per CPU core.
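In local mode the master URL itself sets the parallelism: "local" runs a single worker thread, "local[N]" runs N threads, and "local[*]" uses as many threads as the machine has logical cores. The hypothetical helper below (local_worker_threads is not a Spark function) reproduces that rule in plain Python, as an illustration only.

```python
import os
import re

# Matches local, local[N], local[*], and the variants with a trailing
# ",F" (maximum task failures), which does not affect the thread count.
_LOCAL = re.compile(r"^local(?:\[(\*|\d+)(?:,\d+)?\])?$")


def local_worker_threads(master: str) -> int:
    """Hypothetical helper: worker threads Spark would use for a local master URL."""
    m = _LOCAL.match(master)
    if m is None:
        raise ValueError(f"not a local-mode master URL: {master!r}")
    spec = m.group(1)
    if spec is None:        # plain "local": a single worker thread
        return 1
    if spec == "*":         # "local[*]": one thread per logical core
        return os.cpu_count() or 1
    return int(spec)        # "local[N]": exactly N threads
```

This is why "local[*]" is a common choice when developing on a laptop: the job uses every core without any cluster setup.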

Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. Spark powers a stack of libraries including Spark SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. These libraries can be combined in the same application; for example, a DataFrame produced by a SQL query can be passed directly to an MLlib pipeline.

Spark has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing. Spark offers over 80 high-level operators that make it easy to build parallel applications, and it can be used interactively from the Scala, Python, and R shells.
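Two ideas in this paragraph, lazily built execution plans and in-memory reuse of intermediate results, can be illustrated with a toy model. The pure-Python sketch below is not Spark's scheduler or API: the ToyDataset class and its methods are hypothetical and only mimic the shape of Spark transformations (map, filter), the cache hint, and actions (collect, count). Transformations merely record a new node in the lineage (here a simple chain, the simplest DAG); nothing runs until an action is called, and a cached dataset is computed once and then served from memory. Real Spark additionally splits the graph into stages and runs tasks in parallel across executors, which the toy does not attempt.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """A toy, single-process model of a lazily evaluated dataset (not Spark)."""

    def __init__(self, compute: Callable[[], List], parent: Optional["ToyDataset"] = None):
        self._compute = compute          # how to produce this dataset's rows
        self.parent = parent             # lineage: the dataset this one derives from
        self._cache_enabled = False
        self._cached: Optional[List] = None
        self.evaluations = 0             # how many times _compute actually ran

    @staticmethod
    def from_iterable(rows: Iterable) -> "ToyDataset":
        data = list(rows)
        return ToyDataset(lambda: list(data))

    # Transformations: build a new node; nothing is executed yet.
    def map(self, f: Callable) -> "ToyDataset":
        return ToyDataset(lambda: [f(x) for x in self._rows()], parent=self)

    def filter(self, pred: Callable) -> "ToyDataset":
        return ToyDataset(lambda: [x for x in self._rows() if pred(x)], parent=self)

    def cache(self) -> "ToyDataset":
        self._cache_enabled = True       # keep rows in memory after first evaluation
        return self

    # Evaluation, honouring the in-memory cache.
    def _rows(self) -> List:
        if self._cache_enabled and self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = self._compute()
        if self._cache_enabled:
            self._cached = rows
        return rows

    # Actions: trigger evaluation of the whole lineage.
    def collect(self) -> List:
        return list(self._rows())

    def count(self) -> int:
        return len(self._rows())
```

Usage: ToyDataset.from_iterable(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache() runs nothing until collect() is called; a later count() reuses the cached rows instead of recomputing the chain. In PySpark the analogous RDD calls are filter, map, cache, collect, and count.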

It can run in Hadoop clusters through YARN or in Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to handle both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
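The classic batch job that Spark shares with MapReduce is word count. With a SparkContext sc it is usually written as sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b). Because that needs a running Spark installation, the sketch below instead mimics the same three steps in plain Python over a list of in-memory "partitions" of text lines. The function names and the partitioning are assumptions for illustration only, and the shuffle is simplified to a single merge.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flat_map_words(lines: Iterable[str]) -> List[str]:
    # flatMap: one input line can yield many output words
    return [word for line in lines for word in line.split()]


def map_to_pairs(words: Iterable[str]) -> List[Tuple[str, int]]:
    # map: each word becomes a (key, value) pair
    return [(word, 1) for word in words]


def reduce_by_key(partitions: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    # reduceByKey: combine values per key inside each partition first
    # (a map-side combine, like a MapReduce combiner) ...
    partials = []
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value
        partials.append(local)
    # ... then merge the partial results by key (the "shuffle", simplified).
    totals: Dict[str, int] = defaultdict(int)
    for local in partials:
        for key, value in local.items():
            totals[key] += value
    return dict(totals)


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    """Count words across partitions of text lines, MapReduce style."""
    pairs_per_partition = [map_to_pairs(flat_map_words(p)) for p in partitions]
    return reduce_by_key(pairs_per_partition)
```

For example, word_count([["to be or", "not to be"], ["to do"]]) returns {"to": 3, "be": 2, "or": 1, "not": 1, "do": 1}. Combining within each partition before merging is what keeps the amount of data moved between machines small in the real, distributed version.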
