What are the benefits?
• Open source software for mining big data streams
• Spark Streaming extension
• Implemented methods: CluStream; Hoeffding Decision Trees; bagging; StreamKM++; HyperplaneGenerator
Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible, programmable framework for massively distributed processing of datasets, which it represents as Resilient Distributed Datasets (RDDs).
streamDM is open source software for mining big data streams with Spark Streaming, developed at Huawei Noah's Ark Lab. It is licensed under the Apache Software License v2.0.
Learning from big data streams is challenging because the data may not keep the same distribution over the lifetime of the stream. Learning algorithms need to be very efficient: each example arriving in the stream can be processed only once, or else the examples must be summarized with a small memory footprint.
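To make the "process once, small memory footprint" constraint concrete, here is a minimal sketch of a one-pass summary. It is not taken from streamDM; the class name RunningStats is hypothetical, and the technique is Welford's method for a running mean and variance, which keeps only three numbers no matter how many examples arrive.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's method).

    Hypothetical illustration; not part of streamDM.
    """
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Each example is seen exactly once and then discarded.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one example has arrived.
        return self.m2 / self.count if self.count else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.count, stats.mean, stats.variance)
```

The same pattern, a small state object updated once per example, is what stream learners generally rely on in place of storing the data.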
Spark Streaming, which makes it easy to build scalable, fault-tolerant streaming applications, is an extension of the core Spark API (a fast and general engine for large-scale data processing) that enables stream processing from a variety of sources.
Spark itself is an extensible, programmable framework for massively distributed processing of datasets, represented as Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which the Spark engine then processes to generate the results. The resulting stream is a DStream, represented internally as a sequence of RDDs.
The methods implemented in streamDM are: SGD learner and perceptron; naïve Bayes; CluStream; Hoeffding Decision Trees; bagging; StreamKM++; HyperplaneGenerator; RandomTreeGenerator; RandomRBFGenerator; RandomRBFEventsGenerator.
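The batch model behind Spark Streaming, where a stream is cut into batches that are processed one after another, can be pictured with a small plain-Python sketch. This does not use Spark, and the function names micro_batches and process are hypothetical. It also simplifies in one important way: Spark Streaming forms batches by time interval, whereas this sketch cuts the stream by element count.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of up to batch_size items from a stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def process(batch: List[int]) -> int:
    # Stand-in for the per-batch job; here it simply sums the batch.
    return sum(batch)


# Ten items in batches of four: two full batches and one partial batch.
results = [process(batch) for batch in micro_batches(range(10), batch_size=4)]
print(results)
```

Each batch here plays the role of one RDD in the sequence, and the list of per-batch results corresponds to the output stream.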
A SampleDataWriter is also implemented, which can call the data generators to create sample data for simulation or testing. More methods are planned for future releases: classification (random forests); regression (Hoeffding regression tree, bagging, random forests); clustering (ClusTree, DenStream); and frequent itemset mining (IncMine, IncSecMine).
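As a rough idea of what a synthetic data generator for stream mining does, the sketch below produces an endless stream of labeled examples separated by a random hyperplane. It is only loosely inspired by the idea behind a hyperplane generator and is not streamDM's implementation; the function name hyperplane_stream, its parameters, and the labeling threshold are all assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple


def hyperplane_stream(num_features: int, seed: int = 42) -> Iterator[Tuple[List[float], int]]:
    """Yield (features, label) pairs forever; hypothetical, not streamDM's API."""
    rng = random.Random(seed)
    # Fixed random weights define the separating hyperplane.
    weights = [rng.random() for _ in range(num_features)]
    # Assumed threshold: half the weight sum, so classes are roughly balanced.
    threshold = 0.5 * sum(weights)
    while True:
        features = [rng.random() for _ in range(num_features)]
        score = sum(w * x for w, x in zip(weights, features))
        yield features, int(score >= threshold)


generator = hyperplane_stream(num_features=3)
sample = [next(generator) for _ in range(5)]
for features, label in sample:
    print(label, [round(x, 3) for x in features])
```

Because the generator is seeded, the same examples are produced on every run, which is convenient when sample data is written out for repeatable simulations or tests.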