What is big data? Top Bigdata Tools

Big data refers to data sets so large or complex that they are difficult to capture, curate, manage and process with traditional database models within a tolerable time. Traditional data processing applications are inadequate for them, which creates challenges in analysis, capture, curation, search, retrieval, sharing, storage, transfer and visualization. Analyzing such data sets can reveal new correlations that help spot business trends, prevent diseases, combat crime and so on. Because the data sets are diverse, complex and of a massive scale, new techniques and technologies, with new forms of integration, are required to uncover their hidden value. Big data "size" is a constantly moving target: as of 2012 it ranged from a few dozen terabytes to many petabytes of data. Bigdata Tools include Bigdata Platforms and Bigdata Analytics Software, Bigdata Benchmark Suites, Data Ingestion Tools, and Data preparation tools and platforms.

From the Wall Street Journal: “Companies are being inundated with data”. The Financial Times: “Increasingly businesses are applying analytics to social media such as Facebook and Twitter”. Forbes: “Big Data has arrived at Seton Health Care Family”.

What is big data and why is it getting this type of coverage?

Because it has the potential to profoundly affect the way business is done and decisions are made.

Data is the new Oil

The CNBC quote “Data is the new Oil” exemplifies this. Data is a natural resource that is growing tremendously. Like any resource, it comes in many types and a huge variety, and it is difficult to extract, refine and analyze.

Traditional data processing

Big data are large data sets which are difficult to capture, curate, manage and process with traditional database models within a tolerable time. The size at which a data set counts as big data is a moving target. As of 2012, it ranged from a few dozen terabytes (TB) to many petabytes (PB) of data in a single data set.
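To make "tolerable time" concrete, here is a rough back-of-the-envelope sketch of how long it takes just to read a data set of that size once. The 100 MB/s per-disk read rate and the 1,000-disk cluster are illustrative assumptions, not figures from any report, and the helper name `scan_hours` is made up for this example.

```python
# Back-of-the-envelope scan time for a large data set.
# The 100 MB/s per-disk read rate is an assumed, illustrative figure.

TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def scan_hours(size_bytes: int, nodes: int = 1, mb_per_sec: float = 100.0) -> float:
    """Hours needed to read size_bytes once, split evenly across nodes."""
    bytes_per_sec = mb_per_sec * 10**6 * nodes
    return size_bytes / bytes_per_sec / 3600


if __name__ == "__main__":
    for label, size in [("30 TB", 30 * TB), ("1 PB", PB)]:
        single = scan_hours(size)
        cluster = scan_hours(size, nodes=1000)
        print(f"{label}: {single:,.1f} h on one disk, {cluster:.2f} h on 1,000 disks")
```

Under these assumptions a single disk needs roughly 116 days to read 1 PB once, while 1,000 disks reading in parallel need under three hours, which is one way to see why a single traditional database server struggles at this scale.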

Big data requires exceptional technologies to process these large quantities of data efficiently within tolerable times. Suitable technologies suggested by a McKinsey report include A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis and visualisation.
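As an illustration of the first technique on that list, the following is a minimal sketch of how an A/B test result might be evaluated with a two-proportion z-test. The function name and the visitor and conversion counts are hypothetical, chosen only for the example, and a real analysis would also need to consider sample-size planning and repeated testing.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two_sided_p) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF computed from the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: variant A converts 200 of 10,000 visitors,
    # variant B converts 250 of 10,000.
    z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value suggests that the difference between the two variants is unlikely to be due to chance alone.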

Big data processing

Top Bigdata Tools

Big data has the potential to change the way decisions are made and business is done. It has also increased the demand for information management analysts and specialists. Companies such as Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at about 10 percent a year, roughly twice as fast as the software business as a whole.
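To see what "twice as fast" means over time, the growth figures can be compounded in a short sketch. The 5 percent rate for the software business as a whole is an assumption inferred from the "twice as fast" comparison, and the function names are made up for this example.

```python
import math


def project(value: float, annual_rate: float, years: int) -> float:
    """Value after compounding at annual_rate for the given number of years."""
    return value * (1 + annual_rate) ** years


def doubling_years(annual_rate: float) -> float:
    """Years needed for a value to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    start = 100.0  # billions of dollars, the 2010 figure quoted above
    for rate in (0.10, 0.05):  # 0.05 is an assumed rate for software overall
        print(
            f"{rate:.0%}: {project(start, rate, 5):.1f}B after 5 years, "
            f"doubles in {doubling_years(rate):.1f} years"
        )
```

At a constant 10 percent a year a market doubles in a little over seven years, against about fourteen years at 5 percent.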

Bigdata Tools provide the ability to analyze a variety of information, analyze information in motion on an ad hoc basis, and analyze extreme volumes cost effectively. They provide ad hoc analytics, data discovery and experimentation, and enable governance of data structure, integrity and control to ensure consistency for repeatable queries.

1. Bigdata Platforms and Bigdata Analytics Software

IBM Bigdata Analytics, HP Bigdata, SAP Bigdata Analytics, Microsoft Bigdata, Oracle Bigdata Analytics, Talend Open Studio, Teradata Bigdata Analytics, SAS Big data, Dell Bigdata Analytics, HPCC System Big data, Palantir Bigdata, Pivotal Bigdata, Google BigQuery, Pentaho Big Data Analytics, Amazon Web Services, Cloudera Enterprise Bigdata, Hortonworks Data Platform, FICO Bigdata Analytics, Cisco Bigdata, Splunk Bigdata Analytics, Fusion-io Bigdata, Intel Bigdata, Mu Sigma Bigdata, MicroStrategy Bigdata, Opera Solutions Bigdata, Red Hat Bigdata, Informatica Bigdata, MarkLogic Bigdata, VMware Bigdata, Syncsort Bigdata, SGI Bigdata, MongoDB, Guavus Bigdata, Alteryx Bigdata, 1010data Advanced Analytics, Actian Analytics Platform, MapR, Tableau Software Bigdata, QlikView Bigdata, Attivio’s Bigdata, DataStax Bigdata, GoodData, Google Bigdata, Datameer, CSC Big Data Platform, Flytxt, Amdocs, Platfora and GE Bigdata are some of the Big data Analytics Platforms and Software in no particular order.

Bigdata Platforms and Bigdata Analytics Software

PAT Index™ entries: Informatica PowerCenter Big Data Edition, FICO Big Data Analyzer, Attivio Active Intelligence Engine, Kognitio Cloud

2. Bigdata Benchmark Suites

HiBench, AMP Benchmark, BigDataBench, Yahoo! Cloud Serving Benchmark, GridMix, CloudSuite, SWIM, TPC Express Benchmark, PUMA Benchmark Suite and LinkBench are some of the Bigdata Benchmark Suites in no particular order.

Top Bigdata Benchmark Suites

3. Data Ingestion Tools

Gobblin, Amazon Kinesis, Apache Samza, Cloudera Morphlines, White Elephant, Apache Chukwa, Heka, Apache Flume, Databus, Apache Sqoop, Scribe and Fluentd are some of the top data ingestion tools in no particular order.

Data Ingestion Tools

White Elephant

4. Data preparation tools and platforms

Platfora, Paxata, Datawatch, Microsoft Power Query for Excel, Tamr Platform, Alteryx, ClearStory Data, RapidMiner Studio, Alpine Chorus, Lavastorm, Teradata Loom, IBM SPSS, Looker, Informatica Rev, SAP Lumira, Trifacta, Waterline, Datameer, Advanced Miner, FICO Big Data Analyzer, Pentaho 5, Dell Toad Data Point, IBM DataWorks, SAS Enterprise Miner, KNIME, Progress Easyl, Omniscope and Infactum are some of the top Data preparation tools and platforms in no particular order.

Data preparation tools and platforms

Paxata

5. Open Source Big data Enterprise Search Software

Apache Solr, Apache Lucene Core, Elasticsearch, Sphinx, Constellio, DataparkSearch Engine, ApexKB, Searchdaimon ES, mnoGoSearch, Nutch and Xapian are some of the top Open Source Big data Enterprise Search Software in no particular order.

Top Open Source Big data Enterprise Search Software

6. In Memory Data Grid Applications

Oracle Coherence, WebSphere eXtreme Scale, Ehcache, GigaSpaces eXtreme Application Platform, GridGain IMDG, Red Hat JBoss Data Grid 6, ScaleOut Software, Galaxy and Hazelcast are some of the top in-memory data grid software in no particular order.

In Memory Data Grid Applications

7. NewSQL Databases

Clustrix, NuoDB, VoltDB, MemSQL, TransLattice Elastic Database, ActorDB, GemFire XD, Trafodion, TokuDB, TIBCO ActiveSpaces, dbShards, Google Spanner, and CockroachDB are some of the NewSQL databases in no particular order.

NewSQL Databases

8. Top Graph Databases

Neo4j, AllegroGraph, Oracle Spatial and Graph, Teradata Aster, ArangoDB, Graphbase, InfiniteGraph, Bitsy, Horton, HyperGraphDB, DEX/Sparksee, Titan, VelocityGraph, VertexDB, InfoGrid, Oracle NoSQL Database, OrientDB, Blazegraph, Cayley, Weaver, Stardog, Sqrrl Enterprise, GraphDB, MapGraph and IBM System G Native Store are some of the top graph databases in no particular order.

Top Graph Databases

9. Deep Learning Software Libraries

Torch, Deeplearning4j, Gensim, Caffe, Theano, ND4J, DeepLearnToolbox and ConvNetJS are some of the deep learning software libraries in no particular order.

Deep Learning Software Libraries

10. Top Free Graph Databases

GraphDB Lite, Neo4j Community Edition, OrientDB Community Edition, Graph Engine, HyperGraphDB, MapGraph, ArangoDB, Titan, BrightstarDB, Cayley, WhiteDB, Orly, Weaver, sones GraphDB and Filament are some of the top free graph databases in no particular order.

Top Free Graph Databases

11. SQL and NoSQL Cloud Databases

MySQL, MariaDB, PostgreSQL, IBM DB2, Oracle Database, NuoDB, Ingres Database, Apache Cassandra, Clusterpoint database, Apache CouchDB, Apache Hadoop, MarkLogic, MongoDB, Neo4j, IBM dashDB, Microsoft Azure SQL Database, Amazon Relational Database Service, Clustrix, EnterpriseDB, Heroku, Amazon DynamoDB, Google App Engine, Cloudant and Amazon SimpleDB are some of the top SQL and NoSQL Cloud Databases in no particular order.

SQL and NoSQL Cloud Databases

12. Free and Commercial MultiValue Databases

jBASE, OpenQM, Rocket D3 Database Management System, OpenInsight, InterSystems Caché and InfinityDB are some of the top Free and Commercial MultiValue Databases in no particular order.

Free and Commercial MultiValue Databases

Predictive Analytics Process

For more information on the predictive analytics process, please review the overview of each of its components: data collection (data mining), data analysis, statistical analysis, predictive modeling and predictive model deployment.
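The five components can be pictured as stages in a small pipeline. The sketch below is only an illustration: the function names, the sample (ad spend, sales) pairs and the choice of a simple least-squares line as the predictive model are all assumptions made for the example, not part of any particular product.

```python
from statistics import mean, pstdev


def collect():
    """Data collection: stand-in for pulling records from a real source."""
    # Hypothetical (ad_spend, sales) pairs, invented for illustration.
    return [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (None, 5.0), (4.0, 8.8), (5.0, 11.1)]


def analyze(rows):
    """Data analysis: drop incomplete records before modeling."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def summarize(rows):
    """Statistical analysis: basic descriptive statistics per column."""
    xs, ys = zip(*rows)
    return {"mean_x": mean(xs), "mean_y": mean(ys), "sd_x": pstdev(xs), "sd_y": pstdev(ys)}


def fit(rows):
    """Predictive modeling: ordinary least squares for y = a + b * x."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def deploy(model):
    """Model deployment: wrap the fitted coefficients in a callable."""
    a, b = model
    return lambda x: a + b * x


if __name__ == "__main__":
    rows = analyze(collect())
    print(summarize(rows))
    predict = deploy(fit(rows))
    print(f"predicted sales at spend 6.0: {predict(6.0):.2f}")
```

In practice each stage would be far more involved, but the flow of data from one stage to the next is the same.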

Predictive Analytics Process Flow
