What is Big Data? Top Big Data Tools
Big data refers to data sets that are difficult to capture, curate, manage and process with traditional database models within a tolerable time. The data sets are so large or complex that traditional data processing applications are inadequate, which poses challenges in analysis, capture, curation, search, retrieval, sharing, storage, transfer and visualization. Analysis of such data sets can find new correlations to spot business trends, prevent diseases, combat crime and so on. New techniques and technologies, with new forms of integration, are required to uncover hidden value in datasets that are diverse, complex, and of a massive scale. Big data tools include big data platforms and analytics software, benchmark suites, data ingestion tools, and data preparation tools and platforms. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data.
What is big data and why is it getting this type of coverage?
Because it has the potential to profoundly affect the way business is done and decisions are made.
Data is the new Oil
A quote on CNBC sums this up: “Data is the new oil.” Data is a natural resource that is growing at a tremendous rate. Like any resource, it is difficult to extract, refine and analyze, and it comes in many types and in huge variety.
Big data are large data sets that are difficult to capture, curate, manage and process with traditional database models within a tolerable time. The size at which a data set counts as big data is a moving target. As of 2012, it ranged from a few dozen terabytes (TB) to many petabytes (PB) of data in a single data set.
Big data requires exceptional technologies to process these large quantities of data efficiently and within tolerable times. Suitable technologies suggested by a McKinsey report include A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis and visualisation.
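To make one of the listed techniques concrete, here is a minimal sketch of an A/B test in Python. It compares the conversion rates of two variants with a two-proportion z-test. The function name, the sample counts and the choice of test are illustrative assumptions, not something taken from the McKinsey report.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of conversions observed in variants A and B
    n_a, n_b:       number of visitors shown variants A and B
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=260, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone. At big data scale the same calculation would typically run over aggregated counts rather than raw events.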
Top Big Data Tools
Big data has the potential to change the way decisions are made and business is done. It has also increased the demand for information management analysts and specialists. Companies such as Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at about 10 percent a year, roughly twice as fast as the software business as a whole.
Big data tools make it possible to analyze a variety of information, analyze information in motion on an ad hoc basis, and analyze extreme volumes cost-effectively. They provide ad hoc analytics, data discovery and experimentation, and they enable governance of data structure, integrity and control to ensure consistency for repeatable queries.
1. Big Data Platforms and Big Data Analytics Software
IBM Bigdata Analytics, HP Bigdata, SAP Bigdata Analytics, Microsoft Bigdata, Oracle Bigdata Analytics, Talend Open Studio, Teradata Bigdata Analytics, SAS Big data, Dell Bigdata Analytics, HPCC System Big data, Palantir Bigdata, Pivotal Bigdata, Google BigQuery, Pentaho Big Data Analytics, Amazon Web Services, Cloudera Enterprise Bigdata, Hortonworks Data Platform, FICO Bigdata Analytics, Cisco Bigdata, Splunk Bigdata Analytics, Fusion-io Bigdata, Intel Bigdata, Mu Sigma Bigdata, MicroStrategy Bigdata, Opera Solutions Bigdata, Red Hat Bigdata, Informatica Bigdata, MarkLogic Bigdata, VMware Bigdata, Syncsort Bigdata, SGI Bigdata, MongoDB, Guavus Bigdata, Alteryx Bigdata, 1010data Advanced Analytics, Actian Analytics Platform, MapR, Tableau Software bigdata, QlikView Bigdata, Attivio’s Bigdata, DataStax Bigdata, GoodData, Google Bigdata, Datameer, CSC Big Data Platform, Flytxt, Amdocs, Platfora and GE Bigdata are some of the big data analytics platforms and software, in no particular order.
2. Big Data Benchmark Suites
HiBench, AMP Benchmark, BigDataBench, Yahoo! Cloud Serving Benchmark, GridMix, CloudSuite, SWIM, TPC Express Benchmark, PUMA Benchmark Suite and LinkBench are some of the big data benchmark suites, in no particular order.
3. Data Ingestion Tools
Gobblin, Amazon Kinesis, Apache Samza, Cloudera Morphlines, White Elephant, Apache Chukwa, Heka, Apache Flume, Databus, Apache Sqoop, Scribe and Fluentd are some of the top data ingestion tools, in no particular order.
4. Data Preparation Tools and Platforms
Platfora, Paxata, Datawatch, Microsoft Power Query for Excel, Tamr Platform, Alteryx, ClearStory Data, RapidMiner Studio, Alpine Chorus, Lavastorm, Teradata Loom, IBM SPSS, Looker, Informatica Rev, SAP Lumira, Trifacta, Waterline, Datameer, Advanced Miner, FICO Big Data Analyzer, Pentaho 5, Dell Toad Data Point, IBM DataWorks, SAS Enterprise Miner, KNIME, Progress Easyl, Omniscope and Infactum are some of the top data preparation tools and platforms, in no particular order.
5. Open Source Big Data Enterprise Search Software
Apache Solr, Apache Lucene Core, Elasticsearch, Sphinx, Constellio, DataparkSearch Engine, ApexKB, Searchdaimon ES, mnoGoSearch, Nutch and Xapian are some of the top open source big data enterprise search software, in no particular order.
6. In-Memory Data Grid Applications
Oracle Coherence, WebSphere eXtreme Scale, Ehcache, GigaSpaces eXtreme Application Platform, GridGain IMDG, Red Hat JBoss Data Grid 6, ScaleOut Software, Galaxy and Hazelcast are some of the top in-memory data grid applications, in no particular order.
7. NewSQL Databases
Clustrix, NuoDB, VoltDB, MemSQL, TransLattice Elastic Database, ActorDB, GemFire XD, Trafodion, TokuDB, TIBCO ActiveSpaces, dbShards, Google Spanner, and CockroachDB are some of the NewSQL databases in no particular order.
8. Top Graph Databases
Neo4j, AllegroGraph, Oracle Spatial and Graph, Teradata Aster, ArangoDB, Graphbase, InfiniteGraph, Bitsy, Horton, HyperGraphDB, DEX/Sparksee, Titan, VelocityGraph, VertexDB, InfoGrid, Oracle NoSQL Database, OrientDB, Blazegraph, Cayley, Weaver, Stardog, Sqrrl Enterprise, GraphDB, MapGraph and IBM System G Native Store are some of the top graph databases, in no particular order.
9. Deep Learning Software Libraries
Torch, Deeplearning4j, Gensim, Caffe, Theano, ND4J, DeepLearnToolbox and convnetjs are some of the deep learning software libraries, in no particular order.
10. Top Free Graph Databases
GraphDB Lite, Neo4j Community Edition, OrientDB Community Edition, Graph Engine, HyperGraphDB, MapGraph, ArangoDB, Titan, BrightstarDB, Cayley, WhiteDB, Orly, Weaver, sones GraphDB and Filament are some of the top free graph databases, in no particular order.
11. SQL and NoSQL Cloud Databases
MySQL, MariaDB, PostgreSQL, IBM DB2, Oracle Database, NuoDB, Ingres Database, Apache Cassandra, Clusterpoint database, Apache CouchDB, Apache Hadoop, MarkLogic, MongoDB, Neo4j, IBM dashDB, Microsoft Azure SQL Database, Amazon Relational Database Service, Clustrix, EnterpriseDB, Heroku, Amazon DynamoDB, Google App Engine, Cloudant and Amazon SimpleDB are some of the top SQL and NoSQL cloud databases, in no particular order.
12. Free and Commercial MultiValue Databases
jBASE, OpenQM, Rocket D3 Database Management System, OpenInsight, InterSystems Caché and InfinityDB are some of the top free and commercial MultiValue databases, in no particular order.
Predictive Analysis Process
For more information on the predictive analytics process, please review the overview of each component in the process: data collection (data mining), data analysis, statistical analysis, predictive modeling and predictive model deployment.