Top 38 Data Preparation Tools and Platforms

Data preparation tools and platforms: The purpose of data preparation is to transform data sets so that the information they contain is best exposed to the analysis tool. Data preparation tools and platforms enable data discovery, exploration, analysis, conversion, cleaning, transformation, modeling, structuring, curation and cataloguing. Actian Vector in Hadoop (VectorH Express), AdvancedMiner, Alpine Chorus, Alteryx Analytics, ClearStory Data, Datameer, Datawatch, FICO Big Data Analyzer, Holistics, IBM SPSS Modeler, Informatica Rev, Information Builders WebFOCUS Platform, KNIME, Lavastorm Analytics Engine, Logi DataHub, Logi Vision, Looker, Microsoft Power Query for Excel, Paxata, Pentaho Big Data Analytics, Platfora, RapidMiner, SAP Lumira, SAS Enterprise Guide, SAS Enterprise Miner, Segment, Stytch, Tamr, Teradata Loom, Trifacta, Vero Analytics, Waterline Data, Dell Toad Data Point, IBM DataWorks, Progress Easyl, and Omniscope are some of the top data preparation tools and platforms, in no particular order.
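To make the cleaning, conversion and transformation steps concrete, here is a minimal sketch in plain Python. It is not taken from any of the listed products; the sample data, the column names and the `prepare` function are all made up for illustration.

```python
import csv
import io

# Hypothetical raw extract: stray whitespace, inconsistent case,
# a missing amount, and one duplicated record.
RAW = """name, signup_date ,amount
 Alice ,2015-05-15, 10.50
bob,2016-05-13,
 Alice ,2015-05-15, 10.50
"""

def prepare(text):
    """Clean, convert and de-duplicate rows from a CSV string."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]          # cleaning: trim header names
    seen, out = set(), []
    for raw in reader:
        if not raw:
            continue                                    # skip blank lines
        row = dict(zip(header, (c.strip() for c in raw)))
        row["name"] = row["name"].title()               # transformation: normalize case
        row["amount"] = float(row["amount"]) if row["amount"] else None  # conversion
        key = tuple(row.items())
        if key in seen:
            continue                                    # cleaning: drop exact duplicates
        seen.add(key)
        out.append(row)
    return out

records = prepare(RAW)
for record in records:
    print(record)
```

The tools below wrap steps like these in visual interfaces and run them at much larger scale; the sketch only shows the kind of work involved.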

Top Data Preparation Tools and Platforms


1. Actian Vector in Hadoop

The Actian Analytics Platform turns Hadoop into a high-performance analytics platform, enabling organizations to improve the accuracy of predictions and decision making by analyzing data from more sources without sampling. Actian Express – Hadoop SQL Edition includes the Data Science Workbench, Data Flow Engine, Analytics Database, Management Console, and preloaded data and Tableau workbooks. The Data Science Workbench builds visual workflows to prepare, blend and analyze Hadoop data, and the Data Flow Engine executes analytic workflows at least 10 times faster than MapReduce, without coding. The Analytics Database runs high-performance SQL queries natively in Hadoop and the Management Console easily monitor…


2. AdvancedMiner

AdvancedMiner is an integrated analytical tool for data processing, analysis and modeling. With a graphical interface (Workflow), it offers a complete and user-friendly environment for data exploration. AdvancedMiner provides features for extracting and saving data from/to different database systems and files and for performing a wide range of operations on data, such as sampling, joining datasets, dividing into testing/training/validating sets, assigning roles to attributes, graphical and interactive data exploration, outlier filtering, supplying missing values, PCA, various data transformations, building association models, clustering analyses, variable importance analyses, constructing various analytical models with the use of diverse Data…
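Two of the operations named above, supplying missing values and dividing data into training/validating/testing sets, can be sketched in a few lines of plain Python. This is an illustration only, not AdvancedMiner's interface; the function names, the mean-based fill and the 60/20/20 split are assumptions chosen for the example.

```python
import random
from statistics import mean

def fill_missing(rows, column):
    """Replace None in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fallback = mean(observed) if observed else None
    return [{**r, column: fallback if r[column] is None else r[column]} for r in rows]

def split_rows(rows, train=0.6, validate=0.2, seed=0):
    """Shuffle rows, then cut them into training, validating and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],         # whatever remains is the testing set
    )

# Hypothetical data: every fourth record is missing its age.
rows = [{"id": i, "age": None if i % 4 == 0 else 20 + i} for i in range(10)]
clean = fill_missing(rows, "age")
train_set, val_set, test_set = split_rows(clean)
print(len(train_set), len(val_set), len(test_set))
```

Filling before splitting keeps the example short; in practice the fill value is often computed from the training set alone so that nothing leaks from the other two sets.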


3. Alpine Chorus

Alpine Chorus is a comprehensive platform for advanced analytics that provides the entire analytic lifecycle in one environment and enables people to build, deploy and consume analytic applications and insights in an agile and collaborative manner. Features include hypothesis testing and predictive modeling with full statistical functionality, including time-series analysis, classification, regression, decision trees and more, as well as data transformation, feature creation, and model building. Users can also curate and leverage data, models, and results securely to avoid data silos, and end-to-end workflows cover extraction, transformation, modeling, and scoring. Alpine Chorus’s visual drag-and-drop interface allows business users and data scientists…


4. Alteryx Analytics

The Alteryx Analytics portfolio includes Alteryx Designer, Alteryx Server and Alteryx Analytics Gallery. Alteryx Designer lets users blend internal, third-party, and cloud-based data, build powerful R-based predictive and spatial analytics applications without any programming, and share deep data insight with business decision makers. Predictive modeling techniques such as logistic regression and decision trees, clustering techniques such as K-centroid clustering and principal component analysis, and data investigation techniques such as scatter plots and association analysis can all be included without any programming using Alteryx Designer. Alteryx Server scales critical analytic workflows to meet data and analytic requirements, schedule…


5. ClearStory Data

ClearStory Data infers what’s in data to speed data preparation and converge disparate data on the fly. Internal and external data access requires no pre-modeling or skills that mandate data specialists. ClearStory’s Intelligent Data Harmonization identifies data relationships across disparate data sources and converges data on the fly to reach holistic, interactive answers faster. ClearStory’s data harmonization platform is powered by an inference and profiling engine that extracts metadata in real time, using Apache Spark’s fast in-memory processing. Data dimensions including dates, time, currencies, geographical entities, and other custom attributes can be inferred and blended with no pre-modeling or…


6. Datameer

Datameer Professional is a SaaS big data analytics platform targeted at department-specific deployments. The Datameer offering features leading Hadoop cloud providers Altiscale and Bigstep. Datameer simplifies the big data analytics process into a single self-service application on top of Hadoop, disrupting the multi-process approach, and combines self-service data integration, analytics and visualization functionality to shorten time to insight. With more than 70 pre-built data connectors for any data type, size or source, a spreadsheet user interface,…


7. Datawatch

Datawatch provides a platform for visual analytics to acquire, prepare, and transform data from structured and multi-structured sources such as PDF and log files, as well as real-time streaming data, into visually rich analytic applications. This allows users to dynamically discover key factors that impact any operational aspect of their business. The Datawatch Managed Analytics Platform delivers an enterprise solution for self-service data preparation and visual data discovery. Its capabilities include self-service data preparation, advanced data enrichment, automation without scripting, access to multi-structured data, synchronous visual authoring, visual data discovery and frictionless governance.…


8. FICO Big Data Analyzer

FICO Big Data Analyzer is a purpose-built analytics environment for a new generation of data professionals. Big Data Analyzer empowers a broad range of users to collaboratively explore data and discover new insights from any type and size of data on Hadoop. It provides features to ingest your own data; explore, query and visualize data; find and re-use analytic assets; wrangle big data for predictive and prescriptive modeling; export insights for downstream decisions and services; and empower data and business teams to collaborate. FICO Big Data Analyzer closes the loop between data exploration and insight discovery with…


9. Holistics

Holistics is a flexible data reporting and preparation tool that works with a company’s existing data infrastructure and requires no training. Holistics provides automated importing of database records into Excel and pivoting of data for the organization’s reporting needs. It is designed so that users can manage the system and access information without special skills. Business teams can easily filter live data in their reports, while data teams can create any business report that is needed using a familiar data language such as SQL. Its…


10. IBM SPSS Modeler

IBM SPSS Modeler is an analytics platform from IBM that brings predictive intelligence to everyday business problems. The solution provides a range of advanced analytics including text analytics, entity analytics, social network analysis, automated modeling, data preparation, decision management and optimization. SPSS Modeler can conduct analysis regardless of where the data is stored (such as in a data warehouse, a database, Hadoop or a flat file) and regardless of whether it is structured (such as age, price, product, location) or unstructured (such as text, emails, social media). IBM SPSS Modeler is offered as three editions: IBM SPSS Modeler Gold –…


11. Informatica Rev

Informatica Rev merges data from multiple sources, including spreadsheets, and prepares it for analysis. It is a spreadsheet-like interface combined with a recommendation engine that provides business users with intelligent guidance on combining, preparing and cleansing data. It offers an intuitive user experience optimized for business users to bring together data for business decision making in visualization tools such as Tableau. Informatica Rev lets business users seamlessly operationalize any work that needs to be handed off to IT. Features include simplified data blending and merging, auto standardization and validation, managed provisioning of complex sources and…


12. Information Builders WebFOCUS Platform

WebFOCUS is a comprehensive and flexible BI and analytics platform that offers simple to sophisticated analytical tools and apps, enabling users inside and outside the enterprise to visually explore and answer a broad range of business questions. Three editions of the product are designed to deliver the benefits of governed self-service apps and tools to the entire universe of business stakeholders, from management to employees to partners to customers to citizens. WebFOCUS offers self-service analytics: dashboards and scorecards give executives and managers a high-level view of critical indicators and metrics, and self-service tools allow users to easily…


13. KNIME

KNIME, the Konstanz Information Miner, is an open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept and provides a graphical user interface that allows assembly of nodes for data preprocessing, modeling, data analysis and visualization. KNIME Analytics Platform provides over 1000 data analytic routines, either natively or through R and Weka, for such topics as univariate and multivariate statistics, data mining, time series, image processing, web analytics, text mining, network analysis and social media analysis. KNIME analytic workflows can be run through the interactive…


14. Lavastorm Analytics Engine

Lavastorm is a visual data discovery solution that allows users to rapidly integrate diverse data, easily discover elusive insights, and continuously detect anomalies, outliers, or patterns. Lavastorm Analytics Engine provides self-service capability for business users and rapid development capabilities for IT users in the areas of integration, analytics, and business control. Features include the ability to acquire, transform, combine, and enrich data from virtually any source, including Big Data sources, without intensive modeling, pre-planning, or scripting. The solution discovers data issues such as incompleteness, inconsistent formats and inaccuracy, and automates the evaluation and cleansing process. Lavastorm Analytics Engine uses the visual analytic…


15. Logi DataHub

Logi DataHub works with both Logi Info and Logi Vision to simplify data preparation and ensure high performance for self-service analytics. With DataHub, customers can rapidly connect, acquire, and blend data from files, applications or databases, whether on-premise or in the cloud; cache it in a high-performance self-tuning repository; and prepare it using DataHub’s smart profiling, joining, and intuitive data enrichment. With DataHub, it is easy for multiple users to create, edit, and share data connections and dataviews with others, so preparing data can be a team exercise. Logi DataHub centralizes your data for all your self-service applications. Dataviews can be…


16. Logi Vision

Logi Vision is a visual analytics application designed for workgroup collaboration. Vision empowers business users to acquire data, analyze information, create visualizations, and share insights for faster, better-informed decisions. Logi Vision is now smarter than ever, providing business users with an improved data visualization recommendation engine that learns from user activities. New project templates help shortcut data discovery with prepopulated data connections, visualizations, and dashboards that simply require users to connect their data to begin their analysis. Logi Vision provides users with guided data exploration to drive faster insights. The recommendation engine uses algorithms based on industry best practices to…


17. Looker

Looker is a web-based business intelligence platform that brings people and data together. Looker puts actionable data in the hands of the people who need it most, through a data description language called LookML. LookML is an easy-to-use modeling language for encapsulating business logic, defining important metrics once and then reusing them throughout the model. Using LookML, analysts can create and curate custom data experiences so any employee can explore and utilize the data that’s most relevant to them. Looker was built from the ground up to enable Big Data processing, leveraging dialect-specific SQL and analytic functions…


18. Microsoft Power Query for Excel

Microsoft Power Query for Excel is an Excel add-in that enhances the self-service Business Intelligence experience in Excel by simplifying data discovery, access and collaboration. It provides a seamless experience for data discovery, data transformation and enrichment for information workers, BI professionals and other Excel users. Power Query features include identifying data from the sources users work with, such as relational databases, Excel, text and XML files, OData feeds, web pages and Hadoop HDFS. Power Query lets users discover relevant data from inside(*) and outside the organization using the search capabilities within Excel…


19. Paxata

Paxata is a self-service Adaptive Data Preparation platform that lets business analysts rapidly collect, explore, transform and combine data with the same freedom they are used to in their analytic discovery. Paxata’s solution lets business people make data sets ready for ad-hoc analytics without going through the painful manual steps they traditionally dealt with. The Paxata platform was built with a data management layer that persists data inside the Hadoop Distributed File System (HDFS) and a real-time, columnar, parallelized, in-memory pipeline data prep engine powered by Intellifusion. The data prep engine wraps Apache Spark v1.1 with additional functionality built…


20. Pentaho Big Data Analytics

Pentaho Big Data Analytics provides big data tools to extract, prepare and blend users’ data, plus the visualizations and analytics that will change the way a company runs its business. From Hadoop and Spark to NoSQL, Pentaho allows users to turn big data into big insights. Pentaho Big Data Analytics offers a full array of analytics, from data access and integration to data visualization and predictive analytics. It empowers users to architect big data blends at the source and stream them directly for more complete and accurate analytics. Users are given the ability to spot check data in-flight with immediate access…


21. Platfora

Platfora is an end-to-end big data analytics platform with a native-Hadoop infrastructure that enables analysts, business professionals and data scientists to instantly access and drill down into the rawest forms of petabyte-scale data without the need for IT support. Platfora analyzes all of the data to answer the toughest questions with no code required; data preparation, data warehousing and business analytics are included. Platfora Big Data Analytics includes significant enhancements to the visual analysis capabilities and processing engine, including interactivity at big data scale, advanced visualizations and geo analytics. Platfora provides the ability to interactively analyze the biggest…


22. RapidMiner

RapidMiner provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics, and is used for business and industrial applications as well as for research, education, training, rapid prototyping, and application development. RapidMiner supports all steps of the data mining process, including results visualization, validation and optimization. RapidMiner uses a client/server model with the server offered as Software as a Service or on cloud infrastructures. RapidMiner provides data mining and machine learning procedures including data loading and transformation, data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. RapidMiner is written in…


23. SAP Lumira

SAP Lumira is a self-service Business Intelligence solution from SAP that allows business users to access, transform and visualize data, analyze trends, and share insights on the BI platform or in the cloud. The SAP Lumira solution portfolio includes SAP BusinessObjects Lumira, Standard Edition; SAP BusinessObjects Lumira, Server for Teams; and SAP BusinessObjects Lumira, Server for BI Platform. The SAP Lumira desktop prepares data from multiple sources and provides tools to visualize it. Using SAP BusinessObjects Lumira, Server for Teams and SAP BusinessObjects Lumira, Server for BI Platform, these visualizations can be shared with in…


24. SAS Enterprise Guide

SAS Enterprise Guide is a point-and-click, menu- and wizard-driven tool that empowers users to analyze data and publish results. It provides fast-track learning for quick data analysis, generates code for productivity and speeds your ability to deploy analyses and forecasts in real time. A centralized system for managing access to corporate data ensures that users have appropriate access privileges that empower them to react quickly to evolving business conditions. It guides users so they can quickly access data for analysis, schedule projects, share results and embed output easily for repeated use, including access to advanced analytics and other…


25. SAS Enterprise Miner

SAS Enterprise Miner is a solution for creating accurate predictive and descriptive models on large volumes of data across different sources in the organization. SAS presents itself as the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions, SAS helps customers at more than 70,000 sites improve performance and deliver value by making better decisions faster. SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated, and has been helping customers around the world since then. SAS Enterprise Miner offers…


26. Segment

Segment gives its users a better way to collect customer data and send it to everyone on the team. It also streams data to all of the marketing integrations that people need. This process ensures that every department gets the right data, so teams can plan solutions and act on them right away. Customer information is vital for a company to understand how it can connect with customers and serve them better. From mobile devices, websites, servers and cloud applications, Segment will be able to process these data obtained…


27. Stytch

Stytch is a data analytics platform that provides business teams with everything they need to get more insights faster, from data to dashboards. Stytch empowers business analysts to easily blend and model their data, providing the foundation for better quality data discovery and reporting across the enterprise. It provides a fast way to prepare, explore and share data to get the most business insights. It is described as the only end-to-end data analytics platform connected to the world’s largest business database from Dun & Bradstreet, so the user can accurately match, cleanse and blend data along the way. Stytch…


28. Tamr

Tamr’s data unification platform catalogs, connects and curates hundreds or thousands of internal and external data sources through a combination of machine learning algorithms and human expert guidance, reducing the cost, time and effort of preparing data for analysis so enterprises can use all their data for analytics. Tamr dynamically catalogs the organization’s information assets with its crawlers, entity tagging, and metadata visualization features to provide a comprehensive, organized, bottom-up inventory of all information assets…


29. Teradata Loom

Teradata Loom enables data analysts and data scientists to easily find, access, and understand data in Hadoop. Loom lets users start data analysis quickly, accelerating the time from data acquisition to delivering business insights, and enables highly exploratory, iterative interactions with datasets to quickly prepare the data for meaningful statistical analysis. The Loom workbench is a simple, intuitive, browser-based user interface accessible in a self-service fashion by multiple users in the organization. Features include a single, unified, integrated platform from discovery to metadata management to data preparation, automated source discovery, metadata generation, and data profiling.…


30. Trifacta

Trifacta’s Visual Data Profiling features provide immediate visibility into unique elements of the data set, such as data distributions and outliers, to inform the transformation and analysis process. Trifacta uses data inference techniques to introspect the data and automatically apply initial shaping and metadata recommendations for the user, which greatly accelerates the transformation process. Users can quickly un-nest and iterate on the shape of their data in preparation for the dataset’s downstream use. Trifacta’s data enrichment features make standardizing data, joining datasets and aggregating data outputs to the right level faster and more accurate. Advanced visual data profiling capabilities that guide…
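As a rough idea of what profiling a column for distributions and outliers involves, the sketch below counts values and flags outliers with the common 1.5 × IQR rule. It is an illustration in plain Python, not Trifacta's method; the `profile` function, the sample values and the choice of the IQR rule are assumptions.

```python
from collections import Counter
from statistics import quantiles

def profile(values):
    """Summarize a numeric column: value counts plus IQR-based outliers."""
    q1, _, q3 = quantiles(values, n=4)       # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "distribution": Counter(values),     # how often each value occurs
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical column with one suspicious reading.
report = profile([10, 12, 12, 13, 14, 15, 15, 15, 120])
print(report["distribution"].most_common(3))
print(report["outliers"])
```

A profiling tool would present the same information as histograms and highlighted cells rather than printed lists.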


31. Vero Analytics

Vero is an SQL IDE that can write SQL. It goes beyond basic keyword hinting to generate complete queries, automatically resolving complex join trees and providing alias-aware code completions. Vero generates the multi-pass SQL scripts that data engineers and analysts would otherwise write manually, which makes automatic join resolution, in-database blending and federated queries easier. Vero also allows its users to run queries across separate databases as if they were collocated. Users can drag and drop to generate a data blending query scaffold and then proceed to hack the query. Vero’s high performance data blending tech takes care of moving…


32. Waterline Data

Waterline Data is an automated data discovery platform that helps data architects inventory all data in Hadoop automatically at scale, provision data to business users securely, and make the data ready for analysis automatically without having to explore every file manually. Waterline Data also helps to discover lineage and business metadata automatically, as well as manage metadata. Waterline Data Inventory automatically profiles and catalogs all the files in Hadoop, detects when the contents of files have changed and notifies users, and inspects each field in a file to infer its meaning, tags the field accordingly, and generates key…
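Inspecting each field to infer its meaning and tag it can be pictured with a small rule-based sketch. This is not how Waterline Data works internally; the `infer_tag` function, the tag names and the matching rules below are hypothetical and deliberately simple.

```python
import re
from datetime import datetime

def infer_tag(values):
    """Guess a tag for a field from its sample values (hypothetical rules)."""
    def all_match(check):
        return bool(values) and all(check(v) for v in values)

    def is_date(v):
        try:
            datetime.strptime(v, "%Y-%m-%d")
            return True
        except ValueError:
            return False

    if all_match(lambda v: re.fullmatch(r"-?\d+", v)):
        return "integer"
    if all_match(lambda v: re.fullmatch(r"-?\d+\.\d+", v)):
        return "decimal"
    if all_match(is_date):
        return "date"
    if all_match(lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)):
        return "email"
    return "text"                            # fallback when no rule fits

# Hypothetical sample values read from the fields of one file.
columns = {
    "id": ["1", "2", "3"],
    "joined": ["2015-05-15", "2016-05-13"],
    "contact": ["a@example.com", "b@example.org"],
    "note": ["ok", "n/a"],
}
tags = {name: infer_tag(vals) for name, vals in columns.items()}
print(tags)
```

A catalog built this way stores the tags alongside each file so that users can search by meaning instead of opening every file.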


33. Dell Toad Data Point

Dell Toad Data Point is a data analysis tool that simplifies data access, integration and reporting. It connects to and integrates all your relational and non-relational data sources, simplifies complex query development and data integration, profiles data to ensure accuracy, automates routine query and reporting tasks, and validates data quickly and easily.

34. IBM DataWorks

IBM DataWorks is a cloud-based data refinery that transforms raw data into relevant and actionable information and makes it easily accessible to those who need it. IBM DataWorks saves time and resources across the organization and accelerates data-based decisions.

35. Progress Easyl

Progress Easyl is a cross-platform, simple, self-service data preparation tool that makes it easy to access, blend, and report on data that spans a wide variety of business applications and data sources. It is a browser-based solution that lets users easily obtain, collaborate on and share critical insight between different departments such as marketing, sales, and customer support, empowering the organization to capitalize on new opportunities.


36. Omniscope

Omniscope Desktop Edition integrates two workspaces in a single, in-memory, file-based application. DataManager provides data import from most sources, preparation/transformation, integration and delivery of processed data sets in a wide variety of formats, and DataExplorer provides interactive visual data discovery, analysis, multi-tab, multi-view reporting, dashboarding and publication in a wide variety of formats.


37. Open Source Data Quality and Profiling

This open source tool provides a high-performance integrated data management platform for data integration, data profiling, data quality, data preparation, dummy data creation, metadata discovery, anomaly discovery, data cleansing, reporting and analytics.


38. Infactum

Infactum provides actionable visual insights from data in seconds by simply dropping in a datasheet.


2 Reviews
  • May 15, 2015 at 7:48 am

    ADDITIONAL INFORMATION
    Omniscope on this list. Omniscope 3.0 has a streaming-based highly-scalable in situ (and/or in-memory) data preparation workspace that is much cheaper and faster than Alteryx, et al. and also includes an integrated multi-tab, multi-view visualisation/presentation interface that allows the user to iterate between analytics ‘visual discovery’, and pixel-perfect, branded dashboard presentations. Open-source JavaScript visualisations from libraries like D3.js and others are supported…full R integration, geo-spatial/locational analytics and much more. Free to try:
    http://www.visokio.com/download

  • May 13, 2016 at 8:00 am

    ADDITIONAL INFORMATION
    Ideata Analytics has a compelling tool in the self-serve data preparation space.

    You can check it out at https://ideata-analytics.com. They are also providing a very intuitive and machine learning driven self service data preparation interface.

    Based on the user's selection of data, Ideata Analytics automatically suggests a list of transformations that can be applied to shape and clean the data. Data analysts, data scientists and business users do not have to write a single line of code or SQL script or design complicated ETL jobs. They can just visually clean the data and see the results instantly.

    The major advantage with Ideata Analytics is that it is built on top of big data technologies from scratch so even if you have millions and trillions of messy rows in your data it will clean it in no time and with ease.

    Free Trial Link : https://ideata-analytics.com/trial

About The Author
imanuel