Advanced Analytics – An interview with Ingo Mierswa
RapidMiner provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. RapidMiner is used for business, industrial applications, research, education, training, rapid prototyping, and application development and has more than 600 enterprise customers and more than 250,000 active users. Ingo Mierswa, an industry-veteran data scientist, is the CEO of RapidMiner. He has authored numerous award-winning publications about predictive analytics and big data. Here, the founder of RapidMiner discusses the history of RapidMiner, advanced analytics, predictive analytics, the advanced analytics market and data science with PredictiveAnalyticsToday.
RapidMiner, formerly Rapid-I, was founded in 2007 in Dortmund, Germany. The development of the technology actually began in 2001 in the Artificial Intelligence Unit of the Dortmund University of Technology in Germany.
Ralf Klinkenberg (general manager of RapidMiner GmbH in Germany), Simon Fischer (VP Engineering) and I (Ingo Mierswa, CEO and co-founder) began developing a flexible and powerful data mining software environment we called YALE. The innovative, open source product quickly became popular, and by 2006 demand was high enough that Klinkenberg and I decided to found a company to support the product. Shortly after, we rebuilt the software and released it as RapidMiner.
This past fall, RapidMiner closed a $5 million Series A funding round to expand domestic and international sales and marketing operations. In 2014, we moved our worldwide headquarters to Cambridge, Massachusetts. RapidMiner was also selected by Gartner as a Leader in its Advanced Analytics Platforms Magic Quadrant and landed first place for the second year in a row in KDnuggets’ annual software usage poll as the most widely used predictive analytics solution, with 44 percent of all business analysts relying on the software.
Today, RapidMiner has more than 250,000 active users and more than 600 enterprise customers, including Lufthansa, PayPal, Pepsi, Sanofi, Siemens, Telenor and Volkswagen.
I was one of the original developers of RapidMiner and have been a data scientist for more than 15 years. I have devoted my scientific work to this topic. As a scientist, I authored more than 30 publications, including my Ph.D. thesis on predictive analytics, and presented at numerous conferences. So, everything I do is connected to these deep skills and market knowledge. As the co-founder and CEO of the company, as well as a member of the Board, I provide the overarching vision and strategic direction and spearhead the company's international expansion efforts.
Under my leadership, RapidMiner has grown more than 300 percent per year over the past five years and reached 60 employees globally. In 2012, I spearheaded the international strategy with the opening of an office in the U.S. I am serving the company as decision maker, leader, manager, executor and communicator. This includes setting the strategy and vision and ensuring that these are implemented in their respective departments. Building the team and setting corporate culture are also part of my responsibilities, as well as ensuring that operations are in compliance with corporate and legal requirements. Overall, I developed the company from zero revenues to a profitable, successful company and am now leading it through a phase of even further accelerated growth.
Analytics, in the broadest sense, refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of data to gain insight and drive business planning. We view analytics as consisting of two major areas: business intelligence (BI) and advanced analytics, which includes the area of predictive analytics.
BI is like driving a car while you are constantly looking into the rear-view mirror. You are only looking at data from the past. It consists of querying, reporting, and OLAP (online analytical processing), and can answer questions including “what happened,” “how many” and “how often.”
Advanced analytics, however, goes far beyond BI by using sophisticated modeling techniques to predict future events or discover patterns which cannot be detected otherwise. For example, advanced analytics can answer questions including “why is this happening,” “what if these trends continue,” “what will happen next” (prediction) and “what is my best option going forward.” The last one is what I like to call “predaction” – short for prediction-based action.
Predictive analytics, the most important part of the advanced analytics market segment, is the practice of analyzing historical and current data to make statistically accurate predictions about future events. One example is analyzing historical customer behavior to predict which customers are most likely to leave a smartphone contract (churn). Another is analyzing a manufacturing plant’s maintenance and breakdown records to predict when a particular machine or part is likely to fail – the plant operators can then better plan maintenance schedules and reduce disruptive breakdowns.
There are two somewhat orthogonal major trends. First, many new companies offer highly specialized vertical or horizontal solutions that use predictive algorithms to solve one specific business problem. The advantage is high ease-of-use and a quick time-to-market. But at the same time, this trend comes with huge disadvantages: the “no-free-lunch theorem” proves that there is not a single algorithm that is best for all data sets and applications. Without the ability to add your own data, the results also lack some relevance for your company, and you can never become better than your competitors, who all use the same suboptimal algorithms. The second trend therefore is to offer full platforms to deliver advanced analytics but make them much easier to use and deploy than ever before.
RapidMiner is somewhat in between these two extremes by offering shrink-wrapped “accelerators” to produce predictions within just a few minutes. But at the same time, users can always lift the curtain and use the solutions produced by the accelerator as a starting point and, with a few clicks, fine-tune, optimize and integrate them into their business processes.
And then there is of course the move towards big data. For the past several years, the market has been dealing with the efficient storage and retrieval of data in new infrastructures like NoSQL databases and Hadoop-based solutions, and the overall insight now is that data alone does not offer any value. It is the ability to unlock hidden patterns in the data which gives you a competitive edge. Hadoop and predictive analytics are symbiotic here since data is often stored on a transactional level, and predictive analytics offers the largest value because users can make a prediction on every single case, and then act on the predictions.
In fact, RapidMiner just announced the acquisition of Radoop, which not only brings a Hadoop connector to RapidMiner, but also allows users to perform the calculations on the distributed Hadoop clusters. This is the first code-free solution doing this for predictive analytics, and we are years ahead of our competition with this technology.
First of all, I see a challenge: data scientists do not scale since they have a very rare skill mix. Data scientists are superstar programmers with Ph.D.s in statistics and the ability to understand every business problem in the world. Employees with such a mixture are difficult to find, and team-based approaches can help to overcome this shortage. But at the same time, those rare experts typically spend 95 percent of their time on standard tasks for data integration and transformation instead of on the creation of new algorithms. As such, organizations should carefully assess how they utilize their resources. In my experience, it is best to empower a team of experts with different backgrounds for the best results.
The opportunity for data scientists now is to be freed up from those standard tasks by letting the business analysts take over the control of the overall analytics architecture. This makes sense since business analysts are used to translating between business problems and analytics. By implementing this strategy, companies will not face a skills gap and will instead have effective teams working in competency centers, which empower business users to proactively solve their biggest problems. But such a team needs the right collaboration platform to really be effective, and a programming language like R or Python is not the answer.
Everything at RapidMiner follows three simple principles: predaction, collaboration and simplicity. Predaction represents the fact that the biggest value of predictive analytics is not in high-level predictions, but in performing millions of micro-predictions and acting on them. “What is the weather predaction for tomorrow?” – “I will bring my umbrella!” The value is not in the knowledge that it is going to rain; the true value lies in knowing your best option when this happens. Are you staying home? Do you bring your umbrella? RapidMiner is a platform that can create millions of predictions and trigger the right business actions based on the results.
Collaboration means that we offer teams of people with different backgrounds a white board on which they can express their ideas on data integration, transformation, and modeling and turn them into reality with a single click. RapidMiner makes predictive analytics easy for business analysts since the product requires no programming and users can build analytics processes with a drag-and-drop interface. And indeed the business analyst is often in the driver’s seat at our customer organizations. When business users contribute the business problem, data scientists are freed up from standard tasks and can focus on the specialized algorithms where needed, and IT professionals can contribute the data and control access rights. RapidMiner empowers such a team to effectively reach the best solution together.
Finally, simplicity means that everybody can create predictions and predactions within just a few minutes. We identified a speed-up by a factor of 40 to 50 compared to pure scripting approaches for data integration, transformation, modeling, deployment and maintenance. It is amazing how RapidMiner achieves this ease-of-use and performance gain given that it supports more than 1,500 analytical operations, including hundreds of methods for data integration, data transformation, data modeling and data visualization – with access to data sources including Excel, Access, Oracle, IBM DB2, Microsoft SQL, Sybase, Ingres, MySQL, Postgres, SPSS, dBase, text files and more.
Yes, that’s right. We were extremely pleased to be included in Gartner’s first Leaders quadrant for advanced analytics platforms. We believe it validates RapidMiner’s value as a platform that supports the collaboration of users with different backgrounds – including business analysts, data scientists and business managers. It also reflects that RapidMiner is the most widely used solution for predictive analytics thanks to its unique combination of ease-of-use, number of functions, and flexibility for analytics and integration of the results.
In terms of our strategy moving forward, we’re going to continue to deliver on our promise to provide our customers with the most easy-to-use and innovative platform, one that gives them the functionality they need to make better-informed, forward-looking business decisions and then act on those decisions. Our plan is to continue to grow our business and add new talent in the coming years to deliver on our mission in multiple geographies of the world – to put the power of predictive analytics in the hands of users everywhere as fast as possible.
We’re proud of our open source roots and have a very large community of dedicated users that create add-ons to our software, which we will continue to support. Our new business source model represents a perfect balance of customer support and a scalable business model. The idea behind business source is incredibly simple: the latest and greatest version of RapidMiner is available under a freemium model, while previous versions are available under an open source license.
Popularized by Michael (Monty) Widenius, one of the founders of MySQL and an investor in RapidMiner, business source is a commercial software license model that offers many of the benefits of open source, but with a built-in time delay on users being able to access new versions of our products. This allows us to deliver feature-rich versions of the software to all groups of users, while commercial, paid users are able to analyze larger data sets and connect the software to more data sources.
The strength of RapidMiner is that it supports dozens of use cases and application areas by reconnecting the provided building blocks. But we see four major areas where predictive analytics is benefiting our customers most, and they are: churn reduction, sentiment analysis, predictive maintenance and direct marketing.
RapidMiner’s Churn Analysis application wizard sifts through customer data to identify which customers are most likely to switch to a competitor and why, so that they can implement a targeted retention campaign in a timely fashion.
In terms of sentiment analysis, social media is an endless source of conversations in which products, services and brands are discussed both positively and negatively. By plugging RapidMiner’s Sentiment Analysis application wizard into the social media stream, our customers can see what their users are saying about their business and plan accordingly.
By using the RapidMiner Predictive Maintenance application wizard, our customers are able to transform the unknown into the predictable, reducing both unnecessary maintenance costs and unexpected failures.
And lastly, given the multiple marketing channels now available, including text messages, emails, online ads and various other forms of digital marketing, RapidMiner’s Direct Marketing application wizard allows our customers to invest only in marketing actions with the highest conversion rates and helps them reduce costs by improving targeting.
Thanks very much for conducting this interview, Manuel. We really appreciate your interest in RapidMiner and look forward to keeping you up to date on the company’s progress in the future!