Big Data Analysis With Packaged MapReduce Algorithms for Hadoop from Mu Sigma
Mu Sigma is a pure-play decision sciences and analytics firm that helps companies institutionalize data-driven decision making and harness Big Data. Mu Sigma solves high-impact business problems in the areas of Marketing, Risk and Supply Chain across 10 industry verticals. The company has driven disruptive innovation in the analytics industry with its interdisciplinary approach combining business, math and technology, and with its integrated decision support ecosystem comprising technology platforms, processes, methodologies and people.
muHPC (for High Performance Computing) is a library of popular statistical algorithms written in MapReduce, designed for enterprise-class Big Data analysis in Hadoop environments. As with Mu Sigma's other products, muHPC was used extensively within Mu Sigma on many client engagements before the company brought it to market.
Traditionally, enterprises that wanted to use R and Hadoop for Big Data analysis have had to write their own algorithms or rely on open-source options that had not been widely used or tested. Quality varied, and companies found it difficult to hire people with the skills to code their own algorithms. Mu Sigma's offering enables enterprises to accelerate their R and Hadoop initiatives and their overall Big Data analysis programs.
muHPC consists of the packages muGLM, muEDA, muKMeans, muRecommender, muRandomForest and muHMM. muGLM offers easy-to-use R functions for building a wide variety of generalized linear models (OLS, Logistic, Poisson, Negative Binomial, Gamma, etc.) on Big Data. muEDA offers easy-to-use R functions for performing exploratory analysis on Big Data. muKMeans offers easy-to-use R functions for clustering Big Data with the K-means algorithm. The muRecommender package is built to run natively on Hadoop, using Java MapReduce. It leverages Alternating Least Squares, Singular Value Decomposition and other complex algorithms to recommend items based on latent factors. muRandomForest trains multiple decision trees and combines their outputs, an ensemble learning method, to make predictions. The muHMM package identifies the hidden states underlying recorded observations and builds models that help in understanding the transitions between those states.
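To make the idea of a statistical algorithm "written in MapReduce" concrete, here is a minimal sketch of one K-means iteration split into a map step and a reduce step. It is written in plain Python for illustration only and is not muKMeans code; the function names (map_point, reduce_cluster, kmeans_iteration) and the sample data are assumptions invented for this example.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that were assigned to one centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    sums = [sum(p[d] for p, _ in values) for d in range(dims)]
    return tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    """One K-means pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)  # stands in for Hadoop's shuffle phase
    # A centroid with no assigned points is kept unchanged.
    return [
        reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


# Hypothetical sample data: two obvious clusters in two dimensions.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
```

On a real cluster the map step would run in parallel over blocks of the input, and the iteration would be repeated until the centroids stop moving.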
Mu Sigma leveraged technology from Cloudera and Revolution Analytics to build muHPC. The muHPC algorithms are written using components from Revolution Analytics' open-source RHadoop project. Hadoop integration is based on the rmr2 package, which provides Hadoop MapReduce functionality in R. The integration has been implemented and tested with Cloudera's distribution of Hadoop and Revolution R Enterprise.
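The MapReduce model that this integration relies on can be sketched in a few lines. The example below is plain Python and does not reproduce the rmr2 API; the helper map_reduce, the mapper and reducer functions, and the sales records are all hypothetical. It computes a per-group mean, the kind of summary an exploratory analysis might need.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # in-memory stand-in for the shuffle phase
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(record):
    """Emit (region, (amount, 1)) so sums and counts can be combined later."""
    region, amount = record
    yield region, (amount, 1)


def reducer(key, values):
    """Combine partial sums and counts into the mean amount for one region."""
    total = sum(amount for amount, _ in values)
    count = sum(n for _, n in values)
    return total / count


# Hypothetical input records: (region, sale amount).
sales = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]
means = map_reduce(sales, mapper, reducer)
```

Emitting (amount, 1) pairs rather than raw amounts keeps the reducer correct even if partial results are combined before the final reduce, which is a common design choice in MapReduce jobs.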