The "Programming with Big Data in R" project (pbdR) is a set of highly scalable R packages for distributed computing and profiling in data science.
Statistical Software Free
• High Performance
• High-Level Interfaces To MPI
Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)
The pbdR project offers packages in six categories: Application, Communication, Computation, Developers, I/O and Profiling.
Application packages are purpose-built packages that build on pbdR. pbdML provides machine learning algorithms built on pbdDMAT. cubfits estimates mutation and selection coefficients on synonymous codon usage bias, based on models of ribosome overhead cost (ROC). pmclust provides tools for parallel model-based clustering, including k-means and Gaussian mixture modeling, and can be applied to ad hoc distributed matrices as well as pbdDMAT-conformable ones. pbdDEMO is a collection of pbdR package demonstrations and examples, together with a lengthy, textbook-style vignette that helps R programmers move quickly from their laptops to distributed platforms.
Communication packages are tools for handling multi-machine communication. pbdRPC is a very light yet secure implementation of remote procedure calls, with a unified interface via ssh (OpenSSH) or plink/plink.exe (PuTTY). remoter is a collection of utilities for performing R computations on a remote resource. pbdCS is a client/server framework for pbdR, and pbdZMQ is a set of bindings for the ZeroMQ communication library.
Computation packages, such as pbdDMAT and pbdMPI, are frameworks for building other scalable tools. Developer packages, such as pbdTEST, pbdBASE and pbdSLAP, are tools for developers. I/O packages provide large, scalable I/O. Profiling packages are performance analysis packages.
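To give a feel for the computation layer, here is a minimal sketch of the single-program, multiple-data style that pbdMPI supports: every rank runs the same script, works on its own share of the data, and the partial results are combined with a reduction. The helper chunk_for_rank and the problem size of 100 are hypothetical and exist only for this illustration; the pbdMPI calls used are init, comm.rank, comm.size, allreduce, comm.print and finalize.

```r
# Hypothetical helper (not part of pbdR): round-robin assignment of the
# indices 1..n to `size` ranks, returning the indices owned by `rank`.
# Ranks are 0-based, matching what pbdMPI's comm.rank() reports.
chunk_for_rank <- function(n, rank, size) {
  idx <- seq_len(n)
  idx[(idx - 1) %% size == rank]
}

# The guard only lets this file be sourced on a machine without pbdMPI;
# in a real batch job the package would simply be loaded.
if (requireNamespace("pbdMPI", quietly = TRUE)) {
  suppressMessages(library(pbdMPI))
  init()                                   # start the MPI communicator

  # Each rank sums only the indices it owns.
  my_idx    <- chunk_for_rank(100, comm.rank(), comm.size())
  local_sum <- sum(my_idx)

  # Combine the partial sums; every rank receives the global total.
  total <- allreduce(local_sum, op = "sum")

  comm.print(total)                        # printed once, by rank 0
  finalize()                               # shut MPI down cleanly
}
```

Such a script is typically launched in batch mode with something like `mpirun -np 4 Rscript script.R`, where the script name and the rank count are placeholders. Because the indices are partitioned without overlap, the reduced total does not depend on how many ranks are used.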