Vowpal Wabbit

Overview

Synopsis

The Vowpal Wabbit (VW) project is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research.

Category

Data Mining Software Free

Features

•Input format
•Speed
•Scalability
•Feature pairing

License

Open Source

Price

Free

Pricing

Subscription

Free Trial

Available

Users Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)

Website

Vowpal Wabbit

Company

Vowpal Wabbit

What is best?

•Input format
•Speed
•Scalability
•Feature pairing

What are the benefits?

•Input format
•Speed
•Scalability
•Feature pairing
•C++ Compiler optimization

PAT Rating™

Criterion | Editor Rating | Aggregated User Rating
Ease of use | 7.4 | 8.4
Features & Functionality | 7.5 | 8.2
Advanced Features | 7.4 | 7.8
Integration | 7.5 | —
Performance | 7.6 | —
Customer Support | 7.6 | 8.1
Implementation | —
Renew & Recommend | —

Bottom Line

There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation.

Editor Rating: 7.5

Aggregated User Rating: 8.1 (3 ratings)

The Vowpal Wabbit (VW) project is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research. Support is available through the mailing list. There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm.

This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation. There are several optimization algorithms available, with the baseline being sparse gradient descent (GD) on a loss function (several are available). The code should be easily usable. Its only external dependency is the Boost library, which is often installed by default.
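To make "sparse gradient descent on a loss function" concrete, here is a minimal Python sketch of one online update using squared loss. It is an illustration under stated assumptions, not VW's actual code: the function names, the dictionary-based weight store and the fixed learning rate are all hypothetical choices made for readability.

```python
from collections import defaultdict


def sgd_update(weights, features, label, learning_rate=0.1):
    """One online gradient-descent step on squared loss for a sparse example.

    features: dict mapping feature index -> value (nonzero entries only).
    Returns the prediction made before the update.
    """
    prediction = sum(weights[i] * v for i, v in features.items())
    error = prediction - label
    # The gradient of 0.5 * error**2 with respect to weight i is error * value,
    # so only the weights of features present in this example are touched.
    for i, v in features.items():
        weights[i] -= learning_rate * error * v
    return prediction


def train(examples, passes=1, learning_rate=0.1):
    """Stream over (features, label) pairs, updating after each example."""
    weights = defaultdict(float)  # hypothetical weight store, absent means 0.0
    for _ in range(passes):
        for features, label in examples:
            sgd_update(weights, features, label, learning_rate)
    return weights
```

The cost of each step depends on the number of nonzero features in the example rather than on the total number of features, which is what makes the sparse formulation attractive for very wide data.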

The learning algorithm is pretty fast, similar to the few other online algorithm implementations out there. As one data point, it can be effectively applied to learning problems with a sparse terafeature (i.e. 10^12 sparse features). As another example, it's about a factor of 3 faster than Leon Bottou's svmsgd on the RCV1 example in wall clock execution time. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems.

David Grangier seems to have a similar trick in the PAMIR code. The alternative of explicitly expanding the features before feeding them into the learning algorithm can be both computation and space intensive, depending on how it's handled. This learning system is extremely reliable and easy to use.
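The difference between internal pairing and explicit expansion can be sketched in a few lines of Python. This is a hypothetical illustration, not VW's implementation: the table size, the `^` separator and the use of `zlib.crc32` as a stand-in hash are assumptions. In VW itself, pairing between two namespaces is requested with the `-q` option.

```python
import zlib

NUM_BITS = 18                      # hypothetical table size: 2**18 weight slots
MASK = (1 << NUM_BITS) - 1


def slot(name):
    """Map a feature name to a weight-table index (crc32 as a stand-in hash)."""
    return zlib.crc32(name.encode("utf-8")) & MASK


def paired_features(left, right):
    """Yield (index, value) for every pair drawn from two feature subsets.

    left, right: dicts mapping feature name -> value.
    Pairs are generated lazily, so the cross-product is never stored.
    """
    for a, va in left.items():
        for b, vb in right.items():
            yield slot(a + "^" + b), va * vb


def predict(weights, left, right):
    """Linear prediction over the unpaired features plus the paired ones."""
    total = 0.0
    for name, value in list(left.items()) + list(right.items()):
        total += weights[slot(name)] * value
    for index, value in paired_features(left, right):
        total += weights[index] * value
    return total
```

For a ranking problem, `left` might hold query words and `right` document words. The model stays linear in the pair features, but the expanded dataset never has to be written to disk or held in memory.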

1 Review

Bryan Shoger
September 11, 2017 at 5:28 pm

Fast out-of-core learning system
Company size: Medium (50 to 1000)
User Role: Consultant
User Industry: Manufacturing

Rating
Ease of use: 8.1
Features & Functionality: 8.2
Advanced Features: 8.2
Customer Support: 8.1

ADDITIONAL INFORMATION
Vowpal Wabbit is a fast out-of-core learning system. Through parallel learning, it can exceed the throughput of any single machine's network interface when doing linear learning, a first among learning algorithms. The input format for the learning algorithm is significantly more flexible than might be expected: features can be free-form text, which is interpreted as a bag-of-words model, and multiple sets of free-form text can be placed in different namespaces.

Vowpal Wabbit is fast. As one data point, it can be effectively applied to learning problems with a sparse terafeature (10^12 sparse features), and its speed is similar to that of the few other online algorithm implementations available. The memory footprint of the program is bounded independently of the data, meaning the training set is not loaded into main memory before learning starts. Additionally, the size of the feature set is bounded independently of the training data by using the hashing trick.

Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets, which is useful for ranking problems. The default build also uses a C++ compiler optimization flag, which performs well.
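A small Python sketch can show how namespaced bag-of-words input, the hashing trick and bounded memory fit together. It is a simplified illustration, not VW's parser: it handles only lines of the form `label |namespace word word ...`, ignores importance weights, tags and `feature:value` syntax, and uses `zlib.crc32` with an assumed table size in place of VW's own hash function.

```python
import zlib
from collections import Counter

NUM_BITS = 18                  # hypothetical table size of 2**18 slots
TABLE_SIZE = 1 << NUM_BITS


def parse_line(line):
    """Parse 'label |ns word word |ns2 word' into (label, {index: count}).

    Each word is hashed together with its namespace, so the same word in two
    namespaces is treated as two different features.
    """
    head, *sections = line.strip().split("|")
    label = float(head)
    counts = Counter()
    for section in sections:
        namespace, *words = section.split()
        for word in words:
            key = (namespace + "^" + word).encode("utf-8")
            # Hashing trick: the index range is fixed up front, however many
            # distinct words the data turns out to contain.
            counts[zlib.crc32(key) % TABLE_SIZE] += 1
    return label, dict(counts)


def stream_examples(lines):
    """Yield one parsed example at a time so the data set is never held whole."""
    for line in lines:
        if line.strip():
            yield parse_line(line)
```

Passing an open file object as `lines` reads one example at a time, so memory use is governed by `TABLE_SIZE` and the longest single line rather than by the size of the training set.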