MALLET

Overview
Synopsis

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Category

Data Mining Software Free

Features

•Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text
•Provides tools for sequence tagging
•Routines for transforming text documents into numerical representations
•Add-on package called GRMM
•Open Source Software

License

Open Source

Price

Free

Pricing

Subscription

Free Trial

Available

Users Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)

Website
Company

MALLET

What is best?

•Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text
•Provides tools for sequence tagging
•Routines for transforming text documents into numerical representations
•Add-on package called GRMM

What are the benefits?

• Perform document classification easily
• Transform text to numerical representations
• Optimize numerical representations
• Analyze unlabeled text
• Access an arbitrary graphical structure

PAT Rating™

Category: Editor Rating / Aggregated User Rating
Ease of use: 7.6 / 9.1
Features & Functionality: 7.6 / 4.1
Advanced Features: 7.6 / 8.1
Integration: 7.6 / 8.2
Performance: 7.6 / -
Customer Support: 7.6 / -
Implementation: - / -
Renew & Recommend: - / -

Bottom Line

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

Editor Rating: 7.6
Aggregated User Rating: 7.4 (3 ratings)

MALLET (MAchine Learning for LanguagE Toolkit) is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

MALLET provides sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. It also provides tools for sequence tagging, for applications such as named-entity extraction from text.
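To make the classification workflow concrete, here is a minimal Python sketch of one of the algorithm families named above (Naïve Bayes) together with one common metric (accuracy). It is an illustration only and not MALLET's Java implementation; the function names, the toy documents, and the use of Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing, an assumption of this sketch
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(labeled_docs, model):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(
        1 for tokens, label in labeled_docs if classify(tokens, model) == label
    )
    return correct / len(labeled_docs)

# Toy data, invented for illustration.
training = [
    (["goal", "match", "team"], "sports"),
    (["team", "win", "score"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["party", "policy", "vote"], "politics"),
]
held_out = [(["team", "score"], "sports"), (["vote", "policy"], "politics")]

model = train_naive_bayes(training)
print(accuracy(held_out, model))
```

In practice a toolkit would also report metrics such as precision and recall per class; accuracy is used here only because it is the simplest to show.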

The sequence tagging algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields, all implemented in an extensible system for finite state transducers. Topic models, in turn, are useful for analyzing large collections of unlabeled text.
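As a rough illustration of sequence tagging with a Hidden Markov Model, the Python sketch below decodes the most probable tag sequence with the Viterbi algorithm. It is not MALLET's transducer code; the two tags, the three-word sentence, and every probability in it are invented for the example.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observed tokens."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy model: "NAME" marks a named entity, "O" marks everything else.
states = ["O", "NAME"]
start_p = {"O": 0.8, "NAME": 0.2}
trans_p = {"O": {"O": 0.7, "NAME": 0.3}, "NAME": {"O": 0.6, "NAME": 0.4}}
emit_p = {
    "O": {"met": 0.45, "alice": 0.10, "today": 0.45},
    "NAME": {"met": 0.05, "alice": 0.90, "today": 0.05},
}

tags = viterbi(["met", "alice", "today"], states, start_p, trans_p, emit_p)
print(tags)  # ['O', 'NAME', 'O']
```

Maximum Entropy Markov Models and Conditional Random Fields use a similar dynamic-programming decoder but score transitions with learned feature weights rather than fixed probability tables.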

The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA. Many of MALLET's algorithms depend on numerical optimization, and MALLET includes an efficient implementation of Limited Memory BFGS, among many other optimization methods. Routines for transforming text documents into numerical representations are also included.

This transformation is implemented through a flexible system of "pipes", which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors. MALLET also has an add-on package called GRMM, which contains support for inference in general graphical models and training of CRFs with arbitrary graphical structure.
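The idea of chaining "pipes" can be sketched in a few lines of Python. This is only a conceptual illustration of the three steps mentioned above (tokenizing, stopword removal, count vectors); the function names and the tiny stopword list are hypothetical and do not reflect MALLET's Java classes.

```python
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is"}

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]

def to_counts(tokens):
    """Convert a token sequence into a bag-of-words count vector."""
    return Counter(tokens)

def run_pipes(data, pipes):
    """Pass the data through each pipe in order."""
    for pipe in pipes:
        data = pipe(data)
    return data

PIPES = [tokenize, remove_stopwords, to_counts]
counts = run_pipes("The cat and the hat", PIPES)
print(dict(counts))  # {'cat': 1, 'hat': 1}
```

Because each step takes the previous step's output, steps can be reordered, swapped, or dropped without touching the others, which is the flexibility the pipe design is meant to provide.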

The toolkit is Open Source Software, released under the Common Public License. Data can be imported into MALLET in two ways: the first applies when the source data consists of many separate files, and the second when the data is contained in a single file with one instance per line.
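The two input layouts can be pictured with the following Python sketch. It is not MALLET's importer: the "name label text" line layout, the use of the parent directory name as the label, and the `.txt` extension are assumptions chosen for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumed line layout: "<name> <label> <text...>", separated by whitespace.
LINE_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")

def read_single_file(lines):
    """One instance per line: parse name, label, and text from each line."""
    instances = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            name, label, text = match.groups()
            instances.append({"name": name, "label": label, "data": text})
    return instances

def read_directories(root):
    """One instance per file: the parent directory name serves as the label."""
    instances = []
    for path in sorted(Path(root).glob("*/*.txt")):
        instances.append({
            "name": path.name,
            "label": path.parent.name,
            "data": path.read_text(encoding="utf-8"),
        })
    return instances

# Layout 2: a single file (here a list of lines) with one instance per line.
from_file = read_single_file([
    "doc1 sports The team won",
    "doc2 politics The vote passed",
])

# Layout 1: many separate files, grouped into one directory per label.
with tempfile.TemporaryDirectory() as root:
    for label, name, text in [
        ("sports", "a.txt", "The team won"),
        ("politics", "b.txt", "The vote passed"),
    ]:
        folder = Path(root) / label
        folder.mkdir(exist_ok=True)
        (folder / name).write_text(text, encoding="utf-8")
    from_dirs = read_directories(root)

print(from_file[0])
print([d["label"] for d in from_dirs])
```

Either reader yields the same kind of record (name, label, data), so later processing steps do not need to know which layout the data came from.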

1 Review
  • Darci Thorman
    September 11, 2017 at 11:32 am

    Statistical natural language processing, document classification, clustering

    Company size

    Enterprise (>1001)

    User Role

    Executive

    User Industry

    Education

    Rating
    Ease of use: 8.2

    Features & Functionality: 8.1

    Advanced Features: 8.1

    Integration: 8.2

    ADDITIONAL INFORMATION
    MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
    MALLET provides these sophisticated tools for document classification: efficient routines for converting text to features; a wide variety of algorithms such as Naïve Bayes, Maximum Entropy, and Decision Trees; and code for evaluating classifier performance using several commonly used metrics.
    MALLET also features tools for sequence tagging for applications such as named-entity extraction from text. The algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.
    The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA. These topic models are useful for analyzing large collections of unlabeled text.
    Many of the algorithms in MALLET depend on numerical optimization, using methods such as Limited Memory BFGS, among many others. MALLET's machine learning applications also provide routines for transforming text documents into numerical representations that can then be processed efficiently.
    GRMM, an add-on package to MALLET, contains support for inference in general graphical models and training of CRFs with arbitrary graphical structure. MALLET is Open Source Software released under the Common Public License and can be used for research or commercial purposes under the terms of the license.
