Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 is a library and a set of supporting applications you can use to build a search results clustering engine. Such an engine will organize your search results into topics, fully automatically and without external knowledge such as taxonomies or preclassified content. Carrot2 integrates very well with both open source and proprietary search engines. Apart from the two main specialized document clustering algorithms (Suffix Tree Clustering and Lingo), Carrot2 offers ready-to-use [...]
Aika is an open source text mining engine that automatically extracts semantic information from text and annotates the text with it. Where the extracted information is ambiguous, Aika generates several hypothetical interpretations of the meaning of the text and picks the most likely one. The Aika algorithm is based on various ideas and approaches from the field of AI, such as artificial neural networks, frequent pattern mining, and logic-based expert systems. Aika is written in Java and distributed under the Apache license. Aika is based on non-monotonic logic, meaning that it first draws tentative conclusions only. In other words, Aika is able to generate [...]
The Distributed Machine Learning Toolkit (DMTK) is an open source project from Microsoft. Achieving better accuracy in distributed machine learning applications requires a large amount of computational resources, which has become a main challenge for machine learning researchers and practitioners. Microsoft released DMTK, which contains both algorithmic and system innovations. These innovations make machine learning tasks on big data highly scalable, efficient, and flexible. It comprises four components:
• LightLDA: an extremely fast and scalable topic model [...]
Textable was initially developed as part of a pedagogical innovation project at the University of Lausanne. Installing Textable-Prototypes through Orange gives access to a new widget named Theatre Classique, which offers a straightforward way of importing theater plays from the Théâtre Classique website. Orange Textable is an open-source add-on bringing advanced text-analytical functionalities to the Orange Canvas data mining software package. It essentially enables users to build data tables on the basis of text data, by means of a flexible and intuitive interface. Textable can import text from keyboard, files, or [...]
LPU (which stands for Learning from Positive and Unlabeled data) is a text learning or classification system that learns from a set of positive documents and a set of unlabeled documents (without labeled negative documents). This type of learning is different from classic text learning/classification, in which both positive and negative training documents are required. Given a set of positive documents and a set of unlabeled documents, the LPU algorithm learns a classifier in two steps:
• Step 1: Identifying a set of reliable negative documents from the unlabeled set. For this step, LPU has three techniques, i.e., spy, [...]
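To make the idea of learning from positive and unlabeled documents more concrete, here is a minimal Python sketch. It is not LPU's code and does not implement LPU's spy technique: Step 1 is replaced by a simple stand-in heuristic (treating the unlabeled documents least similar to the positive centroid as reliable negatives), and the second step shown here, a nearest-centroid classifier, is an assumption made only for illustration. All function names, the `fraction` parameter, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average term counts over a list of bag-of-words vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reliable_negatives(positive_docs, unlabeled_docs, fraction=0.5):
    """Step 1 stand-in (NOT LPU's spy technique): keep the unlabeled
    documents that are least similar to the positive centroid."""
    pos_centroid = centroid([vectorize(d) for d in positive_docs])
    ranked = sorted(unlabeled_docs,
                    key=lambda d: cosine(vectorize(d), pos_centroid))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def train_classifier(positive_docs, negative_docs):
    """Assumed second step: a nearest-centroid classifier built from the
    positives and the reliable negatives found in step 1."""
    pos_c = centroid([vectorize(d) for d in positive_docs])
    neg_c = centroid([vectorize(d) for d in negative_docs])

    def classify(text):
        vec = vectorize(text)
        return cosine(vec, pos_c) >= cosine(vec, neg_c)

    return classify


# Hypothetical toy data: positives are about flights, unlabeled is mixed.
positives = ["cheap flights to paris", "book cheap flights online"]
unlabeled = [
    "cheap flights to rome",
    "football match results",
    "league football scores",
    "flights and hotels deals",
]

negatives = reliable_negatives(positives, unlabeled)
classify = train_classifier(positives, negatives)
```

In this toy run, the two football documents share no terms with the positive set, so they are the ones selected as reliable negatives, and the resulting classifier separates flight-related text from football-related text. A real system would use a stronger Step 1 and a proper learning algorithm for the final classifier.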