Gensim started off as a collection of various Python scripts for the Czech Digital Mathematics Library dml.cz in 2008, where it served to generate a short list of the most similar articles to a given article (gensim = “generate similar”).
Deep Learning Software
• Efficient implementations
• Platform independent
• Converters & I/O formats
• Similarity queries
Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1001 employees)
Gensim is a free Python library for scalable statistical semantics. It analyzes plain-text documents for semantic structure and retrieves semantically similar documents. Gensim is also a robust, efficient and hassle-free piece of software for unsupervised semantic modelling from plain text. It stands in contrast to brittle homework-assignment implementations that do not scale on the one hand, and robust Java-esque projects that take forever just to run “hello world” on the other.
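The idea of retrieving semantically similar documents can be illustrated with a small, standard-library-only sketch. It is not Gensim's API or implementation: it assumes a naive whitespace tokenizer, plain TF-IDF weighting and cosine similarity, and every function name here (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) is hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, docs, top_n=2):
    """Rank all other documents by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up toy documents, for illustration only.
docs = [
    "graph theory and trees",
    "trees and graph minors",
    "human computer interaction survey",
    "user interface response time",
]
ranked = most_similar(0, docs)
```

Here `ranked` is a short list of `(document index, score)` pairs, best match first, which mirrors the "most similar articles to a given article" use case. A real system would use a proper tokenizer and a sparse index rather than comparing against every document.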
Gensim is licensed under the OSI-approved GNU LGPLv2.1 license. This means that it is free for both personal and commercial use, but users who distribute a modified version of Gensim must disclose the source code of their modifications. Apart from that, users are free to redistribute Gensim in any way they like, though they are not allowed to change its license.
Gensim can process large, web-scale corpora using incremental online training algorithms. There is no need for the whole input corpus to reside fully in RAM at any one time. In addition, the core algorithms in Gensim use highly optimized math routines. Gensim also contains distributed versions of several algorithms, intended to speed up processing and retrieval on machine clusters. Being pure Python, Gensim runs on Linux, Windows and OS X, as well as any other platform that supports Python and NumPy. Gensim further contains memory-efficient implementations of several popular data formats, such as Matrix Market, SVMlight, and Blei’s LDA-C. These can be used for input, output, or to convert between one another.
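The point about not holding the whole corpus in RAM can be sketched in plain Python. This is not Gensim's own code: `StreamedCorpus`, `build_vocabulary` and `write_svmlight` are hypothetical names, and the example assumes one document per line and whitespace tokenization. Each document is read, turned into a sparse bag-of-words vector and written out in SVMlight format before the next one is loaded.

```python
import io


def build_vocabulary(open_source):
    """First pass: assign an integer id to each distinct token."""
    vocabulary = {}
    with open_source() as handle:
        for line in handle:
            for token in line.lower().split():
                vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


class StreamedCorpus:
    """Yield one bag-of-words vector per line, never the whole corpus at once."""

    def __init__(self, open_source, vocabulary):
        self.open_source = open_source  # callable returning a fresh file-like object
        self.vocabulary = vocabulary

    def __iter__(self):
        with self.open_source() as handle:
            for line in handle:
                counts = {}
                for token in line.lower().split():
                    term_id = self.vocabulary.get(token)
                    if term_id is not None:
                        counts[term_id] = counts.get(term_id, 0) + 1
                # Sparse vector: sorted list of (term_id, count) pairs.
                yield sorted(counts.items())


def write_svmlight(corpus, out, label=0):
    """Write '<label> <feature>:<value> ...' lines with 1-based feature ids."""
    for bow in corpus:
        fields = [str(label)]
        fields.extend(f"{term_id + 1}:{count}" for term_id, count in bow)
        out.write(" ".join(fields) + "\n")


# Made-up two-document corpus; in practice open_source would open a large file.
RAW = "human machine interface\nmachine machine survey\n"


def open_source():
    return io.StringIO(RAW)


vocabulary = build_vocabulary(open_source)
corpus = StreamedCorpus(open_source, vocabulary)

buffer = io.StringIO()
write_svmlight(corpus, buffer)
svmlight_text = buffer.getvalue()
```

Because `StreamedCorpus` reopens its source on every iteration, it can be traversed several times, as multi-pass training needs, while memory use stays bounded by the size of the largest single document plus the vocabulary.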