
Would you use an LDA-based topic modeling library in Java that handles a few million documents on a single 16 GB machine?

I am running a prototype that I developed in Java to perform LDA on a few million documents. I have personally found it very useful, as most LDA implementations in Java, R, or Python either run out of memory on a few thousand documents or slow to a crawl.

I am planning on open sourcing it, but I still have to add the license text to my source files and write some documentation. I was curious whether there would be any interest in such a library, or whether people using LDA are content with what is already out there in the open source space.

Edit: Forgot to add that for 500 topics on 2 million documents, 1000 iterations take approximately 5 hours on an EC2 High-Memory instance with the Java max heap size set to 10 GB.
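
To put those figures in per-iteration terms, here is a small back-of-the-envelope sketch. It uses only the numbers quoted above; the class and method names are made up for illustration and are not part of the prototype, and it assumes every iteration makes one full pass over all documents, which the post does not state.

```java
import java.util.Locale;

public class LdaThroughput {
    // Figures quoted in the post.
    static final long DOCUMENTS = 2_000_000L;
    static final int ITERATIONS = 1000;
    static final double HOURS = 5.0;

    // Average wall-clock seconds spent on one iteration.
    static double secondsPerIteration(double hours, int iterations) {
        return hours * 3600.0 / iterations;
    }

    // Documents processed per second, assuming (hypothetically) that
    // each iteration is one full sweep over the whole corpus.
    static double documentsPerSecond(long documents, double secondsPerIteration) {
        return documents / secondsPerIteration;
    }

    public static void main(String[] args) {
        double secPerIter = secondsPerIteration(HOURS, ITERATIONS);
        double docsPerSec = documentsPerSecond(DOCUMENTS, secPerIter);
        System.out.printf(Locale.ROOT,
                "%.1f s per iteration, about %.0f documents per second%n",
                secPerIter, docsPerSec);
    }
}
```

Under that assumption, the quoted run works out to roughly 18 seconds per iteration, or a little over 110,000 documents per second within a sweep.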

submitted by textml2730