For a project I was working on, I made a proof-of-concept text recommendation engine.
However, the project got cancelled. That said, I had a lot of fun building it and would like to keep working on it.
- Would something like this be useful to anyone else?
- If so, in what disciplines would something like that be useful?
The basic idea is that documents with similar "semantic" topic distributions are likely to be similar in content.
Here's how it works:
- I implemented an application that uses Latent Dirichlet Allocation (LDA) to build topic models from a given corpus
- Subsequent document queries have their topic distributions inferred and matched for similarity via random projections and Levenshtein distance
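I don't have the original code handy, but the matching step above can be sketched roughly like this: project each inferred topic distribution onto a set of random hyperplanes to get a bit signature, then compare signatures with Levenshtein distance. The 4-topic distributions and 16 hyperplanes below are made-up illustration values, not what the demo actually uses:

```python
import numpy as np

def rp_signature(topic_dist, planes):
    # Random projection: the sign of the dot product with each random
    # hyperplane contributes one bit to the document's signature.
    return ''.join('1' if np.dot(plane, topic_dist) >= 0 else '0'
                   for plane in planes)

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

rng = np.random.default_rng(0)
planes = rng.standard_normal((16, 4))  # 16 hyperplanes over a 4-topic space

# Toy topic distributions: doc_a and doc_b share a dominant topic, doc_c doesn't.
doc_a = np.array([0.70, 0.10, 0.10, 0.10])
doc_b = np.array([0.65, 0.15, 0.10, 0.10])
doc_c = np.array([0.05, 0.05, 0.20, 0.70])

sig_a, sig_b, sig_c = (rp_signature(d, planes) for d in (doc_a, doc_b, doc_c))
print(levenshtein(sig_a, sig_b), levenshtein(sig_a, sig_c))
```

In practice the topic distributions would come from an LDA model's inference step (e.g. gensim) rather than being hard-coded; similar distributions tend to produce signatures with small edit distance.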
You can find an example running here: http://174.129.217.121/smarts/query/hnlinks/
If you go to Hacker News, pick a URL for an article, and enter it in the textbox, there is a chance similar content will show up.
EDIT: formatting
EDIT2: The training set hasn't been picked for accuracy, so the recommendations may be off unless the query is an article about education, a startup fanboy piece, a nerd-shoe-gaze blog post, or an article about Apple and Google.
EDIT3: The demo server now runs on port 80!