Dear Reddit,
I am a computer science student from Germany, currently working on a project that will most likely become part of my bachelor's thesis. I am having trouble finding an algorithm or method that allows me to do the following:
There will be a set of objects (up to 5,000 per set) which will be presented to the user. The user will select exactly one of these objects: the one that best fits the current context.
The system should learn over time which object the user selected in which context, and later present the most suitable object for the current context at the top of the list of available objects.
Now here is an example: The user has loaded a resource containing numerous items (strings) into the system (a text editor). After the user initiates a special auto-complete mode, the system should present a list of these items, ordered by how suitable each string is for the current context (the text before and after the cursor position). By that point, the system should have learned from earlier selections which strings are most suitable for which context.
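To make "context" a bit more concrete, here is a rough sketch of how the text around the cursor could be turned into features. The function name `context_features`, the `window` parameter, and the `before:`/`after:` prefixes are all made up for illustration; a real tokenizer would probably need to be smarter than a simple word regex.

```python
import re

def context_features(text, cursor, window=5):
    """Return up to `window` word tokens on each side of the cursor.

    Tokens are lowercased and prefixed with the side they came from,
    so the same word before and after the cursor counts as two
    different features. `window` is assumed to be a positive integer.
    """
    before = re.findall(r"\w+", text[:cursor].lower())[-window:]
    after = re.findall(r"\w+", text[cursor:].lower())[:window]
    return [f"before:{t}" for t in before] + [f"after:{t}" for t in after]

# Hypothetical usage: cursor sits right after "Dear Mr ".
features = context_features("Dear Mr Smith, thank you for", cursor=8, window=2)
```

Keeping the two sides apart is only one possible design choice; a plain bag of words over the whole window would be simpler but throws away position information.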
My first idea was a naive Bayes approach, using the tokenized context as the feature set and the items from the resource as the available classes. I am not sure whether this is a valid approach. Here are the problems I am facing:
- The number of available classes (items) can be very large.
- The context will be different every time. It is very unlikely that the exact same feature set will occur more than once.
- The algorithm will have to be suitable for incremental learning: The system will start off without any data about the user's preferences and learn them over time, while the user is working with the system.
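For illustration, the naive Bayes idea could be sketched roughly like this. It is a minimal multinomial naive Bayes with add-one (Laplace) smoothing that is updated one selection at a time, so it starts with no data and never needs retraining from scratch. The class name `IncrementalNaiveBayes`, its method names, and the `alpha` parameter are hypothetical, and the sketch makes no claim about whether the approach scales well to 5,000 classes in practice.

```python
import math
from collections import Counter, defaultdict

class IncrementalNaiveBayes:
    def __init__(self, items, alpha=1.0):
        self.items = list(items)            # the available classes
        self.alpha = alpha                  # smoothing constant (assumed value)
        self.n_obs = 0                      # total number of selections seen
        self.class_counts = Counter()       # selections per item
        self.token_counts = defaultdict(Counter)  # item -> token -> count
        self.token_totals = Counter()       # item -> total tokens seen
        self.vocab = set()                  # all tokens seen so far

    def learn(self, tokens, chosen):
        """Update the counts after the user picked `chosen` in this context."""
        self.n_obs += 1
        self.class_counts[chosen] += 1
        for t in tokens:
            self.token_counts[chosen][t] += 1
            self.token_totals[chosen] += 1
            self.vocab.add(t)

    def score(self, tokens, item):
        """Smoothed log P(item) + sum of log P(token | item)."""
        prior = (self.class_counts[item] + self.alpha) / (
            self.n_obs + self.alpha * len(self.items))
        logp = math.log(prior)
        denom = self.token_totals[item] + self.alpha * len(self.vocab)
        counts = self.token_counts.get(item, {})
        for t in tokens:
            if t in self.vocab:  # tokens never seen before carry no evidence
                logp += math.log((counts.get(t, 0) + self.alpha) / denom)
        return logp

    def rank(self, tokens):
        """Return all items, most probable first."""
        return sorted(self.items,
                      key=lambda item: self.score(tokens, item),
                      reverse=True)

# Hypothetical usage with three items and made-up context tokens.
nb = IncrementalNaiveBayes(["Best regards", "Dear Sir", "Thank you"])
nb.learn(["before:hello", "after:name"], "Dear Sir")
nb.learn(["before:sincerely"], "Best regards")
ranking = nb.rank(["before:hello"])
```

Because each token contributes independently, a context never has to repeat exactly for the model to generalize; only individual tokens need to recur. Ranking costs roughly (number of items) × (number of context tokens) dictionary lookups per request.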
Do you have any suggestions as to which methods would be applicable to the task I've just described? Obviously performance is an issue: the ordered list of items should appear quickly enough that the user does not notice a delay.
Thanks in advance