I know some of the basics of the multi-armed bandit problem, but I am not sure that my problem fits that paradigm.
I am simulating a crowdsourcing platform in which I have k workers and n samples.
Both the samples and the workers can be clustered, and each type of worker will do well only on one type of sample (though technically a worker or sample could belong to multiple types). The workers and samples are otherwise stationary.
Starting from nothing, I would normally treat this as a multi-armed bandit problem: I have workers label samples to learn about them (exploration), and once I have learned enough, I use that information to route the right samples to the right workers (exploitation).
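Roughly, I was imagining something like the following epsilon-greedy sketch (the per-(worker, sample-type) bookkeeping and the optimistic default for untried pairs are just illustrative, not part of my platform):

    import random
    from collections import defaultdict

    # Empirical [successes, attempts] per (worker, sample_type) pair.
    stats = defaultdict(lambda: [0, 0])

    def pick_worker(worker_ids, sample_type, epsilon=0.1):
        """Epsilon-greedy: explore a random worker with probability epsilon,
        otherwise exploit the worker with the best empirical accuracy on
        this sample type."""
        if random.random() < epsilon:
            return random.choice(worker_ids)  # exploration
        def accuracy(w):
            s, n = stats[(w, sample_type)]
            return s / n if n else 1.0  # optimistic: try untried pairs first
        return max(worker_ids, key=accuracy)  # exploitation

    def record(worker, sample_type, correct):
        stats[(worker, sample_type)][0] += int(correct)
        stats[(worker, sample_type)][1] += 1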
However, this seems different from a standard multi-armed bandit, where each arm has a single reward probability across all samples; here a worker's success probability depends on which subset (type) of samples it is given.
Furthermore, the clustering aspect tells us about similar workers and samples. In other words, seeing how worker A did on sample B informs us about how worker A* (who is very similar to A) will do on sample B* (which is very similar to B).
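To illustrate the kind of transfer I mean, one could pool statistics at the cluster level instead of per worker-sample pair (this assumes cluster assignments are known, which in my case they would have to be learned):

    from collections import defaultdict

    # One estimate per (worker cluster, sample cluster) pair, so an
    # observation of worker A on sample B also informs A* on B*.
    cluster_stats = defaultdict(lambda: [0, 0])

    def record(worker_cluster, sample_cluster, correct):
        cluster_stats[(worker_cluster, sample_cluster)][0] += int(correct)
        cluster_stats[(worker_cluster, sample_cluster)][1] += 1

    def estimated_accuracy(worker_cluster, sample_cluster, prior=0.5, strength=2.0):
        # Beta-style smoothing so unseen cluster pairs fall back to the prior.
        s, n = cluster_stats[(worker_cluster, sample_cluster)]
        return (s + prior * strength) / (n + strength)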
Does this fall under the multi-armed bandit framework (and if so, is there specific work addressing this kind of problem)?
If not, I plan to develop a new active learning solution; I just wanted to rule out any existing approaches first.