I am working on a system for assessing how valuable a new worker on a crowdsourcing platform would be on a new sample. To state it more formally:
Given a set of workers W who have each already labeled a set of samples S (for which we know the actual ground truth), we want to know the probability that a new worker w will label a new sample s correctly. Assume that we know the similarity/distance between s and each sample in S, and between w and each worker in W.
Would this simply be something like:
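$$p(w \text{ correct on } s) = \frac{1}{k} \sum_{i=1}^{k} e^{-D(w, W_i)} \left( \frac{1}{n} \sum_{j=1}^{n} e^{-D(s, S_j)} \bigl(1 - \mathrm{error}(W_i, S_j)\bigr) \right)$$
where k is the number of workers in W, n is the number of samples in S, W_i and S_j are the i-th worker and the j-th sample, error(W_i, S_j) is the error of worker W_i on sample S_j, and D(x1, x2) is the distance between x1 and x2.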
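A minimal Python sketch of the formula as written, assuming the distances D(w, W_i) and D(s, S_j) and the per-sample errors have already been computed and are passed in as plain lists. The names `p_correct`, `d_worker`, `d_sample`, `error` and the `normalize` flag are hypothetical, chosen for illustration only:

```python
import math
from typing import Sequence


def p_correct(
    d_worker: Sequence[float],         # d_worker[i] = D(w, W_i), assumed >= 0
    d_sample: Sequence[float],         # d_sample[j] = D(s, S_j), assumed >= 0
    error: Sequence[Sequence[float]],  # error[i][j] = error of W_i on S_j, in [0, 1]
    normalize: bool = False,           # hypothetical option: weighted average instead of 1/k, 1/n
) -> float:
    """Score a new worker w on a new sample s with exp(-distance) weights."""
    k, n = len(d_worker), len(d_sample)
    sample_w = [math.exp(-d) for d in d_sample]  # e^{-D(s, S_j)}
    worker_w = [math.exp(-d) for d in d_worker]  # e^{-D(w, W_i)}

    total = 0.0
    for i in range(k):
        # Inner sum: sum_j e^{-D(s, S_j)} * (1 - error(W_i, S_j))
        inner = sum(sample_w[j] * (1.0 - error[i][j]) for j in range(n))
        # Divide by n as in the formula, or by the total sample weight if normalizing
        inner /= sum(sample_w) if normalize else n
        # Outer term: weight known worker W_i by e^{-D(w, W_i)}
        total += worker_w[i] * inner

    return total / (sum(worker_w) if normalize else k)


# Two known workers, two known samples, all at distance 0.
# Worker 0 got sample 1 wrong; worker 1 got both wrong.
print(p_correct([0.0, 0.0], [0.0, 0.0], [[0.0, 1.0], [1.0, 1.0]]))  # 0.25
```

Since e^{-D} is at most 1 for non-negative distances, the unnormalized score stays in [0, 1], but it shrinks toward 0 whenever w or s is far from everything in W and S, even if every known worker is accurate. Dividing by the sums of the weights instead of k and n (`normalize=True`) turns it into a kernel-weighted average of observed accuracies, in the style of Nadaraya-Watson regression with a product kernel.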
Or am I missing something?
Furthermore, suppose I have the option of choosing one of several new workers based on their probability of being correct on this sample. Do I just add an argmax?
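For the selection step, one option is to score every candidate with the same formula and take w* = argmax over the candidates. A vectorized sketch, assuming NumPy arrays of precomputed distances; `choose_worker` and its argument names are hypothetical:

```python
from typing import Tuple

import numpy as np


def choose_worker(
    cand_dist: np.ndarray,    # shape (m, k): cand_dist[c, i] = D(candidate c, W_i)
    sample_dist: np.ndarray,  # shape (n,):   sample_dist[j] = D(s, S_j)
    error: np.ndarray,        # shape (k, n): error[i, j] = error of W_i on S_j
) -> Tuple[int, np.ndarray]:
    """Score all candidate workers on sample s and return (best index, scores)."""
    # Per known worker: (1/n) * sum_j e^{-D(s, S_j)} * (1 - error[i, j]) -> shape (k,)
    per_worker = (np.exp(-sample_dist) * (1.0 - error)).mean(axis=1)
    # Per candidate: (1/k) * sum_i e^{-D(c, W_i)} * per_worker[i] -> shape (m,)
    scores = (np.exp(-cand_dist) * per_worker).mean(axis=1)
    # argmax over candidates; np.argmax returns the first index on ties
    return int(np.argmax(scores)), scores


error = np.array([[0.0, 1.0], [0.0, 0.0]])     # W_0 is 50% correct, W_1 is 100%
sample_dist = np.array([0.0, 0.0])
cand_dist = np.array([[0.0, 10.0], [10.0, 0.0]])  # candidate 0 ~ W_0, candidate 1 ~ W_1
best, scores = choose_worker(cand_dist, sample_dist, error)
print(best)  # 1
```

Because the argmax only compares candidates on the same sample s, the inner sum over S is shared by all of them and is computed once.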