Hello all:
I need one point clarified regarding Gibbs sampling on a Dirichlet process.
I am pretty much up-to-speed on the generative portion of the algorithm:
For each datapoint to be generated:
* Draw a table assignment, c_i, from a Chinese Restaurant Process, Pólya urn, or STICK().
* Look up the parameter vector, phi(c_i), associated with c_i.
* If there is no existing phi(c_i) yet, draw one from the base measure Phi and associate it with that table, so later customers seated at c_i reuse it.
* Generate the datapoint from F(phi(c_i)).
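In case it helps, here's a minimal sketch of how I picture that generative loop, assuming a 1-D Gaussian likelihood F and a Gaussian base measure Phi (the name crp_generate and the alpha/base_sigma/obs_sigma parameters are just placeholders of mine, not anything standard):

```python
import numpy as np

def crp_generate(n, alpha=1.0, base_sigma=5.0, obs_sigma=1.0, rng=None):
    """Generate n datapoints from a CRP mixture (toy 1-D Gaussian setup)."""
    rng = np.random.default_rng() if rng is None else rng
    assignments = []   # c_i: table index for each datapoint
    table_counts = []  # number of customers at each table
    phi = []           # phi(k): parameter (mean) for each table
    data = []
    for i in range(n):
        # CRP prior: existing table k w.p. n_k / (i + alpha), new table w.p. alpha / (i + alpha)
        probs = np.array(table_counts + [alpha], dtype=float)
        probs /= probs.sum()
        c_i = rng.choice(len(probs), p=probs)
        if c_i == len(table_counts):
            # new table: draw its parameter from the base measure Phi
            phi.append(rng.normal(0.0, base_sigma))
            table_counts.append(0)
        table_counts[c_i] += 1
        assignments.append(c_i)
        # generate the datapoint from F(phi(c_i))
        data.append(rng.normal(phi[c_i], obs_sigma))
    return np.array(data), np.array(assignments), np.array(phi)
```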
For Gibbs sampling, we pull one datapoint out of its table and resample its table assignment from the CRP conditioned on everyone else. If it was the only datapoint at its old table, we can forget the index-to-parameter mapping previously associated with that table. We then sample from the conditional posterior over tables, p(c_i | c_-i, x_i, phi), which is roughly the CRP prior over tables times the likelihood F(x_i | phi(k)) of the datapoint under each candidate table's parameters.
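And here is roughly how I understand that resampling step, written as a single sweep in the style of Neal's Algorithm 8 with one auxiliary table (again, gibbs_sweep and the Gaussian assumptions carry over my own toy setup from above; this is my sketch, not a reference implementation):

```python
import numpy as np
from scipy.stats import norm

def gibbs_sweep(data, assignments, phi, alpha=1.0, base_sigma=5.0, obs_sigma=1.0, rng=None):
    """One Gibbs sweep over table assignments for the toy 1-D Gaussian CRP mixture."""
    rng = np.random.default_rng() if rng is None else rng
    phi = list(phi)
    for i in range(len(data)):
        # remove datapoint i from its table
        old = assignments[i]
        assignments[i] = -1
        counts = np.bincount(assignments[assignments >= 0], minlength=len(phi))
        if counts[old] == 0:
            # its old table is now empty: forget that table's parameter and relabel
            phi.pop(old)
            assignments[assignments > old] -= 1
            counts = np.delete(counts, old)
        # propose a brand-new table by drawing its parameter from the base measure
        phi_new = rng.normal(0.0, base_sigma)
        # p(c_i = k | ...) is proportional to n_{-i,k} * F(x_i | phi_k);
        # a new table is proportional to alpha * F(x_i | phi_new)
        liks = norm.pdf(data[i], loc=np.array(phi + [phi_new]), scale=obs_sigma)
        weights = np.append(counts, alpha) * liks
        c_i = rng.choice(len(weights), p=weights / weights.sum())
        if c_i == len(phi):
            phi.append(phi_new)
        assignments[i] = c_i
    return assignments, np.array(phi)
```

The only way a new table enters is through the alpha term, and any table that empties out has its parameter discarded, which is the part I described in words above.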
What special sauce is my intuition missing here? It is not obvious to me how this will optimize the number of tables/clusters.