Channel: Machine Learning

Power Law Effect with Pitman-Yor Process


I know about Pitman-Yor processes (PYP) and that they produce power-law behavior, and I am looking at graphs such as:

http://deliveryimages.acm.org/10.1145/1900000/1897842/figs/f1.jpg

I am wondering how you would draw the PYP curve given a corpus of English words. The black curve corresponding to English text is trivial, but the DP and PYP curves are not. Please help!

I know that the discount parameter is the key element that distinguishes the DP from the PYP, but I don't know how to relate the probability that the PYP assigns to each cluster to the word frequencies. A simple guess is that both show decaying behavior: in English text a few words occur very often, while the majority of words occur only a few times. The same thing happens in the PYP, where there are a few popular clusters (with lots of customers) and the majority of clusters have only a few customers. This decay is faster in the DP than in the PYP. But my main question is: how can one associate words with clusters and draw a graph like the one linked above?
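One possible way to make the question concrete is sketched below. It assumes the linked figure plots relative frequency against frequency rank on log-log axes, which I have not verified. Under that assumption, each cluster (table) in the two-parameter Chinese restaurant process plays the role of a word type and each customer plays the role of a word token, so no explicit word-to-cluster assignment is needed: the sorted table sizes are compared with the sorted word counts. The function names, the parameter values and the restriction to a positive concentration are choices made for this illustration only.

```python
import random
from collections import Counter


def crp_table_sizes(n_customers, discount, concentration, seed=0):
    """Seat customers by the two-parameter Chinese restaurant process.

    discount = 0 corresponds to a DP; 0 < discount < 1 to a PYP.
    This sketch assumes concentration > 0.
    Returns one size per table (cluster).
    """
    rng = random.Random(seed)
    sizes = []
    for n in range(n_customers):
        k = len(sizes)
        # Probability that customer n + 1 opens a new table.
        p_new = (concentration + discount * k) / (n + concentration)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            # Existing table j is chosen with weight (size_j - discount).
            weights = [s - discount for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
    return sizes


def rank_frequency(counts):
    """Relative frequencies sorted from most to least frequent."""
    total = sum(counts)
    return [c / total for c in sorted(counts, reverse=True)]


def corpus_rank_frequency(tokens):
    """Rank-frequency curve of a tokenized corpus (list of words)."""
    return rank_frequency(list(Counter(tokens).values()))


if __name__ == "__main__":
    n = 5000
    dp_curve = rank_frequency(crp_table_sizes(n, 0.0, 1.0))
    pyp_curve = rank_frequency(crp_table_sizes(n, 0.7, 1.0))
    print("DP clusters:", len(dp_curve))
    print("PYP clusters:", len(pyp_curve))
```

To draw the comparison, each curve would be plotted against rank 1, 2, 3, ... on log-log axes alongside `corpus_rank_frequency(tokens)` for the corpus, with the number of customers set to the number of tokens. A larger discount produces many more small clusters, which is what gives the heavier tail.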

submitted by koormoosh