Channel: Machine Learning

Question about LDA: How to initiate?


So I've read Edwin Chen's blog post on the topic, as well as the two other threads in here that touched on Latent Dirichlet Allocation, but I still can't wrap my head around how exactly to assign the words to each topic, or how to improve the accuracy of which words belong in each topic.

From what I understand, choosing the number of topics is up to us (but I would also love to know the best approach to choosing the number of topics in relation to the number of documents. Do we assign one topic per document?). But let's say I have two documents where stopwords have been removed, and I have two topics.

I know we just randomly distribute words initially, so would I do something as simple as assigning every other word in both documents to Topic 1 and Topic 2, and then start the improvement process from there?
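To make the question concrete, here is a minimal sketch of what I understand the random starting point to be, assuming each word token gets a topic drawn uniformly at random. The function name `random_init` and the two toy documents are hypothetical, made up for illustration.

```python
import random

def random_init(docs, num_topics, seed=0):
    """Give every word token in every document a topic drawn uniformly at random."""
    rng = random.Random(seed)
    return [[rng.randrange(num_topics) for _ in doc] for doc in docs]

# Two hypothetical documents with stopwords already removed.
docs = [
    ["eat", "broccoli", "bananas"],
    ["kittens", "cute", "hamster", "broccoli"],
]

# assignments[d][i] is the topic index (0 or 1) of word i in document d.
assignments = random_init(docs, num_topics=2)
print(assignments)
```

As far as I can tell, an alternating "every other word" pattern would also just be a starting point that later iterations overwrite, but random draws seem to be the usual choice.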

Thanks!

edit:

Or better yet, can someone explain how each word was assigned to each topic in Edwin Chen's example?

  • I like to eat broccoli and bananas.
  • I ate a banana and spinach smoothie for breakfast.
  • Chinchillas and kittens are cute.
  • My sister adopted a kitten yesterday.
  • Look at this cute hamster munching on a piece of broccoli.

Topic A: food
Topic B: animals
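For reference, here is a rough sketch of the improvement process as I currently understand it, assuming collapsed Gibbs sampling: start from random assignments, then repeatedly resample each word's topic in proportion to (how much its document uses that topic) times (how much that topic uses that word). The tokenization of the five sentences is done by hand (stopwords dropped, plurals and "ate" normalized), and `gibbs_lda`, `alpha`, `beta` and `iters` are my own names and values, not something taken from the blog post.

```python
import random
from collections import defaultdict

def gibbs_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns assignments and count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # words per topic, per document
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # count of each word, per topic
    topic_total = [0] * num_topics                              # total words per topic

    # Step 1: random initialization of every word token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    # Step 2: repeatedly resample each token's topic given all the others.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of topic t: (topic t in this doc) * (word w in topic t).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word

# Hand-tokenized version of the five example sentences (an assumption).
docs = [
    ["like", "eat", "broccoli", "banana"],
    ["eat", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopted", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munching", "piece", "broccoli"],
]

z, doc_topic, topic_word = gibbs_lda(docs, num_topics=2)
for d, counts in enumerate(doc_topic):
    print(f"sentence {d + 1}: topic counts {counts}")
```

The topic indices 0 and 1 carry no labels of their own; names like "food" and "animals" would come from reading the words that end up in each topic. With only five short sentences, I would expect the result to vary with the seed.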

submitted by mangaprincess
