
[homework] Trouble with NLTK's NaiveBayesClassifier in Python: I keep getting the same probabilities

Hey /r/machinelearning--

I don't see too many [homework] posts here, so I hope I'm not in the wrong sub. If so, please point me to a better option.

So I'm working on a project in which I take in anime names and genres, along with whether each one is relevant or irrelevant. I am trying to build a NaiveBayesClassifier with that, and then I want to pass in genres and have it tell me whether they are relevant or irrelevant. I currently have the following:

    import nltk

    # Each training example is a (genre feature dict, label) pair.
    trainingdata = [
        ({'drama': True, 'mystery': True, 'horror': True, 'psychological': True}, 'relevant'),
        ({'drama': True, 'fantasy': True, 'romance': True, 'adventure': True, 'science fiction': True}, 'unrelevant'),
    ]
    classifier = nltk.classify.naivebayes.NaiveBayesClassifier.train(trainingdata)

    # Genres of the anime to classify.
    anime = {'Fantasy': True, 'Comedy': True, 'Supernatural': True}
    classifier.classify(anime)

    # Per-label probabilities for the same feature dict.
    prob_dist = classifier.prob_classify(anime)
    print("relevant " + str(prob_dist.prob("relevant")))
    print("unrelevant " + str(prob_dist.prob("unrelevant")))

I currently have:

    size of training array: 110
    the relevant length: 57
    the unrelevant length: 53

Some results I receive:

    relevant
    Tantei Opera Milky Holmes TD
    {'Mystery': True, 'Comedy': True, 'Super': True, 'Power': True}
    relevant 0.518018018018
    unrelevant 0.481981981982

    relevant
    Juuou Mujin no Fafnir
    {'Romance': True, 'Fantasy': True, 'School': True}
    relevant 0.518018018018
    unrelevant 0.481981981982
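For what it's worth, the repeated value can be reproduced from the class counts alone. The sketch below assumes the label prior is estimated with expected-likelihood (add-0.5) smoothing over the two labels, which I believe is NLTK's default estimator but have not confirmed for this setup. `smoothed_prior`, `GAMMA` and `N_LABELS` are names made up for this illustration and are not part of NLTK.

```python
# Class counts reported in the post.
n_relevant = 57
n_unrelevant = 53

# Assumption: add-0.5 (expected likelihood) smoothing over two labels.
GAMMA = 0.5
N_LABELS = 2


def smoothed_prior(count, total, gamma=GAMMA, bins=N_LABELS):
    """Lidstone-smoothed estimate: (count + gamma) / (total + bins * gamma)."""
    return (count + gamma) / (total + bins * gamma)


total = n_relevant + n_unrelevant
prior_relevant = smoothed_prior(n_relevant, total)
prior_unrelevant = smoothed_prior(n_unrelevant, total)

print("relevant   %.12f" % prior_relevant)    # relevant   0.518018018018
print("unrelevant %.12f" % prior_unrelevant)  # unrelevant 0.481981981982
```

If that assumption holds, the classifier is returning the class prior unchanged, meaning none of the genres passed in are contributing any evidence.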

I was wondering if that makes sense, since I am getting the same relevant probability for every classification it makes. From my understanding of Naive Bayes, it shouldn't be doing that...
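One thing that might be worth checking: the training keys are lowercase ('drama', 'fantasy'), while the genres passed in at classification time are capitalized ('Fantasy', 'Mystery'). Python dictionary keys are case-sensitive, so none of the query features would match anything seen in training. As far as I understand, NLTK's Naive Bayes ignores feature names it never saw during training. Below is a minimal sketch of a possible fix, assuming lowercasing is the right normalization for this data. `normalize_features` and `training_keys` are hypothetical names for this illustration, not NLTK functions.

```python
def normalize_features(featureset):
    """Return a copy of the feature dict with lowercased, stripped names."""
    return {name.strip().lower(): value for name, value in featureset.items()}


# Feature names that appear in the two training examples from the post.
training_keys = {
    'drama', 'mystery', 'horror', 'psychological',
    'fantasy', 'romance', 'adventure', 'science fiction',
}

# The query from the post, with capitalized genre names.
query = {'Fantasy': True, 'Comedy': True, 'Supernatural': True}
normalized = normalize_features(query)

# Which query features overlap with the training vocabulary?
seen_raw = sorted(k for k in query if k in training_keys)
seen_normalized = sorted(k for k in normalized if k in training_keys)

print(seen_raw)         # []
print(seen_normalized)  # ['fantasy']
```

Running the same normalization over both the training dicts and the query dicts before calling `train` and `prob_classify` should keep the two vocabularies aligned. Multi-word genres may need the same care, since the training data has 'science fiction' as one key while the results show 'Super' and 'Power' as separate keys.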

Using Python and NLTK. Thanks!

submitted by sayan5678