Naive Bayes and the Test/Training Set

Hi, I'm using naive Bayes for some basic text classification of Twitter feeds, and I have a question.

Suppose I have a training set and a test set, each of 1000 instances drawn at random from a pool of 70000. Would it be a problem if some instance X in the training set also appeared in the test set?

I'm curious whether this would significantly skew the results or cause other issues, such that I should go back and ensure there is zero overlap between the two sets.
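To make the sampling question concrete, here is a minimal sketch, assuming the 70000 instances can be addressed by integer index. The helper name `disjoint_split` and the seed are hypothetical. It estimates how many instances two independently drawn sets of 1000 would share on average, and shows one way to draw both sets so that no index lands in both.

```python
import random

POOL_SIZE = 70000   # total instances, as stated in the post
SET_SIZE = 1000     # size of each of the two sets

# If the two sets are sampled independently, each test instance has a
# SET_SIZE / POOL_SIZE chance of also being in the training set.
expected_overlap = SET_SIZE * SET_SIZE / POOL_SIZE   # roughly 14 instances

def disjoint_split(n_total, n_train, n_test, seed=0):
    """Draw train and test indices in one pass so they cannot overlap.

    Hypothetical helper: random.sample picks without replacement, so
    slicing a single sample in two gives index-disjoint sets.
    """
    rng = random.Random(seed)
    picked = rng.sample(range(n_total), n_train + n_test)
    return picked[:n_train], picked[n_train:]

train_idx, test_idx = disjoint_split(POOL_SIZE, SET_SIZE, SET_SIZE)
overlap = set(train_idx) & set(test_idx)   # empty by construction
```

Note that this only guarantees disjoint indices. Two different indices can still hold the same text (retweets, for example), so deduplicating on content would be a separate step.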

If there is a problem, could someone explain to me what it might be? My reasoning for why it might not be a problem is that naive Bayes is very simplistic in its classification approach. Even if it has seen this exact instance before, it still calculates its probabilities and classifies based on those, not on the fact that it has seen the instance, which it doesn't even remember anyway.

So if someone could help me clear this up I'd really appreciate it. Thanks for any insight.

submitted by hntd