Naive Bayes and the Test/Training Set

Hi, I'm using a naive Bayes classifier for some basic text classification of Twitter feeds, and I have a question.

Suppose I have a training set and a test set, each made up of 1,000 instances drawn at random from a pool of 70,000. Would there be a problem if some instance X were in the training set and the same instance X also appeared in the test set?

I'm curious whether this would significantly skew the results, or cause other issues, to the point that I should go back and make sure there is zero overlap between the two sets.
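For reference, here is a minimal sketch of one way to guarantee zero overlap, assuming the 70,000 instances can be addressed by index 0 to 69,999. The function name `sample_disjoint_sets` and its parameters are made up for illustration. It draws 2,000 indices in a single sample without replacement and cuts them in half, so no index can land in both sets.

```python
import random

def sample_disjoint_sets(pool_size=70_000, train_size=1_000, test_size=1_000, seed=0):
    """Draw train and test indices from one sample without replacement."""
    rng = random.Random(seed)
    # random.sample never repeats an index, so the two slices cannot overlap.
    picked = rng.sample(range(pool_size), train_size + test_size)
    return picked[:train_size], picked[train_size:]

train_idx, test_idx = sample_disjoint_sets()
overlap = set(train_idx) & set(test_idx)
print(len(train_idx), len(test_idx), len(overlap))  # prints: 1000 1000 0
```

This only rules out the same index appearing twice. If the pool itself holds duplicate tweets (for example retweets with identical text), those would need to be deduplicated before sampling.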

If there is a problem, could someone explain what it is? My reason for thinking it might not be one is that naive Bayes is very simple in its approach to classification. Even if it has seen this exact instance before, it still calculates its probabilities and classifies based on those, not on the fact that it has seen the instance, which it doesn't even remember anyway.
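To make that reasoning concrete, here is a rough sketch of the kind of classifier I have in mind: a multinomial naive Bayes with add-one smoothing. The class `TinyNaiveBayes` and the toy tokens are hypothetical, not my actual code. The point is that training only updates per-class counts, and classification only reads those counts back.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word totals per class
        self.vocab = set()

    def train(self, tokens, label):
        # Only aggregate counts are updated; the instance itself is not stored.
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Smoothed log likelihood of this token given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)

nb = TinyNaiveBayes()
nb.train(["great", "game"], "pos")
nb.train(["awful", "game"], "neg")
print(nb.classify(["great", "game"]))  # prints: pos
```

Note that every training instance still contributes to the counts, so an instance that is in the training set has shaped the probabilities used to classify it later, even though it is never stored as such.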

So if someone could help me clear this up I'd really appreciate it. Thanks for any insight.

submitted by hntd