Having trouble with WEKA - "train and test set are not compatible" - how do I handle training and test data that have different attributes?

Hi ML redditors!

I have a large, unlabelled dataset of tweets with a certain hashtag, and I want to use supervised learning to label each tweet by sentiment (buy, hold, or sell). I plan to use Naive Bayes for the classification, since the dataset is rather large and I don't want to categorize 50,000 tweets by hand. I've manually labelled a training set of ~500 tweets and converted it to a bag-of-words model.
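For context, here is a minimal sketch of what a bag-of-words multinomial Naive Bayes classifier does, written in plain Python rather than WEKA. The whitespace tokenizer, the function names `train_nb`/`predict_nb`, and the toy tweets are all hypothetical; WEKA's own `NaiveBayes` and `NaiveBayesMultinomial` classifiers differ in implementation details.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    vocab = {w for d in docs for w in tokenize(d)}
    label_counts = Counter(labels)
    word_counts = {c: Counter() for c in label_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    # log P(class), estimated from label frequencies in the training set
    log_prior = {c: math.log(n / len(labels)) for c, n in label_counts.items()}
    # log P(word | class), smoothed so an unseen (word, class) pair is never zero
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return vocab, log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest log-posterior score for one document."""
    vocab, log_prior, log_lik = model
    tokens = [w for w in tokenize(doc) if w in vocab]  # ignore words never seen in training
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens) for c in log_prior}
    return max(scores, key=scores.get)

train_docs = ["great earnings buy now", "strong growth buy",
              "flat quarter hold", "wait and hold",
              "bad news sell", "crash incoming sell now"]
train_labels = ["buy", "buy", "hold", "hold", "sell", "sell"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, "buy this stock"))  # buy
print(predict_nb(model, "sell it now"))     # sell
```

Working in log space avoids numerical underflow when a tweet contains many words, and the smoothing keeps one unseen word from zeroing out an entire class.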

I've built the training model in WEKA, but when I try to use it to classify the rest of the data, I get the error "train and test set are not compatible". I believe the problem is that the two sets don't have exactly the same attributes: my training model has a different bag of words than my full test set. I'm not sure how to resolve that, other than combing through the full dataset and removing every word that isn't in the training set.
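The "remove every word not in the training set" step does not have to be done by hand. WEKA expects the test set to have the same attributes, in the same order, as the training set, and its `FilteredClassifier` is designed for this situation: it fits a filter such as `StringToWordVector` on the training data only and then pushes test instances through that same filter without changing its structure. The sketch below shows the underlying idea in plain Python rather than WEKA's API; `tokenize`, `build_vocabulary`, `vectorize`, and the example tweets are hypothetical names and data.

```python
def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocabulary(train_texts):
    """Derive the attribute list (one attribute per word) from the training tweets only."""
    return sorted({w for t in train_texts for w in tokenize(t)})

def vectorize(texts, vocabulary):
    """Map each text onto a count vector over a FIXED vocabulary.

    Words the training set never saw are dropped, and training words missing
    from a text get a count of 0, so every row has the same length and
    attribute order as the training rows.
    """
    index = {w: i for i, w in enumerate(vocabulary)}
    rows = []
    for text in texts:
        row = [0] * len(vocabulary)
        for w in tokenize(text):
            i = index.get(w)
            if i is not None:
                row[i] += 1
        rows.append(row)
    return rows

train_tweets = ["buy the dip", "hold steady", "sell everything"]
vocab = build_vocabulary(train_tweets)
print(vocab)  # ['buy', 'dip', 'everything', 'hold', 'sell', 'steady', 'the']
print(vectorize(["buy buy buy", "moon soon"], vocab))
# [[3, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]
```

Because the test rows are built from the training vocabulary rather than from their own words, the attribute sets match by construction; the trade-off is that words appearing only in the unlabelled tweets contribute nothing to the prediction.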

Forgive me if I'm conceptualizing the problem incorrectly or if there's something obvious I'm overlooking, as I don't have a background in ML. Sorry in advance for my newbishness!

Thanks!

submitted by ChocolateCorgi