Channel: Machine Learning

Having trouble with WEKA - "train and test set are not compatible" - how to resolve the fact that attributes in training and testing data are different?

Hi ML redditors!

I have a large, unlabelled dataset of tweets with a certain hashtag, and I want to use supervised learning to label the data by sentiment (buy, hold, sell). The dataset is rather large, and I don't want to have to manually categorize 50000 tweets, so I want to use Naive Bayes to categorize the sentiment of the tweets. I've manually categorized a training set of ~500 tweets and converted it to a bag-of-words model.

I've built the training model in WEKA, but I get the error "train and test set are not compatible" when I try to use it to classify the rest of the data. I believe the problem is that the two sets don't have exactly the same attributes: my training set has a different bag of words than my complete test set. I'm not sure how to resolve that other than combing through my full dataset and removing every word that isn't in the training set.
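To make the attribute mismatch concrete, here is a minimal sketch in plain Python (not WEKA) of the idea of fixing the attribute list from the training tweets and mapping every other tweet onto that same list. The function names and the toy tweets are made up for illustration, and the whitespace tokenizer is a simplifying assumption. In WEKA itself, I believe the usual way to get this effect is to wrap the StringToWordVector filter in a FilteredClassifier so the same dictionary is applied to both sets, but check that against the WEKA documentation.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def build_vocabulary(train_texts):
    # The attribute list is derived from the training tweets only.
    return sorted({tok for text in train_texts for tok in tokenize(text)})


def vectorize(text, vocab):
    # Count only words that appear in the training vocabulary;
    # words never seen in training are dropped, so every vector
    # has exactly the same attributes in the same order.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


# Hypothetical toy data standing in for the labelled and unlabelled tweets.
train_texts = ["buy now", "sell now", "hold on"]
unlabelled_texts = ["buy buy moon", "never sell"]

vocab = build_vocabulary(train_texts)
train_vectors = [vectorize(t, vocab) for t in train_texts]
unlabelled_vectors = [vectorize(t, vocab) for t in unlabelled_texts]
```

Because both sets are projected onto the one vocabulary, nothing has to be removed from the full dataset by hand: "moon" and "never" are simply ignored when the unlabelled tweets are vectorized.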

Forgive me if I'm conceptualizing the problem incorrectly or if there's something obvious I'm overlooking, as I don't have a background in ML. Sorry in advance for my newbishness!

Thanks!

submitted by ChocolateCorgi
