I am looking for some insight here.
I am working on a classification project. I am fairly new to WEKA, but I am much more comfortable with it than with R or other alternatives. I am doing binary classification on imbalanced data, and I have been told to use 5x2 cross-validation.
My data is in an .arff file: a large set of text documents, each with a class label.
In the Explorer I can load this text, convert it with StringToWordVector (TF-IDF and bag of words), then run my classifier on it. From the results I can produce a nice ROC curve and easily get the AUC. But there I can only do 1x10 cross-validation.
In the Experimenter I can load the data, but I cannot preprocess it. I could preprocess it in the Explorer the same way and then save that output. I tried this, but I get errors saying that my class is not nominal. I am not otherwise touching the data, so I don't know why it suddenly stops working. If I can get the data to load and run properly, I could run 5x2 CV on it here with the same classifier, but then I lose the ability to produce ROC curves, though I can still get the AUC.
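To be explicit about what I mean by 5x2 CV: as I understand it, the data is split into two halves five separate times, and each time the classifier is trained on one half and tested on the other, then the halves are swapped. Below is a minimal plain-Java sketch of that splitting. It is not WEKA code: the class name `FiveByTwoCv`, the `Split` type, and the `splits` method are made up for illustration, and I am assuming labels coded as 0/1 and that the halves should be stratified because my classes are imbalanced.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of WEKA: builds the index splits for 5x2 CV.
final class FiveByTwoCv {

    /** One repetition: two disjoint halves of row indices. */
    static final class Split {
        final List<Integer> foldA = new ArrayList<>();
        final List<Integer> foldB = new ArrayList<>();
    }

    /** Assumes binary labels coded as 0 and 1 (my assumption). */
    static List<Split> splits(int[] labels, int repetitions, long seed) {
        Random rng = new Random(seed);
        List<Integer> pos = new ArrayList<>();
        List<Integer> neg = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            (labels[i] == 1 ? pos : neg).add(i);
        }
        List<Split> out = new ArrayList<>();
        for (int r = 0; r < repetitions; r++) {
            Split s = new Split();
            // Deal each class separately so both halves keep the class ratio.
            deal(pos, s, rng);
            deal(neg, s, rng);
            out.add(s);
        }
        return out;
    }

    // Shuffle one class's rows and alternate them between the two halves.
    private static void deal(List<Integer> rows, Split s, Random rng) {
        List<Integer> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, rng);
        for (int i = 0; i < shuffled.size(); i++) {
            (i % 2 == 0 ? s.foldA : s.foldB).add(shuffled.get(i));
        }
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 0, 0, 0, 0, 0, 0, 0}; // toy imbalanced data
        for (Split s : splits(labels, 5, 42L)) {
            // Each repetition yields two evaluations: A->B and B->A.
            System.out.println("train " + s.foldA + " / test " + s.foldB + ", then swap");
        }
    }
}
```

So with five repetitions I would end up with ten train/test evaluations in total, and I would want an AUC (and ideally a ROC curve) for each of them.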
Any suggestions on how I can get this to work?