Help with multiclass classification using WEKA

So after a few months of hard work and learning, I've progressed to multiclass classifiers. The problem my research professor has given me now is to classify the dataset used in Blitzer et al. (though I likely will not achieve anything like the results that some of the research teams have, given my inexperience). Problem number one: using the standard TextDirectoryLoader, WEKA wants to take in data that is (A) in raw text form of some sort, rather than pre-organized into some other sort of vector or feature file, and (B) binary in nature.

Unfortunately, the datasets provided by Blitzer et al. appear to be feature vectors (you can go look for yourself). After some fairly thorough searching, I am still not sure how to convert these feature vectors into ARFF format, or how to arrange the directory structure. I tried arranging the directories as a tree, and it complained about every branch, all the way down, saying the data was not binary. Even with just two simple branches under one category, it didn't recognize the sparse-vector nature of the data and simply assumed that the_dog:3 was a single feature, rather than a feature with its frequency already specified.

So what I'm asking here is: how do I get WEKA to take in the data in the format provided, and how do I arrange the directories in this case so that it can be used with a multiclass classifier?
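For what it's worth, one possible route is to skip TextDirectoryLoader entirely and convert the feature-vector lines to a sparse ARFF file, which WEKA can open directly. Below is a minimal Python sketch of that idea, not a tested pipeline for the real files. It assumes each line looks like `the_dog:3 great:1 #label#:positive`, i.e. space-separated `feature:count` tokens with integer counts plus one label token. The `#label#` token name, the function names, and the `@@class@@` attribute name are all assumptions chosen for illustration.

```python
from collections import Counter

LABEL_KEY = "#label#"  # assumed name of the label token in each line


def parse_line(line):
    """Split one 'feature:count ...' line into (counts, label)."""
    counts, label = Counter(), None
    for token in line.split():
        # Split on the LAST colon so feature names may contain colons.
        name, _, value = token.rpartition(":")
        if not name:
            continue  # skip tokens that have no 'name:value' shape
        if name == LABEL_KEY:
            label = value
        else:
            counts[name] += int(value)  # assumes integer counts
    return counts, label


def quote(text):
    """Single-quote an ARFF name or nominal value, escaping \\ and '."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_sparse_arff(rows, relation="reviews"):
    """Render a list of (counts, label) pairs as sparse ARFF text."""
    vocab = sorted({f for counts, _ in rows for f in counts})
    index = {f: i for i, f in enumerate(vocab)}
    labels = sorted({label for _, label in rows})
    class_index = len(vocab)  # class attribute goes last

    out = ["@relation " + quote(relation), ""]
    out += ["@attribute " + quote(f) + " numeric" for f in vocab]
    nominal = ",".join(quote(label) for label in labels)
    out.append("@attribute " + quote("@@class@@") + " {" + nominal + "}")
    out += ["", "@data"]
    for counts, label in rows:
        # Sparse rows list only non-zero cells, in ascending index order.
        cells = [f"{index[f]} {counts[f]}"
                 for f in sorted(counts, key=index.get) if counts[f]]
        cells.append(f"{class_index} {quote(label)}")
        out.append("{" + ", ".join(cells) + "}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        "the_dog:3 great:1 #label#:positive",
        "the_dog:1 awful:2 #label#:negative",
    ]
    print(to_sparse_arff([parse_line(s) for s in sample]))
```

Since the class is just whatever label string ends up in each pair, a multiclass setup would only need a different label there (for example the product category instead of positive/negative), with no directory tree involved. Every row does need a label, so unlabeled lines would have to be filtered out or given one first.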

You can see the datasets (processed) at www.cs.jhu.edu/~mdredze/datasets/sentiment/

I was trying to use the acl one as opposed to the stars one. Thank you very much.

submitted by WEKAnewb
