Channel: Machine Learning

newbie WEKA questions

Hi everyone, sorry if this isn't the right forum for this, but I have some really newbie WEKA questions.

So I'm trying to process some raw data into the ARFF format so I can experiment on it. I figured I'd go whole hog right from the start, so I downloaded the 2008-2009 TREC spam dataset from the University of Waterloo and used Ubuntu Linux's HTML-to-text converter to turn all 75,000 messages into text files with the HTML tags removed.
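
For reference, the tag-removal step amounts to something like the crude plain-Java sketch below. It is not the Ubuntu converter mentioned above; StripHtml is a hypothetical name, and a regex pass like this only copes with reasonably well-formed markup and a handful of common entities, so a real HTML parser is the safer choice for 75,000 messy spam messages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical, crude HTML-to-text helper: drops <script>/<style> blocks and
// comments, strips the remaining tags, decodes a few common entities, and
// collapses whitespace. Not a substitute for a real HTML parser.
public class StripHtml {
    private static final Pattern BLOCKS =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>|<!--.*?-->");
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]*>");

    public static String strip(String html) {
        String s = BLOCKS.matcher(html).replaceAll(" ");
        s = TAGS.matcher(s).replaceAll(" ");
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&#39;", "'")
             .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(strip(html));
    }
}
```

Tags are replaced with a space rather than deleted so that words separated only by markup (for example `a<br/>b`) don't get glued together.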

My next step was to try an old tool someone wrote in 2002 called TextDirectoryToArff, whose source can be found here: http://weka.wikispaces.com/ARFF+file...xt+Collections.

So I loaded it all up in Eclipse and added the external WEKA package, and Eclipse tells me that line 59,

    data.add(new Instance(1.0, newInst));

isn't valid because Instance cannot be instantiated.
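
For context on that error: in WEKA 3.7 and later, weka.core.Instance is an interface and weka.core.DenseInstance is the concrete class, so code written against the older API (as a 2002 tool would be) fails at exactly this kind of `new Instance(...)` call. Against a newer jar the equivalent line would be `data.add(new DenseInstance(1.0, newInst));`. To check which situation a given weka.jar is in, the small plain-Java program below (InstantiableCheck is a hypothetical helper, not part of WEKA) loads a class by name via reflection and reports whether it is an interface, an abstract class, or a concrete class.

```java
import java.lang.reflect.Modifier;

// Hypothetical diagnostic helper (not part of WEKA): reports whether a class on
// the current classpath is an interface, an abstract class, or a concrete class
// that `new` can instantiate.
public class InstantiableCheck {

    public static String describe(String className) {
        try {
            // initialize = false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, InstantiableCheck.class.getClassLoader());
            if (c.isInterface()) {
                return "interface";
            }
            if (Modifier.isAbstract(c.getModifiers())) {
                return "abstract class";
            }
            return "concrete class";
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"weka.core.Instance", "weka.core.DenseInstance"}) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```

Compile it and run it with weka.jar on the classpath (for example `java -cp weka.jar:. InstantiableCheck` on Linux, or just run it inside the Eclipse project that already has WEKA added). With a 3.7-or-later jar, Instance should be reported as an interface; with an older jar, Instance should be a concrete class and DenseInstance should not be on the classpath.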

My questions:

  1. Is it even worth compiling TextDirectoryToArff, or am I misunderstanding how to go about converting raw text data into an ARFF file?

  2. If this is the right tool for the job, what am I doing wrong with the file?
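
On question 1: the conversion itself is small enough to do without WEKA's Java API at all, which sidesteps version changes like the Instance one. The sketch below is a hypothetical, stdlib-only stand-in for TextDirectoryToArff, not that tool and not anything shipped with WEKA. It assumes the text files have already been sorted into one subdirectory per label (root/ham/..., root/spam/..., with plain-word directory names), and writes an ARFF file with a string attribute `text` and a nominal attribute `class`; the attribute and relation names are arbitrary choices, and the quoting (single quotes, backslash escapes, newlines collapsed) is an assumption about what WEKA's ARFF reader accepts, so it is worth loading a small sample before converting all 75,000 files.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for TextDirectoryToArff (not WEKA code).
// Assumed input layout: root/<label>/<file>, e.g. root/ham/1, root/spam/2.
// Output: one instance per file with a string attribute "text" and a
// nominal attribute "class" whose values are the subdirectory names.
public class TextDirToArff {

    // Single-quote a value for ARFF. Whitespace runs (including newlines, which
    // would break the one-instance-per-line data section) collapse to one space;
    // backslashes and single quotes are backslash-escaped.
    public static String quote(String s) {
        String flat = s.replaceAll("\\s+", " ").trim();
        return "'" + flat.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void convert(Path root, Path out) throws IOException {
        // label -> files; TreeMap keeps labels sorted so the header is stable
        Map<String, List<Path>> byLabel = new TreeMap<>();
        try (DirectoryStream<Path> labelDirs = Files.newDirectoryStream(root, Files::isDirectory)) {
            for (Path labelDir : labelDirs) {
                List<Path> files = new ArrayList<>();
                try (DirectoryStream<Path> ds = Files.newDirectoryStream(labelDir, Files::isRegularFile)) {
                    for (Path f : ds) {
                        files.add(f);
                    }
                }
                Collections.sort(files);
                byLabel.put(labelDir.getFileName().toString(), files);
            }
        }
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            w.println("@relation text_files");
            w.println("@attribute text string");
            w.println("@attribute class {" + String.join(",", byLabel.keySet()) + "}");
            w.println();
            w.println("@data");
            for (Map.Entry<String, List<Path>> e : byLabel.entrySet()) {
                for (Path f : e.getValue()) {
                    // Malformed UTF-8 bytes become U+FFFD instead of throwing
                    String body = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                    w.println(quote(body) + "," + e.getKey());
                }
            }
            if (w.checkError()) {
                throw new IOException("error while writing " + out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Usage: `java TextDirToArff <root> out.arff`. Most WEKA classifiers can't use a raw string attribute directly; the usual next step is the StringToWordVector filter (weka.filters.unsupervised.attribute.StringToWordVector). WEKA also ships weka.core.converters.TextDirectoryLoader, which reads the same one-subdirectory-per-class layout, so it may be worth trying before writing any code.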

Thanks in advance.

submitted by WEKAnewb