Channel: Machine Learning

Categorizing labels using k-means clustering for an SVM model


Hi,

I have data from two types (classes) of networks: net1, net2. Each network has 100 data instances. Each instance has twenty features and one output label.

My aims are two-fold:

  • build a classification model and
  • rank the features using the Fisher test and AUC

The labels are floating-point numbers such as 47.23 or 67.5, ranging from 0.0 to 100.0. If I use the labels as is, each distinct value is treated as its own class, so the prediction accuracy is, predictably, too low. I want to bin the labels into a small number of categories using the k-means clustering algorithm, so the number of label categories will equal the number of clusters I specify for k-means. Once I build the model with k-fold cross-validation, I will compare the performance of two kernels to start with: linear and RBF.
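To make the binning step concrete, here is a minimal sketch in scikit-learn. The data is a synthetic stand-in (100 instances, 20 features, labels drawn uniformly from 0 to 100), and the helper name `bin_labels`, the choice of 3 bins, the 5 folds, and the feature scaling step are all assumptions for illustration, not part of my actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for one network: 100 instances, 20 features,
# continuous labels in [0, 100]. Replace with the real data.
X = rng.normal(size=(100, 20))
y_cont = rng.uniform(0.0, 100.0, size=100)


def bin_labels(y, n_bins, seed=0):
    """Cluster 1-D labels with k-means and return ordered bin indices."""
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed)
    raw = km.fit_predict(y.reshape(-1, 1))
    # k-means cluster ids are arbitrary; remap them so that bin 0 has the
    # lowest cluster center, bin 1 the next lowest, and so on.
    order = np.argsort(km.cluster_centers_.ravel())
    remap = np.empty(n_bins, dtype=int)
    remap[order] = np.arange(n_bins)
    return remap[raw]


y_bin = bin_labels(y_cont, n_bins=3)  # 3 bins is an arbitrary choice

# Compare the two kernels with the same stratified k-fold split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for kernel in ("linear", "rbf"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y_bin, cv=cv).mean()

print(scores)
```

Since the synthetic features are pure noise, the accuracies here should hover around chance; the point is only the shape of the pipeline.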

I want to repeat both steps for three cases:

  • data from only net1,
  • data from only net2 and
  • data from net1 and net2 (I will use +ve and -ve suffixes to separate the data from the two classes)

and observe which features rank high. Feature Ranking Using Linear SVM [1] lists four ranking methods, which include the Fisher test and AUC.
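For the combined net1 and net2 case, the two per-feature rankings might be sketched as below. This is only an illustration on synthetic data: the Fisher score formula used here (squared difference of class means over the sum of class variances) is one common two-class definition and may differ in detail from the one in [1], and the function names `fisher_scores` and `auc_scores` are made up for this example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 instances per network, 20 features.
# +1 marks net1 and -1 marks net2 (assumed encoding).
X = rng.normal(size=(200, 20))
y = np.repeat([1, -1], 100)
X[y == 1, 3] += 2.0  # make feature 3 informative on purpose


def fisher_scores(X, y):
    """Two-class Fisher score per feature (one common definition)."""
    pos, neg = X[y == 1], X[y == -1]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return num / den


def auc_scores(X, y):
    """AUC of each single feature used as a score for the class label."""
    aucs = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
    # A feature that separates the classes in the opposite direction
    # gives AUC < 0.5, so fold it around 0.5 before ranking.
    return np.maximum(aucs, 1.0 - aucs)


fisher_rank = np.argsort(fisher_scores(X, y))[::-1]  # best feature first
auc_rank = np.argsort(auc_scores(X, y))[::-1]
print(fisher_rank[:5], auc_rank[:5])
```

Both scores are defined for two classes, so for the net1-only and net2-only cases with binned labels they would need a multi-class variant or a one-vs-rest treatment.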

ML isn't my area of expertise. Hence, I would like to hear from the reddit ML community if this is the correct approach. I would love to hear suggestions and comments.

I started out with LibSVM, but I found it inflexible for changing the C and gamma parameters while using cross-validation. I'm now using the scikit-learn package in Python.
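In scikit-learn, tuning C and gamma inside cross-validation can be done with `GridSearchCV`. The following is a rough sketch on synthetic data; the grid values, the pipeline step names `scale` and `svc`, and the 5-fold split are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: 100 instances, 20 features, binary class labels.
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# One sub-grid per kernel: gamma only matters for the RBF kernel.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {
        "svc__kernel": ["rbf"],
        "svc__C": [0.1, 1, 10],
        "svc__gamma": [0.001, 0.01, 0.1],
    },
]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold does not leak into the scaling.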

Thanks a lot!

[1] Feature Ranking Using Linear SVM - http://core.kmi.open.ac.uk/download/pdf/16008.pdf#page=61

submitted by bkamapantula
