Hi All - I'm fairly new to R and am playing around with some of the UCI datasets. I'm looking for a couple of R data mining examples. Specifically, I'm looking for a kmeans example in R where k=N (any value I choose). I'm also looking for an R decision tree example where the dataset is split into training and test sets of specific sizes. I know these have to be online, but the examples I'm finding either A) don't define N (for the clustering), B) don't split into training/testing (for classification), or C) may be doing some of this, but I'm not well versed enough in R to tell. Any help is appreciated.
EDIT/UPDATE:
I think I have the clustering part figured out, at least in terms of setting k=N - I feel silly.
km10 <- kmeans(yeast2,10)
km4 <- kmeans(yeast2,4)
I know how to look at the cluster sizes, but how can I plot the clusters and their centroids? I tried, and all I get are ugly, meaningless graphs.
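One possible way to get a readable picture, sketched under assumptions: since the data has more than two numeric columns, the points and the centroids can both be projected onto the first two principal components and drawn in base R. The built-in iris measurements are used here only as a stand-in so the snippet runs on its own; yeast2 is assumed to be a numeric-only data frame and is not defined in this sketch.

```r
# Stand-in data: four numeric columns from the built-in iris set.
# Assumption: yeast2 would be a numeric-only data frame used the same way.
x <- iris[, 1:4]

set.seed(2568)
# nstart = 25 reruns k-means from several random starts and keeps the best fit
km <- kmeans(x, centers = 4, nstart = 25)

# Reduce to two dimensions with PCA so the clusters can be drawn on one plot
pca <- prcomp(x)
scores <- pca$x[, 1:2]

# Project the centroids into the same two-dimensional space
centers <- predict(pca, newdata = km$centers)[, 1:2]

# Points colored by cluster, centroids drawn as large stars in matching colors
plot(scores, col = km$cluster, pch = 20,
     main = "k-means clusters (k = 4) on the first two principal components")
points(centers, col = seq_len(nrow(centers)), pch = 8, cex = 2, lwd = 2)
```

The projection only shows the two directions of greatest variance, so clusters that look overlapping here may still be separated in the other dimensions.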
EDIT/UPDATE 2:
Still working on the clustering and still stuck, but moving on to classification: I was able to get it (somewhat) working with the following code, but I'm still confused about how to see the accuracy of the model:
library(rpart)
yeast <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/yeast/yeast.data")
colnames(yeast) <- c("SequenceName", "Mcg", "Gyh", "Alm", "Mit", "Erl", "Pox", "Vac", "Nuc", "Class")
yeast.df <- data.frame(yeast)
set.seed(2568)
n <- nrow(yeast.df)
train <- sort(sample(1:n, floor(n*.7)))
yeast.train <- yeast.df[train,]
yeast.test <- yeast.df[-train,]
fit <- rpart(Class ~ Mcg + Gyh + Alm + Mit + Erl + Pox + Vac + Nuc,method="class", data=yeast.train)
# plot tree
plot(fit, uniform=TRUE, main="Classification Tree for Yeast")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
pred <- predict(fit, yeast.test, type="class")
pred
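A common way to check accuracy, sketched here as an assumption about what is wanted: cross-tabulate the predicted classes against the true classes of the held-out rows (a confusion matrix) and take the fraction that match. The built-in iris data stands in for the yeast data so the snippet runs without a download; the names data.train, data.test, conf, and accuracy are illustrative only.

```r
library(rpart)

# Stand-in data so the example is self-contained (assumption: the yeast
# data frame would be handled the same way, with Class as the response)
data <- iris

set.seed(2568)
n <- nrow(data)
train <- sample(seq_len(n), floor(n * 0.7))   # 70% of the rows for training
data.train <- data[train, ]
data.test  <- data[-train, ]

fit  <- rpart(Species ~ ., data = data.train, method = "class")
pred <- predict(fit, data.test, type = "class")

# Confusion matrix: rows are predicted classes, columns are true classes
conf <- table(predicted = pred, actual = data.test$Species)
print(conf)

# Overall accuracy: share of test rows whose predicted class matches the truth
accuracy <- mean(pred == data.test$Species)
print(accuracy)
```

With the yeast code above, the equivalent would presumably be table(pred, yeast.test$Class) and mean(pred == yeast.test$Class). For a training set of an exact size rather than a percentage, the second argument to sample() can be a fixed row count instead of floor(n * 0.7).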