Channel: Machine Learning

Clustering heterogeneous data - a question about relevant methods.


Hi everyone,

I've never posted here before, but I thought this would be a good place to look for guidance on a problem we're facing in our research group. We want to cluster a dataset that has heterogeneous feature vectors. For simplicity's sake, imagine the features are "color", "quantity", and "pattern" - a mixture of nominal and ratio-scaled values. Any of them can also be missing. (As an additional note, we do not know the underlying structure or number of clusters at all. However, a proposed clustering can be verified by a human.)

Now, we are aware of some distance functions for feature vectors of this type, so we have a few different modules that each calculate distance differently. However, we cannot find many clustering algorithms that are useful to us. Many seem to rely on computing the mean of a collection of feature vectors, which clearly is not meaningful for our data set (what's the mean of 'red' and 'blue', for example?). We also have the missingness issue to contend with, but that's less of a problem at the moment.
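
To make the setup concrete, here is a minimal sketch of the kind of thing we mean. It is not one of our actual modules. It assumes a Gower-style dissimilarity (nominal features count as a mismatch of 0 or 1, ratio features as a range-normalized absolute difference, and features missing in either record are skipped), followed by a naive average-linkage agglomerative clustering that only needs pairwise distances, never a mean. The records, the `NUMERIC` index set, the function names, and the 0.5 threshold are all hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical records: (color, quantity, pattern); None marks a missing value.
records = [
    ("red", 10.0, "striped"),
    ("red", 12.0, "striped"),
    ("blue", 95.0, "dotted"),
    ("blue", None, "dotted"),
    (None, 90.0, "dotted"),
]
NUMERIC = {1}  # index of the ratio-scaled feature (assumption for this toy data)


def numeric_ranges(rows, numeric):
    """Range (max - min) of each numeric feature over its observed values."""
    ranges = {}
    for j in numeric:
        vals = [r[j] for r in rows if r[j] is not None]
        ranges[j] = (max(vals) - min(vals)) if vals else 0.0
    return ranges


def gower(a, b, numeric, ranges):
    """Gower-style dissimilarity in [0, 1]; features missing in either record are skipped."""
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue
        if j in numeric:
            total += abs(x - y) / ranges[j] if ranges[j] > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    # Assumption: records with no shared observed feature are treated as maximally far apart.
    return total / used if used else 1.0


def average_linkage(dist, threshold):
    """Merge the closest pair of clusters until the closest pair is farther than threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best  # p < q, so deleting q leaves index p valid
        clusters[p] += clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


ranges = numeric_ranges(records, NUMERIC)
dist = [[gower(a, b, NUMERIC, ranges) for b in records] for a in records]
clusters = average_linkage(dist, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3, 4]]
```

The threshold stands in for not knowing the number of clusters up front: merging stops once the closest remaining pair is too far apart, and a human can then inspect the result. The loop is cubic or worse in the number of records, so it only illustrates the idea.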

Mostly, I was just wondering if anyone could suggest some clustering methods or distance functions that might be well-suited to our problem. I've read quite a lot of papers and two books (I haven't exactly exhausted my resources, but I think I've got a good lay of the land), and I'm still sure there must be something fitting for us out there. For full disclosure, my background is CS and physics, but I am doing research in a CS department.

Thanks!

submitted by smashstacker
