I've applied the common clustering algorithms (k-means, DBSCAN, hierarchical) to small datasets (n < 1000) with the aim of classifying customer use cases. All of the data is continuous. I found the results of k-means and the other algorithms unsatisfying for one particular reason: if two points are mutually nearest neighbors (each is closer to the other than to any other point), I think they should belong to the same cluster. I realize that most algorithms try to minimize intra-cluster distance, and there is no requirement that two mutual nearest neighbors end up in the same cluster. Have I overlooked a method that enforces this requirement, or something similar? I know this might not be the conventional usage of the term "mutual nearest neighbors," but I couldn't come up with a better description.
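To make the property concrete, here's a small Python sketch (not my actual script) that finds all mutual-nearest-neighbor pairs in a dataset; the constraint I want is that each such pair lands in the same cluster:

```python
import numpy as np

def mutual_nearest_neighbors(X):
    """Return pairs (i, j), i < j, where points i and j are each
    other's nearest neighbor under Euclidean distance."""
    # Pairwise squared distances (O(n^2) memory, fine for n < 1000).
    d = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)        # a point is not its own neighbor
    nn = d.argmin(axis=1)              # nearest neighbor of each point
    return [(i, int(j)) for i, j in enumerate(nn) if i < j and nn[j] == i]
```

So for two tight pairs far apart, e.g. `X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])`, this returns `[(0, 1), (2, 3)]`, and I'd want any clustering to keep 0 with 1 and 2 with 3.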
I wrote an agglomerative script in MATLAB based on this idea, and it seems to perform reasonably well on small data. An added feature of this method is that the clusters cover the spread of the data rather than concentrating on the most densely populated regions. I realize this feature would be considered a flaw in most clustering applications.
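For reference, a rough Python equivalent of the idea (my script is in MATLAB; this is a simplified sketch, using single-linkage distances between merged clusters, which may differ from my actual update rule): repeatedly find a pair of clusters that are mutual nearest neighbors and merge them.

```python
import numpy as np

def mnn_agglomerate(X, n_clusters):
    """Agglomerative clustering that repeatedly merges mutual nearest
    neighbors. Single-linkage cluster distances; O(n^2) memory, so
    only intended for small data (n < 1000)."""
    n = len(X)
    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(d, np.inf)
    labels = np.arange(n)          # each point starts as its own cluster
    active = list(range(n))        # indices of clusters still alive
    while len(active) > n_clusters:
        sub = d[np.ix_(active, active)]
        nn = sub.argmin(axis=1)    # nearest active cluster for each
        for a in range(len(active)):
            b = int(nn[a])
            if nn[b] == a and a < b:           # mutual nearest neighbors
                i, j = active[a], active[b]
                # Merge cluster j into i; single-linkage distance update.
                d[i, :] = np.minimum(d[i, :], d[j, :])
                d[:, i] = d[i, :]
                d[i, i] = np.inf
                labels[labels == j] = i
                active.remove(j)
                break
    return labels
```

Note that at least one mutual pair always exists among the remaining clusters (the globally closest pair is mutual), so each pass of the loop is guaranteed to merge something.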
Any insight would be appreciated. Thanks.