So I have a problem where I have a graph with N nodes and a distance function defined over any pair of nodes (x, y). Computing this distance function is reasonably expensive. Note that I do not have a vector representation for each node; I only have a distance function.
My interest is in getting something like normalized graph cuts, i.e. I want a way of cutting my graph into roughly connected components.
One strategy I have tried is picking a cutoff k and then searching for connected components, treating any pair with distance below k as connected. In practice this scales in close to linear time, because it quickly puts many of the nodes into a few clusters and they don't need to be searched multiple times (so in practice I only need to compute a tiny fraction of the full distance matrix). The results are also reasonably good for my task, but the approach is brittle because of the hard threshold.
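For concreteness, the lazy connected-components search I mean looks roughly like this (a sketch, not my exact code; `dist` and `k` stand in for the expensive distance function and the hard cutoff):

```python
from collections import deque

def threshold_components(nodes, dist, k):
    """Group nodes into components, treating pairs with dist(x, y) < k
    as connected. Distances are computed lazily: once a node lands in a
    component it is never compared again, so only a small fraction of
    the full distance matrix gets evaluated."""
    unassigned = set(nodes)
    components = []
    while unassigned:
        seed = unassigned.pop()
        component = [seed]
        frontier = deque([seed])
        while frontier:
            x = frontier.popleft()
            # Only compare against nodes not yet placed in any component.
            matched = {y for y in unassigned if dist(x, y) < k}
            unassigned -= matched
            component.extend(matched)
            frontier.extend(matched)
        components.append(component)
    return components

# Toy example: points on a line forming two well-separated groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
comps = threshold_components(points, lambda a, b: abs(a, b) if False else abs(a - b), k=0.5)
```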
Alternatively, I've tried spectral clustering and DBSCAN. These also work well and are much less brittle than a hard search. However, they require me to compute the full distance matrix, which is prohibitively expensive.
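To be clear about what I mean: both methods can run off a precomputed distance matrix, but filling that matrix is the O(N²) bottleneck. A minimal sketch with DBSCAN (scikit-learn; the toy distances and eps value are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for the expensive pairwise distances; in my real setting,
# filling this N x N matrix is exactly the part I can't afford.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
D = np.abs(points[:, None] - points[None, :])

# DBSCAN accepts a precomputed distance matrix directly; eps plays a
# role similar to the hard cutoff k above, but with density smoothing.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
```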
One refinement I could try is first running the connected-components search at a threshold k1, then attempting to merge the resulting clusters using a different (and lower) threshold k2, but requiring at least two cross-cluster matches before merging.
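The merge pass I'm imagining would look something like this (a sketch; `k2` and `min_matches` are the knobs described above, and union-find tracks which clusters have been merged):

```python
from itertools import combinations

def merge_clusters(clusters, dist, k2, min_matches=2):
    """Second pass over the threshold-k1 components: merge two clusters
    when at least `min_matches` cross-cluster pairs fall below the
    looser threshold k2."""
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        matches = sum(1 for x in clusters[i] for y in clusters[j]
                      if dist(x, y) < k2)
        if matches >= min_matches:
            parent[find(i)] = find(j)

    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), []).extend(cluster)
    return list(merged.values())
```

In practice the inner loop could stop as soon as `min_matches` is reached, which would keep the number of extra distance computations down.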
Does anyone have any ideas on better approaches for this problem?