I am not at all a machine learning expert, and as I am setting out on a new project I thought I would ask for some pointers.
I would like to cluster some high-dimensional trajectories. They are quite short: between 10 and 20 time points in a 50-dimensional space. I will have several hundred trials and expect there to be around 5-20 clusters.
What would people recommend as an approach to this problem? At first I thought I would just ignore the time structure and flatten each trajectory, so that 50 dims x 10 time points = 500 features. Another approach might be to define the distance between two trajectories as the sum over time points of the Euclidean distances in the 50-dim space and then cluster the resulting distance matrix. Would this be better? I don't have a good intuition for how it would differ from using the full 500-feature vector. Are there any better techniques or approaches?
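To make the comparison concrete, here is a minimal sketch of the two distances in plain Python. It assumes all trajectories have the same number of time points (mine actually vary between 10 and 20, so some resampling or alignment would be needed first), and the function names and toy data are made up for illustration. The flattened 500-feature Euclidean distance is the square root of the sum of squared per-time-point distances, whereas the second option sums the unsquared distances.

```python
import math
import random

def flat_distance(a, b):
    """Euclidean distance between the flattened trajectories.

    Equals sqrt(sum over time of squared per-time-point distances).
    """
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def summed_distance(a, b):
    """Sum over time points of the per-time-point Euclidean distances."""
    return sum(math.dist(p, q) for p, q in zip(a, b))

def random_trajectory(rng, n_time=10, n_dim=50):
    """Toy trajectory (illustrative only): n_time points in n_dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(n_time)]

rng = random.Random(0)
a = random_trajectory(rng)
b = random_trajectory(rng)

# c matches a everywhere except one time point, where it jumps far away.
c = [row[:] for row in a]
c[3] = [x + 5.0 for x in c[3]]

# d is a shifted by a small amount at every time point.
d = [[x + 0.5 for x in row] for row in a]

for name, other in [("b", b), ("c", c), ("d", d)]:
    print(name, round(flat_distance(a, other), 3), round(summed_distance(a, other), 3))
```

In this toy case the summed distance rates c and d as equally far from a, while the flattened distance rates the single large jump in c as much further than the small consistent shift in d. In general, for T time points, the flattened distance is at most the summed distance, which in turn is at most sqrt(T) times the flattened distance, so the two differ mainly in how heavily they weight a few large deviations.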
I actually have an external labelling of the data into 6 classes, but I would like to try an unsupervised approach to see if the same structure comes out of the data, or if there might be more structure, and also to investigate how the cluster structure evolves over time.
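One way I could imagine checking whether "the same comes out" is a label-permutation-invariant agreement score such as the adjusted Rand index. This is only a sketch of one possible measure, not something I have settled on; the function below is a plain-Python implementation of the standard pair-counting formula, and the label lists are made-up examples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means identical partitions (up to renaming of labels);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # nothing to compare; treat as trivially identical

    # Number of item pairs grouped together in both, in a only, in b only.
    together_both = sum(comb(k, 2) for k in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(k, 2) for k in Counter(labels_a).values())
    together_b = sum(comb(k, 2) for k in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (max_index - expected)

external = [0, 0, 1, 1, 2, 2]              # hypothetical external classes
same = ["b", "b", "a", "a", "c", "c"]      # same partition, different names
finer = ["b", "b", "a", "a", "c", "d"]     # one class split into two clusters

print(adjusted_rand_index(external, same))
print(adjusted_rand_index(external, finer))
```

A score of 1 would mean the clustering reproduces the external classes exactly, while a lower score with clusters nested inside classes (as in the `finer` example) would hint at extra structure.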