I'm analyzing audio segments in songs. I want to find phrases: groups of segments that may repeat several times in a song. I also want to group phrases hierarchically, into phrases of phrases.
A segment is a slice of audio, usually 100 to 1000 ms long. The Echo Nest analyzer, which I'm calling from Python, gives me several features to work with for each segment (pitch vector, timbre vector, envelope vector).
No two segments are identical, but segments that sound alike have feature vectors that are close in Euclidean distance.
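One practical wrinkle with comparing segments this way: the feature groups can be on very different numeric scales, so the large-range dimensions may dominate a raw Euclidean distance. A minimal sketch, assuming each segment's pitch, timbre, and envelope values are already available as plain lists of numbers (the function names are hypothetical, not part of any Echo Nest API): concatenate the groups and z-score each dimension across the song before measuring distance.

```python
import math
from statistics import mean, pstdev


def segment_vector(pitch, timbre, envelope):
    """Concatenate one segment's feature groups into a single flat vector."""
    return list(pitch) + list(timbre) + list(envelope)


def standardize(vectors):
    """Z-score each dimension so large-range features don't dominate the distance."""
    dims = list(zip(*vectors))
    mus = [mean(d) for d in dims]
    # A constant dimension has zero spread; divide by 1.0 instead of 0.
    sigmas = [pstdev(d) or 1.0 for d in dims]
    return [[(x - m) / s for x, m, s in zip(v, mus, sigmas)] for v in vectors]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.dist(a, b)
```

Standardizing per song is only one option; weighting the pitch, timbre, and envelope groups differently is another reasonable choice, depending on which aspect of similarity matters most for "same phrase".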
What kind of hierarchical phrase-clustering algorithm works well for this?
My first thought was to run k-means on all the segments, reducing the data to a string over an alphabet of k segment types, and then to apply some kind of string compression algorithm that recursively looks for the most common n-grams. Thoughts?
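The proposed pipeline might be sketched as follows. This is a pure-Python illustration under stated assumptions: `kmeans` is a plain Lloyd's-algorithm implementation (scikit-learn's `KMeans` would do the same job), and `repair` is a simplified Re-Pair-style grammar compressor that repeatedly replaces the most frequent adjacent pair of symbols with a new symbol. Because new symbols can themselves be paired, the resulting rules form a hierarchy of phrases of phrases. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k randomly chosen points
    labels = None
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to its members' mean (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def repair(seq, min_count=2):
    """Re-Pair-style compression: replace the most frequent adjacent pair until
    no pair occurs at least min_count times. Returns (top-level seq, rules).
    Assumes terminal symbols never collide with the generated names "R0", "R1", ..."""
    seq = list(seq)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))  # note: counts overlapping pairs
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        sym = f"R{len(rules)}"
        rules[sym] = pair
        # Left-to-right, non-overlapping replacement of `pair` by `sym`.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(sym, rules):
    """Expand a (possibly nested) rule symbol back into its terminal symbols."""
    if sym not in rules:
        return [sym]
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    # Toy 2-D "segment vectors"; real ones would be standardized feature vectors.
    toy = [[0, 0], [10, 10], [0, 1], [10, 11]] * 3
    letters = "".join(chr(ord("a") + lab) for lab in kmeans(toy, 2))
    top, rules = repair(letters)
    print(letters, top, {s: "".join(expand(s, rules)) for s in rules})
```

Each rule expands back to a run of segment labels, so rules that are used several times are candidate phrases, and rules built from other rules are the phrases of phrases. Choosing k and `min_count` is a judgment call. The pair counts here include overlapping occurrences (as in `aaa`), which is a simplification relative to full Re-Pair and can occasionally create a rule that ends up used only once.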