I'm working on a project where multiple users have each ranked a long list of items to identify their top 5. If I focus on one user (the focus user), what is a good metric for finding the other users who are most likely to have similar preferences?
For example, suppose the focus user's top 5 preferences, in order (item 3 being the most preferred), are:
[3, 7, 2, 11, 322]
User 2's preferences are:
[2, 7, 103, 13, 3]
User 3's preferences are:
[7, 3, 86, 44, 322]
I'm trying to find a metric that shows which other users (test users) have preferences closest to the focus user's. Because the goal of the analysis is to find the most favored items, the metric should ideally weight the top of each list more heavily. That is, a test user who also ranks item 3 first should score more highly than a test user whose match is item 322 in position 5.
Ideally, it should also be fuzzy: if a test user ranks item 3 in position 2 while the focus user ranks it in position 1, that should still count for something.
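To make these two requirements (top weighting and fuzzy position matching) concrete, here is a minimal sketch of one possible heuristic. It is hypothetical, not an established metric: the function name `weighted_rank_similarity`, the 1/i rank weight and the 1/(1 + |i - j|) position discount are all illustrative assumptions.

```python
def weighted_rank_similarity(focus, test):
    """Hypothetical top-weighted, fuzzy similarity between two ranked lists.

    Each item in `focus` at (1-based) rank i gets weight 1/i, so matches near
    the top count more. If `test` also contains that item at rank j, the match
    is discounted by 1 / (1 + |i - j|), so a near-miss in position still counts.
    Dividing by the total weight makes identical lists score 1.0 and lists
    with no items in common score 0.0.
    """
    test_rank = {item: j for j, item in enumerate(test, start=1)}
    score = 0.0
    total = 0.0
    for i, item in enumerate(focus, start=1):
        weight = 1.0 / i  # assumed weighting: rank 1 counts most
        total += weight
        j = test_rank.get(item)
        if j is not None:
            score += weight / (1 + abs(i - j))  # assumed fuzzy discount
    return score / total if total else 0.0


focus = [3, 7, 2, 11, 322]
print(round(weighted_rank_similarity(focus, [2, 7, 103, 13, 3]), 3))   # 0.355
print(round(weighted_rank_similarity(focus, [7, 3, 86, 44, 322]), 3))  # 0.416
```

Under these assumptions User 3 edges out User 2, mainly because User 3 ranks the focus user's favorite (item 3) second rather than fifth. Note that the score is asymmetric: weights come only from the focus user's ranks, which fits comparing everyone against a single focus user.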
Simply counting the overlap between the focus user's set and a test user's set works as a first approximation, but it ignores the relative rankings and gives no extra priority to higher-ranked items.
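A minimal sketch of that plain overlap count, applied to the example lists, shows the limitation: both test users share exactly three items with the focus user, so the count cannot tell them apart. The helper name `overlap_count` is illustrative.

```python
def overlap_count(focus, test):
    """Number of items two top-k lists share, ignoring order entirely."""
    return len(set(focus) & set(test))


focus = [3, 7, 2, 11, 322]
users = {"User 2": [2, 7, 103, 13, 3], "User 3": [7, 3, 86, 44, 322]}
for name, prefs in users.items():
    print(name, overlap_count(focus, prefs))
# User 2 3
# User 3 3
```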
The problem gets worse with longer lists, say the top 50 items. I care a lot about whether two users share a common top 5, but much less about whether their 40th to 50th items are in the same order; those are far less important.
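One existing measure designed for exactly this kind of top-heavy comparison is rank-biased overlap (RBO), proposed by Webber, Moffat and Zobel. It averages the overlap of the two lists' top-d prefixes over every depth d, with geometrically decreasing weights, so agreement in the top few ranks dominates and agreement deep in a top-50 list contributes little. Below is a minimal sketch of the extrapolated form for two equal-length lists without duplicate items; the function name `rbo_ext` and the default p = 0.9 are assumptions, and uneven list lengths would need the more general formula.

```python
def rbo_ext(s, t, p=0.9):
    """Sketch of extrapolated rank-biased overlap for equal-length lists.

    At each depth d, the agreement A_d is the fraction of the top-d items the
    two lists share. Depth d gets weight (1 - p) * p**(d - 1), so a smaller p
    puts more weight on the first few ranks. The final term assumes agreement
    stays at A_k beyond the observed depth k, so identical lists score 1.0.
    Assumes neither list contains duplicate items.
    """
    if len(s) != len(t) or not s:
        raise ValueError("this sketch only handles non-empty equal-length lists")
    k = len(s)
    seen_s, seen_t = set(), set()
    overlap = 0       # size of the intersection of the two depth-d prefixes
    total = 0.0
    agreement = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a may match an earlier item of t; b may match an earlier item of s.
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        agreement = overlap / d
        total += (1 - p) * p ** (d - 1) * agreement
    return total + agreement * p ** k


focus = [3, 7, 2, 11, 322]
print(round(rbo_ext(focus, [2, 7, 103, 13, 3]), 3))   # 0.529
print(round(rbo_ext(focus, [7, 3, 86, 44, 322]), 3))  # 0.574
```

The parameter p is the main design knob: lowering it concentrates weight on the first few depths, which matches caring mostly about the top 5 of a top-50 list.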
Any ideas?