Channel: Machine Learning

Question: Computing similarity of useragents


I'm writing a program, part of which aims to detect whether two computers connecting to a website are the same machine. Part of that detection uses the user-agent strings. I want to build a function that takes two user agents as input and returns a value between 0 and 1, where 0 means "no similarity" and 1 means "exactly the same".

I'm currently just tokenising the user agents and returning the Jaccard similarity (size of the intersection divided by size of the union of the two token sets). Is there a better way? The goal is that the same browser should score a high similarity, particularly if something gets upgraded (e.g. IE8.0 is upgraded to IE9.0).
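For reference, the baseline described above might look roughly like the following. This is only a sketch: the tokenisation rule (lowercase, then split on anything that is not a letter, digit, or dot) and the function names `tokenize` and `jaccard_similarity` are assumptions for illustration, not the code actually in use.

```python
import re


def tokenize(user_agent):
    """Split a user-agent string into a set of lowercase tokens.

    Assumed rule: a token is a run of letters, digits, or dots, so
    "Mozilla/4.0" yields the two tokens "mozilla" and "4.0".
    """
    return set(re.findall(r"[a-z0-9.]+", user_agent.lower()))


def jaccard_similarity(ua_a, ua_b):
    """Return |A & B| / |A | B| for the token sets of two user agents."""
    tokens_a, tokens_b = tokenize(ua_a), tokenize(ua_b)
    if not tokens_a and not tokens_b:
        # Two empty strings are treated as identical by convention.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

With this tokenisation, a version bump replaces one token with another, so it costs exactly as much as any other unrelated change. That is the weakness the question is getting at.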

One idea I had was to learn Bayesian probabilities for transitions (i.e. upgrades have a high probability, downgrades a low probability) and use them to calculate an overall probability that the two user agents were generated by the same browser. If there is a simpler method, though, I'd love to hear it.
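To make the transition idea concrete, here is a rough sketch of how such a score could be assembled. The four weights are hand-picked placeholders, not learned values; in the approach described above they would be estimated from observed user-agent pairs. The product/version regex and the names `parse_products` and `transition_score` are also assumptions for illustration.

```python
import re

# Hypothetical placeholder weights, NOT learned from data.
P_SAME = 1.0       # product present in both, same version
P_UPGRADE = 0.8    # version went up
P_DOWNGRADE = 0.1  # version went down
P_MISSING = 0.3    # product present in only one user agent


def parse_products(user_agent):
    """Map product name -> version tuple, e.g. "MSIE 8.0" -> {"msie": (8, 0)}.

    Assumed format: a name made of letters, then "/" or a space, then a
    dotted number. Real user agents contain more variety than this.
    """
    pairs = re.findall(r"([A-Za-z]+)[/ ](\d+(?:\.\d+)*)", user_agent)
    return {
        name.lower(): tuple(int(part) for part in version.split("."))
        for name, version in pairs
    }


def transition_score(ua_old, ua_new):
    """Multiply one weight per product seen in either user agent."""
    old, new = parse_products(ua_old), parse_products(ua_new)
    names = set(old) | set(new)
    if not names:
        # Nothing parseable: fall back to exact string equality.
        return 1.0 if ua_old == ua_new else 0.0
    score = 1.0
    for name in names:
        if name not in old or name not in new:
            score *= P_MISSING
        elif new[name] == old[name]:
            score *= P_SAME
        elif new[name] > old[name]:
            score *= P_UPGRADE
        else:
            score *= P_DOWNGRADE
    return score
```

Unlike Jaccard similarity, this score is deliberately asymmetric: `transition_score(old, new)` is high when `new` looks like an upgrade of `old` and low in the reverse direction. It stays within 0 and 1 because every weight does.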

submitted by rlayton
