I have hit a goldmine of data for my bachelor's thesis. A mobile gaming company is giving me full access to their database (about half a petabyte, though of course not all of it is relevant to my research). Their only requirement is that the deliverable be a segmentation of customers based on playing behavior, ideally revealing correlations with attrition and/or monetization.
I am expecting the data to be clean and well structured. Each player will be associated with a log of activities. The relevant activities, each with a timestamp, are expected to be: level started, result of the attempt, and in-app purchases (the player can buy boosts, etc.). We are interested both in patterns of play and in how these patterns change in response to difficulty or frustration. For example, what happens to the likelihood of monetization and attrition as a player has more and more difficulty passing a series of levels? And what about the pattern of difficulty?
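As a rough illustration of the frustration question, here is a minimal sketch that walks each player's log and records how every run of consecutive failed attempts ended. The `Event` record, its field names and the event kinds (`"level_start"`, `"attempt_result"`, `"purchase"`) are hypothetical stand-ins for whatever the company's schema actually looks like. "Quit" here just means the log ends mid-streak; in practice you would need a proper churn definition (e.g. no activity for N days before the data cutoff), which is an assumption left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical log record; the real schema will differ."""
    player_id: str
    timestamp: float                # assumed: seconds since epoch
    kind: str                       # assumed kinds: "level_start", "attempt_result", "purchase"
    level: Optional[int] = None
    success: Optional[bool] = None  # only meaningful for "attempt_result"


def streak_outcomes(events):
    """For every run of consecutive failed attempts, record how it ended:
    'win' (the next attempt passed), 'purchase' (the player bought something
    mid-streak) or 'quit' (the log ends while the player is still failing)."""
    by_player = defaultdict(list)
    for e in events:
        by_player[e.player_id].append(e)

    outcomes = []  # (player_id, streak_length, outcome)
    for pid, evs in by_player.items():
        evs.sort(key=lambda e: e.timestamp)
        streak = 0
        for e in evs:
            if e.kind == "attempt_result":
                if e.success:
                    if streak:
                        outcomes.append((pid, streak, "win"))
                    streak = 0
                else:
                    streak += 1
            elif e.kind == "purchase" and streak:
                # Assumption: a purchase ends the frustration streak.
                outcomes.append((pid, streak, "purchase"))
                streak = 0
        if streak:
            outcomes.append((pid, streak, "quit"))
    return outcomes


def outcome_rates(outcomes):
    """Share of each outcome, grouped by streak length."""
    counts = defaultdict(lambda: defaultdict(int))
    for _, length, outcome in outcomes:
        counts[length][outcome] += 1
    return {
        length: {o: c / sum(by_outcome.values()) for o, c in by_outcome.items()}
        for length, by_outcome in sorted(counts.items())
    }
```

Plotting the "purchase" and "quit" shares from `outcome_rates` against streak length would give a first, crude look at whether monetization and attrition rise as players get stuck.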
As I think about the problem, it occurs to me that even this relatively simple data supports an overwhelming number of possible metrics. Should we look at frequency of play, regularity of play, skillfulness...? Should we treat each player's play pattern as a time series? That seems interesting but infeasible to implement. Should the data be summarized into features (average time between sessions, average number of failed attempts before an extended period of play, etc.) and the players then be clustered? Would this be too computationally expensive to implement?
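For the feature-then-cluster route, a minimal sketch might look like the following. It turns each player into a small vector (mean gap between sessions, regularity as the coefficient of variation of those gaps, and failure rate), standardizes the columns, and runs plain k-means (Lloyd's algorithm). The feature choice, the function names and the deterministic farthest-first seeding are all assumptions for illustration; k-means++ is the more common default and is less sensitive to outliers. On cost: one k-means iteration is O(n·k·d) for n players, k clusters and d features, so summarized features are usually cheap to cluster, and at real scale you would likely reach for something like scikit-learn's `KMeans` or `MiniBatchKMeans` instead of this pure-Python loop.

```python
import math
from statistics import mean, pstdev


def player_features(session_starts, attempts):
    """Hypothetical per-player feature vector.

    session_starts: sorted session start times in seconds (assumed already
        derived from the raw event log).
    attempts: one bool per level attempt, True if the level was passed.
    Returns [mean gap between sessions (hours),
             coefficient of variation of those gaps (0 = perfectly regular),
             failure rate].
    """
    gaps = [(b - a) / 3600 for a, b in zip(session_starts, session_starts[1:])]
    mean_gap = mean(gaps) if gaps else 0.0
    cv_gap = pstdev(gaps) / mean_gap if len(gaps) > 1 and mean_gap > 0 else 0.0
    fail_rate = attempts.count(False) / len(attempts) if attempts else 0.0
    return [mean_gap, cv_gap, fail_rate]


def standardize(rows):
    """Z-score each column so no single unit (hours vs. rates) dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centre chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm; each iteration costs O(n * k * d)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each player joins the nearest centre.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # no player changed cluster: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels, centers
```

Standardizing first matters here: without it, the gap column (measured in hours) would swamp the failure rate (between 0 and 1) in the distance calculation.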
Has anyone seen research related to this? Where would you begin? I may be a little out of my depth here.