Diversity Measurement of Recommender Systems under Different User Choice Models

Imagine a recommender system that suggests, for instance, movies to users. How and what you recommend to users affects which movies they watch, and how users behave when presented with recommendations affects what the system will recommend in the future. In other words, depending on what is recommended and how users behave, the dynamics of the whole system can turn out very differently. We investigate the impact of these behaviours (on both the system and the user side), and we are particularly interested in how diversity changes over time.

The particular questions we addressed in our ICWSM paper were:

  • What happens if we force users to rate a certain number of items in a period of time (e.g., everyone rates 5 movies a week)? Such a restriction is an example of how the owner of the recommender can ‘influence’ the behaviour of users.
  • What is the effect of changing the type of information that a recommender gives to users? For example, a recommender can show the user either the most popular movies or the movies that best match the user’s preferences; a toy contrast of these two strategies is sketched after this list.
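As a rough illustration of the second question: the two strategies differ only in how candidate movies are scored. The sketch below is ours, not taken from the paper (the names `recommend`, `ratings` and `predicted` are hypothetical), and it assumes a users × movies rating matrix where 0 means ‘unrated’:

```python
import numpy as np

def recommend(ratings, predicted, strategy="popular", k=10):
    """Return the indices of k movies to show a user.

    ratings:   (num_users, num_movies) matrix of observed ratings, 0 = unrated.
    predicted: length num_movies vector of predicted ratings for this user.
    """
    if strategy == "popular":
        # Rank movies by how many users have rated them so far.
        scores = (ratings > 0).sum(axis=0)
    else:
        # Rank movies by this user's predicted ratings ("best match").
        scores = predicted
    return np.argsort(scores)[::-1][:k]
```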

We considered diversity defined in various ways. We investigated, for example, the entropy of rated movies, the overall variance of ratings, the variance of ratings per user, and so on. These diversity values were measured in simulations based on the Netflix dataset (hence the word ‘movies’ rather than the more general term ‘items’).
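One plausible reading of these measures, again assuming a users × movies matrix with 0 for ‘unrated’ (the paper’s exact definitions may differ):

```python
import numpy as np

def entropy_of_rated_movies(ratings):
    """Shannon entropy of how rating activity is spread over movies."""
    counts = (ratings > 0).sum(axis=0).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]  # skip movies nobody has rated
    return float(-(p * np.log2(p)).sum())

def overall_rating_variance(ratings):
    """Variance of all observed rating values."""
    return float(ratings[ratings > 0].var())

def mean_per_user_rating_variance(ratings):
    """Average, over users, of the variance of each user's own ratings."""
    per_user = [row[row > 0].var() for row in ratings if (row > 0).sum() > 1]
    return float(np.mean(per_user))
```

High entropy means rating activity is spread across many movies; low entropy means it concentrates on a few.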

The main idea behind the simulation was that the circular process described above can be broken down into ‘rounds’. In one round, which we took to be a month, a number of users rate a number of movies, and at the end of the month the recommender is retrained. How users react to recommendations was simulated using several choice models. For instance, in one simulation, users selected and rated exactly the movies that the system recommended to them (we called them ‘Yes-men’). In another, users accepted everything the system offered, but everyone had to rate the same number of movies (‘Uniform Yes-men’). At the other end of the scale were simulations with ‘Randomisers’, users who selected and rated movies at random.
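In code, one round might look like the loop below. This is a sketch of the idea, not the actual simulator: the `recommender.recommend`/`retrain` and `user.rate` interfaces are hypothetical stand-ins, and the quota `n=5` is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def movies_to_rate(model, recommended, catalogue, n=5):
    """Which movies a user rates this round, under one choice model."""
    if model == "yes_men":
        return recommended                 # rate exactly what was recommended
    if model == "uniform_yes_men":
        return recommended[:n]             # everyone rates the same number
    if model == "randomisers":
        return rng.choice(catalogue, size=n, replace=False)  # ignore the recommender
    raise ValueError(f"unknown choice model: {model}")

def run_round(users, recommender, catalogue, model):
    """One 'round' (a month): users rate movies, then the recommender retrains."""
    for user in users:
        recommended = recommender.recommend(user)
        for movie in movies_to_rate(model, recommended, catalogue):
            user.rate(movie)               # 'true rating' looked up in the dataset
    recommender.retrain(users)             # end-of-month retraining
```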

All simulations were tied to the Netflix dataset: the notion of a month came from the data, as did the list of movies available in each month, and ‘true ratings’ were calculated from the same dataset.

Figure: simulation outline (red arrows show a single round of the simulation)

As for the results, we found that forced uniformity in the number of items rated does not necessarily make users more uniform, and that the mean ratings they give to items decrease, indicating lower system performance as perceived by the user. We also identified how three kinds of choice models, i.e., Yes-men, Trend-followers (who always watch the currently highest-rated movies) and Randomisers, result in different diversity and mean rating values. This is a particularly important result, as these behaviours can be directly encouraged by recommender system owners: for example, we might decide to offer more trending items to one kind of user, while other users might need a different kind of recommendation.

Zoltán Szlávik, Wojtek Kowalczyk and Martijn Schut carried out this work while at the Computational Intelligence research group, VU University Amsterdam.

Paper: Z. Szlávik, W. Kowalczyk, and M.C. Schut. Diversity measurement of recommender systems under different user choice models. In ICWSM, 2011.
Slides of a talk based on the paper: http://prezi.com/1rrwhprwzokl/make-your-recommender-more-effective/