Sunday, October 30, 2011

does playlist order matter?

Does playlist order matter?  Many people seem to think so (see High Fidelity), but I remain unconvinced.  I was inspired to resolve this issue in my mind after seeing a poster at ISMIR 2011 evaluating several algorithms for automatic playlist generation.  It turns out that there's a free dataset containing around 30,000 actual user-created playlists, covering over 200,000 songs.

What does it mean for playlist order to matter?  There are a few simple things we could look for:

  1. Some pairs of songs should occur frequently in one order, but not the other.
  2. Some pairs of songs should occur close together (in either order) more frequently than predicted by chance alone.
There's a problem with looking at pairs of songs, though: the above dataset contains over 47 billion possible song transitions but only 30,000 playlists, each containing around 20 songs on average.  Possibly worse, only 9,627 of the 545,867 pairs of consecutive songs (1.8%) appear more than once.
To address this lack-of-common-pairs issue, we can think about modeling a playlist as a list of artists rather than a list of songs, as this should increase the amount of repetition.  Now, instead of counting the number of times Even Flow by Pearl Jam comes after Rusty Cage by Soundgarden, we count the number of times any Pearl Jam song comes after any Soundgarden song.  This ameliorates the problem somewhat, as there are 128,778 consecutive artist pairs (23.6%) that occur at least twice in the data set.
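
As a rough sketch of that counting step, here is one way to measure how much repetition there is at the song level versus the artist level.  The representation is an assumption (each playlist is a list of (artist, title) tuples), as are the function names, and I'm assuming "appear more than once" counts pair occurrences rather than distinct pairs.

```python
from collections import Counter

def to_artists(playlist):
    """Collapse a playlist of (artist, title) tuples to a list of artists."""
    return [artist for artist, _title in playlist]

def repeated_pair_stats(playlists):
    """Return (occurrences of consecutive pairs seen more than once, total occurrences).

    Assumption: a pair "repeats" if the same ordered pair of adjacent items
    shows up more than once anywhere in the data set.
    """
    counts = Counter(pair for pl in playlists for pair in zip(pl, pl[1:]))
    total = sum(counts.values())
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated, total

if __name__ == "__main__":
    playlists = [
        [("Soundgarden", "Rusty Cage"), ("Pearl Jam", "Even Flow")],
        [("Soundgarden", "Spoonman"), ("Pearl Jam", "Alive")],
    ]
    print(repeated_pair_stats(playlists))                           # song level
    print(repeated_pair_stats([to_artists(p) for p in playlists]))  # artist level
```

In the toy data, no song pair repeats, but the Soundgarden-then-Pearl Jam artist pair does, which is the effect the artist-level model is meant to produce.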

Now, let's address #1: are there pairs of artists for which one is more likely to come first?  Here's a simple way to test this:
  • For each pair of artists, compute the probability that they appear in one order (say, Pearl Jam then Soundgarden) when they appear consecutively.  Then find the pairs of artists for which this probability is highest.  (It's necessary to use something akin to Bayesian Rating here, to avoid pairs of artists that appear a few times in one direction and zero in the other.)

  • Randomly shuffle each playlist and recompute the probabilities, again looking at the highest-probability pairs.  Do this several times.  The highest probabilities you see will give you a good sense of what can happen based on chance alone, since there is obviously no meaningful order to the randomly shuffled playlists.

  • Are there any pairs of artists with ordering probability higher than anything you saw when using the randomly shuffled playlists?  If so, we can say this pair of artists is likely to appear in one order over the other with some degree of significance.
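
The three steps above can be sketched roughly as follows.  This is not the exact code behind the experiment: the smoothing formula (shrinking each estimate toward 0.5 with a hypothetical `prior_strength` pseudo-count) is just one simple stand-in for a Bayesian-rating-style correction, and the function names and the choice to skip same-artist neighbors are my assumptions for illustration.

```python
import random
from collections import Counter

def adjacent_order_counts(playlists):
    """Count how often artist a is immediately followed by a different artist b."""
    counts = Counter()
    for pl in playlists:
        for a, b in zip(pl, pl[1:]):
            if a != b:  # assumption: same-artist neighbors carry no ordering signal
                counts[(a, b)] += 1
    return counts

def max_order_score(playlists, prior_strength=10.0):
    """Highest smoothed probability that any pair appears in one particular order."""
    counts = adjacent_order_counts(playlists)
    best = 0.5
    for (a, b), n_ab in counts.items():
        n_ba = counts.get((b, a), 0)
        # Shrink toward 0.5 so that "3 times one way, 0 the other" does not win.
        score = (n_ab + 0.5 * prior_strength) / (n_ab + n_ba + prior_strength)
        best = max(best, score)
    return best

def shuffled_baseline(playlists, trials=20, seed=0, prior_strength=10.0):
    """Largest score seen over several random shuffles of every playlist."""
    rng = random.Random(seed)
    best = 0.5
    for _ in range(trials):
        shuffled = [rng.sample(pl, len(pl)) for pl in playlists]
        best = max(best, max_order_score(shuffled, prior_strength))
    return best
```

A pair would count as interesting only if its score on the real playlists exceeded `shuffled_baseline(...)` on the same data.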
It turns out that running this experiment yields no pair of artists that is significantly more likely to appear in one order than the other.  There's still a possibility of some asymmetry, however: maybe Pearl Jam is likely to appear two songs before Soundgarden, or three, etc.  To address this, we can re-run the above test, but without the requirement that the songs be adjacent.  That is, we can compute the probability that Pearl Jam occurs earlier than Soundgarden in a playlist where both appear.  When we do this, we find that there's still no pair of artists with a preferred ordering.
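
For the non-adjacent version, the only piece that changes is the counting.  Here is a minimal sketch, assuming that when an artist appears several times in a playlist we compare first appearances (a choice the description above leaves open), with a hypothetical function name:

```python
from collections import Counter
from itertools import combinations

def earlier_counts(playlists):
    """Count playlists in which artist a first appears before artist b."""
    counts = Counter()
    for pl in playlists:
        # Distinct artists, kept in order of first appearance.
        first_seen = list(dict.fromkeys(pl))
        # combinations() preserves input order, so a always precedes b here.
        for a, b in combinations(first_seen, 2):
            counts[(a, b)] += 1
    return counts
```

These counts can be smoothed and compared against shuffled playlists in the same way as the adjacent-pair counts.  The inner loop is quadratic in playlist length, which is fine for playlists of around 20 songs.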

Let's look at #2, then: are there pairs of artists that appear next to each other more than chance would predict?  We can adapt our above experiment for this case: just compute the probability that two artists appear consecutively (in either order) when they both show up in the same playlist, and do the same for the randomly shuffled playlists.  It turns out that now there are a few pairs of artists that tend to appear consecutively with some significance (keep in mind these playlists are from the early 2000s):
  • Ben Folds Five and Barenaked Ladies
  • Radiohead and Björk
  • Blink 182 and Weezer
  • Radiohead and Pink Floyd
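
The adjacency test behind this list could be sketched like this.  Note the swap: instead of the Bayesian-rating-style smoothing mentioned earlier, this sketch simply ignores pairs that co-occur in fewer than `min_cooccur` playlists, a hypothetical threshold chosen for brevity.  Artists are assumed to be strings.

```python
from collections import Counter
from itertools import combinations

def adjacency_rates(playlists, min_cooccur=5):
    """For each unordered artist pair, the fraction of shared playlists
    in which the two artists appear next to each other at least once."""
    together = Counter()  # playlists containing both artists
    adjacent = Counter()  # playlists where they are neighbors, in either order
    for pl in playlists:
        for pair in combinations(sorted(set(pl)), 2):
            together[pair] += 1
        touching = {tuple(sorted((a, b))) for a, b in zip(pl, pl[1:]) if a != b}
        for pair in touching:
            adjacent[pair] += 1
    return {
        pair: adjacent[pair] / n
        for pair, n in together.items()
        if n >= min_cooccur
    }
```

As before, a pair is only worth reporting if its rate on the real playlists beats the best rate seen across several shuffled copies of the data.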
So, I guess I will have to acknowledge that playlist ordering is something, and not nothing.