Tuesday, November 15, 2011

don't break the only way to do something

It is impossible to stay up-to-date with all of the blogs and news sites I read.  For infrequently updated sites with a high percentage of good stuff, RSS solves the problem.  For frequently updated sites with a lower percentage, it does not.  I don't want to sift through hundreds or thousands of articles manually each day.

Up until a few weeks ago, I knew of exactly one solution to this problem: "sort by magic" in Google Reader, which sorted the RSS entries in my feed by, well, magic.  Unfortunately, Google's much-maligned Reader update has killed the magic.  The "sort by magic" option is still there, but they should probably change its name to "sort by angry illiterate moose".  I used to see a nice mixture of posts from all of my followed sites, where posts from infrequently updated sites usually showed up near the top, along with only the "best" posts from frequently updated sites.  (I don't know what "best" means, but it seemed to do just fine.)  Now, I just see hundreds of posts from TechCrunch.  This is not magic.

So, now I have zero solutions to this problem.  I can no longer follow blogs like Marginal Revolution with several posts per day.  TechCrunch is out of the question.

Any suggestions?

Sunday, October 30, 2011

does playlist order matter?

Does playlist order matter?  Many people seem to think so (see High Fidelity), but I remain unconvinced.  I was inspired to resolve this issue in my mind after seeing a poster at ISMIR 2011 evaluating several algorithms for automatic playlist generation.  It turns out that there's a free dataset containing around 30,000 actual user-created playlists, covering over 200,000 songs.

What does it mean for playlist order to matter?  There are a few simple things we could look for:

  1. Some pairs of songs should occur frequently in one order, but not the other.
  2. Some pairs of songs should occur close together (in either order) more frequently than predicted by chance alone.
There's a problem with looking at pairs of songs, though: the above dataset contains over 47 billion possible song transitions and only 30,000 playlists, containing around 20 songs on average.  Possibly worse, only 9,627 of the 545,867 pairs of consecutive songs (1.8%) appear more than once.

To address this lack-of-common-pairs issue, we can think about modeling a playlist as a list of artists rather than a list of songs, as this should increase the amount of repetition.  Now, instead of counting the number of times "Even Flow" by Pearl Jam comes after "Rusty Cage" by Soundgarden, we count the number of times any Pearl Jam song comes after any Soundgarden song.  This ameliorates the problem somewhat, as there are 128,778 consecutive artist pairs (23.6%) that occur at least twice in the data set.
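The pair counting in the two paragraphs above can be sketched in a few lines of Python.  This is a minimal sketch on made-up toy playlists (the real dataset isn't reproduced here), and `repeat_stats` is a hypothetical helper.  Its `key` argument is what switches between song-level and artist-level counting.  It reports the number of distinct ordered pairs of adjacent tracks and how many of those distinct pairs occur more than once, which is one reasonable reading of the statistics quoted above.

```python
from collections import Counter

# Hypothetical toy data: each playlist is a list of (artist, song) tuples.
playlists = [
    [("Soundgarden", "Rusty Cage"), ("Pearl Jam", "Even Flow"), ("Nirvana", "Lithium")],
    [("Soundgarden", "Black Hole Sun"), ("Pearl Jam", "Alive"), ("Weezer", "Buddy Holly")],
    [("Soundgarden", "Rusty Cage"), ("Pearl Jam", "Even Flow")],
]


def repeat_stats(playlists, key):
    """Map every track through `key`, count adjacent ordered pairs, and return
    (number of distinct pairs, number of distinct pairs seen more than once)."""
    counts = Counter()
    for pl in playlists:
        items = [key(track) for track in pl]
        counts.update(zip(items, items[1:]))  # (previous, next) pairs
    repeated = sum(1 for c in counts.values() if c > 1)
    return len(counts), repeated


print(repeat_stats(playlists, key=lambda track: track))     # song level: (4, 1)
print(repeat_stats(playlists, key=lambda track: track[0]))  # artist level: (3, 1)
```

Even in this tiny example, collapsing songs to artists turns two one-off pairs ("Black Hole Sun" then "Alive", "Rusty Cage" then "Even Flow") into repeats of the same Soundgarden-then-Pearl Jam transition.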

Now, let's address #1: are there pairs of artists for which one is more likely to come first?  Here's a simple way to test this:
  • For each pair of artists, compute the probability that they appear in one order (say, Pearl Jam then Soundgarden) when they appear consecutively.  Then find the pairs of artists for which this probability is highest.  (It's necessary to use something akin to Bayesian Rating here, to avoid pairs of artists that appear a few times in one direction and zero in the other.)

  • Randomly shuffle each playlist and recompute the probabilities, again looking at the highest-probability pairs.  Do this several times.  The highest probabilities you see will give you a good sense of what can happen based on chance alone, since there is obviously no meaningful order to the randomly shuffled playlists.

  • Are there any pairs of artists with ordering probability higher than anything you saw when using the randomly shuffled playlists?  If so, we can say this pair of artists is likely to appear in one order over the other with some degree of significance.
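Here is one way the three steps above might look in code.  It's a sketch under several assumptions: playlists are already lists of artist names, back-to-back songs by the same artist are skipped, and "something akin to Bayesian Rating" is taken to mean shrinking toward 0.5 with C pseudo-observations, i.e. p = (max(n_AB, n_BA) + C/2) / (n_AB + n_BA + C), where n_AB counts A immediately followed by B.  The function names, the default C = 5, and the 20 shuffles are illustrative choices, not the values behind the results in this post.

```python
import random
from collections import Counter


def order_counts(playlists):
    """Count how often artist a is immediately followed by a different artist b."""
    counts = Counter()
    for pl in playlists:
        for a, b in zip(pl, pl[1:]):
            if a != b:  # same-artist repeats carry no ordering information
                counts[(a, b)] += 1
    return counts


def smoothed_order_probs(playlists, prior_weight=5.0):
    """For each unordered artist pair, estimate the probability of its more
    common order, shrunk toward 0.5 by `prior_weight` pseudo-observations.
    (An assumed Bayesian-average-style prior, not necessarily the post's.)"""
    counts = order_counts(playlists)
    probs = {}
    for (a, b), n_ab in counts.items():
        pair = tuple(sorted((a, b)))
        if pair in probs:
            continue  # already handled when (b, a) was visited
        n_ba = counts.get((b, a), 0)
        probs[pair] = (max(n_ab, n_ba) + 0.5 * prior_weight) / (n_ab + n_ba + prior_weight)
    return probs


def shuffled_max(playlists, trials=20, prior_weight=5.0, seed=0):
    """Highest smoothed ordering probability in each of `trials` shuffled copies."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        shuffled = [rng.sample(pl, len(pl)) for pl in playlists]
        probs = smoothed_order_probs(shuffled, prior_weight)
        maxima.append(max(probs.values(), default=0.5))
    return maxima


def significant_pairs(playlists, trials=20, prior_weight=5.0, seed=0):
    """Real pairs whose ordering probability beats every shuffled maximum."""
    threshold = max(shuffled_max(playlists, trials, prior_weight, seed))
    probs = smoothed_order_probs(playlists, prior_weight)
    return {pair: p for pair, p in probs.items() if p > threshold}
```

Using the maximum over all shuffled runs as the threshold is deliberately conservative: a real pair only counts if it beats the best that chance produced in any trial.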
It turns out that running this experiment yields no pair of artists that is significantly more likely to appear in one order than the other.  There's still a possibility of some asymmetry, however: maybe Pearl Jam is likely to appear two songs before Soundgarden, or three, etc.  To address this, we can re-run the above test, but without the requirement that the songs are adjacent.  That is, we can compute the probability that Pearl Jam occurs earlier than Soundgarden in a playlist where both appear.  When we do this, we find that there's still no pair of artists with a preferred ordering.
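Dropping the adjacency requirement only changes what gets counted.  A hedged sketch: it assumes an artist's position in a playlist is its first appearance (how repeated artists were actually handled isn't specified here), and it smooths toward 0.5 with C pseudo-observations, which is an assumed stand-in for the Bayesian-Rating-style correction.  The names `earlier_counts` and `earlier_probability` are hypothetical.  The shuffling baseline described above can be applied to these probabilities unchanged.

```python
from collections import Counter
from itertools import combinations


def earlier_counts(playlists):
    """For each ordered artist pair (a, b), count the playlists containing both
    in which a's first appearance comes before b's first appearance."""
    counts = Counter()
    for pl in playlists:
        first = {}
        for pos, artist in enumerate(pl):
            first.setdefault(artist, pos)  # keep only the first position
        # Sorting by first appearance makes every combination (earlier, later).
        ordered = sorted(first, key=first.get)
        for a, b in combinations(ordered, 2):
            counts[(a, b)] += 1
    return counts


def earlier_probability(counts, a, b, prior_weight=5.0):
    """Smoothed estimate of P(a comes before b | both appear in a playlist)."""
    n_ab, n_ba = counts[(a, b)], counts[(b, a)]  # Counter returns 0 if missing
    return (n_ab + 0.5 * prior_weight) / (n_ab + n_ba + prior_weight)
```

For example, with the toy playlists `[["Pearl Jam", "X", "Soundgarden"], ["Pearl Jam", "Soundgarden"], ["Soundgarden", "Y", "Pearl Jam"]]`, Pearl Jam comes first in two of the three, so the unsmoothed estimate (`prior_weight=0`) is 2/3.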

Let's look at #2, then: are there pairs of artists that appear next to each other more than chance would predict?  We can adapt our above experiment for this case: just compute the probability that two artists appear consecutively (in either order) when they both show up in the same playlist, and do the same for the randomly shuffled playlists.  It turns out that now there are a few pairs of artists that tend to appear consecutively with some significance (keep in mind these playlists are from the early 2000s):
  • Ben Folds Five and Barenaked Ladies
  • Radiohead and Björk
  • Blink 182 and Weezer
  • Radiohead and Pink Floyd
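Test #2 can be sketched in the same spirit.  This is a minimal Python sketch, not the code behind the list above: for each pair of artists it counts the playlists containing both and the playlists where they sit next to each other (in either order), then smooths that ratio toward the overall adjacency rate, which is how a Bayesian average usually picks its prior.  The helper names, C = 5, and the number of shuffles are assumptions.

```python
import random
from collections import Counter
from itertools import combinations


def adjacency_rates(playlists, prior_weight=5.0):
    """Smoothed P(two artists are adjacent somewhere | both are in the playlist)."""
    together, adjacent = Counter(), Counter()
    for pl in playlists:
        # Every unordered pair of distinct artists in this playlist co-occurs once.
        for pair in combinations(sorted(set(pl)), 2):
            together[pair] += 1
        # Pairs that sit next to each other at least once, in either order.
        for pair in {tuple(sorted((a, b))) for a, b in zip(pl, pl[1:]) if a != b}:
            adjacent[pair] += 1
    if not together:
        return {}
    # The overall adjacency rate is the prior mean, as in a Bayesian average.
    prior = sum(adjacent.values()) / sum(together.values())
    return {
        pair: (adjacent[pair] + prior_weight * prior) / (n + prior_weight)
        for pair, n in together.items()
    }


def shuffle_threshold(playlists, trials=20, prior_weight=5.0, seed=0):
    """Highest smoothed adjacency rate seen across shuffled copies of the data."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        shuffled = [rng.sample(pl, len(pl)) for pl in playlists]
        best = max(best, max(adjacency_rates(shuffled, prior_weight).values(), default=0.0))
    return best


def adjacent_pairs(playlists, trials=20, prior_weight=5.0, seed=0):
    """Artist pairs that are adjacent more often than any shuffled run suggests."""
    threshold = shuffle_threshold(playlists, trials, prior_weight, seed)
    rates = adjacency_rates(playlists, prior_weight)
    return sorted((p for p in rates if rates[p] > threshold), key=rates.get, reverse=True)
```

Counting each playlist at most once per pair keeps the ratio between 0 and 1, so a pair that happens to be repeated many times within one long playlist can't dominate.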
So, I guess I will have to acknowledge that playlist ordering is something, and not nothing.

Monday, January 24, 2011

reverse lottery

Of course you know about the lottery, where you can pay some small amount of money for a low probability of winning a large amount of money.  And lots of businesses have promotions along the same lines: e.g., if you buy our slightly overpriced french fries, you might win a yacht.

However, I have never seen the reverse: we're giving out free french fries, except there's a small chance you'll have to pay us $1000.  Now, a potential customer might not have $1000, so let's lower the values here to get a simpler and fairer comparison between two potential offerings:

  1. French fries cost $1.25, but one out of every five customers gets his for free.
  2. French fries are free, but one out of every five customers has to pay a $5.00 fine.
There must be a reason why businesses never offer the second type of deal.  Is it just logistical?  Or is it that people would hate this, and a 1 in 5 chance of losing $5 feels worse than just giving away $1.25?  It seems that this sort of deal would not be too difficult to arrange, so I have to imagine that businesses don't offer it because people don't like it.
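It helps to do the arithmetic on the two deals.  Both cost the average customer exactly $1.00 (0.8 × $1.25 in the first case, 0.2 × $5.00 in the second); what differs is the spread.  A small Python check with exact fractions, using variance as the measure of how risky each deal is (variance is just one possible measure of how risky a deal feels):

```python
from fractions import Fraction


def cost_stats(outcomes):
    """Mean and variance of a customer's payment, given (probability, payment) pairs."""
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var


# Deal 1: fries cost $1.25, but 1 in 5 customers gets them free.
deal1 = [(Fraction(4, 5), Fraction(5, 4)), (Fraction(1, 5), Fraction(0))]
# Deal 2: fries are free, but 1 in 5 customers pays a $5.00 fine.
deal2 = [(Fraction(4, 5), Fraction(0)), (Fraction(1, 5), Fraction(5))]

print(cost_stats(deal1))  # (Fraction(1, 1), Fraction(1, 4))
print(cost_stats(deal2))  # (Fraction(1, 1), Fraction(4, 1))
```

So the two deals have identical expected cost, but the standard deviation of what a customer pays is $0.50 for the first and $2.00 for the second.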

This brings me to my main point: a lot of law enforcement works like this "reverse lottery", and I think we're underestimating just how much people hate this.  Our goal should be punishments that exactly offset the value of the crime, and a 100% chance of getting caught.  This makes enforcement prohibitively costly, but serves as an ideal.  (The other extreme, where you get away with criminal behavior almost all of the time, but if you get caught you and your family are tortured for 20 years then killed, sounds awful to anyone.)  Certainty itself has value, and this needs to be traded off against the cost of enforcement.

Disclaimer: I don't know anything about crime statistics, and I would hope there are experts who do know what happens as you vary the level of punishment and probability of getting caught.  I'm betting that, while keeping the level of punishment times the probability of getting caught constant, good things happen as punishment goes down and the probability of getting caught goes up, and that the only reason not to move further in this direction is the cost of patrol.
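The bet above can be stated a little more precisely.  As a sketch, let p be the probability of getting caught, F the punishment, and X the penalty actually paid (F with probability p, nothing otherwise), and hold the expected penalty fixed at pF = D, for instance the value of the crime.  Treating the variance of X as a rough measure of uncertainty (an assumption, not a result from the crime literature):

```latex
\begin{aligned}
\mathbb{E}[X] &= pF = D, \\
\operatorname{Var}(X) &= pF^2 - (pF)^2 = p(1-p)F^2 = D^2\,\frac{1-p}{p}.
\end{aligned}
```

The variance shrinks to zero as p approaches 1, which is the "pay for the damage you do" ideal.  With D = $1, p = 0.8 and p = 0.2 reproduce the two french-fry deals (F = $1.25 and F = $5.00, with variances of 0.25 and 4).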

This blog post arose from a discussion with Alex Jaffe.

(Another way to look at things is that in my ideal scenario, there is no crime — you simply pay for the damage you do.)