Sunday, November 01, 2009

PageRank for images

I was just watching a talk by Peter Norvig in which he mentions some research done at Google (paper here) on applying PageRank to image search. The idea is that instead of connections based on hyperlinks, connections between images are based on visual similarity between the two images, as computed via SIFT or some other descriptor. Then, the VisualRank of an image is some estimate of its centrality in the underlying collection of images.

I don't think this is the right way to think about image search, for a few reasons. First, and most simply, a hyperlink between two web pages has different meaning from a pair of similar images. The PageRank algorithm for web pages has an intuitive explanation: imagining a simplified web surfer randomly following web links, what is the stationary distribution over pages visited by this surfer? (Of course, it's slightly more complicated than that.) VisualRank has no such intuitive model -- instead of following web links, a surfer would have to randomly jump from one image to another similar image. This doesn't seem related to actual image browsing, and is nearly impossible on the current web as links between visually similar images aren't manifested in any way.

One could argue that it's not clear that the PageRank browsing model is the right way to think about web search either, and I would imagine much research has been done on alternative models. In essence, what is captured by both PageRank and VisualRank is a measure of centrality. I want to argue that while centrality may be a reasonable criterion for ranking web search results, it is less reasonable for ranking image search results.

In PageRank, hyperlinks are directed. I can link to CNN (and just did) without them linking back to me. This is not true of image similarities in VisualRank, which are undirected. What are the implications of this? Well, if I link to CNN and they don't link back, it is entirely possible and likely that our PageRanks will be very different. However, if two images are visually similar, they are likely to have similar VisualRanks. (This is caused not only by the symmetry of visual similarity, but also its near-transitivity -- if image A is similar to B and B is similar to C, then it is likely that A is also similar to C. One can imagine a contrived image collection in which this is not the case, but in a quick test it seemed to hold about 75% of the time. Visual similarity is much more like an equivalence relation than hyperlinking is.)

This is all fine if you want to find the single best image. But, if you want to find the best k images, you're going to get a bunch of near-duplicates of the highest ranked image, or at least a set that exhibits high redundancy. As an example, I computed the VisualRank of every image in a collection of a few thousand images of the city of Dubrovnik. Here are the images with the top 5 VisualRanks:


In contrast, and to illustrate that this isn't the only photo people take in Dubrovnik, here are the top 5 images selected by a simple summarization algorithm that also aims, indirectly, to avoid redundancy (disclaimer: algorithm is mine):


I do think centrality is a useful measurement on image collections. Alone, however, it is not a good way to order image search results unless you want homogeneity. This is less of a problem in web search in that there's nothing inherently wrong with the top set of results linking to one another. Sure, they may suffer from homogeneity of opinion to some degree, but the content of each result is different. With connections based on visual similarity, linkage among the top ranked images means some of them are providing no new information at all.