Planet Zorgnine: November 2009

How is it that everyone has so much to say they have trouble deciding what not to say? I'm thinking of this quote, attributed to Blaise Pascal:

"I made this letter longer than usual because I lack the time to make it shorter."

This gets brought up in the context of presentations, papers, conversations, code, art, anything that requires some amount of creativity. The hard part, as the story goes, is condensing the huge number of things you'd like to say into a smaller coherent form, often to satisfy some external constraint on time or space. A person of this mindset must think: left to my own devices, I would emit an endless stream of wisdom, but alas, others are unwilling or unable to process my brilliance in its entirety. All of my effort must therefore go toward eliminating all but the very best of what I have to offer. And these are gems I'm discarding.

Never once in my life have I had this experience. Coming up with enough to say has always been a struggle for me. While other people are cutting stuff out left and right, I'm trying to come up with just enough to avoid outright embarrassment. Am I just not creative? Not prolific? Or is my threshold for what is worth saying much higher? I wouldn't say it's that high, since I'm blogging about this right now.

Incidentally, So Much to Say is possibly the only Dave Matthews Band song I like.

I was just watching a talk by Peter Norvig in which he mentions some research done at Google (paper here) on applying PageRank to image search. The idea is that instead of connections based on hyperlinks, connections between images are based on visual similarity between the two images, as computed via SIFT or some other descriptor. Then, the VisualRank of an image is some estimate of its centrality in the underlying collection of images.

I don't think this is the right way to think about image search, for a few reasons. First, and most simply, a hyperlink between two web pages means something different from a pair of similar images. The PageRank algorithm for web pages has an intuitive explanation: imagining a simplified web surfer randomly following web links, what is the stationary distribution over pages visited by this surfer? (Of course, it's slightly more complicated than that.) VisualRank has no such intuitive model -- instead of following web links, a surfer would have to randomly jump from one image to another similar image. This doesn't seem related to actual image browsing, and is nearly impossible on the current web, where nothing resembling a link connects visually similar images.
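For readers who haven't seen the random-surfer model written down, here is a minimal sketch of it in Python. The tiny link graph and its page names are made up for illustration, and the damping factor of 0.85 is the commonly quoted default rather than anything taken from the talk or the paper. The damping term is the "slightly more complicated" part: with some probability the surfer gives up on links and jumps to a random page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for the random-surfer model on a directed link graph.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a blog links to a big news site,
# which never links back.
links = {
    "blog": ["cnn"],
    "cnn": ["news_a", "news_b"],
    "news_a": ["cnn"],
    "news_b": ["cnn"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:8s} {ranks[page]:.4f}")
```

Because nothing links to the blog, the only rank it receives is the random-jump share, while the news site collects rank from every other page.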

One could argue that the PageRank browsing model isn't clearly the right way to think about web search either, and I would imagine much research has been done on alternative models. In essence, what is captured by both PageRank and VisualRank is a measure of centrality. I want to argue that while centrality may be a reasonable criterion for ranking web search results, it is less reasonable for ranking image search results.

In PageRank, hyperlinks are directed. I can link to CNN (and just did) without them linking back to me. This is not true of image similarities in VisualRank, which are undirected. What are the implications of this? Well, if I link to CNN and they don't link back, it is entirely possible and likely that our PageRanks will be very different. However, if two images are visually similar, they are likely to have similar VisualRanks. (This is caused not only by the symmetry of visual similarity, but also its near-transitivity -- if image A is similar to B and B is similar to C, then it is likely that A is also similar to C. One can imagine a contrived image collection in which this is not the case, but in a quick test it seemed to hold about 75% of the time. Visual similarity is much more like an equivalence relation than hyperlinking is.)
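Here is a rough sketch of the symmetric case, assuming a VisualRank-style computation that is just the same power iteration run on a column-normalized similarity matrix. The similarity values are invented for illustration (a real system would derive them from SIFT matches or some other descriptor), and the function name is mine.

```python
def visualrank(similarity, damping=0.85, iterations=100):
    """Damped power iteration on a symmetric image-similarity matrix."""
    n = len(similarity)
    # Normalize each column so it acts as a transition distribution.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1.0 - damping) / n
            + damping * sum(similarity[i][j] / col_sums[j] * rank[j] for j in range(n))
            for i in range(n)
        ]
    return rank


# Made-up similarities: images 0-2 are near-duplicates of one another,
# image 3 is loosely related to them, and image 4 is mostly unrelated.
similarity = [
    [0.0, 0.9, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.9, 0.3, 0.1],
    [0.9, 0.9, 0.0, 0.3, 0.1],
    [0.3, 0.3, 0.3, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]
ranks = visualrank(similarity)
for index, value in enumerate(ranks):
    print(index, round(value, 4))
```

In this toy collection the three near-duplicates end up with essentially identical ranks, and those are also the three highest ranks. There is no analogue of the one-way link to CNN that would let two similar images drift apart.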

This is all fine if you want to find the single best image. But if you want to find the best k images, you're going to get a bunch of near-duplicates of the highest ranked image, or at least a set that exhibits high redundancy. As an example, I computed the VisualRank of every image in a collection of a few thousand images of the city of Dubrovnik. Here are the images with the top 5 VisualRanks:

In contrast, and to illustrate that this isn't the only photo people take in Dubrovnik, here are the top 5 images selected by a simple summarization algorithm that also aims, indirectly, to avoid redundancy (disclaimer: algorithm is mine):

I do think centrality is a useful measurement on image collections. Alone, however, it is not a good way to order image search results unless you want homogeneity. This is less of a problem in web search in that there's nothing inherently wrong with the top set of results linking to one another. Sure, they may suffer from homogeneity of opinion to some degree, but the content of each result is different. With connections based on visual similarity, linkage among the top ranked images means some of them are providing no new information at all.
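To make the redundancy point concrete, here is a small sketch of one generic way to trade centrality against redundancy: a greedy re-ranking that penalizes each candidate by its similarity to the images already chosen. This is not the summarization algorithm mentioned above, just a stand-in for illustration, and the scores, similarities, and penalty weight are all made up.

```python
def diverse_top_k(scores, similarity, k, penalty=0.5):
    """Greedily pick k items, discounting similarity to items already picked."""
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def adjusted(i):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return scores[i] - penalty * redundancy

        best = max(sorted(candidates), key=adjusted)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical centrality scores: images 0-2 are near-duplicates and score highest.
scores = [0.30, 0.29, 0.28, 0.09, 0.04]
similarity = [
    [0.0, 0.9, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.9, 0.3, 0.1],
    [0.9, 0.9, 0.0, 0.3, 0.1],
    [0.3, 0.3, 0.3, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]

plain_top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]
diverse_top = diverse_top_k(scores, similarity, k=3)
print(plain_top)    # [0, 1, 2]: three near-duplicates
print(diverse_top)  # [0, 4, 3]: one representative, then dissimilar images
```

Ranking by centrality alone returns the three near-duplicates. The penalized version keeps the most central image and then fills the remaining slots with images that add something new.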


Monday, November 16, 2009

so much to say

Sunday, November 01, 2009

PageRank for images
