ECCV 2008 happened over a month ago, but it's not too late for me to post a summary of some of my favorite papers from the conference, as well as my own paper. Let's start with my paper:
Scene Segmentation Using the Wisdom of Crowds
Ian Simon and Steven M. Seitz
There are many cues one could use when segmenting images, such as color, edges, and recognized objects. Here we ignore all of these cues and segment 3D scenes based on the distribution of photos taken at the scenes (downloaded from Flickr). The basic idea is that people do not take photos simply by pointing the camera randomly, but take pictures "of" interesting objects. We effectively treat each photo as a vote that all of the scene points appearing in that photo belong to the same object. Of course, this is not precisely true for most photographs, but by combining information from multiple photographers, we can get accurate 3D segmentations.
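To make the voting idea concrete, here is a toy sketch in Python. It is not the algorithm from the paper: it only counts how many photos contain each pair of scene points, then merges pairs that receive enough votes. The input format (each photo as a set of point IDs), the names `cooccurrence_votes` and `group_points`, and the `min_votes` threshold are all assumptions made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_votes(photos):
    """Count, for each pair of scene points, how many photos contain both."""
    votes = Counter()
    for visible_points in photos:
        for a, b in combinations(sorted(set(visible_points)), 2):
            votes[(a, b)] += 1
    return votes


def group_points(photos, min_votes):
    """Merge points that co-occur in at least min_votes photos (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every point, including ones that never get merged.
    for visible_points in photos:
        for p in visible_points:
            find(p)

    for (a, b), n in cooccurrence_votes(photos).items():
        if n >= min_votes:
            parent[find(a)] = find(b)

    groups = {}
    for p in list(parent):
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: six photos over five scene points.
photos = [{1, 2, 3}, {1, 2}, {2, 3}, {4, 5}, {4, 5}, {3, 4}]
segments = group_points(photos, min_votes=2)
```

A single photo that happens to span two objects (the `{3, 4}` photo here) gets outvoted, which is the point of pooling many photographers.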
Here are some other papers I liked:
Learning to Localize Objects with Structured Output Regression
Matthew B. Blaschko and Christoph H. Lampert
Object localization is usually done by training a classifier on positive and negative image regions, then running this classifier in sliding-window fashion on a new image. This paper proposes training directly for the localization task using a structured SVM.
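One way to see what "training directly for localization" means: the output is a bounding box rather than a yes/no label, so the loss has to compare boxes. A natural choice, and as far as I recall the one used in the paper, is one minus the area overlap between the predicted and true boxes. The sketch below is only an illustration of that loss and of prediction as an argmax over boxes; the box format `(left, top, right, bottom)`, the function names, and the `score` callback are assumptions, and the training procedure itself is not shown.

```python
def box_area(box):
    """Area of a (left, top, right, bottom) box; empty boxes have area 0."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def overlap_loss(box_a, box_b):
    """1 - intersection/union: 0 for identical boxes, 1 for disjoint ones."""
    inter = box_area((
        max(box_a[0], box_b[0]),
        max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]),
    ))
    union = box_area(box_a) + box_area(box_b) - inter
    return 1.0 - inter / union if union > 0 else 1.0


def localize(score, candidate_boxes):
    """Prediction: the candidate box with the highest score.

    `score` stands in for a learned scoring function (hypothetical here).
    """
    return max(candidate_boxes, key=score)
```

With a sliding-window classifier, a box that covers half the object and a box that misses it entirely are both just "negatives"; a loss like this one lets training distinguish near misses from complete misses.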
Integration of Multiview Stereo and Silhouettes via Convex Functionals on Convex Domains
Kalin Kolev and Daniel Cremers
Several previous papers have tried to combine photoconsistency and silhouettes. The key insight here is a formulation of silhouette constraints over a voxel grid that yields a simple convex relaxation. There's no proof of a meaningful performance guarantee relative to the optimal solution of the discrete problem, but it's still cleaner than any of the other papers I've seen that address silhouettes in multiview stereo.
Image Segmentation by Branch-and-Mincut
Victor Lempitsky, Andrew Blake, and Carsten Rother
Suppose you're trying to segment a particular object with unknown pose in an image. For a fixed pose, the problem can be solved with a graph cut. This paper describes a branch-and-bound search through a tree of hierarchically clustered poses for the optimal pose and segmentation. The important observation is that a lower bound on the energy of the best solution in a particular subtree can be computed with a single graph cut.
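The search structure can be sketched as a generic best-first branch-and-bound. This is only the skeleton: in the paper the bound for a subtree of poses comes from a graph cut, whereas here `lower_bound` is a made-up closed-form bound on a toy 1D "pose" energy, and `children`, `toy_energy`, and the range-based node representation are all assumptions for the example.

```python
import heapq
from itertools import count


def branch_and_bound(root, children, lower_bound):
    """Best-first search. Assumes lower_bound is exact at leaf nodes."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    heap = [(lower_bound(root), next(tie), root)]
    while heap:
        bound, _, node = heapq.heappop(heap)
        kids = children(node)
        if not kids:
            # A leaf popped first is optimal: every other node's bound
            # is at least as large as this leaf's exact energy.
            return node, bound
        for kid in kids:
            heapq.heappush(heap, (lower_bound(kid), next(tie), kid))
    return None, float("inf")


# Toy problem: poses are the integers 0..15, nodes are half-open ranges.
def toy_energy(pose):
    return (pose - 11) ** 2 + 3


def children(node):
    lo, hi = node
    if hi - lo <= 1:
        return []  # single pose: leaf
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def lower_bound(node):
    lo, hi = node
    nearest = min(max(11, lo), hi - 1)  # pose in range closest to the minimum
    return toy_energy(nearest)


best_node, best_energy = branch_and_bound((0, 16), children, lower_bound)
```

The search only expands subtrees whose bound is competitive, so most poses are never evaluated individually.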
What is a Good Image Segment? A Unified Approach to Segment Extraction
Shai Bagon, Oren Boiman, and Michal Irani
This paper proposes a simple criterion for segmentation: a segment should be easily composable using its own pieces, but difficult to compose from pieces outside the segment. The algorithm implied by this criterion sacrifices speed for elegance, but I think there is value in figuring out the right thing to optimize, even if actually optimizing it proves impractical. Of course, it's not clear that composability is the right thing, and it would be interesting to compare against human segmentations.