Thursday, February 28, 2008

medoids

Suppose I have a set of points. The mean of this set of points is the point (not necessarily in the set) which minimizes the sum of its squared distances to all the points in the set. A related concept is the medoid, which is like the mean but is restricted to being one of the points in the set.

Interesting fact: the sum of squared distances from the medoid to all points in the set cannot be more than twice the sum of squared distances from the mean to all points in the set.

This imples that for the k-medoids problem, if you have an algorithm that achieves some constant times the optimal solution cost, this same algorithm also achieves a constant factor approximation for the k-means problem.

Again, this is probably trivial if you're a theory person, but I think it's fun.