We’ve been working on several products that show rich visual previews of third-party content. One example is Shareaholic Recommendations, which shows thumbnails of related pages at the bottom of a publisher’s webpage.
Finding the image that best represents a webpage is no trivial task. Our research revealed that little has been written on this topic, and that the state-of-the-art systems that do exist (e.g. Facebook’s thumbnail selection algorithm) are not all that sophisticated. We quickly realized that we could do better by rolling our own. Here’s how we did it.