The objective is to predict the presence and bounding box of human heads in a dataset depicting extremely variable human poses and viewpoints in complex backgrounds.


We can consider two main approaches:

The content-based approach:

  • State-of-the-art face detectors are good starting candidates for generating head hypotheses, but they still produce too many false positives and do not work for all viewpoints.
  • Head detectors offer a poor trade-off between sensitivity and specificity.

The context-based approach:

  • State-of-the-art body and body-part detectors can generate good hypotheses for head locations and scales.


Our method integrates several context-based and content-based methods through a discriminatively trained voting scheme.
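
To make the idea of a discriminatively trained vote concrete, here is a minimal sketch. It is not the submitted system. It assumes each candidate head box carries one confidence score per source (a face detector, a head detector, and a body-based context cue), and that the vote weights are fitted by logistic regression. The learner, the three score names, and the toy numbers are all illustrative assumptions and do not come from this document.

```python
import math

# Hypothetical toy data: one row per candidate head box, with columns
# [face_score, head_score, body_context_score], each in [0, 1].
# Labels are 1 for a true head and 0 for a false hypothesis.
TRAIN_X = [
    [0.9, 0.8, 0.7],  # frontal head, all sources agree
    [0.1, 0.6, 0.9],  # back view: face detector fails, body context supports
    [0.8, 0.2, 0.1],  # face-like background pattern, no body support
    [0.2, 0.7, 0.1],  # head-detector false positive
    [0.0, 0.5, 0.8],  # occluded face, good body context
    [0.7, 0.6, 0.0],  # content cues fire, context disagrees
]
TRAIN_Y = [1, 1, 0, 0, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def vote(weights, bias, scores):
    """Combine per-source scores into one head confidence in (0, 1)."""
    return sigmoid(sum(w * s for w, s in zip(weights, scores)) + bias)


def train_vote_weights(features, labels, lr=0.5, epochs=2000):
    """Fit one weight per source by batch gradient descent on the
    logistic loss (an assumed choice of discriminative learner)."""
    n = len(features)
    n_sources = len(features[0])
    weights = [0.0] * n_sources
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_sources
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = vote(weights, bias, x) - y  # prediction minus label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    w, b = train_vote_weights(TRAIN_X, TRAIN_Y)
    print("learned weights:", w, "bias:", b)
    print("back-view candidate:", vote(w, b, [0.1, 0.5, 0.9]))
    print("background candidate:", vote(w, b, [0.9, 0.5, 0.0]))
```

In this toy setting the learned weights let the context cue rescue heads that the face detector misses and suppress face-like background detections, which is the behaviour the combination of content-based and context-based cues aims for.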


This document describes part of the work presented to the Person Layout Taster Competition of the PASCAL Visual Object Classes Challenge 2010 (VOC2010). Our team placed first in this challenging competition.