Abstract
Visual tracking with explicit occlusion models is computationally hard: the complexity grows exponentially with the number of targets. The recently proposed Hybrid Joint-Separable (HJS) model enables tracking the local appearance of several bodies through occlusions with an upper bound on complexity that is quadratic, rather than exponential, in the number of targets. In this paper we extend that method to account for a larger spectrum of visual interactions, captured by a full-image likelihood that enables true Bayesian inference, without compromising scalability. The resulting tracker proves significantly more robust and is able to resolve long-term occlusions among five people aligned on a single line of sight, observed from a single camera, at a manageable computational cost. ©2009 IEEE.