Abstract
Automatic analysis of interactive human behavior is an emerging field in which significant research efforts of the audio and image processing communities converge. In this paper we present a particle filter that jointly tracks the positions of multiple people, their head orientations, and their speaking activity based on audio-visual cues. These cues are integrated with a novel fusion technique that takes into account the spatial distribution of the sensing infrastructure. The resulting system provides real-time information about people's behavior and activities that can be used to boost the awareness of technology-assisted working and living environments. © EURASIP, 2010.