Multi-Speaker Tracking from an Audio-Visual Sensing Device

Xinyuan Qian; Alessio Brutti; Oswald Lanz; Maurizio Omologo; Andrea Cavallaro

doi:10.1109/TMM.2019.2902489

Journal article

Peer reviewed

IEEE Transactions on Multimedia, Vol.21(10), pp.2576-2588

2019

DOI: https://doi.org/10.1109/TMM.2019.2902489

Handle:

https://hdl.handle.net/10863/19024

Keywords: Audio-visual fusion; Likelihood; Co-located sensors; Particle filter; 3-D target tracking

Abstract

Compact multi-sensor platforms are portable and thus desirable for robotics and personal-assistance tasks. However, compared to physically distributed sensors, the size of these platforms makes person tracking more difficult. To address this challenge, we propose a novel 3-D audio-visual people tracker that exploits visual observations (object detections) to guide the acoustic processing by constraining the acoustic likelihood on the horizontal plane defined by the predicted height of a speaker. This solution allows the tracker to estimate, with a small microphone array, the distance of a sound. Moreover, we apply a color-based visual likelihood on the image plane to compensate for misdetections. Finally, we use a 3-D particle filter and greedy data association to combine visual observations, color-based, and acoustic likelihoods to track the position of multiple simultaneous speakers. We compare the proposed multimodal 3-D tracker against two state-of-the-art methods on the AV16.3 dataset and on a newly collected dataset with co-located sensors, which we make available to the research community. Experimental results show that our multimodal approach outperforms the other methods both in 3-D and on the image plane.
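The abstract mentions greedy data association but does not give its details. The following is only a minimal sketch of the general technique, not the authors' implementation: it assumes Euclidean distance in 3-D as the matching cost and a fixed gating threshold, and the names `greedy_associate`, `tracks`, `detections`, and `gate` are hypothetical.

```python
import math


def greedy_associate(tracks, detections, gate=1.0):
    """Greedily pair tracks with detections by increasing Euclidean distance.

    tracks, detections: lists of (x, y, z) positions (hypothetical inputs).
    gate: maximum distance for a valid pairing (assumed value, in metres).
    Returns a dict mapping track index -> detection index.
    """
    # Score every candidate pair that falls inside the gate.
    pairs = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            dist = math.dist(t, d)
            if dist <= gate:
                pairs.append((dist, ti, di))

    # Commit the closest pairs first; each track and detection is used once.
    pairs.sort()
    assignment = {}
    used_detections = set()
    for dist, ti, di in pairs:
        if ti in assignment or di in used_detections:
            continue
        assignment[ti] = di
        used_detections.add(di)
    return assignment


# Two predicted speaker positions and three detections (made-up values).
tracks = [(0.0, 0.0, 1.6), (2.0, 1.0, 1.7)]
detections = [(2.1, 1.0, 1.7), (0.1, 0.1, 1.6), (5.0, 5.0, 1.5)]
result = greedy_associate(tracks, detections)
print(result)
```

In this toy example the third detection lies outside the gate of both tracks and stays unassigned. A greedy scheme of this kind is simple and fast but, unlike an optimal assignment solver, it does not guarantee the minimum total cost.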

Files and links (1)

URL: https://ieeexplore.ieee.org/document/8656587

Details

Title: Multi-Speaker Tracking from an Audio-Visual Sensing Device
Creators: Xinyuan Qian; Alessio Brutti; Oswald Lanz; Maurizio Omologo; Andrea Cavallaro
Publication Details: IEEE Transactions on Multimedia, Vol.21(10), pp.2576-2588
ISSN: 1520-9210
EISSN: 1941-0077
Series / Volume: 21
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 13
Identifiers: (UNIBZ)42814652; 991006161807601241
Web of Science ID: 000489728400012
Scopus ID: 2-s2.0-85072752246
Academic Unit: Faculty of Computer Science
Language: English
Resource Type: Journal article
Author Names String: Qian X, Brutti A, Lanz O, Omologo M, Cavallaro A
