Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter

Joint Authors

Chen, Yang
Yang, Bing
Li, Yidi
Ding, Runwei
Liu, Hong

Source

Complexity

Issue

Vol. 2020, Issue 2020 (31 Dec. 2020), pp.1-8, 8 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2020-08-31

Country of Publication

Egypt

No. of Pages

8

Main Subjects

Philosophy

Abstract EN

For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution.

The main challenge lies in constructing a stable observation model.

To this end, we propose a 3D audio-visual speaker tracker, assisted by deep metric learning and built on a two-layer particle filter framework.

First, an audio-guided motion model is applied to generate candidate samples in a hierarchical structure consisting of an audio layer and a visual layer.

Then, a stable observation model is proposed with a designed Siamese network, which provides the similarity-based likelihood to calculate particle weights.

The speaker position is estimated using an optimal particle set, which integrates the decisions from audio particles and visual particles.
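
The weighting and fusion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax-style mapping from similarity to weight, the `temperature` parameter, the top-k selection rule for the "optimal particle set", and the function names `similarity_weights` and `fuse_layers` are all assumptions made for illustration. The abstract states only that a Siamese network supplies a similarity-based likelihood and that the estimate integrates audio and visual particles.

```python
import math


def similarity_weights(similarities, temperature=0.1):
    """Map similarity scores to normalized particle weights.

    Assumption: a softmax over the scores; the paper's exact likelihood
    function is not given in the abstract.
    """
    # Subtract the maximum score for numerical stability.
    top = max(similarities)
    raw = [math.exp((s - top) / temperature) for s in similarities]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_layers(audio, visual, k):
    """Estimate a 3D position from two particle layers.

    audio, visual: lists of (position, weight), position being (x, y, z).
    Assumption: the "optimal particle set" is taken to be the k
    highest-weight particles pooled from both layers.
    """
    pooled = sorted(audio + visual, key=lambda pw: pw[1], reverse=True)[:k]
    total = sum(w for _, w in pooled)
    # Weighted mean of the selected particle positions.
    return tuple(sum(w * p[i] for p, w in pooled) / total for i in range(3))
```

Under these assumptions, `similarity_weights` would be called once per layer on the scores of that layer's candidate patches, and `fuse_layers` would combine the two weighted sets into a single position estimate.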

Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking.
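
One way such a template update could work is sketched below. This is a hypothetical illustration only: the abstract does not describe the update rule, so the confidence threshold, the blending rate, the reset-on-low-confidence behavior, and the name `update_templates` are all assumptions. The idea shown is that a fixed long-term template anchors the tracker while a short-term template adapts to recent appearance.

```python
def update_templates(long_term, short_term, new_feature, confidence,
                     threshold=0.7, rate=0.2):
    """Return (long_term, short_term) after processing one frame.

    Templates and features are equal-length lists of floats.
    Assumption: `confidence` is the similarity score of the current
    estimate; `threshold` and `rate` are illustrative values.
    """
    if confidence < threshold:
        # Unreliable frame: fall back to the long-term template so that
        # errors do not accumulate in the short-term one.
        return long_term, list(long_term)
    # Reliable frame: blend the new feature into the short-term template.
    blended = [(1 - rate) * s + rate * f
               for s, f in zip(short_term, new_feature)]
    return long_term, blended
```

In this sketch the long-term template is never modified, which is the simplest way to keep a drift-free reference; a real tracker might refresh it on a slower schedule.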

Experimental results demonstrate that the proposed method outperforms the single-modal trackers and comparison methods.

Efficient and robust tracking is achieved both in 3D space and on the image plane.

American Psychological Association (APA)

Li, Yidi & Liu, Hong & Yang, Bing & Ding, Runwei & Chen, Yang. 2020. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity, Vol. 2020, no. 2020, pp.1-8.
https://search.emarefa.net/detail/BIM-1141667

Modern Language Association (MLA)

Li, Yidi…[et al.]. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity No. 2020 (2020), pp.1-8.
https://search.emarefa.net/detail/BIM-1141667

American Medical Association (AMA)

Li, Yidi & Liu, Hong & Yang, Bing & Ding, Runwei & Chen, Yang. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity. 2020. Vol. 2020, no. 2020, pp.1-8.
https://search.emarefa.net/detail/BIM-1141667

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1141667