Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter
Joint Authors
Chen, Yang
Yang, Bing
Li, Yidi
Ding, Runwei
Liu, Hong
Source
Complexity
Issue
Vol. 2020, Issue 2020 (31 Dec. 2020), pp.1-8, 8 p.
Publisher
Hindawi Publishing Corporation
Publication Date
2020-08-31
Country of Publication
Egypt
No. of Pages
8
Main Subjects
Abstract EN
For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution.
The main challenge is constructing a stable observation model.
To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning and built on a two-layer particle filter framework.
First, an audio-guided motion model generates candidate samples in a hierarchical structure consisting of an audio layer and a visual layer.
Then, a stable observation model is proposed with a designed Siamese network, which provides the similarity-based likelihood to calculate particle weights.
The speaker position is estimated using an optimal particle set, which integrates the decisions from audio particles and visual particles.
Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking.
Experimental results demonstrate that the proposed method outperforms the single-modal trackers and comparison methods.
Efficient and robust tracking is achieved both in 3D space and on the image plane.
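The weighting and estimation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_similarity` is a hypothetical Gaussian kernel standing in for the Siamese network's similarity score, each particle's "feature" is simply its own 3D position, and `select_top_k` is an assumed fusion rule, since the record does not specify how the optimal particle set is formed.

```python
import math

def toy_similarity(feature, template, bandwidth=0.5):
    """Hypothetical stand-in for a learned similarity score in (0, 1]."""
    sq_dist = sum((f - t) ** 2 for f, t in zip(feature, template))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def particle_weights(features, template):
    """Similarity-based likelihood, normalized so the weights sum to 1."""
    scores = [toy_similarity(f, template) for f in features]
    total = sum(scores)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

def select_top_k(particles, weights, k):
    """Assumed fusion rule: keep the k highest-weight particles, renormalize."""
    ranked = sorted(zip(particles, weights), key=lambda pw: pw[1], reverse=True)[:k]
    total = sum(w for _, w in ranked)
    return [p for p, _ in ranked], [w / total for _, w in ranked]

def estimate_position(particles, weights):
    """Weighted mean of the particle positions."""
    dims = len(particles[0])
    return [sum(w * p[d] for p, w in zip(particles, weights)) for d in range(dims)]

# Toy data (invented for illustration): 3D candidates from each layer.
audio_particles = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.5), (3.0, 3.0, 1.0)]
visual_particles = [(1.1, 2.0, 1.5), (0.9, 2.1, 1.4), (4.0, 0.0, 2.0)]
template = (1.0, 2.0, 1.5)  # placeholder for the tracked speaker's template

candidates = audio_particles + visual_particles
weights = particle_weights(candidates, template)
top_particles, top_weights = select_top_k(candidates, weights, k=3)
estimate = estimate_position(top_particles, top_weights)
print(estimate)
```

In a real tracker the features would be appearance embeddings of image patches and the template would be refreshed over time. The sketch only shows how similarity scores become weights and how a weighted subset yields a position estimate.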
American Psychological Association (APA)
Li, Yidi & Liu, Hong & Yang, Bing & Ding, Runwei & Chen, Yang. 2020. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity, Vol. 2020, no. 2020, pp. 1-8.
https://search.emarefa.net/detail/BIM-1141667
Modern Language Association (MLA)
Li, Yidi, et al. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity No. 2020 (2020), pp. 1-8.
https://search.emarefa.net/detail/BIM-1141667
American Medical Association (AMA)
Li, Yidi & Liu, Hong & Yang, Bing & Ding, Runwei & Chen, Yang. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity. 2020. Vol. 2020, no. 2020, pp. 1-8.
https://search.emarefa.net/detail/BIM-1141667
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references
Record ID
BIM-1141667